solid_queue_web 1.1.0 → 1.2.0
- checksums.yaml +4 -4
- data/README.md +5 -1
- data/app/assets/stylesheets/solid_queue_web/_08_detail.css +7 -0
- data/app/assets/stylesheets/solid_queue_web/_11_throughput.css +4 -0
- data/app/controllers/solid_queue_web/failed_jobs/errors_controller.rb +9 -0
- data/app/services/solid_queue_web/dashboard_stats.rb +8 -1
- data/app/services/solid_queue_web/error_frequency_report.rb +34 -0
- data/app/services/solid_queue_web/job_performance_stats.rb +9 -1
- data/app/views/solid_queue_web/dashboard/index.html.erb +29 -0
- data/app/views/solid_queue_web/failed_jobs/errors/index.html.erb +44 -0
- data/app/views/solid_queue_web/failed_jobs/index.html.erb +1 -0
- data/app/views/solid_queue_web/performance/index.html.erb +4 -0
- data/config/routes.rb +2 -0
- data/lib/solid_queue_web/version.rb +1 -1
- metadata +4 -1
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: e4178da05b230b0990212f6dd74b68f45b18e97e50fd1b732bc54a3243e1d61d
+  data.tar.gz: af6f4c2352ce0076fb441fe90d8e38f233bb3bee99fce040155d34e23af2e0b9
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: '0751948c9e66b3b5d0bbd3bb47007121466ebec956b523b2d3c3b1e16df399005d4acc823b2a1196f75a9511cc5167f135ba8bbb83f6e5c3778f405accc62e3f'
+  data.tar.gz: 422342bb5b419181d5ed4612cf2b06f907f5dcf540d43bb88973b145d1b08b5e4afd9e6319af5249dd6d16351480ab5440b54b111390f6692fc1209a217cfb4d
data/README.md
CHANGED
@@ -8,6 +8,8 @@
 
 A monitoring and management dashboard for [Solid Queue](https://github.com/rails/solid_queue), mountable as a Rails engine in any app.
 
+> **Note:** Development of this gem will continue, but if you need a unified dashboard that covers **Solid Queue**, **Solid Cable**, and **Solid Cache** in a single interface, check out [solid_stack_web](https://github.com/eclectic-coding/solid_stack_web).
+
 
 
 ## The problem
@@ -52,7 +54,9 @@ SolidQueueWeb surfaces all of this in a browser UI available at any route you ch
 - **CSV export** — "Export CSV" button on the jobs, failed jobs, and history pages downloads all records matching the current filters; columns are tailored per view
 - **Slow job detection** — when `slow_job_threshold` is configured, claimed jobs running longer than the threshold are flagged with an orange row, a "slow" badge, and a "Running For" duration column on the Running tab; a "Slow Jobs" warning card appears on the dashboard with a link to the Running tab
 - **Webhook alerts** — set `alert_webhook_url` and `alert_failure_threshold` to receive a POST request whenever the failed job count meets or exceeds the threshold; fires asynchronously so dashboard performance is unaffected; a configurable cooldown (default 1 h) prevents repeated alerts while the count stays elevated
-- **Performance analytics** — per-job-class statistics at `/jobs/performance` showing run count, average, p50, p95, min, and max duration; sorted by p95 descending so the slowest classes surface first; period filter scopes to 1h / 24h / 7d or all time; each class name links to the filtered History view
+- **Performance analytics** — per-job-class statistics at `/jobs/performance` showing run count, average, p50, p95, p99, standard deviation, min, and max duration; sorted by p95 descending so the slowest classes surface first; high std dev surfaces inconsistent jobs worth investigating; period filter scopes to 1h / 24h / 7d or all time; each class name links to the filtered History view
+- **Failed job trend chart** — a "Failures — Last 12 Hours" bar chart on the dashboard shows failures per hour over the last 12 hours; bars are red, making failure spikes visible before clicking into the failed jobs list
+- **Error frequency report** — `GET /jobs/failed_jobs/errors` groups all failed jobs by error class and message prefix, shows a count per group, and surfaces a sample backtrace in an expandable row; sorted by count descending so the most common errors appear first; accessible via the "Error Summary" button on the Failed Jobs page
 - **Metrics / health endpoint** — `GET /jobs/metrics.json` returns a machine-readable JSON document with job counts, throughput, per-queue depth and pause state, and process health summary; suitable for Prometheus scraping, uptime monitors, or external dashboards; `slow_jobs` count included when `slow_job_threshold` is configured
 
 ## Compatibility
data/app/assets/stylesheets/solid_queue_web/_08_detail.css
CHANGED
@@ -75,6 +75,13 @@
 
 .sqd-pre--muted { color: var(--muted); }
 
+.sqd-error-details summary {
+  cursor: pointer;
+  list-style: none;
+}
+.sqd-error-details summary::-webkit-details-marker { display: none; }
+.sqd-error-details .sqd-pre { margin-top: 0.5rem; }
+
 .sqd-error-header {
   font-size: 13px;
   padding: 0.5rem 0.75rem;
data/app/services/solid_queue_web/dashboard_stats.rb
CHANGED
@@ -1,6 +1,6 @@
 module SolidQueueWeb
   class DashboardStats
-    attr_reader :counts, :throughput, :sparkline, :depth_sparkline, :slow_jobs_count
+    attr_reader :counts, :throughput, :sparkline, :depth_sparkline, :failure_sparkline, :slow_jobs_count
 
     def initialize
       @now = Time.current
@@ -32,6 +32,13 @@ module SolidQueueWeb
         finished_times.count { |t| t >= from && t < to }
       end
 
+      failed_times = SolidQueue::FailedExecution.where(created_at: 12.hours.ago..@now).pluck(:created_at)
+      @failure_sparkline = 12.times.map do |i|
+        from = (12 - i).hours.ago
+        to = i == 11 ? @now : (11 - i).hours.ago
+        failed_times.count { |t| t >= from && t < to }
+      end
+
       threshold = SolidQueueWeb.slow_job_threshold
       @slow_jobs_count = threshold ? SolidQueue::ClaimedExecution.where("created_at <= ?", threshold.ago).count : 0
 
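The hourly bucketing added to `DashboardStats` can be tried outside Rails. What follows is a minimal plain-Ruby sketch, not the gem's code: `hourly_buckets` and `HOUR` are hypothetical names, second-based `Time` arithmetic stands in for ActiveSupport's `12.hours.ago`, and every boundary is derived from one fixed `now` rather than from repeated `hours.ago` calls.

```ruby
HOUR = 3600 # seconds; stands in for ActiveSupport durations (assumption)

# Hypothetical helper: counts timestamps per one-hour bucket over the
# last `hours` hours, oldest bucket first. Bucket i covers the half-open
# range [now - (hours - i)h, now - (hours - i - 1)h), as in the diff.
def hourly_buckets(times, now:, hours: 12)
  Array.new(hours) do |i|
    from = now - (hours - i) * HOUR
    to   = now - (hours - i - 1) * HOUR
    times.count { |t| t >= from && t < to }
  end
end

now = Time.utc(2024, 1, 1, 12, 0, 0)
failed_times = [
  now - 11.5 * HOUR, # falls in the oldest bucket (index 0)
  now - 0.5 * HOUR,  # falls in the newest bucket (index 11)
  now - 0.25 * HOUR, # falls in the newest bucket (index 11)
  now - 13 * HOUR    # outside the 12-hour window, not counted
]

buckets = hourly_buckets(failed_times, now: now)
p buckets
```

Because each bucket is half-open, a timestamp that lands exactly on an hour boundary is counted once, in the later bucket.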
data/app/services/solid_queue_web/error_frequency_report.rb
ADDED
@@ -0,0 +1,34 @@
+module SolidQueueWeb
+  class ErrorFrequencyReport
+    Row = Data.define(:exception_class, :message_prefix, :count, :sample_backtrace)
+
+    MESSAGE_LIMIT = 120
+
+    def groups
+      SolidQueue::FailedExecution
+        .order(created_at: :desc)
+        .each_with_object({}) do |execution, acc|
+          key = [execution.exception_class.to_s, message_prefix(execution.message)]
+          entry = acc[key] ||= { count: 0, sample_backtrace: nil }
+          entry[:count] += 1
+          entry[:sample_backtrace] ||= execution.backtrace
+        end
+        .map do |(exception_class, prefix), data|
+          Row.new(
+            exception_class: exception_class,
+            message_prefix: prefix,
+            count: data[:count],
+            sample_backtrace: data[:sample_backtrace]
+          )
+        end
+        .sort_by { |row| -row.count }
+    end
+
+    private
+
+    def message_prefix(message)
+      return "" if message.nil?
+      message.length > MESSAGE_LIMIT ? "#{message[0, MESSAGE_LIMIT]}…" : message
+    end
+  end
+end
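The grouping in `ErrorFrequencyReport#groups` can be illustrated without a database. This is a rough sketch under stated assumptions: `FakeExecution`, `Group`, and `error_groups` are hypothetical stand-ins (a `Struct` replaces both the `SolidQueue::FailedExecution` records and the `Data`-based `Row`), and the input array is assumed to already be ordered newest first.

```ruby
# Hypothetical stand-ins; only the three attributes used for grouping exist.
FakeExecution = Struct.new(:exception_class, :message, :backtrace, keyword_init: true)
Group = Struct.new(:exception_class, :message_prefix, :count, :sample_backtrace, keyword_init: true)

MESSAGE_LIMIT = 120

# Same truncation rule as the diff: messages longer than the limit are
# cut and marked with an ellipsis, so they group by their first 120 chars.
def message_prefix(message)
  return "" if message.nil?
  message.length > MESSAGE_LIMIT ? "#{message[0, MESSAGE_LIMIT]}…" : message
end

# Groups executions by [class, message prefix], keeps the first backtrace
# seen per group as the sample, and sorts by count descending.
def error_groups(executions)
  grouped = executions.each_with_object({}) do |execution, acc|
    key = [execution.exception_class.to_s, message_prefix(execution.message)]
    entry = acc[key] ||= { count: 0, sample_backtrace: nil }
    entry[:count] += 1
    entry[:sample_backtrace] ||= execution.backtrace
  end
  grouped
    .map { |(klass, prefix), data| Group.new(exception_class: klass, message_prefix: prefix, **data) }
    .sort_by { |group| -group.count }
end

executions = [
  FakeExecution.new(exception_class: "Net::ReadTimeout", message: "timed out", backtrace: ["a.rb:1"]),
  FakeExecution.new(exception_class: "ArgumentError", message: "bad id", backtrace: ["b.rb:2"]),
  FakeExecution.new(exception_class: "Net::ReadTimeout", message: "timed out", backtrace: ["c.rb:3"])
]

groups = error_groups(executions)
groups.each { |g| puts "#{g.count}x #{g.exception_class}: #{g.message_prefix}" }
```

As in the diff, the grouping runs in Ruby rather than in SQL, so it visits every failed execution; with newest-first input, the sample backtrace is the most recent one in each group.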
data/app/services/solid_queue_web/job_performance_stats.rb
CHANGED
@@ -1,6 +1,6 @@
 module SolidQueueWeb
   class JobPerformanceStats
-    Row = Struct.new(:class_name, :count, :avg, :p50, :p95, :min, :max, keyword_init: true)
+    Row = Struct.new(:class_name, :count, :avg, :p50, :p95, :p99, :std_dev, :min, :max, keyword_init: true)
 
     def initialize(scope)
       @scope = scope
@@ -18,6 +18,8 @@ module SolidQueueWeb
           avg: mean(durations),
           p50: percentile(durations, 50),
           p95: percentile(durations, 95),
+          p99: percentile(durations, 99),
+          std_dev: std_dev(durations),
           min: durations.first,
           max: durations.last
         )
@@ -34,5 +36,11 @@ module SolidQueueWeb
       idx = [(pct / 100.0 * sorted.size).ceil - 1, 0].max
       sorted[idx]
     end
+
+    def std_dev(sorted)
+      return 0.0 if sorted.size < 2
+      m = mean(sorted)
+      Math.sqrt(sorted.sum { |x| (x - m)**2 } / sorted.size)
+    end
   end
 end
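A small worked example may help when reading the new `p99` and `std_dev` columns. The sketch below copies `percentile` and `std_dev` from the hunks above into top-level methods; `mean` is not shown in the diff, so the version here is an assumption (arithmetic mean as a float), and the sample durations are made up.

```ruby
# Assumed implementation; the diff calls `mean` but does not show it.
def mean(values)
  values.sum.to_f / values.size
end

# Nearest-rank percentile over an already sorted array, as in the diff.
def percentile(sorted, pct)
  idx = [(pct / 100.0 * sorted.size).ceil - 1, 0].max
  sorted[idx]
end

# Population standard deviation (divides by N, not N - 1), as in the diff.
def std_dev(sorted)
  return 0.0 if sorted.size < 2
  m = mean(sorted)
  Math.sqrt(sorted.sum { |x| (x - m)**2 } / sorted.size)
end

durations = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0] # sorted, in seconds

puts "avg     = #{mean(durations)}"           # 5.0
puts "p50     = #{percentile(durations, 50)}" # 4.0
puts "p95     = #{percentile(durations, 95)}" # 9.0
puts "p99     = #{percentile(durations, 99)}" # 9.0
puts "std_dev = #{std_dev(durations)}"        # 2.0
```

With only eight samples, p95 and p99 both resolve to the maximum, so the two columns only diverge for job classes with enough runs (roughly a hundred or more under nearest-rank).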
data/app/views/solid_queue_web/dashboard/index.html.erb
CHANGED
@@ -104,6 +104,35 @@
   <% end %>
 </div>
 
+<% max_failures = [@stats.failure_sparkline.max, 1].max %>
+<div class="sqd-card" style="margin-bottom: 1rem;">
+  <div class="sqd-card__header">
+    <span class="sqd-card__title">Failures — Last 12 Hours</span>
+    <div class="sqd-throughput__summary">
+      <span>Total: <strong><%= @stats.failure_sparkline.sum %></strong></span>
+    </div>
+  </div>
+  <% if @stats.failure_sparkline.all?(&:zero?) %>
+    <div class="sqd-sparkline__empty">No failures in the last 12 hours</div>
+  <% else %>
+    <div class="sqd-sparkline" aria-label="Failed jobs per hour over the last 12 hours">
+      <% @stats.failure_sparkline.each_with_index do |count, i| %>
+        <% pct = (count.to_f / max_failures * 100).round %>
+        <% hour_start = (12 - i).hours.ago %>
+        <% show_tick = [0, 3, 6, 9, 11].include?(i) %>
+        <div class="sqd-sparkline__col">
+          <div class="sqd-sparkline__bar-wrap">
+            <div class="sqd-sparkline__bar sqd-sparkline__bar--failure"
+                 style="height: <%= [pct, 3].max %>%"
+                 title="<%= hour_start.strftime('%-I%p').downcase %>: <%= count %> <%= "failure".pluralize(count) %>"></div>
+          </div>
+          <div class="sqd-sparkline__tick"><%= show_tick ? (i == 11 ? "now" : hour_start.strftime("%-I%p").downcase) : "" %></div>
+        </div>
+      <% end %>
+    </div>
+  <% end %>
+</div>
+
 <div style="display:grid; grid-template-columns: repeat(auto-fit, minmax(240px, 1fr)); gap: 1rem;">
   <div class="sqd-card">
     <div class="sqd-card__header">
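The bar heights in the failure chart come from two small pieces of inline arithmetic in the view. As an illustration only, the sketch below pulls them into a hypothetical `bar_heights` helper (the name and the `min_pct` keyword are assumptions; the view computes these values inline).

```ruby
# Hypothetical helper mirroring the view's inline arithmetic:
# each count is scaled against the tallest bar, then clamped to a
# minimum height so zero-count hours still render a visible stub.
def bar_heights(counts, min_pct: 3)
  max = [counts.max, 1].max # guards against dividing by zero
  counts.map { |count| [(count.to_f / max * 100).round, min_pct].max }
end

heights = bar_heights([0, 1, 4, 10])
p heights # => [3, 10, 40, 100]
```

In the view the all-zero case never reaches this arithmetic, because the "No failures in the last 12 hours" message is rendered instead of the chart.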
data/app/views/solid_queue_web/failed_jobs/errors/index.html.erb
ADDED
@@ -0,0 +1,44 @@
+<div class="sqd-page-header">
+  <h1 class="sqd-page-title">Error Summary</h1>
+  <div class="sqd-actions">
+    <%= link_to "← Failed Jobs", failed_jobs_path, class: "sqd-btn sqd-btn--muted sqd-btn--sm" %>
+  </div>
+</div>
+
+<% if @groups.any? %>
+  <div class="sqd-card">
+    <table>
+      <thead>
+        <tr>
+          <th scope="col">Error Class</th>
+          <th scope="col">Message</th>
+          <th scope="col" style="text-align: right;">Count</th>
+        </tr>
+      </thead>
+      <tbody>
+        <% @groups.each do |group| %>
+          <tr>
+            <td class="sqd-mono"><%= group.exception_class.presence || "—" %></td>
+            <td>
+              <% if group.sample_backtrace.present? %>
+                <details class="sqd-error-details">
+                  <summary class="sqd-truncate" title="<%= group.message_prefix %>">
+                    <%= group.message_prefix.presence || "—" %>
+                  </summary>
+                  <pre class="sqd-pre sqd-pre--muted"><%= Array(group.sample_backtrace).first(10).join("\n") %></pre>
+                </details>
+              <% else %>
+                <span class="sqd-truncate" title="<%= group.message_prefix %>"><%= group.message_prefix.presence || "—" %></span>
+              <% end %>
+            </td>
+            <td style="text-align: right;"><%= group.count %></td>
+          </tr>
+        <% end %>
+      </tbody>
+    </table>
+  </div>
+<% else %>
+  <div class="sqd-card">
+    <div class="sqd-empty">No failed jobs. All clear!</div>
+  </div>
+<% end %>
data/app/views/solid_queue_web/failed_jobs/index.html.erb
CHANGED
@@ -2,6 +2,7 @@
   <h1 class="sqd-page-title">Failed Jobs</h1>
   <% if @failed_jobs.any? %>
     <div class="sqd-actions">
+      <%= link_to "Error Summary", failed_job_errors_path, class: "sqd-btn sqd-btn--muted sqd-btn--sm" %>
       <%= link_to "Export CSV", failed_jobs_path(format: :csv, queue: @queue, q: @search, period: @period),
             class: "sqd-btn sqd-btn--muted", data: { turbo: false } %>
       <%= button_to "Retry All", retry_all_failed_jobs_path,
data/app/views/solid_queue_web/performance/index.html.erb
CHANGED
@@ -21,6 +21,8 @@
           <th scope="col" style="text-align: right;">Avg</th>
           <th scope="col" style="text-align: right;">p50</th>
           <th scope="col" style="text-align: right;">p95</th>
+          <th scope="col" style="text-align: right;">p99</th>
+          <th scope="col" style="text-align: right;">Std Dev</th>
           <th scope="col" style="text-align: right;">Min</th>
           <th scope="col" style="text-align: right;">Max</th>
         </tr>
@@ -36,6 +38,8 @@
             <td class="sqd-mono" style="text-align: right;"><%= format_duration(row.avg) %></td>
             <td class="sqd-mono" style="text-align: right;"><%= format_duration(row.p50) %></td>
             <td class="sqd-mono" style="text-align: right;"><%= format_duration(row.p95) %></td>
+            <td class="sqd-mono" style="text-align: right;"><%= format_duration(row.p99) %></td>
+            <td class="sqd-mono" style="text-align: right;"><%= format_duration(row.std_dev) %></td>
             <td class="sqd-mono" style="text-align: right;"><%= format_duration(row.min) %></td>
             <td class="sqd-mono" style="text-align: right;"><%= format_duration(row.max) %></td>
           </tr>
data/config/routes.rb
CHANGED
@@ -35,6 +35,8 @@ SolidQueueWeb::Engine.routes.draw do
     end
   end
 
+  get "failed_jobs/errors", to: "failed_jobs/errors#index", as: :failed_job_errors
+
   resource :failed_job_selection, path: "failed_jobs/selection", only: [:create, :destroy],
     controller: "failed_jobs/selections"
   resources :failed_jobs, only: [:index, :destroy] do
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: solid_queue_web
 version: !ruby/object:Gem::Version
-  version: 1.
+  version: 1.2.0
 platform: ruby
 authors:
 - Chuck Smith
@@ -125,6 +125,7 @@ files:
 - app/controllers/solid_queue_web/blocked_jobs_controller.rb
 - app/controllers/solid_queue_web/dashboard_controller.rb
 - app/controllers/solid_queue_web/failed_jobs/arguments_controller.rb
+- app/controllers/solid_queue_web/failed_jobs/errors_controller.rb
 - app/controllers/solid_queue_web/failed_jobs/selections_controller.rb
 - app/controllers/solid_queue_web/failed_jobs_controller.rb
 - app/controllers/solid_queue_web/history_controller.rb
@@ -152,12 +153,14 @@ files:
 - app/models/solid_queue_web/job.rb
 - app/services/solid_queue_web/alert_webhook.rb
 - app/services/solid_queue_web/dashboard_stats.rb
+- app/services/solid_queue_web/error_frequency_report.rb
 - app/services/solid_queue_web/job_performance_stats.rb
 - app/services/solid_queue_web/metrics_payload.rb
 - app/services/solid_queue_web/queue_depth_alert.rb
 - app/services/solid_queue_web/queue_stats.rb
 - app/views/layouts/solid_queue_web/application.html.erb
 - app/views/solid_queue_web/dashboard/index.html.erb
+- app/views/solid_queue_web/failed_jobs/errors/index.html.erb
 - app/views/solid_queue_web/failed_jobs/index.html.erb
 - app/views/solid_queue_web/history/index.html.erb
 - app/views/solid_queue_web/jobs/destroy.turbo_stream.erb