dispatch-rails 0.7.0 → 0.9.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 300eb860cc2bb5f7dff2322818cc6bd552507695c71c84c9be81ccbd6e34a3fd
-  data.tar.gz: 4070748d3d36256a456e9eb960d36c47418ec55f699e0984a9fd49c4d67b501f
+  metadata.gz: 797a73d62a39b0ab3640364d8ef71186b61f3233da2788f7733b72c85e77cc75
+  data.tar.gz: 235fa20b827e25a6f0537e20d3cdd51221c0bebbed18013046901c9bdabf51ed
 SHA512:
-  metadata.gz: 1dcbd3e9a81c078086e62e5e50bca81679832a2dff5e5fd1fe5b4a82bc052e445adb83dbc8e1f7c22b8699e94d6a7d5a6d843d2ae7f974a97cabf682289d767d
-  data.tar.gz: 21765b8506a454e84d225b8d932ce6a15dc2cddd948d8247cddae59908cfe1499139f607a90892e1dffa247a299f156b3c09e1100a73efeb2ae8243af9a7a133
+  metadata.gz: f3cbf75d09473580ec5f66ffaaf7d7623631c1a453bbe9b67b86e091e4fbcb56cd7da1ea141080b6375b714802915c00c4fd9f64d7c649bb2de7e6d6b46d1c0b
+  data.tar.gz: f24f36ca5d2769053faa65cb5b77da6e09a28aea5b4e3166bcbb64e2473729cd9b54df4d6936572fe3ecb41937cb5e43e32f0b9de24f2c27b1879532da0232de
data/README.md CHANGED
@@ -98,6 +98,40 @@ c.capture_at_exit = true # default; set false to skip the crash-at-exit report
 c.shutdown_timeout = 3 # seconds to wait for the send queue to drain at exit; 0 skips
 ```
 
+### SolidQueue infrastructure failures
+
+Errors raised **inside** a job's `perform` are reported automatically (ActiveJob
+sends them through `Rails.error`, which the SDK subscribes to). But SolidQueue's
+own *infrastructure* failures never touch that path — SolidQueue writes them
+straight into its own tables, so they don't reach `Rails.error` and stay
+invisible in the dashboard. The classic case: a container runs out of memory,
+Heroku SIGKILLs the worker, and the supervisor later prunes the dead process and
+force-fails the jobs it had claimed with `SolidQueue::Processes::ProcessPrunedError`
+— a failure you'd only find by opening the queue.
+
+When SolidQueue is present, the SDK subscribes to its `ActiveSupport::Notifications`
+and turns these into first-class events (all tagged `source: solid_queue`):
+
+- **Lost jobs** (`fail_many_claimed`) — claimed jobs force-failed because their
+  worker died (pruned after a missed heartbeat, or orphaned by a dead process).
+  Reported as `Dispatch::Rails::SolidQueueJobsLost` with `tags.jobs_lost`,
+  `tags.process_ids`, and `tags.job_ids`. **This is the OOM-restart case.**
+- **Thread crashes** (`thread_error`) — an exception that escaped a SolidQueue
+  thread (supervisor/dispatcher/worker/scheduler), reported with its real class
+  and backtrace. Captured even if your app overrides `SolidQueue.on_thread_error`
+  away from the default `Rails.error.report` (deduped against it when it isn't).
+- **Recurring-enqueue misses** (`enqueue_recurring_task`) — a cron task that
+  failed to enqueue, as `Dispatch::Rails::SolidQueueEnqueueError`.
+
+```ruby
+c.capture_solid_queue = true # default; no-op unless SolidQueue is running
+```
+
+It only does anything when SolidQueue is actually running, so apps that don't use
+it pay nothing. Memory pressure itself (Heroku R14/R15) is a platform signal, not
+an app exception — pair this with a Heroku memory alert to catch the root cause,
+not just the orphaned jobs.
+
 ### Traffic heartbeats (confound signal)
 
 On by default in enabled environments, the SDK ships lightweight per-`transaction`
@@ -128,6 +162,28 @@ quote. With `c.annotate_error_body = true`, the same fields (`dispatch_request_i
 they sit beside `type`/`title`/`detail`). The middleware never changes status codes
 and passes through anything it can't safely parse.
 
+### Browser CSP & Reporting API
+
+CSP violations and other browser-native reports (NEL, deprecation, intervention)
+never reach `window.onerror`, so the SDK captures them two ways — pick **one** for
+CSP, or a violation reported through both is counted twice. Both are opt-in.
+
+```ruby
+# 1. Client-side: the JS error tracker also listens for SecurityPolicyViolationEvents.
+#    Needs dispatch_error_tracker_tag in your layout (it already loads the tracker).
+c.capture_csp_violations = true
+
+# 2. Server-side: a Rack endpoint accepts the browser's NATIVE report POSTs — point a
+#    CSP report-uri/report-to (or any Reporting-Endpoints group) at the path below and
+#    the browser posts straight to Dispatch. No host controller required.
+c.capture_browser_reports = true
+c.reporting_endpoint_path = "/dispatch/reports" # default; must be a path no route uses
+```
+
+Option 2 also ingests non-CSP Reporting-API types (deprecation, intervention, NEL).
+Reports flow through `error_sample_rate` and `before_send` like any other event, so
+host-specific noise (browser-extension schemes, autofill) is best filtered there.
+
 ### Curated reports (programmatic / agent)
 
 A consumer (or an AI agent inside your app) turns a failure into a tracked report:
@@ -106,6 +106,45 @@
     };
   }
 
+  // CSP violations never reach window.onerror — the browser fires a dedicated
+  // SecurityPolicyViolationEvent. We synthesize a single stack frame from the
+  // violation's source location so the dashboard can point at the offending file.
+  function buildCspEvent(e) {
+    var directive = e.effectiveDirective || e.violatedDirective || "policy";
+    var blocked = e.blockedURI || "inline";
+    var frames = (e.sourceFile && e.lineNumber) ? [{
+      function: "?", filename: e.sourceFile, abs_path: e.sourceFile,
+      lineno: parseInt(e.lineNumber, 10) || 0, colno: parseInt(e.columnNumber, 10) || 0,
+      in_app: String(e.sourceFile).indexOf("http") === 0
+    }] : [];
+    return {
+      event_id: uuid(),
+      timestamp: now(),
+      platform: "javascript",
+      // report-only disposition is monitoring, not a live block — keep it quieter.
+      level: e.disposition === "report" ? "info" : "warning",
+      environment: cfg.environment,
+      release: cfg.release,
+      exception: { values: [{
+        type: "SecurityPolicyViolation",
+        value: (directive + " blocked " + blocked).slice(0, 2000),
+        mechanism: { type: "securitypolicyviolation", handled: false },
+        stacktrace: { frames: frames }
+      }] },
+      breadcrumbs: { values: breadcrumbs.slice() },
+      user_path: userPath(),
+      request: { url: e.documentURI || location.href, headers: { "User-Agent": navigator.userAgent } },
+      user: cfg.user || null,
+      tags: Object.assign({
+        csp: "true",
+        blocked_uri: blocked,
+        violated_directive: e.violatedDirective || directive,
+        effective_directive: directive,
+        disposition: e.disposition || "enforce"
+      }, cfg.tags || {})
+    };
+  }
+
   function sampledOut() {
     var rate = cfg.sampleRate == null ? 1 : cfg.sampleRate;
     return Math.random() > rate;
@@ -143,6 +182,21 @@
     send(buildEvent(r.name || "UnhandledRejection", value, r.stack));
   });
 
+  // Opt-in (cfg.captureCsp): a permissive or report-only policy can fire these in
+  // bulk, so we dedupe per page — a violation repeated on every render (e.g. a
+  // blocked image) is reported once. Bounded so a page spraying unique blocked
+  // URIs can't grow the key set without limit.
+  if (cfg.captureCsp) {
+    var seenCsp = {};
+    var seenCspCount = 0;
+    document.addEventListener("securitypolicyviolation", function (e) {
+      var key = [e.effectiveDirective, e.blockedURI, e.sourceFile, e.lineNumber].join("|");
+      if (seenCsp[key]) return;
+      if (seenCspCount < 200) { seenCsp[key] = 1; seenCspCount++; }
+      send(buildCspEvent(e));
+    });
+  }
+
   // Manual capture API: window.Dispatch.captureException(error)
   window.Dispatch = window.Dispatch || {};
   window.Dispatch.captureException = function (err) {
@@ -35,7 +35,8 @@ module Dispatch
       end
 
       # Render the browser exception tracker (captures uncaught JS errors and
-      # unhandled promise rejections). Place once in your layout's <head>.
+      # unhandled promise rejections and, when config.capture_csp_violations is on,
+      # SecurityPolicyViolationEvents). Place once in your layout's <head>.
       def dispatch_error_tracker_tag(tags: {})
         config = Dispatch::Rails.configuration
         return nil if config.errors_only? # no browser surface in an API-only app
@@ -55,6 +56,7 @@ module Dispatch
           release: config.release,
           sampleRate: config.error_sample_rate,
           captureClicks: config.capture_clicks,
+          captureCsp: config.capture_csp_violations,
           user: normalized_user,
           tags: tags
         }.to_json
@@ -16,8 +16,13 @@ module Dispatch
       # Exception tracking
       attr_accessor :capture_exceptions, :capture_browser_errors, :error_endpoint,
                     :environment, :release, :enabled_environments, :error_sample_rate, :before_send
+      # Browser security/reporting capture (both opt-in; high-volume + noise-prone)
+      attr_accessor :capture_csp_violations, :capture_browser_reports, :reporting_endpoint_path
       # Process lifecycle (crash-at-exit capture, rake failures, shutdown flush)
       attr_accessor :capture_at_exit, :shutdown_timeout
+      # SolidQueue infrastructure failures (pruned/orphaned jobs, thread crashes,
+      # recurring-enqueue misses) that never flow through Rails.error/ActiveJob.
+      attr_accessor :capture_solid_queue
       # Structured error responses (API-only)
       attr_accessor :structured_error_responses, :annotate_error_body, :report_base_url
       # Traffic heartbeats — per-transaction success counts that let the Dispatch
@@ -49,12 +54,29 @@ module Dispatch
         @error_sample_rate = 1.0
         @before_send = nil # ->(event) { event or nil to drop }
 
+        # Browser security/reporting capture. Both opt-in: a permissive or
+        # report-only CSP can emit these in bulk, and capture_browser_reports also
+        # makes the gem own a URL path. Pick ONE CSP mechanism — the JS
+        # securitypolicyviolation listener (capture_csp_violations) OR the native
+        # report endpoint (capture_browser_reports) — or a violation reported
+        # through both is counted twice.
+        @capture_csp_violations = false # JS: listen for SecurityPolicyViolationEvent
+        @capture_browser_reports = false # Server: accept native browser report POSTs
+        @reporting_endpoint_path = "/dispatch/reports"
+
         # Process lifecycle. Report the exception killing the process (a crash
         # during boot, a dying runner) and drain the send queue before exit so
         # deploys/restarts don't drop captured events.
         @capture_at_exit = true
         @shutdown_timeout = 3 # seconds to wait for the queue to drain at exit; 0 skips the flush
 
+        # SolidQueue infrastructure capture. On by default in enabled environments
+        # (no-op unless SolidQueue is actually running, so apps that don't use it
+        # pay nothing). These failures — a worker pruned after a missed heartbeat,
+        # jobs orphaned by a dead process, a recurring task that didn't enqueue —
+        # bypass Rails.error entirely, so without this they never reach Dispatch.
+        @capture_solid_queue = true
+
         # Structured error responses (off by default — opt-in so we never alter a
         # host app's error contract without being asked).
         @structured_error_responses = false
@@ -120,12 +142,27 @@ module Dispatch
         configured? && @capture_exceptions
       end
 
+      # Fast-path guard for ReportingEndpointMiddleware: own the reporting path only
+      # when the host opted in and we have credentials to deliver. Otherwise the
+      # middleware passes the request straight through (no surprise 204s).
+      def browser_reports_enabled?
+        configured? && @capture_browser_reports
+      end
+
       # Heartbeats piggyback on the same gating as error capture, plus their own
       # toggle. Off in non-enabled environments (so dev/test never phone home).
       def traffic_tracking_enabled?
         @capture_traffic && error_tracking_enabled? && environment_enabled?
       end
 
+      # SolidQueue capture shares error capture's gating (credentials + the
+      # capture_exceptions master switch + enabled environments), plus its own
+      # toggle. The subscriber is wired unconditionally at boot and checks this
+      # at event time, so toggling it in an initializer always takes effect.
+      def solid_queue_tracking_enabled?
+        @capture_solid_queue && error_tracking_enabled? && environment_enabled?
+      end
+
       def environment_enabled?
         list = Array(@enabled_environments).map(&:to_s)
         list.empty? || list.include?(effective_environment)
@@ -31,12 +31,32 @@ module Dispatch
       initializer "dispatch-rails.error_capture" do |app|
         app.config.middleware.use Dispatch::Rails::Middleware
 
+        # Native browser Reporting API endpoint (CSP report-uri/report-to, NEL,
+        # deprecation, …). Mounted unconditionally; owns
+        # config.reporting_endpoint_path only when the host opts in via
+        # config.capture_browser_reports, and passes through otherwise. Sits just
+        # inside the capture middleware so a report POST short-circuits to 204
+        # before the heartbeat/router layers below it.
+        app.config.middleware.use Dispatch::Rails::ReportingEndpointMiddleware
+
         # Catch background/non-request errors (ActiveJob, runners, handle blocks).
         if ::Rails.respond_to?(:error) && ::Rails.error.respond_to?(:subscribe)
           ::Rails.error.subscribe(Dispatch::Rails::ErrorSubscriber.new)
         end
       end
 
+      # SolidQueue infrastructure failures (a worker pruned after a missed
+      # heartbeat, jobs orphaned by a dead process, a recurring task that failed
+      # to enqueue) are recorded straight into SolidQueue's own tables and never
+      # flow through Rails.error/ActiveJob — so the subscriber above never sees
+      # them. Bridge SolidQueue's ActiveSupport::Notifications into capture.
+      # Subscribed unconditionally and idempotently; the subscriber no-ops unless
+      # config.solid_queue_tracking_enabled?, and the events only ever fire when
+      # SolidQueue is actually running.
+      initializer "dispatch-rails.solid_queue" do
+        Dispatch::Rails::SolidQueueSubscriber.install!
+      end
+
       # Per-transaction traffic heartbeats. Mounted unconditionally; the middleware
       # no-ops at request time unless config.traffic_tracking_enabled? (false in
       # dev/test), so it never phones home outside enabled environments.
@@ -0,0 +1,172 @@
+require "json"
+
+module Dispatch
+  module Rails
+    # A report the browser delivered through the native Reporting API (deprecation,
+    # intervention, NEL, …) — not a Ruby error and not a JS exception.
+    class BrowserReport < StandardError; end
+    # A Content-Security-Policy violation delivered by the browser's native
+    # report-uri/report-to channel. CSP violations never reach window.onerror, so
+    # nothing surfaces them unless the browser is told to POST a report here.
+    class CspViolation < BrowserReport; end
+
+    # Terminal Rack endpoint for the browser Reporting API. Point a CSP
+    # `report-uri`/`report-to` group (or any Reporting-Endpoints group) at
+    # config.reporting_endpoint_path and the browser POSTs its reports straight
+    # here — no host controller required. Each report becomes a synthetic exception
+    # pushed through Reporter.capture, so it lands as a first-class event with the
+    # usual sampling, before_send, and transport applied.
+    #
+    # Opt-in (config.capture_browser_reports). When disabled — or for any request
+    # that isn't a POST to the configured path — it passes straight through. When
+    # it does handle a request it answers 204 itself and never calls downstream, so
+    # the configured path must be one no host route uses.
+    #
+    # Pick ONE CSP mechanism: this endpoint OR the JS securitypolicyviolation
+    # listener (config.capture_csp_violations). Enabling both, with report-uri
+    # aimed here, double-counts every violation.
+    class ReportingEndpointMiddleware
+      MAX_BODY_BYTES = 64_000
+      MAX_REPORTS = 50
+
+      def initialize(app)
+        @app = app
+      end
+
+      def call(env)
+        config = Dispatch::Rails.configuration
+        return @app.call(env) unless handles?(config, env)
+
+        # @app.call above is deliberately OUTSIDE any rescue — a normal request's
+        # exceptions must propagate to Rails' exception handling, never be
+        # swallowed here. Only the report-handling path is rescued.
+        handle_report(env)
+      end
+
+      private
+
+      def handle_report(env)
+        capture_reports(env)
+        no_content
+      rescue StandardError => e
+        # A report endpoint must never error the browser's beacon — always 204.
+        warn "[dispatch-rails] reporting endpoint failed: #{e.class}: #{e.message}"
+        no_content
+      end
+
+      # A FRESH Rack triple per call — downstream middleware mutates the array and
+      # headers, so a shared/frozen response would raise or corrupt across requests.
+      def no_content
+        [204, { "Content-Type" => "text/plain" }, []]
+      end
+
+      def handles?(config, env)
+        config.browser_reports_enabled? &&
+          env["REQUEST_METHOD"] == "POST" &&
+          path_match?(config, env["PATH_INFO"])
+      end
+
+      def path_match?(config, path)
+        target = config.reporting_endpoint_path.to_s
+        return false if target.empty?
+
+        path == target || path == "#{target}/"
+      end
+
+      def capture_reports(env)
+        raw = read_body(env)
+        return if raw.empty?
+
+        parse_reports(content_type(env), raw).first(MAX_REPORTS).each do |report|
+          capture_one(report, env)
+        end
+      end
+
+      def capture_one(report, env)
+        type = report[:type].to_s
+        body = report[:body].is_a?(Hash) ? report[:body] : {}
+        csp?(type) ? capture_csp(body, env) : capture_generic(type, body, env)
+      end
+
+      def capture_csp(body, env)
+        fields = csp_fields(body)
+        directive = fields[:effective_directive] || fields[:violated_directive] || "policy"
+        blocked = fields[:blocked_uri] || "inline"
+        Reporter.capture(
+          CspViolation.new("#{directive} blocked #{blocked}"),
+          handled: true, env: env,
+          # report-only disposition is monitoring, not a live block — keep it quieter.
+          level: fields[:disposition].to_s == "report" ? "info" : "warning",
+          context: { tags: { csp: "true", report_type: "csp-violation" }.merge(fields) }
+        )
+      end
+
+      def capture_generic(type, body, env)
+        label = type.empty? ? "unknown" : type
+        Reporter.capture(
+          BrowserReport.new("Browser report: #{label}"),
+          handled: true, env: env, level: "info",
+          context: { tags: { report_type: label }.merge(stringify(body)) }
+        )
+      end
+
+      def csp?(type)
+        type == "csp-violation" || type == "csp"
+      end
+
+      # Normalize the two CSP spellings: report-uri's hyphen-case
+      # (blocked-uri/document-uri) and the Reporting API's camelCase, which also
+      # renamed the *-uri fields to *URL.
+      def csp_fields(body)
+        {
+          blocked_uri: body["blocked-uri"] || body["blockedURI"] || body["blockedURL"],
+          document_uri: body["document-uri"] || body["documentURI"] || body["documentURL"],
+          violated_directive: body["violated-directive"] || body["violatedDirective"],
+          effective_directive: body["effective-directive"] || body["effectiveDirective"],
+          source_file: body["source-file"] || body["sourceFile"],
+          line_number: body["line-number"] || body["lineNumber"],
+          column_number: body["column-number"] || body["columnNumber"],
+          disposition: body["disposition"]
+        }.compact
+      end
+
+      # Legacy report-uri sends a single { "csp-report": {...} } object; the
+      # Reporting API sends an array of { "type", "body" } under reports+json.
+      def parse_reports(content_type, raw)
+        data = JSON.parse(raw)
+        if content_type.include?("reports+json") || data.is_a?(Array)
+          Array(data).map { |r| { type: r["type"], body: r["body"] } }
+        elsif data.is_a?(Hash) && data.key?("csp-report")
+          [{ type: "csp-violation", body: data["csp-report"] }]
+        elsif data.is_a?(Hash)
+          [{ type: "csp-violation", body: data }]
+        else
+          []
+        end
+      rescue JSON::ParserError
+        []
+      end
+
+      def read_body(env)
+        input = env["rack.input"]
+        return "" unless input
+
+        raw = input.read(MAX_BODY_BYTES).to_s
+        input.rewind if input.respond_to?(:rewind)
+        raw
+      rescue StandardError
+        ""
+      end
+
+      def content_type(env)
+        (env["CONTENT_TYPE"] || env["HTTP_CONTENT_TYPE"]).to_s
+      end
+
+      def stringify(hash)
+        return {} unless hash.is_a?(Hash)
+
+        hash.each_with_object({}) { |(k, v), out| out[k.to_s] = v }
+      end
+    end
+  end
+end
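Reviewer note: the two payload shapes that `parse_reports` distinguishes can be exercised in isolation. The following is a minimal standalone sketch, not the gem's code: `normalize_reports` is a hypothetical name, it branches on the parsed JSON shape only (it ignores the Content-Type check the middleware also performs), and it skips non-Hash array entries, which the shipped method does not do. The sample payloads are made up for illustration.

```ruby
require "json"

# Hypothetical standalone helper mirroring the shape handling in parse_reports.
# Assumption: we decide purely from the parsed JSON, not from Content-Type.
def normalize_reports(raw)
  data = JSON.parse(raw)
  if data.is_a?(Array)
    # Reporting API: an array of { "type" => ..., "body" => ... } objects.
    data.select { |r| r.is_a?(Hash) }.map { |r| { type: r["type"], body: r["body"] } }
  elsif data.is_a?(Hash) && data.key?("csp-report")
    # Legacy report-uri: a single object wrapped in "csp-report".
    [{ type: "csp-violation", body: data["csp-report"] }]
  elsif data.is_a?(Hash)
    # Bare object: treated as a CSP violation body, as in the hunk above.
    [{ type: "csp-violation", body: data }]
  else
    []
  end
rescue JSON::ParserError
  []
end

# Illustrative payloads (invented values).
legacy = '{"csp-report":{"blocked-uri":"inline","violated-directive":"script-src"}}'
modern = '[{"type":"deprecation","body":{"id":"SomeFeature"}}]'

legacy_reports = normalize_reports(legacy)
modern_reports = normalize_reports(modern)
puts legacy_reports.first[:type] # the legacy wrapper is labelled csp-violation
puts modern_reports.first[:type] # Reporting API entries keep their own type
```

Both shapes end up as the same `{ type:, body: }` pairs, which is what lets a single `capture_one` handle either channel.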
@@ -0,0 +1,168 @@
+require "active_support/notifications"
+
+module Dispatch
+  module Rails
+    # Synthetic exceptions for SolidQueue infrastructure failures that have no
+    # Ruby exception of their own (the real cause is written onto SolidQueue's
+    # own tables, not raised). thread_error reports the genuine exception instead.
+    class SolidQueueError < StandardError; end
+    # A worker process died (pruned after a missed heartbeat, or its process went
+    # missing) and SolidQueue force-failed the jobs it had claimed. This is the
+    # OOM-restart case: Heroku SIGKILLs the dyno, the supervisor later prunes the
+    # dead process and fails its claimed jobs straight into solid_queue_failed_executions.
+    class SolidQueueJobsLost < SolidQueueError; end
+    # A recurring (cron) task could not be enqueued — a silent scheduling miss.
+    class SolidQueueEnqueueError < SolidQueueError; end
+
+    # Bridges SolidQueue's ActiveSupport::Notifications into Dispatch error
+    # capture.
+    #
+    # SolidQueue records its *infrastructure* failures — a worker pruned after a
+    # missed heartbeat, claimed jobs orphaned by a dead process, a recurring task
+    # that failed to enqueue — straight into its own tables. They are NOT raised
+    # through ActiveJob's perform path, so they never reach Rails.error and never
+    # reach the Dispatch::Rails::ErrorSubscriber. The result is an invisible class
+    # of failure: jobs that died are only discoverable by opening the queue.
+    #
+    # This subscriber listens for those events and turns each into a first-class
+    # Dispatch event. Captured events (all gated by config.capture_solid_queue and
+    # the usual environment gating):
+    #
+    #   fail_many_claimed      -> SolidQueueJobsLost (pruned / orphaned / dead-worker jobs)
+    #   thread_error           -> the real exception (a crash in a SolidQueue thread)
+    #   enqueue_recurring_task -> SolidQueueEnqueueError (only when enqueue_error is set)
+    #
+    # On thread_error: SolidQueue fires this event BEFORE calling on_thread_error
+    # (see SolidQueue::AppExecutor#handle_thread_error), and Reporter marks the
+    # exception object on capture. So when the default on_thread_error
+    # (-> Rails.error.report) runs next, the Dispatch error subscriber dedups it
+    # via that marker — no double report. Capturing here means we still see these
+    # crashes even when the host app overrides on_thread_error away from
+    # Rails.error (a common customization that would otherwise hide them).
+    #
+    # Idempotent: install! subscribes once per process; repeat calls no-op. The
+    # handlers never raise back into SolidQueue's instrumentation.
+    module SolidQueueSubscriber
+      EVENTS = %w[
+        fail_many_claimed.solid_queue
+        thread_error.solid_queue
+        enqueue_recurring_task.solid_queue
+      ].freeze
+
+      # How many job/process ids to inline into tags before truncating. The
+      # payload batches are bounded (prune runs in batches of 50), but we keep
+      # tag values short regardless.
+      MAX_IDS = 20
+
+      class << self
+        def install!
+          return if @installed
+
+          @installed = true
+          @subscriptions = EVENTS.map do |event|
+            ActiveSupport::Notifications.subscribe(event) do |name, _start, _finish, _id, payload|
+              handle(name, payload)
+            end
+          end
+          true
+        end
+
+        # Test/reset seam — unsubscribe so a fresh install! re-registers.
+        def uninstall!
+          Array(@subscriptions).each { |sub| ActiveSupport::Notifications.unsubscribe(sub) }
+          @subscriptions = nil
+          @installed = false
+        end
+
+        def installed?
+          @installed == true
+        end
+
+        private
+
+        # A subscriber that raises surfaces as an InstrumentationSubscriberError
+        # inside SolidQueue's prune/dispatch loop, so every path is guarded.
+        # Reporter.capture already never raises; this guards the tag-building too.
+        def handle(name, payload)
+          return unless Dispatch::Rails.configuration.solid_queue_tracking_enabled?
+
+          case name
+          when "fail_many_claimed.solid_queue" then on_jobs_lost(payload)
+          when "thread_error.solid_queue" then on_thread_error(payload)
+          when "enqueue_recurring_task.solid_queue" then on_enqueue_recurring_task(payload)
+          end
+        rescue StandardError => e
+          warn "[dispatch-rails] solid_queue subscriber failed: #{e.class}: #{e.message}"
+          nil
+        end
+
+        # The underlying ProcessPrunedError / ProcessMissingError isn't in the
+        # payload — SolidQueue writes it onto the failed_executions rows — so we
+        # synthesize a stable error (constant message for clean grouping) and
+        # carry the volatile ids/count as tags.
+        def on_jobs_lost(payload)
+          size = payload[:size].to_i
+          return if size.zero?
+
+          Reporter.capture(
+            SolidQueueJobsLost.new(
+              "SolidQueue force-failed claimed job(s) after a worker process died"
+            ),
+            handled: false,
+            level: "error",
+            context: { tags: {
+              source: "solid_queue",
+              solid_queue_event: "fail_many_claimed",
+              jobs_lost: size.to_s,
+              process_ids: truncate_ids(payload[:process_ids]),
+              job_ids: truncate_ids(payload[:job_ids])
+            }.compact }
+          )
+        end
+
+        # The real exception that escaped a SolidQueue thread (supervisor,
+        # dispatcher, worker, scheduler) — report it as-is for an accurate class
+        # and backtrace. Dedups against the default on_thread_error via the
+        # Reporter capture marker (see class comment).
+        def on_thread_error(payload)
+          error = payload[:error]
+          return unless error.is_a?(Exception)
+
+          Reporter.capture(
+            error,
+            handled: false,
+            level: "error",
+            context: { tags: { source: "solid_queue", solid_queue_event: "thread_error" } }
+          )
+        end
+
+        # A recurring task failed to enqueue. payload[:enqueue_error] is the
+        # message string (no exception object), present only on failure.
+        def on_enqueue_recurring_task(payload)
+          message = payload[:enqueue_error].to_s
+          return if message.empty?
+
+          task = payload[:task].to_s
+          Reporter.capture(
+            SolidQueueEnqueueError.new("SolidQueue failed to enqueue recurring task #{task}: #{message}"),
+            handled: false,
+            level: "error",
+            context: { tags: {
+              source: "solid_queue",
+              solid_queue_event: "enqueue_recurring_task",
+              task: task
+            }.reject { |_, v| v.nil? || v == "" } }
+          )
+        end
+
+        def truncate_ids(ids)
+          list = Array(ids)
+          return nil if list.empty?
+
+          shown = list.first(MAX_IDS).join(",")
+          list.size > MAX_IDS ? "#{shown},+#{list.size - MAX_IDS} more" : shown
+        end
+      end
+    end
+  end
+end
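Reviewer note: the tag truncation in `truncate_ids` is easy to check by hand. The sketch below lifts the method out of the module so it runs on its own; the top-level `MAX_IDS` constant copies the value from the hunk, and the sample id lists are invented.

```ruby
# Standalone copy of the truncation logic for experimentation.
# Assumption: MAX_IDS matches the constant in SolidQueueSubscriber (20).
MAX_IDS = 20

def truncate_ids(ids)
  list = Array(ids)          # nil becomes [], a single id becomes [id]
  return nil if list.empty?  # nil lets the caller's .compact drop the tag

  shown = list.first(MAX_IDS).join(",")
  list.size > MAX_IDS ? "#{shown},+#{list.size - MAX_IDS} more" : shown
end

short_tag = truncate_ids([7, 8, 9])     # under the cap: every id is listed
long_tag  = truncate_ids((1..23).to_a)  # over the cap: 20 ids plus a "+3 more" suffix
empty_tag = truncate_ids(nil)           # no ids: nil, so no tag is emitted

puts short_tag
puts long_tag
puts empty_tag.inspect
```

Returning `nil` for an empty list, rather than an empty string, is what makes the `.compact` call in `on_jobs_lost` remove the `process_ids` or `job_ids` tag entirely when the payload has none.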
@@ -1,5 +1,5 @@
 module Dispatch
   module Rails
-    VERSION = "0.7.0".freeze
+    VERSION = "0.9.0".freeze
   end
 end
@@ -4,10 +4,12 @@ require "dispatch/rails/event_builder"
 require "dispatch/rails/transport"
 require "dispatch/rails/reporter"
 require "dispatch/rails/middleware"
+require "dispatch/rails/reporting_endpoint_middleware"
 require "dispatch/rails/response_annotator"
 require "dispatch/rails/heartbeat_aggregator"
 require "dispatch/rails/heartbeat_middleware"
 require "dispatch/rails/error_subscriber"
+require "dispatch/rails/solid_queue_subscriber"
 require "dispatch/rails/rake_handler"
 require "dispatch/rails/engine"
 
metadata CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: dispatch-rails
 version: !ruby/object:Gem::Version
-  version: 0.7.0
+  version: 0.9.0
 platform: ruby
 authors:
 - Dispatch Team
@@ -55,7 +55,9 @@ files:
 - lib/dispatch/rails/middleware.rb
 - lib/dispatch/rails/rake_handler.rb
 - lib/dispatch/rails/reporter.rb
+- lib/dispatch/rails/reporting_endpoint_middleware.rb
 - lib/dispatch/rails/response_annotator.rb
+- lib/dispatch/rails/solid_queue_subscriber.rb
 - lib/dispatch/rails/transport.rb
 - lib/dispatch/rails/version.rb
 homepage: https://dispatchit.app