speedshop-cloudwatch 0.1.0

checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA256:
3
+ metadata.gz: cb7393054526aee599b81c7f6031475244449ec5e2f6137f509b0fde8811c496
4
+ data.tar.gz: 45eababb5f976f5fe8eeacf13d43a13e1edc3a06b3a57e5f7a127ffa0385324e
5
+ SHA512:
6
+ metadata.gz: f09b6d50481cb2ea0bb2d25a01fe93955321c1f528145a6e864b959d81724e87067e0851d8dd13d78d3864bbf60d6342bba737696350d2f10be50f6b5e06f1ec
7
+ data.tar.gz: 9c719ebf214f5112f95c15b31327ef71dff873b5eeb71ac2c5fbdf2ad1dffcba908abb74867f1429406095b8c7b3712d999650a8ca6d38af6be405ac68cd426e
data/LICENSE.txt ADDED
@@ -0,0 +1,21 @@
1
+ The MIT License (MIT)
2
+
3
+ Copyright (c) 2025 Nate Berkopec
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in
13
+ all copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
21
+ THE SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,278 @@
1
+ # Speedshop::Cloudwatch
2
+
3
+ <p align="center">
4
+ <img src="https://github.com/user-attachments/assets/146ce110-311a-4acb-8000-66cd098f8b45" alt="Sit around and watch the clouds all day...">
5
+ </p>
6
+
7
+ This gem helps integrate your Ruby application with AWS CloudWatch for the purposes of auto-scaling. There are integrations for **Puma**, **Rack**, **Sidekiq** and **ActiveJob**.
8
+
9
+ This gem is for **infrastructure and queue metrics**, not application performance metrics, like response times, job execution times, or error rates. Use your APM for that stuff.
10
+
11
+ CloudWatch is unusually difficult to integrate with properly in Ruby, because the AWS library makes a synchronous HTTP request to AWS every time you record a metric. This is unlike the statsd or UDP-based models used by Datadog or other providers, which return more-or-less-instantaneously and are a lot less dangerous to use. Naively implementing this stuff yourself, you could end up adding 20-50ms of delay to your jobs or responses!
12
+
13
+ This library helps you avoid that latency by reporting to CloudWatch in a background thread.
14
+
15
+ This library supports **Ruby 2.7+, Sidekiq 7+, and Puma 6+**.
16
+
17
+ ## Metrics
18
+
19
+ For a full explanation of every metric, [read about them in the code.](./lib/speedshop/cloudwatch/metrics.rb)
20
+
21
+ By default, only essential queue metrics are enabled:
22
+
23
+ ```ruby
24
+ config.metrics[:puma] = [] # Disabled by default
25
+ config.metrics[:sidekiq] = [:QueueLatency] # Per queue
26
+ config.metrics[:rack] = [:RequestQueueTime]
27
+ config.metrics[:active_job] = [:QueueLatency] # Per queue
28
+ ```
29
+
30
+ To enable additional metrics, configure them explicitly:
31
+
32
+ ```ruby
33
+ # Enable all Puma metrics. These are based on reading Puma.stats.
34
+ config.metrics[:puma] = [
35
+ :Workers, :BootedWorkers, :OldWorkers, :Running, :Backlog, :PoolCapacity, :MaxThreads
36
+ ]
37
+
38
+ # Enable additional Sidekiq metrics
39
+ config.metrics[:sidekiq] = [
40
+ :EnqueuedJobs, :ProcessedJobs, :FailedJobs, :ScheduledJobs, :RetryJobs,
41
+ :DeadJobs, :Workers, :Processes, :DefaultQueueLatency, :Capacity,
42
+ :Utilization, :QueueLatency, :QueueSize
43
+ ]
44
+ ```
45
+
46
+ ## Installation
47
+
48
+ ```ruby
49
+ gem 'speedshop-cloudwatch'
50
+ ```
51
+
52
+ See each integration below for instructions on how to set up and configure that integration.
53
+
54
+ ## Configuration
55
+
56
+ You'll need to [configure your CloudWatch API credentials](https://github.com/aws/aws-sdk-ruby?tab=readme-ov-file#configuration), which is usually done via ENV var. If you're using one of their supported auto-config methods, you're good to go. If you're not, you'll need to provide your own `Aws::CloudWatch::Client` object to the config (see below).
57
+
58
+ ```ruby
59
+ Speedshop::Cloudwatch.configure do |config|
60
+ config.client = Aws::CloudWatch::Client.new
61
+ config.interval = 60
62
+
63
+ # Optional: Custom logger (defaults to Rails.logger if available, otherwise STDOUT)
64
+ config.logger = Logger.new(Rails.root.join("log", "cloudwatch.log"))
65
+
66
+ # Customize which metrics to report (whitelist)
67
+ # Puma metrics are disabled by default, enable them explicitly:
68
+ config.metrics[:puma] = [:Workers, :BootedWorkers, :Running, :Backlog]
69
+ # Sidekiq defaults to [:QueueLatency], add more as needed:
70
+ config.metrics[:sidekiq] = [:EnqueuedJobs, :QueueLatency, :QueueSize]
71
+
72
+ # Customize which Sidekiq queues to monitor (all queues by default)
73
+ config.sidekiq_queues = ["critical", "default", "low_priority"]
74
+
75
+ # Customize CloudWatch namespaces
76
+ config.namespaces[:puma] = "MyApp/Puma"
77
+ config.namespaces[:sidekiq] = "MyApp/Sidekiq"
78
+ config.namespaces[:rack] = "MyApp/Rack"
79
+ config.namespaces[:active_job] = "MyApp/ActiveJob"
80
+
81
+ # Optional: Add custom dimensions to all metrics
82
+ config.dimensions[:Env] = ENV["RAILS_ENV"] || "development"
83
+ end
84
+ ```
85
+
86
+ > [!WARNING]
87
+ > Setting `config.interval` to less than 60 seconds automatically enables [high-resolution storage](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/publishingMetrics.html#high-resolution-metrics) (1-second granularity) in CloudWatch, which incurs additional costs.
88
+
89
+ ### Environment Control
90
+
91
+ **By default, the reporter only runs in production.** The environment is read from `RAILS_ENV`, then `RACK_ENV`, and falls back to `"development"` if neither is set.
92
+
93
+ ```ruby
94
+ Speedshop::Cloudwatch.configure do |config|
95
+ config.enabled_environments = ["production", "staging"]
96
+ config.environment = "staging" # optional override
97
+ end
98
+ ```
99
+
100
+ ## Puma
101
+
102
+ Puma metrics are disabled by default. You must explicitly enable them in your configuration.
103
+
104
+ Add to your `config/puma.rb`:
105
+
106
+ ```ruby
107
+ require_relative "../config/environment"
108
+
109
+ Speedshop::Cloudwatch.configure do |config|
110
+ config.collectors << :puma
111
+ # Enable Puma metrics (disabled by default)
112
+ config.metrics[:puma] = [
113
+ :Workers, :BootedWorkers, :OldWorkers, :Running, :Backlog, :PoolCapacity, :MaxThreads
114
+ ]
115
+ end
116
+
117
+ # Start the reporter so Puma metrics are collected
118
+ Speedshop::Cloudwatch.start!
119
+ ```
120
+
121
+ Collection runs in the master process and reports per-worker metrics (see below). This works correctly with both `preload_app true` and `false`, as well as single and cluster modes.
122
+
123
+ This reports the following metrics:
124
+
125
+ ```
126
+ Workers - Number of workers configured (Count)
127
+ BootedWorkers - Number of workers currently booted (Count)
128
+ OldWorkers - Number of workers that are old/being phased out (Count)
129
+ Running - Number of threads currently running (Count) [per worker]
130
+ Backlog - Number of requests in the backlog (Count) [per worker]
131
+ PoolCapacity - Current thread pool capacity (Count) [per worker]
132
+ MaxThreads - Maximum number of threads configured (Count) [per worker]
133
+ ```
134
+
135
+ ## Rack
136
+
137
+ If you're using Rails, we'll automatically insert the correct middleware into the stack.
138
+
139
+ If you're using some other Rack-based framework, insert the `Speedshop::Cloudwatch::Rack` middleware high up (ideally first) in the stack.
140
+
141
+ You will need a reverse proxy, such as nginx, adding an `X-Request-Start` or `X-Queue-Start` header (containing the time since the Unix epoch in milliseconds) to incoming requests. See [New Relic's instructions](https://docs.newrelic.com/docs/apm/applications-menu/features/configure-request-queue-reporting/) for more about how to do this.
142
+
143
+ We report the following metrics:
144
+
145
+ ```
146
+ RequestQueueTime - Time spent waiting in the request queue (Milliseconds)
147
+ ```
148
+
149
+ ## Sidekiq
150
+
151
+ In Sidekiq server processes, this integration auto-registers lifecycle hooks. On startup, it adds the `:sidekiq` collector and starts the reporter (leader-only when using Sidekiq Enterprise).
152
+
153
+ If you're using Sidekiq as your ActiveJob adapter, prefer this integration instead of the ActiveJob integration.
154
+
155
+ By default, only `QueueLatency` is reported. To enable additional metrics, configure them explicitly:
156
+
157
+ ```ruby
158
+ Speedshop::Cloudwatch.configure do |config|
159
+ config.metrics[:sidekiq] = [
160
+ :EnqueuedJobs, :ProcessedJobs, :FailedJobs, :ScheduledJobs, :RetryJobs,
161
+ :DeadJobs, :Workers, :Processes, :DefaultQueueLatency, :Capacity,
162
+ :Utilization, :QueueLatency, :QueueSize
163
+ ]
164
+ end
165
+ ```
166
+
167
+ We report the following metrics:
168
+
169
+ ```
170
+ EnqueuedJobs - Number of jobs currently enqueued (Count)
171
+ ProcessedJobs - Total number of jobs processed (Count)
172
+ FailedJobs - Total number of failed jobs (Count)
173
+ ScheduledJobs - Number of scheduled jobs (Count)
174
+ RetryJobs - Number of jobs in retry queue (Count)
175
+ DeadJobs - Number of dead jobs (Count)
176
+ Workers - Number of Sidekiq workers (Count)
177
+ Processes - Number of Sidekiq processes (Count)
178
+ DefaultQueueLatency - Latency for the default queue (Seconds)
179
+ Capacity - Total concurrency across all processes (Count)
180
+ Utilization - Average utilization across all processes (Percent)
181
+ QueueLatency - Latency for each queue (Seconds) [per queue]
182
+ QueueSize - Size of each queue (Count) [per queue]
183
+ ```
184
+
185
+ Metrics marked [per queue] include a QueueName dimension.
186
+ Utilization metrics include Tag and/or Hostname dimensions.
187
+
188
+ ## ActiveJob
189
+
190
+ > [!WARNING]
191
+ > If you're using Sidekiq, just use that integration and do not include the ActiveJob module.
192
+
193
+ In your ApplicationJob:
194
+
195
+ ```ruby
196
+ include Speedshop::Cloudwatch::ActiveJob
197
+ ```
198
+
199
+ We report the following metrics:
200
+
201
+ ```
202
+ QueueLatency - Time job spent waiting in queue before execution (Seconds)
203
+ ```
204
+
205
+ This metric includes a QueueName dimension and is aggregated per interval using CloudWatch StatisticSets.
206
+
207
+ ## Rails
208
+
209
+ When running in a Rails app we:
210
+
211
+ 1. Automatically insert the Rack middleware at index 0.
212
+ 2. Respect your configuration for enabled metrics and collectors. The reporter starts automatically the first time a metric is reported (e.g., via Rack middleware) or when you call `Speedshop::Cloudwatch.start!` yourself (e.g., in Puma or initializers).
213
+
214
+ If you want full control over these behaviors, add `require: false` to your Gemfile:
215
+
216
+ ```ruby
217
+ gem 'speedshop-cloudwatch', require: false
218
+ ```
219
+
220
+ Then manually require the core module without the railtie:
221
+
222
+ ```ruby
223
+ # config/initializers/speedshop-cloudwatch.rb
224
+ require 'speedshop/cloudwatch'
225
+
226
+ # Insert middleware manually (if using Rack integration)
227
+ Rails.application.config.middleware.insert_before 0, Speedshop::Cloudwatch::Rack
228
+
229
+ Rails.application.configure do
230
+ config.after_initialize do
231
+ Speedshop::Cloudwatch.start!
232
+ end
233
+ end
234
+ ```
235
+
236
+ ## Non-Rails Apps
237
+
238
+ For Rack apps (Sinatra, etc.):
239
+
240
+ - Insert `Speedshop::Cloudwatch::Rack` at the top of your middleware stack.
241
+ - Configure collectors and start the reporter during app boot.
242
+
243
+ Example config:
244
+
245
+ ```ruby
246
+ require 'speedshop/cloudwatch'
247
+
248
+ Speedshop::Cloudwatch.configure do |config|
249
+ # ...
250
+ end
251
+
252
+ Speedshop::Cloudwatch.start!
253
+ ```
254
+
255
+ ## Disabling Automatic Integration
256
+
257
+ You can disable the auto-integration of Sidekiq and Puma by not requiring them:
258
+
259
+ ```ruby
260
+ gem 'speedshop-cloudwatch', require: false
261
+ ```
262
+
263
+ ```ruby
264
+ # some_initializer.rb
265
+ require 'speedshop/cloudwatch'
266
+ require 'speedshop/cloudwatch/puma'
267
+ require 'speedshop/cloudwatch/active_job'
268
+ require 'speedshop/cloudwatch/rack'
269
+ # require 'speedshop/cloudwatch/sidekiq'
270
+ ```
271
+
272
+ ## Bibliography
273
+
274
+ This library was developed with reference to and inspiration from these excellent projects:
275
+
276
+ - [sidekiq-cloudwatchmetrics](https://github.com/sj26/sidekiq-cloudwatchmetrics) - Sidekiq CloudWatch metrics integration (portions adapted, see lib/speedshop/cloudwatch/sidekiq.rb)
277
+ - [puma-cloudwatch](https://github.com/boltops-tools/puma-cloudwatch) - Puma CloudWatch metrics reporter
278
+ - [judoscale-ruby](https://github.com/judoscale/judoscale-ruby) - Autoscaling metrics collection patterns
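The README says the gem avoids per-metric latency by reporting to CloudWatch from a background thread. The sketch below illustrates that general pattern only; it is not the gem's reporter. `BackgroundReporter`, its `sink` block, and the `max_size` drop policy are all hypothetical names and choices made for illustration, and the sink stands in for whatever slow network call a real reporter would make.

```ruby
# Hypothetical sketch of background reporting; NOT the gem's Reporter class.
class BackgroundReporter
  def initialize(interval: 60, max_size: 1000, &sink)
    @interval = interval # seconds between flushes
    @max_size = max_size # upper bound on buffered datapoints
    @sink = sink         # stands in for the slow call to the metrics backend
    @queue = Queue.new   # thread-safe in-memory buffer
    @mutex = Mutex.new
    @thread = nil
  end

  # Called on the hot path (request or job). It only touches memory,
  # so it returns without waiting on any network I/O.
  def report(metric:, value:)
    return false if @queue.size >= @max_size # drop instead of growing unbounded
    @queue << {metric: metric, value: value}
    true
  end

  # Starts the flushing thread at most once.
  def start!
    @mutex.synchronize do
      @thread ||= Thread.new do
        loop do
          sleep @interval
          flush
        end
      end
    end
  end

  # Drains everything buffered so far and hands it to the sink in one batch.
  def flush
    batch = []
    loop do
      begin
        batch << @queue.pop(true) # non-blocking pop raises ThreadError when empty
      rescue ThreadError
        break
      end
    end
    @sink.call(batch) unless batch.empty?
    batch.size
  end
end

reporter = BackgroundReporter.new(interval: 60) do |batch|
  puts "would send #{batch.size} datapoint(s)"
end
reporter.report(metric: :QueueLatency, value: 0.42)
reporter.flush
```

Dropping datapoints once the buffer is full is one possible policy: it trades completeness for a hard cap on memory if the backend is slow or unreachable.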
data/Rakefile ADDED
@@ -0,0 +1,40 @@
1
+ # frozen_string_literal: true
2
+
3
+ require "bundler/gem_tasks"
4
+ require "rake/testtask"
5
+
6
+ require "standard/rake"
7
+
8
+ FLOG_THRESHOLD = (ENV["FLOG_THRESHOLD"] || 50).to_i
9
+ FLAY_THRESHOLD = (ENV["FLAY_THRESHOLD"] || 100).to_i
10
+
11
+ Rake::TestTask.new(:test) do |t|
12
+ t.libs << "test"
13
+ t.libs << "lib"
14
+ t.test_files = FileList["test/**/*_test.rb"]
15
+ end
16
+
17
+ desc "Run flog"
18
+ task :flog do
19
+ flog_output = `bundle exec flog -a lib test`
20
+ puts flog_output
21
+ method_scores = flog_output.lines.grep(/^\s+[0-9]+\.[0-9]+:.*#/).reject { |line| line.include?("main#none") }.map { |line| line.split.first.to_f }
22
+ max_score = method_scores.max
23
+ if max_score && max_score > FLOG_THRESHOLD
24
+ abort "flog failed: highest complexity (#{max_score}) exceeds threshold (#{FLOG_THRESHOLD})"
25
+ end
26
+ puts "flog passed (max complexity: #{max_score}, threshold: #{FLOG_THRESHOLD})"
27
+ end
28
+
29
+ desc "Run flay"
30
+ task :flay do
31
+ flay_output = `bundle exec flay lib test`
32
+ puts flay_output
33
+ flay_score = flay_output[/Total score.*?=\s*(\d+)/, 1]&.to_i
34
+ if flay_score && flay_score > FLAY_THRESHOLD
35
+ abort "flay failed: duplication score (#{flay_score}) exceeds threshold (#{FLAY_THRESHOLD})"
36
+ end
37
+ puts "flay passed (duplication score: #{flay_score}, threshold: #{FLAY_THRESHOLD})"
38
+ end
39
+
40
+ task default: [:test, :standard, :flog, :flay]
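To see what the `flog` task's filtering does, the snippet below runs the same `grep`/`reject`/`map` chain over a small, made-up sample of flog-style output. The sample lines and the `max_method_score` helper name are illustrative assumptions, not real output from this gem.

```ruby
FLOG_THRESHOLD = 50

# Made-up sample in the general shape of `flog -a` output (score, colon, name).
SAMPLE_FLOG_OUTPUT = [
  "   120.4: flog total\n",
  "    10.0: flog/method average\n",
  "\n",
  "    55.2: Reporter#flush           lib/reporter.rb:40-60\n",
  "    12.1: main#none\n",
  "     8.3: Config#reset             lib/config.rb:23-40\n"
].join

# Same filtering as the Rake task: keep per-method lines (they contain '#'),
# skip top-level code ("main#none"), and take the highest score.
def max_method_score(flog_output)
  flog_output.lines
    .grep(/^\s+[0-9]+\.[0-9]+:.*#/)
    .reject { |line| line.include?("main#none") }
    .map { |line| line.split.first.to_f }
    .max
end

max_score = max_method_score(SAMPLE_FLOG_OUTPUT)
if max_score && max_score > FLOG_THRESHOLD
  puts "flog would fail: #{max_score} exceeds #{FLOG_THRESHOLD}"
else
  puts "flog would pass (max complexity: #{max_score.inspect})"
end
```

The summary lines ("flog total", "flog/method average") contain no `#`, so they never count toward the maximum.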
data/bin/console ADDED
@@ -0,0 +1,11 @@
1
+ #!/usr/bin/env ruby
2
+ # frozen_string_literal: true
3
+
4
+ require "bundler/setup"
5
+ require "speedshop/cloudwatch/all"
6
+
7
+ # You can add fixtures and/or initialization code here to make experimenting
8
+ # with your gem easier. You can also use a different console, if you like.
9
+
10
+ require "irb"
11
+ IRB.start(__FILE__)
data/bin/setup ADDED
@@ -0,0 +1,15 @@
1
+ #!/usr/bin/env bash
2
+ set -euo pipefail
3
+ IFS=$'\n\t'
4
+ set -vx
5
+
6
+ bundle install
7
+
8
+ # Start Redis in Docker if port 6379 is free and docker is available
9
+ if ! nc -z localhost 6379 2>/dev/null; then
10
+ if command -v docker &>/dev/null; then
11
+ docker run -d --name speedshop-cloudwatch-redis -p 6379:6379 redis 2>/dev/null || true
12
+ fi
13
+ fi
14
+
15
+ # Do any other automated setup that you need to do here
@@ -0,0 +1,24 @@
1
+ # frozen_string_literal: true
2
+
3
+ module Speedshop
4
+ module Cloudwatch
5
+ module ActiveJob
6
+ def self.included(base)
7
+ base.around_perform :report_job_metrics
8
+ end
9
+
10
+ def report_job_metrics
11
+ begin
12
+ if enqueued_at
13
+ queue_time = Time.now.to_f - enqueued_at.to_f
14
+ # Drop JobClass to reduce time series cardinality and allow aggregation into StatisticSets per queue
15
+ Reporter.instance.report(metric: :QueueLatency, value: queue_time, dimensions: {QueueName: queue_name}, integration: :active_job)
16
+ end
17
+ rescue => e
18
+ Speedshop::Cloudwatch.log_error("Failed to collect ActiveJob metrics: #{e.message}", e)
19
+ end
20
+ yield
21
+ end
22
+ end
23
+ end
24
+ end
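The comment in `report_job_metrics` mentions aggregating latencies into StatisticSets per queue. As a rough illustration, a CloudWatch statistic set summarizes many observations as a sample count, sum, minimum, and maximum. The `statistic_set` helper and the sample latencies below are hypothetical and only show the arithmetic; how the gem's reporter actually builds its payload is not shown in this diff.

```ruby
# Hypothetical helper, not part of the gem: collapses raw observations into
# the four fields that make up a CloudWatch statistic set.
def statistic_set(values)
  return nil if values.empty?

  {
    sample_count: values.size.to_f,
    sum: values.sum.to_f,
    minimum: values.min.to_f,
    maximum: values.max.to_f
  }
end

# Made-up queue latencies (seconds) collected during one reporting interval.
latencies_by_queue = {
  "default" => [0.5, 1.5, 4.0],
  "mailers" => [0.25]
}

latencies_by_queue.each do |queue_name, values|
  puts "#{queue_name}: #{statistic_set(values)}"
end
```

Sending one summary per queue per interval keeps the number of datapoints independent of job throughput, which is consistent with the comment about dropping the job class to reduce cardinality.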
@@ -0,0 +1,8 @@
1
+ # frozen_string_literal: true
2
+
3
+ require "speedshop/cloudwatch"
4
+ require "speedshop/cloudwatch/active_job"
5
+ require "speedshop/cloudwatch/puma"
6
+ require "speedshop/cloudwatch/rack"
7
+ require "speedshop/cloudwatch/sidekiq"
8
+ require "speedshop/cloudwatch/railtie" if defined?(Rails)
@@ -0,0 +1,61 @@
1
+ # frozen_string_literal: true
2
+
3
+ require "logger"
4
+ require "singleton"
5
+
6
+ module Speedshop
7
+ module Cloudwatch
8
+ class Config
9
+ include Singleton
10
+
11
+ attr_accessor :interval, :metrics, :namespaces, :logger, :queue_max_size, :sidekiq_queues, :dimensions,
12
+ :collectors, :enabled_environments, :environment
13
+ attr_writer :client
14
+
15
+ def initialize
16
+ reset
17
+ end
18
+
19
+ def client
20
+ @client ||= Aws::CloudWatch::Client.new
21
+ end
22
+
23
+ def reset
24
+ @interval = 60
25
+ @queue_max_size = 1000
26
+ @client = nil
27
+ @metrics = {
28
+ puma: [],
29
+ sidekiq: [:QueueLatency],
30
+ rack: [:RequestQueueTime],
31
+ active_job: [:QueueLatency]
32
+ }
33
+ @namespaces = {puma: "Puma", sidekiq: "Sidekiq", rack: "Rack", active_job: "ActiveJob"}
34
+ @sidekiq_queues = nil
35
+ @dimensions = {}
36
+ @logger = (defined?(Rails) && Rails.respond_to?(:logger) && Rails.logger) ? Rails.logger : Logger.new($stdout)
37
+ @collectors = [] # [:puma, :sidekiq]
38
+ @enabled_environments = ["production"]
39
+ @environment = detect_environment
40
+ end
41
+
42
+ def environment_enabled?
43
+ enabled_environments.include?(environment)
44
+ end
45
+
46
+ def self.reset
47
+ if instance_variable_defined?(:@singleton__instance__)
48
+ config = instance_variable_get(:@singleton__instance__)
49
+ config&.reset
50
+ end
51
+ instance_variable_set(:@singleton__instance__, nil)
52
+ end
53
+
54
+ private
55
+
56
+ def detect_environment
57
+ ENV.fetch("RAILS_ENV", ENV.fetch("RACK_ENV", "development"))
58
+ end
59
+ end
60
+ end
61
+ end
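The environment gate in `Config` comes down to a nested `ENV.fetch` plus an inclusion check. The standalone sketch below mirrors that logic with plain hashes standing in for `ENV` so the fallback order is easy to see; the free-standing method names and signatures are illustrative, not the gem's API.

```ruby
# Mirrors Config#detect_environment, with the environment source injectable.
# RAILS_ENV wins, then RACK_ENV, then "development".
def detect_environment(env = ENV)
  env.fetch("RAILS_ENV") { env.fetch("RACK_ENV", "development") }
end

# Mirrors Config#environment_enabled?; the default list matches Config#reset.
def environment_enabled?(environment, enabled_environments = ["production"])
  enabled_environments.include?(environment)
end

[
  {"RAILS_ENV" => "production", "RACK_ENV" => "staging"},
  {"RACK_ENV" => "staging"},
  {}
].each do |fake_env|
  name = detect_environment(fake_env)
  puts "#{fake_env.inspect} => #{name} (enabled: #{environment_enabled?(name)})"
end
```

Note that `fetch` only falls through when the key is missing, so a variable that is set but empty still counts as set.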
@@ -0,0 +1,181 @@
1
+ # frozen_string_literal: true
2
+
3
+ module Speedshop
4
+ module Cloudwatch
5
+ Metric = Struct.new(:name, :unit, :description, :source, keyword_init: true)
6
+
7
+ METRICS = {
8
+ puma: [
9
+ # Cluster-only metrics that report on the overall Puma process
10
+ Metric.new(
11
+ name: :Workers,
12
+ unit: "Count",
13
+ description: "Total number of workers (child processes) configured in " \
14
+ "Puma. This is a static configuration value.",
15
+ source: "Puma.stats_hash[:workers]"
16
+ ),
17
+ Metric.new(
18
+ name: :BootedWorkers,
19
+ unit: "Count",
20
+ description: "Number of worker processes currently running and ready to " \
21
+ "handle requests. Should match Workers count under normal " \
22
+ "operation.",
23
+ source: "Puma.stats_hash[:booted_workers]"
24
+ ),
25
+ Metric.new(
26
+ name: :OldWorkers,
27
+ unit: "Count",
28
+ description: "Number of worker processes being phased out during a " \
29
+ "restart. Zero during normal operation.",
30
+ source: "Puma.stats_hash[:old_workers]"
31
+ ),
32
+
33
+ # Per-worker metrics (also available in single mode with WorkerIndex=0)
34
+ Metric.new(
35
+ name: :Running,
36
+ unit: "Count",
37
+ description: "Number of threads currently processing requests in this " \
38
+ "worker. Compare against MaxThreads to understand " \
39
+ "utilization.",
40
+ source: "worker_stats[:running]"
41
+ ),
42
+ Metric.new(
43
+ name: :Backlog,
44
+ unit: "Count",
45
+ description: "Number of requests waiting to be processed by this worker. " \
46
+ "Sustained backlog suggests over capacity.",
47
+ source: "worker_stats[:backlog]"
48
+ ),
49
+ Metric.new(
50
+ name: :PoolCapacity,
51
+ unit: "Count",
52
+ description: "Number of threads available to handle new requests. When " \
53
+ "this reaches zero, requests queue in the backlog.",
54
+ source: "worker_stats[:pool_capacity]"
55
+ ),
56
+ Metric.new(
57
+ name: :MaxThreads,
58
+ unit: "Count",
59
+ description: "Maximum number of threads configured for this worker. This " \
60
+ "is a static configuration value.",
61
+ source: "worker_stats[:max_threads]"
62
+ )
63
+ ],
64
+
65
+ sidekiq: [
66
+ # Overall Sidekiq statistics
67
+ Metric.new(
68
+ name: :EnqueuedJobs,
69
+ unit: "Count",
70
+ description: "Total number of jobs currently enqueued across all queues, waiting to be processed.",
71
+ source: "Sidekiq::Stats.new.enqueued"
72
+ ),
73
+ Metric.new(
74
+ name: :ProcessedJobs,
75
+ unit: "Count",
76
+ description: "Cumulative count of all jobs successfully processed since Sidekiq started.",
77
+ source: "Sidekiq::Stats.new.processed"
78
+ ),
79
+ Metric.new(
80
+ name: :FailedJobs,
81
+ unit: "Count",
82
+ description: "Cumulative count of all jobs that have failed since Sidekiq started.",
83
+ source: "Sidekiq::Stats.new.failed"
84
+ ),
85
+ Metric.new(
86
+ name: :ScheduledJobs,
87
+ unit: "Count",
88
+ description: "Number of jobs scheduled to run at a future time. Will move " \
89
+ "to enqueued when their scheduled time arrives.",
90
+ source: "Sidekiq::Stats.new.scheduled_size"
91
+ ),
92
+ Metric.new(
93
+ name: :RetryJobs,
94
+ unit: "Count",
95
+ description: "Number of jobs in the retry queue. These jobs failed but " \
96
+ "have not exhausted their retry attempts.",
97
+ source: "Sidekiq::Stats.new.retry_size"
98
+ ),
99
+ Metric.new(
100
+ name: :DeadJobs,
101
+ unit: "Count",
102
+ description: "Number of jobs in the dead queue. These jobs exhausted all retry attempts.",
103
+ source: "Sidekiq::Stats.new.dead_size"
104
+ ),
105
+ Metric.new(
106
+ name: :Workers,
107
+ unit: "Count",
108
+ description: "Total number of worker threads currently processing jobs across all Sidekiq processes.",
109
+ source: "Sidekiq::Stats.new.workers_size"
110
+ ),
111
+ Metric.new(
112
+ name: :Processes,
113
+ unit: "Count",
114
+ description: "Number of Sidekiq server processes currently running.",
115
+ source: "Sidekiq::Stats.new.processes_size"
116
+ ),
117
+ Metric.new(
118
+ name: :DefaultQueueLatency,
119
+ unit: "Seconds",
120
+ description: "Time the oldest job in the default queue has been waiting. Zero if the queue is empty.",
121
+ source: "Sidekiq::Stats.new.default_queue_latency"
122
+ ),
123
+
124
+ # Process-level metrics
125
+ Metric.new(
126
+ name: :Capacity,
127
+ unit: "Count",
128
+ description: "Total number of worker threads available across all Sidekiq " \
129
+ "processes. Can be tagged by process tag.",
130
+ source: "Sum of process['concurrency'] across processes"
131
+ ),
132
+ Metric.new(
133
+ name: :Utilization,
134
+ unit: "Percent",
135
+ description: "Average percentage of worker threads currently busy. 100% " \
136
+ "means all workers are busy. Can be reported by tag or " \
137
+ "hostname.",
138
+ source: "Average of busy / concurrency * 100 across processes"
139
+ ),
140
+
141
+ # Queue-specific metrics
142
+ Metric.new(
143
+ name: :QueueLatency,
144
+ unit: "Seconds",
145
+ description: "Time the oldest job in this queue has been waiting. High " \
146
+ "latency indicates jobs are backing up.",
147
+ source: "Sidekiq::Queue#latency"
148
+ ),
149
+ Metric.new(
150
+ name: :QueueSize,
151
+ unit: "Count",
152
+ description: "Number of jobs currently waiting in this queue. Growing " \
153
+ "size indicates jobs arriving faster than processing.",
154
+ source: "Sidekiq::Queue#size"
155
+ )
156
+ ],
157
+
158
+ rack: [
159
+ Metric.new(
160
+ name: :RequestQueueTime,
161
+ unit: "Milliseconds",
162
+ description: "Time a request spent waiting in the reverse proxy before " \
163
+ "reaching the application. High values indicate requests " \
164
+ "backing up before reaching your application server.",
165
+ source: "(Time.now.to_f * 1000) - HTTP_X_REQUEST_START"
166
+ )
167
+ ],
168
+
169
+ active_job: [
170
+ Metric.new(
171
+ name: :QueueLatency,
172
+ unit: "Seconds",
173
+ description: "Time a job spent waiting in the queue before execution " \
174
+ "started. Values are aggregated into CloudWatch " \
175
+ "StatisticSets per reporting interval.",
176
+ source: "Time.now.to_f - job.enqueued_at"
177
+ )
178
+ ]
179
+ }.freeze
180
+ end
181
+ end
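The `RequestQueueTime` source is listed as `(Time.now.to_f * 1000) - HTTP_X_REQUEST_START`. A minimal sketch of that calculation follows, assuming the header carries milliseconds since the Unix epoch as the README describes. The `request_queue_time_ms` helper, the optional `t=` prefix handling, and the decision to return `nil` for missing or future timestamps are assumptions for illustration; the gem's Rack middleware is not part of this diff and may differ.

```ruby
# Hypothetical helper, not the gem's middleware.
# Assumes the proxy header holds milliseconds since the Unix epoch,
# optionally prefixed with "t=".
def request_queue_time_ms(rack_env, now: Time.now)
  raw = rack_env["HTTP_X_REQUEST_START"] || rack_env["HTTP_X_QUEUE_START"]
  return nil unless raw

  start_ms = raw.to_s.delete_prefix("t=").to_f
  return nil unless start_ms > 0 # unparseable or empty header

  queue_ms = (now.to_f * 1000) - start_ms
  queue_ms.negative? ? nil : queue_ms # ignore clock skew between proxy and app
end

fake_env = {"HTTP_X_REQUEST_START" => ((Time.now.to_f * 1000) - 25).round.to_s}
puts "queue time (ms): #{request_queue_time_ms(fake_env).inspect}"
```

Rack exposes the `X-Request-Start` request header as `HTTP_X_REQUEST_START` in the env hash, which is why the metric's source is written with that key.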