speedshop-cloudwatch 0.1.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +7 -0
- data/LICENSE.txt +21 -0
- data/README.md +278 -0
- data/Rakefile +40 -0
- data/bin/console +11 -0
- data/bin/setup +15 -0
- data/lib/speedshop/cloudwatch/active_job.rb +24 -0
- data/lib/speedshop/cloudwatch/all.rb +8 -0
- data/lib/speedshop/cloudwatch/config.rb +61 -0
- data/lib/speedshop/cloudwatch/metrics.rb +181 -0
- data/lib/speedshop/cloudwatch/puma.rb +57 -0
- data/lib/speedshop/cloudwatch/rack.rb +23 -0
- data/lib/speedshop/cloudwatch/railtie.rb +19 -0
- data/lib/speedshop/cloudwatch/reporter.rb +315 -0
- data/lib/speedshop/cloudwatch/sidekiq.rb +118 -0
- data/lib/speedshop/cloudwatch/version.rb +7 -0
- data/lib/speedshop/cloudwatch.rb +48 -0
- data/lib/speedshop-cloudwatch.rb +3 -0
- data/speedshop-cloudwatch.gemspec +30 -0
- metadata +81 -0
checksums.yaml
ADDED
@@ -0,0 +1,7 @@
---
SHA256:
  metadata.gz: cb7393054526aee599b81c7f6031475244449ec5e2f6137f509b0fde8811c496
  data.tar.gz: 45eababb5f976f5fe8eeacf13d43a13e1edc3a06b3a57e5f7a127ffa0385324e
SHA512:
  metadata.gz: f09b6d50481cb2ea0bb2d25a01fe93955321c1f528145a6e864b959d81724e87067e0851d8dd13d78d3864bbf60d6342bba737696350d2f10be50f6b5e06f1ec
  data.tar.gz: 9c719ebf214f5112f95c15b31327ef71dff873b5eeb71ac2c5fbdf2ad1dffcba908abb74867f1429406095b8c7b3712d999650a8ca6d38af6be405ac68cd426e
data/LICENSE.txt
ADDED
@@ -0,0 +1,21 @@
The MIT License (MIT)

Copyright (c) 2025 Nate Berkopec

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
data/README.md
ADDED
@@ -0,0 +1,278 @@
# Speedshop::Cloudwatch

<p align="center">
  <img src="https://github.com/user-attachments/assets/146ce110-311a-4acb-8000-66cd098f8b45" alt="Sit around and watch the clouds all day...">
</p>

This gem helps integrate your Ruby application with AWS CloudWatch for the purposes of auto-scaling. There are integrations for **Puma**, **Rack**, **Sidekiq**, and **ActiveJob**.

This gem is for **infrastructure and queue metrics**, not application performance metrics, like response times, job execution times, or error rates. Use your APM for that stuff.

CloudWatch is unusually difficult to integrate with properly in Ruby, because the AWS library makes a synchronous HTTP request to AWS every time you record a metric. This is unlike the statsd or UDP-based models used by Datadog or other providers, which return more or less instantaneously and are a lot less dangerous to use. Naively implementing this stuff yourself, you could end up adding 20-50ms of delay to your jobs or responses!

This library helps you avoid that latency by reporting to CloudWatch in a background thread.

This library supports **Ruby 2.7+, Sidekiq 7+, and Puma 6+**.

## Metrics

For a full explanation of every metric, [read about them in the code.](./lib/speedshop/cloudwatch/metrics.rb)

By default, only essential queue metrics are enabled:

```ruby
config.metrics[:puma] = [] # Disabled by default
config.metrics[:sidekiq] = [:QueueLatency] # Per queue
config.metrics[:rack] = [:RequestQueueTime]
config.metrics[:active_job] = [:QueueLatency] # Per queue
```

To enable additional metrics, configure them explicitly:

```ruby
# Enable all Puma metrics. These are based on reading Puma.stats.
config.metrics[:puma] = [
  :Workers, :BootedWorkers, :OldWorkers, :Running, :Backlog, :PoolCapacity, :MaxThreads
]

# Enable additional Sidekiq metrics
config.metrics[:sidekiq] = [
  :EnqueuedJobs, :ProcessedJobs, :FailedJobs, :ScheduledJobs, :RetryJobs,
  :DeadJobs, :Workers, :Processes, :DefaultQueueLatency, :Capacity,
  :Utilization, :QueueLatency, :QueueSize
]
```

## Installation

```ruby
gem 'speedshop-cloudwatch'
```

See each integration below for instructions on how to set up and configure that integration.

## Configuration

You'll need to [configure your CloudWatch API credentials](https://github.com/aws/aws-sdk-ruby?tab=readme-ov-file#configuration), which is usually done via ENV var. If you're using one of their supported auto-config methods, you're good to go. If you're not, you'll need to provide your own `Aws::CloudWatch::Client` object to the config (see below).

```ruby
Speedshop::Cloudwatch.configure do |config|
  config.client = Aws::CloudWatch::Client.new
  config.interval = 60

  # Optional: Custom logger (defaults to Rails.logger if available, otherwise STDOUT)
  config.logger = Logger.new(Rails.root.join("log", "cloudwatch.log"))

  # Customize which metrics to report (whitelist)
  # Puma metrics are disabled by default, enable them explicitly:
  config.metrics[:puma] = [:Workers, :BootedWorkers, :Running, :Backlog]
  # Sidekiq defaults to [:QueueLatency], add more as needed:
  config.metrics[:sidekiq] = [:EnqueuedJobs, :QueueLatency, :QueueSize]

  # Customize which Sidekiq queues to monitor (all queues by default)
  config.sidekiq_queues = ["critical", "default", "low_priority"]

  # Customize CloudWatch namespaces
  config.namespaces[:puma] = "MyApp/Puma"
  config.namespaces[:sidekiq] = "MyApp/Sidekiq"
  config.namespaces[:rack] = "MyApp/Rack"
  config.namespaces[:active_job] = "MyApp/ActiveJob"

  # Optional: Add custom dimensions to all metrics
  config.dimensions[:Env] = ENV["RAILS_ENV"] || "development"
end
```

> [!WARNING]
> Setting `config.interval` to less than 60 seconds automatically enables [high-resolution storage](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/publishingMetrics.html#high-resolution-metrics) (1-second granularity) in CloudWatch, which incurs additional costs.

### Environment Control

**By default, the reporter only runs in production.** The environment is detected from `RAILS_ENV`, then `RACK_ENV`, and defaults to `"development"`.

```ruby
Speedshop::Cloudwatch.configure do |config|
  config.enabled_environments = ["production", "staging"]
  config.environment = "staging" # optional override
end
```

## Puma

Puma metrics are disabled by default. You must explicitly enable them in your configuration.

Add to your `config/puma.rb`:

```ruby
require_relative "../config/environment"

Speedshop::Cloudwatch.configure do |config|
  config.collectors << :puma
  # Enable Puma metrics (disabled by default)
  config.metrics[:puma] = [
    :Workers, :BootedWorkers, :OldWorkers, :Running, :Backlog, :PoolCapacity, :MaxThreads
  ]
end

# Start the reporter so Puma metrics are collected
Speedshop::Cloudwatch.start!
```

Collection runs in the master process and reports per-worker metrics (see below). This works correctly with both `preload_app true` and `false`, as well as single and cluster modes.

This reports the following metrics:

```
Workers       - Number of workers configured (Count)
BootedWorkers - Number of workers currently booted (Count)
OldWorkers    - Number of workers that are old/being phased out (Count)
Running       - Number of threads currently running (Count) [per worker]
Backlog       - Number of requests in the backlog (Count) [per worker]
PoolCapacity  - Current thread pool capacity (Count) [per worker]
MaxThreads    - Maximum number of threads configured (Count) [per worker]
```

## Rack

If you're using Rails, we'll automatically insert the correct middleware into the stack.

If you're using some other Rack-based framework, insert the `Speedshop::Cloudwatch::Rack` middleware high up (i.e. first) in the stack.
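
For a plain Rack app, that usually means a `use` line at the top of `config.ru`. A minimal sketch (the `MyApp` class and the extra `require` are illustrative assumptions, not something this gem defines):

```ruby
# config.ru -- sketch only; MyApp stands in for your own Rack application.
require "speedshop/cloudwatch"
require "speedshop/cloudwatch/rack"

# First in the stack, so queue time is measured as soon as the request arrives.
use Speedshop::Cloudwatch::Rack
run MyApp.new
```
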
You will need a reverse proxy, such as nginx, that adds an `X-Request-Start` or `X-Queue-Start` header (containing the time since the Unix epoch in milliseconds) to incoming requests. See [New Relic's instructions](https://docs.newrelic.com/docs/apm/applications-menu/features/configure-request-queue-reporting/) for more about how to do this.

We report the following metrics:

```
RequestQueueTime - Time spent waiting in the request queue (Milliseconds)
```

## Sidekiq Integration

In Sidekiq server processes, this integration auto-registers lifecycle hooks. On startup, it adds the `:sidekiq` collector and starts the reporter (leader-only when using Sidekiq Enterprise).

If you're using Sidekiq as your ActiveJob adapter, prefer this integration over the ActiveJob integration.

By default, only `QueueLatency` is reported. To enable additional metrics, configure them explicitly:

```ruby
Speedshop::Cloudwatch.configure do |config|
  config.metrics[:sidekiq] = [
    :EnqueuedJobs, :ProcessedJobs, :FailedJobs, :ScheduledJobs, :RetryJobs,
    :DeadJobs, :Workers, :Processes, :DefaultQueueLatency, :Capacity,
    :Utilization, :QueueLatency, :QueueSize
  ]
end
```

We report the following metrics:

```
EnqueuedJobs        - Number of jobs currently enqueued (Count)
ProcessedJobs       - Total number of jobs processed (Count)
FailedJobs          - Total number of failed jobs (Count)
ScheduledJobs       - Number of scheduled jobs (Count)
RetryJobs           - Number of jobs in the retry queue (Count)
DeadJobs            - Number of dead jobs (Count)
Workers             - Number of Sidekiq workers (Count)
Processes           - Number of Sidekiq processes (Count)
DefaultQueueLatency - Latency for the default queue (Seconds)
Capacity            - Total concurrency across all processes (Count)
Utilization         - Average utilization across all processes (Percent)
QueueLatency        - Latency for each queue (Seconds) [per queue]
QueueSize           - Size of each queue (Count) [per queue]
```

Metrics marked [per queue] include a QueueName dimension.
Utilization metrics include Tag and/or Hostname dimensions.
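
Because these metrics exist to drive auto-scaling, you will typically point an alarm or scaling policy at them. A hedged sketch using the AWS SDK (alarm name and threshold are arbitrary placeholders; it assumes the default `Sidekiq` namespace and the `QueueName` dimension described above):

```ruby
# Sketch: alarm when the default queue's latency averages above 30 seconds.
# Wire the alarm to your own scaling policy; values here are placeholders.
require "aws-sdk-cloudwatch"

Aws::CloudWatch::Client.new.put_metric_alarm(
  alarm_name: "sidekiq-default-queue-latency-high",
  namespace: "Sidekiq", # this gem's default Sidekiq namespace
  metric_name: "QueueLatency",
  dimensions: [{name: "QueueName", value: "default"}],
  statistic: "Average",
  period: 60,
  evaluation_periods: 2,
  threshold: 30,
  comparison_operator: "GreaterThanThreshold"
)
```
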
## ActiveJob

> [!WARNING]
> If you're using Sidekiq, just use that integration and do not include the ActiveJob module.

In your ApplicationJob:

```ruby
include Speedshop::Cloudwatch::ActiveJob
```
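
In context, that include lives directly on your base job class; a minimal sketch assuming a standard Rails `ApplicationJob`:

```ruby
# app/jobs/application_job.rb -- sketch assuming a standard Rails app
class ApplicationJob < ActiveJob::Base
  include Speedshop::Cloudwatch::ActiveJob
end
```
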
We report the following metrics:

```
QueueLatency - Time a job spent waiting in the queue before execution (Seconds)
```

This metric includes a QueueName dimension and is aggregated per interval using CloudWatch StatisticSets.

## Rails

When running in a Rails app we:

1. Automatically insert the Rack middleware at index 0.
2. Respect your configuration for enabled metrics and collectors. The reporter starts automatically the first time a metric is reported (e.g., via Rack middleware) or when you call `Speedshop::Cloudwatch.start!` yourself (e.g., in Puma or initializers).

If you want full control over these behaviors, add `require: false` to your Gemfile:

```ruby
gem 'speedshop-cloudwatch', require: false
```

Then manually require the core module without the railtie:

```ruby
# config/initializers/speedshop-cloudwatch.rb
require 'speedshop/cloudwatch'

# Insert middleware manually (if using Rack integration)
Rails.application.config.middleware.insert_before 0, Speedshop::Cloudwatch::Rack

Rails.application.configure do
  config.after_initialize do
    Speedshop::Cloudwatch.start!
  end
end
```

## Non-Rails Apps

For Rack apps (Sinatra, etc.):

- Insert `Speedshop::Cloudwatch::Rack` at the top of your middleware stack.
- Configure collectors and start the reporter during app boot.

Example config:

```ruby
require 'speedshop/cloudwatch'

Speedshop::Cloudwatch.configure do |config|
  # ...
end

Speedshop::Cloudwatch.start!
```

## Disabling Automatic Integration

You can disable the auto-integration of Sidekiq and Puma by not requiring them:

```ruby
gem 'speedshop-cloudwatch', require: false
```

```ruby
# some_initializer.rb
require 'speedshop/cloudwatch'
require 'speedshop/cloudwatch/puma'
require 'speedshop/cloudwatch/active_job'
require 'speedshop/cloudwatch/rack'
# require 'speedshop/cloudwatch/sidekiq'
```

## Bibliography

This library was developed with reference to and inspiration from these excellent projects:

- [sidekiq-cloudwatchmetrics](https://github.com/sj26/sidekiq-cloudwatchmetrics) - Sidekiq CloudWatch metrics integration (portions adapted, see lib/speedshop/cloudwatch/sidekiq.rb)
- [puma-cloudwatch](https://github.com/boltops-tools/puma-cloudwatch) - Puma CloudWatch metrics reporter
- [judoscale-ruby](https://github.com/judoscale/judoscale-ruby) - Autoscaling metrics collection patterns
data/Rakefile
ADDED
@@ -0,0 +1,40 @@
# frozen_string_literal: true

require "bundler/gem_tasks"
require "rake/testtask"

require "standard/rake"

FLOG_THRESHOLD = (ENV["FLOG_THRESHOLD"] || 50).to_i
FLAY_THRESHOLD = (ENV["FLAY_THRESHOLD"] || 100).to_i

Rake::TestTask.new(:test) do |t|
  t.libs << "test"
  t.libs << "lib"
  t.test_files = FileList["test/**/*_test.rb"]
end

desc "Run flog"
task :flog do
  flog_output = `bundle exec flog -a lib test`
  puts flog_output
  method_scores = flog_output.lines.grep(/^\s+[0-9]+\.[0-9]+:.*#/).reject { |line| line.include?("main#none") }.map { |line| line.split.first.to_f }
  max_score = method_scores.max
  if max_score && max_score > FLOG_THRESHOLD
    abort "flog failed: highest complexity (#{max_score}) exceeds threshold (#{FLOG_THRESHOLD})"
  end
  puts "flog passed (max complexity: #{max_score}, threshold: #{FLOG_THRESHOLD})"
end

desc "Run flay"
task :flay do
  flay_output = `bundle exec flay lib test`
  puts flay_output
  flay_score = flay_output[/Total score.*?=\s*(\d+)/, 1]&.to_i
  if flay_score && flay_score > FLAY_THRESHOLD
    abort "flay failed: duplication score (#{flay_score}) exceeds threshold (#{FLAY_THRESHOLD})"
  end
  puts "flay passed (duplication score: #{flay_score}, threshold: #{FLAY_THRESHOLD})"
end

task default: [:test, :standard, :flog, :flay]
data/bin/console
ADDED
@@ -0,0 +1,11 @@
#!/usr/bin/env ruby
# frozen_string_literal: true

require "bundler/setup"
require "speedshop/cloudwatch/all"

# You can add fixtures and/or initialization code here to make experimenting
# with your gem easier. You can also use a different console, if you like.

require "irb"
IRB.start(__FILE__)
data/bin/setup
ADDED
@@ -0,0 +1,15 @@
#!/usr/bin/env bash
set -euo pipefail
IFS=$'\n\t'
set -vx

bundle install

# Start Redis in Docker if port 6379 is free and docker is available
if ! nc -z localhost 6379 2>/dev/null; then
  if command -v docker &>/dev/null; then
    docker run -d --name speedshop-cloudwatch-redis -p 6379:6379 redis 2>/dev/null || true
  fi
fi

# Do any other automated setup that you need to do here
data/lib/speedshop/cloudwatch/active_job.rb
ADDED
@@ -0,0 +1,24 @@
# frozen_string_literal: true

module Speedshop
  module Cloudwatch
    module ActiveJob
      def self.included(base)
        base.around_perform :report_job_metrics
      end

      def report_job_metrics
        begin
          if enqueued_at
            queue_time = Time.now.to_f - enqueued_at.to_f
            # Drop JobClass to reduce time series cardinality and allow aggregation into StatisticSets per queue
            Reporter.instance.report(metric: :QueueLatency, value: queue_time, dimensions: {QueueName: queue_name}, integration: :active_job)
          end
        rescue => e
          Speedshop::Cloudwatch.log_error("Failed to collect ActiveJob metrics: #{e.message}", e)
        end
        yield
      end
    end
  end
end
data/lib/speedshop/cloudwatch/all.rb
ADDED
@@ -0,0 +1,8 @@
# frozen_string_literal: true

require "speedshop/cloudwatch"
require "speedshop/cloudwatch/active_job"
require "speedshop/cloudwatch/puma"
require "speedshop/cloudwatch/rack"
require "speedshop/cloudwatch/sidekiq"
require "speedshop/cloudwatch/railtie" if defined?(Rails)
data/lib/speedshop/cloudwatch/config.rb
ADDED
@@ -0,0 +1,61 @@
# frozen_string_literal: true

require "logger"
require "singleton"

module Speedshop
  module Cloudwatch
    class Config
      include Singleton

      attr_accessor :interval, :metrics, :namespaces, :logger, :queue_max_size, :sidekiq_queues, :dimensions,
        :collectors, :enabled_environments, :environment
      attr_writer :client

      def initialize
        reset
      end

      def client
        @client ||= Aws::CloudWatch::Client.new
      end

      def reset
        @interval = 60
        @queue_max_size = 1000
        @client = nil
        @metrics = {
          puma: [],
          sidekiq: [:QueueLatency],
          rack: [:RequestQueueTime],
          active_job: [:QueueLatency]
        }
        @namespaces = {puma: "Puma", sidekiq: "Sidekiq", rack: "Rack", active_job: "ActiveJob"}
        @sidekiq_queues = nil
        @dimensions = {}
        @logger = (defined?(Rails) && Rails.respond_to?(:logger) && Rails.logger) ? Rails.logger : Logger.new($stdout)
        @collectors = [] # [:puma, :sidekiq]
        @enabled_environments = ["production"]
        @environment = detect_environment
      end

      def environment_enabled?
        enabled_environments.include?(environment)
      end

      def self.reset
        if instance_variable_defined?(:@singleton__instance__)
          config = instance_variable_get(:@singleton__instance__)
          config&.reset
        end
        instance_variable_set(:@singleton__instance__, nil)
      end

      private

      def detect_environment
        ENV.fetch("RAILS_ENV", ENV.fetch("RACK_ENV", "development"))
      end
    end
  end
end
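
The environment gate defined above is easy to sanity-check from a console. A small sketch (assuming the gem is loaded; return values reflect the defaults shown in `reset`):

```ruby
config = Speedshop::Cloudwatch::Config.instance

config.environment            # => "development" unless RAILS_ENV/RACK_ENV say otherwise
config.environment_enabled?   # => false, since only "production" is enabled by default

config.enabled_environments = ["production", "staging"]
config.environment = "staging"
config.environment_enabled?   # => true, so the reporter may run
```
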
data/lib/speedshop/cloudwatch/metrics.rb
ADDED
@@ -0,0 +1,181 @@
# frozen_string_literal: true

module Speedshop
  module Cloudwatch
    Metric = Struct.new(:name, :unit, :description, :source, keyword_init: true)

    METRICS = {
      puma: [
        # Cluster-only metrics that report on the overall Puma process
        Metric.new(
          name: :Workers,
          unit: "Count",
          description: "Total number of workers (child processes) configured in " \
                       "Puma. This is a static configuration value.",
          source: "Puma.stats_hash[:workers]"
        ),
        Metric.new(
          name: :BootedWorkers,
          unit: "Count",
          description: "Number of worker processes currently running and ready to " \
                       "handle requests. Should match Workers count under normal " \
                       "operation.",
          source: "Puma.stats_hash[:booted_workers]"
        ),
        Metric.new(
          name: :OldWorkers,
          unit: "Count",
          description: "Number of worker processes being phased out during a " \
                       "restart. Zero during normal operation.",
          source: "Puma.stats_hash[:old_workers]"
        ),

        # Per-worker metrics (also available in single mode with WorkerIndex=0)
        Metric.new(
          name: :Running,
          unit: "Count",
          description: "Number of threads currently processing requests in this " \
                       "worker. Compare against MaxThreads to understand " \
                       "utilization.",
          source: "worker_stats[:running]"
        ),
        Metric.new(
          name: :Backlog,
          unit: "Count",
          description: "Number of requests waiting to be processed by this worker. " \
                       "Sustained backlog suggests over capacity.",
          source: "worker_stats[:backlog]"
        ),
        Metric.new(
          name: :PoolCapacity,
          unit: "Count",
          description: "Number of threads available to handle new requests. When " \
                       "this reaches zero, requests queue in the backlog.",
          source: "worker_stats[:pool_capacity]"
        ),
        Metric.new(
          name: :MaxThreads,
          unit: "Count",
          description: "Maximum number of threads configured for this worker. This " \
                       "is a static configuration value.",
          source: "worker_stats[:max_threads]"
        )
      ],

      sidekiq: [
        # Overall Sidekiq statistics
        Metric.new(
          name: :EnqueuedJobs,
          unit: "Count",
          description: "Total number of jobs currently enqueued across all queues, waiting to be processed.",
          source: "Sidekiq::Stats.new.enqueued"
        ),
        Metric.new(
          name: :ProcessedJobs,
          unit: "Count",
          description: "Cumulative count of all jobs successfully processed since Sidekiq started.",
          source: "Sidekiq::Stats.new.processed"
        ),
        Metric.new(
          name: :FailedJobs,
          unit: "Count",
          description: "Cumulative count of all jobs that have failed since Sidekiq started.",
          source: "Sidekiq::Stats.new.failed"
        ),
        Metric.new(
          name: :ScheduledJobs,
          unit: "Count",
          description: "Number of jobs scheduled to run at a future time. Will move " \
                       "to enqueued when their scheduled time arrives.",
          source: "Sidekiq::Stats.new.scheduled_size"
        ),
        Metric.new(
          name: :RetryJobs,
          unit: "Count",
          description: "Number of jobs in the retry queue. These jobs failed but " \
                       "have not exhausted their retry attempts.",
          source: "Sidekiq::Stats.new.retry_size"
        ),
        Metric.new(
          name: :DeadJobs,
          unit: "Count",
          description: "Number of jobs in the dead queue. These jobs exhausted all retry attempts.",
          source: "Sidekiq::Stats.new.dead_size"
        ),
        Metric.new(
          name: :Workers,
          unit: "Count",
          description: "Total number of worker threads currently processing jobs across all Sidekiq processes.",
          source: "Sidekiq::Stats.new.workers_size"
        ),
        Metric.new(
          name: :Processes,
          unit: "Count",
          description: "Number of Sidekiq server processes currently running.",
          source: "Sidekiq::Stats.new.processes_size"
        ),
        Metric.new(
          name: :DefaultQueueLatency,
          unit: "Seconds",
          description: "Time the oldest job in the default queue has been waiting. Zero if the queue is empty.",
          source: "Sidekiq::Stats.new.default_queue_latency"
        ),

        # Process-level metrics
        Metric.new(
          name: :Capacity,
          unit: "Count",
          description: "Total number of worker threads available across all Sidekiq " \
                       "processes. Can be tagged by process tag.",
          source: "Sum of process['concurrency'] across processes"
        ),
        Metric.new(
          name: :Utilization,
          unit: "Percent",
          description: "Average percentage of worker threads currently busy. 100% " \
                       "means all workers are busy. Can be reported by tag or " \
                       "hostname.",
          source: "Average of busy / concurrency * 100 across processes"
        ),

        # Queue-specific metrics
        Metric.new(
          name: :QueueLatency,
          unit: "Seconds",
          description: "Time the oldest job in this queue has been waiting. High " \
                       "latency indicates jobs are backing up.",
          source: "Sidekiq::Queue#latency"
        ),
        Metric.new(
          name: :QueueSize,
          unit: "Count",
          description: "Number of jobs currently waiting in this queue. Growing " \
                       "size indicates jobs arriving faster than processing.",
          source: "Sidekiq::Queue#size"
        )
      ],

      rack: [
        Metric.new(
          name: :RequestQueueTime,
          unit: "Milliseconds",
          description: "Time a request spent waiting in the reverse proxy before " \
                       "reaching the application. High values indicate requests " \
                       "backing up before reaching your application server.",
          source: "(Time.now.to_f * 1000) - HTTP_X_REQUEST_START"
        )
      ],

      active_job: [
        Metric.new(
          name: :QueueLatency,
          unit: "Seconds",
          description: "Time a job spent waiting in the queue before execution " \
                       "started. Values are aggregated into CloudWatch " \
                       "StatisticSets per reporting interval.",
          source: "Time.now.to_f - job.enqueued_at"
        )
      ]
    }.freeze
  end
end
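
The README's metric reference points at this file; a small sketch for browsing the table interactively from `bin/console` (assuming only that the file above loads on its own from the installed gem):

```ruby
require "speedshop/cloudwatch/metrics"

# Print every metric the gem defines, grouped by integration.
Speedshop::Cloudwatch::METRICS.each do |integration, metrics|
  puts integration
  metrics.each do |metric|
    puts "  #{metric.name} (#{metric.unit}): #{metric.description}"
  end
end
```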