allgood 0.1.0 → 0.3.0
- checksums.yaml +4 -4
- data/CHANGELOG.md +16 -1
- data/README.md +137 -46
- data/allgood.jpeg +0 -0
- data/allgood_skipped.webp +0 -0
- data/app/controllers/allgood/healthcheck_controller.rb +59 -7
- data/app/views/allgood/healthcheck/index.html.erb +15 -4
- data/examples/allgood.rb +216 -0
- data/lib/allgood/cache_store.rb +52 -0
- data/lib/allgood/configuration.rb +156 -2
- data/lib/allgood/engine.rb +1 -1
- data/lib/allgood/version.rb +1 -1
- data/lib/allgood.rb +1 -0
- metadata +13 -9
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 45bac1780ef5cb92516f0a6f02cd7aa7b476213d6595a937252c600dc3e870e0
+  data.tar.gz: b1f8f7e8e30609d67c6c28fb66cd651d13414f9161f74b3e3880f02113408c10
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: b169c7d38987605312e2e013f645814e12fa23f2b0574793fa560f9f9ec9f97491eea5347c625e8b6d0909f3ec32c912a744925906f9e66b710486eb365cf35f
+  data.tar.gz: 0573a0b4918ec2e6e4a0ce40831d3e57afed34db5e8ada879d0db4450764dd0e72a8ffe4b3e779bf58758cf3d29546728600f9823437cdf1618627a48648fe5a
data/CHANGELOG.md
CHANGED
@@ -1,4 +1,19 @@
-## [
+## [0.3.0] - 2024-10-27
+
+- Added rate limiting for expensive checks with the `run: "N times per day/hour"` option
+- Added a cache mechanism to store check results and error states, which allows for rate limiting and avoiding redundant runs when checks fail
+- Added automatic cache key expiration
+- Added error handling and feedback for rate-limited checks
+
+## [0.2.0] - 2024-10-26
+
+- Improved the `allgood` DSL by adding optional conditionals on when individual checks are run
+- Allow for environment-specific checks with `only` and `except` options (`check "Test Check", only: [:development, :test]`)
+- Allow for conditional checks with `if` and `unless` options, which can be procs or any other condition (`check "Test Check", if: -> { condition }`)
+- Added visual indication of skipped checks in the healthcheck page
+- Improved developer experience by showing why checks were skipped (didn't meet conditions, environment-specific, etc.)
+- New DSL changes are fully backward compatible with the previous version (new options are optional, and checks will run normally if they are not specified), so the new version won't break existing configurations
+- Changed configuration loading to happen after Rails initialization so we fix the segfault that could occur when requiring gems in the `allgood.rb` configuration file before Rails was initialized
 
 ## [0.1.0] - 2024-08-22
 
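The 0.3.0 changelog entry describes a `run: "N times per day/hour"` option. As a rough illustration only, a frequency string of that shape could be parsed as below. The configuration diff calls helpers named `parse_run_frequency` and `current_period`, but their bodies are not shown here, so the regex, the returned hash shape, and the period format in this sketch are all assumptions.

```ruby
# Hypothetical parser for strings like "2 times per day" or "4 times per hour".
# The regex and the returned hash shape are assumptions, not the gem's code.
def parse_run_frequency(frequency)
  match = frequency.to_s.strip.match(/\A(\d+)\s+times?\s+per\s+(day|hour)\z/i)
  raise ArgumentError, "Unsupported frequency format: #{frequency.inspect}" unless match

  { max_runs: match[1].to_i, period: match[2].downcase.to_sym }
end

# Builds a string that changes when the period rolls over. Using it as part of
# a cache key means a per-period run counter starts from zero in each new period.
def current_period(rate, now = Time.now)
  case rate[:period]
  when :day  then now.strftime("%Y-%m-%d")
  when :hour then now.strftime("%Y-%m-%d-%H")
  end
end

rate = parse_run_frequency("2 times per day")
puts "#{rate[:max_runs]} runs allowed in period #{current_period(rate)}"
```

Raising `ArgumentError` on an unrecognized string fits the changelog's "error handling and feedback" item: the caller can rescue it and mark the check as skipped with a readable reason instead of crashing the whole page.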
data/README.md
CHANGED
@@ -1,12 +1,22 @@
 # ✅ Allgood - Rails gem for health checks
 
-
+[![Gem Version](https://badge.fury.io/rb/allgood.svg)](https://badge.fury.io/rb/allgood)
 
-
+Add quick, simple, and beautiful health checks to your Rails application via a `/healthcheck` page.
 
-
+Use it for smoke testing, to make sure your app is healthy and functioning as expected.
 
-![
+![Example dashboard of the Allgood health check page](allgood.jpeg)
+
+## How it works
+
+`allgood` allows you to define custom health checks / smoke tests (as in: can the Rails app connect to the DB, are there any new users in the past 24 hours, are they actually using the app, etc.) in a very intuitive way that reads just like English.
+
+It provides a `/healthcheck` endpoint that displays the results in a beautiful page.
+
+You can then [use that endpoint to monitor the health of your application via UptimeRobot](https://uptimerobot.com/?rid=854006b5fe82e4), Pingdom, etc. These services will load your `/healthcheck` page every few minutes, so all checks will be run when UptimeRobot fetches the page.
+
+`allgood` aims to provide developers with peace of mind by answering the question "is production okay?" at a glance.
 
 ## Installation
 
@@ -17,9 +27,10 @@ gem 'allgood'
 
 Then run `bundle install`.
 
-
+After installing the gem, you need to mount the `/healthcheck` route and define your health checks in a `config/allgood.rb` file.
 
-
+
+## Mount the `/healthcheck` route
 
 In your `config/routes.rb` file, mount the Allgood engine:
 ```ruby
@@ -28,36 +39,83 @@ mount Allgood::Engine => '/healthcheck'
 
 You can now navigate to `/healthcheck` to see the health check results.
 
-The `/healthcheck` page returns
+The `/healthcheck` page returns HTTP codes:
+- `200 OK` if all checks are successful
+- `503 Service Unavailable` error otherwise
+
+Services like UptimeRobot pick up these HTTP codes, which makes monitoring easy.
 
-`allgood`
+**Kamal**: `allgood` can also be used as a replacement for the default `/up` Rails action, to make [Kamal](https://github.com/basecamp/kamal) check things like if the database connection is healthy when deploying your app's containers. Just change `allgood`'s mounting route to `/up` instead of `/healthcheck`, or configure Kamal to use the `allgood` route.
 
+> [!TIP]
+> If you're using Kamal with `allgood`, container deployment will fail if any defined checks fail, [without feedback from Kamal](https://github.com/rameerez/allgood/issues/1) on what went wrong. Your containers will just not start, and you'll get a generic error message. To avoid this, you can either keep the `allgood.rb` file very minimal (e.g., only check for active DB connection, migrations up to date, etc.) so the app deployment is likely to succeed, or you can use the default `/up` route for Kamal, and then mount `allgood` on another route for more advanced business-oriented checks. What you want to avoid is your app deployment failing because of usage-dependent or business-oriented checks, like your app not deploying because it didn't get any users in the past hour, or something like that.
 
-
+## Configure your health checks
 
-Create a file `config/allgood.rb` in your Rails application. This is where you'll define your health checks:
+Create a file `config/allgood.rb` in your Rails application. This is where you'll define your health checks. Here's a simple example:
 ```ruby
 # config/allgood.rb
 
 check "We have an active database connection" do
-  make_sure ActiveRecord::Base.connection.active?
+  make_sure ActiveRecord::Base.connection.connect!.active?
 end
 ```
 
-
+`allgood` will run all checks upon page load, and will show "Check passed" or "Check failed" next to it. That's it – add as many health checks as you want!
+
+Here's my default `config/allgood.rb` file that should work for most Rails applications, feel free to use it as a starting point:
+
 ```ruby
-
-
-
+# config/allgood.rb
+
+check "We have an active database connection" do
+  make_sure ActiveRecord::Base.connection.connect!.active?
+end
+
+check "Database can perform a simple query" do
+  make_sure ActiveRecord::Base.connection.execute("SELECT 1").any?
+end
+
+check "Database migrations are up to date" do
+  make_sure ActiveRecord::Migration.check_all_pending! == nil
+end
+
+check "Disk space usage is below 90%" do
+  usage = `df -h / | tail -1 | awk '{print $5}' | sed 's/%//'`.to_i
+  expect(usage).to_be_less_than(90)
+end
+
+check "Memory usage is below 90%" do
+  usage = `free | grep Mem | awk '{print $3/$2 * 100.0}' | cut -d. -f1`.to_i
+  expect(usage).to_be_less_than(90)
 end
 ```
 
-
+I've also added an example [`config/allgood.rb`](examples/allgood.rb) file in the `examples` folder, with very comprehensive checks for a Rails 8+ app, that you can use as a starting point.
+
+> [!IMPORTANT]
+> Make sure you restart the Rails server (`bin/rails s`) every time you modify the `config/allgood.rb` file for the changes to apply – the `allgood` config is only loaded once when the Rails server starts.
+
+### The `allgood` DSL
+
+As you can see, there's a very simple DSL (Domain-Specific Language) you can use to define health checks.
+
+It reads almost like natural English, and allows you to define powerful yet simple checks to make sure your app is healthy.
+
+For example, you can specify a custom human-readable success / error message for each check, so you don't go crazy when things fail and you can't figure out what the check expected output was:
+```ruby
+check "Cache is accessible and functioning" do
+  Rails.cache.write('allgood_test', 'ok')
+  make_sure Rails.cache.read('allgood_test') == 'ok', "The `allgood_test` key in the cache should contain `'ok'`"
+end
+```
 
 Other than checking for an active database connection, it's useful to check for business-oriented metrics, such as whether your app has gotten any new users in the past 24 hours (to make sure your signup flow is not broken), check whether there have been any new posts / records created recently (to make sure your users are performing the actions you'd expect them to do in your app), check for recent purchases, check for external API connections, check whether new records contain values within expected range, etc.
 
 Some business health check examples that you'd need to adapt to the specifics of your particular app:
 ```ruby
+# Adapt these to your app specifics
+
 check "There's been new signups in the past 24 hours" do
   count = User.where(created_at: 24.hours.ago..Time.now).count
   expect(count).to_be_greater_than(0)
@@ -70,63 +128,96 @@ check "The last created Purchase has a valid total" do
 end
 ```
 
-
+### Available check methods
+
+- `make_sure(condition, message = nil)`: Ensures that the given condition is true.
+- `expect(actual).to_eq(expected)`: Checks if the actual value equals the expected value.
+- `expect(actual).to_be_greater_than(expected)`: Checks if the actual value is greater than the expected value.
+- `expect(actual).to_be_less_than(expected)`: Checks if the actual value is less than the expected value.
+
+Please help us develop by adding more expectation methods in the `Expectation` class!
+
+### Run checks only in specific environments or under certain conditions
+
+You can also make certain checks run only in specific environments or under certain conditions. Some examples:
+
 ```ruby
-
-
+# Only run in production
+check "There have been new user signups in the past hour", only: :production do
+  make_sure User.where(created_at: 1.hour.ago..Time.now).count.positive?
 end
 
-
-
+# Run in both staging and production
+check "External API is responsive", only: [:staging, :production] do
+  # ...
 end
 
-
-
-
+# Run everywhere except development
+check "A SolidCable connection is active and healthy", except: :development do
+  # ...
 end
 
-
-
-
+# Using if with a direct boolean
+check "Feature flag is enabled", if: ENV['FEATURE_ENABLED'] == 'true' do
+  # ...
 end
 
-
-
-
+# Using if with a Proc for more complex conditions
+check "Complex condition", if: -> { User.count > 1000 && User.last.created_at < 10.minutes.ago } do
+  # ...
 end
-```
 
-
+# Override default timeout (in seconds) for specific checks
+# By default, each check has a timeout of 10 seconds
+check "Slow external API", timeout: 30 do
+  # ...
+end
 
-
+# Combine multiple conditions
+check "Complex check",
+      only: :production,
+      if: -> { User.count > 1000 },
+      timeout: 15 do
+  # ...
+end
+```
 
+### Rate Limiting Expensive Checks
 
-
+For expensive operations (like testing paid APIs), you can limit how often checks run:
 
-
-
-
-
+```ruby
+# Run expensive checks a limited number of times
+check "OpenAI is responding with a valid LLM message", run: "2 times per day" do
+  # expensive API call
+end
 
-
+check "Analytics can be processed", run: "4 times per hour" do
+  # expensive operation
+end
+```
 
-
+Important notes:
+- Rate limits reset at the start of each period (hour/day)
+- The error state persists between rate-limited runs
+- Rate-limited checks show clear feedback about remaining runs and next reset time
 
-
+When a check is skipped due to its conditions not being met, it will appear in the healthcheck page with a skip emoji (⏭️) and a clear explanation of why it was skipped.
 
-
+![Example dashboard of the Allgood health check page with skipped checks](allgood_skipped.webp)
 
+_Note: the `allgood` health check dashboard has an automatic dark mode, based on the system's appearance settings._
 
 ## Development
 
-After checking out the repo, run `bin/setup` to install dependencies. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
+After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake spec` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
 
-To install this gem onto your local machine, run `bundle exec rake install`.
+To install this gem onto your local machine, run `bundle exec rake install`.
 
 ## Contributing
 
-Bug reports and pull requests are welcome on GitHub at https://github.com/rameerez/allgood Our code of conduct is: just be nice and make your mom proud of what you do and post online.
+Bug reports and pull requests are welcome on GitHub at https://github.com/rameerez/allgood. Our code of conduct is: just be nice and make your mom proud of what you do and post online.
 
 ## License
 
-The gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).
+The gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).
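The README lists `make_sure` and the `expect(...).to_*` matchers and invites contributions to the `Expectation` class. To make the shape of that DSL concrete, here is a minimal standalone sketch of how such helpers could be built in plain Ruby. The class names echo ones mentioned in this diff (`Expectation`, `CheckRunner`, `CheckFailedError`), but every method body below, the private `compare` helper, and the message wording are assumptions for illustration, not the gem's source.

```ruby
# Stand-in for the gem's failure error; defined locally so the sketch runs alone.
class CheckFailedError < StandardError; end

# Wraps a value and offers comparison matchers that either return a
# success hash or raise CheckFailedError with a readable message.
class Expectation
  def initialize(actual)
    @actual = actual
  end

  def to_eq(expected)
    compare(@actual == expected, "to equal", expected)
  end

  def to_be_greater_than(expected)
    compare(@actual > expected, "to be greater than", expected)
  end

  def to_be_less_than(expected)
    compare(@actual < expected, "to be less than", expected)
  end

  private

  # Hypothetical shared helper: builds the message once for pass and fail.
  def compare(passed, phrase, expected)
    message = "Expected #{@actual.inspect} #{phrase} #{expected.inspect}"
    raise CheckFailedError, message unless passed

    { success: true, message: message }
  end
end

# Check blocks are evaluated against an instance of this class, so
# `make_sure` and `expect` are available inside them without a receiver.
class CheckRunner
  def make_sure(condition, message = nil)
    raise CheckFailedError, (message || "Check failed") unless condition

    { success: true, message: message || "Check passed" }
  end

  def expect(actual)
    Expectation.new(actual)
  end
end

result = CheckRunner.new.instance_eval { expect(42).to_be_less_than(90) }
puts result[:message]
```

Adding a new matcher in this design is one short public method that delegates to `compare`, which is probably why the README points contributors at the `Expectation` class.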
data/allgood.jpeg
CHANGED
Binary file
|
Binary file
|
data/app/controllers/allgood/healthcheck_controller.rb
CHANGED
@@ -29,25 +29,77 @@ module Allgood
 
     def run_checks
       Allgood.configuration.checks.map do |check|
-
+        if check[:status] == :skipped
+          {
+            name: check[:name],
+            success: true,
+            skipped: true,
+            message: check[:skip_reason],
+            duration: 0
+          }
+        else
+          run_single_check(check)
+        end
       end
     end
 
     def run_single_check(check)
+      last_result_key = "allgood:last_result:#{check[:name].parameterize}"
+      last_result = Allgood::CacheStore.instance.read(last_result_key)
+
+      unless Allgood.configuration.should_run_check?(check)
+        message = check[:skip_reason]
+        if last_result
+          status_info = "Last check #{last_result[:success] ? 'passed' : 'failed'} #{time_ago_in_words(last_result[:time])} ago: #{last_result[:message]}"
+          message = "#{message}. #{status_info}"
+        end
+
+        return {
+          name: check[:name],
+          success: last_result ? last_result[:success] : true,
+          skipped: true,
+          message: message,
+          duration: 0
+        }
+      end
+
       start_time = Time.now
       result = { success: false, message: "Check timed out after #{check[:timeout]} seconds" }
+      error_key = "allgood:error:#{check[:name].parameterize}"
 
       begin
         Timeout.timeout(check[:timeout]) do
           check_result = Allgood.configuration.run_check(&check[:block])
           result = { success: check_result[:success], message: check_result[:message] }
+
+          if result[:success]
+            # Clear error state and store successful result
+            Allgood::CacheStore.instance.write(error_key, nil)
+            Allgood::CacheStore.instance.write(last_result_key, {
+              success: true,
+              message: result[:message],
+              time: Time.current
+            })
+          end
         end
-      rescue Timeout::Error
-
-
-
-
-
+      rescue Timeout::Error, Allgood::CheckFailedError, StandardError => e
+        error_message = case e
+        when Timeout::Error
+          "Check timed out after #{check[:timeout]} seconds"
+        when Allgood::CheckFailedError
+          e.message
+        else
+          "Error: #{e.message}"
+        end
+
+        # Store error state and failed result
+        Allgood::CacheStore.instance.write(error_key, error_message)
+        Allgood::CacheStore.instance.write(last_result_key, {
+          success: false,
+          message: error_message,
+          time: Time.current
+        })
+        result = { success: false, message: error_message }
       end
 
       {
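The controller hunk above wraps each check in a timeout, maps every kind of failure to a message, and (per the README) the page answers `200 OK` only when all checks succeed. The following standalone sketch reproduces that flow without Rails or the cache writes. The method names `run_with_timeout` and `overall_status` and the local `CheckFailedError` class are hypothetical stand-ins, not names from the gem.

```ruby
require "timeout"

# Local stand-in for Allgood::CheckFailedError so the sketch runs alone.
class CheckFailedError < StandardError; end

# Runs the given block under a time limit and normalizes any outcome
# (success, timeout, failed expectation, unexpected error) into a result hash.
def run_with_timeout(name, timeout_seconds)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result =
    begin
      Timeout.timeout(timeout_seconds) { { success: true, message: yield.to_s } }
    rescue Timeout::Error
      { success: false, message: "Check timed out after #{timeout_seconds} seconds" }
    rescue CheckFailedError => e
      { success: false, message: e.message }
    rescue StandardError => e
      { success: false, message: "Error: #{e.message}" }
    end
  elapsed_ms = ((Process.clock_gettime(Process::CLOCK_MONOTONIC) - started) * 1000).round(1)
  result.merge(name: name, duration: elapsed_ms)
end

# 200 when every result succeeded, 503 otherwise.
def overall_status(results)
  results.all? { |r| r[:success] } ? 200 : 503
end

results = [
  run_with_timeout("fast check", 1) { "all fine" },
  run_with_timeout("slow check", 0.05) { sleep 1 }
]
puts overall_status(results)
```

Listing `Timeout::Error` before `StandardError` matters in this sketch: `Timeout::Error` is itself a `StandardError`, so the more specific clause has to come first to get the dedicated timeout message.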
data/app/views/allgood/healthcheck/index.html.erb
CHANGED
@@ -14,6 +14,10 @@
   .check {
     margin: 0.5em 0;
   }
+
+  .skipped {
+    opacity: 0.6;
+  }
 </style>
 
 <header>
@@ -22,11 +26,18 @@
 
 <% if @results.any? %>
   <% @results.each do |result| %>
-    <div class="check">
-
-
+    <div class="check <%= 'skipped' if result[:skipped] %>">
+      <% if result[:skipped] %>
+        ⏭️
+      <% else %>
+        <%= result[:success] ? "✅" : "❌" %>
+      <% end %>
+      <b><%= result[:name] %></b>: <i><%= result[:message] %></i>
+      <% unless result[:skipped] %>
+        <code>[<%= result[:duration] %>ms]</code>
+      <% end %>
     </div>
   <% end %>
 <% else %>
   <p>No health checks were run. Please check your configuration.</p>
-<% end %>
+<% end %>
data/examples/allgood.rb
ADDED
@@ -0,0 +1,216 @@
+require 'open-uri'
+TEST_IMAGE = URI.open("https://picsum.photos/id/237/536/354").read
+
+# --- ACTIVE RECORD ---
+
+check "We have an active database connection" do
+  make_sure ActiveRecord::Base.connection.connect!.active?
+end
+
+check "The database can perform a simple query" do
+  make_sure ActiveRecord::Base.connection.execute("SELECT 1 LIMIT 1").any?
+end
+
+check "The database can perform writes" do
+  table_name = "allgood_health_check_#{Time.now.to_i}"
+  random_id = rand(1..999999)
+
+  result = ActiveRecord::Base.connection.execute(<<~SQL)
+    DROP TABLE IF EXISTS #{table_name};
+    CREATE TEMPORARY TABLE #{table_name} (id integer);
+    INSERT INTO #{table_name} (id) VALUES (#{random_id});
+    SELECT id FROM #{table_name} LIMIT 1;
+  SQL
+
+  ActiveRecord::Base.connection.execute("DROP TABLE #{table_name}")
+
+  make_sure result.present? && result.first["id"] == random_id, "Able to write to temporary table"
+end
+
+check "The database connection pool is healthy" do
+  pool = ActiveRecord::Base.connection_pool
+
+  used_connections = pool.connections.count
+  max_connections = pool.size
+  usage_percentage = (used_connections.to_f / max_connections * 100).round
+
+  make_sure usage_percentage < 90, "Pool usage at #{usage_percentage}% (#{used_connections}/#{max_connections})"
+end
+
+check "Database migrations are up to date" do
+  make_sure ActiveRecord::Migration.check_all_pending! == nil
+end
+
+# --- IMAGE PROCESSING ---
+
+check "Vips (libvips) is installed on Linux", except: :development do
+  output = `ldconfig -p | grep libvips`
+  make_sure output.present? && output.include?("libvips.so") && output.include?("libvips-cpp.so"), "libvips is found in the Linux system's library cache"
+end
+
+check "Vips is available to Rails" do
+  throw "ImageProcessing::Vips is not available" if !ImageProcessing::Vips.present? # Need this line to load `Vips`
+
+  make_sure Vips::VERSION.present?, "Vips available with version #{Vips::VERSION}"
+end
+
+check "Vips can perform operations on images" do
+  throw "ImageProcessing::Vips is not available" if !ImageProcessing::Vips.present? # Need this line to load `Vips`
+
+  image = Vips::Image.new_from_buffer(TEST_IMAGE, "")
+  processed_image = image
+    .gaussblur(10) # Apply Gaussian blur with sigma 10
+    .linear([1.2], [0]) # Increase brightness
+    .invert # Invert colors for a wild effect
+    .sharpen # Apply sharpening
+    .resize(0.5)
+
+  make_sure processed_image.present? && processed_image.width == 268 && processed_image.height == 177, "If we input an image of 536x354px, and we apply filters and a 0.5 resize, we should get an image of 268x177px"
+end
+
+check "ImageProcessing::Vips is available to Rails" do
+  make_sure ImageProcessing::Vips.present?
+end
+
+check "ImageProcessing can perform operations on images" do
+  image_processing_image = ImageProcessing::Vips
+    .source(Vips::Image.new_from_buffer(TEST_IMAGE, ""))
+    .resize_to_limit(123, 123) # Resize to fit within 500x500
+    .convert("webp") # Convert to webp format
+    .call
+  processed_image = Vips::Image.new_from_file(image_processing_image.path)
+
+  make_sure processed_image.present? && processed_image.width == 123 && processed_image.get("vips-loader") == "webpload", "ImageProcessing can resize and convert to webp"
+end
+
+# --- ACTIVE STORAGE ---
+
+check "Active Storage is available to Rails" do
+  make_sure ActiveStorage.present?
+end
+
+check "Active Storage tables are present in the database" do
+  make_sure ActiveRecord::Base.connection.table_exists?("active_storage_attachments") && ActiveRecord::Base.connection.table_exists?("active_storage_blobs")
+end
+
+check "Active Storage has a valid client configured" do
+  service = ActiveStorage::Blob.service
+  service_name = service&.class&.name&.split("::")&.last&.split("Service")&.first
+
+  if !service_name.downcase.include?("disk")
+    make_sure service.present? && service.respond_to?(:client) && service.client.present?, "Active Storage service has a valid #{service_name} client configured"
+  else
+    make_sure !Rails.env.production? && service.present?, "Active Storage using #{service_name} service in #{Rails.env.to_s}"
+  end
+end
+
+check "ActiveStorage can store images, retrieve them, and purge them" do
+  blob = ActiveStorage::Blob.create_and_upload!(io: StringIO.new(TEST_IMAGE), filename: "allgood-test-image-#{Time.now.to_i}.jpg", content_type: "image/jpeg")
+  blob_key = blob.key
+  make_sure blob.persisted? && blob.service.exist?(blob_key)
+  blob.purge
+  make_sure !blob.service.exist?(blob_key), "Image needs to be successfully stored, retrieved, and purged from #{ActiveStorage::Blob.service.name} (#{ActiveStorage::Blob.service.class.name})"
+end
+
+# --- CACHE ---
+
+check "Cache is accessible and functioning" do
+  cache_value = "allgood_#{Time.now.to_i}"
+  Rails.cache.write("allgood_health_check_test", cache_value, expires_in: 1.minute)
+  make_sure Rails.cache.read("allgood_health_check_test") == cache_value, "The `allgood_health_check_test` key in the cache should return the string `#{cache_value}`"
+end
+
+# --- SOLID QUEUE ---
+
+check "SolidQueue is available to Rails" do
+  make_sure SolidQueue.present?
+end
+
+check "We have an active SolidQueue connection to the database" do
+  make_sure SolidQueue::Job.connection.connect!.active?
+end
+
+check "SolidQueue tables are present in the database" do
+  make_sure SolidQueue::Job.connection.table_exists?("solid_queue_jobs") && SolidQueue::Job.connection.table_exists?("solid_queue_failed_executions") && SolidQueue::Job.connection.table_exists?("solid_queue_semaphores")
+end
+
+check "The percentage of failed jobs in the last 24 hours is less than 1%", only: :production do
+  failed_jobs = SolidQueue::FailedExecution.where(created_at: 24.hours.ago..Time.now).count
+  all_jobs = SolidQueue::Job.where(created_at: 24.hours.ago..Time.now).count
+
+  if all_jobs > 10
+    percentage = all_jobs > 0 ? (failed_jobs.to_f / all_jobs.to_f * 100) : 0
+    make_sure percentage < 1, "#{percentage.round(2)}% of jobs are failing"
+  else
+    make_sure true, "Not enough jobs to calculate meaningful failure rate (only #{all_jobs} jobs in last 24h)"
+  end
+end
+
+# --- ACTION CABLE ---
+
+check "ActionCable is configured and running" do
+  make_sure ActionCable.server.present?, "ActionCable server should be running"
+end
+
+check "ActionCable is configured to accept connections with a valid adapter" do
+  make_sure ActionCable.server.config.allow_same_origin_as_host, "ActionCable server should be configured to accept connections"
+
+  adapter = ActionCable.server.config.cable["adapter"]
+
+  if Rails.env.production?
+    make_sure adapter.in?(["solid_cable", "redis"]), "ActionCable running #{adapter} adapter in #{Rails.env.to_s}"
+  else
+    make_sure adapter.in?(["solid_cable", "async"]), "ActionCable running #{adapter} adapter in #{Rails.env.to_s}"
+  end
+end
+
+check "ActionCable can broadcast messages and store them in SolidCable" do
+  test_message = "allgood_test_#{Time.now.to_i}"
+
+  begin
+    ActionCable.server.broadcast("allgood_test_channel", { message: test_message })
+
+    # Verify message was stored in SolidCable
+    message = SolidCable::Message.where(channel: "allgood_test_channel")
+      .order(created_at: :desc)
+      .first
+
+    make_sure message.present?, "Message should be stored in SolidCable"
+    make_sure message.payload.include?(test_message) && message.destroy, "Message payload should contain our test message"
+  rescue => e
+    make_sure false, "Failed to broadcast/verify message: #{e.message}"
+  end
+end
+
+# --- SYSTEM ---
+
+check "Disk space usage is below 90%", only: :production do
+  usage = `df -h / | tail -1 | awk '{print $5}' | sed 's/%//'`.to_i
+  expect(usage).to_be_less_than(90)
+end
+
+check "Memory usage is below 90%", only: :production do
+  usage = `free | grep Mem | awk '{print $3/$2 * 100.0}' | cut -d. -f1`.to_i
+  expect(usage).to_be_less_than(90)
+end
+
+# --- SITEMAP ---
+
+check "The sitemap generator is available" do
+  make_sure SitemapGenerator.present?
+end
+
+check "sitemap.xml.gz exists", only: :production do
+  make_sure File.exist?(Rails.public_path.join("sitemap.xml.gz"))
+end
+
+
+# --- USAGE-DEPENDENT CHECKS ---
+
+check "SolidQueue has processed jobs in the last 24 hours", only: :production do
+  make_sure SolidQueue::Job.where(created_at: 24.hours.ago..Time.now).order(created_at: :desc).limit(1).any?
+end
+
+# --- PAY / STRIPE ---
+
+# TODO: no error webhooks in the past 24 hours, new sales in the past few hours, etc.
data/lib/allgood/cache_store.rb
ADDED
@@ -0,0 +1,52 @@
+# frozen_string_literal: true
+
+module Allgood
+  class CacheStore
+    def self.instance
+      @instance ||= new
+    end
+
+    def initialize
+      @memory_store = {}
+    end
+
+    def read(key)
+      if rails_cache_available?
+        Rails.cache.read(key)
+      else
+        @memory_store[key]
+      end
+    end
+
+    def write(key, value)
+      if rails_cache_available?
+        expiry = key.include?('day') ? 1.day : 1.hour
+        Rails.cache.write(key, value, expires_in: expiry)
+      else
+        @memory_store[key] = value
+      end
+    end
+
+    def cleanup_old_keys
+      return unless rails_cache_available?
+
+      keys_pattern = "allgood:*"
+      if Rails.cache.respond_to?(:delete_matched)
+        Rails.cache.delete_matched("#{keys_pattern}:*:#{(Time.current - 2.days).strftime('%Y-%m-%d')}*")
+      end
+    rescue StandardError => e
+      Rails.logger.warn "Allgood: Failed to cleanup old cache keys: #{e.message}"
+    end
+
+    private
+
+    def rails_cache_available?
+      Rails.cache && Rails.cache.respond_to?(:read) && Rails.cache.respond_to?(:write) &&
+        Rails.cache.write("allgood_rails_cache_test_ok", "true") &&
+        Rails.cache.read("allgood_rails_cache_test_ok") == "true"
+    rescue StandardError => e
+      Rails.logger.warn "Allgood: Rails.cache not available (#{e.message}), falling back to memory store"
+      false
+    end
+  end
+end
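The cache store above probes `Rails.cache` with a write and read before each operation and falls back to an in-process Hash when the probe fails. The same pattern can be tried outside Rails with the sketch below. `FallbackStore`, `HashBackend`, and `BrokenBackend` are hypothetical names invented for this example, and the backends are simplified stand-ins for a real cache.

```ruby
# Simplified healthy backend: anything responding to read/write would do.
class HashBackend
  def initialize
    @data = {}
  end

  def read(key)
    @data[key]
  end

  def write(key, value)
    @data[key] = value
    true
  end
end

# Backend that always fails, simulating an unreachable cache server.
class BrokenBackend
  def read(_key)
    raise IOError, "cache down"
  end

  def write(_key, _value)
    raise IOError, "cache down"
  end
end

# Uses the primary backend when a write/read probe succeeds,
# otherwise keeps values in a plain in-memory Hash.
class FallbackStore
  def initialize(primary)
    @primary = primary
    @memory = {}
  end

  def read(key)
    primary_available? ? @primary.read(key) : @memory[key]
  end

  def write(key, value)
    if primary_available?
      @primary.write(key, value)
    else
      @memory[key] = value
    end
  end

  private

  # Any exception raised by the probe counts as "not available".
  def primary_available?
    @primary.write("probe", "true") && @primary.read("probe") == "true"
  rescue StandardError
    false
  end
end

store = FallbackStore.new(BrokenBackend.new)
store.write("allgood:last_result:demo", { success: true })
puts store.read("allgood:last_result:demo").inspect
```

One trade-off of probing on every call is two extra round-trips to the cache per read or write. Note also that the in-memory fallback lives inside a single process, so values written there are not shared across server workers and are lost on restart.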
@@ -8,13 +8,167 @@ module Allgood
       @default_timeout = 10 # Default timeout of 10 seconds
     end
 
-    def check(name, &block)
-
+    def check(name, **options, &block)
+      check_info = {
+        name: name,
+        block: block,
+        timeout: options[:timeout] || @default_timeout,
+        options: options,
+        status: :pending
+      }
+
+      # Handle rate limiting
+      if options[:run]
+        begin
+          check_info[:rate] = parse_run_frequency(options[:run])
+        rescue ArgumentError => e
+          check_info[:status] = :skipped
+          check_info[:skip_reason] = "Invalid run frequency: #{e.message}"
+          @checks << check_info
+          return
+        end
+      end
+
+      # Handle environment-specific options
+      if options[:only]
+        environments = Array(options[:only])
+        unless environments.include?(Rails.env.to_sym)
+          check_info[:status] = :skipped
+          check_info[:skip_reason] = "Only runs in #{environments.join(', ')}"
+          @checks << check_info
+          return
+        end
+      end
+
+      if options[:except]
+        environments = Array(options[:except])
+        if environments.include?(Rails.env.to_sym)
+          check_info[:status] = :skipped
+          check_info[:skip_reason] = "This check doesn't run in #{environments.join(', ')}"
+          @checks << check_info
+          return
+        end
+      end
+
+      # Handle conditional checks
+      if options[:if]
+        condition = options[:if]
+        unless condition.is_a?(Proc) ? condition.call : condition
+          check_info[:status] = :skipped
+          check_info[:skip_reason] = "Check condition not met"
+          @checks << check_info
+          return
+        end
+      end
+
+      if options[:unless]
+        condition = options[:unless]
+        if condition.is_a?(Proc) ? condition.call : condition
+          check_info[:status] = :skipped
+          check_info[:skip_reason] = "Check `unless` condition met"
+          @checks << check_info
+          return
+        end
+      end
+
+      check_info[:status] = :active
+      @checks << check_info
     end
 
     def run_check(&block)
       CheckRunner.new.instance_eval(&block)
     end
+
+    def should_run_check?(check)
+      return true unless check[:rate]
+
+      cache_key = "allgood:last_run:#{check[:name].parameterize}"
+      runs_key = "allgood:runs_count:#{check[:name].parameterize}:#{current_period(check[:rate])}"
+      error_key = "allgood:error:#{check[:name].parameterize}"
+      last_result_key = "allgood:last_result:#{check[:name].parameterize}"
+
+      last_run = Allgood::CacheStore.instance.read(cache_key)
+      period_runs = Allgood::CacheStore.instance.read(runs_key).to_i
+      last_result = Allgood::CacheStore.instance.read(last_result_key)
+
+      current_period_key = current_period(check[:rate])
+      stored_period = Allgood::CacheStore.instance.read("allgood:current_period:#{check[:name].parameterize}")
+
+      # If we're in a new period, reset the counter
+      if stored_period != current_period_key
+        period_runs = 0
+        Allgood::CacheStore.instance.write("allgood:current_period:#{check[:name].parameterize}", current_period_key)
+        Allgood::CacheStore.instance.write(runs_key, 0)
+      end
+
+      # If there's an error, wait until next period
+      if previous_error = Allgood::CacheStore.instance.read(error_key)
+        next_period = next_period_start(check[:rate])
+        rate_info = "Rate limited (#{period_runs}/#{check[:rate][:max_runs]} runs this #{check[:rate][:period]})"
+        check[:skip_reason] = "#{rate_info}. Waiting until #{next_period.strftime('%H:%M:%S %Z')} to retry failed check"
+        return false
+      end
+
+      # If we haven't exceeded the max runs for this period
+      if period_runs < check[:rate][:max_runs]
+        Allgood::CacheStore.instance.write(cache_key, Time.current)
+        Allgood::CacheStore.instance.write(runs_key, period_runs + 1)
+        true
+      else
+        next_period = next_period_start(check[:rate])
+        rate_info = "Rate limited (#{period_runs}/#{check[:rate][:max_runs]} runs this #{check[:rate][:period]})"
+        next_run = "Next check at #{next_period.strftime('%H:%M:%S %Z')}"
+        check[:skip_reason] = "#{rate_info}. #{next_run}"
+        false
+      end
+    end
+
+    private
+
+    def parse_run_frequency(frequency)
+      case frequency.to_s.downcase
+      when /(\d+)\s+times?\s+per\s+(day|hour)/i
+        max_runs, period = $1.to_i, $2
+        if max_runs <= 0
+          raise ArgumentError, "Number of runs must be positive"
+        end
+        if max_runs > 1000
+          raise ArgumentError, "Maximum 1000 runs per period allowed"
+        end
+        { max_runs: max_runs, period: period }
+      else
+        raise ArgumentError, "Unsupported frequency format. Use 'N times per day' or 'N times per hour'"
+      end
+    end
+
+    def current_period(rate)
+      case rate[:period]
+      when 'day'
+        Time.current.strftime('%Y-%m-%d')
+      when 'hour'
+        Time.current.strftime('%Y-%m-%d-%H')
+      end
+    end
+
+    def new_period?(last_run, rate)
+      case rate[:period]
+      when 'day'
+        !last_run.to_date.equal?(Time.current.to_date)
+      when 'hour'
+        last_run.strftime('%Y-%m-%d-%H') != Time.current.strftime('%Y-%m-%d-%H')
+      end
+    end
+
+    def next_period_start(rate)
+      case rate[:period]
+      when 'day'
+        Time.current.beginning_of_day + 1.day
+      when 'hour'
+        Time.current.beginning_of_hour + 1.hour
+      else
+        raise ArgumentError, "Unsupported period: #{rate[:period]}"
+      end
+    end
   end
 
   class CheckRunner
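The rate-limiting code in the hunk above parses a string such as "2 times per day", derives a key for the current day or hour, and counts runs against that key. The sketch below reproduces that flow in plain Ruby without Rails. It is an illustration under stated assumptions, not the gem's implementation: `parse_frequency`, `period_key`, and `allow_run?` are hypothetical names, a plain Hash stands in for the cache store, and `Time.now` replaces `Time.current`.

```ruby
# Sketch only: these helper names are hypothetical and a Hash stands in
# for the cache store; the gem itself uses Allgood::CacheStore and Rails time helpers.
FREQUENCY_PATTERN = /(\d+)\s+times?\s+per\s+(day|hour)/i

# Turn "N times per day/hour" into { max_runs:, period: } or raise ArgumentError.
def parse_frequency(text)
  match = FREQUENCY_PATTERN.match(text.to_s)
  raise ArgumentError, "Unsupported frequency format" unless match

  max_runs = match[1].to_i
  raise ArgumentError, "Number of runs must be positive" if max_runs <= 0
  raise ArgumentError, "Maximum 1000 runs per period allowed" if max_runs > 1000

  { max_runs: max_runs, period: match[2].downcase }
end

# Build a string that is identical for all moments inside the same day or hour.
def period_key(rate, now = Time.now)
  case rate[:period]
  when "day" then now.strftime("%Y-%m-%d")
  when "hour" then now.strftime("%Y-%m-%d-%H")
  else raise ArgumentError, "Unsupported period: #{rate[:period]}"
  end
end

# Count one run against the current period; refuse once max_runs is reached.
def allow_run?(store, name, rate, now = Time.now)
  key = "runs_count:#{name}:#{period_key(rate, now)}"
  runs = store.fetch(key, 0)
  return false if runs >= rate[:max_runs]

  store[key] = runs + 1
  true
end

store = {}
rate = parse_frequency("2 times per hour")
start = Time.utc(2024, 10, 27, 15, 0, 0)

puts allow_run?(store, "demo", rate, start)        # first run in the 15:00 hour
puts allow_run?(store, "demo", rate, start + 60)   # second run, still allowed
puts allow_run?(store, "demo", rate, start + 120)  # third run, over the limit
puts allow_run?(store, "demo", rate, start + 3600) # new hour, counter starts over
```

Because the period string is part of the counter key, a new day or hour starts from an empty counter without an explicit reset step.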
data/lib/allgood/engine.rb
CHANGED
@@ -2,7 +2,7 @@ module Allgood
   class Engine < ::Rails::Engine
     isolate_namespace Allgood
 
-
+    config.after_initialize do
       config_file = Rails.root.join("config", "allgood.rb")
       if File.exist?(config_file)
         Allgood.configure do |config|
data/lib/allgood/version.rb
CHANGED
data/lib/allgood.rb
CHANGED
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: allgood
 version: !ruby/object:Gem::Version
-  version: 0.
+  version: 0.3.0
 platform: ruby
 authors:
 - rameerez
 autorequire:
 bindir: exe
 cert_chain: []
-date: 2024-
+date: 2024-11-13 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: rails
@@ -24,12 +24,13 @@ dependencies:
     - - ">="
       - !ruby/object:Gem::Version
         version: 6.0.0
-description: 'Define custom
-
-
-  services.'
+description: 'Define custom health checks for your app (as in: are there any new users
+  in the past 24 hours) and see the results in a simple /healthcheck page that you
+  can use to monitor your production app with UptimeRobot, Pingdom, or other monitoring
+  services. It''s also useful as a drop-in replacement for the default `/up` health
+  check endpoint for Kamal deployments.'
 email:
--
+- rubygems@rameerez.com
 executables: []
 extensions: []
 extra_rdoc_files: []
@@ -39,12 +40,15 @@ files:
 - README.md
 - Rakefile
 - allgood.jpeg
+- allgood_skipped.webp
 - app/controllers/allgood/base_controller.rb
 - app/controllers/allgood/healthcheck_controller.rb
 - app/views/allgood/healthcheck/index.html.erb
 - app/views/layouts/allgood/application.html.erb
 - config/routes.rb
+- examples/allgood.rb
 - lib/allgood.rb
+- lib/allgood/cache_store.rb
 - lib/allgood/configuration.rb
 - lib/allgood/engine.rb
 - lib/allgood/version.rb
@@ -56,7 +60,7 @@ metadata:
   allowed_push_host: https://rubygems.org
   homepage_uri: https://github.com/rameerez/allgood
   source_code_uri: https://github.com/rameerez/allgood
-  changelog_uri: https://github.com/rameerez/allgood
+  changelog_uri: https://github.com/rameerez/allgood/blob/main/CHANGELOG.md
 post_install_message:
 rdoc_options: []
 require_paths:
@@ -72,7 +76,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
     - !ruby/object:Gem::Version
       version: '0'
 requirements: []
-rubygems_version: 3.5.
+rubygems_version: 3.5.22
 signing_key:
 specification_version: 4
 summary: Add quick, simple, and beautiful health checks to your Rails application.