rails_health_checks 1.1.0 → 1.2.0
- checksums.yaml +4 -4
- data/README.md +168 -10
- data/app/controllers/rails_health_checks/live_controller.rb +3 -6
- data/app/controllers/rails_health_checks/ready_controller.rb +14 -0
- data/config/routes.rb +7 -4
- data/lib/generators/rails_health_checks/templates/initializer.rb +7 -0
- data/lib/rails_health_checks/configuration.rb +3 -1
- data/lib/rails_health_checks/version.rb +1 -1
- metadata +2 -1
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 9033111f01524e17546525feaba1a215ac6e3cad1c7c8b0b12dd527850c74e7f
+  data.tar.gz: 3eded20b26844b046d33689f6a5512efa1a7c7964f04679552d00020d9a7aa78
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 7a8373f504fde389fd3e0202f3a08a1ea9e5a6af15dca997f8270e8dfcbca2cf5155a5f57a490392c6df1d3bc65f15a452b1c64ba3f96650b214007be5506623
+  data.tar.gz: 3f58fb48bd0a1a416e078f3e3bae0bb70e9e220c1cb93eaae25b2ec1c7710d56c9965e965fffa33f6d3922c132a120f8e0b554c18358d10eaa0787cebad0b778
data/README.md
CHANGED
@@ -11,6 +11,7 @@ A Rails engine that adds production-grade health check endpoints to any Rails ap
 **Built-in checks:** database · cache · Redis · SMTP · Sidekiq · SolidQueue · GoodJob · Resque · disk · memory · HTTP
 
 **Key features:**
+- **Two-tier endpoints:** `/live` (liveness — process only) and `/ready` (readiness — all deps) prevent cascade failures in Kubernetes and behind load balancers
 - Parallel check execution via `Concurrent::Future` — response time bounded by the slowest check, not the sum
 - Result caching (`config.cache_duration`) to absorb high-frequency probe traffic
 - Prometheus text exposition at `GET /health/metrics` (always HTTP 200)
@@ -21,9 +22,14 @@ A Rails engine that adds production-grade health check endpoints to any Rails ap
 
 ## Table of Contents
 
+- [Upgrading](#upgrading)
 - [Installation](#installation)
 - [Rack Applications](#rack-applications)
 - [Endpoints](#endpoints)
+- [Liveness vs. Readiness](#liveness-vs-readiness--why-two-tiers)
+- [Kubernetes wiring](#kubernetes-wiring)
+- [Load balancer wiring](#load-balancer-wiring)
+- [Configuring endpoint paths](#configuring-endpoint-paths)
 - [Configuration](#configuration)
 - [Configuration Reference](#configuration-reference)
 - [Authentication](#authentication)
@@ -43,6 +49,30 @@ A Rails engine that adds production-grade health check endpoints to any Rails ap
 
 ---
 
+## Upgrading
+
+### v1.1.x → v1.2.x — breaking change to `/live`
+
+> **`GET /health/live` no longer runs dependency checks.**
+
+Prior to v1.2.0, `/live` ran all configured checks (database, Redis, etc.) and returned `503` if any failed. This was readiness behaviour under a liveness name and is the root cause of the cascade failure footgun described below.
+
+**What changed:** `/live` now returns `200 OK` whenever the Ruby process is alive, regardless of dependency state. Authentication is also skipped on this endpoint so Kubernetes and load balancer probes work without credentials.
+
+**What to do:** If you were relying on `/live` to verify dependencies, switch to the new `/health/ready` endpoint. No configuration changes required.
+
+```
+# Before (was running dependency checks — now only liveness)
+GET /health/live → 200 if process alive (deps ignored)
+
+# New endpoint for dependency checks
+GET /health/ready → 200 if all deps pass, 503 if any fail
+```
+
+[↑ Back to top](#table-of-contents)
+
+---
+
 ## Installation
 
 Add to your Gemfile:
@@ -125,8 +155,9 @@ The routes are identical to the Rails engine, relative to the mount point:
 
 | Endpoint | Format | Use case |
 |----------|--------|----------|
-| `GET/HEAD /` | JSON |
-| `GET/HEAD /live` | Plain text | Liveness probe |
+| `GET/HEAD /` | JSON | Full dependency health (monitoring dashboards) |
+| `GET/HEAD /live` | Plain text | Liveness probe — process only, no deps |
+| `GET/HEAD /ready` | Plain text | Readiness probe — all configured dependency checks |
 | `GET /metrics` | Prometheus text | Prometheus scraping |
 | `GET /:group` | JSON | Scoped check group |
 
@@ -174,16 +205,140 @@ Token and IP allowlist strategies are unchanged.
 
 ## Endpoints
 
-| Endpoint | Format | Use case |
-
-| `GET /health` |
-| `GET /health/
-| `GET /health
-| `GET /health
+| Endpoint | Runs checks? | Format | Use case |
+|----------|-------------|--------|----------|
+| `GET /health/live` | No — process only | Plain text | Kubernetes `livenessProbe`, load balancer health check |
+| `GET /health/ready` | Yes — all configured deps | Plain text | Kubernetes `readinessProbe`, external uptime monitors |
+| `GET /health` | Yes — all configured deps | JSON | Monitoring dashboards, alerting pipelines |
+| `GET /health/metrics` | Yes — all configured deps | Prometheus text | Prometheus / OpenMetrics scraping |
+| `GET /health/:group` | Yes — named subset | JSON | Scoped group (e.g. `/health/workers`) |
+
+`/health/live`, `/health/ready`, and `/health` also respond to `HEAD` requests.
 
-
+HTTP status: `200 OK` when all checks pass, `503 Service Unavailable` when any check fails (except `/metrics` which always returns `200`, and `/live` which always returns `200`).
+
+---
+
+### Liveness vs. Readiness — why two tiers?
+
+**Using a single health endpoint for both load balancer checks and dependency monitoring is a cascade failure footgun.** Here is the exact failure chain:
+
+1. Your database has a 30-second blip
+2. All running pods probe `/health/ready` → all return `503`
+3. The load balancer removes every pod from rotation simultaneously
+4. Traffic has nowhere to go — the app is fully down
+5. If the same endpoint drives `livenessProbe`, Kubernetes begins restarting every pod
+6. Restarting pods reconnect to the still-blipping database, fail again, restart again
+7. What was a 30-second DB hiccup is now a multi-minute outage driven by a thundering herd of pod restarts
+
+The fix is to separate the two concerns:
+
+| Endpoint | Question it answers | Correct probe |
+|----------|--------------------|--------------:|
+| `/health/live` | Is the process running and responsive? | `livenessProbe`, LB health check |
+| `/health/ready` | Are all dependencies reachable? | `readinessProbe`, uptime monitor |
+
+**Liveness (`/health/live`)** — returns `200 OK` as long as the Ruby process responds. No dependency checks run. Authentication is skipped so Kubernetes and load balancers work without credentials. When this fails, k8s restarts the pod because the process itself is stuck or crashed.
+
+**Readiness (`/health/ready`)** — runs all configured dependency checks. Returns `503` if any check fails. When this fails, k8s stops routing traffic to the pod but leaves it running. The pod rejoins rotation automatically once dependencies recover — no restart, no thundering herd.
+
+**Deep JSON (`/health`)** — same dependency checks as `/ready`, returned as structured JSON with per-check status and latency. Use for monitoring dashboards, alerting, or anywhere you need machine-readable detail. Do not use for liveness or readiness probes.
+
+---
+
+### Kubernetes wiring
+
+```yaml
+containers:
+  - name: web
+    ports:
+      - containerPort: 3000
+    livenessProbe:
+      httpGet:
+        path: /health/live  # process-only — DB blip does NOT restart this pod
+        port: 3000
+      initialDelaySeconds: 10
+      periodSeconds: 10
+      failureThreshold: 3  # restarts only if the process stops responding entirely
+    readinessProbe:
+      httpGet:
+        path: /health/ready  # dep checks — stops traffic but does NOT restart the pod
+        port: 3000
+      initialDelaySeconds: 5
+      periodSeconds: 10
+      failureThreshold: 2  # removes from rotation after 2 consecutive dep failures
+    startupProbe:  # optional: give the app time to boot before probing
+      httpGet:
+        path: /health/live
+        port: 3000
+      failureThreshold: 30
+      periodSeconds: 5
+```
+
+> **Warning:** Do not point `livenessProbe` at `/health/ready`. A single dependency failure will cause Kubernetes to restart every pod simultaneously, turning a recoverable dep outage into a full application restart loop.
+
+---
 
-
+### Load balancer wiring
+
+Always use the liveness endpoint for load balancer health checks. If you use the readiness endpoint and a dependency blips, the load balancer ejects all nodes at once and traffic has nowhere to go.
+
+**AWS ALB / NLB (target group health check)**
+
+```
+Health check path: /health/live
+Healthy threshold: 2
+Unhealthy threshold: 3
+Timeout: 5s
+Interval: 10s
+```
+
+**Nginx upstream**
+
+```nginx
+upstream rails_app {
+  server app1:3000;
+  server app2:3000;
+}
+
+server {
+  location /health/live {
+    proxy_pass http://rails_app;
+  }
+}
+```
+
+**HAProxy**
+
+```
+backend rails_app
+  option httpchk GET /health/live
+  server app1 app1:3000 check
+  server app2 app2:3000 check
+```
+
+> **Note:** Reserve `/health/ready` for Kubernetes `readinessProbe` and external uptime monitors (Pingdom, UptimeRobot, Better Uptime). These are the right tools to alert you when dependencies are down — the load balancer is not.
+
+---
+
+### Configuring endpoint paths
+
+The readiness path defaults to `ready` (i.e. `/health/ready` when the engine is mounted at `/health`). Override it in your initializer:
+
+```ruby
+RailsHealthChecks.configure do |config|
+  config.readiness_path = "readyz" # → /health/readyz
+end
+```
+
+The engine mount point is configurable in `config/routes.rb`:
+
+```ruby
+mount RailsHealthChecks::Engine => "/healthz"
+# exposes: /healthz/live, /healthz/ready, /healthz, /healthz/metrics
+```
+
+---
 
 ### JSON response shape
 
@@ -329,6 +484,7 @@ Configuration is validated at boot time. An unknown check name, a missing `http_
 | `checks` | `Array` | `[:database]` | Built-in or custom check names to run |
 | `timeout` | `Integer` | `5` | Global per-check timeout in seconds |
 | `cache_duration` | `Integer\|nil` | `nil` | Cache results for N seconds; `nil` disables caching |
+| `readiness_path` | `String` | `"ready"` | Path of the readiness endpoint within the engine (e.g. `"ready"` → `/health/ready`) |
 | `token` | `String\|nil` | `nil` | Bearer token for authentication |
 | `allowed_ips` | `Array\|nil` | `nil` | IP allowlist; accepts exact IPs and CIDR ranges |
 | `redis_url` | `String\|nil` | `nil` | Redis URL for `:redis` check; falls back to `REDIS_URL` env var then `redis://localhost:6379/0` |
@@ -354,6 +510,8 @@ Configuration is validated at boot time. An unknown check name, a missing `http_
 
 By default health endpoints are public. Use one of the following strategies to restrict access. Unauthenticated requests receive `401 Unauthorized`.
 
+> **Note:** `GET /health/live` always bypasses authentication regardless of the configured strategy. Liveness probes are called by Kubernetes and load balancers which cannot pass credentials, so enforcing auth on this endpoint would break infrastructure probing.
+
 ### Bearer token
 
 ```ruby
data/app/controllers/rails_health_checks/live_controller.rb
CHANGED
@@ -2,13 +2,10 @@
 
 module RailsHealthChecks
   class LiveController < ApplicationController
+    skip_before_action :authenticate!
+
     def show
-
-      if builder.overall_status == "ok"
-        render plain: "OK", status: :ok
-      else
-        render plain: "Service Unavailable", status: :service_unavailable
-      end
+      render plain: "OK", status: :ok
     end
   end
 end
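To see why the unconditional `200` matters, here is a minimal plain-Ruby sketch (no Rails) of how a `livenessProbe` would react to the 1.1.0 and 1.2.0 `/live` behaviours during a database blip. `Pod`, `live_status_v1_1`, `live_status_v1_2`, and `restarts_triggered` are hypothetical names used only for illustration; none of them are part of the gem.

```ruby
# frozen_string_literal: true

# Hypothetical model of one pod: the process is up, the database may not be.
Pod = Struct.new(:name, :process_alive, :database_reachable)

# 1.1.0 behaviour: /live ran the dependency checks, so a DB failure gave 503.
def live_status_v1_1(pod)
  pod.process_alive && pod.database_reachable ? 200 : 503
end

# 1.2.0 behaviour: /live only reports whether the process can answer.
def live_status_v1_2(pod)
  pod.process_alive ? 200 : 503
end

# Count the pods a livenessProbe would restart (any non-200 answer).
# The block passed to this method maps a pod to its /live status code.
def restarts_triggered(pods)
  pods.count { |pod| yield(pod) != 200 }
end

# Three pods whose processes are fine while the shared database is down.
pods = Array.new(3) { |i| Pod.new("web-#{i}", true, false) }

old_restarts = restarts_triggered(pods) { |pod| live_status_v1_1(pod) }
new_restarts = restarts_triggered(pods) { |pod| live_status_v1_2(pod) }
puts "1.1.0 restarts: #{old_restarts}, 1.2.0 restarts: #{new_restarts}"
```

Under these assumptions the old behaviour marks every pod for restart, while the new one leaves them running until the database recovers.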
data/app/controllers/rails_health_checks/ready_controller.rb
ADDED
@@ -0,0 +1,14 @@
+# frozen_string_literal: true
+
+module RailsHealthChecks
+  class ReadyController < ApplicationController
+    def show
+      builder = ResponseBuilder.new(run_checks(RailsHealthChecks.configuration.checks))
+      if builder.overall_status == "ok"
+        render plain: "OK", status: :ok
+      else
+        render plain: "Service Unavailable", status: :service_unavailable
+      end
+    end
+  end
+end
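The readiness decision in this controller reduces to "every configured check passed". A minimal plain-Ruby sketch of that aggregation follows, assuming each check is a callable that returns a truthy or falsy value. `readiness_response` and the example checks are hypothetical stand-ins; the gem's own `run_checks` and `ResponseBuilder` are not shown in this diff and may behave differently (for example around timeouts or caching).

```ruby
# frozen_string_literal: true

# Returns [status_code, body] in the shape the controller renders.
# `checks` is a Hash of name => callable; a raised error counts as a failure.
def readiness_response(checks)
  all_ok = checks.all? do |_name, check|
    begin
      check.call ? true : false
    rescue StandardError
      false
    end
  end
  all_ok ? [200, "OK"] : [503, "Service Unavailable"]
end

healthy  = { database: -> { true }, redis: -> { true } }
degraded = { database: -> { raise IOError, "connection refused" }, redis: -> { true } }

p readiness_response(healthy)
p readiness_response(degraded)
```

A single failing or raising check is enough to flip the response to `503`, which is what lets a `readinessProbe` take the pod out of rotation without restarting it.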
data/config/routes.rb
CHANGED
@@ -1,8 +1,11 @@
 # frozen_string_literal: true
 
 RailsHealthChecks::Engine.routes.draw do
-
-
-
-
+  readiness_path = RailsHealthChecks.configuration.readiness_path
+
+  match "/", to: "health#show", as: :health, via: [:get, :head]
+  match "/live", to: "live#show", as: :health_live, via: [:get, :head]
+  match "/#{readiness_path}", to: "ready#show", as: :health_ready, via: [:get, :head]
+  get "/metrics", to: "metrics#show", as: :health_metrics
+  get "/:id", to: "groups#show", as: :health_group
 end
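Rails matches routes in declaration order, so the fixed paths declared above take precedence over the `/:id` group route. The sketch below is a rough plain-Ruby model of that first-match behaviour. `build_routes` and `resolve` are hypothetical helpers, not engine code, and the matcher ignores HTTP verbs and Rails details such as format suffixes.

```ruby
# frozen_string_literal: true

# Build an ordered route table mirroring the engine's declarations.
def build_routes(readiness_path = "ready")
  [
    ["/", "health#show"],
    ["/live", "live#show"],
    ["/#{readiness_path}", "ready#show"],
    ["/metrics", "metrics#show"],
    ["/:id", "groups#show"] # catch-all for named check groups
  ]
end

# Return the target of the first route that matches the path, or nil.
def resolve(routes, path)
  routes.each do |pattern, target|
    return target if pattern == path
    return target if pattern == "/:id" && path.match?(%r{\A/[^/]+\z})
  end
  nil
end

routes = build_routes("readyz")
puts resolve(routes, "/readyz")  # readiness endpoint at the custom path
puts resolve(routes, "/workers") # falls through to the group route
puts resolve(routes, "/ready")   # no longer special: treated as a group name
```

One consequence worth checking in a real app: a check group whose name equals `live`, `metrics`, or the configured readiness path would be shadowed by the earlier routes.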
data/lib/generators/rails_health_checks/templates/initializer.rb
CHANGED
@@ -12,6 +12,13 @@ RailsHealthChecks.configure do |config|
   # Cache check results for N seconds to avoid re-running on every request (default: nil, disabled)
   # config.cache_duration = 10
 
+  # ---------------------------------------------------------------------------
+  # Endpoint paths
+  # ---------------------------------------------------------------------------
+  # Path for the readiness endpoint within the engine (default: "ready").
+  # When the engine is mounted at "/health", the readiness endpoint is "/health/ready".
+  # config.readiness_path = "ready"
+
   # ---------------------------------------------------------------------------
   # Authentication — all strategies are mutually exclusive; default is public
   # ---------------------------------------------------------------------------
data/lib/rails_health_checks/configuration.rb
CHANGED
@@ -12,7 +12,8 @@ module RailsHealthChecks
                   :smtp_address, :smtp_port,
                   :sidekiq_queue_size, :solid_queue_job_count, :good_job_latency,
                   :resque_queue_size, :disk_warn_threshold, :disk_critical_threshold, :disk_path,
-                  :memory_threshold, :http_url, :http_expected_status, :http_headers
+                  :memory_threshold, :http_url, :http_expected_status, :http_headers,
+                  :readiness_path
     attr_reader :authenticate_block, :custom_checks, :groups
 
     def initialize
@@ -39,6 +40,7 @@ module RailsHealthChecks
       @custom_checks = {}
       @groups = {}
       @disabled_checks = {}
+      @readiness_path = "ready"
     end
 
     def checks
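The new setting follows the usual configure-block pattern: a default assigned in `initialize`, overridable through an accessor. Below is a simplified stand-in under the hypothetical module name `HealthSketch`. The gem's real `Configuration` has many more attributes plus boot-time validation, none of which is modelled here, and `readiness_url` is an illustrative helper that does not exist in the gem.

```ruby
# frozen_string_literal: true

module HealthSketch
  # Simplified stand-in for the gem's Configuration class.
  class Configuration
    attr_accessor :readiness_path

    def initialize
      @readiness_path = "ready" # same default as in the diff above
    end
  end

  class << self
    # Memoized configuration instance shared by the whole process.
    def configuration
      @configuration ||= Configuration.new
    end

    # Yield the configuration so callers can override defaults.
    def configure
      yield configuration
    end

    # Join a mount point and the readiness path into the public URL path.
    def readiness_url(mount_point = "/health")
      "#{mount_point.chomp('/')}/#{configuration.readiness_path}"
    end
  end
end

puts HealthSketch.readiness_url
HealthSketch.configure { |config| config.readiness_path = "readyz" }
puts HealthSketch.readiness_url("/healthz")
```

Since `config/routes.rb` reads `readiness_path` at the moment the routes are drawn, an override most likely needs to be in place before then, which is why the README shows it in an initializer.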
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: rails_health_checks
 version: !ruby/object:Gem::Version
-  version: 1.
+  version: 1.2.0
 platform: ruby
 authors:
 - Chuck Smith
@@ -57,6 +57,7 @@ files:
 - app/controllers/rails_health_checks/health_controller.rb
 - app/controllers/rails_health_checks/live_controller.rb
 - app/controllers/rails_health_checks/metrics_controller.rb
+- app/controllers/rails_health_checks/ready_controller.rb
 - app/jobs/rails_health_checks/application_job.rb
 - app/mailers/rails_health_checks/application_mailer.rb
 - app/models/rails_health_checks/application_record.rb