rails_health_checks 1.1.0 → 1.2.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: 934bfd1962ca10009ba6925f81cbf57f09d8d63ab017a6452296bfb32994352b
-   data.tar.gz: 7ba0cfd783e433b26de67d2f9a2dc00a13e341584c88b1559c5f08e1f237187c
+   metadata.gz: 9033111f01524e17546525feaba1a215ac6e3cad1c7c8b0b12dd527850c74e7f
+   data.tar.gz: 3eded20b26844b046d33689f6a5512efa1a7c7964f04679552d00020d9a7aa78
  SHA512:
-   metadata.gz: 2e52cb1373a3a8c26bd7a01ceefa6a32dedfb5c06421d754d94150a56349349b3ef467e529e696e9a74efd2449b7f8078bfff5b1a3cf9f2915786e8b16468d87
-   data.tar.gz: 9c1dbc7ea623885abc7cf39acbdc63741fffb25698c93d3beddc501f598783956ede451c60456b8ba1e54bc54c1b725fb70a2a2edc727e6836081d8ff93587fc
+   metadata.gz: 7a8373f504fde389fd3e0202f3a08a1ea9e5a6af15dca997f8270e8dfcbca2cf5155a5f57a490392c6df1d3bc65f15a452b1c64ba3f96650b214007be5506623
+   data.tar.gz: 3f58fb48bd0a1a416e078f3e3bae0bb70e9e220c1cb93eaae25b2ec1c7710d56c9965e965fffa33f6d3922c132a120f8e0b554c18358d10eaa0787cebad0b778
data/README.md CHANGED
@@ -11,6 +11,7 @@ A Rails engine that adds production-grade health check endpoints to any Rails ap
  **Built-in checks:** database · cache · Redis · SMTP · Sidekiq · SolidQueue · GoodJob · Resque · disk · memory · HTTP

  **Key features:**
+ - **Two-tier endpoints:** `/live` (liveness — process only) and `/ready` (readiness — all deps) prevent cascade failures in Kubernetes and behind load balancers
  - Parallel check execution via `Concurrent::Future` — response time bounded by the slowest check, not the sum
  - Result caching (`config.cache_duration`) to absorb high-frequency probe traffic
  - Prometheus text exposition at `GET /health/metrics` (always HTTP 200)
@@ -21,9 +22,14 @@ A Rails engine that adds production-grade health check endpoints to any Rails ap

  ## Table of Contents

+ - [Upgrading](#upgrading)
  - [Installation](#installation)
  - [Rack Applications](#rack-applications)
  - [Endpoints](#endpoints)
+ - [Liveness vs. Readiness](#liveness-vs-readiness--why-two-tiers)
+ - [Kubernetes wiring](#kubernetes-wiring)
+ - [Load balancer wiring](#load-balancer-wiring)
+ - [Configuring endpoint paths](#configuring-endpoint-paths)
  - [Configuration](#configuration)
  - [Configuration Reference](#configuration-reference)
  - [Authentication](#authentication)
@@ -43,6 +49,30 @@ A Rails engine that adds production-grade health check endpoints to any Rails ap

  ---

+ ## Upgrading
+
+ ### v1.1.x → v1.2.x — breaking change to `/live`
+
+ > **`GET /health/live` no longer runs dependency checks.**
+
+ Prior to v1.2.0, `/live` ran all configured checks (database, Redis, etc.) and returned `503` if any failed. This was readiness behaviour under a liveness name and is the root cause of the cascade failure footgun described below.
+
+ **What changed:** `/live` now returns `200 OK` whenever the Ruby process is alive, regardless of dependency state. Authentication is also skipped on this endpoint so Kubernetes and load balancer probes work without credentials.
+
+ **What to do:** If you were relying on `/live` to verify dependencies, switch to the new `/health/ready` endpoint. No configuration changes required.
+
+ ```
+ # Before (was running dependency checks — now only liveness)
+ GET /health/live → 200 if process alive (deps ignored)
+
+ # New endpoint for dependency checks
+ GET /health/ready → 200 if all deps pass, 503 if any fail
+ ```
+
+ [↑ Back to top](#table-of-contents)
+
+ ---
+
  ## Installation

  Add to your Gemfile:
@@ -125,8 +155,9 @@ The routes are identical to the Rails engine, relative to the mount point:

  | Endpoint | Format | Use case |
  |----------|--------|----------|
- | `GET/HEAD /` | JSON | Health status |
- | `GET/HEAD /live` | Plain text | Liveness probe |
+ | `GET/HEAD /` | JSON | Full dependency health (monitoring dashboards) |
+ | `GET/HEAD /live` | Plain text | Liveness probe — process only, no deps |
+ | `GET/HEAD /ready` | Plain text | Readiness probe — all configured dependency checks |
  | `GET /metrics` | Prometheus text | Prometheus scraping |
  | `GET /:group` | JSON | Scoped check group |

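The reason the `/live` and `/ready` rows are kept apart can be illustrated with a small simulation. This is a deliberately simplified sketch, not code from the gem: `Pod`, `tick`, and `simulate` are hypothetical names, every failed liveness probe counts as one restart (real Kubernetes applies `failureThreshold` first), and all pods share a single database.

```ruby
# Simplified model: all pods share one database, and the process itself is
# always healthy. Names here are illustrative only.
Pod = Struct.new(:name, :restarts, :in_rotation)

def tick(pods, db_up:, liveness_checks_deps:)
  pods.each do |pod|
    # Readiness: a dependency failure only takes the pod out of rotation.
    pod.in_rotation = db_up
    # Liveness: a failing probe restarts the pod. In this model it can only
    # fail when liveness is (mis)wired to dependency checks.
    pod.restarts += 1 if liveness_checks_deps && !db_up
  end
end

def simulate(liveness_checks_deps:, outage_ticks: 3)
  pods = Array.new(3) { |i| Pod.new("web-#{i}", 0, true) }
  outage_ticks.times do
    tick(pods, db_up: false, liveness_checks_deps: liveness_checks_deps)
  end
  # The database recovers on the final tick.
  tick(pods, db_up: true, liveness_checks_deps: liveness_checks_deps)
  pods
end

coupled  = simulate(liveness_checks_deps: true)
separate = simulate(liveness_checks_deps: false)
puts "restarts with liveness tied to deps: #{coupled.sum(&:restarts)}"
puts "restarts with process-only liveness: #{separate.sum(&:restarts)}"
```

In both runs the pods return to rotation once the database recovers; only the coupled wiring accumulates restarts along the way.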
@@ -174,16 +205,140 @@ Token and IP allowlist strategies are unchanged.

  ## Endpoints

- | Endpoint | Format | Use case |
- |----------|--------|----------|
- | `GET /health` | JSON | Monitoring dashboards, detailed diagnostics |
- | `GET /health/live` | Plain text | Load balancer liveness probes |
- | `GET /health/metrics` | Prometheus text | Prometheus / OpenMetrics scraping |
- | `GET /health/:group` | JSON | Scoped check group (e.g. `/health/workers`) |
+ | Endpoint | Runs checks? | Format | Use case |
+ |----------|-------------|--------|----------|
+ | `GET /health/live` | No — process only | Plain text | Kubernetes `livenessProbe`, load balancer health check |
+ | `GET /health/ready` | Yes — all configured deps | Plain text | Kubernetes `readinessProbe`, external uptime monitors |
+ | `GET /health` | Yes — all configured deps | JSON | Monitoring dashboards, alerting pipelines |
+ | `GET /health/metrics` | Yes — all configured deps | Prometheus text | Prometheus / OpenMetrics scraping |
+ | `GET /health/:group` | Yes — named subset | JSON | Scoped group (e.g. `/health/workers`) |
+
+ `/health/live`, `/health/ready`, and `/health` also respond to `HEAD` requests.

- `/health` and `/health/live` also respond to `HEAD` requests (useful for lightweight load balancer probes).
+ HTTP status: `200 OK` when all checks pass, `503 Service Unavailable` when any check fails (except `/metrics` which always returns `200`, and `/live` which always returns `200`).
+
+ ---
+
+ ### Liveness vs. Readiness — why two tiers?
+
+ **Using a single health endpoint for both load balancer checks and dependency monitoring is a cascade failure footgun.** Here is the exact failure chain:
+
+ 1. Your database has a 30-second blip
+ 2. All running pods probe `/health/ready` → all return `503`
+ 3. The load balancer removes every pod from rotation simultaneously
+ 4. Traffic has nowhere to go — the app is fully down
+ 5. If the same endpoint drives `livenessProbe`, Kubernetes begins restarting every pod
+ 6. Restarting pods reconnect to the still-blipping database, fail again, restart again
+ 7. What was a 30-second DB hiccup is now a multi-minute outage driven by a thundering herd of pod restarts
+
+ The fix is to separate the two concerns:
+
+ | Endpoint | Question it answers | Correct probe |
+ |----------|--------------------|--------------:|
+ | `/health/live` | Is the process running and responsive? | `livenessProbe`, LB health check |
+ | `/health/ready` | Are all dependencies reachable? | `readinessProbe`, uptime monitor |
+
+ **Liveness (`/health/live`)** — returns `200 OK` as long as the Ruby process responds. No dependency checks run. Authentication is skipped so Kubernetes and load balancers work without credentials. When this fails, k8s restarts the pod because the process itself is stuck or crashed.
+
+ **Readiness (`/health/ready`)** — runs all configured dependency checks. Returns `503` if any check fails. When this fails, k8s stops routing traffic to the pod but leaves it running. The pod rejoins rotation automatically once dependencies recover — no restart, no thundering herd.
+
+ **Deep JSON (`/health`)** — same dependency checks as `/ready`, returned as structured JSON with per-check status and latency. Use for monitoring dashboards, alerting, or anywhere you need machine-readable detail. Do not use for liveness or readiness probes.
+
+ ---
+
+ ### Kubernetes wiring
+
+ ```yaml
+ containers:
+   - name: web
+     ports:
+       - containerPort: 3000
+     livenessProbe:
+       httpGet:
+         path: /health/live # process-only — DB blip does NOT restart this pod
+         port: 3000
+       initialDelaySeconds: 10
+       periodSeconds: 10
+       failureThreshold: 3 # restarts only if the process stops responding entirely
+     readinessProbe:
+       httpGet:
+         path: /health/ready # dep checks — stops traffic but does NOT restart the pod
+         port: 3000
+       initialDelaySeconds: 5
+       periodSeconds: 10
+       failureThreshold: 2 # removes from rotation after 2 consecutive dep failures
+     startupProbe: # optional: give the app time to boot before probing
+       httpGet:
+         path: /health/live
+         port: 3000
+       failureThreshold: 30
+       periodSeconds: 5
+ ```
+
+ > **Warning:** Do not point `livenessProbe` at `/health/ready`. A single dependency failure will cause Kubernetes to restart every pod simultaneously, turning a recoverable dep outage into a full application restart loop.
+
+ ---

- HTTP status is `200 OK` when all checks pass, `503 Service Unavailable` otherwise (except `/metrics` which always returns `200`).
+ ### Load balancer wiring
+
+ Always use the liveness endpoint for load balancer health checks. If you use the readiness endpoint and a dependency blips, the load balancer ejects all nodes at once and traffic has nowhere to go.
+
+ **AWS ALB / NLB (target group health check)**
+
+ ```
+ Health check path: /health/live
+ Healthy threshold: 2
+ Unhealthy threshold: 3
+ Timeout: 5s
+ Interval: 10s
+ ```
+
+ **Nginx upstream**
+
+ ```nginx
+ upstream rails_app {
+   server app1:3000;
+   server app2:3000;
+ }
+
+ server {
+   location /health/live {
+     proxy_pass http://rails_app;
+   }
+ }
+ ```
+
+ **HAProxy**
+
+ ```
+ backend rails_app
+   option httpchk GET /health/live
+   server app1 app1:3000 check
+   server app2 app2:3000 check
+ ```
+
+ > **Note:** Reserve `/health/ready` for Kubernetes `readinessProbe` and external uptime monitors (Pingdom, UptimeRobot, Better Uptime). These are the right tools to alert you when dependencies are down — the load balancer is not.
+
+ ---
+
+ ### Configuring endpoint paths
+
+ The readiness path defaults to `ready` (i.e. `/health/ready` when the engine is mounted at `/health`). Override it in your initializer:
+
+ ```ruby
+ RailsHealthChecks.configure do |config|
+   config.readiness_path = "readyz" # → /health/readyz
+ end
+ ```
+
+ The engine mount point is configurable in `config/routes.rb`:
+
+ ```ruby
+ mount RailsHealthChecks::Engine => "/healthz"
+ # exposes: /healthz/live, /healthz/ready, /healthz, /healthz/metrics
+ ```
+
+ ---

  ### JSON response shape

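The way the mount point and `readiness_path` combine into full URLs (the "Configuring endpoint paths" section in the hunk above) can be sketched with a small helper. `endpoint_paths` is a hypothetical function written for illustration, not part of the gem's API, and it assumes a non-root mount point.

```ruby
# Hypothetical helper (not provided by rails_health_checks): joins an engine
# mount point with the relative endpoint paths described in the README.
def endpoint_paths(mount_point, readiness_path: "ready")
  # Normalise to exactly one leading slash and no trailing slash.
  base = "/" + mount_point.to_s.gsub(%r{\A/+|/+\z}, "")
  ready = readiness_path.to_s.sub(%r{\A/+}, "")
  {
    health: base,
    live: "#{base}/live",
    ready: "#{base}/#{ready}",
    metrics: "#{base}/metrics"
  }
end

endpoint_paths("/health").each { |name, path| puts "#{name}: #{path}" }
endpoint_paths("/healthz", readiness_path: "readyz").each { |name, path| puts "#{name}: #{path}" }
```

Only the readiness segment is parameterised here, mirroring the diff, which adds `readiness_path` as the single new path option.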
@@ -329,6 +484,7 @@ Configuration is validated at boot time. An unknown check name, a missing `http_
  | `checks` | `Array` | `[:database]` | Built-in or custom check names to run |
  | `timeout` | `Integer` | `5` | Global per-check timeout in seconds |
  | `cache_duration` | `Integer\|nil` | `nil` | Cache results for N seconds; `nil` disables caching |
+ | `readiness_path` | `String` | `"ready"` | Path of the readiness endpoint within the engine (e.g. `"ready"` → `/health/ready`) |
  | `token` | `String\|nil` | `nil` | Bearer token for authentication |
  | `allowed_ips` | `Array\|nil` | `nil` | IP allowlist; accepts exact IPs and CIDR ranges |
  | `redis_url` | `String\|nil` | `nil` | Redis URL for `:redis` check; falls back to `REDIS_URL` env var then `redis://localhost:6379/0` |
@@ -354,6 +510,8 @@ Configuration is validated at boot time. An unknown check name, a missing `http_

  By default health endpoints are public. Use one of the following strategies to restrict access. Unauthenticated requests receive `401 Unauthorized`.

+ > **Note:** `GET /health/live` always bypasses authentication regardless of the configured strategy. Liveness probes are called by Kubernetes and load balancers which cannot pass credentials, so enforcing auth on this endpoint would break infrastructure probing.
+
  ### Bearer token

  ```ruby
data/app/controllers/rails_health_checks/live_controller.rb CHANGED
@@ -2,13 +2,10 @@

  module RailsHealthChecks
    class LiveController < ApplicationController
+     skip_before_action :authenticate!
+
      def show
-       builder = ResponseBuilder.new(run_checks(RailsHealthChecks.configuration.checks))
-       if builder.overall_status == "ok"
-         render plain: "OK", status: :ok
-       else
-         render plain: "Service Unavailable", status: :service_unavailable
-       end
+       render plain: "OK", status: :ok
      end
    end
  end
data/app/controllers/rails_health_checks/ready_controller.rb ADDED
@@ -0,0 +1,14 @@
+ # frozen_string_literal: true
+
+ module RailsHealthChecks
+   class ReadyController < ApplicationController
+     def show
+       builder = ResponseBuilder.new(run_checks(RailsHealthChecks.configuration.checks))
+       if builder.overall_status == "ok"
+         render plain: "OK", status: :ok
+       else
+         render plain: "Service Unavailable", status: :service_unavailable
+       end
+     end
+   end
+ end
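The behavioural split between the two controllers above can be reduced to a framework-free sketch, which may help when writing an upgrade smoke test. The method names `live_response` and `ready_response` and the lambda-based checks are assumptions made for illustration; the gem itself goes through `run_checks` and `ResponseBuilder`, whose internals are not shown in this diff.

```ruby
# Plain-Ruby stand-ins for the two actions. No Rails required.
# Each check is a callable that returns a truthy value when healthy.

def live_response
  # The process answered, so it is alive. Dependencies are not consulted.
  [200, "OK"]
end

def ready_response(checks)
  all_ok = checks.all? do |_name, check|
    begin
      check.call
    rescue StandardError
      false # a raising check counts as a failed dependency
    end
  end
  all_ok ? [200, "OK"] : [503, "Service Unavailable"]
end

checks = {
  database: -> { true },
  redis: -> { raise IOError, "connection refused" }
}

puts live_response.inspect
puts ready_response(checks).inspect
```

With one failing dependency, the liveness stand-in still reports `200` while the readiness stand-in reports `503`, which is the v1.2.0 behaviour described in the README hunk.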
data/config/routes.rb CHANGED
@@ -1,8 +1,11 @@
  # frozen_string_literal: true

  RailsHealthChecks::Engine.routes.draw do
-   match "/", to: "health#show", as: :health, via: [:get, :head]
-   match "/live", to: "live#show", as: :health_live, via: [:get, :head]
-   get "/metrics", to: "metrics#show", as: :health_metrics
-   get "/:id", to: "groups#show", as: :health_group
+   readiness_path = RailsHealthChecks.configuration.readiness_path
+
+   match "/", to: "health#show", as: :health, via: [:get, :head]
+   match "/live", to: "live#show", as: :health_live, via: [:get, :head]
+   match "/#{readiness_path}", to: "ready#show", as: :health_ready, via: [:get, :head]
+   get "/metrics", to: "metrics#show", as: :health_metrics
+   get "/:id", to: "groups#show", as: :health_group
  end
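Because Rails matches routes in the order they are defined, the position of the readiness route relative to `/live`, `/metrics`, and the `/:id` catch-all matters. The following is a simplified first-match model in plain Ruby, not the Rails router; `build_routes` and `resolve` are hypothetical helpers, and the regular expressions only approximate the route patterns above.

```ruby
# Simplified first-match router mirroring the route order in the diff.
def build_routes(readiness_path)
  [
    [%r{\A/\z}, "health#show"],
    [%r{\A/live\z}, "live#show"],
    [%r{\A/#{Regexp.escape(readiness_path)}\z}, "ready#show"],
    [%r{\A/metrics\z}, "metrics#show"],
    [%r{\A/[^/]+\z}, "groups#show"] # stands in for "/:id"
  ]
end

# Returns the target of the first matching route, or nil if none match.
def resolve(routes, path)
  match = routes.find { |pattern, _target| pattern.match?(path) }
  match&.last
end

routes = build_routes("ready")
puts resolve(routes, "/ready")   # handled by the readiness route
puts resolve(routes, "/workers") # falls through to the group route

# A readiness_path that collides with a fixed route changes what is reachable.
puts resolve(build_routes("metrics"), "/metrics")
puts resolve(build_routes("live"), "/live")
```

Under this model, a `readiness_path` of `"metrics"` would take over `/metrics`, and a value of `"live"` would never be reached because `/live` is defined first. Choosing a value that does not collide with the fixed routes or a group name avoids the ambiguity.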
@@ -12,6 +12,13 @@ RailsHealthChecks.configure do |config|
    # Cache check results for N seconds to avoid re-running on every request (default: nil, disabled)
    # config.cache_duration = 10

+   # ---------------------------------------------------------------------------
+   # Endpoint paths
+   # ---------------------------------------------------------------------------
+   # Path for the readiness endpoint within the engine (default: "ready").
+   # When the engine is mounted at "/health", the readiness endpoint is "/health/ready".
+   # config.readiness_path = "ready"
+
    # ---------------------------------------------------------------------------
    # Authentication — all strategies are mutually exclusive; default is public
    # ---------------------------------------------------------------------------
@@ -12,7 +12,8 @@ module RailsHealthChecks
                    :smtp_address, :smtp_port,
                    :sidekiq_queue_size, :solid_queue_job_count, :good_job_latency,
                    :resque_queue_size, :disk_warn_threshold, :disk_critical_threshold, :disk_path,
-                   :memory_threshold, :http_url, :http_expected_status, :http_headers
+                   :memory_threshold, :http_url, :http_expected_status, :http_headers,
+                   :readiness_path
      attr_reader :authenticate_block, :custom_checks, :groups

      def initialize
@@ -39,6 +40,7 @@ module RailsHealthChecks
        @custom_checks = {}
        @groups = {}
        @disabled_checks = {}
+       @readiness_path = "ready"
      end

      def checks
@@ -1,5 +1,5 @@
  # frozen_string_literal: true

  module RailsHealthChecks
-   VERSION = "1.1.0"
+   VERSION = "1.2.0"
  end
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: rails_health_checks
  version: !ruby/object:Gem::Version
-   version: 1.1.0
+   version: 1.2.0
  platform: ruby
  authors:
  - Chuck Smith
@@ -57,6 +57,7 @@ files:
  - app/controllers/rails_health_checks/health_controller.rb
  - app/controllers/rails_health_checks/live_controller.rb
  - app/controllers/rails_health_checks/metrics_controller.rb
+ - app/controllers/rails_health_checks/ready_controller.rb
  - app/jobs/rails_health_checks/application_job.rb
  - app/mailers/rails_health_checks/application_mailer.rb
  - app/models/rails_health_checks/application_record.rb