elasticgraph-health_check 0.18.0.0

checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA256:
3
+ metadata.gz: 3810ecb7062490ded525ea2bfee19dac4a408227baebae6b3ca0e6cf43fc6b92
4
+ data.tar.gz: 3d793cfdca8a02dcbb328d4e0a6f8f5608c8e3ad45ffe86e9269f8955a9b205c
5
+ SHA512:
6
+ metadata.gz: 60ff8f6a90f1d260b8fac16a8677ea7c9db922cdee2f3c2cc614dd5be3ea7da1ca914f77aadc825a9b51c42fba69b7bd3d602eed027e5560288f0c68a3479d2f
7
+ data.tar.gz: b30042926be384bddee2172bfc5d7f6a4fa696878b2f58b82328a5988ea784b6a908ea70671f55eb5e47daadb24e86b57f254cee3bb92227e5829d5245951d01
data/LICENSE.txt ADDED
@@ -0,0 +1,21 @@
1
+ The MIT License (MIT)
2
+
3
+ Copyright (c) 2024 Block, Inc.
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in
13
+ all copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
21
+ THE SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,82 @@
1
+ # ElasticGraph::HealthCheck
2
+
3
+ Provides a component that can act as a health check for high availability deployments. The HealthCheck component
4
+ returns a summary status of either `healthy`, `degraded`, or `unhealthy` for the endpoint.
5
+
6
+ The intended semantics of these statuses
7
+ map to the corresponding Envoy statuses; see
8
+ [the Envoy documentation for more details](https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/upstream/health_checking),
9
+ but in short `degraded` maps to "endpoint is impaired, do not use unless you have no other choice" and `unhealthy` maps to "endpoint is hard
10
+ down/should not be used under any circumstances".
11
+
12
+ The returned status is the worst of the status values from the individual sub-checks:
13
+ 1. The datastore clusters' own health statuses. The datastore clusters reflect their status as green/yellow/red. See
14
+ [the Elasticsearch documentation](https://www.elastic.co/guide/en/elasticsearch/reference/current/cluster-health.html#cluster-health-api-response-body)
15
+ for details on the meaning of these statuses.
16
+ - `green` maps to `healthy`, `yellow` to `degraded`, and `red` to `unhealthy`.
17
+
18
+ 2. The recency of data present in ElasticGraph indices. The HealthCheck configuration specifies the expected "max recency" for items within an
19
+ index.
20
+ - If no records have been indexed within the specified period, the HealthCheck component will consider the index to be in a `degraded` status.
21
+
22
+ As mentioned above, the returned status is the worst status of these two checks. E.g. if the datastore cluster(s) are all `green`, but a recency check fails, the
23
+ overall status will be `degraded`. If the recency checks pass, but at least one datastore cluster is `red`, an `unhealthy` status will be returned.
24
+
25
+ ## Integration
26
+
27
+ To use, simply register the `EnvoyExtension` when defining your schema:
28
+
29
+ ```ruby
30
+ require(envoy_extension_path = "elastic_graph/health_check/envoy_extension")
31
+ schema.register_graphql_extension ElasticGraph::HealthCheck::EnvoyExtension,
32
+ defined_at: envoy_extension_path,
33
+ http_path_segment: "/_status"
34
+ ```
35
+
36
+ ## Configuration
37
+
38
+ These checks are configurable. The following configuration will be used as an example:
39
+
40
+ ```
41
+ health_check:
42
+ clusters_to_consider: ["widgets-cluster"]
43
+ data_recency_checks:
44
+ Widget:
45
+ timestamp_field: createdAt
46
+ expected_max_recency_seconds: 30
47
+ ```
48
+
49
+ - `clusters_to_consider` configures the first check (datastore cluster health), and specifies which clusters' health status is monitored.
50
+ - `data_recency_checks` configures the second check (data recency) described above. In this example, if no new "Widgets"
51
+ are indexed for thirty seconds (perhaps because of an infrastructure issue), a `degraded` status will be returned.
52
+ - Note that this setting is most appropriate for types where you expect a steady stream of indexing (and where the absence of new records is indicative
53
+ of some kind of failure).
54
+
55
+ ## Behavior when datastore clusters are inaccessible
56
+
57
+ A given ElasticGraph GraphQL endpoint does not necessarily have access to all datastore clusters - more specifically, the endpoint will only have access
58
+ to clusters present in the `datastore.clusters` configuration map.
59
+
60
+ If a health check is configured for either a cluster or type that the GraphQL endpoint does not have access to, the respective check will be skipped. This is appropriate,
61
+ because when the GraphQL endpoint does not have access to the cluster/type, the cluster's/type's health is immaterial.
62
+
63
+ For example, with the following configuration:
64
+
65
+ ```
66
+ datastore:
67
+ clusters:
68
+ widgets-cluster: { ... }
69
+ # components-cluster: { ... } ### Not available, commented out.
70
+ health_check:
71
+ clusters_to_consider: ["widgets-cluster", "components-cluster"]
72
+ data_recency_checks:
73
+ Component:
74
+ timestamp_field: createdAt
75
+ expected_max_recency_seconds: 10
76
+ Widget:
77
+ timestamp_field: createdAt
78
+ expected_max_recency_seconds: 30
79
+ ```
80
+
81
+ ... the `components-cluster` datastore status health check will be skipped, as will the Component recency check. However, the `widgets-cluster`/`Widget` health
82
+ checks will proceed as normal.
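The "worst status wins" rule described in the README above can be sketched in plain Ruby. This is an illustrative sketch under stated assumptions, not the gem's code: `SEVERITY`, `CLUSTER_COLOR_TO_STATUS`, and `worst_status` are hypothetical names, and each recency check is reduced to a boolean.

```ruby
# Hypothetical ranking of the three summary statuses, from best to worst.
SEVERITY = {healthy: 0, degraded: 1, unhealthy: 2}.freeze

# Mapping of datastore cluster colors to summary statuses, as stated in the README.
CLUSTER_COLOR_TO_STATUS = {
  "green" => :healthy,
  "yellow" => :degraded,
  "red" => :unhealthy
}.freeze

# cluster_colors: array of "green"/"yellow"/"red" strings, one per cluster.
# recency_checks_passed: array of booleans, one per data recency check.
def worst_status(cluster_colors, recency_checks_passed)
  statuses = cluster_colors.map { |color| CLUSTER_COLOR_TO_STATUS.fetch(color) }
  # A failed recency check can only degrade the endpoint, never mark it unhealthy.
  statuses += recency_checks_passed.map { |passed| passed ? :healthy : :degraded }
  # With nothing to check, this sketch treats the endpoint as healthy.
  statuses.max_by { |status| SEVERITY.fetch(status) } || :healthy
end

puts worst_status(["green"], [false])
puts worst_status(["red", "green"], [true])
```

The first call mirrors the README's example of green clusters with a failing recency check, and the second mirrors a passing recency check with one red cluster.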
@@ -0,0 +1,23 @@
1
+ # Copyright 2024 Block, Inc.
2
+ #
3
+ # Use of this source code is governed by an MIT-style
4
+ # license that can be found in the LICENSE file or at
5
+ # https://opensource.org/licenses/MIT.
6
+ #
7
+ # frozen_string_literal: true
8
+
9
+ require_relative "../gemspec_helper"
10
+
11
+ ElasticGraphGemspecHelper.define_elasticgraph_gem(gemspec_file: __FILE__, category: :extension) do |spec, eg_version|
12
+ spec.summary = "An ElasticGraph extension that provides a health check for high availability deployments."
13
+
14
+ spec.add_dependency "elasticgraph-datastore_core", eg_version
15
+ spec.add_dependency "elasticgraph-graphql", eg_version
16
+ spec.add_dependency "elasticgraph-support", eg_version
17
+
18
+ spec.add_development_dependency "elasticgraph-admin", eg_version
19
+ spec.add_development_dependency "elasticgraph-elasticsearch", eg_version
20
+ spec.add_development_dependency "elasticgraph-indexer", eg_version
21
+ spec.add_development_dependency "elasticgraph-opensearch", eg_version
22
+ spec.add_development_dependency "elasticgraph-schema_definition", eg_version
23
+ end
@@ -0,0 +1,48 @@
1
+ # Copyright 2024 Block, Inc.
2
+ #
3
+ # Use of this source code is governed by an MIT-style
4
+ # license that can be found in the LICENSE file or at
5
+ # https://opensource.org/licenses/MIT.
6
+ #
7
+ # frozen_string_literal: true
8
+
9
+ module ElasticGraph
10
+ module HealthCheck
11
+ class Config < ::Data.define(
12
+ # The list of clusters to perform datastore status health checks on. A `green` status maps to `healthy`, a
13
+ # `yellow` status maps to `degraded`, and a `red` status maps to `unhealthy`. The returned status is the minimum
14
+ # status from all clusters in the list (a `yellow` cluster and a `green` cluster will result in a `degraded` status).
15
+ #
16
+ # Example: ["cluster-one", "cluster-two"]
17
+ :clusters_to_consider,
18
+ # A map of types to perform recency checks on. If no new records for that type have been indexed within the specified
19
+ # period, a `degraded` status will be returned.
20
+ #
21
+ # Example: { Widget: { timestamp_field: createdAt, expected_max_recency_seconds: 30 }}
22
+ :data_recency_checks
23
+ )
24
+ EMPTY = new([], {})
25
+
26
+ def self.from_parsed_yaml(config_hash)
27
+ config_hash = config_hash.fetch("health_check") { return EMPTY }
28
+
29
+ new(
30
+ clusters_to_consider: config_hash.fetch("clusters_to_consider"),
31
+ data_recency_checks: config_hash.fetch("data_recency_checks").transform_values do |value_hash|
32
+ DataRecencyCheck.from(value_hash)
33
+ end
34
+ )
35
+ end
36
+
37
+ DataRecencyCheck = ::Data.define(:expected_max_recency_seconds, :timestamp_field) do
38
+ # @implements DataRecencyCheck
39
+ def self.from(config_hash)
40
+ new(
41
+ expected_max_recency_seconds: config_hash.fetch("expected_max_recency_seconds"),
42
+ timestamp_field: config_hash.fetch("timestamp_field")
43
+ )
44
+ end
45
+ end
46
+ end
47
+ end
48
+ end
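The parsing approach in the config class above (fetch the `health_check` section, fall back to an empty config when it is absent, and convert each recency entry into a value object) can be tried out in isolation. The sketch below is a standalone approximation: it uses `Struct` instead of `Data.define` so it also runs on Rubies older than 3.2, and `RecencyCheck`, `HealthCheckSettings`, and `parse_health_check_settings` are hypothetical names rather than the gem's API.

```ruby
require "yaml"

# Hypothetical stand-ins for the gem's config value objects.
RecencyCheck = Struct.new(:expected_max_recency_seconds, :timestamp_field, keyword_init: true)
HealthCheckSettings = Struct.new(:clusters_to_consider, :data_recency_checks, keyword_init: true)

def parse_health_check_settings(parsed_yaml)
  # When the `health_check` key is missing, return an empty config instead of raising.
  section = parsed_yaml.fetch("health_check") do
    return HealthCheckSettings.new(clusters_to_consider: [], data_recency_checks: {})
  end

  HealthCheckSettings.new(
    clusters_to_consider: section.fetch("clusters_to_consider"),
    data_recency_checks: section.fetch("data_recency_checks").transform_values do |entry|
      RecencyCheck.new(
        expected_max_recency_seconds: entry.fetch("expected_max_recency_seconds"),
        timestamp_field: entry.fetch("timestamp_field")
      )
    end
  )
end

# The example configuration from the README.
EXAMPLE_YAML = <<~YAML
  health_check:
    clusters_to_consider: ["widgets-cluster"]
    data_recency_checks:
      Widget:
        timestamp_field: createdAt
        expected_max_recency_seconds: 30
YAML

settings = parse_health_check_settings(YAML.safe_load(EXAMPLE_YAML))
puts settings.clusters_to_consider.inspect
puts settings.data_recency_checks.fetch("Widget").expected_max_recency_seconds
```

Because every nested key is read with `fetch`, a `health_check` section that is present but incomplete raises `KeyError` rather than silently producing a partial config.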
@@ -0,0 +1,46 @@
1
+ # Copyright 2024 Block, Inc.
2
+ #
3
+ # Use of this source code is governed by an MIT-style
4
+ # license that can be found in the LICENSE file or at
5
+ # https://opensource.org/licenses/MIT.
6
+ #
7
+ # frozen_string_literal: true
8
+
9
+ # Enumerates constants that are used from multiple places in ElasticGraph::HealthCheck.
10
+ module ElasticGraph
11
+ module HealthCheck
12
+ # List of datastore cluster health fields from:
13
+ # https://www.elastic.co/guide/en/elasticsearch/reference/7.10/cluster-health.html#cluster-health-api-response-body
14
+ #
15
+ # This is expressed as a constant so that we can use it in dynamic ways in a few places
16
+ # (such as in a test; we want an acceptance test to fetch all these fields to make
17
+ # sure they work, and having them defined this way makes that easier).
18
+ #
19
+ # To get this list, this javascript was used in the chrome console:
20
+ #
21
+ # Array.from(document.querySelectorAll('div.variablelist')[2].querySelectorAll(':scope > dl.variablelist > dt')).map(x => x.innerText)
22
+ #
23
+ # (Feel free to use/change that as needed if/when you update this list in the future based on a newer datastore version.)
24
+ #
25
+ # Note: `discovered_master` is a new boolean field that AWS OpenSearch seems to add to the cluster health response.
26
+ # It was observed on the response on 2022-04-18.
27
+ DATASTORE_CLUSTER_HEALTH_FIELDS = %i[
28
+ cluster_name
29
+ status
30
+ timed_out
31
+ number_of_nodes
32
+ number_of_data_nodes
33
+ active_primary_shards
34
+ active_shards
35
+ relocating_shards
36
+ initializing_shards
37
+ unassigned_shards
38
+ delayed_unassigned_shards
39
+ number_of_pending_tasks
40
+ number_of_in_flight_fetch
41
+ task_max_waiting_in_queue_millis
42
+ active_shards_percent_as_number
43
+ discovered_master
44
+ ].to_set
45
+ end
46
+ end
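As a rough illustration of why the field list is kept as a constant: a set of symbols can drive the conversion of a string-keyed cluster health response into a symbol-keyed hash. The sketch below uses a hypothetical three-field subset (`EXAMPLE_FIELDS`), a made-up response hash, and a hypothetical helper name (`pick_health_fields`).

```ruby
require "set"

# Hypothetical three-field subset of the full field list, for brevity.
EXAMPLE_FIELDS = %i[cluster_name status discovered_master].to_set

# Picks only the known fields from a string-keyed response hash.
# Fields missing from the response come through as nil.
def pick_health_fields(raw_response)
  EXAMPLE_FIELDS.to_h { |field_name| [field_name, raw_response[field_name.to_s]] }
end

# Made-up response for illustration; real responses contain many more fields.
raw = {"cluster_name" => "widgets-cluster", "status" => "yellow", "unrelated" => 1}
puts pick_health_fields(raw).inspect
```

Keys outside the list are dropped, and a listed field that the datastore does not report is simply `nil` in the result.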
@@ -0,0 +1,61 @@
1
+ # Copyright 2024 Block, Inc.
2
+ #
3
+ # Use of this source code is governed by an MIT-style
4
+ # license that can be found in the LICENSE file or at
5
+ # https://opensource.org/licenses/MIT.
6
+ #
7
+ # frozen_string_literal: true
8
+
9
+ require "delegate"
10
+ require "elastic_graph/graphql/http_endpoint"
11
+ require "uri"
12
+
13
+ module ElasticGraph
14
+ module HealthCheck
15
+ module EnvoyExtension
16
+ # Intercepts HTTP requests so that a health check can be performed if it's a GET request to the configured health check path.
17
+ # The HTTP response follows Envoy HTTP health check guidelines:
18
+ #
19
+ # https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/upstream/health_checking
20
+ class GraphQLHTTPEndpointDecorator < DelegateClass(GraphQL::HTTPEndpoint)
21
+ def initialize(http_endpoint, health_check_http_path_segment:, health_checker:, logger:)
22
+ super(http_endpoint)
23
+ @health_check_http_path_segment = health_check_http_path_segment.delete_prefix("/").delete_suffix("/")
24
+ @health_checker = health_checker
25
+ @logger = logger
26
+ end
27
+
28
+ __skip__ =
29
+ def process(request, **)
30
+ if request.http_method == :get && URI(request.url).path.split("/").include?(@health_check_http_path_segment)
31
+ perform_health_check
32
+ else
33
+ super
34
+ end
35
+ end
36
+
37
+ private
38
+
39
+ RESPONSES_BY_HEALTH_STATUS_CATEGORY = {
40
+ healthy: [200, "Healthy!", {}],
41
+ unhealthy: [500, "Unhealthy!", {}],
42
+ # https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/upstream/health_checking#degraded-health
43
+ degraded: [200, "Degraded.", {"x-envoy-degraded" => "true"}]
44
+ }
45
+
46
+ def perform_health_check
47
+ status = @health_checker.check_health
48
+ @logger.info status.to_loggable_description
49
+
50
+ status, message, headers = RESPONSES_BY_HEALTH_STATUS_CATEGORY.fetch(status.category)
51
+
52
+ GraphQL::HTTPResponse.new(
53
+ status_code: status,
54
+ headers: headers.merge("Content-Type" => "text/plain"),
55
+ body: message
56
+ )
57
+ end
58
+ end
59
+ end
60
+ end
61
+ end
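The request-matching rule used by the decorator above can be exercised on its own with just the standard library. In this sketch, `normalize_segment` and `health_check_request?` are hypothetical helper names and the URLs are made up; only the `delete_prefix`/`delete_suffix` normalization and the `split("/").include?` test are taken from the code above.

```ruby
require "uri"

# Strips one leading and one trailing slash, so "/_status" and "_status/" both become "_status".
def normalize_segment(segment)
  segment.delete_prefix("/").delete_suffix("/")
end

# True only for GET requests whose URL path contains the segment as a whole path component.
def health_check_request?(http_method, url, segment)
  http_method == :get && URI(url).path.split("/").include?(normalize_segment(segment))
end

puts health_check_request?(:get, "http://localhost:9393/graphql/_status", "/_status")
puts health_check_request?(:post, "http://localhost:9393/graphql/_status", "/_status")
puts health_check_request?(:get, "http://localhost:9393/graphql?q=_status", "/_status")
```

Note that the segment may appear anywhere in the path, and that query strings are ignored because only `URI#path` is inspected.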
@@ -0,0 +1,43 @@
1
+ # Copyright 2024 Block, Inc.
2
+ #
3
+ # Use of this source code is governed by an MIT-style
4
+ # license that can be found in the LICENSE file or at
5
+ # https://opensource.org/licenses/MIT.
6
+ #
7
+ # frozen_string_literal: true
8
+
9
+ require "elastic_graph/error"
10
+ require "elastic_graph/health_check/envoy_extension/graphql_http_endpoint_decorator"
11
+ require "elastic_graph/health_check/health_checker"
12
+
13
+ module ElasticGraph
14
+ module HealthCheck
15
+ # An extension module that hooks into the HTTP endpoint to provide Envoy health checks.
16
+ module EnvoyExtension
17
+ def graphql_http_endpoint
18
+ @graphql_http_endpoint ||=
19
+ begin
20
+ http_path_segment = config.extension_settings.dig("health_check", "http_path_segment")
21
+ http_path_segment ||= runtime_metadata
22
+ .graphql_extension_modules
23
+ .find { |ext_mod| ext_mod.extension_class == EnvoyExtension }
24
+ &.extension_config
25
+ &.dig(:http_path_segment)
26
+
27
+ if http_path_segment.nil?
28
+ raise ElasticGraph::ConfigSettingNotSetError, "Health check `http_path_segment` is not configured. " \
29
+ "Either set under `health_check` in YAML config or pass it along if you register the `EnvoyExtension` " \
30
+ "via `register_graphql_extension`."
31
+ end
32
+
33
+ GraphQLHTTPEndpointDecorator.new(
34
+ super,
35
+ health_check_http_path_segment: http_path_segment,
36
+ health_checker: HealthChecker.build_from(self),
37
+ logger: logger
38
+ )
39
+ end
40
+ end
41
+ end
42
+ end
43
+ end
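The lookup order for `http_path_segment` in the extension above (YAML setting first, then the value passed at registration, otherwise an error) can be sketched with plain hashes. The helper name `resolve_http_path_segment` is hypothetical, and `ArgumentError` stands in for the gem's own error class so that the sketch stays self-contained.

```ruby
# extension_settings: string-keyed hash parsed from YAML.
# registered_extension_config: symbol-keyed hash passed at registration, or nil.
def resolve_http_path_segment(extension_settings, registered_extension_config)
  # The YAML setting takes precedence when present.
  segment = extension_settings.dig("health_check", "http_path_segment")
  # Otherwise fall back to the registration-time config, which may be absent.
  segment ||= registered_extension_config&.dig(:http_path_segment)

  # ArgumentError is a stand-in for the gem's own error class.
  raise ArgumentError, "Health check `http_path_segment` is not configured." if segment.nil?

  segment
end

puts resolve_http_path_segment({"health_check" => {"http_path_segment" => "/health"}}, {http_path_segment: "/_status"})
puts resolve_http_path_segment({}, {http_path_segment: "/_status"})
```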
@@ -0,0 +1,221 @@
1
+ # Copyright 2024 Block, Inc.
2
+ #
3
+ # Use of this source code is governed by an MIT-style
4
+ # license that can be found in the LICENSE file or at
5
+ # https://opensource.org/licenses/MIT.
6
+ #
7
+ # frozen_string_literal: true
8
+
9
+ require "elastic_graph/error"
10
+ require "elastic_graph/health_check/config"
11
+ require "elastic_graph/health_check/health_status"
12
+ require "elastic_graph/support/threading"
13
+ require "time"
14
+
15
+ module ElasticGraph
16
+ module HealthCheck
17
+ class HealthChecker
18
+ # Static factory method that builds a HealthChecker from an ElasticGraph::GraphQL instance.
19
+ def self.build_from(graphql)
20
+ new(
21
+ schema: graphql.schema,
22
+ config: HealthCheck::Config.from_parsed_yaml(graphql.config.extension_settings),
23
+ datastore_search_router: graphql.datastore_search_router,
24
+ datastore_query_builder: graphql.datastore_query_builder,
25
+ datastore_clients_by_name: graphql.datastore_core.clients_by_name,
26
+ clock: graphql.clock,
27
+ logger: graphql.logger
28
+ )
29
+ end
30
+
31
+ def initialize(
32
+ schema:,
33
+ config:,
34
+ datastore_search_router:,
35
+ datastore_query_builder:,
36
+ datastore_clients_by_name:,
37
+ clock:,
38
+ logger:
39
+ )
40
+ @schema = schema
41
+ @datastore_search_router = datastore_search_router
42
+ @datastore_query_builder = datastore_query_builder
43
+ @datastore_clients_by_name = datastore_clients_by_name
44
+ @clock = clock
45
+ @logger = logger
46
+ @indexed_document_types_by_name = @schema.indexed_document_types.to_h { |t| [t.name.to_s, t] }
47
+
48
+ @config = validate_and_normalize_config(config)
49
+ end
50
+
51
+ def check_health
52
+ recency_queries_by_type_name = @config.data_recency_checks.to_h do |type_name, recency_config|
53
+ [type_name, build_recency_query_for(type_name, recency_config)]
54
+ end
55
+
56
+ recency_results_by_query, *cluster_healths = execute_in_parallel(
57
+ lambda { @datastore_search_router.msearch(recency_queries_by_type_name.values) },
58
+ *@config.clusters_to_consider.map do |cluster|
59
+ lambda { [cluster, @datastore_clients_by_name.fetch(cluster).get_cluster_health] }
60
+ end
61
+ )
62
+
63
+ HealthStatus.new(
64
+ cluster_health_by_name: build_cluster_health_by_name(cluster_healths.to_h),
65
+ latest_record_by_type: build_latest_record_by_type(recency_results_by_query, recency_queries_by_type_name)
66
+ )
67
+ end
68
+
69
+ private
70
+
71
+ def build_recency_query_for(type_name, recency_config)
72
+ type = @indexed_document_types_by_name.fetch(type_name)
73
+
74
+ @datastore_query_builder.new_query(
75
+ search_index_definitions: type.search_index_definitions,
76
+ filter: build_index_optimization_filter_for(recency_config),
77
+ requested_fields: ["id", recency_config.timestamp_field],
78
+ document_pagination: {first: 1},
79
+ sort: [{recency_config.timestamp_field => {"order" => "desc"}}]
80
+ )
81
+ end
82
+
83
+ # To make the recency query more optimal, we filter on the timestamp field. This can provide
84
+ # a couple of optimizations:
85
+ #
86
+ # - If it's a rollover index and the timestamp is the field we use for rollover, this allows
87
+ # the ElasticGraph query engine to hit only a subset of indices for better perf.
88
+ # - We've been told (by AWS support) that sorting a larger result set is more expensive than
89
+ # a small result set (presumably larger than filtering cost) so even if we can't limit what
90
+ # indices we hit with this, it should still be helpful.
91
+ #
92
+ # However, there's a bit of a risk of not actually finding the latest record if we include
93
+ # this filter. What we have here is a compromise: we "lookback" up to 100 times the
94
+ # `expected_max_recency_seconds`. For example, if that's set at 30, we'd search the last 3000
95
+ # seconds of data, which should be plenty of lookback for most cases, while still allowing
96
+ # a filter optimization. Once the latest record is more than 100 times older than our threshold
97
+ # the exact age of it is less interesting, anyway.
98
+ def build_index_optimization_filter_for(recency_config)
99
+ lookback_timestamp = @clock.now - (recency_config.expected_max_recency_seconds * 100)
100
+ {recency_config.timestamp_field => {"gte" => lookback_timestamp.iso8601}}
101
+ end
102
+
103
+ def execute_in_parallel(*lambdas)
104
+ Support::Threading.parallel_map(lambdas) { |l| l.call }
105
+ end
106
+
107
+ def build_cluster_health_by_name(cluster_healths)
108
+ cluster_healths.transform_values do |health|
109
+ health_status_fields = DATASTORE_CLUSTER_HEALTH_FIELDS.to_h do |field_name|
110
+ [field_name, health[field_name.to_s]]
111
+ end
112
+
113
+ HealthStatus::ClusterHealth.new(**health_status_fields)
114
+ end
115
+ end
116
+
117
+ def build_latest_record_by_type(recency_results_by_query, recency_queries_by_type_name)
118
+ recency_queries_by_type_name.to_h do |type_name, query|
119
+ config = @config.data_recency_checks.fetch(type_name)
120
+
121
+ latest_record = if (latest_doc = recency_results_by_query.fetch(query).first)
122
+ timestamp = ::Time.iso8601(latest_doc.fetch(config.timestamp_field))
123
+
124
+ HealthStatus::LatestRecord.new(
125
+ id: latest_doc.id,
126
+ timestamp: timestamp,
127
+ seconds_newer_than_required: timestamp - (@clock.now - config.expected_max_recency_seconds)
128
+ )
129
+ end
130
+
131
+ [type_name, latest_record]
132
+ end
133
+ end
134
+
135
+ def validate_and_normalize_config(config)
136
+ unrecognized_cluster_names = config.clusters_to_consider - all_known_clusters
137
+
138
+ # @type var errors: ::Array[::String]
139
+ errors = []
140
+
141
+ if unrecognized_cluster_names.any?
142
+ errors << "`health_check.clusters_to_consider` contains " \
143
+ "unrecognized cluster names: #{unrecognized_cluster_names.join(", ")}"
144
+ end
145
+
146
+ # Here, we determine which of the specified `clusters_to_consider` are actually available for datastore health checks (green/yellow/red).
147
+ # Before partitioning, we remove `unrecognized_cluster_names` as those will be reported through a separate error mechanism (above).
148
+ #
149
+ # Below, `available_clusters_to_consider` will replace `clusters_to_consider` in the returned `Config` instance.
150
+ available_clusters_to_consider, unavailable_clusters_to_consider =
151
+ (config.clusters_to_consider - unrecognized_cluster_names).partition { |it| @datastore_clients_by_name.key?(it) }
152
+
153
+ if unavailable_clusters_to_consider.any?
154
+ @logger.warn("#{unavailable_clusters_to_consider.length} cluster(s) were unavailable for health-checking: #{unavailable_clusters_to_consider.join(", ")}")
155
+ end
156
+
157
+ valid_type_names, invalid_type_names = config
158
+ .data_recency_checks.keys
159
+ .partition { |type| @indexed_document_types_by_name.key?(type) }
160
+
161
+ if invalid_type_names.any?
162
+ errors << "Some `health_check.data_recency_checks` types are not recognized indexed types: " \
163
+ "#{invalid_type_names.join(", ")}"
164
+ end
165
+
166
+ # It is possible to configure a GraphQL endpoint that has a healthcheck set up on type A, but doesn't actually
167
+ # have access to the datastore cluster that backs type A. In that case, we want to skip the health check - if the endpoint
168
+ # can't access type A, its health (or unhealth) is immaterial.
169
+ #
170
+ # So below, filter to types that have all of their datastore clusters available for querying.
171
+ available_type_names, unavailable_type_names = valid_type_names.partition do |type_name|
172
+ @indexed_document_types_by_name[type_name].search_index_definitions.all? do |search_index_definition|
173
+ @datastore_clients_by_name.key?(search_index_definition.cluster_to_query.to_s)
174
+ end
175
+ end
176
+
177
+ if unavailable_type_names.any?
178
+ @logger.warn("#{unavailable_type_names.length} type(s) were unavailable for health-checking: #{unavailable_type_names.join(", ")}")
179
+ end
180
+
181
+ # @type var invalid_timestamp_fields_by_type: ::Hash[::String, ::String]
182
+ invalid_timestamp_fields_by_type = {}
183
+ # @type var normalized_data_recency_checks: ::Hash[::String, Config::DataRecencyCheck]
184
+ normalized_data_recency_checks = {}
185
+
186
+ available_type_names.each do |type|
187
+ check = config.data_recency_checks.fetch(type)
188
+ field = @indexed_document_types_by_name
189
+ .fetch(type)
190
+ .fields_by_name[check.timestamp_field]
191
+
192
+ if field&.type&.unwrap_fully&.name.to_s == "DateTime"
193
+ # Convert the config so that we have a reference to the index field name.
194
+ normalized_data_recency_checks[type] = check.with(timestamp_field: field.name_in_index.to_s)
195
+ else
196
+ invalid_timestamp_fields_by_type[type] = check.timestamp_field
197
+ end
198
+ end
199
+
200
+ if invalid_timestamp_fields_by_type.any?
201
+ errors << "Some `health_check.data_recency_checks` entries have invalid timestamp fields: " \
202
+ "#{invalid_timestamp_fields_by_type.map { |k, v| "#{k} (#{v})" }.join(", ")}"
203
+ end
204
+
205
+ raise ConfigError, errors.join("\n\n") unless errors.empty?
206
+ config.with(
207
+ data_recency_checks: normalized_data_recency_checks,
208
+ clusters_to_consider: available_clusters_to_consider
209
+ )
210
+ end
211
+
212
+ def all_known_clusters
213
+ @all_known_clusters ||= @indexed_document_types_by_name.flat_map do |_, index_type|
214
+ index_type.search_index_definitions.flat_map do |it|
215
+ [it.cluster_to_query] + it.clusters_to_index_into
216
+ end
217
+ end + @datastore_clients_by_name.keys
218
+ end
219
+ end
220
+ end
221
+ end
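Two pieces of arithmetic in the health checker above are easy to verify by hand: the 100x lookback window used for the filter, and the `seconds_newer_than_required` value. The sketch below reproduces both with a fixed clock; the method names, the `created_at` field name, and the timestamps are illustrative assumptions.

```ruby
require "time"

# Builds the lookback filter: only search records newer than 100 times the recency threshold.
def lookback_filter(now, timestamp_field, expected_max_recency_seconds)
  lookback_timestamp = now - (expected_max_recency_seconds * 100)
  {timestamp_field => {"gte" => lookback_timestamp.iso8601}}
end

# Positive when the record is newer than required; negative when it is too old.
def seconds_newer_than_required(now, record_timestamp, expected_max_recency_seconds)
  record_timestamp - (now - expected_max_recency_seconds)
end

# Fixed "current time" so the results are reproducible.
now = Time.utc(2024, 8, 27, 12, 0, 0)

# With a 30 second threshold, the filter reaches back 3000 seconds (50 minutes).
puts lookback_filter(now, "created_at", 30).inspect

# A record indexed 10 seconds ago is 20 seconds newer than a 30 second threshold requires.
puts seconds_newer_than_required(now, Time.utc(2024, 8, 27, 11, 59, 50), 30)

# A record indexed 60 seconds ago is 30 seconds too old, so the value is negative.
puts seconds_newer_than_required(now, Time.utc(2024, 8, 27, 11, 59, 0), 30)
```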
@@ -0,0 +1,90 @@
1
+ # Copyright 2024 Block, Inc.
2
+ #
3
+ # Use of this source code is governed by an MIT-style
4
+ # license that can be found in the LICENSE file or at
5
+ # https://opensource.org/licenses/MIT.
6
+ #
7
+ # frozen_string_literal: true
8
+
9
+ require "elastic_graph/health_check/constants"
10
+
11
+ module ElasticGraph
12
+ module HealthCheck
13
+ # Encapsulates all of the status information for an ElasticGraph GraphQL endpoint.
14
+ # Computes a `category` for the status of the ElasticGraph endpoint.
15
+ #
16
+ # - unhealthy: the endpoint should not be used
17
+ # - degraded: the endpoint can be used, but prefer a healthy endpoint over it
18
+ # - healthy: the endpoint should be used
19
+ class HealthStatus < ::Data.define(:cluster_health_by_name, :latest_record_by_type, :category)
20
+ def initialize(cluster_health_by_name:, latest_record_by_type:)
21
+ super(
22
+ cluster_health_by_name: cluster_health_by_name,
23
+ latest_record_by_type: latest_record_by_type,
24
+ category: compute_category(cluster_health_by_name, latest_record_by_type)
25
+ )
26
+ end
27
+
28
+ def to_loggable_description
29
+ latest_record_descriptions = latest_record_by_type
30
+ .sort_by(&:first) # sort by type name
31
+ .map { |type, record| record&.to_loggable_description(type) || "Latest #{type} (missing)" }
32
+ .map { |description| "- #{description}" }
33
+
34
+ cluster_health_descriptions = cluster_health_by_name
35
+ .sort_by(&:first) # sort by cluster name
36
+ .map { |name, health| "\n- #{health.to_loggable_description(name)}" }
37
+
38
+ <<~EOS.strip.gsub("\n\n\n", "\n")
39
+ HealthStatus: #{category} (checked #{cluster_health_by_name.size} clusters, #{latest_record_by_type.size} latest records)
40
+ #{latest_record_descriptions.join("\n")}
41
+ #{cluster_health_descriptions.join("\n")}
42
+ EOS
43
+ end
44
+
45
+ private
46
+
47
+ def compute_category(cluster_health_by_name, latest_record_by_type)
48
+ cluster_statuses = cluster_health_by_name.values.map(&:status)
49
+ return :unhealthy if cluster_statuses.include?("red")
50
+
51
+ return :degraded if cluster_statuses.include?("yellow")
52
+ return :degraded if latest_record_by_type.values.any? { |v| v.nil? || v.too_old? }
53
+
54
+ :healthy
55
+ end
56
+
57
+ # Encapsulates the status information for a single datastore cluster.
58
+ ClusterHealth = ::Data.define(*DATASTORE_CLUSTER_HEALTH_FIELDS.to_a) do
59
+ # @implements ClusterHealth
60
+
61
+ def to_loggable_description(name)
62
+ field_values = to_h.map { |field, value| " #{field}: #{value.inspect}" }
63
+ "#{name} cluster health (#{status}):\n#{field_values.join("\n")}"
64
+ end
65
+ end
66
+
67
+ # Encapsulates information about the latest record of a type.
68
+ LatestRecord = ::Data.define(
69
+ :id, # the id of the record
70
+ :timestamp, # the record's timestamp
71
+ :seconds_newer_than_required # the recency of the record relative to expectation; positive == more recent
72
+ ) do
73
+ # @implements LatestRecord
74
+ def to_loggable_description(type)
75
+ rounded_age = seconds_newer_than_required.round(2).abs
76
+
77
+ if too_old?
78
+ "Latest #{type} (too old): #{id} / #{timestamp.iso8601} (#{rounded_age}s too old)"
79
+ else
80
+ "Latest #{type} (recent enough): #{id} / #{timestamp.iso8601} (#{rounded_age}s newer than required)"
81
+ end
82
+ end
83
+
84
+ def too_old?
85
+ seconds_newer_than_required < 0
86
+ end
87
+ end
88
+ end
89
+ end
90
+ end
metadata ADDED
@@ -0,0 +1,428 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: elasticgraph-health_check
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.18.0.0
5
+ platform: ruby
6
+ authors:
7
+ - Myron Marston
8
+ autorequire:
9
+ bindir: exe
10
+ cert_chain: []
11
+ date: 2024-08-27 00:00:00.000000000 Z
12
+ dependencies:
13
+ - !ruby/object:Gem::Dependency
14
+ name: rubocop-factory_bot
15
+ requirement: !ruby/object:Gem::Requirement
16
+ requirements:
17
+ - - "~>"
18
+ - !ruby/object:Gem::Version
19
+ version: '2.26'
20
+ type: :development
21
+ prerelease: false
22
+ version_requirements: !ruby/object:Gem::Requirement
23
+ requirements:
24
+ - - "~>"
25
+ - !ruby/object:Gem::Version
26
+ version: '2.26'
27
+ - !ruby/object:Gem::Dependency
28
+ name: rubocop-rake
29
+ requirement: !ruby/object:Gem::Requirement
30
+ requirements:
31
+ - - "~>"
32
+ - !ruby/object:Gem::Version
33
+ version: '0.6'
34
+ type: :development
35
+ prerelease: false
36
+ version_requirements: !ruby/object:Gem::Requirement
37
+ requirements:
38
+ - - "~>"
39
+ - !ruby/object:Gem::Version
40
+ version: '0.6'
41
+ - !ruby/object:Gem::Dependency
42
+ name: rubocop-rspec
43
+ requirement: !ruby/object:Gem::Requirement
44
+ requirements:
45
+ - - "~>"
46
+ - !ruby/object:Gem::Version
47
+ version: '3.0'
48
+ type: :development
49
+ prerelease: false
50
+ version_requirements: !ruby/object:Gem::Requirement
51
+ requirements:
52
+ - - "~>"
53
+ - !ruby/object:Gem::Version
54
+ version: '3.0'
55
+ - !ruby/object:Gem::Dependency
56
+ name: standard
57
+ requirement: !ruby/object:Gem::Requirement
58
+ requirements:
59
+ - - "~>"
60
+ - !ruby/object:Gem::Version
61
+ version: 1.39.0
62
+ type: :development
63
+ prerelease: false
64
+ version_requirements: !ruby/object:Gem::Requirement
65
+ requirements:
66
+ - - "~>"
67
+ - !ruby/object:Gem::Version
68
+ version: 1.39.0
69
+ - !ruby/object:Gem::Dependency
70
+ name: steep
71
+ requirement: !ruby/object:Gem::Requirement
72
+ requirements:
73
+ - - "~>"
74
+ - !ruby/object:Gem::Version
75
+ version: '1.7'
76
+ type: :development
77
+ prerelease: false
78
+ version_requirements: !ruby/object:Gem::Requirement
79
+ requirements:
80
+ - - "~>"
81
+ - !ruby/object:Gem::Version
82
+ version: '1.7'
83
+ - !ruby/object:Gem::Dependency
84
+ name: coderay
85
+ requirement: !ruby/object:Gem::Requirement
86
+ requirements:
87
+ - - "~>"
88
+ - !ruby/object:Gem::Version
89
+ version: '1.1'
90
+ type: :development
91
+ prerelease: false
92
+ version_requirements: !ruby/object:Gem::Requirement
93
+ requirements:
94
+ - - "~>"
95
+ - !ruby/object:Gem::Version
96
+ version: '1.1'
97
+ - !ruby/object:Gem::Dependency
98
+ name: flatware-rspec
99
+ requirement: !ruby/object:Gem::Requirement
100
+ requirements:
101
+ - - ">="
102
+ - !ruby/object:Gem::Version
103
+ version: 2.3.2
104
+ - - "<"
105
+ - !ruby/object:Gem::Version
106
+ version: '3.0'
107
+ type: :development
108
+ prerelease: false
109
+ version_requirements: !ruby/object:Gem::Requirement
110
+ requirements:
111
+ - - ">="
112
+ - !ruby/object:Gem::Version
113
+ version: 2.3.2
114
+ - - "<"
115
+ - !ruby/object:Gem::Version
116
+ version: '3.0'
117
+ - !ruby/object:Gem::Dependency
118
+ name: rspec
119
+ requirement: !ruby/object:Gem::Requirement
120
+ requirements:
121
+ - - "~>"
122
+ - !ruby/object:Gem::Version
123
+ version: '3.13'
124
+ type: :development
125
+ prerelease: false
126
+ version_requirements: !ruby/object:Gem::Requirement
127
+ requirements:
128
+ - - "~>"
129
+ - !ruby/object:Gem::Version
130
+ version: '3.13'
131
+ - !ruby/object:Gem::Dependency
132
+ name: super_diff
133
+ requirement: !ruby/object:Gem::Requirement
134
+ requirements:
135
+ - - ">="
136
+ - !ruby/object:Gem::Version
137
+ version: 0.12.1
138
+ type: :development
139
+ prerelease: false
140
+ version_requirements: !ruby/object:Gem::Requirement
141
+ requirements:
142
+ - - ">="
143
+ - !ruby/object:Gem::Version
144
+ version: 0.12.1
145
+ - !ruby/object:Gem::Dependency
146
+ name: simplecov
147
+ requirement: !ruby/object:Gem::Requirement
148
+ requirements:
149
+ - - "~>"
150
+ - !ruby/object:Gem::Version
151
+ version: '0.22'
152
+ type: :development
153
+ prerelease: false
154
+ version_requirements: !ruby/object:Gem::Requirement
155
+ requirements:
156
+ - - "~>"
157
+ - !ruby/object:Gem::Version
158
+ version: '0.22'
159
+ - !ruby/object:Gem::Dependency
160
+ name: simplecov-console
161
+ requirement: !ruby/object:Gem::Requirement
162
+ requirements:
163
+ - - ">="
164
+ - !ruby/object:Gem::Version
165
+ version: 0.9.1
166
+ - - "<"
167
+ - !ruby/object:Gem::Version
168
+ version: '1.0'
169
+ type: :development
170
+ prerelease: false
171
+ version_requirements: !ruby/object:Gem::Requirement
172
+ requirements:
173
+ - - ">="
174
+ - !ruby/object:Gem::Version
175
+ version: 0.9.1
176
+ - - "<"
177
+ - !ruby/object:Gem::Version
178
+ version: '1.0'
179
+ - !ruby/object:Gem::Dependency
180
+ name: httpx
181
+ requirement: !ruby/object:Gem::Requirement
182
+ requirements:
183
+ - - ">="
184
+ - !ruby/object:Gem::Version
185
+ version: 1.2.6
186
+ - - "<"
187
+ - !ruby/object:Gem::Version
188
+ version: '2.0'
189
+ type: :development
190
+ prerelease: false
191
+ version_requirements: !ruby/object:Gem::Requirement
192
+ requirements:
193
+ - - ">="
194
+ - !ruby/object:Gem::Version
195
+ version: 1.2.6
196
+ - - "<"
197
+ - !ruby/object:Gem::Version
198
+ version: '2.0'
199
+ - !ruby/object:Gem::Dependency
200
+ name: method_source
201
+ requirement: !ruby/object:Gem::Requirement
202
+ requirements:
203
+ - - "~>"
204
+ - !ruby/object:Gem::Version
205
+ version: '1.1'
206
+ type: :development
207
+ prerelease: false
208
+ version_requirements: !ruby/object:Gem::Requirement
209
+ requirements:
210
+ - - "~>"
211
+ - !ruby/object:Gem::Version
212
+ version: '1.1'
213
+ - !ruby/object:Gem::Dependency
214
+ name: rspec-retry
215
+ requirement: !ruby/object:Gem::Requirement
216
+ requirements:
217
+ - - "~>"
218
+ - !ruby/object:Gem::Version
219
+ version: '0.6'
220
+ type: :development
221
+ prerelease: false
222
+ version_requirements: !ruby/object:Gem::Requirement
223
+ requirements:
224
+ - - "~>"
225
+ - !ruby/object:Gem::Version
226
+ version: '0.6'
227
+ - !ruby/object:Gem::Dependency
228
+ name: vcr
229
+ requirement: !ruby/object:Gem::Requirement
230
+ requirements:
231
+ - - ">="
232
+ - !ruby/object:Gem::Version
233
+ version: 6.3.1
234
+ - - "<"
235
+ - !ruby/object:Gem::Version
236
+ version: 7.0.0
237
+ type: :development
238
+ prerelease: false
239
+ version_requirements: !ruby/object:Gem::Requirement
240
+ requirements:
241
+ - - ">="
242
+ - !ruby/object:Gem::Version
243
+ version: 6.3.1
244
+ - - "<"
245
+ - !ruby/object:Gem::Version
246
+ version: 7.0.0
247
+ - !ruby/object:Gem::Dependency
248
+ name: factory_bot
249
+ requirement: !ruby/object:Gem::Requirement
250
+ requirements:
251
+ - - "~>"
252
+ - !ruby/object:Gem::Version
253
+ version: '6.4'
254
+ type: :development
255
+ prerelease: false
256
+ version_requirements: !ruby/object:Gem::Requirement
257
+ requirements:
258
+ - - "~>"
259
+ - !ruby/object:Gem::Version
260
+ version: '6.4'
261
+ - !ruby/object:Gem::Dependency
262
+ name: faker
263
+ requirement: !ruby/object:Gem::Requirement
264
+ requirements:
265
+ - - "~>"
266
+ - !ruby/object:Gem::Version
267
+ version: '3.4'
268
+ type: :development
269
+ prerelease: false
270
+ version_requirements: !ruby/object:Gem::Requirement
271
+ requirements:
272
+ - - "~>"
273
+ - !ruby/object:Gem::Version
274
+ version: '3.4'
275
+ - !ruby/object:Gem::Dependency
276
+ name: elasticgraph-datastore_core
277
+ requirement: !ruby/object:Gem::Requirement
278
+ requirements:
279
+ - - '='
280
+ - !ruby/object:Gem::Version
281
+ version: 0.18.0.0
282
+ type: :runtime
283
+ prerelease: false
284
+ version_requirements: !ruby/object:Gem::Requirement
285
+ requirements:
286
+ - - '='
287
+ - !ruby/object:Gem::Version
288
+ version: 0.18.0.0
289
+ - !ruby/object:Gem::Dependency
290
+ name: elasticgraph-graphql
291
+ requirement: !ruby/object:Gem::Requirement
292
+ requirements:
293
+ - - '='
294
+ - !ruby/object:Gem::Version
295
+ version: 0.18.0.0
296
+ type: :runtime
297
+ prerelease: false
298
+ version_requirements: !ruby/object:Gem::Requirement
299
+ requirements:
300
+ - - '='
301
+ - !ruby/object:Gem::Version
302
+ version: 0.18.0.0
303
+ - !ruby/object:Gem::Dependency
304
+ name: elasticgraph-support
305
+ requirement: !ruby/object:Gem::Requirement
306
+ requirements:
307
+ - - '='
308
+ - !ruby/object:Gem::Version
309
+ version: 0.18.0.0
310
+ type: :runtime
311
+ prerelease: false
312
+ version_requirements: !ruby/object:Gem::Requirement
313
+ requirements:
314
+ - - '='
315
+ - !ruby/object:Gem::Version
316
+ version: 0.18.0.0
317
+ - !ruby/object:Gem::Dependency
318
+ name: elasticgraph-admin
319
+ requirement: !ruby/object:Gem::Requirement
320
+ requirements:
321
+ - - '='
322
+ - !ruby/object:Gem::Version
323
+ version: 0.18.0.0
324
+ type: :development
325
+ prerelease: false
326
+ version_requirements: !ruby/object:Gem::Requirement
327
+ requirements:
328
+ - - '='
329
+ - !ruby/object:Gem::Version
330
+ version: 0.18.0.0
331
+ - !ruby/object:Gem::Dependency
332
+ name: elasticgraph-elasticsearch
333
+ requirement: !ruby/object:Gem::Requirement
334
+ requirements:
335
+ - - '='
336
+ - !ruby/object:Gem::Version
337
+ version: 0.18.0.0
338
+ type: :development
339
+ prerelease: false
340
+ version_requirements: !ruby/object:Gem::Requirement
341
+ requirements:
342
+ - - '='
343
+ - !ruby/object:Gem::Version
344
+ version: 0.18.0.0
345
+ - !ruby/object:Gem::Dependency
346
+ name: elasticgraph-indexer
347
+ requirement: !ruby/object:Gem::Requirement
348
+ requirements:
349
+ - - '='
350
+ - !ruby/object:Gem::Version
351
+ version: 0.18.0.0
352
+ type: :development
353
+ prerelease: false
354
+ version_requirements: !ruby/object:Gem::Requirement
355
+ requirements:
356
+ - - '='
357
+ - !ruby/object:Gem::Version
358
+ version: 0.18.0.0
359
+ - !ruby/object:Gem::Dependency
360
+ name: elasticgraph-opensearch
361
+ requirement: !ruby/object:Gem::Requirement
362
+ requirements:
363
+ - - '='
364
+ - !ruby/object:Gem::Version
365
+ version: 0.18.0.0
366
+ type: :development
367
+ prerelease: false
368
+ version_requirements: !ruby/object:Gem::Requirement
369
+ requirements:
370
+ - - '='
371
+ - !ruby/object:Gem::Version
372
+ version: 0.18.0.0
373
+ - !ruby/object:Gem::Dependency
374
+ name: elasticgraph-schema_definition
375
+ requirement: !ruby/object:Gem::Requirement
376
+ requirements:
377
+ - - '='
378
+ - !ruby/object:Gem::Version
379
+ version: 0.18.0.0
380
+ type: :development
381
+ prerelease: false
382
+ version_requirements: !ruby/object:Gem::Requirement
383
+ requirements:
384
+ - - '='
385
+ - !ruby/object:Gem::Version
386
+ version: 0.18.0.0
387
+ description:
388
+ email:
389
+ - myron@squareup.com
390
+ executables: []
391
+ extensions: []
392
+ extra_rdoc_files: []
393
+ files:
394
+ - LICENSE.txt
395
+ - README.md
396
+ - elasticgraph-health_check.gemspec
397
+ - lib/elastic_graph/health_check/config.rb
398
+ - lib/elastic_graph/health_check/constants.rb
399
+ - lib/elastic_graph/health_check/envoy_extension.rb
400
+ - lib/elastic_graph/health_check/envoy_extension/graphql_http_endpoint_decorator.rb
401
+ - lib/elastic_graph/health_check/health_checker.rb
402
+ - lib/elastic_graph/health_check/health_status.rb
403
+ homepage:
404
+ licenses:
405
+ - MIT
406
+ metadata:
407
+ gem_category: extension
408
+ post_install_message:
409
+ rdoc_options: []
410
+ require_paths:
411
+ - lib
412
+ required_ruby_version: !ruby/object:Gem::Requirement
413
+ requirements:
414
+ - - "~>"
415
+ - !ruby/object:Gem::Version
416
+ version: '3.2'
417
+ required_rubygems_version: !ruby/object:Gem::Requirement
418
+ requirements:
419
+ - - ">="
420
+ - !ruby/object:Gem::Version
421
+ version: '0'
422
+ requirements: []
423
+ rubygems_version: 3.5.9
424
+ signing_key:
425
+ specification_version: 4
426
+ summary: An ElasticGraph extension that provides a health check for high availability
427
+ deployments.
428
+ test_files: []