elasticgraph-health_check 0.18.0.0

checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA256:
3
+ metadata.gz: 3810ecb7062490ded525ea2bfee19dac4a408227baebae6b3ca0e6cf43fc6b92
4
+ data.tar.gz: 3d793cfdca8a02dcbb328d4e0a6f8f5608c8e3ad45ffe86e9269f8955a9b205c
5
+ SHA512:
6
+ metadata.gz: 60ff8f6a90f1d260b8fac16a8677ea7c9db922cdee2f3c2cc614dd5be3ea7da1ca914f77aadc825a9b51c42fba69b7bd3d602eed027e5560288f0c68a3479d2f
7
+ data.tar.gz: b30042926be384bddee2172bfc5d7f6a4fa696878b2f58b82328a5988ea784b6a908ea70671f55eb5e47daadb24e86b57f254cee3bb92227e5829d5245951d01
data/LICENSE.txt ADDED
@@ -0,0 +1,21 @@
1
+ The MIT License (MIT)
2
+
3
+ Copyright (c) 2024 Block, Inc.
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in
13
+ all copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
21
+ THE SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,82 @@
1
+ # ElasticGraph::HealthCheck
2
+
3
+ Provides a component that can act as a health check for high availability deployments. The HealthCheck component
4
+ returns a summary status of either `healthy`, `degraded`, or `unhealthy` for the endpoint.
5
+
6
+ The intended semantics of these statuses
7
+ map to the corresponding Envoy statuses. See
8
+ [the Envoy documentation](https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/upstream/health_checking) for more details.
9
+ In short, `degraded` maps to "endpoint is impaired, do not use unless you have no other choice" and `unhealthy` maps to "endpoint is hard
10
+ down/should not be used under any circumstances".
11
+
12
+ The returned status is the worst of the status values from the individual sub-checks:
13
+ 1. The datastore clusters' own health statuses. The datastore clusters reflect their status as green/yellow/red. See
14
+ [the Elasticsearch documentation](https://www.elastic.co/guide/en/elasticsearch/reference/current/cluster-health.html#cluster-health-api-response-body)
15
+ for details on the meaning of these statuses.
16
+ - `green` maps to `healthy`, `yellow` to `degraded`, and `red` to `unhealthy`.
17
+
18
+ 2. The recency of data present in ElasticGraph indices. The HealthCheck configuration specifies the expected "max recency" for items within an
19
+ index.
20
+ - If no records have been indexed within the specified period, the HealthCheck component will consider the index to be in a `degraded` status.
21
+
22
+ As mentioned above, the returned status is the worst status of these two checks. E.g. if the datastore cluster(s) are all `green`, but a recency check fails, the
23
+ overall status will be `degraded`. If the recency checks pass, but at least one datastore cluster is `red`, an `unhealthy` status will be returned.
24
+
25
+ ## Integration
26
+
27
+ To use it, register the `EnvoyExtension` when defining your schema:
28
+
29
+ ```ruby
30
+ require(envoy_extension_path = "elastic_graph/health_check/envoy_extension")
31
+ schema.register_graphql_extension ElasticGraph::HealthCheck::EnvoyExtension,
32
+   defined_at: envoy_extension_path,
33
+   http_path_segment: "/_status"
34
+ ```
35
+
36
+ ## Configuration
37
+
38
+ These checks are configurable. The following configuration will be used as an example:
39
+
40
+ ```yaml
41
+ health_check:
42
+   clusters_to_consider: ["widgets-cluster"]
43
+   data_recency_checks:
44
+     Widget:
45
+       timestamp_field: createdAt
46
+       expected_max_recency_seconds: 30
47
+ ```
48
+
49
+ - `clusters_to_consider` configures the first check (datastore cluster health), and specifies which clusters' health status is monitored.
50
+ - `data_recency_checks` configures the second check (data recency). In this example, if no new "Widgets"
51
+ are indexed for thirty seconds (perhaps because of an infrastructure issue), a `degraded` status will be returned.
52
+ - Note that this setting is most appropriate for types where you expect a steady stream of indexing (and where the absence of new records is indicative
53
+ of some kind of failure).
54
+
55
+ ## Behavior when datastore clusters are inaccessible
56
+
57
+ A given ElasticGraph GraphQL endpoint does not necessarily have access to all datastore clusters - more specifically, the endpoint will only have access
58
+ to clusters present in the `datastore.clusters` configuration map.
59
+
60
+ If a health check is configured for either a cluster or type that the GraphQL endpoint does not have access to, the respective check will be skipped. This is appropriate:
61
+ since the GraphQL endpoint does not have access to the cluster/type, the cluster's/type's health is immaterial.
62
+
63
+ For example, with the following configuration:
64
+
65
+ ```yaml
66
+ datastore:
67
+   clusters:
68
+     widgets-cluster: { ... }
69
+     # components-cluster: { ... } ### Not available, commented out.
70
+ health_check:
71
+   clusters_to_consider: ["widgets-cluster", "components-cluster"]
72
+   data_recency_checks:
73
+     Component:
74
+       timestamp_field: createdAt
75
+       expected_max_recency_seconds: 10
76
+     Widget:
77
+       timestamp_field: createdAt
78
+       expected_max_recency_seconds: 30
79
+ ```
80
+
81
+ ... the `components-cluster` datastore status health check will be skipped, as will the Component recency check. However, the `widgets-cluster`/`Widget` health
82
+ checks will proceed as normal.
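The "worst status wins" rule described in the README above can be sketched in a few lines of Ruby. This is only an illustration of the documented semantics, not the gem's implementation: `SEVERITY`, `CLUSTER_STATUS_MAP`, and `worst_status` are hypothetical names introduced here.

```ruby
# Hypothetical sketch of the "worst status wins" rule from the README.
# None of these names come from the gem itself.
SEVERITY = {healthy: 0, degraded: 1, unhealthy: 2}.freeze

# Maps datastore cluster colors to health statuses, as the README describes.
CLUSTER_STATUS_MAP = {"green" => :healthy, "yellow" => :degraded, "red" => :unhealthy}.freeze

# cluster_colors: array of "green"/"yellow"/"red" strings, one per cluster.
# recency_checks_pass: array of booleans, one per configured recency check.
def worst_status(cluster_colors, recency_checks_pass)
  cluster_statuses = cluster_colors.map { |color| CLUSTER_STATUS_MAP.fetch(color) }
  # A failed recency check only ever degrades the endpoint.
  recency_statuses = recency_checks_pass.map { |ok| ok ? :healthy : :degraded }

  # With no checks configured there is nothing to report, so assume healthy.
  (cluster_statuses + recency_statuses).max_by { |status| SEVERITY.fetch(status) } || :healthy
end

puts worst_status(["green"], [false])       # all clusters green, one recency check fails
puts worst_status(["green", "red"], [true]) # recency passes, one cluster is red
```

The two calls correspond to the two examples given in the README: a failed recency check alongside green clusters, and a red cluster alongside passing recency checks.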
@@ -0,0 +1,23 @@
1
+ # Copyright 2024 Block, Inc.
2
+ #
3
+ # Use of this source code is governed by an MIT-style
4
+ # license that can be found in the LICENSE file or at
5
+ # https://opensource.org/licenses/MIT.
6
+ #
7
+ # frozen_string_literal: true
8
+
9
+ require_relative "../gemspec_helper"
10
+
11
+ ElasticGraphGemspecHelper.define_elasticgraph_gem(gemspec_file: __FILE__, category: :extension) do |spec, eg_version|
12
+ spec.summary = "An ElasticGraph extension that provides a health check for high availability deployments."
13
+
14
+ spec.add_dependency "elasticgraph-datastore_core", eg_version
15
+ spec.add_dependency "elasticgraph-graphql", eg_version
16
+ spec.add_dependency "elasticgraph-support", eg_version
17
+
18
+ spec.add_development_dependency "elasticgraph-admin", eg_version
19
+ spec.add_development_dependency "elasticgraph-elasticsearch", eg_version
20
+ spec.add_development_dependency "elasticgraph-indexer", eg_version
21
+ spec.add_development_dependency "elasticgraph-opensearch", eg_version
22
+ spec.add_development_dependency "elasticgraph-schema_definition", eg_version
23
+ end
@@ -0,0 +1,48 @@
1
+ # Copyright 2024 Block, Inc.
2
+ #
3
+ # Use of this source code is governed by an MIT-style
4
+ # license that can be found in the LICENSE file or at
5
+ # https://opensource.org/licenses/MIT.
6
+ #
7
+ # frozen_string_literal: true
8
+
9
+ module ElasticGraph
10
+ module HealthCheck
11
+ class Config < ::Data.define(
12
+ # The list of clusters to perform datastore status health checks on. A `green` status maps to `healthy`, a
13
+ # `yellow` status maps to `degraded`, and a `red` status maps to `unhealthy`. The returned status is the minimum
14
+ # status from all clusters in the list (a `yellow` cluster and a `green` cluster will result in a `degraded` status).
15
+ #
16
+ # Example: ["cluster-one", "cluster-two"]
17
+ :clusters_to_consider,
18
+ # A map of types to perform recency checks on. If no new records for that type have been indexed within the specified
19
+ # period, a `degraded` status will be returned.
20
+ #
21
+ # Example: { Widget: { timestamp_field: createdAt, expected_max_recency_seconds: 30 }}
22
+ :data_recency_checks
23
+ )
24
+ EMPTY = new([], {})
25
+
26
+ def self.from_parsed_yaml(config_hash)
27
+ config_hash = config_hash.fetch("health_check") { return EMPTY }
28
+
29
+ new(
30
+ clusters_to_consider: config_hash.fetch("clusters_to_consider"),
31
+ data_recency_checks: config_hash.fetch("data_recency_checks").transform_values do |value_hash|
32
+ DataRecencyCheck.from(value_hash)
33
+ end
34
+ )
35
+ end
36
+
37
+ DataRecencyCheck = ::Data.define(:expected_max_recency_seconds, :timestamp_field) do
38
+ # @implements DataRecencyCheck
39
+ def self.from(config_hash)
40
+ new(
41
+ expected_max_recency_seconds: config_hash.fetch("expected_max_recency_seconds"),
42
+ timestamp_field: config_hash.fetch("timestamp_field")
43
+ )
44
+ end
45
+ end
46
+ end
47
+ end
48
+ end
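As a rough illustration of how a `health_check` YAML section could be turned into value objects like the ones above, here is a standalone sketch. It does not load the gem: `RecencyCheck`, `HealthCheckConfig`, and `parse_health_check_config` are stand-ins defined here for illustration, and `Data.define` requires Ruby 3.2 or newer.

```ruby
require "yaml"

# Stand-in value objects mirroring the shape of the gem's config.
# Defined here only so the sketch runs on its own (Ruby 3.2+ for `Data`).
RecencyCheck = Data.define(:expected_max_recency_seconds, :timestamp_field)
HealthCheckConfig = Data.define(:clusters_to_consider, :data_recency_checks)

# Builds a config from parsed YAML, falling back to an empty config
# when there is no `health_check` section.
def parse_health_check_config(parsed_yaml)
  settings = parsed_yaml["health_check"]
  return HealthCheckConfig.new(clusters_to_consider: [], data_recency_checks: {}) if settings.nil?

  recency_checks = settings.fetch("data_recency_checks").transform_values do |check|
    RecencyCheck.new(
      expected_max_recency_seconds: check.fetch("expected_max_recency_seconds"),
      timestamp_field: check.fetch("timestamp_field")
    )
  end

  HealthCheckConfig.new(
    clusters_to_consider: settings.fetch("clusters_to_consider"),
    data_recency_checks: recency_checks
  )
end

yaml = <<~YAML
  health_check:
    clusters_to_consider: ["widgets-cluster"]
    data_recency_checks:
      Widget:
        timestamp_field: createdAt
        expected_max_recency_seconds: 30
YAML

config = parse_health_check_config(YAML.safe_load(yaml))
puts config.data_recency_checks.fetch("Widget").timestamp_field
```

Using `fetch` rather than `[]` for the individual settings means a missing key fails loudly with a `KeyError` instead of silently producing `nil`.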
@@ -0,0 +1,46 @@
1
+ # Copyright 2024 Block, Inc.
2
+ #
3
+ # Use of this source code is governed by an MIT-style
4
+ # license that can be found in the LICENSE file or at
5
+ # https://opensource.org/licenses/MIT.
6
+ #
7
+ # frozen_string_literal: true
8
+
9
+ # Enumerates constants that are used from multiple places in ElasticGraph::HealthCheck.
10
+ module ElasticGraph
11
+ module HealthCheck
12
+ # List of datastore cluster health fields from:
13
+ # https://www.elastic.co/guide/en/elasticsearch/reference/7.10/cluster-health.html#cluster-health-api-response-body
14
+ #
15
+ # This is expressed as a constant so that we can use it in dynamic ways in a few places
16
+ # (such as in a test; we want an acceptance test to fetch all these fields to make
17
+ # sure they work, and having them defined this way makes that easier).
18
+ #
19
+ # To get this list, this javascript was used in the chrome console:
20
+ #
21
+ # Array.from(document.querySelectorAll('div.variablelist')[2].querySelectorAll(':scope > dl.variablelist > dt')).map(x => x.innerText)
22
+ #
23
+ # (Feel free to use/change that as needed if/when you update this list in the future based on a newer datastore version.)
24
+ #
25
+ # Note: `discovered_master` is a new boolean field that AWS OpenSearch seems to add to the cluster health response.
26
+ # It was observed on the response on 2022-04-18.
27
+ DATASTORE_CLUSTER_HEALTH_FIELDS = %i[
28
+ cluster_name
29
+ status
30
+ timed_out
31
+ number_of_nodes
32
+ number_of_data_nodes
33
+ active_primary_shards
34
+ active_shards
35
+ relocating_shards
36
+ initializing_shards
37
+ unassigned_shards
38
+ delayed_unassigned_shards
39
+ number_of_pending_tasks
40
+ number_of_in_flight_fetch
41
+ task_max_waiting_in_queue_millis
42
+ active_shards_percent_as_number
43
+ discovered_master
44
+ ].to_set
45
+ end
46
+ end
@@ -0,0 +1,61 @@
1
+ # Copyright 2024 Block, Inc.
2
+ #
3
+ # Use of this source code is governed by an MIT-style
4
+ # license that can be found in the LICENSE file or at
5
+ # https://opensource.org/licenses/MIT.
6
+ #
7
+ # frozen_string_literal: true
8
+
9
+ require "delegate"
10
+ require "elastic_graph/graphql/http_endpoint"
11
+ require "uri"
12
+
13
+ module ElasticGraph
14
+ module HealthCheck
15
+ module EnvoyExtension
16
+ # Intercepts HTTP requests so that a health check can be performed if it's a GET request to the configured health check path.
17
+ # The HTTP response follows Envoy HTTP health check guidelines:
18
+ #
19
+ # https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/upstream/health_checking
20
+ class GraphQLHTTPEndpointDecorator < DelegateClass(GraphQL::HTTPEndpoint)
21
+ def initialize(http_endpoint, health_check_http_path_segment:, health_checker:, logger:)
22
+ super(http_endpoint)
23
+ @health_check_http_path_segment = health_check_http_path_segment.delete_prefix("/").delete_suffix("/")
24
+ @health_checker = health_checker
25
+ @logger = logger
26
+ end
27
+
28
+ __skip__ =
29
+ def process(request, **)
30
+ if request.http_method == :get && URI(request.url).path.split("/").include?(@health_check_http_path_segment)
31
+ perform_health_check
32
+ else
33
+ super
34
+ end
35
+ end
36
+
37
+ private
38
+
39
+ RESPONSES_BY_HEALTH_STATUS_CATEGORY = {
40
+ healthy: [200, "Healthy!", {}],
41
+ unhealthy: [500, "Unhealthy!", {}],
42
+ # https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/upstream/health_checking#degraded-health
43
+ degraded: [200, "Degraded.", {"x-envoy-degraded" => "true"}]
44
+ }
45
+
46
+ def perform_health_check
47
+ status = @health_checker.check_health
48
+ @logger.info status.to_loggable_description
49
+
50
+ status, message, headers = RESPONSES_BY_HEALTH_STATUS_CATEGORY.fetch(status.category)
51
+
52
+ GraphQL::HTTPResponse.new(
53
+ status_code: status,
54
+ headers: headers.merge("Content-Type" => "text/plain"),
55
+ body: message
56
+ )
57
+ end
58
+ end
59
+ end
60
+ end
61
+ end
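The request-matching rule used by the decorator above (a GET request whose URL path contains the configured segment) can be tried in isolation with the sketch below. `health_check_request?` is a hypothetical helper written for this example; only the `URI` and `String` calls are taken from the source.

```ruby
require "uri"

# Hypothetical helper mirroring the decorator's matching rule:
# the configured segment is normalized by stripping one leading and one
# trailing slash, then compared against each segment of the URL path.
def health_check_request?(http_method, url, configured_segment)
  segment = configured_segment.delete_prefix("/").delete_suffix("/")
  http_method == :get && URI(url).path.split("/").include?(segment)
end

puts health_check_request?(:get, "https://example.com/graphql/_status", "/_status")
puts health_check_request?(:post, "https://example.com/graphql/_status", "/_status")
puts health_check_request?(:get, "https://example.com/graphql", "/_status")
```

Note that the segment may appear anywhere in the path, so a URL such as `/_status/extra` would also match under this rule, while `/my_status` would not, because whole path segments are compared.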
@@ -0,0 +1,43 @@
1
+ # Copyright 2024 Block, Inc.
2
+ #
3
+ # Use of this source code is governed by an MIT-style
4
+ # license that can be found in the LICENSE file or at
5
+ # https://opensource.org/licenses/MIT.
6
+ #
7
+ # frozen_string_literal: true
8
+
9
+ require "elastic_graph/error"
10
+ require "elastic_graph/health_check/envoy_extension/graphql_http_endpoint_decorator"
11
+ require "elastic_graph/health_check/health_checker"
12
+
13
+ module ElasticGraph
14
+ module HealthCheck
15
+ # An extension module that hooks into the HTTP endpoint to provide Envoy health checks.
16
+ module EnvoyExtension
17
+ def graphql_http_endpoint
18
+ @graphql_http_endpoint ||=
19
+ begin
20
+ http_path_segment = config.extension_settings.dig("health_check", "http_path_segment")
21
+ http_path_segment ||= runtime_metadata
22
+ .graphql_extension_modules
23
+ .find { |ext_mod| ext_mod.extension_class == EnvoyExtension }
24
+ &.extension_config
25
+ &.dig(:http_path_segment)
26
+
27
+ if http_path_segment.nil?
28
+ raise ElasticGraph::ConfigSettingNotSetError, "Health check `http_path_segment` is not configured. " \
29
+ "Either set under `health_check` in YAML config or pass it along if you register the `EnvoyExtension` " \
30
+ "via `register_graphql_extension`."
31
+ end
32
+
33
+ GraphQLHTTPEndpointDecorator.new(
34
+ super,
35
+ health_check_http_path_segment: http_path_segment,
36
+ health_checker: HealthChecker.build_from(self),
37
+ logger: logger
38
+ )
39
+ end
40
+ end
41
+ end
42
+ end
43
+ end
@@ -0,0 +1,221 @@
1
+ # Copyright 2024 Block, Inc.
2
+ #
3
+ # Use of this source code is governed by an MIT-style
4
+ # license that can be found in the LICENSE file or at
5
+ # https://opensource.org/licenses/MIT.
6
+ #
7
+ # frozen_string_literal: true
8
+
9
+ require "elastic_graph/error"
10
+ require "elastic_graph/health_check/config"
11
+ require "elastic_graph/health_check/health_status"
12
+ require "elastic_graph/support/threading"
13
+ require "time"
14
+
15
+ module ElasticGraph
16
+ module HealthCheck
17
+ class HealthChecker
18
+ # Static factory method that builds a HealthChecker from an ElasticGraph::GraphQL instance.
19
+ def self.build_from(graphql)
20
+ new(
21
+ schema: graphql.schema,
22
+ config: HealthCheck::Config.from_parsed_yaml(graphql.config.extension_settings),
23
+ datastore_search_router: graphql.datastore_search_router,
24
+ datastore_query_builder: graphql.datastore_query_builder,
25
+ datastore_clients_by_name: graphql.datastore_core.clients_by_name,
26
+ clock: graphql.clock,
27
+ logger: graphql.logger
28
+ )
29
+ end
30
+
31
+ def initialize(
32
+ schema:,
33
+ config:,
34
+ datastore_search_router:,
35
+ datastore_query_builder:,
36
+ datastore_clients_by_name:,
37
+ clock:,
38
+ logger:
39
+ )
40
+ @schema = schema
41
+ @datastore_search_router = datastore_search_router
42
+ @datastore_query_builder = datastore_query_builder
43
+ @datastore_clients_by_name = datastore_clients_by_name
44
+ @clock = clock
45
+ @logger = logger
46
+ @indexed_document_types_by_name = @schema.indexed_document_types.to_h { |t| [t.name.to_s, t] }
47
+
48
+ @config = validate_and_normalize_config(config)
49
+ end
50
+
51
+ def check_health
52
+ recency_queries_by_type_name = @config.data_recency_checks.to_h do |type_name, recency_config|
53
+ [type_name, build_recency_query_for(type_name, recency_config)]
54
+ end
55
+
56
+ recency_results_by_query, *cluster_healths = execute_in_parallel(
57
+ lambda { @datastore_search_router.msearch(recency_queries_by_type_name.values) },
58
+ *@config.clusters_to_consider.map do |cluster|
59
+ lambda { [cluster, @datastore_clients_by_name.fetch(cluster).get_cluster_health] }
60
+ end
61
+ )
62
+
63
+ HealthStatus.new(
64
+ cluster_health_by_name: build_cluster_health_by_name(cluster_healths.to_h),
65
+ latest_record_by_type: build_latest_record_by_type(recency_results_by_query, recency_queries_by_type_name)
66
+ )
67
+ end
68
+
69
+ private
70
+
71
+ def build_recency_query_for(type_name, recency_config)
72
+ type = @indexed_document_types_by_name.fetch(type_name)
73
+
74
+ @datastore_query_builder.new_query(
75
+ search_index_definitions: type.search_index_definitions,
76
+ filter: build_index_optimization_filter_for(recency_config),
77
+ requested_fields: ["id", recency_config.timestamp_field],
78
+ document_pagination: {first: 1},
79
+ sort: [{recency_config.timestamp_field => {"order" => "desc"}}]
80
+ )
81
+ end
82
+
83
+ # To make the recency query more optimal, we filter on the timestamp field. This can provide
84
+ # a couple optimizations:
85
+ #
86
+ # - If it's a rollover index and the timestamp is the field we use for rollover, this allows
87
+ # the ElasticGraph query engine to hit only a subset of indices for better perf.
88
+ # - We've been told (by AWS support) that sorting a larger result set is more expensive than
89
+ # a small result set (presumably larger than filtering cost) so even if we can't limit what
90
+ # indices we hit with this, it should still be helpful.
91
+ #
92
+ # However, there's a bit of a risk of not actually finding the latest record if we include
93
+ # this filter. What we have here is a compromise: we "lookback" up to 100 times the
94
+ # `expected_max_recency_seconds`. For example, if that's set at 30, we'd search the last 3000
95
+ # seconds of data, which should be plenty of lookback for most cases, while still allowing
96
+ # a filter optimization. Once the latest record is more than 100 times older than our threshold
97
+ # the exact age of it is less interesting, anyway.
98
+ def build_index_optimization_filter_for(recency_config)
99
+ lookback_timestamp = @clock.now - (recency_config.expected_max_recency_seconds * 100)
100
+ {recency_config.timestamp_field => {"gte" => lookback_timestamp.iso8601}}
101
+ end
102
+
103
+ def execute_in_parallel(*lambdas)
104
+ Support::Threading.parallel_map(lambdas) { |l| l.call }
105
+ end
106
+
107
+ def build_cluster_health_by_name(cluster_healths)
108
+ cluster_healths.transform_values do |health|
109
+ health_status_fields = DATASTORE_CLUSTER_HEALTH_FIELDS.to_h do |field_name|
110
+ [field_name, health[field_name.to_s]]
111
+ end
112
+
113
+ HealthStatus::ClusterHealth.new(**health_status_fields)
114
+ end
115
+ end
116
+
117
+ def build_latest_record_by_type(recency_results_by_query, recency_queries_by_type_name)
118
+ recency_queries_by_type_name.to_h do |type_name, query|
119
+ config = @config.data_recency_checks.fetch(type_name)
120
+
121
+ latest_record = if (latest_doc = recency_results_by_query.fetch(query).first)
122
+ timestamp = ::Time.iso8601(latest_doc.fetch(config.timestamp_field))
123
+
124
+ HealthStatus::LatestRecord.new(
125
+ id: latest_doc.id,
126
+ timestamp: timestamp,
127
+ seconds_newer_than_required: timestamp - (@clock.now - config.expected_max_recency_seconds)
128
+ )
129
+ end
130
+
131
+ [type_name, latest_record]
132
+ end
133
+ end
134
+
135
+ def validate_and_normalize_config(config)
136
+ unrecognized_cluster_names = config.clusters_to_consider - all_known_clusters
137
+
138
+ # @type var errors: ::Array[::String]
139
+ errors = []
140
+
141
+ if unrecognized_cluster_names.any?
142
+ errors << "`health_check.clusters_to_consider` contains " \
143
+ "unrecognized cluster names: #{unrecognized_cluster_names.join(", ")}"
144
+ end
145
+
146
+ # Here, we determine which of the specified `clusters_to_consider` are actually available for datastore health checks (green/yellow/red).
147
+ # Before partitioning, we remove `unrecognized_cluster_names` as those will be reported through a separate error mechanism (above).
148
+ #
149
+ # Below, `available_clusters_to_consider` will replace `clusters_to_consider` in the returned `Config` instance.
150
+ available_clusters_to_consider, unavailable_clusters_to_consider =
151
+ (config.clusters_to_consider - unrecognized_cluster_names).partition { |it| @datastore_clients_by_name.key?(it) }
152
+
153
+ if unavailable_clusters_to_consider.any?
154
+ @logger.warn("#{unavailable_clusters_to_consider.length} cluster(s) were unavailable for health-checking: #{unavailable_clusters_to_consider.join(", ")}")
155
+ end
156
+
157
+ valid_type_names, invalid_type_names = config
158
+ .data_recency_checks.keys
159
+ .partition { |type| @indexed_document_types_by_name.key?(type) }
160
+
161
+ if invalid_type_names.any?
162
+ errors << "Some `health_check.data_recency_checks` types are not recognized indexed types: " \
163
+ "#{invalid_type_names.join(", ")}"
164
+ end
165
+
166
+ # It is possible to configure a GraphQL endpoint that has a healthcheck set up on type A, but doesn't actually
167
+ # have access to the datastore cluster that backs type A. In that case, we want to skip the health check - if the endpoint
168
+ # can't access type A, its health (or unhealth) is immaterial.
169
+ #
170
+ # So below, filter to types that have all of their datastore clusters available for querying.
171
+ available_type_names, unavailable_type_names = valid_type_names.partition do |type_name|
172
+ @indexed_document_types_by_name[type_name].search_index_definitions.all? do |search_index_definition|
173
+ @datastore_clients_by_name.key?(search_index_definition.cluster_to_query.to_s)
174
+ end
175
+ end
176
+
177
+ if unavailable_type_names.any?
178
+ @logger.warn("#{unavailable_type_names.length} type(s) were unavailable for health-checking: #{unavailable_type_names.join(", ")}")
179
+ end
180
+
181
+ # @type var invalid_timestamp_fields_by_type: ::Hash[::String, ::String]
182
+ invalid_timestamp_fields_by_type = {}
183
+ # @type var normalized_data_recency_checks: ::Hash[::String, Config::DataRecencyCheck]
184
+ normalized_data_recency_checks = {}
185
+
186
+ available_type_names.each do |type|
187
+ check = config.data_recency_checks.fetch(type)
188
+ field = @indexed_document_types_by_name
189
+ .fetch(type)
190
+ .fields_by_name[check.timestamp_field]
191
+
192
+ if field&.type&.unwrap_fully&.name.to_s == "DateTime"
193
+ # Convert the config so that we have a reference to the index field name.
194
+ normalized_data_recency_checks[type] = check.with(timestamp_field: field.name_in_index.to_s)
195
+ else
196
+ invalid_timestamp_fields_by_type[type] = check.timestamp_field
197
+ end
198
+ end
199
+
200
+ if invalid_timestamp_fields_by_type.any?
201
+ errors << "Some `health_check.data_recency_checks` entries have invalid timestamp fields: " \
202
+ "#{invalid_timestamp_fields_by_type.map { |k, v| "#{k} (#{v})" }.join(", ")}"
203
+ end
204
+
205
+ raise ConfigError, errors.join("\n\n") unless errors.empty?
206
+ config.with(
207
+ data_recency_checks: normalized_data_recency_checks,
208
+ clusters_to_consider: available_clusters_to_consider
209
+ )
210
+ end
211
+
212
+ def all_known_clusters
213
+ @all_known_clusters ||= @indexed_document_types_by_name.flat_map do |_, index_type|
214
+ index_type.search_index_definitions.flat_map do |it|
215
+ [it.cluster_to_query] + it.clusters_to_index_into
216
+ end
217
+ end + @datastore_clients_by_name.keys
218
+ end
219
+ end
220
+ end
221
+ end
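The lookback arithmetic described in the `build_index_optimization_filter_for` comment can be checked with a fixed clock. The sketch below is standalone and assumes nothing beyond Ruby's standard library; `LOOKBACK_MULTIPLIER` and `lookback_filter` are names introduced here, and `created_at` is just an example field name.

```ruby
require "time"

# The source looks back up to 100 times `expected_max_recency_seconds`.
LOOKBACK_MULTIPLIER = 100

# Builds a filter hash of the same shape as the one in the source:
# only records at or after the lookback timestamp are considered.
def lookback_filter(now, timestamp_field, expected_max_recency_seconds)
  lookback_timestamp = now - (expected_max_recency_seconds * LOOKBACK_MULTIPLIER)
  {timestamp_field => {"gte" => lookback_timestamp.iso8601}}
end

now = Time.utc(2024, 8, 27, 12, 0, 0)

# With a 30 second threshold, the filter covers the last 3000 seconds (50 minutes).
p lookback_filter(now, "created_at", 30)
```

With the clock fixed at 12:00:00 UTC, the filter's lower bound lands at 11:10:00 UTC, which matches the "30 seconds becomes 3000 seconds" example in the comment.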
@@ -0,0 +1,90 @@
1
+ # Copyright 2024 Block, Inc.
2
+ #
3
+ # Use of this source code is governed by an MIT-style
4
+ # license that can be found in the LICENSE file or at
5
+ # https://opensource.org/licenses/MIT.
6
+ #
7
+ # frozen_string_literal: true
8
+
9
+ require "elastic_graph/health_check/constants"
10
+
11
+ module ElasticGraph
12
+ module HealthCheck
13
+ # Encapsulates all of the status information for an ElasticGraph GraphQL endpoint.
14
+ # Computes a `category` for the status of the ElasticGraph endpoint.
15
+ #
16
+ # - unhealthy: the endpoint should not be used
17
+ # - degraded: the endpoint can be used, but prefer a healthy endpoint over it
18
+ # - healthy: the endpoint should be used
19
+ class HealthStatus < ::Data.define(:cluster_health_by_name, :latest_record_by_type, :category)
20
+ def initialize(cluster_health_by_name:, latest_record_by_type:)
21
+ super(
22
+ cluster_health_by_name: cluster_health_by_name,
23
+ latest_record_by_type: latest_record_by_type,
24
+ category: compute_category(cluster_health_by_name, latest_record_by_type)
25
+ )
26
+ end
27
+
28
+ def to_loggable_description
29
+ latest_record_descriptions = latest_record_by_type
30
+ .sort_by(&:first) # sort by type name
31
+ .map { |type, record| record&.to_loggable_description(type) || "Latest #{type} (missing)" }
32
+ .map { |description| "- #{description}" }
33
+
34
+ cluster_health_descriptions = cluster_health_by_name
35
+ .sort_by(&:first) # sort by cluster name
36
+ .map { |name, health| "\n- #{health.to_loggable_description(name)}" }
37
+
38
+ <<~EOS.strip.gsub("\n\n\n", "\n")
39
+ HealthStatus: #{category} (checked #{cluster_health_by_name.size} clusters, #{latest_record_by_type.size} latest records)
40
+ #{latest_record_descriptions.join("\n")}
41
+ #{cluster_health_descriptions.join("\n")}
42
+ EOS
43
+ end
44
+
45
+ private
46
+
47
+ def compute_category(cluster_health_by_name, latest_record_by_type)
48
+ cluster_statuses = cluster_health_by_name.values.map(&:status)
49
+ return :unhealthy if cluster_statuses.include?("red")
50
+
51
+ return :degraded if cluster_statuses.include?("yellow")
52
+ return :degraded if latest_record_by_type.values.any? { |v| v.nil? || v.too_old? }
53
+
54
+ :healthy
55
+ end
56
+
57
+ # Encapsulates the status information for a single datastore cluster.
58
+ ClusterHealth = ::Data.define(*DATASTORE_CLUSTER_HEALTH_FIELDS.to_a) do
59
+ # @implements ClusterHealth
60
+
61
+ def to_loggable_description(name)
62
+ field_values = to_h.map { |field, value| " #{field}: #{value.inspect}" }
63
+ "#{name} cluster health (#{status}):\n#{field_values.join("\n")}"
64
+ end
65
+ end
66
+
67
+ # Encapsulates information about the latest record of a type.
68
+ LatestRecord = ::Data.define(
69
+ :id, # the id of the record
70
+ :timestamp, # the record's timestamp
71
+ :seconds_newer_than_required # the recency of the record relative to expectation; positive == more recent
72
+ ) do
73
+ # @implements LatestRecord
74
+ def to_loggable_description(type)
75
+ rounded_age = seconds_newer_than_required.round(2).abs
76
+
77
+ if too_old?
78
+ "Latest #{type} (too old): #{id} / #{timestamp.iso8601} (#{rounded_age}s too old)"
79
+ else
80
+ "Latest #{type} (recent enough): #{id} / #{timestamp.iso8601} (#{rounded_age}s newer than required)"
81
+ end
82
+ end
83
+
84
+ def too_old?
85
+ seconds_newer_than_required < 0
86
+ end
87
+ end
88
+ end
89
+ end
90
+ end
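To make the sign convention of `seconds_newer_than_required` concrete, here is a small standalone sketch using a fixed clock. The formula is the one used when `LatestRecord` is built in `HealthChecker`; the free-standing methods and the sample timestamps are made up for illustration.

```ruby
require "time"

# Same formula as in the source: positive means the record is more recent
# than required, negative means it is older than the allowed threshold.
def seconds_newer_than_required(record_timestamp, now, expected_max_recency_seconds)
  record_timestamp - (now - expected_max_recency_seconds)
end

# Mirrors `LatestRecord#too_old?`.
def too_old?(seconds_newer)
  seconds_newer < 0
end

now = Time.utc(2024, 8, 27, 12, 0, 0)
fresh = Time.iso8601("2024-08-27T11:59:50Z") # 10 seconds old
stale = Time.iso8601("2024-08-27T11:59:00Z") # 60 seconds old

# With a 30 second threshold, the fresh record has 20 seconds to spare
# and the stale record is 30 seconds past the threshold.
puts seconds_newer_than_required(fresh, now, 30)
puts seconds_newer_than_required(stale, now, 30)
puts too_old?(seconds_newer_than_required(stale, now, 30))
```

A missing record (no document found within the lookback window) is handled separately in `compute_category`, where a `nil` latest record also yields `degraded`.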
metadata ADDED
@@ -0,0 +1,428 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: elasticgraph-health_check
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.18.0.0
5
+ platform: ruby
6
+ authors:
7
+ - Myron Marston
8
+ autorequire:
9
+ bindir: exe
10
+ cert_chain: []
11
+ date: 2024-08-27 00:00:00.000000000 Z
12
+ dependencies:
13
+ - !ruby/object:Gem::Dependency
14
+ name: rubocop-factory_bot
15
+ requirement: !ruby/object:Gem::Requirement
16
+ requirements:
17
+ - - "~>"
18
+ - !ruby/object:Gem::Version
19
+ version: '2.26'
20
+ type: :development
21
+ prerelease: false
22
+ version_requirements: !ruby/object:Gem::Requirement
23
+ requirements:
24
+ - - "~>"
25
+ - !ruby/object:Gem::Version
26
+ version: '2.26'
27
+ - !ruby/object:Gem::Dependency
28
+ name: rubocop-rake
29
+ requirement: !ruby/object:Gem::Requirement
30
+ requirements:
31
+ - - "~>"
32
+ - !ruby/object:Gem::Version
33
+ version: '0.6'
34
+ type: :development
35
+ prerelease: false
36
+ version_requirements: !ruby/object:Gem::Requirement
37
+ requirements:
38
+ - - "~>"
39
+ - !ruby/object:Gem::Version
40
+ version: '0.6'
41
+ - !ruby/object:Gem::Dependency
42
+ name: rubocop-rspec
43
+ requirement: !ruby/object:Gem::Requirement
44
+ requirements:
45
+ - - "~>"
46
+ - !ruby/object:Gem::Version
47
+ version: '3.0'
48
+ type: :development
49
+ prerelease: false
50
+ version_requirements: !ruby/object:Gem::Requirement
51
+ requirements:
52
+ - - "~>"
53
+ - !ruby/object:Gem::Version
54
+ version: '3.0'
55
+ - !ruby/object:Gem::Dependency
56
+ name: standard
57
+ requirement: !ruby/object:Gem::Requirement
58
+ requirements:
59
+ - - "~>"
60
+ - !ruby/object:Gem::Version
61
+ version: 1.39.0
62
+ type: :development
63
+ prerelease: false
64
+ version_requirements: !ruby/object:Gem::Requirement
65
+ requirements:
66
+ - - "~>"
67
+ - !ruby/object:Gem::Version
68
+ version: 1.39.0
69
+ - !ruby/object:Gem::Dependency
70
+ name: steep
71
+ requirement: !ruby/object:Gem::Requirement
72
+ requirements:
73
+ - - "~>"
74
+ - !ruby/object:Gem::Version
75
+ version: '1.7'
76
+ type: :development
77
+ prerelease: false
78
+ version_requirements: !ruby/object:Gem::Requirement
79
+ requirements:
80
+ - - "~>"
81
+ - !ruby/object:Gem::Version
82
+ version: '1.7'
83
+ - !ruby/object:Gem::Dependency
+ name: coderay
+ requirement: !ruby/object:Gem::Requirement
+ requirements:
+ - - "~>"
+ - !ruby/object:Gem::Version
+ version: '1.1'
+ type: :development
+ prerelease: false
+ version_requirements: !ruby/object:Gem::Requirement
+ requirements:
+ - - "~>"
+ - !ruby/object:Gem::Version
+ version: '1.1'
+ - !ruby/object:Gem::Dependency
+ name: flatware-rspec
+ requirement: !ruby/object:Gem::Requirement
+ requirements:
+ - - ">="
+ - !ruby/object:Gem::Version
+ version: 2.3.2
+ - - "<"
+ - !ruby/object:Gem::Version
+ version: '3.0'
+ type: :development
+ prerelease: false
+ version_requirements: !ruby/object:Gem::Requirement
+ requirements:
+ - - ">="
+ - !ruby/object:Gem::Version
+ version: 2.3.2
+ - - "<"
+ - !ruby/object:Gem::Version
+ version: '3.0'
+ - !ruby/object:Gem::Dependency
+ name: rspec
+ requirement: !ruby/object:Gem::Requirement
+ requirements:
+ - - "~>"
+ - !ruby/object:Gem::Version
+ version: '3.13'
+ type: :development
+ prerelease: false
+ version_requirements: !ruby/object:Gem::Requirement
+ requirements:
+ - - "~>"
+ - !ruby/object:Gem::Version
+ version: '3.13'
+ - !ruby/object:Gem::Dependency
+ name: super_diff
+ requirement: !ruby/object:Gem::Requirement
+ requirements:
+ - - ">="
+ - !ruby/object:Gem::Version
+ version: 0.12.1
+ type: :development
+ prerelease: false
+ version_requirements: !ruby/object:Gem::Requirement
+ requirements:
+ - - ">="
+ - !ruby/object:Gem::Version
+ version: 0.12.1
+ - !ruby/object:Gem::Dependency
+ name: simplecov
+ requirement: !ruby/object:Gem::Requirement
+ requirements:
+ - - "~>"
+ - !ruby/object:Gem::Version
+ version: '0.22'
+ type: :development
+ prerelease: false
+ version_requirements: !ruby/object:Gem::Requirement
+ requirements:
+ - - "~>"
+ - !ruby/object:Gem::Version
+ version: '0.22'
+ - !ruby/object:Gem::Dependency
+ name: simplecov-console
+ requirement: !ruby/object:Gem::Requirement
+ requirements:
+ - - ">="
+ - !ruby/object:Gem::Version
+ version: 0.9.1
+ - - "<"
+ - !ruby/object:Gem::Version
+ version: '1.0'
+ type: :development
+ prerelease: false
+ version_requirements: !ruby/object:Gem::Requirement
+ requirements:
+ - - ">="
+ - !ruby/object:Gem::Version
+ version: 0.9.1
+ - - "<"
+ - !ruby/object:Gem::Version
+ version: '1.0'
+ - !ruby/object:Gem::Dependency
+ name: httpx
+ requirement: !ruby/object:Gem::Requirement
+ requirements:
+ - - ">="
+ - !ruby/object:Gem::Version
+ version: 1.2.6
+ - - "<"
+ - !ruby/object:Gem::Version
+ version: '2.0'
+ type: :development
+ prerelease: false
+ version_requirements: !ruby/object:Gem::Requirement
+ requirements:
+ - - ">="
+ - !ruby/object:Gem::Version
+ version: 1.2.6
+ - - "<"
+ - !ruby/object:Gem::Version
+ version: '2.0'
+ - !ruby/object:Gem::Dependency
+ name: method_source
+ requirement: !ruby/object:Gem::Requirement
+ requirements:
+ - - "~>"
+ - !ruby/object:Gem::Version
+ version: '1.1'
+ type: :development
+ prerelease: false
+ version_requirements: !ruby/object:Gem::Requirement
+ requirements:
+ - - "~>"
+ - !ruby/object:Gem::Version
+ version: '1.1'
+ - !ruby/object:Gem::Dependency
+ name: rspec-retry
+ requirement: !ruby/object:Gem::Requirement
+ requirements:
+ - - "~>"
+ - !ruby/object:Gem::Version
+ version: '0.6'
+ type: :development
+ prerelease: false
+ version_requirements: !ruby/object:Gem::Requirement
+ requirements:
+ - - "~>"
+ - !ruby/object:Gem::Version
+ version: '0.6'
+ - !ruby/object:Gem::Dependency
+ name: vcr
+ requirement: !ruby/object:Gem::Requirement
+ requirements:
+ - - ">="
+ - !ruby/object:Gem::Version
+ version: 6.3.1
+ - - "<"
+ - !ruby/object:Gem::Version
+ version: 7.0.0
+ type: :development
+ prerelease: false
+ version_requirements: !ruby/object:Gem::Requirement
+ requirements:
+ - - ">="
+ - !ruby/object:Gem::Version
+ version: 6.3.1
+ - - "<"
+ - !ruby/object:Gem::Version
+ version: 7.0.0
+ - !ruby/object:Gem::Dependency
+ name: factory_bot
+ requirement: !ruby/object:Gem::Requirement
+ requirements:
+ - - "~>"
+ - !ruby/object:Gem::Version
+ version: '6.4'
+ type: :development
+ prerelease: false
+ version_requirements: !ruby/object:Gem::Requirement
+ requirements:
+ - - "~>"
+ - !ruby/object:Gem::Version
+ version: '6.4'
+ - !ruby/object:Gem::Dependency
+ name: faker
+ requirement: !ruby/object:Gem::Requirement
+ requirements:
+ - - "~>"
+ - !ruby/object:Gem::Version
+ version: '3.4'
+ type: :development
+ prerelease: false
+ version_requirements: !ruby/object:Gem::Requirement
+ requirements:
+ - - "~>"
+ - !ruby/object:Gem::Version
+ version: '3.4'
+ - !ruby/object:Gem::Dependency
+ name: elasticgraph-datastore_core
+ requirement: !ruby/object:Gem::Requirement
+ requirements:
+ - - '='
+ - !ruby/object:Gem::Version
+ version: 0.18.0.0
+ type: :runtime
+ prerelease: false
+ version_requirements: !ruby/object:Gem::Requirement
+ requirements:
+ - - '='
+ - !ruby/object:Gem::Version
+ version: 0.18.0.0
+ - !ruby/object:Gem::Dependency
+ name: elasticgraph-graphql
+ requirement: !ruby/object:Gem::Requirement
+ requirements:
+ - - '='
+ - !ruby/object:Gem::Version
+ version: 0.18.0.0
+ type: :runtime
+ prerelease: false
+ version_requirements: !ruby/object:Gem::Requirement
+ requirements:
+ - - '='
+ - !ruby/object:Gem::Version
+ version: 0.18.0.0
+ - !ruby/object:Gem::Dependency
+ name: elasticgraph-support
+ requirement: !ruby/object:Gem::Requirement
+ requirements:
+ - - '='
+ - !ruby/object:Gem::Version
+ version: 0.18.0.0
+ type: :runtime
+ prerelease: false
+ version_requirements: !ruby/object:Gem::Requirement
+ requirements:
+ - - '='
+ - !ruby/object:Gem::Version
+ version: 0.18.0.0
+ - !ruby/object:Gem::Dependency
+ name: elasticgraph-admin
+ requirement: !ruby/object:Gem::Requirement
+ requirements:
+ - - '='
+ - !ruby/object:Gem::Version
+ version: 0.18.0.0
+ type: :development
+ prerelease: false
+ version_requirements: !ruby/object:Gem::Requirement
+ requirements:
+ - - '='
+ - !ruby/object:Gem::Version
+ version: 0.18.0.0
+ - !ruby/object:Gem::Dependency
+ name: elasticgraph-elasticsearch
+ requirement: !ruby/object:Gem::Requirement
+ requirements:
+ - - '='
+ - !ruby/object:Gem::Version
+ version: 0.18.0.0
+ type: :development
+ prerelease: false
+ version_requirements: !ruby/object:Gem::Requirement
+ requirements:
+ - - '='
+ - !ruby/object:Gem::Version
+ version: 0.18.0.0
+ - !ruby/object:Gem::Dependency
+ name: elasticgraph-indexer
+ requirement: !ruby/object:Gem::Requirement
+ requirements:
+ - - '='
+ - !ruby/object:Gem::Version
+ version: 0.18.0.0
+ type: :development
+ prerelease: false
+ version_requirements: !ruby/object:Gem::Requirement
+ requirements:
+ - - '='
+ - !ruby/object:Gem::Version
+ version: 0.18.0.0
+ - !ruby/object:Gem::Dependency
+ name: elasticgraph-opensearch
+ requirement: !ruby/object:Gem::Requirement
+ requirements:
+ - - '='
+ - !ruby/object:Gem::Version
+ version: 0.18.0.0
+ type: :development
+ prerelease: false
+ version_requirements: !ruby/object:Gem::Requirement
+ requirements:
+ - - '='
+ - !ruby/object:Gem::Version
+ version: 0.18.0.0
+ - !ruby/object:Gem::Dependency
+ name: elasticgraph-schema_definition
+ requirement: !ruby/object:Gem::Requirement
+ requirements:
+ - - '='
+ - !ruby/object:Gem::Version
+ version: 0.18.0.0
+ type: :development
+ prerelease: false
+ version_requirements: !ruby/object:Gem::Requirement
+ requirements:
+ - - '='
+ - !ruby/object:Gem::Version
+ version: 0.18.0.0
+ description:
+ email:
+ - myron@squareup.com
+ executables: []
+ extensions: []
+ extra_rdoc_files: []
+ files:
+ - LICENSE.txt
+ - README.md
+ - elasticgraph-health_check.gemspec
+ - lib/elastic_graph/health_check/config.rb
+ - lib/elastic_graph/health_check/constants.rb
+ - lib/elastic_graph/health_check/envoy_extension.rb
+ - lib/elastic_graph/health_check/envoy_extension/graphql_http_endpoint_decorator.rb
+ - lib/elastic_graph/health_check/health_checker.rb
+ - lib/elastic_graph/health_check/health_status.rb
+ homepage:
+ licenses:
+ - MIT
+ metadata:
+ gem_category: extension
+ post_install_message:
+ rdoc_options: []
+ require_paths:
+ - lib
+ required_ruby_version: !ruby/object:Gem::Requirement
+ requirements:
+ - - "~>"
+ - !ruby/object:Gem::Version
+ version: '3.2'
+ required_rubygems_version: !ruby/object:Gem::Requirement
+ requirements:
+ - - ">="
+ - !ruby/object:Gem::Version
+ version: '0'
+ requirements: []
+ rubygems_version: 3.5.9
+ signing_key:
+ specification_version: 4
+ summary: An ElasticGraph extension that provides a health check for high availability
+ deployments.
+ test_files: []