RubyGems - elasticgraph-graphql - Versions diffs - 0.18.0.0 - Mend

elasticgraph-graphql 0.18.0.0

Files changed (81)

data/lib/elastic_graph/graphql/datastore_query/paginator.rb ADDED

@@ -0,0 +1,199 @@
+# Copyright 2024 Block, Inc.
+#
+# Use of this source code is governed by an MIT-style
+# license that can be found in the LICENSE file or at
+# https://opensource.org/licenses/MIT.
+#
+# frozen_string_literal: true
+require "elastic_graph/error"
+require "elastic_graph/support/memoizable_data"
+module ElasticGraph
+  class GraphQL
+    class DatastoreQuery
+      # A generic pagination implementation, designed to handle both document pagination and
+      # aggregation pagination. Not tested directly; tests drive the `Query` interface instead.
+      #
+      # Our pagination support is designed to support Facebook's Relay Cursor Connections Spec.
+      # The description of the pagination algorithm is directly implemented by this class:
+      #
+      #    https://facebook.github.io/relay/graphql/connections.htm#sec-Pagination-algorithm
+      #
+      # As described by the spec, we support 4 pagination arguments, and apply them in this order:
+      #
+      #   - `after`: items with a cursor value on or before this value are excluded
+      #   - `before`: items with a cursor value on or after this value are excluded
+      #   - `first`: after applying before/after, all but the first `N` items are excluded
+      #   - `last`: after applying before/after/first, all but the last `N` items are excluded
+      #
+      # Note that `first` is applied before `last`, meaning that when both are provided (as in
+      # `first: 10, last: 4`) it is interpreted as "the last 4 of the first 10". The Relay spec
+      # discourages clients from passing both, but servers must still support it:
+      #
+      # > Including a value for both first and last is strongly discouraged, as it is likely to lead
+      # > to confusing queries and results.
+      #
+      # For document pagination, the relay semantics are implemented on top of Elasticsearch/OpenSearch's `search_after` feature:
+      #
+      #    https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-search-after.html
+      #
+      # For aggregation pagination, the relay semantics are implemented on top of the composite aggregation
+      # pagination feature:
+      #
+      #    https://www.elastic.co/guide/en/elasticsearch/reference/7.12/search-aggregations-bucket-composite-aggregation.html#_pagination
+      #
+      # In either case, the `search_after` (or `after`) argument is directly analogous to Relay's `after`.
+      # To support the full Relay spec, we have to do some additional clever things:
+      #
+      #   - When necessary (such as for `last: 50, before: some_cursor`), we have to _reverse_ the
+      #     sort, perform the query with a size of `last`, and then reverse the returned items
+      #     to the originally requested order.
+      #   - In some cases, we have to apply `after`, `before` or `last` as a post-processing step
+      #     to the items returned by the datastore.
+      #
+      # Note, however, that the sort key data type used for these two cases is a bit different:
+      #
+      # - For document pagination, `search_after` is a list of scalar values, corresponding to the order
+      #   of `sort` clauses. That is, if we are sorting on `amount` ascending and `createdAt` descending,
+      #   then the `search_after` value (and the `sort` value of each document) will be an
+      #   `[amount, createdAt]` tuple.
+      # - For aggregation pagination, `after` (and the `key` of each aggregation bucket) is an unordered
+      #   hash of sort values. The sort field order is instead implied by the composite aggregation
+      #   `sources`.
+      class Paginator < Support::MemoizableData.define(:default_page_size, :max_page_size, :first, :after, :last, :before, :schema_element_names)
+        # These methods are provided by `Data.define`:
+        # @dynamic default_page_size, max_page_size, first, after, last, before, schema_element_names, initialize
+        def requested_page_size
+          # `+ 1` so we can tell if there are more docs for `has_next_page`/`has_previous_page`
+          # ...but only if we need to get anything at all.
+          (desired_page_size == 0) ? 0 : desired_page_size + 1
+        end
+        # Indicates if we need to search in reverse or not in order to satisfy the Relay pagination args.
+        # If searching in reverse is necessary, `process_items_and_build_page_info` will take care of
+        # reversing the reversed results back to their original order.
+        def search_in_reverse?
+          # If `first` has been provided then we _must not_ search in reverse.
+          # The relay spec requires us to apply `first` before `last`, and searching
+          # in reverse would prevent us from being able to return the first `N`.
+          return false if first_n
+          # If we do not have to return the first N results, we are free to search in
+          # reverse if needed. Either `last` or `before` requires it.
+          last_n || before
+        end
+        # The cursor values to search after (if we need to search after one at all).
+        def search_after
+          search_in_reverse? ? before : after
+        end
+        # In some cases, we're forced to search in reverse; in those cases, this is used to restore
+        # the ordering of the items to the intended order.
+        def restore_intended_item_order(items)
+          search_in_reverse? ? items.reverse : items
+        end
+        # Used for post-processing a list of items from a search result, truncating the list as needed. Truncation
+        # may be necessary because we may request an extra item as part of our pagination implementation.
+        def truncate_items(items)
+          # Remove the extra doc we requested by doing `size: size + 1`, if an extra was returned.
+          # Removing the first or last doc (as this will do) will signal to `build_page_info`
+          # that there definitely is a previous or next page.
+          # Note: we use `to_a` to satisfy steep, since `Array#[]` can return `nil`--but with the arg
+          # we pass, never does when items is non-empty, which our conditional enforces here.
+          items = items[search_in_reverse? ? 1..-1 : 0...-1].to_a if items.size > desired_page_size
+          # We can't always use `before` and `after` in the datastore query (such as when both are provided!),
+          # so here we drop items from the start that come on or before `after`, and items from the
+          # end that come on or after `before`.
+          if (after_cursor = after)
+            items = items.drop_while do |doc|
+              item_sort_values_satisfy?(yield(doc, after_cursor), :<=)
+            end
+          end
+          if (before_cursor = before)
+            items = items.take_while do |doc|
+              item_sort_values_satisfy?(yield(doc, before_cursor), :<)
+            end
+          end
+          # We are not always able to use `last` as the query `size` (such as when `first` is also provided)
+          # so here we apply `last`. If it has already been used this line will be a no-op.
+          items = (_ = items).last(last_n.to_i) if last_n
+          items
+        end
+        def paginated_from_singleton_cursor?
+          before == DecodedCursor::SINGLETON || after == DecodedCursor::SINGLETON
+        end
+        def desired_page_size
+          # The relay spec requires us to apply `first` before `last`, but if neither
+          # is provided we fall back to `default_page_size`.
+          @desired_page_size ||= [first_n || last_n || default_page_size, max_page_size].min.to_i
+        end
+        private
+        def first_n
+          @first_n ||= size_arg_value(:first, first)
+        end
+        def last_n
+          @last_n ||= size_arg_value(:last, last)
+        end
+        def size_arg_value(arg_name, value)
+          if value && value < 0
+            raise ::GraphQL::ExecutionError, "`#{schema_element_names.public_send(arg_name)}` cannot be negative, but is #{value}."
+          else
+            value
+          end
+        end
+        # A bit like `Array#<=>`, but understands ascending vs descending sorts.
+        # We can't simply use `doc_sort_values <=> cursor_sort_values` because our
+        # sort might mix ascending and descending sorts. So, we have to go value-by-value
+        # and compare each.
+        def item_sort_values_satisfy?(sort_values, comparison_operator)
+          if (first_unequal_sort_value = sort_values.find(&:unequal?))
+            # Since each subsequent sort field is a tie breaker that only gets used if two documents
+            # have the same values for all the prior sort fields, as soon as we find a sort value that
+            # is unequal we can just do the comparison based on it.
+            first_unequal_sort_value.item_satisfies_compared_to_cursor?(comparison_operator)
+          else
+            # The doc values and cursor values are all exactly equal. Return true or false on
+            # the basis of whether or not the comparison operator allows exact equality.
+            comparison_operator == :<= || comparison_operator == :>=
+          end
+        end
+        SortValue = ::Data.define(:from_item, :from_cursor, :sort_direction) do
+          # @implements SortValue
+          def unequal?
+            from_item != from_cursor
+          end
+          def item_satisfies_compared_to_cursor?(comparison_operator)
+            if from_item.nil?
+              # nil values sort first when sorting ascending, and last when sorting descending.
+              # (see `DocumentPaginator#sort` for a more thorough explanation).
+              sort_direction == :asc
+            elsif from_cursor.nil?
+              # nil values sort first when sorting ascending, and last when sorting descending.
+              # (see `DocumentPaginator#sort` for a more thorough explanation).
+              sort_direction == :desc
+            else # both `from_item` and `from_cursor` are non-nil, and can be compared.
+              result = from_item.public_send(comparison_operator, from_cursor)
+              (sort_direction == :asc) ? result : !result
+            end
+          end
+        end
+      end
+    end
+  end
+end
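
The order in which the four Relay arguments are applied (as described in the `Paginator` comments above) can be illustrated on a plain in-memory array. This is a minimal sketch, not ElasticGraph's implementation: `relay_slice` is a hypothetical helper, and it assumes each item is its own cursor (an integer) in an ascending sort.

```ruby
# Hypothetical helper (not part of the gem): applies the Relay pagination
# arguments to an already-sorted array, in the order the spec requires.
# Assumption: each item doubles as its own cursor and the sort is ascending.
def relay_slice(items, first: nil, after: nil, last: nil, before: nil)
  [[:first, first], [:last, last]].each do |name, value|
    raise ArgumentError, "`#{name}` cannot be negative, but is #{value}." if value && value < 0
  end

  # 1. `after`: exclude items on or before the cursor.
  items = items.select { |item| item > after } if after
  # 2. `before`: exclude items on or after the cursor.
  items = items.select { |item| item < before } if before
  # 3. `first`: keep only the first N of what remains.
  items = items.first(first) if first
  # 4. `last`: keep only the last N of what remains.
  items = items.last(last) if last
  items
end

all_items = (1..20).to_a
p relay_slice(all_items, first: 10, last: 4) # "the last 4 of the first 10" => [7, 8, 9, 10]
p relay_slice(all_items, after: 5, before: 9) # => [6, 7, 8]
p relay_slice(all_items, last: 3, before: 8)  # => [5, 6, 7]
```

The real class cannot filter an in-memory list like this, because the datastore only returns one page at a time; that is why it sometimes reverses the sort and requests one extra item, as the comments in the diff explain.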

data/lib/elastic_graph/graphql/datastore_query/routing_picker.rb ADDED

@@ -0,0 +1,239 @@
+# Copyright 2024 Block, Inc.
+#
+# Use of this source code is governed by an MIT-style
+# license that can be found in the LICENSE file or at
+# https://opensource.org/licenses/MIT.
+#
+# frozen_string_literal: true
+require "elastic_graph/graphql/filtering/filter_value_set_extractor"
+module ElasticGraph
+  class GraphQL
+    class DatastoreQuery
+      # Responsible for picking routing values for a specific query based on the filters.
+      class RoutingPicker
+        def initialize(schema_names:)
+          # @type var all_values_set: _RoutingValueSet
+          all_values_set = RoutingValueSet::ALL
+          @filter_value_set_extractor = Filtering::FilterValueSetExtractor.new(schema_names, all_values_set) do |operator, filter_value|
+            if operator == :equal_to_any_of
+              # This calls `.compact` to remove `nil` filter_value values
+              RoutingValueSet.of(filter_value.compact)
+            else # gt, lt, gte, lte, matches
+              # With one of these inexact/inequality operators, we don't have a way to precisely represent
+              # the set of values. Instead, we represent it with the special UnboundedWithExclusions
+              # implementation since when these operators are used the set is unbounded (there's an infinite
+              # number of values in the set) but it doesn't contain all values (it has some exclusions).
+              RoutingValueSet::UnboundedWithExclusions
+            end
+          end
+        end
+        # Given a list of `filter_hashes` and a list of `routing_field_paths`, returns a list of
+        # routing values that can safely be used to limit what index shards we search
+        # without risking missing any matching documents that could exist on other shards.
+        #
+        # If an eligible list of routing values cannot be determined, returns `nil`.
+        #
+        # Importantly, we have to be careful to not return routing values unless we are 100% sure
+        # that the set of values will route to the full set of shards on which documents matching
+        # the filters could live. If a document matching the filters lived on a shard that our
+        # search does not route to, it will not be included in the search response.
+        #
+        # Essentially, this method guarantees that the following pseudo code is always satisfied:
+        #
+        # ``` ruby
+        # if (routing_values = extract_eligible_routing_values(filter_hashes, routing_field_paths))
+        #   Datastore.all_documents_matching(filter_hashes).each do |document|
+        #     routing_field_paths.each do |field_path|
+        #       expect(routing_values).to include(document.value_at(field_path))
+        #     end
+        #   end
+        # end
+        # ```
+        def extract_eligible_routing_values(filter_hashes, routing_field_paths)
+          @filter_value_set_extractor.extract_filter_value_set(filter_hashes, routing_field_paths).to_return_value
+        end
+      end
+      class RoutingValueSet < Data.define(:type, :routing_values)
+        # @dynamic ==
+        def self.of(routing_values)
+          new(:inclusive, routing_values.to_set)
+        end
+        def self.of_all_except(routing_values)
+          new(:exclusive, routing_values.to_set)
+        end
+        ALL = of_all_except([])
+        def intersection(other_set)
+          # Here we return `self` to preserve the commutative property of `intersection`. Returning `self`
+          # here matches the behavior of `UnboundedWithExclusions.intersection`. See the comment there for
+          # rationale.
+          return self if other_set == UnboundedWithExclusions
+          # @type var other: RoutingValueSet
+          other = _ = other_set
+          if inclusive? && other.inclusive?
+            # Since both sets are inclusive, we can just delegate to `Set#intersection` here.
+            RoutingValueSet.of(routing_values.intersection(other.routing_values))
+          elsif exclusive? && other.exclusive?
+            # Since both sets are exclusive, we need to return an exclusive set of the union of the
+            # excluded values. For example, when dealing with positive integers:
+            #
+            #   s1 = RoutingValueSet.of_all_except([1, 2, 3]) # > 3
+            #   s2 = RoutingValueSet.of_all_except([3, 4, 5]) # 1, 2, > 5
+            #
+            #   s3 = s1.intersection(s2)
+            #
+            # Here s3 would be all values > 5 (the same as `RoutingValueSet.of_all_except([1, 2, 3, 4, 5])`)
+            RoutingValueSet.of_all_except(routing_values.union(other.routing_values))
+          else
+            # Since one set is inclusive and one set is exclusive, we need to return an inclusive set of
+            # `included_values - excluded_values`. For example, when dealing with positive integers:
+            #
+            #   s1 = RoutingValueSet.of([1, 2, 3]) # 1, 2, 3
+            #   s2 = RoutingValueSet.of_all_except([3, 4, 5]) # 1, 2, > 5
+            #
+            #   s3 = s1.intersection(s2)
+            #
+            # Here s3 would be just `1, 2`.
+            included_values, excluded_values = get_included_and_excluded_values(other)
+            RoutingValueSet.of(included_values - excluded_values)
+          end
+        end
+        def union(other_set)
+          # Here we return `other_set` to preserve the commutative property of `union`. Returning `other_set`
+          # here matches the behavior of `UnboundedWithExclusions.union`. See the comment there for
+          # rationale.
+          return other_set if other_set == UnboundedWithExclusions
+          # @type var other: RoutingValueSet
+          other = _ = other_set
+          if inclusive? && other.inclusive?
+            # Since both sets are inclusive, we can just delegate to `Set#union` here.
+            RoutingValueSet.of(routing_values.union(other.routing_values))
+          elsif exclusive? && other.exclusive?
+            # Since both sets are exclusive, we need to return an exclusive set of the intersection of the
+            # excluded values. For example, when dealing with positive integers:
+            #
+            #   s1 = RoutingValueSet.of_all_except([1, 2, 3]) # > 3
+            #   s2 = RoutingValueSet.of_all_except([3, 4, 5]) # 1, 2, > 5
+            #
+            #   s3 = s1.union(s2)
+            #
+            # Here s3 would be 1, 2, and all values > 3 (the same as `RoutingValueSet.of_all_except([3])`)
+            RoutingValueSet.of_all_except(routing_values.intersection(other.routing_values))
+          else
+            # Since one set is inclusive and one set is exclusive, we need to return an exclusive set of
+            # `excluded_values - included_values`. For example, when dealing with positive integers:
+            #
+            #   s1 = RoutingValueSet.of([1, 2, 3]) # 1, 2, 3
+            #   s2 = RoutingValueSet.of_all_except([3, 4, 5]) # 1, 2, > 5
+            #
+            #   s3 = s1.union(s2)
+            #
+            # Here s3 would be 1, 2, 3, > 5 (the same as `RoutingValueSet.of_all_except([4, 5])`)
+            included_values, excluded_values = get_included_and_excluded_values(other)
+            RoutingValueSet.of_all_except(excluded_values - included_values)
+          end
+        end
+        def negate
+          with(type: INVERTED_TYPES.fetch(type))
+        end
+        INVERTED_TYPES = {inclusive: :exclusive, exclusive: :inclusive}
+        def to_return_value
+          # Elasticsearch/OpenSearch have no routing value syntax to tell it to avoid searching a specific shard
+          # (and the fact that we are excluding a routing value doesn't mean that other documents that
+          # live on the same shard with different routing values can't match!) so we return `nil` to
+          # force the datastore to search all shards.
+          return nil if exclusive?
+          routing_values.to_a
+        end
+        protected
+        def inclusive?
+          type == :inclusive
+        end
+        def exclusive?
+          type == :exclusive
+        end
+        private
+        def get_included_and_excluded_values(other)
+          inclusive? ? [routing_values, other.routing_values] : [other.routing_values, routing_values]
+        end
+        # This `RoutingValueSet` implementation is used for otherwise unrepresentable cases. We use it when
+        # a filter on one of the `routing_field_paths` uses an inequality like:
+        #
+        #     {routing_field: {gt: "abc"}}
+        #
+        # In a case like that, the set is unbounded (there's an infinite number of values that are greater
+        # than `"abc"`...), but it's not `RoutingValueSet::ALL`--since it's based on an inequality, there are
+        # _some_ values that are excluded from the set. But we can't use `RoutingValueSet.of_all_except(...)`
+        # because the set of exclusions is also unbounded!
+        #
+        # When our filter value extraction results in this set, we must search all shards of the index and
+        # cannot pass any `routing` value to the datastore at all.
+        module UnboundedWithExclusions
+          # @dynamic self.==
+          def self.intersection(other)
+            # Technically, the "true" intersection would be `other - values_of(self)` but as we don't have
+            # any known values from this unbounded set, we just return `other`. It's OK to include extra values
+            # in the set (we'll search additional shards) but not OK to fail to include necessary values in
+            # the set (we'd avoid searching a shard that may have matching documents) so we err on the side of
+            # including more values.
+            other
+          end
+          def self.union(other)
+            # Since our set here is unbounded, the resulting union is also unbounded. This errs on the side
+            # of safety since this set's `to_return_value` returns `nil` to cause the datastore to search
+            # all shards.
+            self
+          end
+          def self.negate
+            # This here is the only difference in behavior of this set implementation vs `RoutingValueSet::ALL`.
+            # Whereas `ALL.negate` returns an empty set, we treat `negate` as a no-op. We do that because the
+            # negation of an inexact unbounded set is still an inexact unbounded set. While it flips which values
+            # are in or out of the set, this object is still the representation in our datamodel for that case.
+            self
+          end
+          def self.to_return_value
+            # Here we return `nil` to make sure that the datastore searches all shards, since we don't have
+            # any information we can use to safely limit what shards it searches.
+            nil
+          end
+        end
+      end
+      # `Query::RoutingPicker` exists only for use by `Query` and is effectively private.
+      private_constant :RoutingPicker
+      # `RoutingValueSet` exists only for use here and is effectively private.
+      private_constant :RoutingValueSet
+      # Steep is complaining that it can't find some `Query` methods, but they are not in this file...
+      # @dynamic aggregations, shard_routing_values, search_index_definitions, merge_with, search_index_expression
+      # @dynamic with, to_datastore_msearch_header_and_body, document_paginator
+    end
+  end
+end
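
The inclusive/exclusive set algebra described in the `RoutingValueSet` comments above can be tried out in isolation. The following is a simplified sketch under stated assumptions: `SketchSet` is a hypothetical stand-in built on `Struct`, not the gem's class, and it omits `UnboundedWithExclusions` and `negate`.

```ruby
require "set"

# Hypothetical stand-in for the `RoutingValueSet` in the diff above.
# `type` is :inclusive (only these values) or :exclusive (all values except these).
SketchSet = Struct.new(:type, :values) do
  def self.of(values)
    new(:inclusive, values.to_set)
  end

  def self.of_all_except(values)
    new(:exclusive, values.to_set)
  end

  def intersection(other)
    if type == :inclusive && other.type == :inclusive
      SketchSet.of(values & other.values)
    elsif type == :exclusive && other.type == :exclusive
      # Excluded from either side means excluded from the intersection.
      SketchSet.of_all_except(values | other.values)
    else
      included, excluded = included_and_excluded(other)
      SketchSet.of(included - excluded)
    end
  end

  def union(other)
    if type == :inclusive && other.type == :inclusive
      SketchSet.of(values | other.values)
    elsif type == :exclusive && other.type == :exclusive
      # Only values excluded by both sides stay excluded from the union.
      SketchSet.of_all_except(values & other.values)
    else
      included, excluded = included_and_excluded(other)
      SketchSet.of_all_except(excluded - included)
    end
  end

  # Only an inclusive set yields a usable routing list; `nil` means "search all shards".
  def to_return_value
    (type == :inclusive) ? values.to_a : nil
  end

  private

  def included_and_excluded(other)
    (type == :inclusive) ? [values, other.values] : [other.values, values]
  end
end

s1 = SketchSet.of([1, 2, 3])
s2 = SketchSet.of_all_except([3, 4, 5])
p s1.intersection(s2).to_return_value # => [1, 2]
p s1.union(s2).to_return_value        # => nil (exclusive result, so all shards are searched)
```

Returning `nil` for any exclusive result mirrors the safety rule stated in the source comments: excluding a routing value says nothing about which shards can be skipped.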