samvera-nesting_indexer 0.7.0 → 0.8.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: 3933e26eda523a5609b140bccf34d4c291669292
4
- data.tar.gz: d6d13b4ba28d3863aaaf32f40a8524de8f37ab5b
3
+ metadata.gz: 37f64367d992a4d794752c4eda559a3e066933f3
4
+ data.tar.gz: 5de4fc1861c009b962fecd391582bc2240f329b7
5
5
  SHA512:
6
- metadata.gz: fbd5fa597ab1afc270d34a8e501e39a9aa6d066f3442b87aa7f692c9bbbd8d719dcd6c08cd196e0b95a17dc13918588b33fad42b8f86e4cb65d0e25d31297f8b
7
- data.tar.gz: 164221f096a42d550d624f6658fbc0e3127560935f05684f573c53de5c7772b4b68665a5058788a32fda61c06fbaf41936da0ea3bb32505f0cc874b7a9b1b3ec
6
+ metadata.gz: dad5de0e38c405e321e7f520cb0c44137423fa942e81218ba809e4dc9dc12255b15d69edc59f1c2b8aaa866504da064c1df1d45cd059fcf3e5069f234b2d2990
7
+ data.tar.gz: 6882d1ea6b0a40a9c738a420efe54ec3a984c591d16d20247bd3135f5a919883f249e13a3965585a7be9a6793e0ee5d1e83b656b8a858f832b71f7f14e195f6c
data/README.md CHANGED
@@ -18,6 +18,10 @@ The Samvera::NestingIndexer gem is responsible for indexing the graph relationsh
18
18
 
19
19
  This is a sandbox to work through the reindexing strategy as it relates to [CurateND Collections](https://github.com/ndlib/samvera_nd/issues/420). At this point the code is separate to allow for rapid testing and prototyping (no sense spinning up SOLR and Fedora to walk an arbitrary graph).
20
20
 
21
+ ### Notation
22
+
23
+ When B is a member of A, I write `A ={ B`. When C is a member of B and B is a member of A, I chain the notation: `A ={ B ={ C`.
24
+
21
25
  ## Concepts
22
26
 
23
27
  As we are indexing objects, we have two types of documents:
@@ -36,6 +40,8 @@ See [Samvera::NestingIndexer::Documents::IndexDocument](./lib/samvera/nesting_in
36
40
 
37
41
  To reindex a single document, we leverage the [`Samvera::NestingIndexer.reindex_relationships`](./lib/samvera/nesting_indexer.rb) method.
38
42
 
43
+ To reindex all of the documents, we leverage the [`Samvera::NestingIndexer.reindex_all!`](lib/samvera/nesting_indexer.rb) method. **Warning: This is a very slow process.**
44
+
39
45
  ## Examples
40
46
 
41
47
  Given the following PreservationDocuments:
@@ -87,10 +93,7 @@ RSpec.describe MyCustomAdapter
87
93
  end
88
94
  ```
89
95
 
90
-
91
-
92
-
93
- [See CurateND for our adaptor configuration](https://github.com/ndlib/samvera_nd/blob/6fbe79c9725c0f8b4641981044ec250c5163053b/config/initializers/samvera_config.rb#L32-L35).
96
+ [See CurateND for Notre Dame's adaptor configuration](https://github.com/ndlib/samvera_nd/blob/6fbe79c9725c0f8b4641981044ec250c5163053b/config/initializers/samvera_config.rb#L32-L35).
94
97
 
95
98
  ## Considerations
96
99
 
@@ -100,3 +103,21 @@ Given a single object A, when we reindex A, we:
100
103
  * Iterate through each descendant, in a breadth-first process, to reindex it (and each descendant's descendants).
101
104
 
102
105
  This is a potentially time-consuming process and should not be run within the request cycle.
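
The breadth-first walk described above can be sketched roughly as follows. This is a minimal illustration, not the gem's implementation: `breadth_first_ids` and the `children_by_id` hash are hypothetical names, and a plain `RuntimeError` stands in for the gem's exception classes.

```ruby
# Hypothetical helper: walk a graph breadth-first, starting at start_id.
# children_by_id is an assumed Hash of id => Array of child ids.
# Each queued entry carries its remaining depth budget, so a cyclic
# graph eventually exhausts the budget instead of looping forever.
def breadth_first_ids(start_id, children_by_id, maximum_nesting_depth)
  queue = [[start_id, maximum_nesting_depth]]
  visited = []
  until queue.empty?
    id, remaining_depth = queue.shift
    raise "Exceeded maximum nesting depth at ID=#{id.inspect}" if remaining_depth <= 0
    visited << id
    children_by_id.fetch(id, []).each do |child_id|
      queue.push([child_id, remaining_depth - 1])
    end
  end
  visited
end

graph = { 'a' => %w[b c], 'b' => %w[d] }
order = breadth_first_ids('a', graph, 15)
```

Because the queue is first-in, first-out, all children of a document are visited before any grandchildren.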
106
+
107
+ ### Cycle Detections
108
+
109
+ When dealing with nested graphs, there is a danger of creating a cycle (e.g. `A ={ B ={ A`). Samvera::NestingIndexer implements two guards to short-circuit the indexing of cyclic graphs:
110
+
111
+ * Enforcing a maximum nesting depth of the graph
112
+ * Checking that an object is not its own ancestor (`Samvera::NestingIndexer::RelationshipReindexer#guard_against_possiblity_of_self_ancestry`)
113
+
114
+ The [`./spec/features/reindex_pid_and_descendants_spec.rb`](spec/features/reindex_pid_and_descendants_spec.rb) spec contains examples of this behavior.
115
+
116
+ **NOTE: These guards to prevent indexing cyclic graphs do not prevent the underlying preservation document from creating its own cyclic graph.**
117
+
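
The two guards listed above could look roughly like the sketch below. The method and class names (`guard_depth!`, `guard_self_ancestry!`, `DepthExceeded`, `SelfAncestry`) are hypothetical stand-ins, not the gem's API. Note also that this sketch compares whole path segments, whereas the reindexer in this release uses a substring check on `"#{id}/"`.

```ruby
# Hypothetical stand-ins for the gem's exception classes.
class DepthExceeded < RuntimeError; end
class SelfAncestry < RuntimeError; end

DELIMITER = '/'.freeze

# Guard 1: refuse to continue once the remaining depth budget is used up.
def guard_depth!(id, remaining_depth)
  return if remaining_depth > 0
  raise DepthExceeded, "Exceeded maximum nesting depth at ID=#{id.inspect}"
end

# Guard 2: a document is its own ancestor when its id appears as a
# non-terminal segment of one of its pathnames (e.g. "a/b/a" for id "a").
def guard_self_ancestry!(id, pathnames)
  pathnames.each do |pathname|
    ancestor_segments = pathname.split(DELIMITER)[0..-2]
    next unless ancestor_segments.include?(id)
    raise SelfAncestry, "ID=#{id.inspect} is its own ancestor in #{pathname.inspect}"
  end
end
```

For `A ={ B ={ C`, the pathname `"a/b/c"` passes the second guard for `"c"`, while `"a/b/a"` fails it for `"a"`.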
118
+ ## TODO
119
+
120
+ - [ ] Incorporate additional logging
121
+ - [ ] Build methods to allow for fanning out the reindexing. At present, when we reindex a node and its "children", we run that entire process within a single context. Likewise, we run a single process when reindexing EVERYTHING.
122
+ - [ ] Promote from [samvera-labs](https://github.com/samvera-labs) to [samvera](https://github.com/samvera) via the [promotion process](http://samvera-labs.github.io/promotion.html).
123
+ - [ ] Write adapter method to assist in guarding against self-ancestry. We could probably expose a base adapter that has the method through use of the other adapter methods.
@@ -13,11 +13,14 @@ module Samvera
13
13
  # In a perfect world we could reindex the id as well, but that is for another test.
14
14
  #
15
15
  # @param id [String] - The permanent identifier of the object that will be reindexed along with its children.
16
- # @param maximum_nesting_depth [Integer] - there to guard against cyclical graphs
16
+ # @param maximum_nesting_depth [Integer] - used to short-circuit overly deep nesting as well as prevent accidental cyclic graphs
17
+ # from creating an infinite loop.
17
18
  # @return [Boolean] - It was successful
18
- # @raise Samvera::Exceptions::CycleDetectionError - A potential cycle was detected
19
+ # @raise Samvera::Exceptions::CycleDetectionError - A possible cycle was detected
20
+ # @raise Samvera::Exceptions::ExceededMaximumNestingDepthError - We exceeded our maximum depth
21
+ # @raise Samvera::Exceptions::DocumentIsItsOwnAncestorError - A document we were about to index appeared to be its own ancestor
19
22
  def self.reindex_relationships(id:, maximum_nesting_depth: configuration.maximum_nesting_depth)
20
- RelationshipReindexer.call(id: id, maximum_nesting_depth: maximum_nesting_depth, adapter: adapter)
23
+ RelationshipReindexer.call(id: id, maximum_nesting_depth: maximum_nesting_depth, configuration: configuration)
21
24
  true
22
25
  end
23
26
 
@@ -29,14 +32,14 @@ module Samvera
29
32
 
30
33
  # @api public
31
34
  # Responsible for reindexing the entire preservation layer.
32
- # @param maximum_nesting_depth [Integer] - there to guard against cyclical graphs
35
+ # @param maximum_nesting_depth [Integer] - there to guard against cyclic graphs
33
36
  # @return [Boolean] - It was successful
34
- # @raise Samvera::Exceptions::CycleDetectionError - A potential cycle was detected
37
+ # @raise Samvera::Exceptions::ReindexingError - There was a problem reindexing the graph.
35
38
  def self.reindex_all!(maximum_nesting_depth: configuration.maximum_nesting_depth)
36
39
  # While the RepositoryReindexer is responsible for reindexing everything, I
37
40
  # want to inject the lambda that will reindex a single item.
38
41
  id_reindexer = method(:reindex_relationships)
39
- RepositoryReindexer.call(maximum_nesting_depth: maximum_nesting_depth, id_reindexer: id_reindexer, adapter: adapter)
42
+ RepositoryReindexer.call(maximum_nesting_depth: maximum_nesting_depth, id_reindexer: id_reindexer, configuration: configuration)
40
43
  true
41
44
  end
42
45
 
@@ -88,7 +88,7 @@ module Samvera
88
88
  end
89
89
 
90
90
  def find_each
91
- cache.each { |_key, document| yield(document) }
91
+ cache.each_value { |document| yield(document) }
92
92
  end
93
93
 
94
94
  def clear_cache!
@@ -61,7 +61,7 @@ if defined?(RSpec)
61
61
  end
62
62
 
63
63
  it 'expects a block' do
64
- expect(block_parameter_extracter.call(subject)).to be_present
64
+ expect(block_parameter_extracter.call(subject)).to eq([:block])
65
65
  end
66
66
  end
67
67
  describe '.each_child_document_of' do
@@ -76,13 +76,13 @@ if defined?(RSpec)
76
76
  end
77
77
 
78
78
  it 'expects a block' do
79
- expect(block_parameter_extracter.call(subject)).to be_present
79
+ expect(block_parameter_extracter.call(subject)).to eq([:block])
80
80
  end
81
81
  end
82
82
  describe '.write_document_attributes_to_index_layer' do
83
83
  subject { described_class.method(:write_document_attributes_to_index_layer) }
84
84
 
85
- it 'requires the :attributes keyword (and does not require any others)' do
85
+ it 'requires the :ancestors, :id, :parent_ids, and :pathnames keywords (and does not require any others)' do
86
86
  expect(required_keyword_parameters.call(subject)).to eq(%i(ancestors id parent_ids pathnames))
87
87
  end
88
88
 
@@ -1,5 +1,6 @@
1
1
  require 'samvera/nesting_indexer/adapters/abstract_adapter'
2
2
  require 'samvera/nesting_indexer/exceptions'
3
+ require 'logger'
3
4
 
4
5
  module Samvera
5
6
  # :nodoc:
@@ -9,11 +10,14 @@ module Samvera
9
10
  class Configuration
10
11
  DEFAULT_MAXIMUM_NESTING_DEPTH = 15
11
12
 
12
- def initialize(maximum_nesting_depth: DEFAULT_MAXIMUM_NESTING_DEPTH)
13
+ def initialize(maximum_nesting_depth: DEFAULT_MAXIMUM_NESTING_DEPTH, logger: default_logger)
13
14
  self.maximum_nesting_depth = maximum_nesting_depth
15
+ self.logger = logger
14
16
  end
15
17
 
16
- attr_reader :maximum_nesting_depth
18
+ attr_reader :maximum_nesting_depth, :logger
19
+
20
+ attr_writer :logger
17
21
 
18
22
  def maximum_nesting_depth=(input)
19
23
  @maximum_nesting_depth = input.to_i
@@ -68,6 +72,14 @@ module Samvera
68
72
  require 'samvera/nesting_indexer/adapters/in_memory_adapter'
69
73
  Adapters::InMemoryAdapter
70
74
  end
75
+
76
+ def default_logger
77
+ if defined?(Rails.logger)
78
+ Rails.logger
79
+ else
80
+ Logger.new($stdout)
81
+ end
82
+ end
71
83
  end
72
84
  private_constant :Configuration
73
85
  end
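
As an illustration of the logger injection added above, the following sketch mirrors the shape of the change with a hypothetical `SketchConfiguration` class (the gem's own `Configuration` is a private constant, so this is a stand-in rather than the real class). It assumes only Ruby's standard `Logger` and `StringIO`.

```ruby
require 'logger'
require 'stringio'

# Hypothetical stand-in for the gem's private Configuration class.
class SketchConfiguration
  attr_accessor :logger

  def initialize(logger: default_logger)
    @logger = logger
  end

  private

  # Prefer the Rails logger when Rails is loaded; otherwise log to $stdout.
  def default_logger
    if defined?(Rails.logger)
      Rails.logger
    else
      Logger.new($stdout)
    end
  end
end

# Inject a logger that writes to an in-memory buffer, e.g. for a test.
buffer = StringIO.new
config = SketchConfiguration.new(logger: Logger.new(buffer))
config.logger.debug('Starting nested indexing of ID="a"')
```

Accepting the logger as a keyword argument keeps the default behavior for most callers while letting tests or host applications substitute their own.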
@@ -3,6 +3,8 @@ require 'dry-equalizer'
3
3
  module Samvera
4
4
  module NestingIndexer
5
5
  module Documents
6
+ ANCESTOR_AND_PATHNAME_DELIMITER = '/'.freeze
7
+
6
8
  # @api public
7
9
  #
8
10
  # A simplified document that reflects the necessary attributes for re-indexing
@@ -30,9 +30,34 @@ module Samvera
30
30
  # Raised when we may have detected a cycle within the graph
31
31
  class CycleDetectionError < RuntimeError
32
32
  attr_reader :id
33
- def initialize(id)
33
+ def initialize(id:)
34
34
  @id = id
35
- super "Possible graph cycle discovered related to PID=#{id}."
35
+ super to_s
36
+ end
37
+
38
+ def to_s
39
+ "Possible graph cycle discovered related to ID=#{id.inspect}."
40
+ end
41
+ end
42
+
43
+ # Raised when we have exceeded the time to live constraint
44
+ # @see Samvera::NestingIndexer::Configuration.maximum_nesting_depth
45
+ class ExceededMaximumNestingDepthError < CycleDetectionError
46
+ def to_s
47
+ "Exceeded maximum nesting depth while indexing ID=#{id.inspect}."
48
+ end
49
+ end
50
+
51
+ # Raised when we encounter a document that is to be indexed as its own ancestor.
52
+ class DocumentIsItsOwnAncestorError < CycleDetectionError
53
+ attr_reader :pathnames
54
+ def initialize(id:, pathnames:)
55
+ super(id: id)
56
+ @pathnames = pathnames
57
+ end
58
+
59
+ def to_s
60
+ "Document with ID=#{id.inspect} is marked as its own ancestor based on the given pathnames: #{pathnames.inspect}."
36
61
  end
37
62
  end
38
63
  # A wrapper exception that includes the original exception and the id
@@ -41,7 +66,7 @@ module Samvera
41
66
  def initialize(id, original_exception)
42
67
  @id = id
43
68
  @original_exception = original_exception
44
- super "Error PID=#{id} - #{original_exception}"
69
+ super "ReindexingError on ID=#{id.inspect}\n\t#{original_exception}"
45
70
  end
46
71
  end
47
72
  end
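
Because the two new exceptions inherit from `CycleDetectionError`, existing `rescue` clauses for the parent class should keep working. The sketch below reproduces the hierarchy under a hypothetical `Sketch` namespace to show this; `describe_failure` is an illustrative helper, not part of the gem.

```ruby
module Sketch
  # Standalone copies of the exception shapes shown in the diff.
  class CycleDetectionError < RuntimeError
    attr_reader :id

    def initialize(id:)
      @id = id
      super(to_s)
    end

    def to_s
      "Possible graph cycle discovered related to ID=#{id.inspect}."
    end
  end

  class ExceededMaximumNestingDepthError < CycleDetectionError
    def to_s
      "Exceeded maximum nesting depth while indexing ID=#{id.inspect}."
    end
  end

  class DocumentIsItsOwnAncestorError < CycleDetectionError
    attr_reader :pathnames

    def initialize(id:, pathnames:)
      @pathnames = pathnames
      super(id: id)
    end

    def to_s
      "Document with ID=#{id.inspect} is marked as its own ancestor " \
        "based on the given pathnames: #{pathnames.inspect}."
    end
  end
end

# Hypothetical helper: rescuing the parent class also catches both subclasses.
def describe_failure
  yield
  'ok'
rescue Sketch::CycleDetectionError => e
  "#{e.class.name.split('::').last}: #{e.message}"
end
```

Callers that need to distinguish the two failure modes can rescue the subclasses individually before falling back to the parent.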
@@ -6,7 +6,7 @@ module Samvera
6
6
  # Establishing namespace
7
7
  module NestingIndexer
8
8
  # Responsible for reindexing the PID and its descendants
9
- # @note There is cycle detection via the TIME_TO_LIVE counter
9
+ # @note There is cycle detection via the Samvera::NestingIndexer::Configuration#maximum_nesting_depth counter
10
10
  # @api private
11
11
  class RelationshipReindexer
12
12
  # @api private
@@ -20,32 +20,38 @@ module Samvera
20
20
  end
21
21
 
22
22
  # @param id [String]
23
- # @param maximum_nesting_depth [Integer] Samvera::NestingIndexer::TIME_TO_LIVE to detect cycles in the graph
24
- # @param adapter [Samvera::NestingIndexer::Adapters::AbstractAdapter] Conforms to the Samvera::NestingIndexer::Adapters::AbstractAdapter interface
23
+ # @param maximum_nesting_depth [Integer] What is the maximum allowed depth of nesting
24
+ # @param configuration [#adapter, #logger] The :adapter conforms to the Samvera::NestingIndexer::Adapters::AbstractAdapter interface
25
+ # and the :logger conforms to Logger
25
26
  # @param queue [#shift, #push] queue
26
- def initialize(id:, maximum_nesting_depth:, adapter:, queue: [])
27
+ def initialize(id:, maximum_nesting_depth:, configuration:, queue: [], visited_ids: [])
27
28
  @id = id.to_s
28
29
  @maximum_nesting_depth = maximum_nesting_depth.to_i
29
- @adapter = adapter
30
+ @configuration = configuration
30
31
  @queue = queue
32
+ @visited_ids = visited_ids
31
33
  end
32
- attr_reader :id, :maximum_nesting_depth, :queue, :adapter
34
+ attr_reader :id, :maximum_nesting_depth
33
35
 
34
36
  # Perform a breadth-first tree traversal of the initial document and its descendants.
37
+ # rubocop:disable Metrics/AbcSize
35
38
  def call
36
- enqueue(initial_index_document, maximum_nesting_depth)
37
- processing_document = dequeue
38
- while processing_document
39
- process_a_document(processing_document)
40
- adapter.each_child_document_of(document: processing_document) { |child| enqueue(child, processing_document.maximum_nesting_depth - 1) }
39
+ wrap_logging("nested indexing of ID=#{initial_index_document.id.inspect}") do
40
+ enqueue(initial_index_document, maximum_nesting_depth)
41
41
  processing_document = dequeue
42
+ while processing_document
43
+ process_a_document(processing_document)
44
+ adapter.each_child_document_of(document: processing_document) { |child| enqueue(child, processing_document.maximum_nesting_depth - 1) }
45
+ processing_document = dequeue
46
+ end
42
47
  end
43
48
  self
44
49
  end
50
+ # rubocop:enable Metrics/AbcSize
45
51
 
46
52
  private
47
53
 
48
- attr_writer :document
54
+ attr_reader :queue, :configuration, :visited_ids
49
55
 
50
56
  def initial_index_document
51
57
  adapter.find_index_document_by(id: id)
@@ -53,6 +59,8 @@ module Samvera
53
59
 
54
60
  extend Forwardable
55
61
  def_delegator :queue, :shift, :dequeue
62
+ def_delegator :configuration, :adapter
63
+ def_delegator :configuration, :logger
56
64
 
57
65
  require 'delegate'
58
66
  # A small object to help track time to live concerns
@@ -69,16 +77,33 @@ module Samvera
69
77
  end
70
78
 
71
79
  def process_a_document(index_document)
72
- raise Exceptions::CycleDetectionError, id if index_document.maximum_nesting_depth <= 0
73
- preservation_document = adapter.find_preservation_document_by(id: index_document.id)
74
- parent_ids_and_path_and_ancestors = parent_ids_and_path_and_ancestors_for(preservation_document)
75
- adapter.write_document_attributes_to_index_layer(**parent_ids_and_path_and_ancestors)
80
+ raise Exceptions::ExceededMaximumNestingDepthError, id: id if index_document.maximum_nesting_depth <= 0
81
+ wrap_logging("indexing ID=#{index_document.id.inspect}") do
82
+ preservation_document = adapter.find_preservation_document_by(id: index_document.id)
83
+ parent_ids_and_path_and_ancestors = parent_ids_and_path_and_ancestors_for(preservation_document)
84
+ guard_against_possiblity_of_self_ancestry(index_document: index_document, pathnames: parent_ids_and_path_and_ancestors.fetch(:pathnames))
85
+ adapter.write_document_attributes_to_index_layer(**parent_ids_and_path_and_ancestors)
86
+ visited_ids << index_document.id
87
+ end
76
88
  end
77
89
 
78
90
  def parent_ids_and_path_and_ancestors_for(preservation_document)
79
91
  ParentAndPathAndAncestorsBuilder.new(preservation_document, adapter).to_hash
80
92
  end
81
93
 
94
+ def guard_against_possiblity_of_self_ancestry(index_document:, pathnames:)
95
+ pathnames.each do |pathname|
96
+ next unless pathname.include?("#{index_document.id}/")
97
+ raise Exceptions::DocumentIsItsOwnAncestorError, id: index_document.id, pathnames: pathnames
98
+ end
99
+ end
100
+
101
+ def wrap_logging(message_suffix)
102
+ logger.debug("Starting #{message_suffix}")
103
+ yield
104
+ logger.debug("Ending #{message_suffix}")
105
+ end
106
+
82
107
  # A small object that helps encapsulate the logic of building the hash of information regarding
83
108
  # the initialization of an Samvera::NestingIndexer::Documents::IndexDocument
84
109
  #
@@ -113,9 +138,9 @@ module Samvera
113
138
  def compile_one!(parent_index_document)
114
139
  @parent_ids << parent_index_document.id
115
140
  parent_index_document.pathnames.each do |pathname|
116
- @pathnames << File.join(pathname, @preservation_document.id)
117
- slugs = pathname.split("/")
118
- slugs.each_index { |i| @ancestors << slugs[0..i].join('/') }
141
+ @pathnames << "#{pathname}#{Documents::ANCESTOR_AND_PATHNAME_DELIMITER}#{@preservation_document.id}"
142
+ slugs = pathname.split(Documents::ANCESTOR_AND_PATHNAME_DELIMITER)
143
+ slugs.each_index { |i| @ancestors << slugs[0..i].join(Documents::ANCESTOR_AND_PATHNAME_DELIMITER) }
119
144
  end
120
145
  @ancestors += parent_index_document.ancestors
121
146
  end
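
To make the pathname and ancestor bookkeeping in `compile_one!` concrete, here is a small worked sketch. `ParentDoc` and `nesting_attributes_for` are hypothetical names; the use of arrays with `uniq`, and the rule that a document without parents has its own id as its only pathname, are assumptions made for illustration rather than details confirmed by this diff.

```ruby
DELIMITER = '/'.freeze

# Hypothetical, minimal stand-in for a parent's index document.
ParentDoc = Struct.new(:id, :pathnames, :ancestors)

# Build the nesting attributes of a document from its parents' index data.
def nesting_attributes_for(id, parents)
  parent_ids = []
  pathnames = []
  ancestors = []
  parents.each do |parent|
    parent_ids << parent.id
    parent.pathnames.each do |pathname|
      # Each parent pathname is extended with this document's id.
      pathnames << "#{pathname}#{DELIMITER}#{id}"
      # Every prefix of the parent's pathname is an ancestor of this document.
      slugs = pathname.split(DELIMITER)
      slugs.each_index { |i| ancestors << slugs[0..i].join(DELIMITER) }
    end
    ancestors.concat(parent.ancestors)
  end
  # Assumption: a document without parents is the root of its own path.
  pathnames << id if parents.empty?
  { id: id, parent_ids: parent_ids.uniq, pathnames: pathnames.uniq, ancestors: ancestors.uniq }
end

# For A ={ B ={ C, the parent B has pathname "a/b" and ancestor "a".
attributes = nesting_attributes_for('c', [ParentDoc.new('b', ['a/b'], ['a'])])
```

Here `attributes[:pathnames]` is `["a/b/c"]` and `attributes[:ancestors]` is `["a", "a/b"]`.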
@@ -1,3 +1,5 @@
1
+ require 'samvera/nesting_indexer/exceptions'
2
+ require 'forwardable'
1
3
  module Samvera
2
4
  # Establishing namespace
3
5
  module NestingIndexer
@@ -19,24 +21,29 @@ module Samvera
19
21
 
20
22
  # @param id_reindexer [#call] Samvera::NestingIndexer.method(:reindex_relationships) Responsible for reindexing a single object
21
23
  # @param maximum_nesting_depth [Integer] detect cycles in the graph
22
- # @param adapter [Samvera::NestingIndexer::Adapters::AbstractAdapter] Conforms to the Samvera::NestingIndexer::Adapters::AbstractAdapter interface
23
- def initialize(maximum_nesting_depth:, id_reindexer:, adapter:)
24
+ # @param configuration [#adapter, #logger] The :adapter conforms to the Samvera::NestingIndexer::Adapters::AbstractAdapter interface
25
+ # and the :logger conforms to Logger
26
+ def initialize(maximum_nesting_depth:, id_reindexer:, configuration:)
24
27
  @maximum_nesting_depth = maximum_nesting_depth.to_i
25
28
  @id_reindexer = id_reindexer
26
- @adapter = adapter
29
+ @configuration = configuration
27
30
  @processed_ids = []
28
31
  end
29
32
 
30
33
  # @todo Would it make sense to leverage an each_preservation_id instead?
31
34
  def call
32
- @adapter.each_perservation_document_id_and_parent_ids do |id, parent_ids|
35
+ adapter.each_perservation_document_id_and_parent_ids do |id, parent_ids|
33
36
  recursive_reindex(id: id, parent_ids: parent_ids, time_to_live: maximum_nesting_depth)
34
37
  end
35
38
  end
36
39
 
37
40
  private
38
41
 
39
- attr_reader :maximum_nesting_depth, :processed_ids, :id_reindexer
42
+ attr_reader :maximum_nesting_depth, :processed_ids, :id_reindexer, :configuration
43
+
44
+ extend Forwardable
45
+ def_delegator :configuration, :adapter
46
+ def_delegator :configuration, :logger
40
47
 
41
48
  # When we find a document, reindex it if it doesn't have a parent. If it has a parent, reindex the parent first.
42
49
  #
@@ -47,9 +54,9 @@ module Samvera
47
54
  # walk up the parent graph to reindex the parents before we start on the child.
48
55
  def recursive_reindex(id:, parent_ids:, time_to_live:)
49
56
  return true if processed_ids.include?(id)
50
- raise Exceptions::CycleDetectionError, id if time_to_live <= 0
57
+ raise Exceptions::ExceededMaximumNestingDepthError, id: id if time_to_live <= 0
51
58
  parent_ids.each do |parent_id|
52
- grand_parent_ids = @adapter.find_preservation_parent_ids_for(id: parent_id)
59
+ grand_parent_ids = adapter.find_preservation_parent_ids_for(id: parent_id)
53
60
  recursive_reindex(id: parent_id, parent_ids: grand_parent_ids, time_to_live: time_to_live - 1)
54
61
  end
55
62
  reindex_an_id(id)
@@ -59,6 +66,7 @@ module Samvera
59
66
  id_reindexer.call(id: id)
60
67
  processed_ids << id
61
68
  rescue StandardError => e
69
+ logger.error(e)
62
70
  raise Exceptions::ReindexingError.new(id, e)
63
71
  end
64
72
  end
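
The parent-first ordering used when reindexing the whole repository can be sketched as below. `reindex_parents_first`, `TimeToLiveExceeded`, and the `parents_by_id` hash are hypothetical names. The sketch decrements the time-to-live budget on every level of recursion, which is an assumption about the intended behavior.

```ruby
# Hypothetical stand-in for the gem's exception class.
class TimeToLiveExceeded < RuntimeError; end

# Reindex every parent of a document before the document itself.
# parents_by_id is an assumed Hash of id => Array of parent ids.
def reindex_parents_first(id, parents_by_id, time_to_live, processed_ids, &reindex)
  return if processed_ids.include?(id)
  raise TimeToLiveExceeded, "Exceeded time to live at ID=#{id.inspect}" if time_to_live <= 0
  parents_by_id.fetch(id, []).each do |parent_id|
    reindex_parents_first(parent_id, parents_by_id, time_to_live - 1, processed_ids, &reindex)
  end
  reindex.call(id)
  processed_ids << id
end

# A ={ B ={ C, visited in an arbitrary order (child first).
parents = { 'c' => ['b'], 'b' => ['a'], 'a' => [] }
reindexed = []
processed = []
parents.each_key do |id|
  reindex_parents_first(id, parents, 15, processed) { |doc_id| reindexed << doc_id }
end
```

Even though `c` is encountered first, its ancestors are reindexed before it, and each document is reindexed only once.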
@@ -1,5 +1,5 @@
1
1
  module Samvera
2
2
  module NestingIndexer
3
- VERSION = "0.7.0".freeze
3
+ VERSION = "0.8.0".freeze
4
4
  end
5
5
  end
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: samvera-nesting_indexer
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.7.0
4
+ version: 0.8.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - Jeremy Friesen
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2017-08-25 00:00:00.000000000 Z
11
+ date: 2017-11-17 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: bundler