samvera-nesting_indexer 0.7.0 → 0.8.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz: 3933e26eda523a5609b140bccf34d4c291669292
- data.tar.gz: d6d13b4ba28d3863aaaf32f40a8524de8f37ab5b
+ metadata.gz: 37f64367d992a4d794752c4eda559a3e066933f3
+ data.tar.gz: 5de4fc1861c009b962fecd391582bc2240f329b7
  SHA512:
- metadata.gz: fbd5fa597ab1afc270d34a8e501e39a9aa6d066f3442b87aa7f692c9bbbd8d719dcd6c08cd196e0b95a17dc13918588b33fad42b8f86e4cb65d0e25d31297f8b
- data.tar.gz: 164221f096a42d550d624f6658fbc0e3127560935f05684f573c53de5c7772b4b68665a5058788a32fda61c06fbaf41936da0ea3bb32505f0cc874b7a9b1b3ec
+ metadata.gz: dad5de0e38c405e321e7f520cb0c44137423fa942e81218ba809e4dc9dc12255b15d69edc59f1c2b8aaa866504da064c1df1d45cd059fcf3e5069f234b2d2990
+ data.tar.gz: 6882d1ea6b0a40a9c738a420efe54ec3a984c591d16d20247bd3135f5a919883f249e13a3965585a7be9a6793e0ee5d1e83b656b8a858f832b71f7f14e195f6c
data/README.md CHANGED
@@ -18,6 +18,10 @@ The Samvera::NestingIndexer gem is responsible for indexing the graph relationsh

  This is a sandbox to work through the reindexing strategy as it relates to [CurateND Collections](https://github.com/ndlib/samvera_nd/issues/420). At this point the code is separate to allow for rapid testing and prototyping (no sense spinning up SOLR and Fedora to walk an arbitrary graph).

+ ### Notation
+
+ When B is a member of A, I am using the `A ={ B` notation. When C is a member of B and B is a member of A, I'll chain these together `A ={ B ={ C`.
+
  ## Concepts

  As we are indexing objects, we have two types of documents:
@@ -36,6 +40,8 @@ See [Samvera::NestingIndexer::Documents::IndexDocument](./lib/samvera/nesting_in

  To reindex a single document, we leverage the [`Samvera::NestingIndexer.reindex_relationships`](./lib/samvera/nesting_indexer.rb) method.

+ To reindex all of the documents, we leverage the [`Samvera::NestingIndexer.reindex_all!`](lib/samvera/nesting_indexer.rb) method. **Warning: This is a very slow process.**
+
  ## Examples

  Given the following PreservationDocuments:
@@ -87,10 +93,7 @@ RSpec.describe MyCustomAdapter
  end
  ```

-
-
-
- [See CurateND for our adaptor configuration](https://github.com/ndlib/samvera_nd/blob/6fbe79c9725c0f8b4641981044ec250c5163053b/config/initializers/samvera_config.rb#L32-L35).
+ [See CurateND for Notre Dame's adaptor configuration](https://github.com/ndlib/samvera_nd/blob/6fbe79c9725c0f8b4641981044ec250c5163053b/config/initializers/samvera_config.rb#L32-L35).

  ## Considerations

@@ -100,3 +103,21 @@ Given a single object A, when we reindex A, we:
  * Iterate through each descendant, in a breadth-first process, to reindex it (and each descendant's descendants).

  This is a potentially time-consuming process and should not be run within the request cycle.
+
+ ### Cycle Detection
+
+ When dealing with nested graphs, there is a danger of creating a cycle (e.g. `A ={ B ={ A`). Samvera::NestingIndexer implements two guards to short-circuit the indexing of cyclic graphs:
+
+ * Enforcing a maximum nesting depth of the graph
+ * Checking that an object is not its own ancestor (`Samvera::NestingIndexer::RelationshipReindexer#guard_against_possiblity_of_self_ancestry`)
+
+ The [`./spec/features/reindex_pid_and_descendants_spec.rb`](spec/features/reindex_pid_and_descendants_spec.rb) spec contains examples of this behavior.
+
+ **NOTE: These guards stop the indexer from walking a cyclic graph indefinitely; they do not prevent the underlying preservation documents from forming a cyclic graph.**
+
+ ## TODO
+
+ - [ ] Incorporate additional logging
+ - [ ] Build methods to allow for fanning out the reindexing. At present, when we reindex a node and its "children", we run that entire process within a single context. Likewise, we run a single process when reindexing EVERYTHING.
+ - [ ] Promote from [samvera-labs](https://github.com/samvera-labs) to [samvera](https://github.com/samvera) via the [promotion process](http://samvera-labs.github.io/promotion.html).
+ - [ ] Write adapter method to assist in guarding against self-ancestry. We could probably expose a base adapter that has the method through use of the other adapter methods.
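As an illustration of the maximum-nesting-depth guard described in the README hunk above, here is a minimal, standalone Ruby sketch. It is not the gem's code: the graph is a hypothetical Hash from each ID to the IDs of its members (so `A ={ B` becomes `{ "A" => ["B"] }`), and `breadth_first_reindex` and `NestingDepthExceeded` are illustrative names. Like `RelationshipReindexer#call` in this diff, each child is enqueued with its parent's remaining depth minus one, and processing stops with an error once a document's remaining depth reaches zero.

```ruby
# Hypothetical error raised when a document's remaining depth budget runs out.
class NestingDepthExceeded < StandardError; end

# Breadth-first walk from start_id over a Hash of id => member ids.
# Each queue entry carries the depth budget left for that document.
def breadth_first_reindex(graph, start_id, maximum_nesting_depth)
  indexed = []
  queue = [[start_id, maximum_nesting_depth]]
  until queue.empty?
    id, remaining_depth = queue.shift
    if remaining_depth <= 0
      raise NestingDepthExceeded, "Exceeded maximum nesting depth while indexing ID=#{id.inspect}."
    end

    indexed << id # stand-in for writing the document to the index layer
    graph.fetch(id, []).each { |member_id| queue.push([member_id, remaining_depth - 1]) }
  end
  indexed
end

acyclic = { "A" => ["B"], "B" => ["C"] } # A ={ B ={ C
cyclic = { "A" => ["B"], "B" => ["A"] }  # A ={ B ={ A

p breadth_first_reindex(acyclic, "A", 15) # => ["A", "B", "C"]

begin
  breadth_first_reindex(cyclic, "A", 15)
rescue NestingDepthExceeded => e
  puts e.message
end
```

Without the depth budget the cyclic graph would be walked forever; with it, the walk stops as soon as the budget is exhausted, at the cost of also rejecting legitimately deep (but acyclic) nesting.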
@@ -13,11 +13,14 @@ module Samvera
  # In a perfect world we could reindex the id as well; But that is for another test.
  #
  # @param id [String] - The permanent identifier of the object that will be reindexed along with its children.
- # @param maximum_nesting_depth [Integer] - there to guard against cyclical graphs
+ # @param maximum_nesting_depth [Integer] - used to short-circuit overly deep nesting as well as prevent accidental cyclic graphs
+ # from creating an infinite loop.
  # @return [Boolean] - It was successful
- # @raise Samvera::Exceptions::CycleDetectionError - A potential cycle was detected
+ # @raise Samvera::Exceptions::CycleDetectionError - A possible cycle was detected
+ # @raise Samvera::Exceptions::ExceededMaximumNestingDepthError - We exceeded our maximum depth
+ # @raise Samvera::Exceptions::DocumentIsItsOwnAncestorError - A document we were about to index appeared to be its own ancestor
  def self.reindex_relationships(id:, maximum_nesting_depth: configuration.maximum_nesting_depth)
- RelationshipReindexer.call(id: id, maximum_nesting_depth: maximum_nesting_depth, adapter: adapter)
+ RelationshipReindexer.call(id: id, maximum_nesting_depth: maximum_nesting_depth, configuration: configuration)
  true
  end

@@ -29,14 +32,14 @@ module Samvera

  # @api public
  # Responsible for reindexing the entire preservation layer.
- # @param maximum_nesting_depth [Integer] - there to guard against cyclical graphs
+ # @param maximum_nesting_depth [Integer] - there to guard against cyclic graphs
  # @return [Boolean] - It was successful
- # @raise Samvera::Exceptions::CycleDetectionError - A potential cycle was detected
+ # @raise Samvera::Exceptions::ReindexingError - There was a problem reindexing the graph.
  def self.reindex_all!(maximum_nesting_depth: configuration.maximum_nesting_depth)
  # While the RepositoryReindexer is responsible for reindexing everything, I
  # want to inject the lambda that will reindex a single item.
  id_reindexer = method(:reindex_relationships)
- RepositoryReindexer.call(maximum_nesting_depth: maximum_nesting_depth, id_reindexer: id_reindexer, adapter: adapter)
+ RepositoryReindexer.call(maximum_nesting_depth: maximum_nesting_depth, id_reindexer: id_reindexer, configuration: configuration)
  true
  end

@@ -88,7 +88,7 @@ module Samvera
  end

  def find_each
- cache.each { |_key, document| yield(document) }
+ cache.each_value { |document| yield(document) }
  end

  def clear_cache!
@@ -61,7 +61,7 @@ if defined?(RSpec)
  end

  it 'expects a block' do
- expect(block_parameter_extracter.call(subject)).to be_present
+ expect(block_parameter_extracter.call(subject)).to eq([:block])
  end
  end
  describe '.each_child_document_of' do
@@ -76,13 +76,13 @@ if defined?(RSpec)
  end

  it 'expects a block' do
- expect(block_parameter_extracter.call(subject)).to be_present
+ expect(block_parameter_extracter.call(subject)).to eq([:block])
  end
  end
  describe '.write_document_attributes_to_index_layer' do
  subject { described_class.method(:write_document_attributes_to_index_layer) }

- it 'requires the :attributes keyword (and does not require any others)' do
+ it 'requires the :ancestors, :id, :parent_ids, and :pathnames keyword (and does not require any others)' do
  expect(required_keyword_parameters.call(subject)).to eq(%i(ancestors id parent_ids pathnames))
  end

@@ -1,5 +1,6 @@
  require 'samvera/nesting_indexer/adapters/abstract_adapter'
  require 'samvera/nesting_indexer/exceptions'
+ require 'logger'

  module Samvera
  # :nodoc:
@@ -9,11 +10,14 @@ module Samvera
  class Configuration
  DEFAULT_MAXIMUM_NESTING_DEPTH = 15

- def initialize(maximum_nesting_depth: DEFAULT_MAXIMUM_NESTING_DEPTH)
+ def initialize(maximum_nesting_depth: DEFAULT_MAXIMUM_NESTING_DEPTH, logger: default_logger)
  self.maximum_nesting_depth = maximum_nesting_depth
+ self.logger = logger
  end

- attr_reader :maximum_nesting_depth
+ attr_reader :maximum_nesting_depth, :logger
+
+ attr_writer :logger

  def maximum_nesting_depth=(input)
  @maximum_nesting_depth = input.to_i
@@ -68,6 +72,14 @@ module Samvera
  require 'samvera/nesting_indexer/adapters/in_memory_adapter'
  Adapters::InMemoryAdapter
  end
+
+ def default_logger
+ if defined?(Rails.logger)
+ Rails.logger
+ else
+ Logger.new($stdout)
+ end
+ end
  end
  private_constant :Configuration
  end
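The new `logger:` keyword defaults to `default_logger`, which prefers `Rails.logger` when it is defined and otherwise logs to `$stdout`. The following standalone sketch (the class name `ExampleConfiguration` is hypothetical) reproduces that pattern. A keyword default is evaluated in the context of the new instance, so it can call a private instance method; and the diff's `attr_reader ... :logger` plus `attr_writer :logger` is equivalent to the `attr_accessor :logger` used here.

```ruby
require 'logger'

# Sketch of the configuration pattern; not the gem's Configuration class.
class ExampleConfiguration
  DEFAULT_MAXIMUM_NESTING_DEPTH = 15

  attr_reader :maximum_nesting_depth
  attr_accessor :logger

  def initialize(maximum_nesting_depth: DEFAULT_MAXIMUM_NESTING_DEPTH, logger: default_logger)
    self.maximum_nesting_depth = maximum_nesting_depth
    self.logger = logger
  end

  # Coerce input (e.g. a String from an environment variable) to an Integer.
  def maximum_nesting_depth=(input)
    @maximum_nesting_depth = input.to_i
  end

  private

  # defined? returns nil (instead of raising NameError) when Rails is not loaded.
  def default_logger
    if defined?(Rails.logger)
      Rails.logger
    else
      Logger.new($stdout)
    end
  end
end

config = ExampleConfiguration.new(maximum_nesting_depth: "3")
config.logger.level = Logger::WARN
config.logger.debug("suppressed at WARN level")
p config.maximum_nesting_depth # => 3
```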
@@ -3,6 +3,8 @@ require 'dry-equalizer'
  module Samvera
  module NestingIndexer
  module Documents
+ ANCESTOR_AND_PATHNAME_DELIMITER = '/'.freeze
+
  # @api public
  #
  # A simplified document that reflects the necessary attributes for re-indexing
@@ -30,9 +30,34 @@ module Samvera
  # Raised when we may have detected a cycle within the graph
  class CycleDetectionError < RuntimeError
  attr_reader :id
- def initialize(id)
+ def initialize(id:)
  @id = id
- super "Possible graph cycle discovered related to PID=#{id}."
+ super to_s
+ end
+
+ def to_s
+ "Possible graph cycle discovered related to ID=#{id.inspect}."
+ end
+ end
+
+ # Raised when we have exceeded the time to live constraint
+ # @see Samvera::NestingIndexer::Configuration.maximum_nesting_depth
+ class ExceededMaximumNestingDepthError < CycleDetectionError
+ def to_s
+ "Exceeded maximum nesting depth while indexing ID=#{id.inspect}."
+ end
+ end
+
+ # Raised when we encounter a document that is to be indexed as its own ancestor.
+ class DocumentIsItsOwnAncestorError < CycleDetectionError
+ attr_reader :pathnames
+ def initialize(id:, pathnames:)
+ super(id: id)
+ @pathnames = pathnames
+ end
+
+ def to_s
+ "Document with ID=#{id.inspect} is marked as its own ancestor based on the given pathnames: #{pathnames.inspect}."
  end
  end
  # A wrapper exception that includes the original exception and the id
@@ -41,7 +66,7 @@ module Samvera
  def initialize(id, original_exception)
  @id = id
  @original_exception = original_exception
- super "Error PID=#{id} - #{original_exception}"
+ super "ReindexingError on ID=#{id.inspect}\n\t#{original_exception}"
  end
  end
  end
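With this change every cycle-related error takes keyword arguments and builds its message in `#to_s`, and both new classes inherit from `CycleDetectionError`, so a single `rescue` covers all three. The sketch below copies the class bodies into a hypothetical `ExampleExceptions` module to show that behavior in isolation. Because `Exception#message` returns the result of calling `#to_s` when it is read, `DocumentIsItsOwnAncestorError#message` includes the pathnames even though `@pathnames` is assigned after `super` runs.

```ruby
# Standalone copy of the exception hierarchy for illustration; not loaded from the gem.
module ExampleExceptions
  class CycleDetectionError < RuntimeError
    attr_reader :id

    def initialize(id:)
      @id = id
      super(to_s)
    end

    def to_s
      "Possible graph cycle discovered related to ID=#{id.inspect}."
    end
  end

  class ExceededMaximumNestingDepthError < CycleDetectionError
    def to_s
      "Exceeded maximum nesting depth while indexing ID=#{id.inspect}."
    end
  end

  class DocumentIsItsOwnAncestorError < CycleDetectionError
    attr_reader :pathnames

    def initialize(id:, pathnames:)
      super(id: id)
      @pathnames = pathnames # set after super, but #message re-reads #to_s later
    end

    def to_s
      "Document with ID=#{id.inspect} is marked as its own ancestor based on the given pathnames: #{pathnames.inspect}."
    end
  end
end

begin
  raise ExampleExceptions::DocumentIsItsOwnAncestorError.new(id: "A", pathnames: ["A/B/A"])
rescue ExampleExceptions::CycleDetectionError => e
  puts e.class
  puts e.message
end
```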
@@ -6,7 +6,7 @@ module Samvera
  # Establishing namespace
  module NestingIndexer
  # Responsible for reindexing the PID and its descendants
- # @note There is cycle detection via the TIME_TO_LIVE counter
+ # @note There is cycle detection via the Samvera::NestingIndexer::Configuration#maximum_nesting_depth counter
  # @api private
  class RelationshipReindexer
  # @api private
@@ -20,32 +20,38 @@ module Samvera
  end

  # @param id [String]
- # @param maximum_nesting_depth [Integer] Samvera::NestingIndexer::TIME_TO_LIVE to detect cycles in the graph
- # @param adapter [Samvera::NestingIndexer::Adapters::AbstractAdapter] Conforms to the Samvera::NestingIndexer::Adapters::AbstractAdapter interface
+ # @param maximum_nesting_depth [Integer] What is the maximum allowed depth of nesting
+ # @param configuration [#adapter, #logger] The :adapter conforms to the Samvera::NestingIndexer::Adapters::AbstractAdapter interface
+ # and the :logger conforms to Logger
  # @param queue [#shift, #push] queue
- def initialize(id:, maximum_nesting_depth:, adapter:, queue: [])
+ def initialize(id:, maximum_nesting_depth:, configuration:, queue: [], visited_ids: [])
  @id = id.to_s
  @maximum_nesting_depth = maximum_nesting_depth.to_i
- @adapter = adapter
+ @configuration = configuration
  @queue = queue
+ @visited_ids = visited_ids
  end
- attr_reader :id, :maximum_nesting_depth, :queue, :adapter
+ attr_reader :id, :maximum_nesting_depth

  # Perform a breadth-first tree traversal of the initial document and its descendants.
+ # rubocop:disable Metrics/AbcSize
  def call
- enqueue(initial_index_document, maximum_nesting_depth)
- processing_document = dequeue
- while processing_document
- process_a_document(processing_document)
- adapter.each_child_document_of(document: processing_document) { |child| enqueue(child, processing_document.maximum_nesting_depth - 1) }
+ wrap_logging("nested indexing of ID=#{initial_index_document.id.inspect}") do
+ enqueue(initial_index_document, maximum_nesting_depth)
  processing_document = dequeue
+ while processing_document
+ process_a_document(processing_document)
+ adapter.each_child_document_of(document: processing_document) { |child| enqueue(child, processing_document.maximum_nesting_depth - 1) }
+ processing_document = dequeue
+ end
  end
  self
  end
+ # rubocop:enable Metrics/AbcSize

  private

- attr_writer :document
+ attr_reader :queue, :configuration, :visited_ids

  def initial_index_document
  adapter.find_index_document_by(id: id)
@@ -53,6 +59,8 @@ module Samvera

  extend Forwardable
  def_delegator :queue, :shift, :dequeue
+ def_delegator :configuration, :adapter
+ def_delegator :configuration, :logger

  require 'delegate'
  # A small object to help track time to live concerns
@@ -69,16 +77,33 @@ module Samvera
  end

  def process_a_document(index_document)
- raise Exceptions::CycleDetectionError, id if index_document.maximum_nesting_depth <= 0
- preservation_document = adapter.find_preservation_document_by(id: index_document.id)
- parent_ids_and_path_and_ancestors = parent_ids_and_path_and_ancestors_for(preservation_document)
- adapter.write_document_attributes_to_index_layer(**parent_ids_and_path_and_ancestors)
+ raise Exceptions::ExceededMaximumNestingDepthError, id: id if index_document.maximum_nesting_depth <= 0
+ wrap_logging("indexing ID=#{index_document.id.inspect}") do
+ preservation_document = adapter.find_preservation_document_by(id: index_document.id)
+ parent_ids_and_path_and_ancestors = parent_ids_and_path_and_ancestors_for(preservation_document)
+ guard_against_possiblity_of_self_ancestry(index_document: index_document, pathnames: parent_ids_and_path_and_ancestors.fetch(:pathnames))
+ adapter.write_document_attributes_to_index_layer(**parent_ids_and_path_and_ancestors)
+ visited_ids << index_document.id
+ end
  end

  def parent_ids_and_path_and_ancestors_for(preservation_document)
  ParentAndPathAndAncestorsBuilder.new(preservation_document, adapter).to_hash
  end

+ def guard_against_possiblity_of_self_ancestry(index_document:, pathnames:)
+ pathnames.each do |pathname|
+ next unless pathname.include?("#{index_document.id}/")
+ raise Exceptions::DocumentIsItsOwnAncestorError, id: index_document.id, pathnames: pathnames
+ end
+ end
+
+ def wrap_logging(message_suffix)
+ logger.debug("Starting #{message_suffix}")
+ yield
+ logger.debug("Ending #{message_suffix}")
+ end
+
  # A small object that helps encapsulate the logic of building the hash of information regarding
  # the initialization of an Samvera::NestingIndexer::Documents::IndexDocument
  #
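`guard_against_possiblity_of_self_ancestry` flags a document when any of its new pathnames contains `"#{id}/"`. A plain substring test can also match when one ID ends with another: the pathname `AB/B` contains `B/`, and `11/1` contains `1/`. The sketch below is a hypothetical segment-based alternative, not the gem's code: it splits each pathname on the `/` delimiter and compares whole IDs. It also shows a variant of `wrap_logging` that uses `ensure`, so the "Ending" line is logged even when the block raises (the diff's version skips that line in that case). `self_ancestor?` and the `DELIMITER` constant are illustrative names.

```ruby
require 'logger'
require 'stringio'

DELIMITER = '/'.freeze

# Hypothetical segment-based self-ancestry check: a document is its own
# ancestor if its ID appears among the segments before the final one.
def self_ancestor?(id, pathnames)
  pathnames.any? do |pathname|
    ancestor_segments = pathname.split(DELIMITER)[0...-1] # drop the document's own ID
    ancestor_segments.include?(id)
  end
end

# Variant of wrap_logging: `ensure` writes the "Ending" line even if the block raises.
def wrap_logging(logger, message_suffix)
  logger.debug("Starting #{message_suffix}")
  yield
ensure
  logger.debug("Ending #{message_suffix}")
end

p self_ancestor?("A", ["A/B/A"]) # => true
p self_ancestor?("B", ["AB/B"])  # => false, whereas "AB/B".include?("B/") is true

log = StringIO.new
logger = Logger.new(log) # default level is DEBUG
begin
  wrap_logging(logger, 'indexing ID="A"') { raise ArgumentError, "boom" }
rescue ArgumentError
  nil
end
p log.string.include?('Ending indexing ID="A"') # => true
```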
@@ -113,9 +138,9 @@ module Samvera
  def compile_one!(parent_index_document)
  @parent_ids << parent_index_document.id
  parent_index_document.pathnames.each do |pathname|
- @pathnames << File.join(pathname, @preservation_document.id)
- slugs = pathname.split("/")
- slugs.each_index { |i| @ancestors << slugs[0..i].join('/') }
+ @pathnames << "#{pathname}#{Documents::ANCESTOR_AND_PATHNAME_DELIMITER}#{@preservation_document.id}"
+ slugs = pathname.split(Documents::ANCESTOR_AND_PATHNAME_DELIMITER)
+ slugs.each_index { |i| @ancestors << slugs[0..i].join(Documents::ANCESTOR_AND_PATHNAME_DELIMITER) }
  end
  @ancestors += parent_index_document.ancestors
  end
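`compile_one!` appends the child's ID to each parent pathname and treats every prefix of a parent pathname as an ancestor; after this change it joins and splits on the single `Documents::ANCESTOR_AND_PATHNAME_DELIMITER` constant instead of `File.join` and literal `'/'` strings. The sketch below works through `A ={ B ={ C` with a hypothetical `compile_pathnames_and_ancestors` helper and a `Struct` stand-in for the parent index document. The final `.uniq` is this sketch's own choice and may not match what the gem's builder returns.

```ruby
ANCESTOR_AND_PATHNAME_DELIMITER = '/'.freeze

# Hypothetical stand-in for a parent's index document.
ParentIndexDocument = Struct.new(:id, :pathnames, :ancestors)

# Mirrors the per-parent loop in compile_one!: extend each parent pathname with
# the child's ID and record every prefix of the parent pathname as an ancestor.
def compile_pathnames_and_ancestors(child_id, parent_index_documents)
  parent_ids = []
  pathnames = []
  ancestors = []
  parent_index_documents.each do |parent|
    parent_ids << parent.id
    parent.pathnames.each do |pathname|
      pathnames << "#{pathname}#{ANCESTOR_AND_PATHNAME_DELIMITER}#{child_id}"
      slugs = pathname.split(ANCESTOR_AND_PATHNAME_DELIMITER)
      slugs.each_index { |i| ancestors << slugs[0..i].join(ANCESTOR_AND_PATHNAME_DELIMITER) }
    end
    ancestors += parent.ancestors
  end
  # De-duplicating ancestors is an assumption of this sketch.
  { id: child_id, parent_ids: parent_ids, pathnames: pathnames, ancestors: ancestors.uniq }
end

# A ={ B ={ C: B is indexed with pathname "A/B" and ancestor "A".
b = ParentIndexDocument.new("B", ["A/B"], ["A"])
result = compile_pathnames_and_ancestors("C", [b])
p result[:pathnames] # => ["A/B/C"]
p result[:ancestors] # => ["A", "A/B"]
```

Using one constant for both building and splitting pathnames keeps the two operations in agreement if the delimiter ever changes.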
@@ -1,3 +1,5 @@
+ require 'samvera/nesting_indexer/exceptions'
+ require 'forwardable'
  module Samvera
  # Establishing namespace
  module NestingIndexer
@@ -19,24 +21,29 @@ module Samvera

  # @param id_reindexer [#call] Samvera::NestingIndexer.method(:reindex_relationships) Responsible for reindexing a single object
  # @param maximum_nesting_depth [Integer] detect cycles in the graph
- # @param adapter [Samvera::NestingIndexer::Adapters::AbstractAdapter] Conforms to the Samvera::NestingIndexer::Adapters::AbstractAdapter interface
- def initialize(maximum_nesting_depth:, id_reindexer:, adapter:)
+ # @param configuration [#adapter, #logger] The :adapter conforms to the Samvera::NestingIndexer::Adapters::AbstractAdapter interface
+ # and the :logger conforms to Logger
+ def initialize(maximum_nesting_depth:, id_reindexer:, configuration:)
  @maximum_nesting_depth = maximum_nesting_depth.to_i
  @id_reindexer = id_reindexer
- @adapter = adapter
+ @configuration = configuration
  @processed_ids = []
  end

  # @todo Would it make sense to leverage an each_preservation_id instead?
  def call
- @adapter.each_perservation_document_id_and_parent_ids do |id, parent_ids|
+ adapter.each_perservation_document_id_and_parent_ids do |id, parent_ids|
  recursive_reindex(id: id, parent_ids: parent_ids, time_to_live: maximum_nesting_depth)
  end
  end

  private

- attr_reader :maximum_nesting_depth, :processed_ids, :id_reindexer
+ attr_reader :maximum_nesting_depth, :processed_ids, :id_reindexer, :configuration
+
+ extend Forwardable
+ def_delegator :configuration, :adapter
+ def_delegator :configuration, :logger

  # When we find a document, reindex it if it doesn't have a parent. If it has a parent, reindex the parent first.
  #
@@ -47,9 +54,9 @@ module Samvera
  # walk up the parent graph to reindex the parents before we start on the child.
  def recursive_reindex(id:, parent_ids:, time_to_live:)
  return true if processed_ids.include?(id)
- raise Exceptions::CycleDetectionError, id if time_to_live <= 0
+ raise Exceptions::ExceededMaximumNestingDepthError, id: id if time_to_live <= 0
  parent_ids.each do |parent_id|
- grand_parent_ids = @adapter.find_preservation_parent_ids_for(id: parent_id)
+ grand_parent_ids = adapter.find_preservation_parent_ids_for(id: parent_id)
  recursive_reindex(id: parent_id, parent_ids: grand_parent_ids, time_to_live: maximum_nesting_depth - 1)
  end
  reindex_an_id(id)
@@ -59,6 +66,7 @@ module Samvera
  id_reindexer.call(id: id)
  processed_ids << id
  rescue StandardError => e
+ logger.error(e)
  raise Exceptions::ReindexingError.new(id, e)
  end
  end
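`RepositoryReindexer#recursive_reindex` walks up the parent graph so that parents are reindexed before their children, skips IDs it has already processed, and now logs the underlying error before wrapping it in `ReindexingError`. The sketch below (class and parameter names such as `ParentsFirstReindexer` and `parent_ids_for:` are hypothetical) reproduces that order with an in-memory parent map. It decrements `time_to_live` at each step up the chain; the line in the diff passes `maximum_nesting_depth - 1` to every parent call, a value that does not shrink as the recursion deepens, so the decrement here is an assumption about the intended guard. As in `reindex_all!`, the single-item reindexer is injected as anything that responds to `call(id:)`, such as `method(:reindex_relationships)`.

```ruby
require 'logger'

# Hypothetical errors standing in for the gem's exception classes.
class ReindexingError < RuntimeError; end
class ExceededDepthError < RuntimeError; end

# Parents-first reindexing over an in-memory parent map (a sketch, not the gem).
class ParentsFirstReindexer
  # parent_ids_for: callable returning the parent IDs of an ID
  # id_reindexer:   anything responding to call(id:), e.g. a Method object
  def initialize(parent_ids_for:, id_reindexer:, maximum_nesting_depth:, logger: Logger.new($stdout))
    @parent_ids_for = parent_ids_for
    @id_reindexer = id_reindexer
    @maximum_nesting_depth = maximum_nesting_depth
    @logger = logger
    @processed_ids = []
  end

  def call(ids)
    ids.each { |id| recursive_reindex(id, @maximum_nesting_depth) }
    @processed_ids
  end

  private

  def recursive_reindex(id, time_to_live)
    return if @processed_ids.include?(id)
    raise ExceededDepthError, "Exceeded maximum nesting depth at ID=#{id.inspect}" if time_to_live <= 0

    # Reindex every parent (and, recursively, its parents) before this ID.
    @parent_ids_for.call(id).each { |parent_id| recursive_reindex(parent_id, time_to_live - 1) }
    reindex_an_id(id)
  end

  def reindex_an_id(id)
    @id_reindexer.call(id: id)
    @processed_ids << id
  rescue StandardError => e
    @logger.error(e) # log the original error before wrapping it
    raise ReindexingError, "ReindexingError on ID=#{id.inspect}\n\t#{e}"
  end
end

parents = { "A" => [], "B" => ["A"], "C" => ["B"] } # A ={ B ={ C
indexed = []
reindexer = ParentsFirstReindexer.new(
  parent_ids_for: ->(id) { parents.fetch(id, []) },
  id_reindexer: ->(id:) { indexed << id },
  maximum_nesting_depth: 15
)
reindexer.call(%w[C A B])
p indexed # => ["A", "B", "C"]
```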
@@ -1,5 +1,5 @@
  module Samvera
  module NestingIndexer
- VERSION = "0.7.0".freeze
+ VERSION = "0.8.0".freeze
  end
  end
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: samvera-nesting_indexer
  version: !ruby/object:Gem::Version
- version: 0.7.0
+ version: 0.8.0
  platform: ruby
  authors:
  - Jeremy Friesen
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2017-08-25 00:00:00.000000000 Z
+ date: 2017-11-17 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: bundler