samvera-nesting_indexer 0.8.0 → 1.0.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz: 37f64367d992a4d794752c4eda559a3e066933f3
- data.tar.gz: 5de4fc1861c009b962fecd391582bc2240f329b7
+ metadata.gz: bfb903c39e5837da3ce7162d2f16066b23b92127
+ data.tar.gz: 250a0310e9b7564e37e2e1ff7ac33b438fc83461
  SHA512:
- metadata.gz: dad5de0e38c405e321e7f520cb0c44137423fa942e81218ba809e4dc9dc12255b15d69edc59f1c2b8aaa866504da064c1df1d45cd059fcf3e5069f234b2d2990
- data.tar.gz: 6882d1ea6b0a40a9c738a420efe54ec3a984c591d16d20247bd3135f5a919883f249e13a3965585a7be9a6793e0ee5d1e83b656b8a858f832b71f7f14e195f6c
+ metadata.gz: 884293f1ed28ee9df34329d6799cab60fb7c356a914d04221814aa88b5e83ebd89a6f76d6076a5278c4547c93ecb88b9ddc50a52a14930c0e253d8be46ffa943
+ data.tar.gz: a53e33bec810987ea4d286d4e626e44aa6be24d72271a1c83994c149d47269c7751443e32ea80c62e2a281a3791f55ebc38052fbba409df34dc8a991bc7ead5c
data/README.md CHANGED
@@ -34,7 +34,7 @@ We have four attributes to consider for indexing the graph:
34
34
  1. id - the unique identifier for a document
35
35
  2. parent_ids - the ids for all of the parents of a given document
36
36
  3. pathnames - the paths to traverse from a root document to the given document
37
- 4. ancestors - the pathnames of each of the ancestors
37
+ 4. ancestors - the pathnames to each node that is an ancestor of the given node (e.g. pathname to my parent, pathname to my grandparent)
38
38
 
39
39
  See [Samvera::NestingIndexer::Documents::IndexDocument](./lib/samvera/nesting_indexer/documents.rb) for further discussion.
40
40
 
@@ -42,6 +42,8 @@ To reindex a single document, we leverage the [`Samvera::NestingIndexer.reindex_
42
42
 
43
43
  To reindex all of the documents, we leverage the [`Samvera::NestingIndexer.reindex_all!`](lib/samvera/nesting_indexer.rb) method. **Warning: This is a very slow process.**
44
44
 
45
+ With a node's pathname(s), we are able to query what are all of my descendants (both direct and indirect) by way of the ancestors.
46
+
45
47
  ## Examples
46
48
 
47
49
  Given the following PreservationDocuments:
@@ -53,19 +55,24 @@ Given the following PreservationDocuments:
53
55
  | C | A |
54
56
  | D | A, B |
55
57
  | E | C |
58
+ | F | D |
56
59
 
57
60
  If we were to reindex the above PreservationDocuments, we will generate the following IndexDocuments:
58
61
 
- | PID | Parents | Pathnames | Ancestors |
- |-----|---------|------------|-----------|
- | A | - | [A] | [] |
- | B | - | [B] | [] |
- | C | A | [A/C] | [A] |
- | D | A, B | [A/D, B/D] | [A, B] |
- | E | C | [A/C/E] | [A/C] |
+ | PID | Parents | Pathnames | Ancestors |
+ |-----|---------|----------------|------------------|
+ | A | - | [A] | [] |
+ | B | - | [B] | [] |
+ | C | A | [A/C] | [A] |
+ | D | A, B | [A/D, B/D] | [A, B] |
+ | E | C | [A/C/E] | [A, A/C] |
+ | F | D | [A/D/F, B/D/F] | [A, A/D, B, B/D] |
66
70
 
67
71
  For more scenarios, look at the [Reindex PID and Descendants specs](./spec/features/reindex_id_and_descendants_spec.rb).
68
72
 
73
+ * Given I want to find the direct descendants of A, then I can query all nodes that have a parent of A.
74
+ * Given I want to find the direct and indirect descendants of A, then I can query all nodes that have an ancestor entry of A.
75
+
69
76
  ## Adapters
70
77
 
71
78
  An [AbstractAdapter](./lib/samvera/nesting_indexer/adapters/abstract_adapter.rb) provides the method interface for others to build against.
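
The two descendant queries described in the README hunk above can be sketched against the example IndexDocuments table. This is only an illustration: the in-memory `INDEX` hash and both method names are hypothetical and are not part of the gem, which delegates storage to an application adapter.

```ruby
# Hypothetical in-memory rows mirroring the IndexDocuments table in the README.
INDEX = {
  "A" => { parent_ids: [],         pathnames: ["A"],              ancestors: [] },
  "B" => { parent_ids: [],         pathnames: ["B"],              ancestors: [] },
  "C" => { parent_ids: ["A"],      pathnames: ["A/C"],            ancestors: ["A"] },
  "D" => { parent_ids: ["A", "B"], pathnames: ["A/D", "B/D"],     ancestors: ["A", "B"] },
  "E" => { parent_ids: ["C"],      pathnames: ["A/C/E"],          ancestors: ["A", "A/C"] },
  "F" => { parent_ids: ["D"],      pathnames: ["A/D/F", "B/D/F"], ancestors: ["A", "A/D", "B", "B/D"] }
}.freeze

# Direct descendants: every document that lists the given id as a parent.
def direct_descendants_of(id, index = INDEX)
  index.select { |_, doc| doc[:parent_ids].include?(id) }.keys
end

# Direct and indirect descendants: every document whose ancestors contain
# at least one pathname of the given document.
def all_descendants_of(id, index = INDEX)
  pathnames = index.fetch(id)[:pathnames]
  index.select { |_, doc| (doc[:ancestors] & pathnames).any? }.keys
end

direct_descendants_of("A") # => ["C", "D"]
all_descendants_of("A")    # => ["C", "D", "E", "F"]
```

In a real index layer the same two lookups would presumably become a filter on the stored parent ids field and a filter on the stored ancestors field, respectively.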
@@ -95,6 +102,14 @@ end
95
102
 
96
103
  [See CurateND for Notre Dame's adaptor configuration](https://github.com/ndlib/samvera_nd/blob/6fbe79c9725c0f8b4641981044ec250c5163053b/config/initializers/samvera_config.rb#L32-L35).
97
104
 
105
+ ### Sequence Diagram for Reindexing a Single Document
106
+
107
+ The following sequence diagram documents the interactions in [Samvera::NestingIndexer::RelationshipReindexer](lib/samvera/nesting_indexer/relationship_reindexer.rb).
108
+
109
+ ![Reindex Relationship Diagram](documentation/reindex_relationship.mermaid.jpg)
110
+
111
+ See [the text-based version of Reindex Relationship diagram](documentation/reindex_relationship.mermaid), leveraging the [Mermaid syntax](https://mermaidjs.github.io).
112
+
98
113
  ## Considerations
99
114
 
100
115
  Given a single object A, when we reindex A, we:
@@ -115,9 +130,23 @@ The [`./spec/features/reindex_pid_and_descendants_spec.rb`](spec/features/reinde
115
130
 
116
131
  **NOTE: These guards to prevent indexing cyclic graphs do not prevent the underlying preservation document from creating its own cyclic graph.**
117
132
 
133
+ #### Detecting Possible Cycles Before Indexing
134
+
135
+ Given an up-to-date index and a document, it is valid to nest the given document beneath any document that:
136
+
137
+ * Is not the given document
138
+ * Does not have one or more pathnames that includes the given document's ID
139
+
140
+ For examples of determining if we can nest a document within another document, see the [demonstration of nesting](./spec/features/demonstrating_nesting_spec.rb).
141
+
142
+ In implementations, you'll likely want to write queries that answer:
143
+
144
+ * What are the valid IDs that I can nest within?
145
+ * What are the valid IDs that I can nest within and am not already nested within?
146
+
118
147
  ## TODO
119
148
 
120
- - [ ] Incorporate additional logging
149
+ - [X] Incorporate additional logging
121
150
  - [ ] Build methods to allow for fanning out the reindexing. At present, when we reindex a node and its "children", we run that entire process within a single context. Likewise, we run a single process when reindexing EVERYTHING.
122
151
  - [ ] Promote from [samvera-labs](https://github.com/samvera-labs) to [samvera](https://github.com/samvera) via the [promotion process](http://samvera-labs.github.io/promotion.html).
123
152
  - [ ] Write adapter method to assist in guarding against self-ancestry. We could probably expose a base adapter that has the method through use of the other adapter methods.
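
The nesting rule from "Detecting Possible Cycles Before Indexing" in the README hunk above, and the two queries it suggests, can be sketched as follows. The sketch assumes `/` as the pathname delimiter (as in the README examples) and uses a hypothetical `PATHNAMES_BY_ID` hash in place of a real index; none of these method names come from the gem.

```ruby
# Assumed delimiter, taken from the README examples such as "A/C/E".
PATH_DELIMITER = "/".freeze

# Hypothetical snapshot of an up-to-date index: id => pathnames.
PATHNAMES_BY_ID = {
  "A" => ["A"],
  "B" => ["B"],
  "C" => ["A/C"],
  "D" => ["A/D", "B/D"],
  "E" => ["A/C/E"],
  "F" => ["A/D/F", "B/D/F"]
}.freeze

# A child may be nested beneath a parent when the parent is not the child
# itself and none of the parent's pathnames pass through the child's id.
def can_nest?(child_id:, parent_id:, pathnames_by_id: PATHNAMES_BY_ID)
  return false if child_id == parent_id
  pathnames_by_id.fetch(parent_id).none? do |pathname|
    pathname.split(PATH_DELIMITER).include?(child_id)
  end
end

# "What are the valid IDs that I can nest within?"
def valid_parent_ids_for(child_id, pathnames_by_id: PATHNAMES_BY_ID)
  pathnames_by_id.keys.select do |parent_id|
    can_nest?(child_id: child_id, parent_id: parent_id, pathnames_by_id: pathnames_by_id)
  end
end

# "...and am not already nested within?"
def new_valid_parent_ids_for(child_id, current_parent_ids:, pathnames_by_id: PATHNAMES_BY_ID)
  valid_parent_ids_for(child_id, pathnames_by_id: pathnames_by_id) - current_parent_ids
end

valid_parent_ids_for("D")                                  # => ["A", "B", "C", "E"]
new_valid_parent_ids_for("D", current_parent_ids: %w[A B]) # => ["C", "E"]
```

Splitting on the delimiter rather than using a substring match avoids false positives when one ID happens to be a prefix of another.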
data/Rakefile CHANGED
@@ -16,9 +16,7 @@ namespace :commitment do
16
16
  $stdout.puts "Checking commitment:code_coverage"
17
17
  coverage_percentage = JSON.parse(File.read('coverage/.last_run.json')).fetch('result').fetch('covered_percent').to_i
18
18
  goal = 100
19
- if goal > coverage_percentage
20
- abort("Code Coverage Goal Not Met:\n\t#{coverage_percentage}%\tExpected\n\t#{goal}%\tActual")
21
- end
19
+ abort("Code Coverage Goal Not Met:\n\t#{coverage_percentage}%\tExpected\n\t#{goal}%\tActual") if goal > coverage_percentage
22
20
  end
23
21
  end
24
22
 
@@ -0,0 +1,22 @@
1
+ sequenceDiagram
2
+ participant Application
3
+ participant Indexer as Samvera:: NestingIndexer
4
+ participant Adapter as Application Adapter Implementation
5
+ participant Index as Application Index Layer
6
+ participant Preservation as Application Preservation Layer
7
+
8
+ Application-->>Indexer: indexer.reindex_relationships(id:)
9
+ Indexer-->>Adapter: adapter.find_index_document_by(id:)
10
+ Adapter-->>Index: get index document
11
+ Note right of Adapter: See Samvera:: NestingIndexer:: Adapters:: AbstractAdapter for adapter implementation
12
+ Index-->>Indexer: coerce index document to indexer
13
+ Indexer-->>Indexer: enqueue index document
14
+ loop While queued documents
15
+ Indexer-->>Adapter: adapter.find_preservation_document_by(id:)
16
+ Adapter-->>Preservation: get preservation document
17
+ Preservation-->>Indexer: receive preservation document
18
+ Indexer-->>Indexer: guard_against_possiblity_of_self_ancestry
19
+ Indexer-->>Adapter: adapter.write_document_attributes_to_index_layer
20
+ Adapter-->>Index: write updated application index document
21
+ Indexer-->>Indexer: enqueue children
22
+ end
@@ -43,13 +43,24 @@ module Samvera
43
43
  end
44
44
 
45
45
  # @api public
46
+ # @deprecated Use .write_nesting_document_to_index_layer instead
46
47
  # @see README.md
47
48
  # @param id [String]
48
49
  # @param parent_ids [Array<String>]
49
50
  # @param ancestors [Array<String>]
50
51
  # @param pathnames [Array<String>]
52
+ # @param deepest_nested_depth [Integer]
51
53
  # @return Hash - the attributes written to the indexing layer
52
- def self.write_document_attributes_to_index_layer(id:, parent_ids:, ancestors:, pathnames:)
54
+ def self.write_document_attributes_to_index_layer(id:, parent_ids:, ancestors:, pathnames:, deepest_nested_depth:)
55
+ raise NotImplementedError
56
+ end
57
+
58
+ # @api public
59
+ # @since v1.0.0
60
+ # @see README.md
61
+ # @param nesting_document [Samvera::NestingIndexer::Document::IndexDocument]
62
+ # @return void
63
+ def self.write_nesting_document_to_index_layer(nesting_document:)
53
64
  raise NotImplementedError
54
65
  end
55
66
  end
@@ -59,9 +59,17 @@ module Samvera
59
59
  # @param parent_ids [Array<String>]
60
60
  # @param ancestors [Array<String>]
61
61
  # @param pathnames [Array<String>]
62
+ # @param deepest_nested_depth [Integer]
62
63
  # @return [Hash] - the attributes written to the indexing layer
63
- def self.write_document_attributes_to_index_layer(id:, parent_ids:, ancestors:, pathnames:)
64
- Index.write_document(id: id, parent_ids: parent_ids, ancestors: ancestors, pathnames: pathnames)
64
+ def self.write_document_attributes_to_index_layer(id:, parent_ids:, ancestors:, pathnames:, deepest_nested_depth:)
65
+ Index.write_document(id: id, parent_ids: parent_ids, ancestors: ancestors, pathnames: pathnames, deepest_nested_depth: deepest_nested_depth)
66
+ end
67
+
68
+ # @api public
69
+ # @see README.md
70
+ # @param nesting_document [Samvera::NestingIndexer::Documents::IndexDocument]
71
+ def self.write_nesting_document_to_index_layer(nesting_document:)
72
+ Index.write_to_storage(nesting_document)
65
73
  end
66
74
 
67
75
  # @api private
@@ -70,6 +78,12 @@ module Samvera
70
78
  Index.clear_cache!
71
79
  end
72
80
 
81
+ # @api private
82
+ # A convenience method for testing
83
+ def self.each_index_document(&block)
84
+ Index.each_index_document(&block)
85
+ end
86
+
73
87
  # @api private
74
88
  #
75
89
  # A module mixin to expose rudimentary read/write capabilities
@@ -146,12 +160,20 @@ module Samvera
146
160
  Storage.find(id)
147
161
  end
148
162
 
163
+ def self.each_index_document(&block)
164
+ Storage.find_each(&block)
165
+ end
166
+
149
167
  def self.each_child_document_of(document:, &block)
150
168
  Storage.find_children_of_id(document.id).each(&block)
151
169
  end
152
170
 
171
+ def self.write_to_storage(doc)
172
+ Storage.write(doc)
173
+ end
174
+
153
175
  def self.write_document(attributes = {})
154
- Documents::IndexDocument.new(attributes).tap { |doc| Storage.write(doc) }
176
+ Documents::IndexDocument.new(attributes).tap { |doc| write_to_storage(doc) }
155
177
  end
156
178
 
157
179
  # :nodoc:
@@ -82,8 +82,8 @@ if defined?(RSpec)
82
82
  describe '.write_document_attributes_to_index_layer' do
83
83
  subject { described_class.method(:write_document_attributes_to_index_layer) }
84
84
 
85
- it 'requires the :ancestors, :id, :parent_ids, and :pathnames keyword (and does not require any others)' do
86
- expect(required_keyword_parameters.call(subject)).to eq(%i(ancestors id parent_ids pathnames))
85
+ it 'requires the :ancestors, :deepest_nested_depth, :id, :parent_ids, and :pathnames keyword (and does not require any others)' do
86
+ expect(required_keyword_parameters.call(subject)).to eq(%i(ancestors deepest_nested_depth id parent_ids pathnames))
87
87
  end
88
88
 
89
89
  it 'does not require any other parameters (besides :attributes)' do
@@ -94,5 +94,21 @@ if defined?(RSpec)
94
94
  expect(block_parameter_extracter.call(subject)).to be_empty
95
95
  end
96
96
  end
97
+
98
+ describe '.write_nesting_document_to_index_layer' do
99
+ subject { described_class.method(:write_nesting_document_to_index_layer) }
100
+
101
+ it 'requires the :nesting_document' do
102
+ expect(required_keyword_parameters.call(subject)).to eq(%i(nesting_document))
103
+ end
104
+
105
+ it 'does not require any other parameters' do
106
+ expect(required_parameters.call(subject)).to eq(required_keyword_parameters.call(subject))
107
+ end
108
+
109
+ it 'does not expect a block' do
110
+ expect(block_parameter_extracter.call(subject)).to be_empty
111
+ end
112
+ end
97
113
  end
98
114
  end
@@ -42,6 +42,19 @@ module Samvera
42
42
  @ancestors = Array(keywords.fetch(:ancestors))
43
43
  end
44
44
 
45
+ # @api public
46
+ # @since v1.0.0
47
+ # @return [Hash<Symbol,>] the Ruby hash representation of this index document.
48
+ def to_hash
49
+ {
50
+ id: id,
51
+ parent_ids: parent_ids,
52
+ pathnames: pathnames,
53
+ ancestors: ancestors,
54
+ deepest_nested_depth: deepest_nested_depth
55
+ }
56
+ end
57
+
45
58
  # @api public
46
59
  # @return String The Fedora object's PID
47
60
  attr_reader :id
@@ -70,11 +83,22 @@ module Samvera
70
83
  #
71
84
  # All of the :pathnames of each of the documents ancestors. If I have A, with parent B, and B has
72
85
  # parents C and D then we have the following ancestors:
73
- # [D/B], [C/B]
86
+ # [D], [C], [D/B], [C/B]
74
87
  #
75
88
  # @return Array<String>
76
89
  attr_reader :ancestors
77
90
 
91
+ # @api public
92
+ # @since v1.0.0
93
+ #
94
+ # The largest nesting depth of this document. If I have A ={ B ={ C and D ={ C, then
95
+ # deepest_nested_depth is 3.
96
+ #
97
+ # @return Integer
98
+ def deepest_nested_depth
99
+ pathnames.map(&:length).max
100
+ end
101
+
78
102
  def sorted_parent_ids
79
103
  parent_ids.sort
80
104
  end
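
The `deepest_nested_depth` comment in the hunk above describes depth as a number of nesting levels. A minimal sketch of counting levels, assuming pathnames are delimiter-joined strings with `/` as the delimiter (as in the README examples), is shown below; `nesting_depth` is a hypothetical helper, not the gem's method. Note that `String#length` counts characters, so for string pathnames it would only match the level count by coincidence.

```ruby
# Assumed delimiter, taken from the README examples such as "A/C/E".
PATHNAME_DELIMITER = "/".freeze

# Returns the largest number of path segments among the given pathnames,
# or nil when there are no pathnames.
def nesting_depth(pathnames, delimiter: PATHNAME_DELIMITER)
  pathnames.map { |pathname| pathname.split(delimiter).length }.max
end

nesting_depth(["A/B/C", "D/C"]) # => 3
"A/B/C".length                  # => 5 (characters, not levels)
```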
@@ -5,7 +5,7 @@ require 'set'
5
5
  module Samvera
6
6
  # Establishing namespace
7
7
  module NestingIndexer
8
- # Responsible for reindexing the PID and its descendants
8
+ # Responsible for reindexing the document associated with the given PID and its descendant documents
9
9
  # @note There is cycle detection via the Samvera::NestingIndexer::Configuration#maximum_nesting_depth counter
10
10
  # @api private
11
11
  class RelationshipReindexer
@@ -24,34 +24,38 @@ module Samvera
24
24
  # @param configuration [#adapter, #logger] The :adapter conforms to the Samvera::NestingIndexer::Adapters::AbstractAdapter interface
25
25
  # and the :logger conforms to Logger
26
26
  # @param queue [#shift, #push] queue
27
- def initialize(id:, maximum_nesting_depth:, configuration:, queue: [], visited_ids: [])
27
+ def initialize(id:, maximum_nesting_depth:, configuration:, queue: [])
28
28
  @id = id.to_s
29
29
  @maximum_nesting_depth = maximum_nesting_depth.to_i
30
30
  @configuration = configuration
31
31
  @queue = queue
32
- @visited_ids = visited_ids
33
32
  end
34
33
  attr_reader :id, :maximum_nesting_depth
35
34
 
36
- # Perform a bread-first tree traversal of the initial document and its descendants.
37
- # rubocop:disable Metrics/AbcSize
35
+ # Perform a breadth-first tree traversal of the initial document and its descendants.
36
+ # We index the document, then queue up each of its children. For each child, queue up the child's children.
38
37
  def call
39
38
  wrap_logging("nested indexing of ID=#{initial_index_document.id.inspect}") do
40
39
  enqueue(initial_index_document, maximum_nesting_depth)
41
- processing_document = dequeue
42
- while processing_document
43
- process_a_document(processing_document)
44
- adapter.each_child_document_of(document: processing_document) { |child| enqueue(child, processing_document.maximum_nesting_depth - 1) }
45
- processing_document = dequeue
46
- end
40
+ process_each_document
47
41
  end
48
42
  self
49
43
  end
50
- # rubocop:enbable Metrics/AbcSize
51
44
 
52
45
  private
53
46
 
54
- attr_reader :queue, :configuration, :visited_ids
47
+ attr_reader :queue, :configuration
48
+
49
+ def process_each_document
50
+ processing_document = dequeue
51
+ while processing_document
52
+ process_a_document(processing_document)
53
+ adapter.each_child_document_of(document: processing_document) do |child|
54
+ enqueue(child, processing_document.maximum_nesting_depth - 1)
55
+ end
56
+ processing_document = dequeue
57
+ end
58
+ end
55
59
 
56
60
  def initial_index_document
57
61
  adapter.find_index_document_by(id: id)
@@ -76,19 +80,20 @@ module Samvera
76
80
  queue.push(ProcessingDocument.new(document, maximum_nesting_depth))
77
81
  end
78
82
 
83
+ # rubocop:disable Metrics/AbcSize
79
84
  def process_a_document(index_document)
80
85
  raise Exceptions::ExceededMaximumNestingDepthError, id: id if index_document.maximum_nesting_depth <= 0
81
86
  wrap_logging("indexing ID=#{index_document.id.inspect}") do
82
87
  preservation_document = adapter.find_preservation_document_by(id: index_document.id)
83
- parent_ids_and_path_and_ancestors = parent_ids_and_path_and_ancestors_for(preservation_document)
84
- guard_against_possiblity_of_self_ancestry(index_document: index_document, pathnames: parent_ids_and_path_and_ancestors.fetch(:pathnames))
85
- adapter.write_document_attributes_to_index_layer(**parent_ids_and_path_and_ancestors)
86
- visited_ids << index_document.id
88
+ nesting_document = build_nesting_document_for(preservation_document)
89
+ guard_against_possiblity_of_self_ancestry(index_document: index_document, pathnames: nesting_document.pathnames)
90
+ adapter.write_nesting_document_to_index_layer(nesting_document: nesting_document)
87
91
  end
88
92
  end
93
+ # rubocop:enable Metrics/AbcSize
89
94
 
90
- def parent_ids_and_path_and_ancestors_for(preservation_document)
91
- ParentAndPathAndAncestorsBuilder.new(preservation_document, adapter).to_hash
95
+ def build_nesting_document_for(preservation_document)
96
+ ParentAndPathAndAncestorsBuilder.new(preservation_document, adapter).nesting_document
92
97
  end
93
98
 
94
99
  def guard_against_possiblity_of_self_ancestry(index_document:, pathnames:)
@@ -104,7 +109,7 @@ module Samvera
104
109
  logger.debug("Ending #{message_suffix}")
105
110
  end
106
111
 
107
- # A small object that helps encapsulate the logic of building the hash of information regarding
112
+ # A small object that helps encapsulate the logic for building the hash of information regarding
108
113
  # the initialization of an Samvera::NestingIndexer::Documents::IndexDocument
109
114
  #
110
115
  # @see Samvera::NestingIndexer::Documents::IndexDocument for details on pathnames, ancestors, and parent_ids.
@@ -116,11 +121,10 @@ module Samvera
116
121
  @ancestors = Set.new
117
122
  @adapter = adapter
118
123
  compile!
124
+ @nesting_document = Documents::IndexDocument.new(id: @preservation_document.id, parent_ids: @parent_ids, pathnames: @pathnames, ancestors: @ancestors)
119
125
  end
120
126
 
121
- def to_hash
122
- { id: @preservation_document.id, parent_ids: @parent_ids.to_a, pathnames: @pathnames.to_a, ancestors: @ancestors.to_a }
123
- end
127
+ attr_reader :nesting_document
124
128
 
125
129
  private
126
130
 
@@ -140,7 +144,9 @@ module Samvera
140
144
  parent_index_document.pathnames.each do |pathname|
141
145
  @pathnames << "#{pathname}#{Documents::ANCESTOR_AND_PATHNAME_DELIMITER}#{@preservation_document.id}"
142
146
  slugs = pathname.split(Documents::ANCESTOR_AND_PATHNAME_DELIMITER)
143
- slugs.each_index { |i| @ancestors << slugs[0..i].join(Documents::ANCESTOR_AND_PATHNAME_DELIMITER) }
147
+ slugs.each_index do |i|
148
+ @ancestors << slugs[0..i].join(Documents::ANCESTOR_AND_PATHNAME_DELIMITER)
149
+ end
144
150
  end
145
151
  @ancestors += parent_index_document.ancestors
146
152
  end
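
The pathname and ancestor accumulation shown in the hunk above can be reproduced in isolation. The sketch below is a simplified stand-in under a few assumptions: parents are plain hashes rather than IndexDocuments, `/` is the delimiter, and a document with no parents gets its own id as its only pathname (inferred from the README table, not from this diff). `build_nesting_attributes` is a hypothetical name.

```ruby
require 'set'

# Builds pathnames and ancestors for a document from its parents' index data.
# Each parent is a hash with :pathnames and :ancestors arrays.
def build_nesting_attributes(id:, parents:, delimiter: "/")
  pathnames = Set.new
  ancestors = Set.new
  parents.each do |parent|
    parent.fetch(:pathnames).each do |pathname|
      # Each parent pathname extended by our id is one of our pathnames.
      pathnames << "#{pathname}#{delimiter}#{id}"
      # Every prefix of the parent pathname is one of our ancestors.
      slugs = pathname.split(delimiter)
      slugs.each_index { |i| ancestors << slugs[0..i].join(delimiter) }
    end
    ancestors.merge(parent.fetch(:ancestors))
  end
  # Assumption: a root document's only pathname is its own id.
  pathnames << id if parents.empty?
  { id: id, pathnames: pathnames.to_a.sort, ancestors: ancestors.to_a.sort }
end

d = { pathnames: ["A/D", "B/D"], ancestors: ["A", "B"] }
build_nesting_attributes(id: "F", parents: [d])
# => { id: "F", pathnames: ["A/D/F", "B/D/F"], ancestors: ["A", "A/D", "B", "B/D"] }
```

Using a `Set` keeps ancestors unique when several parent pathnames share a prefix, which matches the README row for F.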
@@ -1,5 +1,5 @@
1
1
  module Samvera
2
2
  module NestingIndexer
3
- VERSION = "0.8.0".freeze
3
+ VERSION = "1.0.0".freeze
4
4
  end
5
5
  end
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: samvera-nesting_indexer
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.8.0
4
+ version: 1.0.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - Jeremy Friesen
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2017-11-17 00:00:00.000000000 Z
11
+ date: 2018-01-12 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: bundler
@@ -254,6 +254,8 @@ files:
254
254
  - Rakefile
255
255
  - bin/console
256
256
  - bin/setup
257
+ - documentation/reindex_relationship.mermaid
258
+ - documentation/reindex_relationship.mermaid.jpg
257
259
  - lib/samvera/nesting_indexer.rb
258
260
  - lib/samvera/nesting_indexer/adapters.rb
259
261
  - lib/samvera/nesting_indexer/adapters/abstract_adapter.rb