samvera-nesting_indexer 0.8.0 → 1.0.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: 37f64367d992a4d794752c4eda559a3e066933f3
4
- data.tar.gz: 5de4fc1861c009b962fecd391582bc2240f329b7
3
+ metadata.gz: bfb903c39e5837da3ce7162d2f16066b23b92127
4
+ data.tar.gz: 250a0310e9b7564e37e2e1ff7ac33b438fc83461
5
5
  SHA512:
6
- metadata.gz: dad5de0e38c405e321e7f520cb0c44137423fa942e81218ba809e4dc9dc12255b15d69edc59f1c2b8aaa866504da064c1df1d45cd059fcf3e5069f234b2d2990
7
- data.tar.gz: 6882d1ea6b0a40a9c738a420efe54ec3a984c591d16d20247bd3135f5a919883f249e13a3965585a7be9a6793e0ee5d1e83b656b8a858f832b71f7f14e195f6c
6
+ metadata.gz: 884293f1ed28ee9df34329d6799cab60fb7c356a914d04221814aa88b5e83ebd89a6f76d6076a5278c4547c93ecb88b9ddc50a52a14930c0e253d8be46ffa943
7
+ data.tar.gz: a53e33bec810987ea4d286d4e626e44aa6be24d72271a1c83994c149d47269c7751443e32ea80c62e2a281a3791f55ebc38052fbba409df34dc8a991bc7ead5c
data/README.md CHANGED
@@ -34,7 +34,7 @@ We have four attributes to consider for indexing the graph:
34
34
  1. id - the unique identifier for a document
35
35
  2. parent_ids - the ids for all of the parents of a given document
36
36
  3. pathnames - the paths to traverse from a root document to the given document
37
- 4. ancestors - the pathnames of each of the ancestors
37
+ 4. ancestors - the pathnames to each node that is an ancestor of the given node (e.g. pathname to my parent, pathname to my grandparent)
38
38
 
39
39
  See [Samvera::NestingIndexer::Documents::IndexDocument](./lib/samvera/nesting_indexer/documents.rb) for further discussion.
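The relationship between pathnames and ancestors can be illustrated with a minimal sketch. It mirrors the loop in the `ParentAndPathAndAncestorsBuilder` hunk further down in this diff, but it is not the gem's code: the method name `pathnames_and_ancestors_for`, the constant `PATH_DELIMITER`, and the `/` delimiter value (taken from the README table) are assumptions made for illustration, as is the handling of a root document.

```ruby
require 'set'

# Assumed delimiter; the README table joins path segments with "/".
PATH_DELIMITER = '/'.freeze

# Hypothetical helper: derive a document's pathnames and ancestors from the
# pathnames of its parents. A document without parents is treated as a root
# whose only pathname is its own id (an assumption based on the README table).
def pathnames_and_ancestors_for(id, parent_pathnames)
  return { pathnames: [id], ancestors: [] } if parent_pathnames.empty?

  pathnames = Set.new
  ancestors = Set.new
  parent_pathnames.each do |parent_pathname|
    # Each parent pathname extended by this id is one way to reach the document.
    pathnames << "#{parent_pathname}#{PATH_DELIMITER}#{id}"
    # Every prefix of a parent pathname is the pathname of some ancestor.
    slugs = parent_pathname.split(PATH_DELIMITER)
    slugs.each_index { |i| ancestors << slugs[0..i].join(PATH_DELIMITER) }
  end
  { pathnames: pathnames.to_a, ancestors: ancestors.to_a }
end

p pathnames_and_ancestors_for('E', ['A/C'])
```

For a document E whose parent C has the pathname `A/C`, this yields the pathname `A/C/E` and the ancestors `A` and `A/C`.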
40
40
 
@@ -42,6 +42,8 @@ To reindex a single document, we leverage the [`Samvera::NestingIndexer.reindex_
42
42
 
43
43
  To reindex all of the documents, we leverage the [`Samvera::NestingIndexer.reindex_all!`](lib/samvera/nesting_indexer.rb) method. **Warning: This is a very slow process.**
44
44
 
45
+ With a node's pathname(s), we can query for all of that node's descendants (both direct and indirect) by way of the ancestors attribute.
46
+
45
47
  ## Examples
46
48
 
47
49
  Given the following PreservationDocuments:
@@ -53,19 +55,24 @@ Given the following PreservationDocuments:
53
55
  | C | A |
54
56
  | D | A, B |
55
57
  | E | C |
58
+ | F | D |
56
59
 
57
60
  If we were to reindex the above PreservationDocuments, we will generate the following IndexDocuments:
58
61
 
59
- | PID | Parents | Pathnames | Ancestors |
60
- |-----|---------|------------|-----------|
61
- | A | - | [A] | [] |
62
- | B | - | [B] | [] |
63
- | C | A | [A/C] | [A] |
64
- | D | A, B | [A/D, B/D] | [A, B] |
65
- | E | C | [A/C/E] | [A/C] |
62
+ | PID | Parents | Pathnames | Ancestors |
63
+ |-----|---------|----------------|------------------|
64
+ | A | - | [A] | [] |
65
+ | B | - | [B] | [] |
66
+ | C | A | [A/C] | [A] |
67
+ | D | A, B | [A/D, B/D] | [A, B] |
68
+ | E | C | [A/C/E] | [A, A/C] |
69
+ | F | D | [A/D/F, B/D/F] | [A, A/D, B, B/D] |
66
70
 
67
71
  For more scenarios, look at the [Reindex PID and Descendants specs](./spec/features/reindex_id_and_descendants_spec.rb).
68
72
 
73
+ * Given I want to find the direct descendants of A, then I can query all nodes that have a parent of A.
74
+ * Given I want to find the direct and indirect descendants of A, then I can query all nodes that have an ancestor entry of A.
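
The two queries above can be sketched against an in-memory copy of the IndexDocuments table. This is only an illustration: `IndexRow`, `EXAMPLE_INDEX`, `direct_descendant_ids`, and `all_descendant_ids` are hypothetical names, and a real application would run the equivalent query against its own index layer (for example a search index) rather than a Ruby array.

```ruby
# Hypothetical in-memory rows mirroring the IndexDocuments table in the README.
IndexRow = Struct.new(:id, :parent_ids, :pathnames, :ancestors)

EXAMPLE_INDEX = [
  IndexRow.new('A', [], ['A'], []),
  IndexRow.new('B', [], ['B'], []),
  IndexRow.new('C', ['A'], ['A/C'], ['A']),
  IndexRow.new('D', %w[A B], ['A/D', 'B/D'], %w[A B]),
  IndexRow.new('E', ['C'], ['A/C/E'], ['A', 'A/C']),
  IndexRow.new('F', ['D'], ['A/D/F', 'B/D/F'], ['A', 'A/D', 'B', 'B/D'])
].freeze

# Direct descendants: every row that lists the id among its parents.
def direct_descendant_ids(index, id)
  index.select { |row| row.parent_ids.include?(id) }.map(&:id)
end

# Direct and indirect descendants: every row with the pathname among its ancestors.
def all_descendant_ids(index, pathname)
  index.select { |row| row.ancestors.include?(pathname) }.map(&:id)
end

p direct_descendant_ids(EXAMPLE_INDEX, 'A')
p all_descendant_ids(EXAMPLE_INDEX, 'A')
```

Because every prefix of a pathname is stored as an ancestor entry, the second query needs no recursion; a single membership test per row is enough.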
75
+
69
76
  ## Adapters
70
77
 
71
78
  An [AbstractAdapter](./lib/samvera/nesting_indexer/adapters/abstract_adapter.rb) provides the method interface for others to build against.
@@ -95,6 +102,14 @@ end
95
102
 
96
103
  [See CurateND for Notre Dame's adaptor configuration](https://github.com/ndlib/samvera_nd/blob/6fbe79c9725c0f8b4641981044ec250c5163053b/config/initializers/samvera_config.rb#L32-L35).
97
104
 
105
+ ### Sequence Diagram for Reindexing a Single Document
106
+
107
+ The following sequence diagram documents the interactions in [Samvera::NestingIndexer::RelationshipReindexer](lib/samvera/nesting_indexer/relationship_reindexer.rb).
108
+
109
+ ![Reindex Relationship Diagram](documentation/reindex_relationship.mermaid.jpg)
110
+
111
+ See [the text-based version of Reindex Relationship diagram](documentation/reindex_relationship.mermaid), leveraging the [Mermaid syntax](https://mermaidjs.github.io).
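
The loop in the diagram can also be read as a breadth-first traversal with a depth budget. The following is a simplified sketch under stated assumptions, not the `RelationshipReindexer` itself: it only records the order in which documents would be processed, and `PARENTS_BY_ID`, `DepthExceeded`, and `reindex_order` are hypothetical names standing in for the preservation layer, the gem's maximum nesting depth error, and the reindexer.

```ruby
# Hypothetical preservation data: document id => ids of its parents.
PARENTS_BY_ID = {
  'A' => [], 'B' => [], 'C' => ['A'], 'D' => %w[A B], 'E' => ['C'], 'F' => ['D']
}.freeze

# Stand-in for the error raised when the depth budget is used up.
DepthExceeded = Class.new(StandardError)

# Visit the starting document, then its children, then their children, and so
# on. Each queue entry carries the remaining depth budget, which shrinks by one
# per level; running out of budget is treated as a sign of a possible cycle.
def reindex_order(start_id, parents_by_id, maximum_nesting_depth)
  queue = [[start_id, maximum_nesting_depth]]
  processed = []
  while (entry = queue.shift)
    id, remaining_depth = entry
    raise DepthExceeded, "exceeded depth while indexing #{id}" if remaining_depth <= 0

    processed << id # a real reindexer would write the index document here
    children = parents_by_id.select { |_child, parents| parents.include?(id) }.keys
    children.each { |child| queue.push([child, remaining_depth - 1]) }
  end
  processed
end

p reindex_order('A', PARENTS_BY_ID, 5)
```

With a generous budget the traversal from A processes A, then C and D, then E and F. With a budget of 2 it raises once it reaches the third level, which is how a depth counter can stand in for explicit cycle detection.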
112
+
98
113
  ## Considerations
99
114
 
100
115
  Given a single object A, when we reindex A, we:
@@ -115,9 +130,23 @@ The [`./spec/features/reindex_pid_and_descendants_spec.rb`](spec/features/reinde
115
130
 
116
131
  **NOTE: These guards to prevent indexing cyclic graphs do not prevent the underlying preservation document from creating its own cyclic graph.**
117
132
 
133
+ #### Detecting Possible Cycles Before Indexing
134
+
135
+ Given an up-to-date index and a document, it is valid to nest the given document beneath any document that:
136
+
137
+ * Is not the given document
138
+ * Does not have one or more pathnames that include the given document's ID
139
+
140
+ For examples of determining if we can nest a document within another document, see the [demonstration of nesting](./spec/features/demonstrating_nesting_spec.rb).
141
+
142
+ In implementations, you'll likely want to write queries that answer:
143
+
144
+ * What are the valid IDs that I can nest within?
145
+ * What are the valid IDs that I can nest within and am not already nested within?
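
Both questions can be sketched against an in-memory index. The names below (`NestingRow`, `NESTING_INDEX`, `can_nest_within?`, `valid_parent_ids`, `new_valid_parent_ids`) are hypothetical, and the `/` delimiter is assumed from the README table; an application would express the same rules as queries against its own index layer.

```ruby
# Hypothetical in-memory rows mirroring the IndexDocuments table in the README.
NestingRow = Struct.new(:id, :parent_ids, :pathnames)
NESTING_DELIMITER = '/'.freeze # assumed delimiter, as used in the README table

NESTING_INDEX = [
  NestingRow.new('A', [], ['A']),
  NestingRow.new('B', [], ['B']),
  NestingRow.new('C', ['A'], ['A/C']),
  NestingRow.new('D', %w[A B], ['A/D', 'B/D']),
  NestingRow.new('E', ['C'], ['A/C/E']),
  NestingRow.new('F', ['D'], ['A/D/F', 'B/D/F'])
].freeze

# A candidate is a valid parent when it is not the document itself and none of
# its pathnames pass through the document (which would create a cycle).
def can_nest_within?(candidate, document_id)
  return false if candidate.id == document_id

  candidate.pathnames.none? do |pathname|
    pathname.split(NESTING_DELIMITER).include?(document_id)
  end
end

# What are the valid IDs that I can nest within?
def valid_parent_ids(index, document_id)
  index.select { |candidate| can_nest_within?(candidate, document_id) }.map(&:id)
end

# What are the valid IDs that I can nest within and am not already nested within?
def new_valid_parent_ids(index, document_id)
  document = index.find { |row| row.id == document_id }
  valid_parent_ids(index, document_id) - document.parent_ids
end

p valid_parent_ids(NESTING_INDEX, 'D')
p new_valid_parent_ids(NESTING_INDEX, 'D')
```

The pathname is split into segments before comparing so that an ID is matched whole rather than as a substring of a longer ID.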
146
+
118
147
  ## TODO
119
148
 
120
- - [ ] Incorporate additional logging
149
+ - [X] Incorporate additional logging
121
150
  - [ ] Build methods to allow for fanning out the reindexing. At present, when we reindex a node and its "children", we run that entire process within a single context. Likewise, we run a single process when reindexing EVERYTHING.
122
151
  - [ ] Promote from [samvera-labs](https://github.com/samvera-labs) to [samvera](https://github.com/samvera) via the [promotion process](http://samvera-labs.github.io/promotion.html).
123
152
  - [ ] Write adapter method to assist in guarding against self-ancestry. We could probably expose a base adapter that has the method through use of the other adapter methods.
data/Rakefile CHANGED
@@ -16,9 +16,7 @@ namespace :commitment do
16
16
  $stdout.puts "Checking commitment:code_coverage"
17
17
  coverage_percentage = JSON.parse(File.read('coverage/.last_run.json')).fetch('result').fetch('covered_percent').to_i
18
18
  goal = 100
19
- if goal > coverage_percentage
20
- abort("Code Coverage Goal Not Met:\n\t#{coverage_percentage}%\tExpected\n\t#{goal}%\tActual")
21
- end
19
+ abort("Code Coverage Goal Not Met:\n\t#{coverage_percentage}%\tExpected\n\t#{goal}%\tActual") if goal > coverage_percentage
22
20
  end
23
21
  end
24
22
 
@@ -0,0 +1,22 @@
1
+ sequenceDiagram
2
+ participant Application
3
+ participant Indexer as Samvera:: NestingIndexer
4
+ participant Adapter as Application Adapter Implementation
5
+ participant Index as Application Index Layer
6
+ participant Preservation as Application Preservation Layer
7
+
8
+ Application-->>Indexer: indexer.reindex_relationships(id:)
9
+ Indexer-->>Adapter: adapter.find_index_document_by(id:)
10
+ Adapter-->>Index: get index document
11
+ Note right of Adapter: See Samvera:: NestingIndexer:: Adapters:: AbstractAdapter for adapter implementation
12
+ Index-->>Indexer: coerce index document to indexer
13
+ Indexer-->>Indexer: enqueue index document
14
+ loop While queued documents
15
+ Indexer-->>Adapter: adapter.find_preservation_document_by(id:)
16
+ Adapter-->>Preservation: get preservation document
17
+ Preservation-->>Indexer: receive preservation document
18
+ Indexer-->>Indexer: guard_against_possiblity_of_self_ancestry
19
+ Indexer-->>Adapter: adapter.write_document_attributes_to_index_layer
20
+ Adapter-->>Index: write updated application index document
21
+ Indexer-->>Indexer: enqueue children
22
+ end
@@ -43,13 +43,24 @@ module Samvera
43
43
  end
44
44
 
45
45
  # @api public
46
+ # @deprecated Use .write_nesting_document_to_index_layer instead
46
47
  # @see README.md
47
48
  # @param id [String]
48
49
  # @param parent_ids [Array<String>]
49
50
  # @param ancestors [Array<String>]
50
51
  # @param pathnames [Array<String>]
52
+ # @param deepest_nested_depth [Integer]
51
53
  # @return Hash - the attributes written to the indexing layer
52
- def self.write_document_attributes_to_index_layer(id:, parent_ids:, ancestors:, pathnames:)
54
+ def self.write_document_attributes_to_index_layer(id:, parent_ids:, ancestors:, pathnames:, deepest_nested_depth:)
55
+ raise NotImplementedError
56
+ end
57
+
58
+ # @api public
59
+ # @since v1.0.0
60
+ # @see README.md
61
+ # @param nesting_document [Samvera::NestingIndexer::Documents::IndexDocument]
62
+ # @return void
63
+ def self.write_nesting_document_to_index_layer(nesting_document:)
53
64
  raise NotImplementedError
54
65
  end
55
66
  end
@@ -59,9 +59,17 @@ module Samvera
59
59
  # @param parent_ids [Array<String>]
60
60
  # @param ancestors [Array<String>]
61
61
  # @param pathnames [Array<String>]
62
+ # @param deepest_nested_depth [Integer]
62
63
  # @return [Hash] - the attributes written to the indexing layer
63
- def self.write_document_attributes_to_index_layer(id:, parent_ids:, ancestors:, pathnames:)
64
- Index.write_document(id: id, parent_ids: parent_ids, ancestors: ancestors, pathnames: pathnames)
64
+ def self.write_document_attributes_to_index_layer(id:, parent_ids:, ancestors:, pathnames:, deepest_nested_depth:)
65
+ Index.write_document(id: id, parent_ids: parent_ids, ancestors: ancestors, pathnames: pathnames, deepest_nested_depth: deepest_nested_depth)
66
+ end
67
+
68
+ # @api public
69
+ # @see README.md
70
+ # @param nesting_document [Samvera::NestingIndexer::Documents::IndexDocument]
71
+ def self.write_nesting_document_to_index_layer(nesting_document:)
72
+ Index.write_to_storage(nesting_document)
65
73
  end
66
74
 
67
75
  # @api private
@@ -70,6 +78,12 @@ module Samvera
70
78
  Index.clear_cache!
71
79
  end
72
80
 
81
+ # @api private
82
+ # A convenience method for testing
83
+ def self.each_index_document(&block)
84
+ Index.each_index_document(&block)
85
+ end
86
+
73
87
  # @api private
74
88
  #
75
89
  # A module mixin to expose rudimentary read/write capabilities
@@ -146,12 +160,20 @@ module Samvera
146
160
  Storage.find(id)
147
161
  end
148
162
 
163
+ def self.each_index_document(&block)
164
+ Storage.find_each(&block)
165
+ end
166
+
149
167
  def self.each_child_document_of(document:, &block)
150
168
  Storage.find_children_of_id(document.id).each(&block)
151
169
  end
152
170
 
171
+ def self.write_to_storage(doc)
172
+ Storage.write(doc)
173
+ end
174
+
153
175
  def self.write_document(attributes = {})
154
- Documents::IndexDocument.new(attributes).tap { |doc| Storage.write(doc) }
176
+ Documents::IndexDocument.new(attributes).tap { |doc| write_to_storage(doc) }
155
177
  end
156
178
 
157
179
  # :nodoc:
@@ -82,8 +82,8 @@ if defined?(RSpec)
82
82
  describe '.write_document_attributes_to_index_layer' do
83
83
  subject { described_class.method(:write_document_attributes_to_index_layer) }
84
84
 
85
- it 'requires the :ancestors, :id, :parent_ids, and :pathnames keyword (and does not require any others)' do
86
- expect(required_keyword_parameters.call(subject)).to eq(%i(ancestors id parent_ids pathnames))
85
+ it 'requires the :ancestors, :deepest_nested_depth, :id, :parent_ids, and :pathnames keyword (and does not require any others)' do
86
+ expect(required_keyword_parameters.call(subject)).to eq(%i(ancestors deepest_nested_depth id parent_ids pathnames))
87
87
  end
88
88
 
89
89
  it 'does not require any other parameters (besides :attributes)' do
@@ -94,5 +94,21 @@ if defined?(RSpec)
94
94
  expect(block_parameter_extracter.call(subject)).to be_empty
95
95
  end
96
96
  end
97
+
98
+ describe '.write_nesting_document_to_index_layer' do
99
+ subject { described_class.method(:write_nesting_document_to_index_layer) }
100
+
101
+ it 'requires the :nesting_document' do
102
+ expect(required_keyword_parameters.call(subject)).to eq(%i(nesting_document))
103
+ end
104
+
105
+ it 'does not require any other parameters' do
106
+ expect(required_parameters.call(subject)).to eq(required_keyword_parameters.call(subject))
107
+ end
108
+
109
+ it 'does not expect a block' do
110
+ expect(block_parameter_extracter.call(subject)).to be_empty
111
+ end
112
+ end
97
113
  end
98
114
  end
@@ -42,6 +42,19 @@ module Samvera
42
42
  @ancestors = Array(keywords.fetch(:ancestors))
43
43
  end
44
44
 
45
+ # @api public
46
+ # @since v1.0.0
47
+ # @return [Hash<Symbol,>] the Ruby hash representation of this index document.
48
+ def to_hash
49
+ {
50
+ id: id,
51
+ parent_ids: parent_ids,
52
+ pathnames: pathnames,
53
+ ancestors: ancestors,
54
+ deepest_nested_depth: deepest_nested_depth
55
+ }
56
+ end
57
+
45
58
  # @api public
46
59
  # @return String The Fedora object's PID
47
60
  attr_reader :id
@@ -70,11 +83,22 @@ module Samvera
70
83
  #
71
84
  # All of the :pathnames of each of the documents ancestors. If I have A, with parent B, and B has
72
85
  # parents C and D then we have the following ancestors:
73
- # [D/B], [C/B]
86
+ # [D], [C], [D/B], [C/B]
74
87
  #
75
88
  # @return Array<String>
76
89
  attr_reader :ancestors
77
90
 
91
+ # @api public
92
+ # @since v1.0.0
93
+ #
94
+ # The largest nesting depth of this document. If I have A ={ B ={ C and D ={ C, then
95
+ # deepest_nested_depth is 3.
96
+ #
97
+ # @return Integer
98
+ def deepest_nested_depth
99
+ pathnames.map(&:length).max
100
+ end
101
+
78
102
  def sorted_parent_ids
79
103
  parent_ids.sort
80
104
  end
@@ -5,7 +5,7 @@ require 'set'
5
5
  module Samvera
6
6
  # Establishing namespace
7
7
  module NestingIndexer
8
- # Responsible for reindexing the PID and its descendants
8
+ # Responsible for reindexing the document associated with the given PID and its descendant documents
9
9
  # @note There is cycle detection via the Samvera::NestingIndexer::Configuration#maximum_nesting_depth counter
10
10
  # @api private
11
11
  class RelationshipReindexer
@@ -24,34 +24,38 @@ module Samvera
24
24
  # @param configuration [#adapter, #logger] The :adapter conforms to the Samvera::NestingIndexer::Adapters::AbstractAdapter interface
25
25
  # and the :logger conforms to Logger
26
26
  # @param queue [#shift, #push] queue
27
- def initialize(id:, maximum_nesting_depth:, configuration:, queue: [], visited_ids: [])
27
+ def initialize(id:, maximum_nesting_depth:, configuration:, queue: [])
28
28
  @id = id.to_s
29
29
  @maximum_nesting_depth = maximum_nesting_depth.to_i
30
30
  @configuration = configuration
31
31
  @queue = queue
32
- @visited_ids = visited_ids
33
32
  end
34
33
  attr_reader :id, :maximum_nesting_depth
35
34
 
36
- # Perform a bread-first tree traversal of the initial document and its descendants.
37
- # rubocop:disable Metrics/AbcSize
35
+ # Perform a breadth-first tree traversal of the initial document and its descendants.
36
+ # We index the document, then queue up each of its children. For each child, queue up the child's children.
38
37
  def call
39
38
  wrap_logging("nested indexing of ID=#{initial_index_document.id.inspect}") do
40
39
  enqueue(initial_index_document, maximum_nesting_depth)
41
- processing_document = dequeue
42
- while processing_document
43
- process_a_document(processing_document)
44
- adapter.each_child_document_of(document: processing_document) { |child| enqueue(child, processing_document.maximum_nesting_depth - 1) }
45
- processing_document = dequeue
46
- end
40
+ process_each_document
47
41
  end
48
42
  self
49
43
  end
50
- # rubocop:enbable Metrics/AbcSize
51
44
 
52
45
  private
53
46
 
54
- attr_reader :queue, :configuration, :visited_ids
47
+ attr_reader :queue, :configuration
48
+
49
+ def process_each_document
50
+ processing_document = dequeue
51
+ while processing_document
52
+ process_a_document(processing_document)
53
+ adapter.each_child_document_of(document: processing_document) do |child|
54
+ enqueue(child, processing_document.maximum_nesting_depth - 1)
55
+ end
56
+ processing_document = dequeue
57
+ end
58
+ end
55
59
 
56
60
  def initial_index_document
57
61
  adapter.find_index_document_by(id: id)
@@ -76,19 +80,20 @@ module Samvera
76
80
  queue.push(ProcessingDocument.new(document, maximum_nesting_depth))
77
81
  end
78
82
 
83
+ # rubocop:disable Metrics/AbcSize
79
84
  def process_a_document(index_document)
80
85
  raise Exceptions::ExceededMaximumNestingDepthError, id: id if index_document.maximum_nesting_depth <= 0
81
86
  wrap_logging("indexing ID=#{index_document.id.inspect}") do
82
87
  preservation_document = adapter.find_preservation_document_by(id: index_document.id)
83
- parent_ids_and_path_and_ancestors = parent_ids_and_path_and_ancestors_for(preservation_document)
84
- guard_against_possiblity_of_self_ancestry(index_document: index_document, pathnames: parent_ids_and_path_and_ancestors.fetch(:pathnames))
85
- adapter.write_document_attributes_to_index_layer(**parent_ids_and_path_and_ancestors)
86
- visited_ids << index_document.id
88
+ nesting_document = build_nesting_document_for(preservation_document)
89
+ guard_against_possiblity_of_self_ancestry(index_document: index_document, pathnames: nesting_document.pathnames)
90
+ adapter.write_nesting_document_to_index_layer(nesting_document: nesting_document)
87
91
  end
88
92
  end
93
+ # rubocop:enable Metrics/AbcSize
89
94
 
90
- def parent_ids_and_path_and_ancestors_for(preservation_document)
91
- ParentAndPathAndAncestorsBuilder.new(preservation_document, adapter).to_hash
95
+ def build_nesting_document_for(preservation_document)
96
+ ParentAndPathAndAncestorsBuilder.new(preservation_document, adapter).nesting_document
92
97
  end
93
98
 
94
99
  def guard_against_possiblity_of_self_ancestry(index_document:, pathnames:)
@@ -104,7 +109,7 @@ module Samvera
104
109
  logger.debug("Ending #{message_suffix}")
105
110
  end
106
111
 
107
- # A small object that helps encapsulate the logic of building the hash of information regarding
112
+ # A small object that helps encapsulate the logic for building the hash of information regarding
108
113
  # the initialization of an Samvera::NestingIndexer::Documents::IndexDocument
109
114
  #
110
115
  # @see Samvera::NestingIndexer::Documents::IndexDocument for details on pathnames, ancestors, and parent_ids.
@@ -116,11 +121,10 @@ module Samvera
116
121
  @ancestors = Set.new
117
122
  @adapter = adapter
118
123
  compile!
124
+ @nesting_document = Documents::IndexDocument.new(id: @preservation_document.id, parent_ids: @parent_ids, pathnames: @pathnames, ancestors: @ancestors)
119
125
  end
120
126
 
121
- def to_hash
122
- { id: @preservation_document.id, parent_ids: @parent_ids.to_a, pathnames: @pathnames.to_a, ancestors: @ancestors.to_a }
123
- end
127
+ attr_reader :nesting_document
124
128
 
125
129
  private
126
130
 
@@ -140,7 +144,9 @@ module Samvera
140
144
  parent_index_document.pathnames.each do |pathname|
141
145
  @pathnames << "#{pathname}#{Documents::ANCESTOR_AND_PATHNAME_DELIMITER}#{@preservation_document.id}"
142
146
  slugs = pathname.split(Documents::ANCESTOR_AND_PATHNAME_DELIMITER)
143
- slugs.each_index { |i| @ancestors << slugs[0..i].join(Documents::ANCESTOR_AND_PATHNAME_DELIMITER) }
147
+ slugs.each_index do |i|
148
+ @ancestors << slugs[0..i].join(Documents::ANCESTOR_AND_PATHNAME_DELIMITER)
149
+ end
144
150
  end
145
151
  @ancestors += parent_index_document.ancestors
146
152
  end
@@ -1,5 +1,5 @@
1
1
  module Samvera
2
2
  module NestingIndexer
3
- VERSION = "0.8.0".freeze
3
+ VERSION = "1.0.0".freeze
4
4
  end
5
5
  end
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: samvera-nesting_indexer
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.8.0
4
+ version: 1.0.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - Jeremy Friesen
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2017-11-17 00:00:00.000000000 Z
11
+ date: 2018-01-12 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: bundler
@@ -254,6 +254,8 @@ files:
254
254
  - Rakefile
255
255
  - bin/console
256
256
  - bin/setup
257
+ - documentation/reindex_relationship.mermaid
258
+ - documentation/reindex_relationship.mermaid.jpg
257
259
  - lib/samvera/nesting_indexer.rb
258
260
  - lib/samvera/nesting_indexer/adapters.rb
259
261
  - lib/samvera/nesting_indexer/adapters/abstract_adapter.rb