curate-indexer 0.2.2 → 0.2.3

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz: 180b9a34980ab4a17aaf58de88a79bb7421e4d8c
- data.tar.gz: 9ee184bed40c4454bf0d4ae2b09c7b21d65554d4
+ metadata.gz: f90a339727f8494a69331c91dabfead7a4d3fea5
+ data.tar.gz: b79f7029a6767747b52d48500f7fc8d567301f60
  SHA512:
- metadata.gz: c0e1e6cf6e7f5e7090949c7cfe86d1462a86bf24a1aa8f16e4d7f47d0c5b0d39999d410a8ee491d487999e35d0a6743b1ece6348bb74ceea57f479e6df4e196b
- data.tar.gz: cebeb033701dcc18e058ef26583f39413e61f8b8c8057a4cee9b61d2a8cfad2ac86bd7b165fcb52a960b9141ff1cff14fad8df51e5566c7ea73572fbc9e14027
+ metadata.gz: a5ece9e30c7760998a9d428c2cab96183eb00aa6f331b550ac0ca914c0140bed986a68694f2574fd75338e9b38d28a0a26865eefbc529ceabe8dfc7eaab63583
+ data.tar.gz: bc7a658c728db9aa803c3b44bc6ff04e94f2429dcdf500adff16a130652de6d1b49bc053b5e35aaade8388c1a8feceeaaef9f002df7ee697a3e22ea2b81211e9
data/README.md CHANGED
@@ -6,4 +6,58 @@
  [![Documentation Status](http://inch-ci.org/github/ndlib/curate-indexer.svg?branch=master)](http://inch-ci.org/github/ndlib/curate-indexer)
  [![APACHE 2 License](http://img.shields.io/badge/APACHE2-license-blue.svg)](./LICENSE)
 
+ The Curate::Indexer gem is responsible for indexing the graph relationship of objects. It maps a PreservationDocument to an IndexDocument by mapping a PreservationDocument's direct parents into the paths to get from a root document to the given PreservationDocument.
+
+ # Background
+
  This is a sandbox to work through the reindexing strategy as it relates to [CurateND Collections](https://github.com/ndlib/curate_nd/issues/420). At this point the code is separate to allow for rapid testing and prototyping (no sense spinning up SOLR and Fedora to walk an arbitrary graph).
+
+ # Concepts
+
+ As we are indexing objects, we have two types of documents:
+
+ 1. [PreservationDocument](./lib/curate/indexer/documents.rb) - a light-weight representation of a Fedora object
+ 2. [IndexDocument](./lib/curate/indexer/documents.rb) - a light-weight representation of a SOLR document object
+
+ We have four attributes to consider for indexing the graph:
+
+ 1. pid - the unique identifier for a document
+ 2. parent_pids - the pids for all of the parents of a given document
+ 3. pathnames - the paths to traverse from a root document to the given document
+ 4. ancestors - the pathnames of each of the ancestors
+
+ See [Curate::Indexer::Documents::IndexDocument](./lib/curate/indexer/documents.rb) for further discussion.
+
+ To reindex a single document, we leverage the [`Curate::Indexer.reindex_relationships`](./lib/curate/indexer.rb) method.
+
+ # Examples
+
+ Given the following PreservationDocuments:
+
+ | PID | Parents |
+ |-----|---------|
+ | A | - |
+ | B | - |
+ | C | A |
+ | D | A, B |
+ | E | C |
+
+ If we were to reindex the above PreservationDocuments, we will generate the following IndexDocuments:
+
+ | PID | Parents | Pathnames | Ancestors |
+ |-----|---------|------------|-----------|
+ | A | - | [A] | [] |
+ | B | - | [B] | [] |
+ | C | A | [A/C] | [A] |
+ | D | A, B | [A/D, B/D] | [A, B] |
+ | E | C | [A/C/E] | [A/C] |
+
+ For more scenarios, look at the [Reindex PID and Descendants specs](./spec/features/reindex_pid_and_descendants_spec.rb).
+
+ # Adapters
+
+ An [AbstractAdapter](./lib/curate/indexer/adapters/abstract_adapter.rb) provides the method interface for others to build against.
+
+ The [InMemory adapter](./lib/curate/indexer/adapters/in_memory_adapter.rb) is a reference implementation (and used to ease testing overhead).
+
+ CurateND has implemented the [following adapter](https://github.com/ndlib/curate_nd/blob/master/lib/curate/library_collection_indexing_adapter.rb) for its LibraryCollection indexing.
@@ -12,6 +12,7 @@ Gem::Specification.new do |spec|
  spec.summary = %q{A playground for CurateND collections indexing}
  spec.description = %q{A playground for CurateND collections indexing}
  spec.homepage = "https://github.com/ndlib/curate-indexer"
+ spec.license = "Apache-2.0"
 
  spec.files = `git ls-files -z`.split("\x0").reject { |f| f.match(%r{^(test|spec|features)/}) }
  spec.bindir = "bin"
@@ -5,7 +5,7 @@ require 'curate/indexer/configuration'
  require 'curate/indexer/railtie' if defined?(Rails)
 
  module Curate
- # Responsible for performign the indexing of an object and its related child objects.
+ # Responsible for indexing an object and its related child objects.
  module Indexer
  # This assumes a rather deep graph
  DEFAULT_TIME_TO_LIVE = 15
@@ -18,7 +18,7 @@ module Curate
  # @return [Boolean] - It was successful
  # @raise Curate::Exceptions::CycleDetectionError - A potential cycle was detected
  def self.reindex_relationships(pid, time_to_live = DEFAULT_TIME_TO_LIVE)
- RelationshipReindexer.call(pid: pid, time_to_live: time_to_live, adapter: configuration.adapter)
+ RelationshipReindexer.call(pid: pid, time_to_live: time_to_live, adapter: adapter)
  true
  end
 
@@ -34,10 +34,15 @@ module Curate
  # @return [Boolean] - It was successful
  # @raise Curate::Exceptions::CycleDetectionError - A potential cycle was detected
  def self.reindex_all!(time_to_live = DEFAULT_TIME_TO_LIVE)
- RepositoryReindexer.call(time_to_live: time_to_live, pid_reindexer: method(:reindex_relationships), adapter: configuration.adapter)
+ # While the RepositoryReindexer is responsible for reindexing everything, I
+ # want to inject the lambda that will reindex a single item.
+ pid_reindexer = method(:reindex_relationships)
+ RepositoryReindexer.call(time_to_live: time_to_live, pid_reindexer: pid_reindexer, adapter: adapter)
  true
  end
 
+ # @api public
+ #
  # Contains the Curate::Indexer configuration information that is referenceable from wit
  # @see Curate::Indexer::Configuration
  def self.configuration
@@ -45,23 +50,31 @@ module Curate
  end
 
  # @api public
+ #
+ # Exposes the data adapter to use for the reindexing process.
+ #
+ # @see Curate::Indexer::Adapters::AbstractAdapter
+ # @return Object that implementes the Curate::Indexer::Adapters::AbstractAdapter method interface
  def self.adapter
  configuration.adapter
  end
 
  # @api public
+ #
+ # Capture the configuration information
+ #
  # @see Curate::Indexer::Configuration
  # @see .configuration
+ # @see Curate::Indexer::Railtie
  def self.configure(&block)
  @configuration_block = block
- configure!
  # The Rails load sequence means that some of the configured Targets may
  # not be loaded; As such I am not calling configure! instead relying on
  # Curate::Indexer::Railtie to handle the configure! call
  configure! unless defined?(Rails)
  end
 
- # @api public
+ # @api private
  def self.configure!
  return false unless @configuration_block.respond_to?(:call)
  @configuration_block.call(configuration)
@@ -11,8 +11,11 @@ module Curate
 
  private
 
+ IN_MEMORY_ADAPTER_WARNING_MESSAGE =
+ "WARNING: You are using the default Curate::Indexer::Adapters::InMemoryAdapter for the Curate::Indexer.adapter.".freeze
+
  def default_adapter
- $stdout.puts "WARNING: You are using the default Curate::Indexer::Adapters::InMemoryAdapter for the Curate::Indexer.adapter."
+ $stdout.puts IN_MEMORY_ADAPTER_WARNING_MESSAGE unless defined?(SUPPRESS_MEMORY_ADAPTER_WARNING)
  require 'curate/indexer/adapters/in_memory_adapter'
  Adapters::InMemoryAdapter
  end
@@ -12,10 +12,21 @@ module Curate
  @pid = keywords.fetch(:pid).to_s
  @parent_pids = Array(keywords.fetch(:parent_pids))
  end
- attr_reader :pid, :parent_pids
+
+ # @api public
+ # @return String The Fedora object's PID
+ attr_reader :pid
+
+ # @api public
+ #
+ # All of the direct parents of the Fedora document associated with the given PID.
+ #
+ # This does not include grandparents, great-grandparents, etc.
+ # @return Array<String>
+ attr_reader :parent_pids
  end
 
- # @api private
+ # @api public
  #
  # A rudimentary representation of what is needed to reindex Solr documents
  class IndexDocument
@@ -28,7 +39,39 @@ module Curate
  @pathnames = Array(keywords.fetch(:pathnames))
  @ancestors = Array(keywords.fetch(:ancestors))
  end
- attr_reader :pid, :parent_pids, :pathnames, :ancestors
+
+ # @api public
+ # @return String The Fedora object's PID
+ attr_reader :pid
+
+ # @api public
+ #
+ # All of the direct parents of the Fedora document associated with the given PID.
+ #
+ # This does not include grandparents, great-grandparents, etc.
+ # @return Array<String>
+ attr_reader :parent_pids
+
+ # @api public
+ #
+ # All nodes in the graph are addressable by one or more pathnames.
+ #
+ # If I have A, with parent B, and B has parents C and D, we have the
+ # following pathnames:
+ # [D/B/A, C/B/A]
+ #
+ # In the graph representation, we can get to A by going from D to B to A, or by going from C to B to A.
+ # @return Array<String>
+ attr_reader :pathnames
+
+ # @api public
+ #
+ # All of the :pathnames of each of the documents ancestors. If I have A, with parent B, and B has
+ # parents C and D then we have the following ancestors:
+ # [D/B], [C/B]
+ #
+ # @return Array<String>
+ attr_reader :ancestors
 
  def sorted_parent_pids
  parent_pids.sort
@@ -5,7 +5,7 @@ module Curate
  # Connect into the boot sequence of a Rails application
  class Railtie < Rails::Railtie
  config.to_prepare do
- Curate::Indexer.send(:configure!)
+ Curate::Indexer.configure!
  end
  end
  end
@@ -21,6 +21,7 @@ module Curate
  end
  attr_reader :pid, :time_to_live, :queue, :adapter
 
+ # Perform a bread-first tree traversal of the initial document and its descendants.
  def call
  enqueue(initial_index_document, time_to_live)
  processing_document = dequeue
@@ -68,7 +69,9 @@ module Curate
  end
 
  # A small object that helps encapsulate the logic of building the hash of information regarding
- # the initialization of an Index::Document
+ # the initialization of an Curate::Indexer::Documents::IndexDocument
+ #
+ # @see Curate::Indexer::Documents::IndexDocument for details on pathnames, ancestors, and parent_pids.
  class ParentAndPathAndAncestorsBuilder
  def initialize(preservation_document, adapter)
  @preservation_document = preservation_document
@@ -1,5 +1,5 @@
  module Curate
  module Indexer
- VERSION = "0.2.2".freeze
+ VERSION = "0.2.3".freeze
  end
  end
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: curate-indexer
  version: !ruby/object:Gem::Version
- version: 0.2.2
+ version: 0.2.3
  platform: ruby
  authors:
  - Jeremy Friesen
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2016-07-14 00:00:00.000000000 Z
+ date: 2016-12-05 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: bundler
@@ -266,7 +266,8 @@ files:
  - lib/curate/indexer/repository_reindexer.rb
  - lib/curate/indexer/version.rb
  homepage: https://github.com/ndlib/curate-indexer
- licenses: []
+ licenses:
+ - Apache-2.0
  metadata: {}
  post_install_message:
  rdoc_options: []
@@ -289,4 +290,3 @@ signing_key:
  specification_version: 4
  summary: A playground for CurateND collections indexing
  test_files: []
- has_rdoc:
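
The README added in this release describes pathnames and ancestors with a five-document example (A through E). As a rough illustration of how those two columns relate to `parent_pids`, here is a minimal plain-Ruby sketch that reproduces the table's values. It does not use the gem's classes or adapters: `PARENTS`, `pathnames_for`, and `ancestors_for` are hypothetical names invented for this note. It also assumes that a document's ancestors are the pathnames of its direct parents, which is how the README table reads; the gem's own builder may compute them differently. The sketch further assumes an acyclic graph and omits the `time_to_live` guard that the gem uses for cycle detection.

```ruby
# Hypothetical parent map mirroring the README example:
# each PID maps to the PIDs of its direct parents.
PARENTS = {
  "A" => [],
  "B" => [],
  "C" => ["A"],
  "D" => ["A", "B"],
  "E" => ["C"]
}.freeze

# Every path from a root document down to the given pid.
# A root (no parents) is addressed by its own pid.
def pathnames_for(pid, parents = PARENTS)
  parent_pids = parents.fetch(pid)
  return [pid] if parent_pids.empty?

  parent_pids.flat_map do |parent_pid|
    pathnames_for(parent_pid, parents).map { |path| "#{path}/#{pid}" }
  end
end

# Ancestors as the README table presents them (an assumption of this sketch):
# the pathnames of each direct parent.
def ancestors_for(pid, parents = PARENTS)
  parents.fetch(pid).flat_map { |parent_pid| pathnames_for(parent_pid, parents) }
end

PARENTS.each_key do |pid|
  puts "#{pid}: pathnames=#{pathnames_for(pid).inspect} ancestors=#{ancestors_for(pid).inspect}"
end
```

Because the recursion has no depth limit, a cyclic parent map would recurse until the stack overflows; a counter decremented at each level, in the spirit of the gem's `time_to_live` argument, would be one way to bound it.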