curate-indexer 0.2.2 → 0.2.3

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: 180b9a34980ab4a17aaf58de88a79bb7421e4d8c
4
- data.tar.gz: 9ee184bed40c4454bf0d4ae2b09c7b21d65554d4
3
+ metadata.gz: f90a339727f8494a69331c91dabfead7a4d3fea5
4
+ data.tar.gz: b79f7029a6767747b52d48500f7fc8d567301f60
5
5
  SHA512:
6
- metadata.gz: c0e1e6cf6e7f5e7090949c7cfe86d1462a86bf24a1aa8f16e4d7f47d0c5b0d39999d410a8ee491d487999e35d0a6743b1ece6348bb74ceea57f479e6df4e196b
7
- data.tar.gz: cebeb033701dcc18e058ef26583f39413e61f8b8c8057a4cee9b61d2a8cfad2ac86bd7b165fcb52a960b9141ff1cff14fad8df51e5566c7ea73572fbc9e14027
6
+ metadata.gz: a5ece9e30c7760998a9d428c2cab96183eb00aa6f331b550ac0ca914c0140bed986a68694f2574fd75338e9b38d28a0a26865eefbc529ceabe8dfc7eaab63583
7
+ data.tar.gz: bc7a658c728db9aa803c3b44bc6ff04e94f2429dcdf500adff16a130652de6d1b49bc053b5e35aaade8388c1a8feceeaaef9f002df7ee697a3e22ea2b81211e9
data/README.md CHANGED
@@ -6,4 +6,58 @@
6
6
  [![Documentation Status](http://inch-ci.org/github/ndlib/curate-indexer.svg?branch=master)](http://inch-ci.org/github/ndlib/curate-indexer)
7
7
  [![APACHE 2 License](http://img.shields.io/badge/APACHE2-license-blue.svg)](./LICENSE)
8
8
 
9
+ The Curate::Indexer gem is responsible for indexing the graph relationships of objects. It maps a PreservationDocument to an IndexDocument by expanding the PreservationDocument's direct parents into the paths that lead from a root document to that PreservationDocument.
10
+
11
+ # Background
12
+
9
13
  This is a sandbox to work through the reindexing strategy as it relates to [CurateND Collections](https://github.com/ndlib/curate_nd/issues/420). At this point the code is separate to allow for rapid testing and prototyping (no sense spinning up SOLR and Fedora to walk an arbitrary graph).
14
+
15
+ # Concepts
16
+
17
+ As we are indexing objects, we have two types of documents:
18
+
19
+ 1. [PreservationDocument](./lib/curate/indexer/documents.rb) - a light-weight representation of a Fedora object
20
+ 2. [IndexDocument](./lib/curate/indexer/documents.rb) - a light-weight representation of a SOLR document object
21
+
22
+ We have four attributes to consider for indexing the graph:
23
+
24
+ 1. pid - the unique identifier for a document
25
+ 2. parent_pids - the pids for all of the parents of a given document
26
+ 3. pathnames - the paths to traverse from a root document to the given document
27
+ 4. ancestors - the pathnames of each of the ancestors
28
+
29
+ See [Curate::Indexer::Documents::IndexDocument](./lib/curate/indexer/documents.rb) for further discussion.
30
+
31
+ To reindex a single document, we leverage the [`Curate::Indexer.reindex_relationships`](./lib/curate/indexer.rb) method.
32
+
33
+ # Examples
34
+
35
+ Given the following PreservationDocuments:
36
+
37
+ | PID | Parents |
38
+ |-----|---------|
39
+ | A | - |
40
+ | B | - |
41
+ | C | A |
42
+ | D | A, B |
43
+ | E | C |
44
+
45
+ If we reindex the above PreservationDocuments, we generate the following IndexDocuments:
46
+
47
+ | PID | Parents | Pathnames | Ancestors |
48
+ |-----|---------|------------|-----------|
49
+ | A | - | [A] | [] |
50
+ | B | - | [B] | [] |
51
+ | C | A | [A/C] | [A] |
52
+ | D | A, B | [A/D, B/D] | [A, B] |
53
+ | E | C | [A/C/E] | [A/C] |
54
+
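The two tables above can be reproduced with a small recursive walk over the parent relationships. This is a minimal sketch, not the gem's implementation: `PARENTS`, `pathnames_for`, and `ancestors_for` are hypothetical names, and the sketch only collects the pathnames of direct parents as ancestors, which is what the table shows. It also assumes the graph has no cycles.

```ruby
# Hypothetical stand-in for the PreservationDocuments table (PID => parent PIDs).
PARENTS = {
  'A' => [],
  'B' => [],
  'C' => ['A'],
  'D' => ['A', 'B'],
  'E' => ['C']
}.freeze

# Every path from a root document down to the given pid.
# A document without parents is its own root.
def pathnames_for(pid, parents)
  parent_pids = parents.fetch(pid)
  return [pid] if parent_pids.empty?
  parent_pids.flat_map do |parent_pid|
    pathnames_for(parent_pid, parents).map { |path| "#{path}/#{pid}" }
  end
end

# The pathnames of each direct parent, matching the Ancestors column above.
def ancestors_for(pid, parents)
  parents.fetch(pid).flat_map { |parent_pid| pathnames_for(parent_pid, parents) }.uniq
end

pathnames_for('D', PARENTS) # => ["A/D", "B/D"]
ancestors_for('E', PARENTS) # => ["A/C"]
```

The recursion never terminates on a cyclic graph, which is why the gem's own entry points accept a `time_to_live` argument.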
55
+ For more scenarios, look at the [Reindex PID and Descendants specs](./spec/features/reindex_pid_and_descendants_spec.rb).
56
+
57
+ # Adapters
58
+
59
+ An [AbstractAdapter](./lib/curate/indexer/adapters/abstract_adapter.rb) provides the method interface for others to build against.
60
+
61
+ The [InMemory adapter](./lib/curate/indexer/adapters/in_memory_adapter.rb) is a reference implementation (and used to ease testing overhead).
62
+
63
+ CurateND has implemented the [following adapter](https://github.com/ndlib/curate_nd/blob/master/lib/curate/library_collection_indexing_adapter.rb) for its LibraryCollection indexing.
@@ -12,6 +12,7 @@ Gem::Specification.new do |spec|
12
12
  spec.summary = %q{A playground for CurateND collections indexing}
13
13
  spec.description = %q{A playground for CurateND collections indexing}
14
14
  spec.homepage = "https://github.com/ndlib/curate-indexer"
15
+ spec.license = "Apache-2.0"
15
16
 
16
17
  spec.files = `git ls-files -z`.split("\x0").reject { |f| f.match(%r{^(test|spec|features)/}) }
17
18
  spec.bindir = "bin"
@@ -5,7 +5,7 @@ require 'curate/indexer/configuration'
5
5
  require 'curate/indexer/railtie' if defined?(Rails)
6
6
 
7
7
  module Curate
8
- # Responsible for performign the indexing of an object and its related child objects.
8
+ # Responsible for indexing an object and its related child objects.
9
9
  module Indexer
10
10
  # This assumes a rather deep graph
11
11
  DEFAULT_TIME_TO_LIVE = 15
@@ -18,7 +18,7 @@ module Curate
18
18
  # @return [Boolean] - It was successful
19
19
  # @raise Curate::Exceptions::CycleDetectionError - A potential cycle was detected
20
20
  def self.reindex_relationships(pid, time_to_live = DEFAULT_TIME_TO_LIVE)
21
- RelationshipReindexer.call(pid: pid, time_to_live: time_to_live, adapter: configuration.adapter)
21
+ RelationshipReindexer.call(pid: pid, time_to_live: time_to_live, adapter: adapter)
22
22
  true
23
23
  end
24
24
 
@@ -34,10 +34,15 @@ module Curate
34
34
  # @return [Boolean] - It was successful
35
35
  # @raise Curate::Exceptions::CycleDetectionError - A potential cycle was detected
36
36
  def self.reindex_all!(time_to_live = DEFAULT_TIME_TO_LIVE)
37
- RepositoryReindexer.call(time_to_live: time_to_live, pid_reindexer: method(:reindex_relationships), adapter: configuration.adapter)
37
+ # While the RepositoryReindexer is responsible for reindexing everything, I
38
+ # want to inject the callable (a Method object) that will reindex a single item.
39
+ pid_reindexer = method(:reindex_relationships)
40
+ RepositoryReindexer.call(time_to_live: time_to_live, pid_reindexer: pid_reindexer, adapter: adapter)
38
41
  true
39
42
  end
40
43
 
44
+ # @api public
45
+ #
41
46
  # Contains the Curate::Indexer configuration information that is referenceable from wit
42
47
  # @see Curate::Indexer::Configuration
43
48
  def self.configuration
@@ -45,23 +50,31 @@ module Curate
45
50
  end
46
51
 
47
52
  # @api public
53
+ #
54
+ # Exposes the data adapter to use for the reindexing process.
55
+ #
56
+ # @see Curate::Indexer::Adapters::AbstractAdapter
57
+ # @return Object that implements the Curate::Indexer::Adapters::AbstractAdapter method interface
48
58
  def self.adapter
49
59
  configuration.adapter
50
60
  end
51
61
 
52
62
  # @api public
63
+ #
64
+ # Capture the configuration information
65
+ #
53
66
  # @see Curate::Indexer::Configuration
54
67
  # @see .configuration
68
+ # @see Curate::Indexer::Railtie
55
69
  def self.configure(&block)
56
70
  @configuration_block = block
57
- configure!
58
71
  # The Rails load sequence means that some of the configured Targets may
59
72
  # not be loaded; As such I am not calling configure! instead relying on
60
73
  # Curate::Indexer::Railtie to handle the configure! call
61
74
  configure! unless defined?(Rails)
62
75
  end
63
76
 
64
- # @api public
77
+ # @api private
65
78
  def self.configure!
66
79
  return false unless @configuration_block.respond_to?(:call)
67
80
  @configuration_block.call(configuration)
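The deferred-configuration pattern in this hunk can be illustrated in isolation. The following is a minimal sketch under stated assumptions: `DeferredConfigExample` and its `Settings` struct are hypothetical stand-ins, and the `true` return value of `configure!` is assumed because the hunk ends before the method does. Only the method names `configure`, `configure!`, and `configuration` come from the diff.

```ruby
# Hypothetical stand-in module; not the gem's Curate::Indexer.
module DeferredConfigExample
  Settings = Struct.new(:adapter)

  def self.configuration
    @configuration ||= Settings.new
  end

  # Capture the block. Outside of Rails, apply it right away; inside Rails,
  # something later in the boot sequence (a Railtie in the gem) applies it.
  def self.configure(&block)
    @configuration_block = block
    configure! unless defined?(Rails)
  end

  # Apply the captured block, if there is one.
  def self.configure!
    return false unless @configuration_block.respond_to?(:call)
    @configuration_block.call(configuration)
    true # assumed return value; the diff does not show the end of the method
  end
end

DeferredConfigExample.configure { |config| config.adapter = :in_memory }
DeferredConfigExample.configuration.adapter
```

When run outside a Rails process, the block is applied immediately, so the final expression evaluates to `:in_memory`.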
@@ -11,8 +11,11 @@ module Curate
11
11
 
12
12
  private
13
13
 
14
+ IN_MEMORY_ADAPTER_WARNING_MESSAGE =
15
+ "WARNING: You are using the default Curate::Indexer::Adapters::InMemoryAdapter for the Curate::Indexer.adapter.".freeze
16
+
14
17
  def default_adapter
15
- $stdout.puts "WARNING: You are using the default Curate::Indexer::Adapters::InMemoryAdapter for the Curate::Indexer.adapter."
18
+ $stdout.puts IN_MEMORY_ADAPTER_WARNING_MESSAGE unless defined?(SUPPRESS_MEMORY_ADAPTER_WARNING)
16
19
  require 'curate/indexer/adapters/in_memory_adapter'
17
20
  Adapters::InMemoryAdapter
18
21
  end
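The `defined?` guard in this hunk means the warning disappears as soon as a constant named `SUPPRESS_MEMORY_ADAPTER_WARNING` exists, whatever its value. A minimal sketch of that behavior follows; `WarningExample`, its message text, and the `io` parameter are hypothetical and exist only to make the example self-contained.

```ruby
require 'stringio'

# Hypothetical stand-in for the gem's configuration object.
module WarningExample
  WARNING_MESSAGE = "WARNING: You are using the default in-memory adapter.".freeze

  # Returns true when the warning was written, false when it was suppressed.
  def self.warn_about_default_adapter(io = $stdout)
    return false if defined?(SUPPRESS_MEMORY_ADAPTER_WARNING)
    io.puts WARNING_MESSAGE
    true
  end
end

output = StringIO.new
first_call = WarningExample.warn_about_default_adapter(output)  # writes the warning

# Defining the constant anywhere in the constant lookup path silences the warning.
Object.const_set(:SUPPRESS_MEMORY_ADAPTER_WARNING, true)
second_call = WarningExample.warn_about_default_adapter(output) # writes nothing
```

A test suite could define the constant in its helper file before the adapter is first resolved.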
@@ -12,10 +12,21 @@ module Curate
12
12
  @pid = keywords.fetch(:pid).to_s
13
13
  @parent_pids = Array(keywords.fetch(:parent_pids))
14
14
  end
15
- attr_reader :pid, :parent_pids
15
+
16
+ # @api public
17
+ # @return String The Fedora object's PID
18
+ attr_reader :pid
19
+
20
+ # @api public
21
+ #
22
+ # All of the direct parents of the Fedora document associated with the given PID.
23
+ #
24
+ # This does not include grandparents, great-grandparents, etc.
25
+ # @return Array<String>
26
+ attr_reader :parent_pids
16
27
  end
17
28
 
18
- # @api private
29
+ # @api public
19
30
  #
20
31
  # A rudimentary representation of what is needed to reindex Solr documents
21
32
  class IndexDocument
@@ -28,7 +39,39 @@ module Curate
28
39
  @pathnames = Array(keywords.fetch(:pathnames))
29
40
  @ancestors = Array(keywords.fetch(:ancestors))
30
41
  end
31
- attr_reader :pid, :parent_pids, :pathnames, :ancestors
42
+
43
+ # @api public
44
+ # @return String The Fedora object's PID
45
+ attr_reader :pid
46
+
47
+ # @api public
48
+ #
49
+ # All of the direct parents of the Fedora document associated with the given PID.
50
+ #
51
+ # This does not include grandparents, great-grandparents, etc.
52
+ # @return Array<String>
53
+ attr_reader :parent_pids
54
+
55
+ # @api public
56
+ #
57
+ # All nodes in the graph are addressable by one or more pathnames.
58
+ #
59
+ # If I have A, with parent B, and B has parents C and D, we have the
60
+ # following pathnames:
61
+ # [D/B/A, C/B/A]
62
+ #
63
+ # In the graph representation, we can get to A by going from D to B to A, or by going from C to B to A.
64
+ # @return Array<String>
65
+ attr_reader :pathnames
66
+
67
+ # @api public
68
+ #
69
+ # All of the :pathnames of each of the document's ancestors. If I have A, with parent B, and B has
70
+ # parents C and D then we have the following ancestors:
71
+ # [D/B], [C/B]
72
+ #
73
+ # @return Array<String>
74
+ attr_reader :ancestors
32
75
 
33
76
  def sorted_parent_pids
34
77
  parent_pids.sort
@@ -5,7 +5,7 @@ module Curate
5
5
  # Connect into the boot sequence of a Rails application
6
6
  class Railtie < Rails::Railtie
7
7
  config.to_prepare do
8
- Curate::Indexer.send(:configure!)
8
+ Curate::Indexer.configure!
9
9
  end
10
10
  end
11
11
  end
@@ -21,6 +21,7 @@ module Curate
21
21
  end
22
22
  attr_reader :pid, :time_to_live, :queue, :adapter
23
23
 
24
+ # Perform a breadth-first tree traversal of the initial document and its descendants.
24
25
  def call
25
26
  enqueue(initial_index_document, time_to_live)
26
27
  processing_document = dequeue
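The enqueue and dequeue calls above suggest a queue-driven breadth-first walk in which each queued document carries a remaining time to live. The sketch below illustrates that general technique only. `CycleDetected`, `CHILDREN`, and `breadth_first_pids` are hypothetical names, and the exact point at which the gem decrements or checks `time_to_live` is an assumption.

```ruby
# Hypothetical error class standing in for the gem's cycle detection error.
class CycleDetected < StandardError; end

# Hypothetical child lookup (PID => child PIDs), the inverse of a parent table.
CHILDREN = {
  'A' => ['C', 'D'],
  'B' => ['D'],
  'C' => ['E'],
  'D' => [],
  'E' => []
}.freeze

# Visit root_pid and its descendants level by level. Each step away from the
# root costs one unit of time to live; running out is treated as a potential cycle.
def breadth_first_pids(root_pid, children, time_to_live = 15)
  queue = [[root_pid, time_to_live]]
  visited = []
  until queue.empty?
    pid, ttl = queue.shift
    raise CycleDetected, "potential cycle detected at #{pid}" if ttl <= 0
    visited << pid
    children.fetch(pid, []).each { |child_pid| queue << [child_pid, ttl - 1] }
  end
  visited
end

breadth_first_pids('A', CHILDREN) # => ["A", "C", "D", "E"]
```

A time-to-live counter does not prove that a cycle exists; it only bounds the depth of the walk, which is why the default of 15 is described as assuming "a rather deep graph".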
@@ -68,7 +69,9 @@ module Curate
68
69
  end
69
70
 
70
71
  # A small object that helps encapsulate the logic of building the hash of information regarding
71
- # the initialization of an Index::Document
72
+ # the initialization of a Curate::Indexer::Documents::IndexDocument
73
+ #
74
+ # @see Curate::Indexer::Documents::IndexDocument for details on pathnames, ancestors, and parent_pids.
72
75
  class ParentAndPathAndAncestorsBuilder
73
76
  def initialize(preservation_document, adapter)
74
77
  @preservation_document = preservation_document
@@ -1,5 +1,5 @@
1
1
  module Curate
2
2
  module Indexer
3
- VERSION = "0.2.2".freeze
3
+ VERSION = "0.2.3".freeze
4
4
  end
5
5
  end
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: curate-indexer
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.2.2
4
+ version: 0.2.3
5
5
  platform: ruby
6
6
  authors:
7
7
  - Jeremy Friesen
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2016-07-14 00:00:00.000000000 Z
11
+ date: 2016-12-05 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: bundler
@@ -266,7 +266,8 @@ files:
266
266
  - lib/curate/indexer/repository_reindexer.rb
267
267
  - lib/curate/indexer/version.rb
268
268
  homepage: https://github.com/ndlib/curate-indexer
269
- licenses: []
269
+ licenses:
270
+ - Apache-2.0
270
271
  metadata: {}
271
272
  post_install_message:
272
273
  rdoc_options: []
@@ -289,4 +290,3 @@ signing_key:
289
290
  specification_version: 4
290
291
  summary: A playground for CurateND collections indexing
291
292
  test_files: []
292
- has_rdoc: