curate-indexer 0.2.2 → 0.2.3

Files changed (10)

checksums.yaml +4 -4
data/README.md +54 -0
data/curate-indexer.gemspec +1 -0
data/lib/curate/indexer.rb +18 -5
data/lib/curate/indexer/configuration.rb +4 -1
data/lib/curate/indexer/documents.rb +46 -3
data/lib/curate/indexer/railtie.rb +1 -1
data/lib/curate/indexer/relationship_reindexer.rb +4 -1
data/lib/curate/indexer/version.rb +1 -1
metadata +4 -4

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz: 180b9a34980ab4a17aaf58de88a79bb7421e4d8c
-  data.tar.gz: 9ee184bed40c4454bf0d4ae2b09c7b21d65554d4
+  metadata.gz: f90a339727f8494a69331c91dabfead7a4d3fea5
+  data.tar.gz: b79f7029a6767747b52d48500f7fc8d567301f60
 SHA512:
-  metadata.gz: c0e1e6cf6e7f5e7090949c7cfe86d1462a86bf24a1aa8f16e4d7f47d0c5b0d39999d410a8ee491d487999e35d0a6743b1ece6348bb74ceea57f479e6df4e196b
-  data.tar.gz: cebeb033701dcc18e058ef26583f39413e61f8b8c8057a4cee9b61d2a8cfad2ac86bd7b165fcb52a960b9141ff1cff14fad8df51e5566c7ea73572fbc9e14027
+  metadata.gz: a5ece9e30c7760998a9d428c2cab96183eb00aa6f331b550ac0ca914c0140bed986a68694f2574fd75338e9b38d28a0a26865eefbc529ceabe8dfc7eaab63583
+  data.tar.gz: bc7a658c728db9aa803c3b44bc6ff04e94f2429dcdf500adff16a130652de6d1b49bc053b5e35aaade8388c1a8feceeaaef9f002df7ee697a3e22ea2b81211e9

data/README.md CHANGED

@@ -6,4 +6,58 @@
 [![Documentation Status](http://inch-ci.org/github/ndlib/curate-indexer.svg?branch=master)](http://inch-ci.org/github/ndlib/curate-indexer)
 [![APACHE 2 License](http://img.shields.io/badge/APACHE2-license-blue.svg)](./LICENSE)
+The Curate::Indexer gem is responsible for indexing the graph relationship of objects. It maps a PreservationDocument to an IndexDocument by mapping a PreservationDocument's direct parents into the paths to get from a root document to the given PreservationDocument.
+# Background
 This is a sandbox to work through the reindexing strategy as it relates to [CurateND Collections](https://github.com/ndlib/curate_nd/issues/420). At this point the code is separate to allow for rapid testing and prototyping (no sense spinning up SOLR and Fedora to walk an arbitrary graph).
+# Concepts
+As we are indexing objects, we have two types of documents:
+1. [PreservationDocument](./lib/curate/indexer/documents.rb) - a light-weight representation of a Fedora object
+2. [IndexDocument](./lib/curate/indexer/documents.rb) - a light-weight representation of a SOLR document object
+We have four attributes to consider for indexing the graph:
+1. pid - the unique identifier for a document
+2. parent_pids - the pids for all of the parents of a given document
+3. pathnames - the paths to traverse from a root document to the given document
+4. ancestors - the pathnames of each of the ancestors
+See [Curate::Indexer::Documents::IndexDocument](./lib/curate/indexer/documents.rb) for further discussion.
+To reindex a single document, we leverage the [`Curate::Indexer.reindex_relationships`](./lib/curate/indexer.rb) method.
+# Examples
+Given the following PreservationDocuments:
+| PID | Parents |
+|-----|---------|
+| A   | -       |
+| B   | -       |
+| C   | A       |
+| D   | A, B    |
+| E   | C       |
+If we reindex the above PreservationDocuments, we will generate the following IndexDocuments:
+| PID | Parents | Pathnames  | Ancestors |
+|-----|---------|------------|-----------|
+| A   | -       | [A]        | []        |
+| B   | -       | [B]        | []        |
+| C   | A       | [A/C]      | [A]       |
+| D   | A, B    | [A/D, B/D] | [A, B]    |
+| E   | C       | [A/C/E]    | [A/C]     |
+For more scenarios, look at the [Reindex PID and Descendants specs](./spec/features/reindex_pid_and_descendants_spec.rb).
+# Adapters
+An [AbstractAdapter](./lib/curate/indexer/adapters/abstract_adapter.rb) provides the method interface for others to build against.
+The [InMemory adapter](./lib/curate/indexer/adapters/in_memory_adapter.rb) is a reference implementation (and is used to ease testing overhead).
+CurateND has implemented the [following adapter](https://github.com/ndlib/curate_nd/blob/master/lib/curate/library_collection_indexing_adapter.rb) for its LibraryCollection indexing.
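
The mapping in the README's Examples tables can be sketched in a few lines of Ruby. This is only an illustration, not the gem's implementation: `PARENTS`, `pathnames_for`, and `ancestors_for` are hypothetical names, and the sketch assumes (as the second table shows) that a document's ancestors are the pathnames of its direct parents.

```ruby
# Hypothetical data mirroring the PreservationDocuments table:
# each pid maps to the pids of its direct parents.
PARENTS = {
  'A' => [],
  'B' => [],
  'C' => ['A'],
  'D' => ['A', 'B'],
  'E' => ['C']
}.freeze

# Every path from a root document down to the given pid.
# A document without parents is its own root.
def pathnames_for(pid, parents)
  direct_parents = parents.fetch(pid)
  return [pid] if direct_parents.empty?
  direct_parents.flat_map do |parent_pid|
    pathnames_for(parent_pid, parents).map { |path| "#{path}/#{pid}" }
  end
end

# Assumption taken from the table: the ancestors of a document are the
# pathnames of each of its direct parents.
def ancestors_for(pid, parents)
  parents.fetch(pid).flat_map { |parent_pid| pathnames_for(parent_pid, parents) }
end

PARENTS.each_key do |pid|
  puts "#{pid}: pathnames=#{pathnames_for(pid, PARENTS)} ancestors=#{ancestors_for(pid, PARENTS)}"
end
```

The recursion does not terminate if the parent graph contains a cycle; it has no guard comparable to the gem's `time_to_live` argument.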

data/curate-indexer.gemspec CHANGED

@@ -12,6 +12,7 @@ Gem::Specification.new do |spec|
   spec.summary       = %q{A playground for CurateND collections indexing}
   spec.description   = %q{A playground for CurateND collections indexing}
   spec.homepage      = "https://github.com/ndlib/curate-indexer"
+  spec.license       = "Apache-2.0"
   spec.files         = `git ls-files -z`.split("\x0").reject { |f| f.match(%r{^(test|spec|features)/}) }
   spec.bindir        = "bin"

data/lib/curate/indexer.rb CHANGED

@@ -5,7 +5,7 @@ require 'curate/indexer/configuration'
 require 'curate/indexer/railtie' if defined?(Rails)
 module Curate
-  # Responsible for performign the indexing of an object and its related child objects.
+  # Responsible for indexing an object and its related child objects.
   module Indexer
     # This assumes a rather deep graph
     DEFAULT_TIME_TO_LIVE = 15
@@ -18,7 +18,7 @@ module Curate
     # @return [Boolean] - It was successful
     # @raise Curate::Exceptions::CycleDetectionError - A potential cycle was detected
     def self.reindex_relationships(pid, time_to_live = DEFAULT_TIME_TO_LIVE)
-      RelationshipReindexer.call(pid: pid, time_to_live: time_to_live, adapter: configuration.adapter)
+      RelationshipReindexer.call(pid: pid, time_to_live: time_to_live, adapter: adapter)
       true
     end
@@ -34,10 +34,15 @@ module Curate
     # @return [Boolean] - It was successful
     # @raise Curate::Exceptions::CycleDetectionError - A potential cycle was detected
     def self.reindex_all!(time_to_live = DEFAULT_TIME_TO_LIVE)
-      RepositoryReindexer.call(time_to_live: time_to_live, pid_reindexer: method(:reindex_relationships), adapter: configuration.adapter)
+      # While the RepositoryReindexer is responsible for reindexing everything, I
+      # want to inject the lambda that will reindex a single item.
+      pid_reindexer = method(:reindex_relationships)
+      RepositoryReindexer.call(time_to_live: time_to_live, pid_reindexer: pid_reindexer, adapter: adapter)
       true
     end
+    # @api public
+    #
     # Contains the Curate::Indexer configuration information that is referenceable from wit
     # @see Curate::Indexer::Configuration
     def self.configuration
@@ -45,23 +50,31 @@ module Curate
     end
     # @api public
+    #
+    # Exposes the data adapter to use for the reindexing process.
+    #
+    # @see Curate::Indexer::Adapters::AbstractAdapter
+    # @return Object that implements the Curate::Indexer::Adapters::AbstractAdapter method interface
     def self.adapter
       configuration.adapter
     end
     # @api public
+    #
+    # Capture the configuration information
+    #
     # @see Curate::Indexer::Configuration
     # @see .configuration
+    # @see Curate::Indexer::Railtie
     def self.configure(&block)
       @configuration_block = block
-      configure!
       # The Rails load sequence means that some of the configured Targets may
       # not be loaded; As such I am not calling configure! instead relying on
       # Curate::Indexer::Railtie to handle the configure! call
       configure! unless defined?(Rails)
     end
-    # @api public
+    # @api private
     def self.configure!
       return false unless @configuration_block.respond_to?(:call)
       @configuration_block.call(configuration)
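
The capture-then-apply pattern in `configure` and `configure!` can be tried in isolation. The following is a minimal sketch under stated assumptions: `SketchIndexer` and its `Struct`-based `Configuration` are hypothetical stand-ins, and the `true` returned after the block runs is an assumption, since the rest of `configure!` is not shown in this hunk.

```ruby
# Hypothetical stand-in for Curate::Indexer, showing only how the
# configuration block is captured and applied later.
module SketchIndexer
  # Assumption: a Struct stands in for Curate::Indexer::Configuration.
  Configuration = Struct.new(:adapter)

  def self.configuration
    @configuration ||= Configuration.new
  end

  # Capture the block. Outside of Rails it is applied immediately;
  # inside Rails a Railtie would call configure! later in the boot sequence.
  def self.configure(&block)
    @configuration_block = block
    configure! unless defined?(Rails)
  end

  def self.configure!
    return false unless @configuration_block.respond_to?(:call)
    @configuration_block.call(configuration)
    true # assumption: the real method's return value is not shown
  end
end

# No block has been captured yet, so the guard clause returns false.
unconfigured_result = SketchIndexer.configure!

SketchIndexer.configure { |config| config.adapter = :in_memory }
puts unconfigured_result
puts SketchIndexer.configuration.adapter
```

Storing the block instead of running it immediately lets the Railtie re-apply the configuration from `config.to_prepare`, after the classes it refers to have been loaded.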

data/lib/curate/indexer/configuration.rb CHANGED

@@ -11,8 +11,11 @@ module Curate
       private
+      IN_MEMORY_ADAPTER_WARNING_MESSAGE =
+        "WARNING: You are using the default Curate::Indexer::Adapters::InMemoryAdapter for the Curate::Indexer.adapter.".freeze
       def default_adapter
-        $stdout.puts "WARNING: You are using the default Curate::Indexer::Adapters::InMemoryAdapter for the Curate::Indexer.adapter."
+        $stdout.puts IN_MEMORY_ADAPTER_WARNING_MESSAGE unless defined?(SUPPRESS_MEMORY_ADAPTER_WARNING)
         require 'curate/indexer/adapters/in_memory_adapter'
         Adapters::InMemoryAdapter
       end
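
The new guard uses `defined?`, which tests only whether the constant exists, not what it holds. A small sketch of that behavior follows; `WarningSketch` and its message are hypothetical, and only the constant name `SUPPRESS_MEMORY_ADAPTER_WARNING` comes from the diff.

```ruby
require 'stringio'

# Hypothetical stand-in for the private default_adapter warning logic.
module WarningSketch
  WARNING_MESSAGE = 'WARNING: You are using the default in-memory adapter.'.freeze

  # Writes the warning unless the suppression constant has been defined.
  def self.warn_about_default_adapter(io)
    io.puts WARNING_MESSAGE unless defined?(SUPPRESS_MEMORY_ADAPTER_WARNING)
  end
end

before = StringIO.new
WarningSketch.warn_about_default_adapter(before)

# Defining the constant with any value, even false, suppresses the warning,
# because defined? only checks for existence.
Object.const_set(:SUPPRESS_MEMORY_ADAPTER_WARNING, false)

after = StringIO.new
WarningSketch.warn_about_default_adapter(after)

puts "before: #{before.string.inspect}"
puts "after: #{after.string.inspect}"
```

A test suite could define the constant in its helper file before the adapter is first resolved. Where the gem's own specs define it is not shown in this diff.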

data/lib/curate/indexer/documents.rb CHANGED

@@ -12,10 +12,21 @@ module Curate
           @pid = keywords.fetch(:pid).to_s
           @parent_pids = Array(keywords.fetch(:parent_pids))
         end
-        attr_reader :pid, :parent_pids
+        # @api public
+        # @return String The Fedora object's PID
+        attr_reader :pid
+        # @api public
+        #
+        # All of the direct parents of the Fedora document associated with the given PID.
+        #
+        # This does not include grandparents, great-grandparents, etc.
+        # @return Array<String>
+        attr_reader :parent_pids
       end
-      # @api private
+      # @api public
       #
       # A rudimentary representation of what is needed to reindex Solr documents
       class IndexDocument
@@ -28,7 +39,39 @@ module Curate
           @pathnames = Array(keywords.fetch(:pathnames))
           @ancestors = Array(keywords.fetch(:ancestors))
         end
-        attr_reader :pid, :parent_pids, :pathnames, :ancestors
+        # @api public
+        # @return String The Fedora object's PID
+        attr_reader :pid
+        # @api public
+        #
+        # All of the direct parents of the Fedora document associated with the given PID.
+        #
+        # This does not include grandparents, great-grandparents, etc.
+        # @return Array<String>
+        attr_reader :parent_pids
+        # @api public
+        #
+        # All nodes in the graph are addressable by one or more pathnames.
+        #
+        # If I have A, with parent B, and B has parents C and D, we have the
+        # following pathnames:
+        #   [D/B/A, C/B/A]
+        #
+        # In the graph representation, we can get to A by going from D to B to A, or by going from C to B to A.
+        # @return Array<String>
+        attr_reader :pathnames
+        # @api public
+        #
+        # All of the :pathnames of each of the document's ancestors. If I have A, with parent B, and B has
+        # parents C and D then we have the following ancestors:
+        #   [D/B], [C/B]
+        #
+        # @return Array<String>
+        attr_reader :ancestors
         def sorted_parent_pids
           parent_pids.sort

data/lib/curate/indexer/railtie.rb CHANGED

@@ -5,7 +5,7 @@ module Curate
     # Connect into the boot sequence of a Rails application
     class Railtie < Rails::Railtie
       config.to_prepare do
-        Curate::Indexer.send(:configure!)
+        Curate::Indexer.configure!
       end
     end
   end

data/lib/curate/indexer/relationship_reindexer.rb CHANGED

@@ -21,6 +21,7 @@ module Curate
       end
       attr_reader :pid, :time_to_live, :queue, :adapter
+      # Perform a breadth-first tree traversal of the initial document and its descendants.
       def call
         enqueue(initial_index_document, time_to_live)
         processing_document = dequeue
@@ -68,7 +69,9 @@ module Curate
       end
       # A small object that helps encapsulate the logic of building the hash of information regarding
-      # the initialization of an Index::Document
+      # the initialization of a Curate::Indexer::Documents::IndexDocument
+      #
+      # @see Curate::Indexer::Documents::IndexDocument for details on pathnames, ancestors, and parent_pids.
       class ParentAndPathAndAncestorsBuilder
         def initialize(preservation_document, adapter)
           @preservation_document = preservation_document
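
The breadth-first traversal with a `time_to_live` guard mentioned in the comment above `call` can be sketched as follows. This is an assumption-laden illustration rather than the gem's code: `CHILDREN`, `SketchCycleDetectionError`, and `breadth_first_pids` are hypothetical, and decrementing the time to live by one per level is a guess at how the guard might work.

```ruby
# Hypothetical error class standing in for Curate::Exceptions::CycleDetectionError.
class SketchCycleDetectionError < StandardError; end

# Hypothetical child lookup: each pid maps to the pids of its direct children.
CHILDREN = {
  'A' => ['C', 'D'],
  'B' => ['D'],
  'C' => ['E'],
  'D' => [],
  'E' => []
}.freeze

# Visit the starting pid and its descendants level by level. Each enqueued
# entry carries a remaining time to live; reaching zero is treated as a
# sign of a potential cycle.
def breadth_first_pids(start_pid, children, time_to_live)
  queue = [[start_pid, time_to_live]]
  visited = []
  until queue.empty?
    pid, remaining = queue.shift
    raise SketchCycleDetectionError, "possible cycle at #{pid}" if remaining <= 0
    visited << pid
    children.fetch(pid, []).each { |child_pid| queue << [child_pid, remaining - 1] }
  end
  visited
end

puts breadth_first_pids('A', CHILDREN, 15).inspect
```

A time-to-live counter avoids tracking visited nodes, which matters here because a document reachable by several paths must be processed once per path. The trade-off is that a graph deeper than the counter allows is reported as a potential cycle.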

data/lib/curate/indexer/version.rb CHANGED

@@ -1,5 +1,5 @@
 module Curate
   module Indexer
-    VERSION = "0.2.2".freeze
+    VERSION = "0.2.3".freeze
   end
 end

metadata CHANGED

@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: curate-indexer
 version: !ruby/object:Gem::Version
-  version: 0.2.2
+  version: 0.2.3
 platform: ruby
 authors:
 - Jeremy Friesen
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2016-07-14 00:00:00.000000000 Z
+date: 2016-12-05 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: bundler
@@ -266,7 +266,8 @@ files:
 - lib/curate/indexer/repository_reindexer.rb
 - lib/curate/indexer/version.rb
 homepage: https://github.com/ndlib/curate-indexer
-licenses: []
+licenses:
+- Apache-2.0
 metadata: {}
 post_install_message:
 rdoc_options: []
@@ -289,4 +290,3 @@ signing_key:
 specification_version: 4
 summary: A playground for CurateND collections indexing
 test_files: []
-has_rdoc: