preservation 0.4.2 → 0.5.0

Files changed (14)

checksums.yaml +4 -4
data/CHANGELOG.md +9 -0
data/README.md +173 -14
data/lib/preservation.rb +2 -2
data/lib/preservation/builder.rb +5 -5
data/lib/preservation/report/database.rb +1 -2
data/lib/preservation/temporal.rb +3 -4
data/lib/preservation/transfer/base.rb +42 -0
data/lib/preservation/transfer/dataset.rb +258 -0
data/lib/preservation/version.rb +1 -1
data/preservation.gemspec +4 -4
metadata +10 -10
data/lib/preservation/ingest.rb +0 -38
data/lib/preservation/transfer/pure.rb +0 -259

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz: 26bdfccfba8fa2f79920f73a13effe5701fcafc1
-  data.tar.gz: 73e1d0a7a0060f115f5cc90df76f5341092f8fc8
+  metadata.gz: 54db84bdb0bc782f05420b420200e78b9394a6af
+  data.tar.gz: a243b3e89cdf0fe830df9eea16639094d5854af1
 SHA512:
-  metadata.gz: 2dbce48f44a040569acbfb7b5dc8082b48bf83e3495d2cbed13577b72a0ba38e0c20d4b459145797faac964bbc61d4ec069705d47d1068135a376e1a13c5328e
-  data.tar.gz: 975a2b8424ddb540f5c187f8aee2b752f54bd2b81bafc04d4d5626bd471f5fc244717d28db477b9c5e9a085e2da0d8d44047af178531adcd699c8e030a863d30
+  metadata.gz: 51d73c2067b1d48c7ce8a5eff9659fd0cb0059e59850e6a0d80c9865a9080a9e718e839b0d83aa2bde790fbcdae18d2eb7f26480f7f72ae9a74084fc7f6975f1
+  data.tar.gz: b58bd774f4905d98fee7be08f4a99a1bca5fa3bd115f8136c7a524c92937757e1ddfea63308ed589c4aaa11174b73499d1b43d993fee660df95e1ab533998a6a

data/CHANGELOG.md CHANGED

@@ -4,8 +4,17 @@ This project adheres to [Semantic Versioning](http://semver.org/).
 ## Unreleased
+## 0.5.0 - 2017-05-23
+### Changed
+- Transfer - created as ISO8601 date format.
+### Fixed
+- Transfer - handling DOIs of related works for both datasets and publications.
+- Transfer - handling missing DOIs of related works.
 ## 0.4.2 - 2017-05-18
 ### Fixed
+- Transfer - presence check for DOI of a related work.
 ## 0.4.1 - 2016-09-30
 ### Fixed
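
The 0.5.0 changelog entry says the created date is now written in ISO 8601 date format, and the new `dataset.rb` does this with `strftime("%F")`. A minimal sketch of that formatting, using a made-up timestamp rather than real Pure data:

```ruby
require 'date'

# Hypothetical creation timestamp, for illustration only.
created = Time.utc(2015, 6, 4, 13, 45, 0)

# "%F" is shorthand for "%Y-%m-%d", the ISO 8601 calendar date.
iso_from_time = created.strftime('%F')

# Date#iso8601 produces the same representation for a Date.
iso_from_date = Date.new(2015, 6, 4).iso8601

puts iso_from_time
puts iso_from_date
```

Both calls yield a date-only string, which matches the shape of the `dcterms.created` value shown in the README's metadata.json example.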

data/README.md CHANGED

@@ -1,6 +1,9 @@
 # Preservation
-Extraction and Transformation for Loading by Archivematica's Automation Tools.
+Extraction from the Pure Research Information System and transformation for
+loading by Archivematica.
+Includes transfer preparation, reporting and disk space management.
 ## Status
@@ -27,7 +30,9 @@ Or install it yourself as:
 ## Usage
 ### Configuration
-Configure Preservation. If ```log_path``` is omitted, logging (standard library) writes to STDOUT.
+Configure Preservation. If ```log_path``` is omitted, logging (standard library)
+writes to STDOUT.
 ```ruby
 Preservation.configure do |config|
@@ -37,24 +42,129 @@ Preservation.configure do |config|
 end
 ```
+Create a hash for passing to a transfer.
+```ruby
+# Pure host with authentication.
+config = {
+  url:      ENV['PURE_URL'],
+  username: ENV['PURE_USERNAME'],
+  password: ENV['PURE_PASSWORD']
+}
+```
+```ruby
+# Pure host without authentication.
+config = {
+  url: ENV['PURE_URL']
+}
+```
 ### Transfer
-Create a transfer using the Pure Research Information System as a data source.
+Configure a transfer to retrieve data from a Pure host.
+```ruby
+transfer = Preservation::Transfer::Dataset.new config
+```
+#### Single
+If necessary, fetch the metadata, prepare a directory in the ingest path and
+populate it with the files and JSON description file.
+```ruby
+transfer.prepare uuid: 'xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx'
+```
+#### Batch
+For multiple Pure datasets, if necessary, fetch the metadata, prepare a
+directory in the ingest path and populate it with the files and JSON description
+file.
+A maximum of 10 will be prepared using the doi_short directory naming scheme.
+Each dataset will only be prepared if 20 days have elapsed since the metadata
+record was last modified.
 ```ruby
-transfer = Preservation::Transfer::Pure.new base_url:   ENV['PURE_BASE_URL'],
-                                            username:   ENV['PURE_USERNAME'],
-                                            password:   ENV['PURE_PASSWORD'],
-                                            basic_auth: true
+transfer.prepare_batch max: 10,
+                       dir_scheme: :doi_short,
+                       delay: 20
 ```
-For a Pure dataset, if necessary, fetch the metadata, prepare
-a directory in the ingest path and populate it with the files and JSON description file.
+#### Directory name
+The following are permitted values for the dir_scheme parameter:
 ```ruby
-transfer.prepare_dataset uuid: 'xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx'
+:uuid_title
+:title_uuid
+:date_uuid_title
+:date_title_uuid
+:date_time_uuid
+:date_time_title
+:date_time_uuid_title
+:date_time_title_uuid
+:uuid
+:doi
+:doi_short
 ```
+#### Load directory
+A transfer-ready directory is named according to the specified directory
+scheme, in this case doi_short. The dataset in this example contains a single
+file, Ebola_data_Jun15.zip.
+```
+.
+├── 10.17635-lancaster-researchdata-6
+│   ├── Ebola_data_Jun15.zip
+│   └── metadata
+│       └── metadata.json
+```
+metadata.json:
+```json
+[
+  {
+    "filename": "objects/Ebola_data_Jun15.zip",
+    "dc.title": "Ebolavirus evolution 2013-2015",
+    "dc.description": "Data used for analysis of selection and evolutionary rate in Zaire Ebolavirus variant Makona",
+    "dcterms.created": "2015-06-04",
+    "dcterms.available": "2015-06-04",
+    "dc.publisher": "Lancaster University",
+    "dc.identifier": "http://dx.doi.org/10.17635/lancaster/researchdata/6",
+    "dcterms.spatial": [
+      "Guinea, Sierra Leone, Liberia"
+    ],
+    "dc.creator": [
+      "Gatherer, Derek"
+    ],
+    "dc.contributor": [
+      "Robertson, David",
+      "Lovell, Simon"
+    ],
+    "dc.subject": [
+      "Ebolavirus",
+      "evolution",
+      "phylogenetics",
+      "virulence",
+      "Filoviridae",
+      "positive selection"
+    ],
+    "dcterms.license": "CC BY",
+    "dc.relation": [
+      "http://dx.doi.org/10.1136/ebmed-2014-110127",
+      "http://dx.doi.org/10.1099/vir.0.067199-0"
+    ]
+  }
+]
+```
+### Storage
 Free up disk space for completed transfers. Can be done at any time.
 ```ruby
@@ -62,13 +172,62 @@ Preservation::Storage.cleanup
 ```
 ### Report
 Can be used for scheduled monitoring of transfers.
 ```ruby
 Preservation::Report::Transfer.exception
 ```
-## Documentation
-[API in YARD](http://www.rubydoc.info/gems/preservation)
-[Detailed usage in GitBook](https://aalbinclark.gitbooks.io/preservation)
+Formatted as JSON:
+```json
+{
+  "pending": {
+    "count": 3,
+    "data": [
+      {
+        "path": "10.17635-lancaster-researchdata-72",
+        "path_timestamp": "2016-09-29 12:08:58 +0100"
+      },
+      {
+        "path": "10.17635-lancaster-researchdata-74",
+        "path_timestamp": "2016-09-29 12:08:59 +0100"
+      },
+      {
+        "path": "10.17635-lancaster-researchdata-75",
+        "path_timestamp": "2016-09-29 12:09:00 +0100"
+      }
+    ]
+  },
+  "current": {
+    "path": "10.17635-lancaster-researchdata-90",
+    "unit_type": "ingest",
+    "status": "PROCESSING",
+    "current": 1,
+    "id": 91,
+    "uuid": "ebf048c3-0ca8-409c-94cf-ab3e5d97e901",
+    "path_timestamp": "2016-09-28 17:09:33 +0100"
+  },
+  "failed": {
+    "count": 0
+  },
+  "incomplete": {
+    "count": 1,
+    "data": [
+      {
+         "path": "10.17635-lancaster-researchdata-90",
+         "unit_type": "ingest",
+         "status": "PROCESSING",
+         "current": 1,
+         "id": 91,
+         "uuid": "ebf048c3-0ca8-409c-94cf-ab3e5d97e901",
+         "path_timestamp": "2016-09-28 17:09:33 +0100"
+      }
+    ]
+  },
+  "complete": {
+    "count": 78
+  }
+}
+```
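
For scheduled monitoring, a report shaped like the JSON above can be checked programmatically. The sketch below is an assumption-laden illustration: it parses a trimmed, hand-written copy of the report with the standard `json` library, because the diff does not show whether `Preservation::Report::Transfer.exception` returns a Hash or a JSON string. The `needs_attention` name is hypothetical.

```ruby
require 'json'

# Trimmed, hand-written copy of the report structure shown in the README.
report_json = <<~JSON
  {
    "pending":    { "count": 3 },
    "failed":     { "count": 0 },
    "incomplete": { "count": 1 },
    "complete":   { "count": 78 }
  }
JSON

report = JSON.parse(report_json)

# Hypothetical health check: count transfers that failed or did not finish.
needs_attention = report['failed']['count'] + report['incomplete']['count']

puts "#{needs_attention} transfer(s) need attention" if needs_attention > 0
```

A cron job could run a check like this and send a notification only when the count is non-zero.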

data/lib/preservation.rb CHANGED

@@ -8,11 +8,11 @@ require 'preservation/configuration'
 require 'preservation/report/database'
 require 'preservation/report/transfer'
 require 'preservation/conversion'
-require 'preservation/ingest'
 require 'preservation/builder'
 require 'preservation/storage'
 require 'preservation/temporal'
-require 'preservation/transfer/pure'
+require 'preservation/transfer/base'
+require 'preservation/transfer/dataset'
 require 'preservation/version'
 # Top level namespace

data/lib/preservation/builder.rb CHANGED

@@ -35,9 +35,9 @@ module Preservation
     # @param directory_name_scheme [Symbol]
     # @return [String]
     def self.build_directory_name(metadata_record, directory_name_scheme)
-      doi = metadata_record['doi']
-      uuid = metadata_record['uuid']
-      title = metadata_record['title'].strip.gsub(' ', '-').gsub('/', '-')
+      doi = metadata_record[:doi]
+      uuid = metadata_record[:uuid]
+      title = metadata_record[:title].strip.gsub(' ', '-').gsub('/', '-')
       time = Time.new
       date = time.strftime("%Y-%m-%d")
       time = time.strftime("%H:%M:%S")
@@ -63,12 +63,12 @@ module Preservation
         when :uuid
           uuid
         when :doi
-          if doi.empty?
+          if doi.nil? || doi.empty?
             return ''
           end
           doi.gsub('/', '-')
         when :doi_short
-          if doi.empty?
+          if doi.nil? || doi.empty?
             return ''
           end
           doi_short_to_remove = 'http://dx.doi.org/'
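
The hunk above adds a `nil` guard before the `empty?` check, since a dataset without a DOI may now supply `nil` rather than an empty string. The sketch below illustrates the guard together with a `doi_short`-style name. The helper `doi_short_dir_name` is hypothetical, and the rest of the gem's `build_directory_name` is not shown in this diff, so the exact transformation is an assumption based on the README's example directory name.

```ruby
# Hypothetical helper, not part of the gem's API.
DOI_PREFIX = 'http://dx.doi.org/'.freeze

def doi_short_dir_name(doi)
  # Guard against both a missing DOI (nil) and a blank one ('').
  return '' if doi.nil? || doi.empty?

  # Drop the resolver prefix, then make the remainder filesystem-friendly.
  doi.sub(DOI_PREFIX, '').gsub('/', '-')
end

puts doi_short_dir_name('http://dx.doi.org/10.17635/lancaster/researchdata/6')
puts doi_short_dir_name(nil).inspect
```

Returning an empty string for a missing DOI lets the caller skip the dataset instead of raising `NoMethodError` on `nil`.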

data/lib/preservation/report/database.rb CHANGED

@@ -13,8 +13,7 @@ module Preservation
       # @return [SQLite3::Database]
        def self.db_connection(db_path)
         if db_path.nil?
-          puts 'Missing db_path'
-          exit
+          raise 'Missing db_path'
         end
         @db ||= SQLite3::Database.new db_path
       end
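
Replacing `puts` plus `exit` with `raise` lets calling code decide how to handle a missing path instead of having the process terminate. A minimal sketch of the difference from the caller's side, using a hypothetical stand-in method rather than the gem's `db_connection`, so no SQLite database is needed:

```ruby
# Hypothetical stand-in that mirrors only the nil check from the hunk above.
def require_db_path(db_path)
  raise 'Missing db_path' if db_path.nil?
  db_path
end

message = nil
begin
  require_db_path(nil)
rescue RuntimeError => e
  # A bare `raise 'text'` produces a RuntimeError, which can be rescued,
  # logged, or retried; a call to `exit` could not be handled this way.
  message = e.message
end

puts message
```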

data/lib/preservation/temporal.rb CHANGED

@@ -6,13 +6,12 @@ module Preservation
     # time_to_preserve?
     #
-    # @param start_utc [String]
+    # @param start_utc [Time]
     # @param delay [Integer] days to wait (after start date) before preserving
     # @return [Boolean]
     def self.time_to_preserve?(start_utc, delay)
-      now = DateTime.now
-      start_datetime = DateTime.parse(start_utc)
-      days_since_start = (now - start_datetime).to_i # result in days
+      now = Time.now
+      days_since_start = ((now - start_utc) / 86_400).to_i # Time subtraction yields seconds; convert to whole days
       days_since_start >= delay ? true : false
     end
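
`Time#-` returns the elapsed time in seconds as a Float rather than a day count, so a delay expressed in days needs a conversion step. A minimal sketch, assuming `start_utc` is a `Time`. The constant and helper names are hypothetical and not part of the gem:

```ruby
SECONDS_PER_DAY = 86_400

# Whole days elapsed between two Time objects.
def days_between(start_time, now)
  ((now - start_time) / SECONDS_PER_DAY).to_i
end

# True once at least `delay_days` whole days have passed since `start_time`.
def time_to_preserve?(start_time, delay_days, now = Time.now)
  days_between(start_time, now) >= delay_days
end

modified = Time.utc(2017, 5, 1)   # hypothetical last-modified timestamp
checked  = Time.utc(2017, 5, 23)  # hypothetical time of the check

puts days_between(modified, checked)
puts time_to_preserve?(modified, 20, checked)
puts time_to_preserve?(modified, 30, checked)
```

Passing `now` as an argument keeps the check deterministic, which makes it straightforward to test.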

data/lib/preservation/transfer/base.rb ADDED

@@ -0,0 +1,42 @@
+module Preservation
+  module Transfer
+    # Transfer base
+    #
+    class Base
+      attr_reader :logger
+      def initialize
+        setup_logger
+        check_ingest_path
+       end
+      private
+      def check_ingest_path
+        if Preservation.ingest_path.nil?
+          @logger.error 'Missing ingest path'
+          exit
+        end
+      end
+      def setup_logger
+        if @logger.nil?
+          if Preservation.log_path.nil?
+            @logger = Logger.new STDOUT
+          else
+            # Keep data for today and the past 20 days
+            @logger = Logger.new File.new(Preservation.log_path, 'a'), 20, 'daily'
+          end
+        end
+        @logger.level = Logger::INFO
+      end
+    end
+  end
+end

data/lib/preservation/transfer/dataset.rb ADDED

@@ -0,0 +1,258 @@
+module Preservation
+  # Transfer preparation
+  #
+  module Transfer
+    # Transfer preparation for dataset
+    #
+    class Dataset < Preservation::Transfer::Base
+      # @param config [Hash]
+      def initialize(config)
+        super()
+        @config = config
+      end
+      # For given uuid, if necessary, fetch the metadata,
+      # prepare a directory in the ingest path and populate it with the files and
+      # JSON description file.
+      #
+      # @param uuid [String] uuid to preserve
+      # @param dir_scheme [Symbol] how to make directory name
+      # @param delay [Integer] days to wait (after modification date) before preserving
+      # @return [Boolean] indicates presence of metadata description file
+      def prepare(uuid: nil,
+                  dir_scheme: :uuid,
+                  delay: 0)
+        success = false
+        if uuid.nil?
+          @logger.error 'Missing uuid'
+          exit
+        end
+        dir_base_path = Preservation.ingest_path
+        dataset_extractor = Puree::Extractor::Dataset.new @config
+        d = dataset_extractor.find uuid: uuid
+        if !d
+          @logger.error 'No metadata for ' + uuid
+          exit
+        end
+        metadata_record = {
+          doi:   d.doi,
+          uuid:  d.uuid,
+          title: d.title
+        }
+        # configurable to become more human-readable
+        dir_name = Preservation::Builder.build_directory_name(metadata_record, dir_scheme)
+        # continue only if dir_name is not empty (e.g. because there was no DOI)
+        # continue only if there is no DB entry
+        # continue only if the dataset has a DOI
+        # continue only if there are files for this resource
+        # continue only if it is time to preserve
+        if !dir_name.nil? &&
+           !dir_name.empty? &&
+           !Preservation::Report::Transfer.in_db?(dir_name) &&
+           d.doi &&
+           !d.files.empty? &&
+           Preservation::Temporal.time_to_preserve?(d.modified, delay)
+          dir_file_path = dir_base_path + '/' + dir_name
+          dir_metadata_path = dir_file_path + '/metadata/'
+          metadata_filename = dir_metadata_path + 'metadata.json'
+          # calculate total size of data files
+          download_storage_required = 0
+          d.files.each { |i| download_storage_required += i.size.to_i }
+          # do we have enough space in filesystem to fetch data files?
+          if Preservation::Storage.enough_storage_for_download? download_storage_required
+            # @logger.info 'Sufficient disk space for ' + dir_file_path
+          else
+            @logger.error 'Insufficient disk space to store files fetched from Pure. Skipping ' + dir_file_path
+          end
+          # has metadata file been created? if so, files and metadata are in place
+          # continue only if files not present in ingest location
+          if !File.size? metadata_filename
+            @logger.info 'Preparing ' + dir_name + ', Pure UUID ' + d.uuid
+            data = []
+            d.files.each do |f|
+              o = package_metadata d, f
+              data << o
+              wget_str = Preservation::Builder.build_wget @config[:username],
+                                                          @config[:password],
+                                                          f.url
+              Dir.mkdir(dir_file_path) if !Dir.exists?(dir_file_path)
+              # fetch the file
+              Dir.chdir(dir_file_path) do
+                # puts 'Changing dir to ' + Dir.pwd
+                # puts 'Size of ' + f.name + ' is ' + File.size(f.name).to_s
+                if File.size?(f.name)
+                  # puts 'Should be deleting ' + f['name']
+                  File.delete(f.name)
+                end
+                # puts f.name + ' missing or empty'
+                # puts wget_str
+                `#{wget_str}`
+              end
+            end
+            Dir.mkdir(dir_metadata_path) if !Dir.exists?(dir_metadata_path)
+            pretty = JSON.pretty_generate( data, :indent => '  ')
+            # puts pretty
+            File.write(metadata_filename,pretty)
+            @logger.info 'Created ' + metadata_filename
+            success = true
+          else
+            @logger.info 'Skipping ' + dir_name + ', Pure UUID ' + d.uuid +
+                         ' because ' + metadata_filename + ' exists'
+          end
+        else
+          @logger.info 'Skipping ' + dir_name + ', Pure UUID ' + d.uuid
+        end
+        success
+      end
+      # For multiple datasets, if necessary, fetch the metadata,
+      # prepare a directory in the ingest path and populate it with the files and
+      # JSON description file.
+      #
+      # @param max [Integer] maximum to prepare, omit to set no maximum
+      # @param dir_scheme [Symbol] how to make directory name
+      # @param delay [Integer] days to wait (after modification date) before preserving
+      def prepare_batch(max: nil,
+                        dir_scheme: :uuid,
+                        delay: 30)
+        collection_extractor = Puree::Extractor::Collection.new config:   @config,
+                                                                resource: :dataset
+        count = collection_extractor.count
+        max = count if max.nil?
+        batch_size = 10
+        num_prepared = 0
+        0.step(count, batch_size) do |n|
+          dataset_collection = collection_extractor.find limit:  batch_size,
+                                                         offset: n
+          dataset_collection.each do |dataset|
+            success = prepare uuid:       dataset.uuid,
+                              dir_scheme: dir_scheme.to_sym,
+                              delay:      delay
+            num_prepared += 1 if success
+            exit if num_prepared == max
+          end
+        end
+      end
+      private
+      def package_metadata(d, f)
+          o = {}
+          o['filename'] = 'objects/' + f.name
+          o['dc.title'] = d.title
+          if d.description
+            o['dc.description'] = d.description
+          end
+          o['dcterms.created'] = d.created.strftime("%F")
+          if d.available
+            o['dcterms.available'] = d.available.strftime("%F")
+          end
+          o['dc.publisher'] = d.publisher
+          if d.doi
+            o['dc.identifier'] = d.doi
+          end
+          if !d.spatial_places.empty?
+            o['dcterms.spatial'] = d.spatial_places
+          end
+          temporal = d.temporal
+          temporal_range = ''
+          if temporal
+            if temporal.start
+              temporal_range << temporal.start.strftime("%F")
+              if temporal.end
+                temporal_range << '/'
+                temporal_range << temporal.end.strftime("%F")
+              end
+              o['dcterms.temporal'] = temporal_range
+            end
+          end
+          creators = []
+          contributors = []
+          all_persons = []
+          all_persons << d.persons_internal
+          all_persons << d.persons_external
+          all_persons << d.persons_other
+          all_persons.each do |person_type|
+            person_type.each do |i|
+              name = i.name.last_first if i.name
+              if i.role == 'Creator'
+                creators << name if name
+              end
+              if i.role == 'Contributor'
+                contributors << name if name
+              end
+            end
+          end
+          o['dc.creator'] = creators
+          if !contributors.empty?
+            o['dc.contributor'] = contributors
+          end
+          keywords = []
+          d.keywords.each { |i|
+            keywords << i
+          }
+          if !keywords.empty?
+            o['dc.subject'] = keywords
+          end
+          o['dcterms.license'] = f.license.name if f.license
+          # o['dc.format'] = f.mime
+          related = []
+          publications = d.publications
+          publications.each do |i|
+            if i.type === 'Dataset'
+              extractor = Puree::Extractor::Dataset.new @config
+              dataset = extractor.find uuid: i.uuid
+              doi = dataset.doi
+              if doi
+                related << doi
+              end
+            end
+            if i.type === 'Publication'
+              extractor = Puree::Extractor::Publication.new @config
+              publication = extractor.find uuid: i.uuid
+              dois = publication.dois
+              if !dois.empty?
+                # Only one needed
+                related << dois[0]
+              end
+            end
+          end
+          if !related.empty?
+            o['dc.relation'] = related
+          end
+          o
+      end
+    end
+  end
+end
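
`prepare_batch` pages through the collection with `0.step(count, batch_size)` and stops once `max` datasets have been prepared. The sketch below shows the same paging pattern against an in-memory array. All names are hypothetical, every record counts as prepared, and it returns from the method where the gem calls `exit`, so this is an illustration of the pattern and not the gem's implementation.

```ruby
# Hypothetical stand-in for a remote collection of dataset UUIDs.
records = (1..25).map { |n| "uuid-#{n}" }

# Yield each offset and the page of records starting there.
def each_page(records, batch_size)
  0.step(records.size, batch_size) do |offset|
    page = records[offset, batch_size] || []
    yield offset, page
  end
end

# Collect records page by page, stopping once `max` have been taken.
# A nil `max` never matches the size check, so every record is taken.
def take_prepared(records, batch_size:, max: nil)
  prepared = []
  each_page(records, batch_size) do |_offset, page|
    page.each do |uuid|
      prepared << uuid
      return prepared if prepared.size == max
    end
  end
  prepared
end

offsets = []
each_page(records, 10) { |offset, _page| offsets << offset }

puts offsets.inspect
puts take_prepared(records, batch_size: 10, max: 12).size
```

Returning from the method instead of exiting leaves the surrounding process free to continue, for example to run a report afterwards.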

data/lib/preservation/version.rb CHANGED

@@ -1,5 +1,5 @@
 module Preservation
   # Semantic version number
   #
-  VERSION = "0.4.2"
+  VERSION = "0.5.0"
 end

data/preservation.gemspec CHANGED

@@ -8,9 +8,9 @@ Gem::Specification.new do |spec|
   spec.version       = Preservation::VERSION
   spec.authors       = ["Adrian Albin-Clark"]
   spec.email         = ["a.albin-clark@lancaster.ac.uk"]
-  spec.summary       = %q{Extraction and Transformation for Loading by Archivematica's Automation Tools.}
-  spec.description   = %q{Extraction and Transformation for Loading by Archivematica's Automation Tools. Includes transfer preparation, reporting and disk space management.}
-  spec.homepage      = "https://aalbinclark.gitbooks.io/preservation"
+  spec.summary       = %q{Extraction from the Pure Research Information System and transformation for
+loading by Archivematica.}
+  spec.homepage      = "https://github.com/lulibrary/preservation"
   spec.license       = "MIT"
   spec.files         = `git ls-files -z`.split("\x0")
@@ -21,6 +21,6 @@ Gem::Specification.new do |spec|
   spec.required_ruby_version = '~> 2.1'
   spec.add_runtime_dependency 'free_disk_space', '~> 1.0'
-  spec.add_runtime_dependency 'puree', '~> 0.19'
+  spec.add_runtime_dependency 'puree', '~> 1.3'
   spec.add_runtime_dependency 'sqlite3', '~> 1.3'
 end

metadata CHANGED

@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: preservation
 version: !ruby/object:Gem::Version
-  version: 0.4.2
+  version: 0.5.0
 platform: ruby
 authors:
 - Adrian Albin-Clark
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2017-05-18 00:00:00.000000000 Z
+date: 2017-05-23 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: free_disk_space
@@ -30,14 +30,14 @@ dependencies:
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: '0.19'
+        version: '1.3'
   type: :runtime
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: '0.19'
+        version: '1.3'
 - !ruby/object:Gem::Dependency
   name: sqlite3
   requirement: !ruby/object:Gem::Requirement
@@ -52,8 +52,7 @@ dependencies:
     - - "~>"
       - !ruby/object:Gem::Version
         version: '1.3'
-description: Extraction and Transformation for Loading by Archivematica's Automation
-  Tools. Includes transfer preparation, reporting and disk space management.
+description:
 email:
 - a.albin-clark@lancaster.ac.uk
 executables: []
@@ -71,15 +70,15 @@ files:
 - lib/preservation/builder.rb
 - lib/preservation/configuration.rb
 - lib/preservation/conversion.rb
-- lib/preservation/ingest.rb
 - lib/preservation/report/database.rb
 - lib/preservation/report/transfer.rb
 - lib/preservation/storage.rb
 - lib/preservation/temporal.rb
-- lib/preservation/transfer/pure.rb
+- lib/preservation/transfer/base.rb
+- lib/preservation/transfer/dataset.rb
 - lib/preservation/version.rb
 - preservation.gemspec
-homepage: https://aalbinclark.gitbooks.io/preservation
+homepage: https://github.com/lulibrary/preservation
 licenses:
 - MIT
 metadata: {}
@@ -102,5 +101,6 @@ rubyforge_project:
 rubygems_version: 2.2.2
 signing_key:
 specification_version: 4
-summary: Extraction and Transformation for Loading by Archivematica's Automation Tools.
+summary: Extraction from the Pure Research Information System and transformation for
+  loading by Archivematica.
 test_files: []

data/lib/preservation/ingest.rb DELETED

@@ -1,38 +0,0 @@
-module Preservation
-  # Ingest
-  #
-  class Ingest
-    attr_reader :logger
-    def initialize
-      setup_logger
-      check_ingest_path
-     end
-    private
-    def check_ingest_path
-      if Preservation.ingest_path.nil?
-        @logger.error 'Missing ingest path'
-        exit
-      end
-    end
-    def setup_logger
-      if @logger.nil?
-        if Preservation.log_path.nil?
-          @logger = Logger.new STDOUT
-        else
-          # Keep data for today and the past 20 days
-          @logger = Logger.new File.new(Preservation.log_path, 'a'), 20, 'daily'
-        end
-      end
-      @logger.level = Logger::INFO
-    end
-  end
-end

data/lib/preservation/transfer/pure.rb DELETED

@@ -1,259 +0,0 @@
-module Preservation
-  # Transfer preparation
-  #
-  module Transfer
-    # Transfer preparation for Pure
-    #
-    class Pure < Ingest
-      # @param base_url [String]
-      # @param username [String]
-      # @param password [String]
-      # @param basic_auth [Boolean]
-      def initialize(base_url: nil, username: nil, password: nil, basic_auth: nil)
-        super()
-        @base_url = base_url
-        @basic_auth = basic_auth
-        if basic_auth === true
-          @username = username
-          @password = password
-        end
-      end
-      # For given uuid, if necessary, fetch the metadata,
-      # prepare a directory in the ingest path and populate it with the files and
-      # JSON description file.
-      #
-      # @param uuid [String] uuid to preserve
-      # @param dir_scheme [Symbol] how to make directory name
-      # @param delay [Integer] days to wait (after modification date) before preserving
-      # @return [Boolean] indicates presence of metadata description file
-      def prepare_dataset(uuid: nil,
-                          dir_scheme: :uuid,
-                          delay: 0)
-        success = false
-        if uuid.nil?
-          @logger.error 'Missing ' + uuid
-          exit
-        end
-        dir_base_path = Preservation.ingest_path
-        dataset = Puree::Dataset.new base_url: @base_url,
-                                     username: @username,
-                                     password: @password,
-                                     basic_auth: @basic_auth
-        dataset.find uuid: uuid
-        d = dataset.metadata
-        if d.empty?
-          @logger.error 'No metadata for ' + uuid
-          exit
-        end
-        # configurable to become more human-readable
-        dir_name = Preservation::Builder.build_directory_name(d, dir_scheme)
-        # continue only if dir_name is not empty (e.g. because there was no DOI)
-        # continue only if there is no DB entry
-        # continue only if the dataset has a DOI
-        # continue only if there are files for this resource
-        # continue only if it is time to preserve
-        if !dir_name.nil? &&
-           !dir_name.empty? &&
-           !Preservation::Report::Transfer.in_db?(dir_name) &&
-           !d['doi'].empty? &&
-           !d['file'].empty? &&
-           Preservation::Temporal.time_to_preserve?(d['modified'], delay)
-          dir_file_path = dir_base_path + '/' + dir_name
-          dir_metadata_path = dir_file_path + '/metadata/'
-          metadata_filename = dir_metadata_path + 'metadata.json'
-          # calculate total size of data files
-          download_storage_required = 0
-          d['file'].each { |i| download_storage_required += i['size'].to_i }
-          # do we have enough space in filesystem to fetch data files?
-          if Preservation::Storage.enough_storage_for_download? download_storage_required
-            # @logger.info 'Sufficient disk space for ' + dir_file_path
-          else
-            @logger.error 'Insufficient disk space to store files fetched from Pure. Skipping ' + dir_file_path
-          end
-          # has metadata file been created? if so, files and metadata are in place
-          # continue only if files not present in ingest location
-          if !File.size? metadata_filename
-            @logger.info 'Preparing ' + dir_name + ', Pure UUID ' + d['uuid']
-            data = []
-            d['file'].each do |f|
-              o = package_dataset_metadata d, f
-              data << o
-              wget_str = Preservation::Builder.build_wget @username,
-                                                          @password,
-                                                          f['url']
-              Dir.mkdir(dir_file_path) if !Dir.exists?(dir_file_path)
-              # fetch the file
-              Dir.chdir(dir_file_path) do
-                # puts 'Changing dir to ' + Dir.pwd
-                # puts 'Size of ' + f['name'] + ' is ' + File.size(f['name']).to_s
-                if File.size?(f['name'])
-                  # puts 'Should be deleting ' + f['name']
-                  File.delete(f['name'])
-                end
-                # puts f['name'] + ' missing or empty'
-                # puts wget_str
-                `#{wget_str}`
-              end
-            end
-            Dir.mkdir(dir_metadata_path) if !Dir.exists?(dir_metadata_path)
-            pretty = JSON.pretty_generate( data, :indent => '  ')
-            # puts pretty
-            File.write(metadata_filename,pretty)
-            @logger.info 'Created ' + metadata_filename
-            success = true
-          else
-            @logger.info 'Skipping ' + dir_name + ', Pure UUID ' + d['uuid'] +
-                         ' because ' + metadata_filename + ' exists'
-          end
-        else
-          @logger.info 'Skipping ' + dir_name + ', Pure UUID ' + d['uuid']
-        end
-        success
-      end
-      # For multiple datasets, if necessary, fetch the metadata,
-      # prepare a directory in the ingest path and populate it with the files and
-      # JSON description file.
-      #
-      # @param max [Integer] maximum to prepare, omit to set no maximum
-      # @param dir_scheme [Symbol] how to make directory name
-      # @param delay [Integer] days to wait (after modification date) before preserving
-      def prepare_dataset_batch(max: nil,
-                                dir_scheme: :uuid,
-                                delay: 30)
-        collection = Puree::Collection.new resource:  :dataset,
-                                           base_url:   @base_url,
-                                           username:   @username,
-                                           password:   @password,
-                                           basic_auth: @basic_auth
-        count = collection.count
-        max = count if max.nil?
-        batch_size = 10
-        num_prepared = 0
-        0.step(count, batch_size) do |n|
-          minimal_metadata = collection.find limit:  batch_size,
-                                             offset: n,
-                                             full:   false
-          uuids = []
-          minimal_metadata.each do |i|
-            uuids << i['uuid']
-          end
-          uuids.each do |uuid|
-            success = prepare_dataset uuid:       uuid,
-                                      dir_scheme: dir_scheme.to_sym,
-                                      delay:      delay
-            num_prepared += 1 if success
-            exit if num_prepared == max
-          end
-        end
-      end
-      private
-      def package_dataset_metadata(d, f)
-          o = {}
-          o['filename'] = 'objects/' + f['name']
-          o['dc.title'] = d['title']
-          if !d['description'].empty?
-            o['dc.description'] = d['description']
-          end
-          o['dcterms.created'] = d['created']
-          if !d['available']['year'].empty?
-            o['dcterms.available'] = Puree::Date.iso(d['available'])
-          end
-          o['dc.publisher'] = d['publisher']
-          if !d['doi'].empty?
-            o['dc.identifier'] = d['doi']
-          end
-          if !d['spatial'].empty?
-            o['dcterms.spatial'] = d['spatial']
-          end
-          if !d['temporal']['start']['year'].empty?
-            temporal_range = ''
-            temporal_range << Puree::Date.iso(d['temporal']['start'])
-            if !d['temporal']['end']['year'].empty?
-              temporal_range << '/'
-              temporal_range << Puree::Date.iso(d['temporal']['end'])
-            end
-            o['dcterms.temporal'] = temporal_range
-          end
-          creators = []
-          contributors = []
-          person_types = %w(internal external other)
-          person_types.each do |person_type|
-            d['person'][person_type].each do |i|
-              if i['role'] == 'Creator'
-                creator = i['name']['last'] + ', ' + i['name']['first']
-                creators << creator
-              end
-              if i['role'] == 'Contributor'
-                contributor = i['name']['last'] + ', ' + i['name']['first']
-                contributors << contributor
-              end
-            end
-          end
-          o['dc.creator'] = creators
-          if !contributors.empty?
-            o['dc.contributor'] = contributors
-          end
-          keywords = []
-          d['keyword'].each { |i|
-            keywords << i
-          }
-          if !keywords.empty?
-            o['dc.subject'] = keywords
-          end
-          if !f['license']['name'].empty?
-            o['dcterms.license'] = f['license']['name']
-          end
-          # o['dc.format'] = f['mime']
-          related = []
-          publications = d['publication']
-          publications.each do |i|
-            pub = Puree::Publication.new base_url: @base_url,
-                                         username: @username,
-                                         password: @password,
-                                         basic_auth: @basic_auth
-            pub.find uuid: i['uuid']
-            doi = pub.doi
-            if doi
-              related << doi
-            end
-          end
-          if !related.empty?
-            o['dc.relation'] = related
-          end
-          o
-      end
-    end
-  end
-end