RubyGems - preservation - Versions diffs - 0.4.2 → 0.5.0 - Mend

preservation 0.4.2 → 0.5.0

Files changed (14)

checksums.yaml +4 -4
data/CHANGELOG.md +9 -0
data/README.md +173 -14
data/lib/preservation.rb +2 -2
data/lib/preservation/builder.rb +5 -5
data/lib/preservation/report/database.rb +1 -2
data/lib/preservation/temporal.rb +3 -4
data/lib/preservation/transfer/base.rb +42 -0
data/lib/preservation/transfer/dataset.rb +258 -0
data/lib/preservation/version.rb +1 -1
data/preservation.gemspec +4 -4
metadata +10 -10
data/lib/preservation/ingest.rb +0 -38
data/lib/preservation/transfer/pure.rb +0 -259

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz: 26bdfccfba8fa2f79920f73a13effe5701fcafc1
-  data.tar.gz: 73e1d0a7a0060f115f5cc90df76f5341092f8fc8
+  metadata.gz: 54db84bdb0bc782f05420b420200e78b9394a6af
+  data.tar.gz: a243b3e89cdf0fe830df9eea16639094d5854af1
 SHA512:
-  metadata.gz: 2dbce48f44a040569acbfb7b5dc8082b48bf83e3495d2cbed13577b72a0ba38e0c20d4b459145797faac964bbc61d4ec069705d47d1068135a376e1a13c5328e
-  data.tar.gz: 975a2b8424ddb540f5c187f8aee2b752f54bd2b81bafc04d4d5626bd471f5fc244717d28db477b9c5e9a085e2da0d8d44047af178531adcd699c8e030a863d30
+  metadata.gz: 51d73c2067b1d48c7ce8a5eff9659fd0cb0059e59850e6a0d80c9865a9080a9e718e839b0d83aa2bde790fbcdae18d2eb7f26480f7f72ae9a74084fc7f6975f1
+  data.tar.gz: b58bd774f4905d98fee7be08f4a99a1bca5fa3bd115f8136c7a524c92937757e1ddfea63308ed589c4aaa11174b73499d1b43d993fee660df95e1ab533998a6a

data/CHANGELOG.md CHANGED

@@ -4,8 +4,17 @@ This project adheres to [Semantic Versioning](http://semver.org/).
 ## Unreleased
+## 0.5.0 - 2017-05-23
+### Changed
+- Transfer - created as ISO8601 date format.
+### Fixed
+- Transfer - handling DOIs of related works for both datasets and publications.
+- Transfer - handling missing DOIs of related works.
 ## 0.4.2 - 2017-05-18
 ### Fixed
+- Transfer - presence check for DOI of a related work.
 ## 0.4.1 - 2016-09-30
 ### Fixed
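
The "created as ISO8601 date format" entry corresponds to the strftime("%F") calls that appear in transfer/dataset.rb further down this diff. A minimal sketch of that formatting follows, using only the standard library. The helper name iso_range is hypothetical and not part of the gem.

```ruby
# Hypothetical helper, not part of the gem: format a start/end pair as an
# ISO 8601 date or date range, using the same "%F" directive as the diff.
def iso_range(start_time, end_time = nil)
  return nil if start_time.nil?
  range = start_time.strftime('%F') # "%F" is shorthand for "%Y-%m-%d"
  range += '/' + end_time.strftime('%F') if end_time
  range
end

puts iso_range(Time.utc(2015, 6, 4))
puts iso_range(Time.utc(2013, 12, 1), Time.utc(2015, 6, 4))
```

The slash-separated form mirrors how package_metadata builds the dcterms.temporal value.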

data/README.md CHANGED

@@ -1,6 +1,9 @@
 # Preservation
-Extraction and Transformation for Loading by Archivematica's Automation Tools.
+Extraction from the Pure Research Information System and transformation for
+loading by Archivematica.
+Includes transfer preparation, reporting and disk space management.
 ## Status
@@ -27,7 +30,9 @@ Or install it yourself as:
 ## Usage
 ### Configuration
-Configure Preservation. If ```log_path``` is omitted, logging (standard library) writes to STDOUT.
+Configure Preservation. If ```log_path``` is omitted, logging (standard library)
+writes to STDOUT.
 ```ruby
 Preservation.configure do |config|
@@ -37,24 +42,129 @@ Preservation.configure do |config|
 end
 ```
+Create a hash for passing to a transfer.
+```ruby
+# Pure host with authentication.
+config = {
+  url:      ENV['PURE_URL'],
+  username: ENV['PURE_USERNAME'],
+  password: ENV['PURE_PASSWORD']
+}
+```
+```ruby
+# Pure host without authentication.
+config = {
+  url: ENV['PURE_URL']
+}
+```
 ### Transfer
-Create a transfer using the Pure Research Information System as a data source.
+Configure a transfer to retrieve data from a Pure host.
+```ruby
+transfer = Preservation::Transfer::Dataset.new config
+```
+#### Single
+If necessary, fetch the metadata, prepare a directory in the ingest path and
+populate it with the files and JSON description file.
+```ruby
+transfer.prepare uuid: 'xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx'
+```
+#### Batch
+For multiple Pure datasets, if necessary, fetch the metadata, prepare a
+directory in the ingest path and populate it with the files and JSON description
+file.
+A maximum of 10 will be prepared using the doi_short directory naming scheme.
+Each dataset will only be prepared if 20 days have elapsed since the metadata
+record was last modified.
 ```ruby
-transfer = Preservation::Transfer::Pure.new base_url:   ENV['PURE_BASE_URL'],
-                                            username:   ENV['PURE_USERNAME'],
-                                            password:   ENV['PURE_PASSWORD'],
-                                            basic_auth: true
+transfer.prepare_batch max: 10,
+                       dir_scheme: :doi_short,
+                       delay: 20
 ```
-For a Pure dataset, if necessary, fetch the metadata, prepare
-a directory in the ingest path and populate it with the files and JSON description file.
+#### Directory name
+The following are permitted values for the dir_scheme parameter:
 ```ruby
-transfer.prepare_dataset uuid: 'xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx'
+:uuid_title
+:title_uuid
+:date_uuid_title
+:date_title_uuid
+:date_time_uuid
+:date_time_title
+:date_time_uuid_title
+:date_time_title_uuid
+:uuid
+:doi
+:doi_short
 ```
+#### Load directory
+A transfer-ready directory, with a name built according to the directory scheme
+specified, in this case doi_short. This particular example has only one file
+Ebola_data_Jun15.zip in the dataset.
+```
+.
+├── 10.17635-lancaster-researchdata-6
+│   ├── Ebola_data_Jun15.zip
+│   └── metadata
+│       └── metadata.json
+```
+metadata.json:
+```json
+[
+  {
+    "filename": "objects/Ebola_data_Jun15.zip",
+    "dc.title": "Ebolavirus evolution 2013-2015",
+    "dc.description": "Data used for analysis of selection and evolutionary rate in Zaire Ebolavirus variant Makona",
+    "dcterms.created": "2015-06-04",
+    "dcterms.available": "2015-06-04",
+    "dc.publisher": "Lancaster University",
+    "dc.identifier": "http://dx.doi.org/10.17635/lancaster/researchdata/6",
+    "dcterms.spatial": [
+      "Guinea, Sierra Leone, Liberia"
+    ],
+    "dc.creator": [
+      "Gatherer, Derek"
+    ],
+    "dc.contributor": [
+      "Robertson, David",
+      "Lovell, Simon"
+    ],
+    "dc.subject": [
+      "Ebolavirus",
+      "evolution",
+      "phylogenetics",
+      "virulence",
+      "Filoviridae",
+      "positive selection"
+    ],
+    "dcterms.license": "CC BY",
+    "dc.relation": [
+      "http://dx.doi.org/10.1136/ebmed-2014-110127",
+      "http://dx.doi.org/10.1099/vir.0.067199-0"
+    ]
+  }
+]
+```
+### Storage
 Free up disk space for completed transfers. Can be done at any time.
 ```ruby
@@ -62,13 +172,62 @@ Preservation::Storage.cleanup
 ```
 ### Report
 Can be used for scheduled monitoring of transfers.
 ```ruby
 Preservation::Report::Transfer.exception
 ```
-## Documentation
-[API in YARD](http://www.rubydoc.info/gems/preservation)
-[Detailed usage in GitBook](https://aalbinclark.gitbooks.io/preservation)
+Formatted as JSON:
+```json
+{
+  "pending": {
+    "count": 3,
+    "data": [
+      {
+        "path": "10.17635-lancaster-researchdata-72",
+        "path_timestamp": "2016-09-29 12:08:58 +0100"
+      },
+      {
+        "path": "10.17635-lancaster-researchdata-74",
+        "path_timestamp": "2016-09-29 12:08:59 +0100"
+      },
+      {
+        "path": "10.17635-lancaster-researchdata-75",
+        "path_timestamp": "2016-09-29 12:09:00 +0100"
+      }
+    ]
+  },
+  "current": {
+    "path": "10.17635-lancaster-researchdata-90",
+    "unit_type": "ingest",
+    "status": "PROCESSING",
+    "current": 1,
+    "id": 91,
+    "uuid": "ebf048c3-0ca8-409c-94cf-ab3e5d97e901",
+    "path_timestamp": "2016-09-28 17:09:33 +0100"
+  },
+  "failed": {
+    "count": 0
+  },
+  "incomplete": {
+    "count": 1,
+    "data": [
+      {
+         "path": "10.17635-lancaster-researchdata-90",
+         "unit_type": "ingest",
+         "status": "PROCESSING",
+         "current": 1,
+         "id": 91,
+         "uuid": "ebf048c3-0ca8-409c-94cf-ab3e5d97e901",
+         "path_timestamp": "2016-09-28 17:09:33 +0100"
+      }
+    ]
+  },
+  "complete": {
+    "count": 78
+  }
+}
+```
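
The new README shows two config hashes, one with credentials and one without. One possible way to build either from the environment is sketched below. The helper name pure_config is hypothetical. Only the keys :url, :username and :password and the PURE_* variable names come from the README.

```ruby
# Hypothetical helper, not part of the gem: build the transfer config hash,
# adding credentials only when both are present in the environment.
def pure_config(env = ENV)
  config = { url: env['PURE_URL'] }
  if env['PURE_USERNAME'] && env['PURE_PASSWORD']
    config[:username] = env['PURE_USERNAME']
    config[:password] = env['PURE_PASSWORD']
  end
  config
end

# A plain hash stands in for ENV here so the example runs anywhere.
p pure_config('PURE_URL' => 'https://pure.example.org/ws/rest')
```

The result would be passed as in the README: Preservation::Transfer::Dataset.new pure_config.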

data/lib/preservation.rb CHANGED

@@ -8,11 +8,11 @@ require 'preservation/configuration'
 require 'preservation/report/database'
 require 'preservation/report/transfer'
 require 'preservation/conversion'
-require 'preservation/ingest'
 require 'preservation/builder'
 require 'preservation/storage'
 require 'preservation/temporal'
-require 'preservation/transfer/pure'
+require 'preservation/transfer/base'
+require 'preservation/transfer/dataset'
 require 'preservation/version'
 # Top level namespace

data/lib/preservation/builder.rb CHANGED

@@ -35,9 +35,9 @@ module Preservation
     # @param directory_name_scheme [Symbol]
     # @return [String]
     def self.build_directory_name(metadata_record, directory_name_scheme)
-      doi = metadata_record['doi']
-      uuid = metadata_record['uuid']
-      title = metadata_record['title'].strip.gsub(' ', '-').gsub('/', '-')
+      doi = metadata_record[:doi]
+      uuid = metadata_record[:uuid]
+      title = metadata_record[:title].strip.gsub(' ', '-').gsub('/', '-')
       time = Time.new
       date = time.strftime("%Y-%m-%d")
       time = time.strftime("%H:%M:%S")
@@ -63,12 +63,12 @@ module Preservation
         when :uuid
           uuid
         when :doi
-          if doi.empty?
+          if doi.nil? || doi.empty?
             return ''
           end
           doi.gsub('/', '-')
         when :doi_short
-          if doi.empty?
+          if doi.nil? || doi.empty?
             return ''
           end
           doi_short_to_remove = 'http://dx.doi.org/'
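
The two changes in this hunk are the switch from string to symbol keys and the nil check before empty?. A minimal sketch of the :doi_short branch under those rules is below. The method name is hypothetical, and the prefix-stripping step is an assumption inferred from the README example directory name (10.17635-lancaster-researchdata-6), since the hunk is cut off before that code.

```ruby
# Hypothetical stand-in for the :doi_short branch; only the nil/empty guard
# and the resolver prefix are taken from the diff.
DOI_RESOLVER_PREFIX = 'http://dx.doi.org/'.freeze

def doi_short_dir_name(metadata_record)
  doi = metadata_record[:doi]         # symbol key, as in 0.5.0
  return '' if doi.nil? || doi.empty? # a nil DOI no longer raises NoMethodError
  doi.sub(DOI_RESOLVER_PREFIX, '').gsub('/', '-')
end

puts doi_short_dir_name(doi: 'http://dx.doi.org/10.17635/lancaster/researchdata/6')
```

Note that a record still using string keys ('doi' => ...) yields nil for the symbol lookup and therefore an empty directory name, so callers need to move to symbol keys together with this change.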

data/lib/preservation/report/database.rb CHANGED

@@ -13,8 +13,7 @@ module Preservation
       # @return [SQLite3::Database]
        def self.db_connection(db_path)
         if db_path.nil?
-          puts 'Missing db_path'
-          exit
+          raise 'Missing db_path'
         end
         @db ||= SQLite3::Database.new db_path
       end
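
Raising instead of calling puts and exit lets the calling code decide what to do when db_path is missing. A minimal sketch of the difference is below, using a hypothetical stand-in that omits the SQLite3 connection itself.

```ruby
# Hypothetical stand-in for the guard in db_connection; the real method goes
# on to open an SQLite3 database, which is left out here.
def require_db_path(db_path)
  raise 'Missing db_path' if db_path.nil? # raise with a String gives RuntimeError
  db_path
end

begin
  require_db_path(nil)
rescue RuntimeError => e
  # With the old puts/exit pair the process would have ended here instead.
  warn "Report unavailable: #{e.message}"
end
```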

data/lib/preservation/temporal.rb CHANGED

@@ -6,13 +6,12 @@ module Preservation
     # time_to_preserve?
     #
-    # @param start_utc [String]
+    # @param start_utc [Time]
     # @param delay [Integer] days to wait (after start date) before preserving
     # @return [Boolean]
     def self.time_to_preserve?(start_utc, delay)
-      now = DateTime.now
-      start_datetime = DateTime.parse(start_utc)
-      days_since_start = (now - start_datetime).to_i # result in days
+      now = Time.now
+      days_since_start = (now - start_utc).to_i # result in days
       days_since_start >= delay ? true : false
     end
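
One point worth checking in this hunk: subtracting two DateTime values gives a Rational number of days, whereas subtracting two Time values gives a Float number of seconds. If start_utc really is a Time, as the updated @param tag says, the "result in days" comment no longer matches what the subtraction returns. A minimal sketch of a days-based comparison for Time values is below. The method names and the injectable now argument are illustrative, not the gem's code.

```ruby
SECONDS_PER_DAY = 86_400

# Hypothetical variant: whole days elapsed between two Time values.
def days_since(start_time, now = Time.now)
  ((now - start_time) / SECONDS_PER_DAY).floor # Time - Time is Float seconds
end

# True once at least `delay` whole days have passed since start_time.
def time_to_preserve?(start_time, delay, now = Time.now)
  days_since(start_time, now) >= delay
end

now = Time.utc(2017, 5, 23)
puts days_since(Time.utc(2017, 5, 3), now)
puts time_to_preserve?(Time.utc(2017, 5, 22), 20, now)
```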

data/lib/preservation/transfer/base.rb ADDED

@@ -0,0 +1,42 @@
+module Preservation
+  module Transfer
+    # Transfer base
+    #
+    class Base
+      attr_reader :logger
+      def initialize
+        setup_logger
+        check_ingest_path
+       end
+      private
+      def check_ingest_path
+        if Preservation.ingest_path.nil?
+          @logger.error 'Missing ingest path'
+          exit
+        end
+      end
+      def setup_logger
+        if @logger.nil?
+          if Preservation.log_path.nil?
+            @logger = Logger.new STDOUT
+          else
+            # Keep data for today and the past 20 days
+            @logger = Logger.new File.new(Preservation.log_path, 'a'), 20, 'daily'
+          end
+        end
+        @logger.level = Logger::INFO
+      end
+    end
+  end
+end
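
The standard library documents the second argument of Logger.new as either a number of old files to keep or a rotation frequency such as 'daily', and the third as a maximum size in bytes. setup_logger passes 20 and 'daily' in those positions and hands Logger an already opened File, so it may be worth verifying that rotation behaves the way the comment describes. A minimal sketch of the documented daily-rotation form is below. The helper name build_logger is hypothetical.

```ruby
require 'logger'
require 'tmpdir'

# Hypothetical helper, not part of the gem: log to STDOUT when no path is
# given, otherwise to a file that Logger rotates daily.
def build_logger(log_path = nil)
  logger = log_path.nil? ? Logger.new(STDOUT) : Logger.new(log_path, 'daily')
  logger.level = Logger::INFO
  logger
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'preservation.log')
  logger = build_logger(path)
  logger.info 'Preparing transfer'
  logger.close
  puts File.read(path)
end
```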

data/lib/preservation/transfer/dataset.rb ADDED

@@ -0,0 +1,258 @@
+module Preservation
+  # Transfer preparation
+  #
+  module Transfer
+    # Transfer preparation for dataset
+    #
+    class Dataset < Preservation::Transfer::Base
+      # @param config [Hash]
+      def initialize(config)
+        super()
+        @config = config
+      end
+      # For given uuid, if necessary, fetch the metadata,
+      # prepare a directory in the ingest path and populate it with the files and
+      # JSON description file.
+      #
+      # @param uuid [String] uuid to preserve
+      # @param dir_scheme [Symbol] how to make directory name
+      # @param delay [Integer] days to wait (after modification date) before preserving
+      # @return [Boolean] indicates presence of metadata description file
+      def prepare(uuid: nil,
+                  dir_scheme: :uuid,
+                  delay: 0)
+        success = false
+        if uuid.nil?
+          @logger.error 'Missing ' + uuid
+          exit
+        end
+        dir_base_path = Preservation.ingest_path
+        dataset_extractor = Puree::Extractor::Dataset.new @config
+        d = dataset_extractor.find uuid: uuid
+        if !d
+          @logger.error 'No metadata for ' + uuid
+          exit
+        end
+        metadata_record = {
+          doi:   d.doi,
+          uuid:  d.uuid,
+          title: d.title
+        }
+        # configurable to become more human-readable
+        dir_name = Preservation::Builder.build_directory_name(metadata_record, dir_scheme)
+        # continue only if dir_name is not empty (e.g. because there was no DOI)
+        # continue only if there is no DB entry
+        # continue only if the dataset has a DOI
+        # continue only if there are files for this resource
+        # continue only if it is time to preserve
+        if !dir_name.nil? &&
+           !dir_name.empty? &&
+           !Preservation::Report::Transfer.in_db?(dir_name) &&
+           d.doi &&
+           !d.files.empty? &&
+           Preservation::Temporal.time_to_preserve?(d.modified, delay)
+          dir_file_path = dir_base_path + '/' + dir_name
+          dir_metadata_path = dir_file_path + '/metadata/'
+          metadata_filename = dir_metadata_path + 'metadata.json'
+          # calculate total size of data files
+          download_storage_required = 0
+          d.files.each { |i| download_storage_required += i.size.to_i }
+          # do we have enough space in filesystem to fetch data files?
+          if Preservation::Storage.enough_storage_for_download? download_storage_required
+            # @logger.info 'Sufficient disk space for ' + dir_file_path
+          else
+            @logger.error 'Insufficient disk space to store files fetched from Pure. Skipping ' + dir_file_path
+          end
+          # has metadata file been created? if so, files and metadata are in place
+          # continue only if files not present in ingest location
+          if !File.size? metadata_filename
+            @logger.info 'Preparing ' + dir_name + ', Pure UUID ' + d.uuid
+            data = []
+            d.files.each do |f|
+              o = package_metadata d, f
+              data << o
+              wget_str = Preservation::Builder.build_wget @config[:username],
+                                                          @config[:password],
+                                                          f.url
+              Dir.mkdir(dir_file_path) if !Dir.exists?(dir_file_path)
+              # fetch the file
+              Dir.chdir(dir_file_path) do
+                # puts 'Changing dir to ' + Dir.pwd
+                # puts 'Size of ' + f.name + ' is ' + File.size(f.name).to_s
+                if File.size?(f.name)
+                  # puts 'Should be deleting ' + f['name']
+                  File.delete(f.name)
+                end
+                # puts f.name + ' missing or empty'
+                # puts wget_str
+                `#{wget_str}`
+              end
+            end
+            Dir.mkdir(dir_metadata_path) if !Dir.exists?(dir_metadata_path)
+            pretty = JSON.pretty_generate( data, :indent => '  ')
+            # puts pretty
+            File.write(metadata_filename,pretty)
+            @logger.info 'Created ' + metadata_filename
+            success = true
+          else
+            @logger.info 'Skipping ' + dir_name + ', Pure UUID ' + d.uuid +
+                         ' because ' + metadata_filename + ' exists'
+          end
+        else
+          @logger.info 'Skipping ' + dir_name + ', Pure UUID ' + d.uuid
+        end
+        success
+      end
+      # For multiple datasets, if necessary, fetch the metadata,
+      # prepare a directory in the ingest path and populate it with the files and
+      # JSON description file.
+      #
+      # @param max [Integer] maximum to prepare, omit to set no maximum
+      # @param dir_scheme [Symbol] how to make directory name
+      # @param delay [Integer] days to wait (after modification date) before preserving
+      def prepare_batch(max: nil,
+                        dir_scheme: :uuid,
+                        delay: 30)
+        collection_extractor = Puree::Extractor::Collection.new config:   @config,
+                                                                resource: :dataset
+        count = collection_extractor.count
+        max = count if max.nil?
+        batch_size = 10
+        num_prepared = 0
+        0.step(count, batch_size) do |n|
+          dataset_collection = collection_extractor.find limit:  batch_size,
+                                                         offset: n
+          dataset_collection.each do |dataset|
+            success = prepare uuid:       dataset.uuid,
+                              dir_scheme: dir_scheme.to_sym,
+                              delay:      delay
+            num_prepared += 1 if success
+            exit if num_prepared == max
+          end
+        end
+      end
+      private
+      def package_metadata(d, f)
+          o = {}
+          o['filename'] = 'objects/' + f.name
+          o['dc.title'] = d.title
+          if d.description
+            o['dc.description'] = d.description
+          end
+          o['dcterms.created'] = d.created.strftime("%F")
+          if d.available
+            o['dcterms.available'] = d.available.strftime("%F")
+          end
+          o['dc.publisher'] = d.publisher
+          if d.doi
+            o['dc.identifier'] = d.doi
+          end
+          if !d.spatial_places.empty?
+            o['dcterms.spatial'] = d.spatial_places
+          end
+          temporal = d.temporal
+          temporal_range = ''
+          if temporal
+            if temporal.start
+              temporal_range << temporal.start.strftime("%F")
+              if temporal.end
+                temporal_range << '/'
+                temporal_range << temporal.end.strftime("%F")
+              end
+              o['dcterms.temporal'] = temporal_range
+            end
+          end
+          creators = []
+          contributors = []
+          all_persons = []
+          all_persons << d.persons_internal
+          all_persons << d.persons_external
+          all_persons << d.persons_other
+          all_persons.each do |person_type|
+            person_type.each do |i|
+              name = i.name.last_first if i.name
+              if i.role == 'Creator'
+                creators << name if name
+              end
+              if i.role == 'Contributor'
+                contributors << name if name
+              end
+            end
+          end
+          o['dc.creator'] = creators
+          if !contributors.empty?
+            o['dc.contributor'] = contributors
+          end
+          keywords = []
+          d.keywords.each { |i|
+            keywords << i
+          }
+          if !keywords.empty?
+            o['dc.subject'] = keywords
+          end
+          o['dcterms.license'] = f.license.name if f.license
+          # o['dc.format'] = f.mime
+          related = []
+          publications = d.publications
+          publications.each do |i|
+            if i.type === 'Dataset'
+              extractor = Puree::Extractor::Dataset.new @config
+              dataset = extractor.find uuid: i.uuid
+              doi = dataset.doi
+              if doi
+                related << doi
+              end
+            end
+            if i.type === 'Publication'
+              extractor = Puree::Extractor::Publication.new @config
+              publication = extractor.find uuid: i.uuid
+              dois = publication.dois
+              if !dois.empty?
+                # Only one needed
+                related << dois[0]
+              end
+            end
+          end
+          if !related.empty?
+            o['dc.relation'] = related
+          end
+          o
+      end
+    end
+  end
+end
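
In prepare, the uuid.nil? branch builds its log message with 'Missing ' + uuid. When uuid is nil, String#+ raises TypeError before anything is logged, so the intended message never appears. The same line exists in the deleted transfer/pure.rb, so this is carried over rather than introduced in 0.5.0. A minimal sketch of a guard that avoids the concatenation is below. The method name check_uuid and the choice of ArgumentError are illustrative.

```ruby
# Demonstrates the failure mode: concatenating nil onto a String.
begin
  'Missing ' + nil
rescue TypeError => e
  puts "#{e.class} raised"
end

# Hypothetical guard, not the gem's code: validate the uuid before use and
# report the problem without touching the nil value.
def check_uuid(uuid)
  raise ArgumentError, 'Missing uuid' if uuid.nil? || uuid.strip.empty?
  uuid
end

puts check_uuid('ebf048c3-0ca8-409c-94cf-ab3e5d97e901')
```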

data/lib/preservation/version.rb CHANGED

@@ -1,5 +1,5 @@
 module Preservation
   # Semantic version number
   #
-  VERSION = "0.4.2"
+  VERSION = "0.5.0"
 end

data/preservation.gemspec CHANGED

@@ -8,9 +8,9 @@ Gem::Specification.new do |spec|
   spec.version       = Preservation::VERSION
   spec.authors       = ["Adrian Albin-Clark"]
   spec.email         = ["a.albin-clark@lancaster.ac.uk"]
-  spec.summary       = %q{Extraction and Transformation for Loading by Archivematica's Automation Tools.}
-  spec.description   = %q{Extraction and Transformation for Loading by Archivematica's Automation Tools. Includes transfer preparation, reporting and disk space management.}
-  spec.homepage      = "https://aalbinclark.gitbooks.io/preservation"
+  spec.summary       = %q{Extraction from the Pure Research Information System and transformation for
+loading by Archivematica.}
+  spec.homepage      = "https://github.com/lulibrary/preservation"
   spec.license       = "MIT"
   spec.files         = `git ls-files -z`.split("\x0")
@@ -21,6 +21,6 @@ Gem::Specification.new do |spec|
   spec.required_ruby_version = '~> 2.1'
   spec.add_runtime_dependency 'free_disk_space', '~> 1.0'
-  spec.add_runtime_dependency 'puree', '~> 0.19'
+  spec.add_runtime_dependency 'puree', '~> 1.3'
   spec.add_runtime_dependency 'sqlite3', '~> 1.3'
 end
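
The puree constraint moves from '~> 0.19' to '~> 1.3'. With the pessimistic operator the two ranges do not overlap, so an application cannot satisfy both preservation 0.4.2 and 0.5.0 with the same puree release. A short check using the Gem::Requirement and Gem::Version classes that ship with RubyGems is below. The version numbers in the list are arbitrary samples, not a list of actual puree releases.

```ruby
require 'rubygems'

old_req = Gem::Requirement.new('~> 0.19') # >= 0.19, < 1.0
new_req = Gem::Requirement.new('~> 1.3')  # >= 1.3,  < 2.0

%w(0.19.0 0.20.1 1.3.0 1.9.2 2.0.0).each do |v|
  version = Gem::Version.new(v)
  puts "#{v}: old=#{old_req.satisfied_by?(version)} new=#{new_req.satisfied_by?(version)}"
end
```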

metadata CHANGED

@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: preservation
 version: !ruby/object:Gem::Version
-  version: 0.4.2
+  version: 0.5.0
 platform: ruby
 authors:
 - Adrian Albin-Clark
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2017-05-18 00:00:00.000000000 Z
+date: 2017-05-23 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: free_disk_space
@@ -30,14 +30,14 @@ dependencies:
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: '0.19'
+        version: '1.3'
   type: :runtime
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: '0.19'
+        version: '1.3'
 - !ruby/object:Gem::Dependency
   name: sqlite3
   requirement: !ruby/object:Gem::Requirement
@@ -52,8 +52,7 @@ dependencies:
     - - "~>"
       - !ruby/object:Gem::Version
         version: '1.3'
-description: Extraction and Transformation for Loading by Archivematica's Automation
-  Tools. Includes transfer preparation, reporting and disk space management.
+description:
 email:
 - a.albin-clark@lancaster.ac.uk
 executables: []
@@ -71,15 +70,15 @@ files:
 - lib/preservation/builder.rb
 - lib/preservation/configuration.rb
 - lib/preservation/conversion.rb
-- lib/preservation/ingest.rb
 - lib/preservation/report/database.rb
 - lib/preservation/report/transfer.rb
 - lib/preservation/storage.rb
 - lib/preservation/temporal.rb
-- lib/preservation/transfer/pure.rb
+- lib/preservation/transfer/base.rb
+- lib/preservation/transfer/dataset.rb
 - lib/preservation/version.rb
 - preservation.gemspec
-homepage: https://aalbinclark.gitbooks.io/preservation
+homepage: https://github.com/lulibrary/preservation
 licenses:
 - MIT
 metadata: {}
@@ -102,5 +101,6 @@ rubyforge_project:
 rubygems_version: 2.2.2
 signing_key:
 specification_version: 4
-summary: Extraction and Transformation for Loading by Archivematica's Automation Tools.
+summary: Extraction from the Pure Research Information System and transformation for
+  loading by Archivematica.
 test_files: []

data/lib/preservation/ingest.rb DELETED

@@ -1,38 +0,0 @@
-module Preservation
-  # Ingest
-  #
-  class Ingest
-    attr_reader :logger
-    def initialize
-      setup_logger
-      check_ingest_path
-     end
-    private
-    def check_ingest_path
-      if Preservation.ingest_path.nil?
-        @logger.error 'Missing ingest path'
-        exit
-      end
-    end
-    def setup_logger
-      if @logger.nil?
-        if Preservation.log_path.nil?
-          @logger = Logger.new STDOUT
-        else
-          # Keep data for today and the past 20 days
-          @logger = Logger.new File.new(Preservation.log_path, 'a'), 20, 'daily'
-        end
-      end
-      @logger.level = Logger::INFO
-    end
-  end
-end

data/lib/preservation/transfer/pure.rb DELETED

@@ -1,259 +0,0 @@
-module Preservation
-  # Transfer preparation
-  #
-  module Transfer
-    # Transfer preparation for Pure
-    #
-    class Pure < Ingest
-      # @param base_url [String]
-      # @param username [String]
-      # @param password [String]
-      # @param basic_auth [Boolean]
-      def initialize(base_url: nil, username: nil, password: nil, basic_auth: nil)
-        super()
-        @base_url = base_url
-        @basic_auth = basic_auth
-        if basic_auth === true
-          @username = username
-          @password = password
-        end
-      end
-      # For given uuid, if necessary, fetch the metadata,
-      # prepare a directory in the ingest path and populate it with the files and
-      # JSON description file.
-      #
-      # @param uuid [String] uuid to preserve
-      # @param dir_scheme [Symbol] how to make directory name
-      # @param delay [Integer] days to wait (after modification date) before preserving
-      # @return [Boolean] indicates presence of metadata description file
-      def prepare_dataset(uuid: nil,
-                          dir_scheme: :uuid,
-                          delay: 0)
-        success = false
-        if uuid.nil?
-          @logger.error 'Missing ' + uuid
-          exit
-        end
-        dir_base_path = Preservation.ingest_path
-        dataset = Puree::Dataset.new base_url: @base_url,
-                                     username: @username,
-                                     password: @password,
-                                     basic_auth: @basic_auth
-        dataset.find uuid: uuid
-        d = dataset.metadata
-        if d.empty?
-          @logger.error 'No metadata for ' + uuid
-          exit
-        end
-        # configurable to become more human-readable
-        dir_name = Preservation::Builder.build_directory_name(d, dir_scheme)
-        # continue only if dir_name is not empty (e.g. because there was no DOI)
-        # continue only if there is no DB entry
-        # continue only if the dataset has a DOI
-        # continue only if there are files for this resource
-        # continue only if it is time to preserve
-        if !dir_name.nil? &&
-           !dir_name.empty? &&
-           !Preservation::Report::Transfer.in_db?(dir_name) &&
-           !d['doi'].empty? &&
-           !d['file'].empty? &&
-           Preservation::Temporal.time_to_preserve?(d['modified'], delay)
-          dir_file_path = dir_base_path + '/' + dir_name
-          dir_metadata_path = dir_file_path + '/metadata/'
-          metadata_filename = dir_metadata_path + 'metadata.json'
-          # calculate total size of data files
-          download_storage_required = 0
-          d['file'].each { |i| download_storage_required += i['size'].to_i }
-          # do we have enough space in filesystem to fetch data files?
-          if Preservation::Storage.enough_storage_for_download? download_storage_required
-            # @logger.info 'Sufficient disk space for ' + dir_file_path
-          else
-            @logger.error 'Insufficient disk space to store files fetched from Pure. Skipping ' + dir_file_path
-          end
-          # has metadata file been created? if so, files and metadata are in place
-          # continue only if files not present in ingest location
-          if !File.size? metadata_filename
-            @logger.info 'Preparing ' + dir_name + ', Pure UUID ' + d['uuid']
-            data = []
-            d['file'].each do |f|
-              o = package_dataset_metadata d, f
-              data << o
-              wget_str = Preservation::Builder.build_wget @username,
-                                                          @password,
-                                                          f['url']
-              Dir.mkdir(dir_file_path) if !Dir.exists?(dir_file_path)
-              # fetch the file
-              Dir.chdir(dir_file_path) do
-                # puts 'Changing dir to ' + Dir.pwd
-                # puts 'Size of ' + f['name'] + ' is ' + File.size(f['name']).to_s
-                if File.size?(f['name'])
-                  # puts 'Should be deleting ' + f['name']
-                  File.delete(f['name'])
-                end
-                # puts f['name'] + ' missing or empty'
-                # puts wget_str
-                `#{wget_str}`
-              end
-            end
-            Dir.mkdir(dir_metadata_path) if !Dir.exists?(dir_metadata_path)
-            pretty = JSON.pretty_generate( data, :indent => '  ')
-            # puts pretty
-            File.write(metadata_filename,pretty)
-            @logger.info 'Created ' + metadata_filename
-            success = true
-          else
-            @logger.info 'Skipping ' + dir_name + ', Pure UUID ' + d['uuid'] +
-                         ' because ' + metadata_filename + ' exists'
-          end
-        else
-          @logger.info 'Skipping ' + dir_name + ', Pure UUID ' + d['uuid']
-        end
-        success
-      end
-      # For multiple datasets, if necessary, fetch the metadata,
-      # prepare a directory in the ingest path and populate it with the files and
-      # JSON description file.
-      #
-      # @param max [Integer] maximum to prepare, omit to set no maximum
-      # @param dir_scheme [Symbol] how to make directory name
-      # @param delay [Integer] days to wait (after modification date) before preserving
-      def prepare_dataset_batch(max: nil,
-                                dir_scheme: :uuid,
-                                delay: 30)
-        collection = Puree::Collection.new resource:  :dataset,
-                                           base_url:   @base_url,
-                                           username:   @username,
-                                           password:   @password,
-                                           basic_auth: @basic_auth
-        count = collection.count
-        max = count if max.nil?
-        batch_size = 10
-        num_prepared = 0
-        0.step(count, batch_size) do |n|
-          minimal_metadata = collection.find limit:  batch_size,
-                                             offset: n,
-                                             full:   false
-          uuids = []
-          minimal_metadata.each do |i|
-            uuids << i['uuid']
-          end
-          uuids.each do |uuid|
-            success = prepare_dataset uuid:       uuid,
-                                      dir_scheme: dir_scheme.to_sym,
-                                      delay:      delay
-            num_prepared += 1 if success
-            exit if num_prepared == max
-          end
-        end
-      end
-      private
-      def package_dataset_metadata(d, f)
-          o = {}
-          o['filename'] = 'objects/' + f['name']
-          o['dc.title'] = d['title']
-          if !d['description'].empty?
-            o['dc.description'] = d['description']
-          end
-          o['dcterms.created'] = d['created']
-          if !d['available']['year'].empty?
-            o['dcterms.available'] = Puree::Date.iso(d['available'])
-          end
-          o['dc.publisher'] = d['publisher']
-          if !d['doi'].empty?
-            o['dc.identifier'] = d['doi']
-          end
-          if !d['spatial'].empty?
-            o['dcterms.spatial'] = d['spatial']
-          end
-          if !d['temporal']['start']['year'].empty?
-            temporal_range = ''
-            temporal_range << Puree::Date.iso(d['temporal']['start'])
-            if !d['temporal']['end']['year'].empty?
-              temporal_range << '/'
-              temporal_range << Puree::Date.iso(d['temporal']['end'])
-            end
-            o['dcterms.temporal'] = temporal_range
-          end
-          creators = []
-          contributors = []
-          person_types = %w(internal external other)
-          person_types.each do |person_type|
-            d['person'][person_type].each do |i|
-              if i['role'] == 'Creator'
-                creator = i['name']['last'] + ', ' + i['name']['first']
-                creators << creator
-              end
-              if i['role'] == 'Contributor'
-                contributor = i['name']['last'] + ', ' + i['name']['first']
-                contributors << contributor
-              end
-            end
-          end
-          o['dc.creator'] = creators
-          if !contributors.empty?
-            o['dc.contributor'] = contributors
-          end
-          keywords = []
-          d['keyword'].each { |i|
-            keywords << i
-          }
-          if !keywords.empty?
-            o['dc.subject'] = keywords
-          end
-          if !f['license']['name'].empty?
-            o['dcterms.license'] = f['license']['name']
-          end
-          # o['dc.format'] = f['mime']
-          related = []
-          publications = d['publication']
-          publications.each do |i|
-            pub = Puree::Publication.new base_url: @base_url,
-                                         username: @username,
-                                         password: @password,
-                                         basic_auth: @basic_auth
-            pub.find uuid: i['uuid']
-            doi = pub.doi
-            if doi
-              related << doi
-            end
-          end
-          if !related.empty?
-            o['dc.relation'] = related
-          end
-          o
-      end
-    end
-  end
-end