bulkrax 3.4.0 → 4.0.0

Files changed (20)

checksums.yaml +4 -4
data/README.md +3 -5
data/app/controllers/bulkrax/exporters_controller.rb +1 -1
data/app/jobs/bulkrax/create_relationships_job.rb +4 -2
data/app/models/bulkrax/entry.rb +0 -2
data/app/models/bulkrax/exporter.rb +15 -2
data/app/models/concerns/bulkrax/dynamic_record_lookup.rb +7 -8
data/app/models/concerns/bulkrax/export_behavior.rb +0 -22
data/app/models/concerns/bulkrax/file_set_entry_behavior.rb +5 -1
data/app/models/concerns/bulkrax/import_behavior.rb +2 -2
data/app/parsers/bulkrax/application_parser.rb +6 -25
data/app/parsers/bulkrax/bagit_parser.rb +69 -160
data/app/parsers/bulkrax/csv_parser.rb +54 -10
data/app/views/bulkrax/exporters/_downloads.html.erb +8 -0
data/app/views/bulkrax/exporters/_form.html.erb +3 -0
data/app/views/bulkrax/exporters/index.html.erb +5 -2
data/app/views/bulkrax/exporters/show.html.erb +4 -7
data/lib/bulkrax/version.rb +1 -1
data/lib/tasks/bulkrax_tasks.rake +28 -4
metadata +7 -6

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: f0ee151bc10b7485eb716463b2c4895165d6df9d73c3dd60813d2eb4de8161d1
-  data.tar.gz: f4e5ddfb5ac602eb20a85850f8ec5a9286a9a668f8c83548dd049db4d91a2a0e
+  metadata.gz: eb56d86ee90ae9e1cf0628504694e1301ab8f2d6b24ffa8fd323f8953a8ee956
+  data.tar.gz: 71056b077e300f27eee3bcccd9d7e2bee2fc7bdf2fc6ba9248b69a29f3994f9c
 SHA512:
-  metadata.gz: a5b029da7feaee11c8a3eb58e0c7150abcb06b6ca08f9d102134d5e1c9eef3049ae85b0084d74934b7504688da7d4958a0dc425c705b42101c1a3ce62a57d0c7
-  data.tar.gz: c807b68265c0d88b9e7faea4f6efdf94b3bcf90e965f5cc97cc259202bc55db976944eea8e4bf99876723f881d29d2fa7004dbd9985f7f8164648df517722133
+  metadata.gz: 05ea49e6f2c5e73cbddacf35dcaf9de499760d7093e3ae8f3ce4ea5ab28e25d065b7877607436fcbe02d21e17c2df940b0224f1ea7d638a602486ce807d99981
+  data.tar.gz: ecdda29924e09793e62684f16ebcd79cd90ab9e8204d011b89a577ca667a644709ff2df5b525ac6c32355426038c05d9fc3d74c6efb20ee7cfab653d9b89b67a
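
Note: the digests above cover the `metadata.gz` and `data.tar.gz` archives packed inside the `.gem` file. A minimal sketch of checking one of them locally, assuming the archive has already been extracted to disk. The helper name `checksum_matches?` is hypothetical and is not part of RubyGems or Bulkrax.

```ruby
require 'digest/sha2'

# Hypothetical helper: compare a file's SHA256 digest with an expected hex string.
def checksum_matches?(path, expected_hex)
  Digest::SHA256.file(path).hexdigest == expected_hex.strip.downcase
end

# Example call (the path is an assumption; the digest is the 4.0.0 value listed above):
# checksum_matches?('data.tar.gz', '71056b077e300f27eee3bcccd9d7e2bee2fc7bdf2fc6ba9248b69a29f3994f9c')
```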

data/README.md CHANGED

@@ -70,7 +70,7 @@ Bulkrax.setup do |config|
 end
 ```
-The [configuration guide](https://github.com/samvera-labs/bulkrax/wiki/Configuration) provides detailed instructions on the various available configurations.
+The [configuration guide](https://github.com/samvera-labs/bulkrax/wiki/Configuring-Bulkrax) provides detailed instructions on the various available configurations.
 Example:
@@ -120,7 +120,7 @@ It's unlikely that the incoming import data has fields that exactly match those
 By default, a mapping for the OAI parser has been added to map standard oai_dc fields to Hyrax basic_metadata. The other parsers have no default mapping, and will map any incoming fields to Hyrax properties with the same name. Configurations can be added in `config/intializers/bulkrax.rb`
-Configuring field mappings is documented in the [Bulkrax Configuration Guide](https://github.com/samvera-labs/bulkrax/wiki/Configuration).
+Configuring field mappings is documented in the [Bulkrax Configuration Guide](https://github.com/samvera-labs/bulkrax/wiki/Configuring-Bulkrax).
 ## Importing Files
@@ -151,7 +151,7 @@ end
 ## Customizing Bulkrax
-For further information on how to extend and customize Bulkrax, please see the [Bulkrax Customization Guide](https://github.com/samvera-labs/bulkrax/wiki/Customizing).
+For further information on how to extend and customize Bulkrax, please see the [Bulkrax Customization Guide](https://github.com/samvera-labs/bulkrax/wiki/Customizing-Bulkrax).
 ## How it Works
 Once you have Bulkrax installed, you will have access to an easy to use interface with which you are able to create, edit, delete, run, and re-run imports and exports.
@@ -191,8 +191,6 @@ We encourage everyone to help improve this project.  Bug reports and pull reques
 This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the [Contributor Covenant](https://contributor-covenant.org) code of conduct.
-All Contributors should have signed the Samvera Contributor License Agreement (CLA)
 ## Questions
 Questions can be sent to support@notch8.com. Please make sure to include "Bulkrax" in the subject line of your email.

data/app/controllers/bulkrax/exporters_controller.rb CHANGED

@@ -127,7 +127,7 @@ module Bulkrax
     # Download methods
     def file_path
-      @exporter.exporter_export_zip_path
+      "#{@exporter.exporter_export_zip_path}/#{params['exporter']['exporter_export_zip_files']}"
     end
   end
 end

data/app/jobs/bulkrax/create_relationships_job.rb CHANGED

@@ -42,10 +42,12 @@ module Bulkrax
       pending_relationships.each do |rel|
         raise ::StandardError, %("#{rel}" needs either a child or a parent to create a relationship) if rel.child_id.nil? || rel.parent_id.nil?
         @child_entry, child_record = find_record(rel.child_id, importer_run_id)
-        child_record.is_a?(::Collection) ? @child_records[:collections] << child_record : @child_records[:works] << child_record
+        if child_record
+          child_record.is_a?(::Collection) ? @child_records[:collections] << child_record : @child_records[:works] << child_record
+        end
       end
-      if (child_records[:collections].blank? && child_records[:works].blank?) || parent_record.blank?
+      if (child_records[:collections].blank? && child_records[:works].blank?) || parent_record.nil?
         reschedule({ parent_identifier: parent_identifier, importer_run_id: importer_run_id })
         return false # stop current job from continuing to run after rescheduling
       end

data/app/models/bulkrax/entry.rb CHANGED

@@ -4,8 +4,6 @@ module Bulkrax
   # Custom error class for collections_created?
   class CollectionsCreatedError < RuntimeError; end
   class OAIError < RuntimeError; end
-  # TODO: remove when ApplicationParser#bagit_zip_file_size_check is removed
-  class BagitZipError < RuntimeError; end
   class Entry < ApplicationRecord
     include Bulkrax::HasMatchers
     include Bulkrax::ImportBehavior

data/app/models/bulkrax/exporter.rb CHANGED

@@ -124,9 +124,13 @@ module Bulkrax
     end
     def exporter_export_zip_path
-      @exporter_export_zip_path ||= File.join(parser.base_path('export'), "export_#{self.id}_#{self.exporter_runs.last.id}.zip")
+      @exporter_export_zip_path ||= File.join(parser.base_path('export'), "export_#{self.id}_#{self.exporter_runs.last.id}")
     rescue
-      @exporter_export_zip_path ||= File.join(parser.base_path('export'), "export_#{self.id}_0.zip")
+      @exporter_export_zip_path ||= File.join(parser.base_path('export'), "export_#{self.id}_0")
+    end
+    def exporter_export_zip_files
+      @exporter_export_zip_files ||= Dir["#{exporter_export_zip_path}/**"].map { |zip| Array(zip.split('/').last) }
     end
     def export_properties
@@ -137,5 +141,14 @@ module Bulkrax
     def metadata_only?
       export_type == 'metadata'
     end
+    def sort_zip_files(zip_files)
+      zip_files.sort_by do |item|
+        number = item.split('_').last.match(/\d+/)&.[](0) || 0.to_s
+        sort_number = number.rjust(4, "0")
+        sort_number
+      end
+    end
   end
 end
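
Note: the new `sort_zip_files` method orders zip names by the number after the last underscore, zero-padded to four digits, so that `_10` sorts after `_2`. A stand-alone sketch of that behavior follows. The method body is copied from the diff; the file names are hypothetical and only assume the `export_<exporter id>_<run id>_<folder>.zip` pattern implied by the `zip` changes in this release.

```ruby
# Stand-alone copy of the sorting logic from the diff, for illustration only.
def sort_zip_files(zip_files)
  zip_files.sort_by do |item|
    # Take the digits in the last underscore-separated segment, defaulting to "0".
    number = item.split('_').last.match(/\d+/)&.[](0) || '0'
    number.rjust(4, '0')
  end
end

# Hypothetical file names; a plain string sort would put "_10" before "_2".
names = ['export_5_12_10.zip', 'export_5_12_2.zip', 'export_5_12_1.zip']
sorted = sort_zip_files(names)
puts sorted
```

Because the sort key is padded to four digits, the ordering is numeric only up to folder 9999.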

data/app/models/concerns/bulkrax/dynamic_record_lookup.rb CHANGED

@@ -12,15 +12,14 @@ module Bulkrax
       # check for our entry in our current importer first
       importer_id = ImporterRun.find(importer_run_id).importer_id
       default_scope = { identifier: identifier, importerexporter_type: 'Bulkrax::Importer' }
-      record = Entry.find_by(default_scope.merge({ importerexporter_id: importer_id })) || Entry.find_by(default_scope)
-      # TODO(alishaevn): discuss whether we are only looking for Collection models here
-      # use ActiveFedora::Base.find(identifier) instead?
-      record ||= ::Collection.where(id: identifier).first # rubocop:disable Rails/FindBy
-      if record.blank?
-        available_work_types.each do |work_type|
-          record ||= work_type.where(id: identifier).first # rubocop:disable Rails/FindBy
-        end
+      begin
+        # the identifier parameter can be a :source_identifier or the id of an object
+        record = Entry.find_by(default_scope.merge({ importerexporter_id: importer_id })) || Entry.find_by(default_scope)
+        record ||= ActiveFedora::Base.find(identifier)
+      # NameError for if ActiveFedora isn't installed
+      rescue NameError, ActiveFedora::ObjectNotFoundError
+        record = nil
       end
       # return the found entry here instead of searching for it again in the CreateRelationshipsJob
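
Note: the rewritten lookup first tries to match the identifier against an entry, then treats it as a repository object id, and sets the record to `nil` when neither lookup succeeds. A minimal sketch of that fallback pattern, using hypothetical in-memory stand-ins (`ENTRIES`, `REPOSITORY`, `ObjectNotFoundError`, `repository_find`) rather than `Bulkrax::Entry` or `ActiveFedora`. The real code also rescues `NameError`, which this sketch omits.

```ruby
# Hypothetical stand-in for ActiveFedora::ObjectNotFoundError
class ObjectNotFoundError < StandardError; end

# Hypothetical in-memory stand-ins for the entries table and the repository
ENTRIES = { 'src-1' => :entry_for_src_1 }.freeze
REPOSITORY = { 'abc123' => :repository_object }.freeze

# Raises when the id is unknown, as a repository lookup would.
def repository_find(id)
  REPOSITORY.fetch(id) { raise ObjectNotFoundError, id }
end

# The identifier may be a source identifier or an object id.
def find_record(identifier)
  ENTRIES[identifier] || repository_find(identifier)
rescue ObjectNotFoundError
  nil
end
```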

data/app/models/concerns/bulkrax/export_behavior.rb CHANGED

@@ -7,9 +7,6 @@ module Bulkrax
     def build_for_exporter
       build_export_metadata
-      # TODO(alishaevn): determine if the line below is still necessary
-      # the csv and bagit parsers also have write_files methods
-      write_files if export_type == 'full' && !importerexporter.parser_klass.include?('Bagit')
     rescue RSolr::Error::Http, CollectionsCreatedError => e
       raise e
     rescue StandardError => e
@@ -26,25 +23,6 @@ module Bulkrax
       @hyrax_record ||= ActiveFedora::Base.find(self.identifier)
     end
-    def write_files
-      return if hyrax_record.is_a?(Collection)
-      file_sets = hyrax_record.file_set? ? Array.wrap(hyrax_record) : hyrax_record.file_sets
-      file_sets << hyrax_record.thumbnail if hyrax_record.thumbnail.present? && hyrax_record.work? && exporter.include_thumbnails
-      file_sets.each do |fs|
-        path = File.join(exporter_export_path, 'files')
-        FileUtils.mkdir_p(path)
-        file = filename(fs)
-        require 'open-uri'
-        io = open(fs.original_file.uri)
-        next if file.blank?
-        File.open(File.join(path, file), 'wb') do |f|
-          f.write(io.read)
-          f.close
-        end
-      end
-    end
     # Prepend the file_set id to ensure a unique filename and also one that is not longer than 255 characters
     def filename(file_set)
       return if file_set.original_file.blank?

data/app/models/concerns/bulkrax/file_set_entry_behavior.rb CHANGED

@@ -8,10 +8,14 @@ module Bulkrax
     def add_path_to_file
       parsed_metadata['file'].each_with_index do |filename, i|
-        path_to_file = ::File.join(parser.path_to_files, filename)
+        next if filename.blank?
+        path_to_file = parser.path_to_files(filename: filename)
         parsed_metadata['file'][i] = path_to_file
       end
+      parsed_metadata['file'].delete('')
       raise ::StandardError, "one or more file paths are invalid: #{parsed_metadata['file'].join(', ')}" unless parsed_metadata['file'].map { |file_path| ::File.file?(file_path) }.all?
       parsed_metadata['file']
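
Note: `add_path_to_file` now skips blank file names, drops empty strings, and still raises if any remaining path is not a file. A plain-Ruby sketch of those steps, under stated assumptions: the helper `resolve_file_paths` is hypothetical, `strip.empty?` stands in for ActiveSupport's `blank?`, and the error message lists only the invalid paths, whereas the diff lists all of them.

```ruby
require 'tmpdir'

# Hypothetical helper mirroring the steps in add_path_to_file with plain Ruby.
def resolve_file_paths(filenames, files_dir)
  paths = filenames
          .reject { |name| name.nil? || name.strip.empty? } # stand-in for `blank?`
          .map { |name| File.join(files_dir, name) }
  invalid = paths.reject { |path| File.file?(path) }
  raise StandardError, "one or more file paths are invalid: #{invalid.join(', ')}" unless invalid.empty?

  paths
end
```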

data/app/models/concerns/bulkrax/import_behavior.rb CHANGED

@@ -12,8 +12,8 @@ module Bulkrax
           raise CollectionsCreatedError unless collections_created?
           @item = factory.run!
           add_user_to_permission_templates! if self.class.to_s.include?("Collection")
-          parent_jobs if self.parsed_metadata[related_parents_parsed_mapping].present?
-          child_jobs if self.parsed_metadata[related_children_parsed_mapping].present?
+          parent_jobs if self.parsed_metadata[related_parents_parsed_mapping]&.join.present?
+          child_jobs if self.parsed_metadata[related_children_parsed_mapping]&.join.present?
         end
       rescue RSolr::Error::Http, CollectionsCreatedError => e
         raise e
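
Note: the switch from `.present?` to `&.join.present?` matters when the mapped value is an array of empty strings. With ActiveSupport, `[''].present?` is true because the array is not empty, while `[''].join` is an empty string. A plain-Ruby sketch of the new check follows; `related_ids_present?` is a hypothetical stand-in, since ActiveSupport is not loaded here.

```ruby
# Plain-Ruby stand-in for `values&.join.present?`.
def related_ids_present?(values)
  return false if values.nil?

  !values.join.strip.empty?
end

related_ids_present?(nil)          # => false
related_ids_present?([''])         # => false
related_ids_present?(['', ''])     # => false
related_ids_present?(['parent-1']) # => true
```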

data/app/parsers/bulkrax/application_parser.rb CHANGED

@@ -247,8 +247,6 @@ module Bulkrax
     def write
       write_files
       zip
-      # uncomment next line to debug for faulty zipping during bagit export
-      bagit_zip_file_size_check if importerexporter.parser_klass.include?('Bagit')
     end
     def unzip(file_to_unzip)
@@ -262,30 +260,13 @@ module Bulkrax
     end
     def zip
-      FileUtils.rm_rf(exporter_export_zip_path)
-      Zip::File.open(exporter_export_zip_path, create: true) do |zip_file|
-        Dir["#{exporter_export_path}/**/**"].each do |file|
-          zip_file.add(file.sub("#{exporter_export_path}/", ''), file)
-        end
-      end
-    end
+      FileUtils.mkdir_p(exporter_export_zip_path)
-    # TODO: remove Entry::BagitZipError as well as this method when we're sure it's not needed
-    def bagit_zip_file_size_check
-      Zip::File.open(exporter_export_zip_path) do |zip_file|
-        zip_file.select { |entry| entry.name.include?('data/') && entry.file? }.each do |zipped_file|
-          Dir["#{exporter_export_path}/**/data/*"].select { |file| file.include?(zipped_file.name) }.each do |file|
-            begin
-              raise BagitZipError, "Invalid Bag, file size mismatch for #{file.sub("#{exporter_export_path}/", '')}" if File.size(file) != zipped_file.size
-            rescue BagitZipError => e
-              matched_entry_ids = importerexporter.entry_ids.select do |id|
-                Bulkrax::Entry.find(id).identifier.include?(zipped_file.name.split('/').first)
-              end
-              matched_entry_ids.each do |entry_id|
-                Bulkrax::Entry.find(entry_id).status_info(e)
-                status_info('Complete (with failures)')
-              end
-            end
+      Dir["#{exporter_export_path}/**"].each do |folder|
+        zip_path = "#{exporter_export_zip_path.split('/').last}_#{folder.split('/').last}.zip"
+        Zip::File.open(File.join("#{exporter_export_zip_path}/#{zip_path}"), create: true) do |zip_file|
+          Dir["#{folder}/**/**"].each do |file|
+            zip_file.add(file.sub("#{folder}/", ''), file)
           end
         end
       end
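
Note: `zip` now creates a directory at `exporter_export_zip_path` and writes one zip per top-level folder under `exporter_export_path`, instead of a single zip file. A sketch of how each zip name is derived follows. The helper `zip_name_for` is hypothetical, the string interpolation is copied from the diff, and the example paths are assumptions, since the layout of `exporter_export_path` is not shown here.

```ruby
# Hypothetical helper reproducing the zip naming expression from the diff.
def zip_name_for(export_zip_path, folder)
  "#{export_zip_path.split('/').last}_#{folder.split('/').last}.zip"
end

# Hypothetical paths: exporter 5, run 12, with two numbered export folders.
zip_dir = 'tmp/exports/export_5_12'
folders = ['tmp/exports/5/12/1', 'tmp/exports/5/12/2']

zip_paths = folders.map { |folder| File.join(zip_dir, zip_name_for(zip_dir, folder)) }
puts zip_paths
```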

data/app/parsers/bulkrax/bagit_parser.rb CHANGED

@@ -1,7 +1,7 @@
 # frozen_string_literal: true
 module Bulkrax
-  class BagitParser < ApplicationParser # rubocop:disable Metrics/ClassLength
+  class BagitParser < CsvParser # rubocop:disable Metrics/ClassLength
     include ExportBehavior
     def self.export_supported?
@@ -20,12 +20,8 @@ module Bulkrax
       rdf_format ? RdfEntry : CsvEntry
     end
-    def collection_entry_class
-      CsvCollectionEntry
-    end
-    def file_set_entry_class
-      CsvFileSetEntry
+    def path_to_files(filename:)
+      @path_to_files ||= Dir.glob(File.join(import_file_path, '**/data', filename)).first
     end
     # Take a random sample of 10 metadata_paths and work out the import fields from that
@@ -36,39 +32,41 @@ module Bulkrax
       end.flatten.compact.uniq
     end
-    # Assume a single metadata record per path
-    # Create an Array of all metadata records, one per file
+    # Create an Array of all metadata records
     def records(_opts = {})
       raise StandardError, 'No BagIt records were found' if bags.blank?
       @records ||= bags.map do |bag|
         path = metadata_path(bag)
         raise StandardError, 'No metadata files were found' if path.blank?
         data = entry_class.read_data(path)
-        data = entry_class.data_for_entry(data, source_identifier, self)
-        data[:file] = bag.bag_files.join('|') unless importerexporter.metadata_only?
-        data
+        get_data(bag, data)
       end
+      @records = @records.flatten
     end
-    # Find or create collections referenced by works
-    # If the import data also contains records for these works, they will be updated
-    # during create works
-    def create_collections
-      collections.each_with_index do |collection, index|
-        next if collection.blank?
-        metadata = {
-          title: [collection],
-          work_identifier => [collection],
-          visibility: 'open',
-          collection_type_gid: Hyrax::CollectionType.find_or_create_default_collection_type.gid
-        }
-        new_entry = find_or_create_entry(collection_entry_class, collection, 'Bulkrax::Importer', metadata)
-        ImportCollectionJob.perform_now(new_entry.id, current_run.id)
-        increment_counters(index, collection: true)
+    def get_data(bag, data)
+      if entry_class == CsvEntry
+        data = data.map do |data_row|
+          record_data = entry_class.data_for_entry(data_row, source_identifier, self)
+          next record_data if importerexporter.metadata_only?
+          record_data[:file] = bag.bag_files.join('|') if ::Hyrax.config.curation_concerns.include? record_data[:model]&.constantize
+          record_data
+        end
+      else
+        data = entry_class.data_for_entry(data, source_identifier, self)
+        data[:file] = bag.bag_files.join('|') unless importerexporter.metadata_only?
       end
+      data
     end
     def create_works
+      entry_class == CsvEntry ? super : create_rdf_works
+    end
+    def create_rdf_works
       records.each_with_index do |record, index|
         next unless record_has_source_identifier(record, index)
         break if limit_reached?(limit, index)
@@ -87,19 +85,6 @@ module Bulkrax
       status_info(e)
     end
-    def collections
-      records.map { |r| r[related_parents_parsed_mapping].split(/\s*[;|]\s*/) if r[related_parents_parsed_mapping].present? }.flatten.compact.uniq
-    end
-    def collections_total
-      collections.size
-    end
-    # TODO: change to differentiate between collection and work records when adding ability to import collection metadata
-    def works_total
-      total
-    end
     def total
       @total = importer.parser_fields['total'] || 0 if importer?
@@ -112,18 +97,6 @@ module Bulkrax
       @total = 0
     end
-    def extra_filters
-      output = ""
-      if importerexporter.start_date.present?
-        start_dt = importerexporter.start_date.to_datetime.strftime('%FT%TZ')
-        finish_dt = importerexporter.finish_date.present? ? importerexporter.finish_date.to_datetime.end_of_day.strftime('%FT%TZ') : "NOW"
-        output += " AND system_modified_dtsi:[#{start_dt} TO #{finish_dt}]"
-      end
-      output += importerexporter.work_visibility.present? ? " AND visibility_ssi:#{importerexporter.work_visibility}" : ""
-      output += importerexporter.workflow_status.present? ? " AND workflow_state_name_ssim:#{importerexporter.workflow_status}" : ""
-      output
-    end
     def current_record_ids
       @work_ids = []
       @collection_ids = []
@@ -140,78 +113,39 @@ module Bulkrax
       when 'importer'
         set_ids_for_exporting_from_importer
       end
-      @work_ids + @collection_ids + @file_set_ids
-    end
-    # Set the following instance variables: @work_ids, @collection_ids, @file_set_ids
-    # @see #current_record_ids
-    def set_ids_for_exporting_from_importer
-      entry_ids = Importer.find(importerexporter.export_source).entries.pluck(:id)
-      complete_statuses = Status.latest_by_statusable
-                                .includes(:statusable)
-                                .where('bulkrax_statuses.statusable_id IN (?) AND bulkrax_statuses.statusable_type = ? AND status_message = ?', entry_ids, 'Bulkrax::Entry', 'Complete')
-      complete_entry_identifiers = complete_statuses.map { |s| s.statusable&.identifier&.gsub(':', '\:') }
-      extra_filters = extra_filters.presence || '*:*'
-      { :@work_ids => ::Hyrax.config.curation_concerns, :@collection_ids => [::Collection], :@file_set_ids => [::FileSet] }.each do |instance_var, models_to_search|
-        instance_variable_set(instance_var, ActiveFedora::SolrService.post(
-          extra_filters.to_s,
-          fq: [
-            %(#{::Solrizer.solr_name(work_identifier)}:("#{complete_entry_identifiers.join('" OR "')}")),
-            "has_model_ssim:(#{models_to_search.join(' OR ')})"
-          ],
-          fl: 'id',
-          rows: 2_000_000_000
-        )['response']['docs'].map { |obj| obj['id'] })
-      end
+      find_child_file_sets(@work_ids) if importerexporter.export_from == 'collection' || importerexporter.export_from == 'worktype'
+      @work_ids + @collection_ids + @file_set_ids
     end
     # export methods
-    def create_new_entries
-      current_record_ids.each_with_index do |id, index|
-        break if limit_reached?(limit, index)
-        this_entry_class = if @collection_ids.include?(id)
-                             collection_entry_class
-                           elsif @file_set_ids.include?(id)
-                             file_set_entry_class
-                           else
-                             entry_class
-                           end
-        new_entry = find_or_create_entry(this_entry_class, id, 'Bulkrax::Exporter')
-        begin
-          entry = ExportWorkJob.perform_now(new_entry.id, current_run.id)
-        rescue => e
-          Rails.logger.info("#{e.message} was detected during export")
-        end
-        self.headers |= entry.parsed_metadata.keys if entry
-      end
-    end
-    alias create_from_collection create_new_entries
-    alias create_from_importer create_new_entries
-    alias create_from_worktype create_new_entries
-    alias create_from_all create_new_entries
     # rubocop:disable Metrics/MethodLength, Metrics/AbcSize
     def write_files
       require 'open-uri'
       require 'socket'
+      folder_count = 1
+      records_in_folder = 0
       importerexporter.entries.where(identifier: current_record_ids)[0..limit || total].each do |entry|
-        work = ActiveFedora::Base.find(entry.identifier)
-        next unless Hyrax.config.curation_concerns.include?(work.class)
-        bag = BagIt::Bag.new setup_bagit_folder(entry.identifier)
+        record = ActiveFedora::Base.find(entry.identifier)
+        next unless Hyrax.config.curation_concerns.include?(record.class)
         bag_entries = [entry]
+        file_set_entries = Bulkrax::CsvFileSetEntry.where(importerexporter_id: importerexporter.id).where("parsed_metadata LIKE '%#{record.id}%'")
+        file_set_entries.each { |fse| bag_entries << fse }
-        work.file_sets.each do |fs|
-          if @file_set_ids.present?
-            file_set_entry = Bulkrax::CsvFileSetEntry.where("parsed_metadata LIKE '%#{fs.id}%'").first
-            bag_entries << file_set_entry unless file_set_entry.nil?
-          end
+        records_in_folder += bag_entries.count
+        if records_in_folder > records_split_count
+          folder_count += 1
+          records_in_folder = bag_entries.count
+        end
+        bag ||= BagIt::Bag.new setup_bagit_folder(folder_count, entry.identifier)
+        record.file_sets.each do |fs|
           file_name = filename(fs)
           next if file_name.blank?
           io = open(fs.original_file.uri)
@@ -226,17 +160,21 @@ module Bulkrax
           end
         end
-        CSV.open(setup_csv_metadata_export_file(entry.identifier), "w", headers: export_headers, write_headers: true) do |csv|
+        CSV.open(setup_csv_metadata_export_file(folder_count, entry.identifier), "w", headers: export_headers, write_headers: true) do |csv|
           bag_entries.each { |csv_entry| csv << csv_entry.parsed_metadata }
         end
-        write_triples(entry)
+        write_triples(folder_count, entry)
         bag.manifest!(algo: 'sha256')
       end
     end
     # rubocop:enable Metrics/MethodLength, Metrics/AbcSize
-    def setup_csv_metadata_export_file(id)
-      File.join(importerexporter.exporter_export_path, id, 'metadata.csv')
+    def setup_csv_metadata_export_file(folder_count, id)
+      path = File.join(importerexporter.exporter_export_path, folder_count.to_s)
+      FileUtils.mkdir_p(path) unless File.exist?(path)
+      File.join(path, id, 'metadata.csv')
     end
     def key_allowed(key)
@@ -245,66 +183,31 @@ module Bulkrax
         key != source_identifier.to_s
     end
-    # All possible column names
-    def export_headers
-      headers = sort_headers(self.headers)
-      # we don't want access_control_id exported and we want file at the end
-      headers.delete('access_control_id') if headers.include?('access_control_id')
+    def setup_triple_metadata_export_file(folder_count, id)
+      path = File.join(importerexporter.exporter_export_path, folder_count.to_s)
+      FileUtils.mkdir_p(path) unless File.exist?(path)
-      # add the headers below at the beginning or end to maintain the preexisting export behavior
-      headers.prepend('model')
-      headers.prepend(source_identifier.to_s)
-      headers.prepend('id')
-      headers.uniq
+      File.join(path, id, 'metadata.nt')
     end
-    def object_names
-      return @object_names if @object_names
-      @object_names = mapping.values.map { |value| value['object'] }
-      @object_names.uniq!.delete(nil)
-      @object_names
-    end
-    def sort_headers(headers)
-      # converting headers like creator_name_1 to creator_1_name so they get sorted by numerical order
-      # while keeping objects grouped together
-      headers.sort_by do |item|
-        number = item.match(/\d+/)&.[](0) || 0.to_s
-        sort_number = number.rjust(4, "0")
-        object_prefix = object_names.detect { |o| item.match(/^#{o}/) } || item
-        remainder = item.gsub(/^#{object_prefix}_/, '').gsub(/_#{number}/, '')
-        "#{object_prefix}_#{sort_number}_#{remainder}"
-      end
-    end
+    def setup_bagit_folder(folder_count, id)
+      path = File.join(importerexporter.exporter_export_path, folder_count.to_s)
+      FileUtils.mkdir_p(path) unless File.exist?(path)
-    def setup_triple_metadata_export_file(id)
-      File.join(importerexporter.exporter_export_path, id, 'metadata.nt')
+      File.join(path, id)
     end
-    def setup_bagit_folder(id)
-      File.join(importerexporter.exporter_export_path, id)
-    end
-    def write_triples(e)
+    def write_triples(folder_count, e)
       sd = SolrDocument.find(e.identifier)
       return if sd.nil?
       req = ActionDispatch::Request.new({ 'HTTP_HOST' => Socket.gethostname })
       rdf = Hyrax::GraphExporter.new(sd, req).fetch.dump(:ntriples)
-      File.open(setup_triple_metadata_export_file(e.identifier), "w") do |triples|
+      File.open(setup_triple_metadata_export_file(folder_count, e.identifier), "w") do |triples|
         triples.write(rdf)
       end
     end
-    def required_elements?(keys)
-      return if keys.blank?
-      !required_elements.map { |el| keys.map(&:to_s).include?(el) }.include?(false)
-    end
     # @todo - investigate getting directory structure
     # @todo - investigate using perform_later, and having the importer check for
     #   DownloadCloudFileJob before it starts
@@ -355,5 +258,11 @@ module Bulkrax
       return nil unless bag.valid?
       bag
     end
+    # use the version of this method from the application parser instead
+    def real_import_file_path
+      return importer_unzip_path if file? && zip?
+      parser_fields['import_file_path']
+    end
   end
 end
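
Note: the BagIt `write_files` method now spreads bags across numbered folders, starting a new folder whenever adding a record's entries would push the running count past `records_split_count`. A sketch of just that counting logic follows. The helper `assign_folders` is hypothetical; it takes the number of entries each record contributes (the work entry plus its file set entries) and returns the folder number each record would land in.

```ruby
# Hypothetical helper isolating the folder counting used in BagitParser#write_files.
def assign_folders(entry_counts, records_split_count)
  folder_count = 1
  records_in_folder = 0
  entry_counts.map do |count|
    records_in_folder += count
    if records_in_folder > records_split_count
      # Start a new folder and count this record's entries toward it.
      folder_count += 1
      records_in_folder = count
    end
    folder_count
  end
end

assign_folders([3, 2, 4], 5) # => [1, 1, 2]
```

A record's entries are never split across folders in this scheme: the whole record moves to the next folder when the limit would be exceeded.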

data/app/parsers/bulkrax/csv_parser.rb CHANGED

@@ -4,6 +4,7 @@ require 'csv'
 module Bulkrax
   class CsvParser < ApplicationParser # rubocop:disable Metrics/ClassLength
     include ErroredEntries
+    include ExportBehavior
     attr_writer :collections, :file_sets, :works
     def self.export_supported?
@@ -207,6 +208,13 @@ module Bulkrax
       @work_ids + @collection_ids + @file_set_ids
     end
+    # find the related file set ids so entries can be made for export
+    def find_child_file_sets(work_ids)
+      work_ids.each do |id|
+        ActiveFedora::Base.find(id).file_set_ids.each { |fs_id| @file_set_ids << fs_id }
+      end
+    end
     # Set the following instance variables: @work_ids, @collection_ids, @file_set_ids
     # @see #current_record_ids
     def set_ids_for_exporting_from_importer
@@ -272,8 +280,8 @@ module Bulkrax
       CsvFileSetEntry
     end
-    # See https://stackoverflow.com/questions/2650517/count-the-number-of-lines-in-a-file-without-reading-entire-file-into-memory
-    #   Changed to grep as wc -l counts blank lines, and ignores the final unescaped line (which may or may not contain data)
+    # TODO: figure out why using the version of this method that's in the bagit parser
+    # breaks specs for the "if importer?" line
     def total
       @total = importer.parser_fields['total'] || 0 if importer?
       @total = limit || current_record_ids.count if exporter?
@@ -283,6 +291,10 @@ module Bulkrax
       @total = 0
     end
+    def records_split_count
+      1000
+    end
     # @todo - investigate getting directory structure
     # @todo - investigate using perform_later, and having the importer check for
     #   DownloadCloudFileJob before it starts
@@ -307,9 +319,37 @@ module Bulkrax
     # export methods
     def write_files
-      CSV.open(setup_export_file, "w", headers: export_headers, write_headers: true) do |csv|
-        importerexporter.entries.where(identifier: current_record_ids)[0..limit || total].each do |e|
-          csv << e.parsed_metadata
+      require 'open-uri'
+      folder_count = 0
+      importerexporter.entries.where(identifier: current_record_ids)[0..limit || total].in_groups_of(records_split_count, false) do |group|
+        folder_count += 1
+        CSV.open(setup_export_file(folder_count), "w", headers: export_headers, write_headers: true) do |csv|
+          group.each do |entry|
+            csv << entry.parsed_metadata
+            next if importerexporter.metadata_only? || entry.type == 'Bulkrax::CsvCollectionEntry'
+            store_files(entry.identifier, folder_count.to_s)
+          end
+        end
+      end
+    end
+    def store_files(identifier, folder_count)
+      record = ActiveFedora::Base.find(identifier)
+      file_sets = record.file_set? ? Array.wrap(record) : record.file_sets
+      file_sets << record.thumbnail if exporter.include_thumbnails && record.thumbnail.present? && record.work?
+      file_sets.each do |fs|
+        path = File.join(exporter_export_path, folder_count, 'files')
+        FileUtils.mkdir_p(path) unless File.exist? path
+        file = filename(fs)
+        io = open(fs.original_file.uri)
+        next if file.blank?
+        File.open(File.join(path, file), 'wb') do |f|
+          f.write(io.read)
+          f.close
         end
       end
     end
@@ -356,8 +396,11 @@ module Bulkrax
     end
     # in the parser as it is specific to the format
-    def setup_export_file
-      File.join(importerexporter.exporter_export_path, "export_#{importerexporter.export_source}_from_#{importerexporter.export_from}.csv")
+    def setup_export_file(folder_count)
+      path = File.join(importerexporter.exporter_export_path, folder_count.to_s)
+      FileUtils.mkdir_p(path) unless File.exist?(path)
+      File.join(path, "export_#{importerexporter.export_source}_from_#{importerexporter.export_from}_#{folder_count}.csv")
     end
     # Retrieve file paths for [:file] mapping in records
@@ -382,10 +425,11 @@ module Bulkrax
     end
     # Retrieve the path where we expect to find the files
-    def path_to_files
+    def path_to_files(**args)
+      filename = args.fetch(:filename, '')
       @path_to_files ||= File.join(
-        zip? ? importer_unzip_path : File.dirname(import_file_path),
-        'files'
+        zip? ? importer_unzip_path : File.dirname(import_file_path), 'files', filename
       )
     end
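
Note: the CSV `write_files` method now splits entries into groups of `records_split_count` (1000 in this release) and writes one numbered folder and CSV per group. A plain-Ruby sketch of the grouping and file naming follows. Both helpers are hypothetical, and `each_slice` stands in for ActiveSupport's `in_groups_of(n, false)`, which likewise yields unpadded groups.

```ruby
# Hypothetical helper: map folder numbers (starting at 1) to groups of entries.
def group_into_folders(entries, records_split_count)
  folders = {}
  entries.each_slice(records_split_count).with_index(1) do |group, folder_count|
    folders[folder_count] = group
  end
  folders
end

# Hypothetical helper following the file name built in setup_export_file.
def export_csv_name(export_source, export_from, folder_count)
  "export_#{export_source}_from_#{export_from}_#{folder_count}.csv"
end

group_into_folders((1..5).to_a, 2)    # => {1=>[1, 2], 2=>[3, 4], 3=>[5]}
export_csv_name('7', 'importer', 2)   # => "export_7_from_importer_2.csv"
```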

data/app/views/bulkrax/exporters/_downloads.html.erb ADDED

@@ -0,0 +1,8 @@
+<%= form.select :exporter_export_zip_files,
+  exporter.sort_zip_files(form.object.exporter_export_zip_files.flatten),
+  {},
+  {
+    class: 'btn btn-default form-control',
+    style: 'width: 200px'
+  }
+%>

data/app/views/bulkrax/exporters/_form.html.erb CHANGED

@@ -29,6 +29,7 @@
   <%= form.input :export_source_importer,
     label: t('bulkrax.exporter.labels.importer'),
+    # required: true,
     prompt: 'Select from the list',
     label_html: { class: 'importer export-source-option hidden' },
     input_html: { class: 'importer export-source-option hidden' },
@@ -37,6 +38,7 @@
   <%= form.input :export_source_collection,
     prompt: 'Start typing ...',
     label: t('bulkrax.exporter.labels.collection'),
+    # required: true,
     placeholder: @collection&.title&.first,
     label_html: { class: 'collection export-source-option hidden' },
     input_html: {
@@ -50,6 +52,7 @@
   <%= form.input :export_source_worktype,
     label: t('bulkrax.exporter.labels.worktype'),
+    # required: true,
     prompt: 'Select from the list',
     label_html: { class: 'worktype export-source-option hidden' },
     input_html: { class: 'worktype export-source-option hidden' },

data/app/views/bulkrax/exporters/index.html.erb CHANGED

@@ -21,7 +21,7 @@
               <th scope="col">Name</th>
               <th scope="col">Status</th>
               <th scope="col">Date Exported</th>
-              <th scope="col"></th>
+              <th scope="col">Downloadable Files</th>
               <th scope="col"></th>
               <th scope="col"></th>
               <th scope="col"></th>
@@ -35,7 +35,10 @@
                 <td><%= exporter.created_at %></td>
                 <td>
                   <% if File.exist?(exporter.exporter_export_zip_path) %>
-                    <%= link_to raw('<span class="glyphicon glyphicon-download"></span>'), exporter_download_path(exporter) %>
+                    <%= simple_form_for(exporter, method: :get, url: exporter_download_path(exporter)) do |form| %>
+                      <%= render 'downloads', exporter: exporter, form: form %>
+                      <%= form.button :submit, value: 'Download', data: { disable_with: false } %>
+                    <% end %>
                   <% end%>
                 </td>
                 <td><%= link_to raw('<span class="glyphicon glyphicon-info-sign"></span>'), exporter_path(exporter) %></td>

data/app/views/bulkrax/exporters/show.html.erb CHANGED

@@ -8,10 +8,11 @@
   <div class='panel-body'>
     <% if File.exist?(@exporter.exporter_export_zip_path) %>
-      <p class='bulkrax-p-align'>
+      <%= simple_form_for @exporter, method: :get, url: exporter_download_path(@exporter), html: { class: 'form-inline bulkrax-p-align' } do |form| %>
         <strong>Download:</strong>
-        <%= link_to raw('<span class="glyphicon glyphicon-download"></span>'), exporter_download_path(@exporter) %>
-      </p>
+        <%= render 'downloads', exporter: @exporter, form: form %>
+        <%= form.button :submit, value: 'Download', data: { disable_with: false } %>
+      <% end %>
     <% end %>
     <p class='bulkrax-p-align'>
@@ -135,10 +136,6 @@
       <%= page_entries_info(@work_entries) %><br>
       <%= paginate(@work_entries, param_name: :work_entries_page) %>
       <br>
-      <% if File.exist?(@exporter.exporter_export_zip_path) %>
-        <%= link_to 'Download', exporter_download_path(@exporter) %>
-        |
-      <% end %>
       <%= link_to 'Edit', edit_exporter_path(@exporter) %>
       |
       <%= link_to 'Back', exporters_path %>

data/lib/bulkrax/version.rb CHANGED

@@ -1,5 +1,5 @@
 # frozen_string_literal: true
 module Bulkrax
-  VERSION = '3.4.0'
+  VERSION = '4.0.0'
 end

data/lib/tasks/bulkrax_tasks.rake CHANGED

@@ -1,6 +1,30 @@
 # frozen_string_literal: true
-# desc "Explaining what the task does"
-# task :bulkrax do
-#   # Task goes here
-# end
+namespace :bulkrax do
+  desc "Remove old exported zips and create new ones with the new file structure"
+  task rerun_all_exporters: :environment do
+    if defined?(::Hyku)
+      Account.find_each do |account|
+        puts "=============== updating #{account.name} ============"
+        next if account.name == "search"
+        switch!(account)
+        rerun_exporters_and_delete_zips
+        puts "=============== finished updating #{account.name} ============"
+      end
+    else
+      rerun_exporters_and_delete_zips
+    end
+  end
+  def rerun_exporters_and_delete_zips
+    begin
+      Bulkrax::Exporter.all.each { |e| Bulkrax::ExporterJob.perform_later(e.id) }
+    rescue => e
+      puts "(#{e.message})"
+    end
+    Dir["tmp/exports/**.zip"].each { |zip_path| FileUtils.rm_rf(zip_path) }
+  end
+end
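
Reviewer note on the cleanup glob in the rake task above: in Ruby's `Dir[]`, `**` only recurses when it is followed by `/`; written as `**.zip` it behaves like `*.zip` and matches a single directory level. The sketch below shows the difference in a throwaway temp directory. The directory layout and file names are invented for illustration and are not claimed to match a real bulkrax export tree.

```ruby
require 'tmpdir'
require 'fileutils'

# Build a scratch tree: one zip at the top level, one in a subfolder.
# This layout is an assumption made for the demonstration only.
root = Dir.mktmpdir
FileUtils.mkdir_p(File.join(root, 'exports', '1'))
FileUtils.touch(File.join(root, 'exports', 'export_1.zip'))
FileUtils.touch(File.join(root, 'exports', '1', 'nested.zip'))

# Pattern shaped like the rake task's: "**" without a trailing slash.
flat = Dir[File.join(root, 'exports', '**.zip')]
       .map { |p| File.basename(p) }.sort

# Recursive form: "**/" matches zero or more directory levels.
recursive = Dir[File.join(root, 'exports', '**', '*.zip')]
            .map { |p| File.basename(p) }.sort

FileUtils.remove_entry(root)

p flat
p recursive
```

If exported zips only ever sit directly under `tmp/exports`, the pattern in the task is sufficient as written; the recursive form is only relevant if zips can end up in subfolders.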

metadata CHANGED

@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: bulkrax
 version: !ruby/object:Gem::Version
-  version: 3.4.0
+  version: 4.0.0
 platform: ruby
 authors:
 - Rob Kaufman
-autorequire:
+autorequire:
 bindir: bin
 cert_chain: []
-date: 2022-06-22 00:00:00.000000000 Z
+date: 2022-07-15 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: rails
@@ -331,6 +331,7 @@ files:
 - app/views/bulkrax/entries/_parsed_metadata.html.erb
 - app/views/bulkrax/entries/_raw_metadata.html.erb
 - app/views/bulkrax/entries/show.html.erb
+- app/views/bulkrax/exporters/_downloads.html.erb
 - app/views/bulkrax/exporters/_form.html.erb
 - app/views/bulkrax/exporters/edit.html.erb
 - app/views/bulkrax/exporters/index.html.erb
@@ -404,7 +405,7 @@ homepage: https://github.com/samvera-labs/bulkrax
 licenses:
 - Apache-2.0
 metadata: {}
-post_install_message:
+post_install_message:
 rdoc_options: []
 require_paths:
 - lib
@@ -419,8 +420,8 @@ required_rubygems_version: !ruby/object:Gem::Requirement
     - !ruby/object:Gem::Version
       version: '0'
 requirements: []
-rubygems_version: 3.1.4
-signing_key:
+rubygems_version: 3.0.3
+signing_key:
 specification_version: 4
 summary: Import and export tool for Hyrax and Hyku
 test_files: []