bulkrax 3.5.1 → 4.0.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: 6d9849522210958f0a64f21b8c5737f9cf0e0a7f7b9dc7e35d91cb429e5b8fb5
-   data.tar.gz: 4be9893422301b2cbf10af36ccf90898d2bfbc816ed832dd7ad686b2d0b30fcc
+   metadata.gz: eb56d86ee90ae9e1cf0628504694e1301ab8f2d6b24ffa8fd323f8953a8ee956
+   data.tar.gz: 71056b077e300f27eee3bcccd9d7e2bee2fc7bdf2fc6ba9248b69a29f3994f9c
  SHA512:
-   metadata.gz: ae33b239d56a10d82616c7966c47a599d69a6148641b507dd7bbe5e94509e9b3afd633011fe0028e2b1949c1677ad8cb1e456478a2074fd536d9b262913b9460
-   data.tar.gz: af4ba8cc3d9d3f68e931bea1bc4023c57789c21716108416c760f303b931d2bca9b38dbd6fa704e235e286b03bd38ce214346f3b35039036326cad978879c934
+   metadata.gz: 05ea49e6f2c5e73cbddacf35dcaf9de499760d7093e3ae8f3ce4ea5ab28e25d065b7877607436fcbe02d21e17c2df940b0224f1ea7d638a602486ce807d99981
+   data.tar.gz: ecdda29924e09793e62684f16ebcd79cd90ab9e8204d011b89a577ca667a644709ff2df5b525ac6c32355426038c05d9fc3d74c6efb20ee7cfab653d9b89b67a
data/README.md CHANGED
@@ -70,7 +70,7 @@ Bulkrax.setup do |config|
  end
  ```

- The [configuration guide](https://github.com/samvera-labs/bulkrax/wiki/Configuration) provides detailed instructions on the various available configurations.
+ The [configuration guide](https://github.com/samvera-labs/bulkrax/wiki/Configuring-Bulkrax) provides detailed instructions on the various available configurations.

  Example:

@@ -120,7 +120,7 @@ It's unlikely that the incoming import data has fields that exactly match those

  By default, a mapping for the OAI parser has been added to map standard oai_dc fields to Hyrax basic_metadata. The other parsers have no default mapping, and will map any incoming fields to Hyrax properties with the same name. Configurations can be added in `config/intializers/bulkrax.rb`

- Configuring field mappings is documented in the [Bulkrax Configuration Guide](https://github.com/samvera-labs/bulkrax/wiki/Configuration).
+ Configuring field mappings is documented in the [Bulkrax Configuration Guide](https://github.com/samvera-labs/bulkrax/wiki/Configuring-Bulkrax).

  ## Importing Files

@@ -151,7 +151,7 @@ end

  ## Customizing Bulkrax

- For further information on how to extend and customize Bulkrax, please see the [Bulkrax Customization Guide](https://github.com/samvera-labs/bulkrax/wiki/Customizing).
+ For further information on how to extend and customize Bulkrax, please see the [Bulkrax Customization Guide](https://github.com/samvera-labs/bulkrax/wiki/Customizing-Bulkrax).

  ## How it Works
  Once you have Bulkrax installed, you will have access to an easy to use interface with which you are able to create, edit, delete, run, and re-run imports and exports.
@@ -191,8 +191,6 @@ We encourage everyone to help improve this project. Bug reports and pull reques

  This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the [Contributor Covenant](https://contributor-covenant.org) code of conduct.

- All Contributors should have signed the Samvera Contributor License Agreement (CLA)
-
  ## Questions
  Questions can be sent to support@notch8.com. Please make sure to include "Bulkrax" in the subject line of your email.

@@ -127,7 +127,7 @@ module Bulkrax
      # Download methods

      def file_path
-       @exporter.exporter_export_zip_path
+       "#{@exporter.exporter_export_zip_path}/#{params['exporter']['exporter_export_zip_files']}"
      end
    end
  end
@@ -4,8 +4,6 @@ module Bulkrax
    # Custom error class for collections_created?
    class CollectionsCreatedError < RuntimeError; end
    class OAIError < RuntimeError; end
-   # TODO: remove when ApplicationParser#bagit_zip_file_size_check is removed
-   class BagitZipError < RuntimeError; end
    class Entry < ApplicationRecord
      include Bulkrax::HasMatchers
      include Bulkrax::ImportBehavior
@@ -124,9 +124,13 @@ module Bulkrax
      end

      def exporter_export_zip_path
-       @exporter_export_zip_path ||= File.join(parser.base_path('export'), "export_#{self.id}_#{self.exporter_runs.last.id}.zip")
+       @exporter_export_zip_path ||= File.join(parser.base_path('export'), "export_#{self.id}_#{self.exporter_runs.last.id}")
      rescue
-       @exporter_export_zip_path ||= File.join(parser.base_path('export'), "export_#{self.id}_0.zip")
+       @exporter_export_zip_path ||= File.join(parser.base_path('export'), "export_#{self.id}_0")
+     end
+
+     def exporter_export_zip_files
+       @exporter_export_zip_files ||= Dir["#{exporter_export_zip_path}/**"].map { |zip| Array(zip.split('/').last) }
      end

      def export_properties
@@ -137,5 +141,14 @@ module Bulkrax
      def metadata_only?
        export_type == 'metadata'
      end
+
+     def sort_zip_files(zip_files)
+       zip_files.sort_by do |item|
+         number = item.split('_').last.match(/\d+/)&.[](0) || 0.to_s
+         sort_number = number.rjust(4, "0")
+
+         sort_number
+       end
+     end
    end
  end
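The `sort_zip_files` helper added above orders the download options by the trailing number in each file name, zero-padded to four digits so that the string comparison behaves numerically. A minimal standalone sketch of that behaviour follows. The sample file names are hypothetical, chosen to match the `export_<exporter id>_<run id>_<folder>.zip` pattern that the new `zip` method in this diff builds.

```ruby
# Standalone copy of the sorting logic from the diff, outside the Exporter model.
def sort_zip_files(zip_files)
  zip_files.sort_by do |item|
    # Take the digits from the last underscore-separated segment, e.g. "10" from "10.zip".
    number = item.split('_').last.match(/\d+/)&.[](0) || 0.to_s
    # Left-pad so that "2" sorts before "10" when compared as strings.
    number.rjust(4, "0")
  end
end

# Hypothetical file names for exporter 3, run 7, folders 1, 2 and 10.
files = ["export_3_7_10.zip", "export_3_7_2.zip", "export_3_7_1.zip"]
sorted = sort_zip_files(files)
puts sorted
```

Because the padding width is fixed at four, folder numbers above 9999 would fall back to lexical ordering.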
@@ -7,9 +7,6 @@ module Bulkrax

      def build_for_exporter
        build_export_metadata
-       # TODO(alishaevn): determine if the line below is still necessary
-       # the csv and bagit parsers also have write_files methods
-       write_files if export_type == 'full' && !importerexporter.parser_klass.include?('Bagit')
      rescue RSolr::Error::Http, CollectionsCreatedError => e
        raise e
      rescue StandardError => e
@@ -26,25 +23,6 @@ module Bulkrax
        @hyrax_record ||= ActiveFedora::Base.find(self.identifier)
      end

-     def write_files
-       return if hyrax_record.is_a?(Collection)
-
-       file_sets = hyrax_record.file_set? ? Array.wrap(hyrax_record) : hyrax_record.file_sets
-       file_sets << hyrax_record.thumbnail if hyrax_record.thumbnail.present? && hyrax_record.work? && exporter.include_thumbnails
-       file_sets.each do |fs|
-         path = File.join(exporter_export_path, 'files')
-         FileUtils.mkdir_p(path)
-         file = filename(fs)
-         require 'open-uri'
-         io = open(fs.original_file.uri)
-         next if file.blank?
-         File.open(File.join(path, file), 'wb') do |f|
-           f.write(io.read)
-           f.close
-         end
-       end
-     end
-
      # Prepend the file_set id to ensure a unique filename and also one that is not longer than 255 characters
      def filename(file_set)
        return if file_set.original_file.blank?
@@ -247,8 +247,6 @@ module Bulkrax
      def write
        write_files
        zip
-       # uncomment next line to debug for faulty zipping during bagit export
-       bagit_zip_file_size_check if importerexporter.parser_klass.include?('Bagit')
      end

      def unzip(file_to_unzip)
@@ -262,30 +260,13 @@ module Bulkrax
      end

      def zip
-       FileUtils.rm_rf(exporter_export_zip_path)
-       Zip::File.open(exporter_export_zip_path, create: true) do |zip_file|
-         Dir["#{exporter_export_path}/**/**"].each do |file|
-           zip_file.add(file.sub("#{exporter_export_path}/", ''), file)
-         end
-       end
-     end
+       FileUtils.mkdir_p(exporter_export_zip_path)

-     # TODO: remove Entry::BagitZipError as well as this method when we're sure it's not needed
-     def bagit_zip_file_size_check
-       Zip::File.open(exporter_export_zip_path) do |zip_file|
-         zip_file.select { |entry| entry.name.include?('data/') && entry.file? }.each do |zipped_file|
-           Dir["#{exporter_export_path}/**/data/*"].select { |file| file.include?(zipped_file.name) }.each do |file|
-             begin
-               raise BagitZipError, "Invalid Bag, file size mismatch for #{file.sub("#{exporter_export_path}/", '')}" if File.size(file) != zipped_file.size
-             rescue BagitZipError => e
-               matched_entry_ids = importerexporter.entry_ids.select do |id|
-                 Bulkrax::Entry.find(id).identifier.include?(zipped_file.name.split('/').first)
-               end
-               matched_entry_ids.each do |entry_id|
-                 Bulkrax::Entry.find(entry_id).status_info(e)
-                 status_info('Complete (with failures)')
-               end
-             end
+       Dir["#{exporter_export_path}/**"].each do |folder|
+         zip_path = "#{exporter_export_zip_path.split('/').last}_#{folder.split('/').last}.zip"
+         Zip::File.open(File.join("#{exporter_export_zip_path}/#{zip_path}"), create: true) do |zip_file|
+           Dir["#{folder}/**/**"].each do |file|
+             zip_file.add(file.sub("#{folder}/", ''), file)
            end
          end
        end
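With this change `exporter_export_zip_path` names a directory rather than a single archive, and `zip` writes one archive per top-level folder of the export. The sketch below reproduces only the naming step, without creating any archives, so it runs with the standard library alone. The directory names and the `zip_names_for` helper are hypothetical and are not part of Bulkrax.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical helper: list the archive names the new `zip` method would
# create, one per top-level folder under export_path.
def zip_names_for(export_path, zip_dir)
  # A "**" segment that is not followed by a slash matches like "*",
  # so this glob returns only the direct children of export_path.
  Dir["#{export_path}/**"].sort.map do |folder|
    "#{zip_dir.split('/').last}_#{folder.split('/').last}.zip"
  end
end

names = Dir.mktmpdir do |tmp|
  # Assumed layout: numbered folders "1" and "2" inside the export directory.
  export_path = File.join(tmp, 'export_3_7_files')
  %w[1 2].each { |n| FileUtils.mkdir_p(File.join(export_path, n, 'files')) }
  zip_names_for(export_path, File.join(tmp, 'export_3_7'))
end
puts names
```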
@@ -113,6 +113,9 @@ module Bulkrax
        when 'importer'
          set_ids_for_exporting_from_importer
        end
+
+       find_child_file_sets(@work_ids) if importerexporter.export_from == 'collection' || importerexporter.export_from == 'worktype'
+
        @work_ids + @collection_ids + @file_set_ids
      end

@@ -122,18 +125,27 @@ module Bulkrax
      def write_files
        require 'open-uri'
        require 'socket'
+
+       folder_count = 1
+       records_in_folder = 0
+
        importerexporter.entries.where(identifier: current_record_ids)[0..limit || total].each do |entry|
          record = ActiveFedora::Base.find(entry.identifier)
          next unless Hyrax.config.curation_concerns.include?(record.class)
-         bag = BagIt::Bag.new setup_bagit_folder(entry.identifier)
+
          bag_entries = [entry]
+         file_set_entries = Bulkrax::CsvFileSetEntry.where(importerexporter_id: importerexporter.id).where("parsed_metadata LIKE '%#{record.id}%'")
+         file_set_entries.each { |fse| bag_entries << fse }

-         record.file_sets.each do |fs|
-           if @file_set_ids.present?
-             file_set_entry = Bulkrax::CsvFileSetEntry.where("parsed_metadata LIKE '%#{fs.id}%'").first
-             bag_entries << file_set_entry unless file_set_entry.nil?
-           end
+         records_in_folder += bag_entries.count
+         if records_in_folder > records_split_count
+           folder_count += 1
+           records_in_folder = bag_entries.count
+         end
+
+         bag ||= BagIt::Bag.new setup_bagit_folder(folder_count, entry.identifier)

+         record.file_sets.each do |fs|
            file_name = filename(fs)
            next if file_name.blank?
            io = open(fs.original_file.uri)
@@ -148,17 +160,21 @@ module Bulkrax
            end
          end

-         CSV.open(setup_csv_metadata_export_file(entry.identifier), "w", headers: export_headers, write_headers: true) do |csv|
+         CSV.open(setup_csv_metadata_export_file(folder_count, entry.identifier), "w", headers: export_headers, write_headers: true) do |csv|
            bag_entries.each { |csv_entry| csv << csv_entry.parsed_metadata }
          end
-         write_triples(entry)
+
+         write_triples(folder_count, entry)
          bag.manifest!(algo: 'sha256')
        end
      end
      # rubocop:enable Metrics/MethodLength, Metrics/AbcSize

-     def setup_csv_metadata_export_file(id)
-       File.join(importerexporter.exporter_export_path, id, 'metadata.csv')
+     def setup_csv_metadata_export_file(folder_count, id)
+       path = File.join(importerexporter.exporter_export_path, folder_count.to_s)
+       FileUtils.mkdir_p(path) unless File.exist?(path)
+
+       File.join(path, id, 'metadata.csv')
      end

      def key_allowed(key)
@@ -167,21 +183,27 @@ module Bulkrax
          key != source_identifier.to_s
      end

-     def setup_triple_metadata_export_file(id)
-       File.join(importerexporter.exporter_export_path, id, 'metadata.nt')
+     def setup_triple_metadata_export_file(folder_count, id)
+       path = File.join(importerexporter.exporter_export_path, folder_count.to_s)
+       FileUtils.mkdir_p(path) unless File.exist?(path)
+
+       File.join(path, id, 'metadata.nt')
      end

-     def setup_bagit_folder(id)
-       File.join(importerexporter.exporter_export_path, id)
+     def setup_bagit_folder(folder_count, id)
+       path = File.join(importerexporter.exporter_export_path, folder_count.to_s)
+       FileUtils.mkdir_p(path) unless File.exist?(path)
+
+       File.join(path, id)
      end

-     def write_triples(e)
+     def write_triples(folder_count, e)
        sd = SolrDocument.find(e.identifier)
        return if sd.nil?

        req = ActionDispatch::Request.new({ 'HTTP_HOST' => Socket.gethostname })
        rdf = Hyrax::GraphExporter.new(sd, req).fetch.dump(:ntriples)
-       File.open(setup_triple_metadata_export_file(e.identifier), "w") do |triples|
+       File.open(setup_triple_metadata_export_file(folder_count, e.identifier), "w") do |triples|
          triples.write(rdf)
        end
      end
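The BagIt `write_files` change above keeps a running count of entries and moves to the next numbered folder once adding a bag would push that count past `records_split_count`. The sketch below isolates that counter logic under stated assumptions: `assign_folders` is a hypothetical helper, and each number in `bag_sizes` stands in for `bag_entries.count` (a work entry plus its file set entries).

```ruby
# Hypothetical helper: return the folder number each bag would be written to,
# following the same counter updates as the diff.
def assign_folders(bag_sizes, split_count)
  folder_count = 1
  records_in_folder = 0

  bag_sizes.map do |size|
    records_in_folder += size
    if records_in_folder > split_count
      # This bag does not fit: start a new folder that holds only this bag so far.
      folder_count += 1
      records_in_folder = size
    end
    folder_count
  end
end

even_split = assign_folders([3, 3, 3, 3], 6)
tight_split = assign_folders([3, 3, 3, 3], 5)
puts even_split.inspect
puts tight_split.inspect
```

A bag is never divided between folders, so a folder may hold fewer entries than the limit. Read literally, the same logic also advances the counter when the very first bag alone exceeds the limit, in which case numbering would start at 2.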
@@ -4,6 +4,7 @@ require 'csv'
  module Bulkrax
    class CsvParser < ApplicationParser # rubocop:disable Metrics/ClassLength
      include ErroredEntries
+     include ExportBehavior
      attr_writer :collections, :file_sets, :works

      def self.export_supported?
@@ -207,6 +208,13 @@ module Bulkrax
        @work_ids + @collection_ids + @file_set_ids
      end

+     # find the related file set ids so entries can be made for export
+     def find_child_file_sets(work_ids)
+       work_ids.each do |id|
+         ActiveFedora::Base.find(id).file_set_ids.each { |fs_id| @file_set_ids << fs_id }
+       end
+     end
+
      # Set the following instance variables: @work_ids, @collection_ids, @file_set_ids
      # @see #current_record_ids
      def set_ids_for_exporting_from_importer
@@ -283,6 +291,10 @@ module Bulkrax
        @total = 0
      end

+     def records_split_count
+       1000
+     end
+
      # @todo - investigate getting directory structure
      # @todo - investigate using perform_later, and having the importer check for
      # DownloadCloudFileJob before it starts
@@ -307,9 +319,37 @@ module Bulkrax
      # export methods

      def write_files
-       CSV.open(setup_export_file, "w", headers: export_headers, write_headers: true) do |csv|
-         importerexporter.entries.where(identifier: current_record_ids)[0..limit || total].each do |e|
-           csv << e.parsed_metadata
+       require 'open-uri'
+       folder_count = 0
+
+       importerexporter.entries.where(identifier: current_record_ids)[0..limit || total].in_groups_of(records_split_count, false) do |group|
+         folder_count += 1
+
+         CSV.open(setup_export_file(folder_count), "w", headers: export_headers, write_headers: true) do |csv|
+           group.each do |entry|
+             csv << entry.parsed_metadata
+             next if importerexporter.metadata_only? || entry.type == 'Bulkrax::CsvCollectionEntry'
+
+             store_files(entry.identifier, folder_count.to_s)
+           end
+         end
+       end
+     end
+
+     def store_files(identifier, folder_count)
+       record = ActiveFedora::Base.find(identifier)
+       file_sets = record.file_set? ? Array.wrap(record) : record.file_sets
+       file_sets << record.thumbnail if exporter.include_thumbnails && record.thumbnail.present? && record.work?
+       file_sets.each do |fs|
+         path = File.join(exporter_export_path, folder_count, 'files')
+         FileUtils.mkdir_p(path) unless File.exist? path
+         file = filename(fs)
+         io = open(fs.original_file.uri)
+         next if file.blank?
+
+         File.open(File.join(path, file), 'wb') do |f|
+           f.write(io.read)
+           f.close
          end
        end
      end
@@ -356,8 +396,11 @@ module Bulkrax
      end

      # in the parser as it is specific to the format
-     def setup_export_file
-       File.join(importerexporter.exporter_export_path, "export_#{importerexporter.export_source}_from_#{importerexporter.export_from}.csv")
+     def setup_export_file(folder_count)
+       path = File.join(importerexporter.exporter_export_path, folder_count.to_s)
+       FileUtils.mkdir_p(path) unless File.exist?(path)
+
+       File.join(path, "export_#{importerexporter.export_source}_from_#{importerexporter.export_from}_#{folder_count}.csv")
      end

      # Retrieve file paths for [:file] mapping in records
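The CSV parser now writes one CSV per group of `records_split_count` entries, each in its own numbered folder. The sketch below shows how groups map to file paths. It swaps ActiveSupport's `in_groups_of(n, false)` for plain Ruby `each_slice(n)`, which yields the same unpadded groups, so that it runs without Rails. The `plan_csv_exports` helper, the identifiers and the path values are all hypothetical.

```ruby
# Hypothetical helper mirroring the grouping and file naming in the CSV parser.
# Returns [csv_path, identifiers_in_that_csv] pairs without writing anything.
def plan_csv_exports(identifiers, split_count, export_path, source, from)
  identifiers.each_slice(split_count).each_with_index.map do |group, index|
    folder_count = index + 1 # folders are numbered from 1, as in the diff
    csv_path = File.join(export_path, folder_count.to_s,
                         "export_#{source}_from_#{from}_#{folder_count}.csv")
    [csv_path, group]
  end
end

# Five made-up identifiers with a split count of 2 give three folders.
plan = plan_csv_exports(%w[a b c d e], 2, 'tmp/exports/export_3_7_files', 'abc123', 'collection')
plan.each { |path, ids| puts "#{path}: #{ids.join(', ')}" }
```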
@@ -0,0 +1,8 @@
+ <%= form.select :exporter_export_zip_files,
+       exporter.sort_zip_files(form.object.exporter_export_zip_files.flatten),
+       {},
+       {
+         class: 'btn btn-default form-control',
+         style: 'width: 200px'
+       }
+ %>
@@ -21,7 +21,7 @@
  <th scope="col">Name</th>
  <th scope="col">Status</th>
  <th scope="col">Date Exported</th>
- <th scope="col"></th>
+ <th scope="col">Downloadable Files</th>
  <th scope="col"></th>
  <th scope="col"></th>
  <th scope="col"></th>
@@ -35,7 +35,10 @@
  <td><%= exporter.created_at %></td>
  <td>
    <% if File.exist?(exporter.exporter_export_zip_path) %>
-     <%= link_to raw('<span class="glyphicon glyphicon-download"></span>'), exporter_download_path(exporter) %>
+     <%= simple_form_for(exporter, method: :get, url: exporter_download_path(exporter)) do |form| %>
+       <%= render 'downloads', exporter: exporter, form: form %>
+       <%= form.button :submit, value: 'Download', data: { disable_with: false } %>
+     <% end %>
    <% end%>
  </td>
  <td><%= link_to raw('<span class="glyphicon glyphicon-info-sign"></span>'), exporter_path(exporter) %></td>
@@ -8,10 +8,11 @@
  <div class='panel-body'>

    <% if File.exist?(@exporter.exporter_export_zip_path) %>
-     <p class='bulkrax-p-align'>
+     <%= simple_form_for @exporter, method: :get, url: exporter_download_path(@exporter), html: { class: 'form-inline bulkrax-p-align' } do |form| %>
        <strong>Download:</strong>
-       <%= link_to raw('<span class="glyphicon glyphicon-download"></span>'), exporter_download_path(@exporter) %>
-     </p>
+       <%= render 'downloads', exporter: @exporter, form: form %>
+       <%= form.button :submit, value: 'Download', data: { disable_with: false } %>
+     <% end %>
    <% end %>

    <p class='bulkrax-p-align'>
@@ -135,10 +136,6 @@
  <%= page_entries_info(@work_entries) %><br>
  <%= paginate(@work_entries, param_name: :work_entries_page) %>
  <br>
- <% if File.exist?(@exporter.exporter_export_zip_path) %>
-   <%= link_to 'Download', exporter_download_path(@exporter) %>
-   |
- <% end %>
  <%= link_to 'Edit', edit_exporter_path(@exporter) %>
  |
  <%= link_to 'Back', exporters_path %>
@@ -1,5 +1,5 @@
  # frozen_string_literal: true

  module Bulkrax
-   VERSION = '3.5.1'
+   VERSION = '4.0.0'
  end
@@ -1,6 +1,30 @@
  # frozen_string_literal: true

- # desc "Explaining what the task does"
- # task :bulkrax do
- #   # Task goes here
- # end
+ namespace :bulkrax do
+   desc "Remove old exported zips and create new ones with the new file structure"
+   task rerun_all_exporters: :environment do
+     if defined?(::Hyku)
+       Account.find_each do |account|
+         puts "=============== updating #{account.name} ============"
+         next if account.name == "search"
+         switch!(account)
+
+         rerun_exporters_and_delete_zips
+
+         puts "=============== finished updating #{account.name} ============"
+       end
+     else
+       rerun_exporters_and_delete_zips
+     end
+   end
+
+   def rerun_exporters_and_delete_zips
+     begin
+       Bulkrax::Exporter.all.each { |e| Bulkrax::ExporterJob.perform_later(e.id) }
+     rescue => e
+       puts "(#{e.message})"
+     end
+
+     Dir["tmp/exports/**.zip"].each { |zip_path| FileUtils.rm_rf(zip_path) }
+   end
+ end
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: bulkrax
  version: !ruby/object:Gem::Version
-   version: 3.5.1
+   version: 4.0.0
  platform: ruby
  authors:
  - Rob Kaufman
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2022-06-27 00:00:00.000000000 Z
+ date: 2022-07-15 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
    name: rails
@@ -331,6 +331,7 @@ files:
  - app/views/bulkrax/entries/_parsed_metadata.html.erb
  - app/views/bulkrax/entries/_raw_metadata.html.erb
  - app/views/bulkrax/entries/show.html.erb
+ - app/views/bulkrax/exporters/_downloads.html.erb
  - app/views/bulkrax/exporters/_form.html.erb
  - app/views/bulkrax/exporters/edit.html.erb
  - app/views/bulkrax/exporters/index.html.erb