derivative-rodeo 0.2.0

Files changed (42)
  1. checksums.yaml +7 -0
  2. data/Gemfile +6 -0
  3. data/LICENSE +15 -0
  4. data/README.md +251 -0
  5. data/Rakefile +42 -0
  6. data/derivative_rodeo.gemspec +54 -0
  7. data/lib/derivative/rodeo.rb +3 -0
  8. data/lib/derivative-rodeo.rb +3 -0
  9. data/lib/derivative_rodeo/configuration.rb +95 -0
  10. data/lib/derivative_rodeo/errors.rb +56 -0
  11. data/lib/derivative_rodeo/generators/base_generator.rb +200 -0
  12. data/lib/derivative_rodeo/generators/concerns/copy_file_concern.rb +28 -0
  13. data/lib/derivative_rodeo/generators/copy_generator.rb +14 -0
  14. data/lib/derivative_rodeo/generators/hocr_generator.rb +112 -0
  15. data/lib/derivative_rodeo/generators/monochrome_generator.rb +39 -0
  16. data/lib/derivative_rodeo/generators/pdf_split_generator.rb +61 -0
  17. data/lib/derivative_rodeo/generators/thumbnail_generator.rb +38 -0
  18. data/lib/derivative_rodeo/generators/word_coordinates_generator.rb +39 -0
  19. data/lib/derivative_rodeo/services/base_service.rb +15 -0
  20. data/lib/derivative_rodeo/services/convert_uri_via_template_service.rb +87 -0
  21. data/lib/derivative_rodeo/services/extract_word_coordinates_from_hocr_sgml_service.rb +218 -0
  22. data/lib/derivative_rodeo/services/image_identify_service.rb +89 -0
  23. data/lib/derivative_rodeo/services/image_jp2_service.rb +112 -0
  24. data/lib/derivative_rodeo/services/image_service.rb +73 -0
  25. data/lib/derivative_rodeo/services/pdf_splitter/base.rb +177 -0
  26. data/lib/derivative_rodeo/services/pdf_splitter/jpg_page.rb +14 -0
  27. data/lib/derivative_rodeo/services/pdf_splitter/pages_summary.rb +130 -0
  28. data/lib/derivative_rodeo/services/pdf_splitter/png_page.rb +26 -0
  29. data/lib/derivative_rodeo/services/pdf_splitter/tiff_page.rb +52 -0
  30. data/lib/derivative_rodeo/services/pdf_splitter_service.rb +19 -0
  31. data/lib/derivative_rodeo/services/url_service.rb +42 -0
  32. data/lib/derivative_rodeo/storage_locations/base_location.rb +251 -0
  33. data/lib/derivative_rodeo/storage_locations/concerns/download_concern.rb +67 -0
  34. data/lib/derivative_rodeo/storage_locations/file_location.rb +39 -0
  35. data/lib/derivative_rodeo/storage_locations/http_location.rb +13 -0
  36. data/lib/derivative_rodeo/storage_locations/https_location.rb +13 -0
  37. data/lib/derivative_rodeo/storage_locations/s3_location.rb +103 -0
  38. data/lib/derivative_rodeo/storage_locations/sqs_location.rb +187 -0
  39. data/lib/derivative_rodeo/technical_metadata.rb +23 -0
  40. data/lib/derivative_rodeo/version.rb +5 -0
  41. data/lib/derivative_rodeo.rb +36 -0
  42. metadata +339 -0
checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA256:
3
+ metadata.gz: 20e512d7162170875d60f90ff48bba694237ac7d31f38d806d2b87f570536c1c
4
+ data.tar.gz: a1311ea39a3994b4d24ffdbdbade62b7fbb15d2c326a3b510607f315fb4dd865
5
+ SHA512:
6
+ metadata.gz: 157a9e276c6cefe739137fbe17e783557d0317dcee531cd353ad86987bda33ad55c2ada8179e254b306a467d01f8f759a6e89fc91b8cbcf6e968cf5a28a9037b
7
+ data.tar.gz: 5bd45db467194cf1e8af7f7e1ed625c2b3898d011f20a581a9a55a2ccbb7be56ca8c276752b7bef41e2c1d5efa4737d1291f83211dd69cd18e4f7caeed25fef2
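The digests in `checksums.yaml` can be compared against a downloaded copy of the package. The following is a minimal sketch, assuming the `.gem` archive has already been unpacked so that `data.tar.gz` is on disk. The helper name `digest_matches?` is hypothetical and is not part of RubyGems or of this gem.

```ruby
require 'digest'

# Hypothetical helper: compare a file's SHA256 hex digest with an expected value,
# such as the `data.tar.gz` entry under `SHA256:` in checksums.yaml.
def digest_matches?(path, expected_hex)
  Digest::SHA256.file(path).hexdigest == expected_hex.strip.downcase
end
```

A call such as `digest_matches?('data.tar.gz', expected)` would return `true` only when the file on disk hashes to the listed value.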
data/Gemfile ADDED
@@ -0,0 +1,6 @@
1
+ # frozen_string_literal: true
2
+
3
+ source 'https://rubygems.org'
4
+
5
+ # Specify your gem's dependencies in derivative_rodeo.gemspec
6
+ gemspec
data/LICENSE ADDED
@@ -0,0 +1,15 @@
1
+ Copyright 2023 Software Services by Scientist.com
2
+
3
+ Additional copyright may be held by others, as reflected in the commit history.
4
+
5
+ Licensed under the Apache License, Version 2.0 (the "License");
6
+ you may not use this file except in compliance with the License.
7
+ You may obtain a copy of the License at
8
+
9
+ http://www.apache.org/licenses/LICENSE-2.0
10
+
11
+ Unless required by applicable law or agreed to in writing, software
12
+ distributed under the License is distributed on an "AS IS" BASIS,
13
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
14
+ See the License for the specific language governing permissions and
15
+ limitations under the License.
data/README.md ADDED
@@ -0,0 +1,251 @@
1
+ <!-- START doctoc generated TOC please keep comment here to allow auto update -->
2
+ <!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
3
+ **Table of Contents** *generated with [DocToc](https://github.com/thlorenz/doctoc)*
4
+
5
+ - [DerivativeRodeo](#derivativerodeo)
6
+ - [Process Life Cycle](#process-life-cycle)
7
+ - [Concepts](#concepts)
8
+ - [Common Storage](#common-storage)
9
+ - [Related Files](#related-files)
10
+ - [Sequence Diagram](#sequence-diagram)
11
+ - [Installation](#installation)
12
+ - [Usage](#usage)
13
+ - [Technical Overview of the DerivativeRodeo](#technical-overview-of-the-derivativerodeo)
14
+ - [Generators](#generators)
15
+ - [Interface(s)](#interfaces)
16
+ - [Supported Generators](#supported-generators)
17
+ - [Registered Generators](#registered-generators)
18
+ - [Storage Locations](#storage-locations)
19
+ - [Supported Storage Locations](#supported-storage-locations)
20
+ - [Development](#development)
21
+ - [Logging in Test Environment](#logging-in-test-environment)
22
+ - [Contributing](#contributing)
23
+
24
+ <!-- END doctoc generated TOC please keep comment here to allow auto update -->
25
+
26
+ # DerivativeRodeo
27
+
28
+ “This ain’t my first rodeo.” (American slang for “I’m prepared for what comes next.”)
29
+
30
+ The `DerivativeRodeo` "moves" files from one storage location (e.g. *input*) to one or more storage locations (e.g. *output*) via a generator.
31
+
32
+ - [Storage Location](./lib/derivative_rodeo/storage_locations/base_location.rb) :: where we can expect to find a file.
33
+ - [Generator](./lib/derivative_rodeo/generators/base_generator.rb) :: a process to transform a file into another file.
34
+
35
+ ## Process Life Cycle
36
+
37
+ In the case of an *input* storage location, we expect that the underlying file pointed at by the *input* storage location exists. After all, we can't move what we don't have.
38
+
39
+ In the case of an *output* storage location, we expect that the underlying file will exist after the generator has completed. The file at the *output* storage location *could* already exist, or we might need to generate it.
40
+
41
+ During the generator's process, we need to have a working copy of both the *input* and *output* file. This is done by creating a temporary file.
42
+
43
+ In the case of the *input*, the creation of that temporary file involves getting the file from the *input* storage location. In the case of the *output*, we create a temporary file that the *output* storage location then knows how to move to the resulting place.
44
+
45
+ ![Storage Lifecycle](./artifacts/derivative_rodeo-generator_storage_lifecycle.png)
46
+
47
+ The above Storage Lifecycle diagram is as follows: `input location` to `input tmp file` to `generator` to `output tmp file` to `output location`.
48
+
49
+ *Note:* We've designed and implemented the data life cycle to automatically clean up the temporary files as the generator completes. In this way we can use the smallest working space possible, a design decision that helps run `DerivativeRodeo` within distributed clusters (e.g. AWS Serverless).
50
+
51
+ ## Concepts
52
+
53
+ ![Overview](./artifacts/derivative_rodeo-overview.png)
54
+
55
+ <details>
56
+ <summary>The PlantUML Text for the Overview Diagram</summary>
57
+
58
+ ```plantuml
59
+ @startuml
60
+ !theme amiga
61
+
62
+ cloud "Source 1" as S1
63
+ cloud "Source 2" as S2
64
+ cloud "Source 3" as S3
65
+
66
+ storage "IMAGEs" as IMAGEs
67
+ storage "HOCRs" as HOCRs
68
+ storage "TXTs" as TXTs
69
+
70
+ control Preprocess as G1
71
+
72
+ S1 -down-> G1
73
+ S2 -down-> G1
74
+ S3 -down-> G1
75
+
76
+ G1 -down-> IMAGEs
77
+ G1 -down-> HOCRs
78
+ G1 -down-> TXTs
79
+
80
+ control Import as I1
81
+
82
+ IMAGEs -down-> I1
83
+ HOCRs -down-> I1
84
+ TXTs -down-> I1
85
+
86
+ package FileSet as FileSet1 {
87
+ file Image1
88
+ file Hocr1
89
+ file Txt1
90
+ }
91
+ package FileSet as FileSet2 {
92
+ file Image2
93
+ file Hocr2
94
+ file Txt2
95
+ }
96
+
97
+ I1 -down-> FileSet1
98
+ I1 -down-> FileSet2
99
+
100
+ @enduml
101
+
102
+ ```
103
+
104
+ </details>
105
+
106
+ ### Common Storage
107
+
108
+ In this case, <dfn>common storage</dfn> could mean the storage where we're writing all pre-processing of files. Or it could mean the storage where we're writing for application access (e.g. [Fedora Commons](https://fedora.lyrasis.org) for a [Hyrax](https://github.com/samvera/hyrax) application).
109
+
110
+ In other words, the `DerivativeRodeo` is part of moving files from one location to another, and ensuring that at each step we have all of the expected files we want.
111
+
112
+ ### Related Files
113
+
114
+ This is not strictly related to <dfn>Hyrax's FileSet</dfn>, which is a set of files in which one is considered the original and all others are _derivatives_ of the original.
115
+
116
+ However, it is helpful to think in those terms: files that have a significant relation to each other, one derived from the other. For example, an original PDF and its extracted text would be two significantly related files.
117
+
118
+ ### Sequence Diagram
119
+
120
+ ![Sequence Diagram](./artifacts/derivative_rodeo-sequence-diagram.png)
121
+
122
+ <details>
123
+ <summary>The PlantUML Text for the Sequence Diagram</summary>
124
+
125
+ ```plantuml
126
+ @startuml
127
+ !theme amiga
128
+
129
+ actor Instigator
130
+ database S3
131
+ control AWS
132
+ queue SQS
133
+ control SpaceStone
134
+ control DerivativeRodeo
135
+ collections From
136
+ collections To
137
+ Instigator -> S3 : "Upload bucket\nof files associated\n with FileSet"
138
+ S3 -> AWS : "AWS enqueues\nthe bucket"
139
+ AWS -> SQS : "AWS adds to SQS"
140
+ SQS -> SpaceStone : "SQS invokes\nSpaceStone method"
141
+ SpaceStone -> DerivativeRodeo : "SpaceStone calls\n DerivativeRodeo"
142
+ DerivativeRodeo --> S3 : "Request file for\ntemporary processing"
143
+ S3 --> From : "Write requested\n file to\ntemporary storage"
144
+ DerivativeRodeo <-- From
145
+ DerivativeRodeo -> To : "Generate derivative\n writing to local\n processing storage."
146
+ To --> S3 : "Write file\n to S3 Bucket"
147
+ DerivativeRodeo <-- To : "Return to DerivativeRodeo\n with generated URIs"
148
+ SpaceStone <- DerivativeRodeo : "Return generated\n URIs"
149
+ SpaceStone -> SQS : "Optionally enqueue\nfurther work"
150
+ @enduml
151
+ ```
152
+ </details>
153
+
154
+ Given a single original file in a previous home, we are copying that original file (and derivatives) to various locations:
155
+
156
+ - From previous home to S3.
157
+ - From S3 to local temporary storage (for processing).
158
+ - Create a derivative temporary file based on existing file.
159
+ - Copying derivative temporary file to S3.
160
+
161
+ ## Installation
162
+
163
+ Add this line to your application's Gemfile:
164
+
165
+ ```ruby
166
+ gem 'derivative-rodeo'
167
+ ```
168
+
169
+ For historical reasons, the gem name is `derivative-rodeo` even though the repository is `derivative_rodeo`. Each of the following `require` statements will work:
170
+
171
+ - `require 'derivative_rodeo'`
172
+ - `require 'derivative-rodeo'`
173
+ - `require 'derivative/rodeo'`
174
+
175
+ And then execute: `$ bundle install`
176
+
177
+ Be aware that the `pdfinfo` command-line tool must be installed to run this gem's specs or to use its PDF functionality.
178
+
179
+ ## Usage
180
+
181
+ TODO
182
+
183
+ ## Technical Overview of the DerivativeRodeo
184
+
185
+ ### Generators
186
+
187
+ Generators are responsible for ensuring that we have the file associated with the generator. For example, the [HocrGenerator](./lib/derivative_rodeo/generators/hocr_generator.rb) is responsible for ensuring that we have the `.hocr` file in the expected storage location.
188
+
189
+ #### Interface(s)
190
+
191
+ Generators must have an initializer and build command:
192
+
193
+ - `.new(array_of_file_urls, output_location_template, preprocessed_location_template)`
194
+ - `#generated_files` executes the generator's actions and returns an array of files
195
+ - `#generated_uris` executes the generator's actions and returns an array of output URIs
196
+
197
+ #### Supported Generators
198
+
199
+ Below is the current list of generators.
200
+
201
+ - [HocrGenerator](./lib/derivative_rodeo/generators/hocr_generator.rb) :: generates Tesseract hOCR files from images; also creates monochrome files as a pre-step
202
+ - [MonochromeGenerator](./lib/derivative_rodeo/generators/monochrome_generator.rb) :: converts images to monochrome
203
+ - [CopyGenerator](./lib/derivative_rodeo/generators/copy_generator.rb) :: sends a set of uris to another location. For example from <abbr title="Simple Storage Service">S3</abbr> to <abbr title="Simple Queue Service">SQS</abbr> or from filesystem to S3.
204
+ - [PdfSplitGenerator](./lib/derivative_rodeo/generators/pdf_split_generator.rb) :: splits a PDF into one image per page
205
+ - [WordCoordinatesGenerator](./lib/derivative_rodeo/generators/word_coordinates_generator.rb) :: creates a JSON file representing the words and coordinates (derived from the `.hocr` file).
206
+
207
+ #### Registered Generators
208
+
209
+ TODO: We want to expose a list of registered generators
210
+
211
+ ### Storage Locations
212
+
213
+ Storage locations are where we put things. Each location has a specific implementation but is expected to inherit from [DerivativeRodeo::StorageLocations::BaseLocation](./lib/derivative_rodeo/storage_locations/base_location.rb).
214
+
215
+ The `DerivativeRodeo::StorageLocations::BaseLocation.locations` method tracks the registered locations.
216
+
217
+ The location represents where the file *should* be.
218
+
219
+ #### Supported Storage Locations
220
+
221
+ Storage locations follow a [URI pattern](https://en.wikipedia.org/wiki/Uniform_Resource_Identifier#Example_URIs)
222
+
223
+ - `file://` :: “local” file system storage
224
+ - `s3://` :: <abbr title="Amazon Web Services">AWS</abbr>’s <abbr title="Simple Storage Service">S3</abbr> storage system
225
+ - `sqs://` :: <abbr title="Amazon Web Services">AWS</abbr>’s <abbr title="Simple Queue Service">SQS</abbr>
226
+
227
+ ## Development
228
+
229
+ - Clone the repository: `git clone https://github.com/scientist-softserv/derivative_rodeo`
230
+ - Install dependencies: `cd derivative_rodeo; bundle install`
231
+ - Install git hooks: `rake install_hooks`
232
+ - Install binaries:
233
+ - `pdfinfo`: provided by poppler (e.g. `brew install poppler`)
234
+ - GhostScript (e.g. `gs`): run `brew install gs`
235
+
236
+ Then go about writing your code and documentation.
237
+
238
+ The git hooks call `rake default` which will:
239
+
240
+ - Amend the table of contents of this file
241
+ - Run `rubocop`
242
+ - Validate yard documentation (see http://rubydoc.info/gems/yard/file/docs/Tags.md#List_of_Available_Tags for help correcting warnings)
243
+ - Run `rspec` with `simplecov`
244
+
245
+ ### Logging in Test Environment
246
+
247
+ Throughout the `DerivativeRodeo` we log some activity. In a typical test run those logs would be overly chatty, so they are suppressed by default. If you want the chattier logs, run the following: `DEBUG=t rspec`.
248
+
249
+ ## Contributing
250
+
251
+ Bug reports and pull requests are welcome on GitHub at https://github.com/scientist-softserv/derivative_rodeo.
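The README's Usage section is still a TODO, so the storage life cycle it describes (`input location` to `input tmp file` to `generator` to `output tmp file` to `output location`) can be illustrated with a stand-alone sketch. This example does not use the gem: the class `UpcaseSketchGenerator`, its keyword arguments, and the `#generated_paths` method are hypothetical, and plain file paths stand in for storage location URIs.

```ruby
require 'fileutils'
require 'tempfile'
require 'tmpdir'

# Hypothetical stand-in for a generator; NOT the gem's BaseGenerator.
# It upcases text to mimic "a process to transform a file into another file".
class UpcaseSketchGenerator
  def initialize(input_paths:, output_dir:)
    @input_paths = input_paths
    @output_dir = output_dir
  end

  # Returns one output path per input, generating only the outputs that are missing.
  def generated_paths
    @input_paths.map do |input_path|
      output_path = File.join(@output_dir, "#{File.basename(input_path, '.*')}.upcased.txt")
      next output_path if File.exist?(output_path) # the output could already exist

      with_tmp_copy(input_path) do |input_tmp_path|
        # Write to an output tmp file, then copy it to the output location.
        Tempfile.create(['output', '.txt']) do |output_tmp|
          output_tmp.write(File.read(input_tmp_path).upcase)
          output_tmp.flush
          FileUtils.mkdir_p(@output_dir)
          FileUtils.cp(output_tmp.path, output_path)
        end
      end
      output_path
    end
  end

  private

  # Copies the input to a tmp file that is removed once the block returns.
  def with_tmp_copy(path)
    Tempfile.create(['input', File.extname(path)]) do |tmp|
      FileUtils.cp(path, tmp.path)
      yield tmp.path
    end
  end
end
```

Both temporary files are removed when their blocks finish, which mirrors the README's note about keeping the working space small.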
data/Rakefile ADDED
@@ -0,0 +1,42 @@
1
+ # frozen_string_literal: true
2
+
3
+ require 'bundler/gem_tasks'
4
+ require 'rspec/core/rake_task'
5
+ require 'rubocop/rake_task'
6
+
7
+ desc 'Run style checker'
8
+ RuboCop::RakeTask.new(:rubocop) do |task|
9
+ task.fail_on_error = true
10
+ end
11
+
12
+ desc 'Install commit hooks to ensure better practices'
13
+ task :install_hooks do
14
+ require 'fileutils'
15
+ Dir.glob('./git-hooks/*').each do |hook|
16
+ next if File.file?("./.git/hooks/#{File.basename(hook)}")
17
+
18
+ puts "Installing #{File.basename(hook)} git hook"
19
+ FileUtils.cp(hook, './.git/hooks/')
20
+ end
21
+ end
22
+
23
+ require 'yard'
24
+ YARD::Rake::YardocTask.new do |t|
25
+ t.options = ['--fail-on-warning']
26
+ end
27
+
28
+ RSpec::Core::RakeTask.new(:spec)
29
+
30
+ desc 'Generate table of contents for README.md'
31
+ task :doctoc do
32
+ if `which doctoc`.strip.empty?
33
+ $stdout.puts 'Skipping doctoc generation; install via "npm install -g doctoc"'
34
+ else
35
+ $stdout.puts 'Generating table of contents for README.md'
36
+ `doctoc README.md`
37
+ end
38
+ end
39
+
40
+ task ci: %i[doctoc rubocop yard spec]
41
+
42
+ task default: %i[ci]
data/derivative_rodeo.gemspec ADDED
@@ -0,0 +1,54 @@
1
+ # frozen_string_literal: true
2
+
3
+ require_relative 'lib/derivative_rodeo/version'
4
+
5
+ Gem::Specification.new do |spec|
6
+ # Renaming to reflect that we previously registered 'derivative-rodeo' and Rubygems guards against
7
+ # names that are close in resemblance.
8
+ spec.name = 'derivative-rodeo'
9
+ spec.version = DerivativeRodeo::VERSION
10
+ spec.authors = ['Rob Kaufman', 'Jeremy Friesen']
11
+ spec.email = ['rob@notch8.com', 'jeremy.n.friesen@gmail.com']
12
+
13
+ spec.summary = 'An ETL Ecosystem for Derivative Processing.'
14
+ spec.description = spec.summary
15
+ spec.homepage = 'https://github.com/scientist-softserv/derivative_rodeo'
16
+ spec.required_ruby_version = '>= 2.7.0'
17
+ spec.licenses = ['APACHE-2.0']
18
+
19
+ spec.metadata['homepage_uri'] = spec.homepage
20
+ spec.metadata['source_code_uri'] = spec.homepage
21
+
22
+ # Specify which files should be added to the gem when it is released.
23
+ # The `git ls-files -z` loads the files in the RubyGem that have been added into git.
24
+ # spec.files = Dir.chdir(__dir__) do
25
+ # `git ls-files -z`.split("\x0").reject do |f|
26
+ # (f == __FILE__) || f.match(%r{\A(?:(?:bin|test|spec|features)/|\.(?:git|travis|circleci)|appveyor)})
27
+ # end
28
+ # end
29
+ spec.files = Dir['lib/**/*'].keep_if { |file| File.file?(file) } + %w[Gemfile LICENSE README.md Rakefile derivative_rodeo.gemspec]
30
+ spec.bindir = 'exe'
31
+ spec.executables = spec.files.grep(%r{\Aexe/}) { |f| File.basename(f) }
32
+ spec.require_paths = ['lib']
33
+
34
+ spec.add_dependency 'activesupport', '>= 5'
35
+ spec.add_dependency 'aws-sdk-s3'
36
+ spec.add_dependency 'aws-sdk-sqs'
37
+ spec.add_dependency 'httparty'
38
+ spec.add_dependency 'marcel'
39
+ spec.add_dependency 'mime-types'
40
+ spec.add_dependency 'mini_magick'
41
+ spec.add_dependency 'nokogiri'
42
+
43
+ spec.add_development_dependency 'bixby'
44
+ spec.add_development_dependency 'byebug'
45
+ # spec.add_development_dependency 'hydra-file_characterization'
46
+ spec.add_development_dependency 'rspec', '~> 3.0'
47
+ spec.add_development_dependency 'rake', '~> 13.0'
48
+ spec.add_development_dependency 'simplecov'
49
+ spec.add_development_dependency 'yard-activerecord'
50
+ spec.add_development_dependency 'rspec-its'
51
+ spec.add_development_dependency 'shoulda-matchers'
52
+ spec.add_development_dependency 'solargraph'
53
+ spec.add_development_dependency 'yard'
54
+ end
data/lib/derivative/rodeo.rb ADDED
@@ -0,0 +1,3 @@
1
+ # frozen_string_literal: true
2
+
3
+ require_relative "../derivative_rodeo"
data/lib/derivative-rodeo.rb ADDED
@@ -0,0 +1,3 @@
1
+ # frozen_string_literal: true
2
+
3
+ require_relative './derivative_rodeo'
data/lib/derivative_rodeo/configuration.rb ADDED
@@ -0,0 +1,95 @@
1
+ # frozen_string_literal: true
2
+
3
+ require 'mime/types'
4
+ require 'logger'
5
+ module DerivativeRodeo
6
+ ##
7
+ # @api public
8
+ #
9
+ # This class is responsible for the consistent configuration of the "application" that leverages
10
+ # the {DerivativeRodeo}.
11
+ #
12
+ # This configuration helps set defaults for storage locations and generators.
13
+ class Configuration
14
+ ##
15
+ # Allows AWS configuration to be set via environment variables by declaring them in the configuration
16
+ # class as follows:
17
+ #
18
+ # @example
19
+ #
20
+ # aws_config prefix: 's3', name: 'region', default: 'us-east-1'
21
+ #
22
+ # @param prefix [String]
23
+ # @param name [String]
24
+ # @param default [String] (optional)
25
+ def self.aws_config(prefix:, name:, default: nil)
26
+ aws_config_getter(prefix: prefix, name: name, default: default)
27
+ aws_config_setter(prefix: prefix, name: name)
28
+ end
29
+
30
+ def self.aws_config_getter(prefix:, name:, default: nil)
31
+ define_method "aws_#{prefix}_#{name}" do
32
+ val = instance_variable_get("@aws_#{prefix}_#{name}")
33
+ return val if val
34
+
35
+ val = ENV["AWS_#{prefix.upcase}_#{name.upcase}"] ||
36
+ ENV["AWS_#{name.upcase}"] ||
37
+ ENV["AWS_DEFAULT_#{name.upcase}"] ||
38
+ default
39
+ instance_variable_set("@aws_#{prefix}_#{name}", val)
40
+ end
41
+ end
42
+ private_class_method :aws_config_getter
43
+
44
+ def self.aws_config_setter(prefix:, name:)
45
+ define_method "aws_#{prefix}_#{name}=" do |val|
46
+ instance_variable_set("@aws_#{prefix}_#{name}", val)
47
+ end
48
+ end
49
+ private_class_method :aws_config_setter
50
+
51
+ def initialize
52
+ @logger = if defined?(Rails)
53
+ Rails.logger
54
+ else
55
+ # By default, minimize the chatter of the specs. Add ENV['DEBUG'] to expose the
56
+ # chatter.
57
+ Logger.new($stderr, level: Logger::FATAL)
58
+ end
59
+ yield self if block_given?
60
+ end
61
+
62
+ ##
63
+ # @return [Logger]
64
+ attr_accessor :logger
65
+
66
+ ##
67
+ # @!group AWS S3 Configuration
68
+ #
69
+ # Various AWS items for {StorageLocations::S3Location}. These can be set from the ENV or the configuration block
70
+ #
71
+ # @note
72
+ #
73
+ # The order we use is:
74
+ # * `config.aws_s3_<variable_name> = value`
75
+ # * `AWS_S3_<variable_name>`
76
+ # * `AWS_<variable_name>`
77
+ # * `AWS_DEFAULT_<variable_name>`
78
+ # * default
79
+ #
80
+ # @return [String]
81
+
82
+ aws_config prefix: 's3', name: 'region', default: 'us-east-1'
83
+ aws_config prefix: 's3', name: 'bucket'
84
+ aws_config prefix: 's3', name: 'access_key_id'
85
+ aws_config prefix: 's3', name: 'secret_access_key'
86
+
87
+ aws_config prefix: 'sqs', name: 'region', default: 'us-east-1'
88
+ aws_config prefix: 'sqs', name: 'queue'
89
+ aws_config prefix: 'sqs', name: 'account_id'
90
+ aws_config prefix: 'sqs', name: 'access_key_id'
91
+ aws_config prefix: 'sqs', name: 'secret_access_key'
92
+ aws_config prefix: 'sqs', name: 'max_batch_size', default: 10
93
+ # @!endgroup AWS SQS Configurations
94
+ end
95
+ end
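The lookup order documented in the `@note` above (explicit setter, then `AWS_S3_<variable_name>`, `AWS_<variable_name>`, `AWS_DEFAULT_<variable_name>`, then the default) can be exercised with a stand-alone sketch. The class `SketchConfiguration` is hypothetical and only mirrors that documented order; unlike the gem's getter, it does not memoize the resolved value.

```ruby
# Hypothetical stand-in for the gem's Configuration; it mirrors only the lookup order.
class SketchConfiguration
  def self.aws_config(prefix:, name:, default: nil)
    ivar = "@aws_#{prefix}_#{name}"

    define_method("aws_#{prefix}_#{name}") do
      # 1. A value assigned through the setter wins.
      explicit = instance_variable_get(ivar)
      next explicit if explicit

      # 2-4. Environment variables from most to least specific, then 5. the default.
      ENV["AWS_#{prefix.upcase}_#{name.upcase}"] ||
        ENV["AWS_#{name.upcase}"] ||
        ENV["AWS_DEFAULT_#{name.upcase}"] ||
        default
    end

    define_method("aws_#{prefix}_#{name}=") do |value|
      instance_variable_set(ivar, value)
    end
  end

  aws_config prefix: 's3', name: 'region', default: 'us-east-1'
end
```

With none of the three environment variables set, `SketchConfiguration.new.aws_s3_region` falls through to the declared default.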
data/lib/derivative_rodeo/errors.rb ADDED
@@ -0,0 +1,56 @@
1
+ # frozen_string_literal: true
2
+ module DerivativeRodeo
3
+ ##
4
+ # A module namespace for establishing the possible errors that the {DerivativeRodeo} could raise.
5
+ # The rodeo could raise other errors, but these are the ones we've named.
6
+ module Errors
7
+ ##
8
+ # That which all DerivativeRodeo errors shall extend!
9
+ class Error < StandardError; end
10
+
11
+ ##
12
+ # Raised when a file uri is passed in that does not contain a storage adapter part before the ://
13
+ class StorageLocationMissing < Error
14
+ def initialize(file_uri: '')
15
+ super("#{file_uri} does not contain an adapter. Should look like file:///my_dir/myfile or s3://bucket_name/location/file_name. The part before the :// is used to select the storage adapter.") # rubocop:disable Layout/LineLength
16
+ end
17
+ end
18
+
19
+ ##
20
+ # Raised when a storage adapter is called for but does not exist in the registered adapter list
21
+ class StorageLocationNotFoundError < Error
22
+ def initialize(location_name: '')
23
+ super("Could not find the adapter #{location_name}. Make sure it is required and registered properly.")
24
+ end
25
+ end
26
+
27
+ ##
28
+ # Raised when a batch size is larger than the configured maximum SQS batch size
29
+ class MaxQueueSize < Error
30
+ def initialize(batch_size:)
31
+ super("Batch size #{batch_size} is larger than the max queue size #{DerivativeRodeo.config.aws_sqs_max_batch_size}")
32
+ end
33
+ end
34
+
35
+ ##
36
+ # Raised when AWS bucket does not exist or is not accessible by current permissions
37
+ class BucketMissingError < Error
38
+ def initialize(file_uri: '')
39
+ super("Bucket part missing #{file_uri}")
40
+ end
41
+ end
42
+
43
+ ##
44
+ # Raised when trying to write a tmp file that does not exist
45
+ class FileMissingError < Error
46
+ end
47
+
48
+ ##
49
+ # Raised because the Generator class must declare an extension for the output file extension
50
+ class ExtensionMissingError < Error
51
+ def initialize(klass: '')
52
+ super("Extension must be declared in the Generator class #{klass}")
53
+ end
54
+ end
55
+ end
56
+ end
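The two storage location errors describe how the part of a URI before `://` selects an adapter. The sketch below illustrates that idea in isolation. Everything in the `SketchRodeo` module is hypothetical, including the `REGISTERED` list and the `location_name_for` method; the gem's own lookup lives in its storage location classes, which this section of the diff does not show.

```ruby
# Hypothetical module; it re-declares simplified errors so the sketch is self-contained.
module SketchRodeo
  class Error < StandardError; end

  class StorageLocationMissing < Error
    def initialize(file_uri: '')
      super("#{file_uri} does not contain an adapter.")
    end
  end

  class StorageLocationNotFoundError < Error
    def initialize(location_name: '')
      super("Could not find the adapter #{location_name}.")
    end
  end

  # Assumed registry of adapter names, taken from the README's supported list.
  REGISTERED = %w[file s3 sqs].freeze

  # Returns the adapter name for a URI, raising a named error when it cannot.
  def self.location_name_for(file_uri)
    scheme, separator, = file_uri.to_s.partition('://')
    raise StorageLocationMissing.new(file_uri: file_uri) if separator.empty? || scheme.empty?
    raise StorageLocationNotFoundError.new(location_name: scheme) unless REGISTERED.include?(scheme)

    scheme
  end
end
```

Because both errors extend a common `Error` class, a caller can rescue `SketchRodeo::Error` to handle either case in one place.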