derivative-rodeo 0.2.0

Files changed (42)
  1. checksums.yaml +7 -0
  2. data/Gemfile +6 -0
  3. data/LICENSE +15 -0
  4. data/README.md +251 -0
  5. data/Rakefile +42 -0
  6. data/derivative_rodeo.gemspec +54 -0
  7. data/lib/derivative/rodeo.rb +3 -0
  8. data/lib/derivative-rodeo.rb +3 -0
  9. data/lib/derivative_rodeo/configuration.rb +95 -0
  10. data/lib/derivative_rodeo/errors.rb +56 -0
  11. data/lib/derivative_rodeo/generators/base_generator.rb +200 -0
  12. data/lib/derivative_rodeo/generators/concerns/copy_file_concern.rb +28 -0
  13. data/lib/derivative_rodeo/generators/copy_generator.rb +14 -0
  14. data/lib/derivative_rodeo/generators/hocr_generator.rb +112 -0
  15. data/lib/derivative_rodeo/generators/monochrome_generator.rb +39 -0
  16. data/lib/derivative_rodeo/generators/pdf_split_generator.rb +61 -0
  17. data/lib/derivative_rodeo/generators/thumbnail_generator.rb +38 -0
  18. data/lib/derivative_rodeo/generators/word_coordinates_generator.rb +39 -0
  19. data/lib/derivative_rodeo/services/base_service.rb +15 -0
  20. data/lib/derivative_rodeo/services/convert_uri_via_template_service.rb +87 -0
  21. data/lib/derivative_rodeo/services/extract_word_coordinates_from_hocr_sgml_service.rb +218 -0
  22. data/lib/derivative_rodeo/services/image_identify_service.rb +89 -0
  23. data/lib/derivative_rodeo/services/image_jp2_service.rb +112 -0
  24. data/lib/derivative_rodeo/services/image_service.rb +73 -0
  25. data/lib/derivative_rodeo/services/pdf_splitter/base.rb +177 -0
  26. data/lib/derivative_rodeo/services/pdf_splitter/jpg_page.rb +14 -0
  27. data/lib/derivative_rodeo/services/pdf_splitter/pages_summary.rb +130 -0
  28. data/lib/derivative_rodeo/services/pdf_splitter/png_page.rb +26 -0
  29. data/lib/derivative_rodeo/services/pdf_splitter/tiff_page.rb +52 -0
  30. data/lib/derivative_rodeo/services/pdf_splitter_service.rb +19 -0
  31. data/lib/derivative_rodeo/services/url_service.rb +42 -0
  32. data/lib/derivative_rodeo/storage_locations/base_location.rb +251 -0
  33. data/lib/derivative_rodeo/storage_locations/concerns/download_concern.rb +67 -0
  34. data/lib/derivative_rodeo/storage_locations/file_location.rb +39 -0
  35. data/lib/derivative_rodeo/storage_locations/http_location.rb +13 -0
  36. data/lib/derivative_rodeo/storage_locations/https_location.rb +13 -0
  37. data/lib/derivative_rodeo/storage_locations/s3_location.rb +103 -0
  38. data/lib/derivative_rodeo/storage_locations/sqs_location.rb +187 -0
  39. data/lib/derivative_rodeo/technical_metadata.rb +23 -0
  40. data/lib/derivative_rodeo/version.rb +5 -0
  41. data/lib/derivative_rodeo.rb +36 -0
  42. metadata +339 -0
checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA256:
3
+ metadata.gz: 20e512d7162170875d60f90ff48bba694237ac7d31f38d806d2b87f570536c1c
4
+ data.tar.gz: a1311ea39a3994b4d24ffdbdbade62b7fbb15d2c326a3b510607f315fb4dd865
5
+ SHA512:
6
+ metadata.gz: 157a9e276c6cefe739137fbe17e783557d0317dcee531cd353ad86987bda33ad55c2ada8179e254b306a467d01f8f759a6e89fc91b8cbcf6e968cf5a28a9037b
7
+ data.tar.gz: 5bd45db467194cf1e8af7f7e1ed625c2b3898d011f20a581a9a55a2ccbb7be56ca8c276752b7bef41e2c1d5efa4737d1291f83211dd69cd18e4f7caeed25fef2
data/Gemfile ADDED
@@ -0,0 +1,6 @@
1
+ # frozen_string_literal: true
2
+
3
+ source 'https://rubygems.org'
4
+
5
+ # Specify your gem's dependencies in derivative_rodeo.gemspec
6
+ gemspec
data/LICENSE ADDED
@@ -0,0 +1,15 @@
1
+ Copyright 2023 Software Services by Scientist.com
2
+
3
+ Additional copyright may be held by others, as reflected in the commit history.
4
+
5
+ Licensed under the Apache License, Version 2.0 (the "License");
6
+ you may not use this file except in compliance with the License.
7
+ You may obtain a copy of the License at
8
+
9
+ http://www.apache.org/licenses/LICENSE-2.0
10
+
11
+ Unless required by applicable law or agreed to in writing, software
12
+ distributed under the License is distributed on an "AS IS" BASIS,
13
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
14
+ See the License for the specific language governing permissions and
15
+ limitations under the License.
data/README.md ADDED
@@ -0,0 +1,251 @@
1
+ <!-- START doctoc generated TOC please keep comment here to allow auto update -->
2
+ <!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
3
+ **Table of Contents** *generated with [DocToc](https://github.com/thlorenz/doctoc)*
4
+
5
+ - [DerivativeRodeo](#derivativerodeo)
6
+ - [Process Life Cycle](#process-life-cycle)
7
+ - [Concepts](#concepts)
8
+ - [Common Storage](#common-storage)
9
+ - [Related Files](#related-files)
10
+ - [Sequence Diagram](#sequence-diagram)
11
+ - [Installation](#installation)
12
+ - [Usage](#usage)
13
+ - [Technical Overview of the DerivativeRodeo](#technical-overview-of-the-derivativerodeo)
14
+ - [Generators](#generators)
15
+ - [Interface(s)](#interfaces)
16
+ - [Supported Generators](#supported-generators)
17
+ - [Registered Generators](#registered-generators)
18
+ - [Storage Locations](#storage-locations)
19
+ - [Supported Storage Locations](#supported-storage-locations)
20
+ - [Development](#development)
21
+ - [Logging in Test Environment](#logging-in-test-environment)
22
+ - [Contributing](#contributing)
23
+
24
+ <!-- END doctoc generated TOC please keep comment here to allow auto update -->
25
+
26
+ # DerivativeRodeo
27
+
28
+ “This ain’t my first rodeo.” (an American idiom meaning “I’m prepared for what comes next.”)
29
+
30
+ The `DerivativeRodeo` "moves" files from one storage location (e.g. *input*) to one or more storage locations (e.g. *output*) via a generator.
31
+
32
+ - [Storage Location](./lib/derivative_rodeo/storage_locations/base_location.rb) :: where we can expect to find a file.
33
+ - [Generator](./lib/derivative_rodeo/generators/base_generator.rb) :: a process to transform a file into another file.
34
+
35
+ ## Process Life Cycle
36
+
37
+ In the case of an *input* storage location, we expect that the underlying file pointed at by the *input* storage location exists. After all, we can't move what we don't have.
38
+
39
+ In the case of an *output* storage location, we expect that the underlying file will exist after the generator has completed. The file at the *output* storage location *could* already exist, or we might need to generate it.
40
+
41
+ During the generator's process, we need to have a working copy of both the *input* and *output* file. This is done by creating a temporary file.
42
+
43
+ In the case of the *input*, the creation of that temporary file involves getting the file from the *input* storage location. In the case of the *output*, we create a temporary file that the *output* storage location then knows how to move to the resulting place.
44
+
45
+ ![Storage Lifecycle](./artifacts/derivative_rodeo-generator_storage_lifecycle.png)
46
+
47
+ The above Storage Lifecycle diagram is as follows: `input location` to `input tmp file` to `generator` to `output tmp file` to `output location`.
48
+
49
+ *Note:* We've designed and implemented the data life cycle to automatically clean up the temporary files as the generator completes. In this way we can use the smallest working space possible. This design decision helps run `DerivativeRodeo` within distributed clusters (e.g. AWS Serverless).
50
+
51
+ ## Concepts
52
+
53
+ ![Overview](./artifacts/derivative_rodeo-overview.png)
54
+
55
+ <details>
56
+ <summary>The PlantUML Text for the Overview Diagram</summary>
57
+
58
+ ```plantuml
59
+ @startuml
60
+ !theme amiga
61
+
62
+ cloud "Source 1" as S1
63
+ cloud "Source 2" as S2
64
+ cloud "Source 3" as S3
65
+
66
+ storage "IMAGEs" as IMAGEs
67
+ storage "HOCRs" as HOCRs
68
+ storage "TXTs" as TXTs
69
+
70
+ control Preprocess as G1
71
+
72
+ S1 -down-> G1
73
+ S2 -down-> G1
74
+ S3 -down-> G1
75
+
76
+ G1 -down-> IMAGEs
77
+ G1 -down-> HOCRs
78
+ G1 -down-> TXTs
79
+
80
+ control Import as I1
81
+
82
+ IMAGEs -down-> I1
83
+ HOCRs -down-> I1
84
+ TXTs -down-> I1
85
+
86
+ package FileSet as FileSet1 {
87
+ file Image1
88
+ file Hocr1
89
+ file Txt1
90
+ }
91
+ package FileSet as FileSet2 {
92
+ file Image2
93
+ file Hocr2
94
+ file Txt2
95
+ }
96
+
97
+ I1 -down-> FileSet1
98
+ I1 -down-> FileSet2
99
+
100
+ @enduml
101
+
102
+ ```
103
+
104
+ </details>
105
+
106
+ ### Common Storage
107
+
108
+ In this case, <dfn>common storage</dfn> could mean the storage where we're writing all pre-processing of files. Or it could mean the storage where we're writing for application access (e.g. [Fedora Commons](https://fedora.lyrasis.org) for a [Hyrax](https://github.com/samvera/hyrax) application).
109
+
110
+ In other words, the `DerivativeRodeo` is part of moving files from one location to another, and ensuring that at each step we have all of the expected files we want.
111
+
112
+ ### Related Files
113
+
114
+ This is not strictly related to <dfn>Hyrax's FileSet</dfn>; that is, a set of files in which one is considered the original and all others are _derivatives_ of the original.
115
+
116
+ However, it is helpful to think in those terms: files that have a significant relation to each other, one derived from the other. For example, an original PDF and its extracted text would be two significantly related files.
117
+
118
+ ### Sequence Diagram
119
+
120
+ ![Sequence Diagram](./artifacts/derivative_rodeo-sequence-diagram.png)
121
+
122
+ <details>
123
+ <summary>The PlantUML Text for the Sequence Diagram</summary>
124
+
125
+ ```plantuml
126
+ @startuml
127
+ !theme amiga
128
+
129
+ actor Instigator
130
+ database S3
131
+ control AWS
132
+ queue SQS
133
+ control SpaceStone
134
+ control DerivativeRodeo
135
+ collections From
136
+ collections To
137
+ Instigator -> S3 : "Upload bucket\nof files associated\n with FileSet"
138
+ S3 -> AWS : "AWS enqueues\nthe bucket"
139
+ AWS -> SQS : "AWS adds to SQS"
140
+ SQS -> SpaceStone : "SQS invokes\nSpaceStone method"
141
+ SpaceStone -> DerivativeRodeo : "SpaceStone calls\n DerivativeRodeo"
142
+ DerivativeRodeo --> S3 : "Request file for\ntemporary processing"
143
+ S3 --> From : "Write requested\n file to\ntemporary storage"
144
+ DerivativeRodeo <-- From
145
+ DerivativeRodeo -> To : "Generate derivative\n writing to local\n processing storage."
146
+ To --> S3 : "Write file\n to S3 Bucket"
147
+ DerivativeRodeo <-- To : "Return to DerivativeRodeo\n with generated URIs"
148
+ SpaceStone <- DerivativeRodeo : "Return generated\n URIs"
149
+ SpaceStone -> SQS : "Optionally enqueue\nfurther work"
150
+ @enduml
151
+ ```
152
+ </details>
153
+
154
+ Given a single original file in a previous home, we are copying that original file (and derivatives) to various locations:
155
+
156
+ - From previous home to S3.
157
+ - From S3 to local temporary storage (for processing).
158
+ - Create a derivative temporary file based on existing file.
159
+ - Copying derivative temporary file to S3.
160
+
161
+ ## Installation
162
+
163
+ Add this line to your application's Gemfile:
164
+
165
+ ```ruby
166
+ gem 'derivative-rodeo'
167
+ ```
168
+
169
+ For historical reasons, the gem name is `derivative-rodeo` even though the repository is `derivative_rodeo`. Any of the following `require` statements will work:
170
+
171
+ - `require 'derivative_rodeo'`
172
+ - `require 'derivative-rodeo'`
173
+ - `require 'derivative/rodeo'`
174
+
175
+ And then execute: `$ bundle install`
176
+
177
+ Be aware that you need the `pdfinfo` command line tool installed to run this gem's specs or to use its PDF functionality.
178
+
179
+ ## Usage
180
+
181
+ TODO
182
+
183
+ ## Technical Overview of the DerivativeRodeo
184
+
185
+ ### Generators
186
+
187
+ Generators are responsible for ensuring that we have the file associated with the generator. For example, the [HocrGenerator](./lib/derivative_rodeo/generators/hocr_generator.rb) is responsible for ensuring that we have the `.hocr` file in the expected storage location.
188
+
189
+ #### Interface(s)
190
+
191
+ Generators must have an initializer and build command:
192
+
193
+ - `.new(array_of_file_urls, output_location_template, preprocessed_location_template)`
194
+ - `#generated_files` executes the generator's actions and returns an array of files
195
+ - `#generated_uris` executes the generator's actions and returns an array of output URIs
196
+
197
+ #### Supported Generators
198
+
199
+ Below is the current list of generators.
200
+
201
+ - [HocrGenerator](./lib/derivative_rodeo/generators/hocr_generator.rb) :: generates tesseract files from images; also creates monochrome files as a prestep
202
+ - [MonochromeGenerator](./lib/derivative_rodeo/generators/monochrome_generator.rb) :: converts images to monochrome
203
+ - [CopyGenerator](./lib/derivative_rodeo/generators/copy_generator.rb) :: sends a set of uris to another location. For example from <abbr title="Simple Storage Service">S3</abbr> to <abbr title="Simple Queue Service">SQS</abbr> or from filesystem to S3.
204
+ - [PdfSplitGenerator](./lib/derivative_rodeo/generators/pdf_split_generator.rb) :: splits a PDF into one image per page
205
+ - [WordCoordinatesGenerator](./lib/derivative_rodeo/generators/word_coordinates_generator.rb) :: creates a JSON file representing the words and coordinates (derived from the `.hocr` file).
206
+
207
+ #### Registered Generators
208
+
209
+ TODO: We want to expose a list of registered generators
210
+
211
+ ### Storage Locations
212
+
213
+ Storage locations are where we put things. Each location has a specific implementation but is expected to inherit from the [DerivativeRodeo::StorageLocations::BaseLocation](./lib/derivative_rodeo/storage_locations/base_location.rb).
214
+
215
+ The `DerivativeRodeo::StorageLocations::BaseLocation.locations` method tracks the registered locations.
216
+
217
+ The location represents where the file *should* be.
218
+
219
+ #### Supported Storage Locations
220
+
221
+ Storage locations follow a [URI pattern](https://en.wikipedia.org/wiki/Uniform_Resource_Identifier#Example_URIs):
222
+
223
+ - `file://` :: “local” file system storage
224
+ - `s3://` :: <abbr title="Amazon Web Services">AWS</abbr>’s <abbr title="Simple Storage Service">S3</abbr> storage system
225
+ - `sqs://` :: <abbr title="Amazon Web Services">AWS</abbr>’s <abbr title="Simple Queue Service">SQS</abbr>
226
+
227
+ ## Development
228
+
229
+ - Check out the repository: `git clone https://github.com/scientist-softserv/derivative_rodeo`
230
+ - Install dependencies: `cd derivative_rodeo; bundle install`
231
+ - Install git hooks: `rake install_hooks`
232
+ - Install binaries:
233
+ - `pdfinfo`: provided by poppler (e.g. `brew install poppler`)
234
+ - GhostScript (e.g. `gs`): run `brew install gs`
235
+
236
+ Then go about writing your code and documentation.
237
+
238
+ The git hooks call `rake default` which will:
239
+
240
+ - Amend the table of contents of this file
241
+ - Run `rubocop`
242
+ - Validate yard documentation (see http://rubydoc.info/gems/yard/file/docs/Tags.md#List_of_Available_Tags for help correcting warnings)
243
+ - Run `rspec` with `simplecov`
244
+
245
+ ### Logging in Test Environment
246
+
247
+ Throughout the `DerivativeRodeo` we log some activity. In a typical test run those logs would be overly chatty, so they are minimized by default. If you want the chattier logs, run the following: `DEBUG=t rspec`.
248
+
249
+ ## Contributing
250
+
251
+ Bug reports and pull requests are welcome on GitHub at https://github.com/scientist-softserv/derivative_rodeo.
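The README's Process Life Cycle section describes a flow of `input location` to `input tmp file` to `generator` to `output tmp file` to `output location`, with the temporary files cleaned up once the generator completes. The following is a minimal Ruby sketch of that flow, assuming plain filesystem paths on both ends. `move_via_tmp` is a hypothetical helper written for illustration; it is not part of the gem's API, and the gem's real storage locations and generators are more involved.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical helper (not part of derivative-rodeo): moves a file from an
# input path to an output path by way of temporary working copies.
def move_via_tmp(input_path, output_path)
  Dir.mktmpdir do |dir|
    input_tmp = File.join(dir, 'input')
    output_tmp = File.join(dir, 'output')

    # input location -> input tmp file
    FileUtils.cp(input_path, input_tmp)

    # generator: the caller's block turns the input tmp file into the output tmp file
    yield input_tmp, output_tmp

    # output tmp file -> output location
    FileUtils.mkdir_p(File.dirname(output_path))
    FileUtils.cp(output_tmp, output_path)
  end
  # Dir.mktmpdir removed the directory, so both working copies are gone here.
  output_path
end
```

A caller would pass the transformation as a block, for example one that reads `input_tmp` and writes an upper-cased copy to `output_tmp`. Keeping both working copies in one block-scoped temporary directory is what keeps the working space small.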
data/Rakefile ADDED
@@ -0,0 +1,42 @@
1
+ # frozen_string_literal: true
2
+
3
+ require 'bundler/gem_tasks'
4
+ require 'rspec/core/rake_task'
5
+ require 'rubocop/rake_task'
6
+
7
+ desc 'Run style checker'
8
+ RuboCop::RakeTask.new(:rubocop) do |task|
9
+ task.fail_on_error = true
10
+ end
11
+
12
+ desc 'Install commit hooks to ensure better practices'
13
+ task :install_hooks do
14
+ require 'fileutils'
15
+ Dir.glob('./git-hooks/*').each do |hook|
16
+ next if File.file?("./.git/hooks/#{File.basename(hook)}")
17
+
18
+ puts "Installing #{File.basename(hook)} git hook"
19
+ FileUtils.cp(hook, './.git/hooks/')
20
+ end
21
+ end
22
+
23
+ require 'yard'
24
+ YARD::Rake::YardocTask.new do |t|
25
+ t.options = ['--fail-on-warning']
26
+ end
27
+
28
+ RSpec::Core::RakeTask.new(:spec)
29
+
30
+ desc 'Generate table of contents for README.md'
31
+ task :doctoc do
32
+ if `which doctoc`.strip.empty?
33
+ $stdout.puts 'Skipping doctoc generation; install via "npm install -g doctoc"'
34
+ else
35
+ $stdout.puts 'Generating table of contents for README.md'
36
+ `doctoc README.md`
37
+ end
38
+ end
39
+
40
+ task ci: %i[doctoc rubocop yard spec]
41
+
42
+ task default: %i[ci]
data/derivative_rodeo.gemspec ADDED
@@ -0,0 +1,54 @@
1
+ # frozen_string_literal: true
2
+
3
+ require_relative 'lib/derivative_rodeo/version'
4
+
5
+ Gem::Specification.new do |spec|
6
+ # Renaming to reflect that we previously registered 'derivative-rodeo' and Rubygems guards against
7
+ # names that are close in resemblance.
8
+ spec.name = 'derivative-rodeo'
9
+ spec.version = DerivativeRodeo::VERSION
10
+ spec.authors = ['Rob Kaufman', 'Jeremy Friesen']
11
+ spec.email = ['rob@notch8.com', 'jeremy.n.friesen@gmail.com']
12
+
13
+ spec.summary = 'An ETL Ecosystem for Derivative Processing.'
14
+ spec.description = spec.summary
15
+ spec.homepage = 'https://github.com/scientist-softserv/derivative_rodeo'
16
+ spec.required_ruby_version = '>= 2.7.0'
17
+ spec.licenses = ['Apache-2.0']
18
+
19
+ spec.metadata['homepage_uri'] = spec.homepage
20
+ spec.metadata['source_code_uri'] = spec.homepage
21
+
22
+ # Specify which files should be added to the gem when it is released.
23
+ # The `git ls-files -z` loads the files in the RubyGem that have been added into git.
24
+ # spec.files = Dir.chdir(__dir__) do
25
+ # `git ls-files -z`.split("\x0").reject do |f|
26
+ # (f == __FILE__) || f.match(%r{\A(?:(?:bin|test|spec|features)/|\.(?:git|travis|circleci)|appveyor)})
27
+ # end
28
+ # end
29
+ spec.files = Dir['lib/**/*'].keep_if { |file| File.file?(file) } + %w[Gemfile LICENSE README.md Rakefile derivative_rodeo.gemspec]
30
+ spec.bindir = 'exe'
31
+ spec.executables = spec.files.grep(%r{\Aexe/}) { |f| File.basename(f) }
32
+ spec.require_paths = ['lib']
33
+
34
+ spec.add_dependency 'activesupport', '>= 5'
35
+ spec.add_dependency 'aws-sdk-s3'
36
+ spec.add_dependency 'aws-sdk-sqs'
37
+ spec.add_dependency 'httparty'
38
+ spec.add_dependency 'marcel'
39
+ spec.add_dependency 'mime-types'
40
+ spec.add_dependency 'mini_magick'
41
+ spec.add_dependency 'nokogiri'
42
+
43
+ spec.add_development_dependency 'bixby'
44
+ spec.add_development_dependency 'byebug'
45
+ # spec.add_development_dependency 'hydra-file_characterization'
46
+ spec.add_development_dependency 'rspec', '~> 3.0'
47
+ spec.add_development_dependency 'rake', '~> 13.0'
48
+ spec.add_development_dependency 'simplecov'
49
+ spec.add_development_dependency 'yard-activerecord'
50
+ spec.add_development_dependency 'rspec-its'
51
+ spec.add_development_dependency 'shoulda-matchers'
52
+ spec.add_development_dependency 'solargraph'
53
+ spec.add_development_dependency 'yard'
54
+ end
data/lib/derivative/rodeo.rb ADDED
@@ -0,0 +1,3 @@
1
+ # frozen_string_literal: true
2
+
3
+ require_relative "../derivative_rodeo"
data/lib/derivative-rodeo.rb ADDED
@@ -0,0 +1,3 @@
1
+ # frozen_string_literal: true
2
+
3
+ require_relative './derivative_rodeo'
data/lib/derivative_rodeo/configuration.rb ADDED
@@ -0,0 +1,95 @@
1
+ # frozen_string_literal: true
2
+
3
+ require 'mime/types'
4
+ require 'logger'
5
+ module DerivativeRodeo
6
+ ##
7
+ # @api public
8
+ #
9
+ # This class is responsible for the consistent configuration of the "application" that leverages
10
+ # the {DerivativeRodeo}.
11
+ #
12
+ # This configuration helps set defaults for storage locations and generators.
13
+ class Configuration
14
+ ##
15
+ # Allows AWS configuration to be set via environment variables by declaring them in the configuration
16
+ # class as follows:
17
+ #
18
+ # @example
19
+ #
20
+ # aws_config prefix: 's3', name: 'region', default: 'us-east-1'
21
+ #
22
+ # @param prefix [String]
23
+ # @param name [String]
24
+ # @param default [String] (optional)
25
+ def self.aws_config(prefix:, name:, default: nil)
26
+ aws_config_getter(prefix: prefix, name: name, default: default)
27
+ aws_config_setter(prefix: prefix, name: name)
28
+ end
29
+
30
+ def self.aws_config_getter(prefix:, name:, default: nil)
31
+ define_method "aws_#{prefix}_#{name}" do
32
+ val = instance_variable_get("@aws_#{prefix}_#{name}")
33
+ return val if val
34
+
35
+ val = ENV["AWS_#{prefix.upcase}_#{name.upcase}"] ||
36
+ ENV["AWS_#{name.upcase}"] ||
37
+ ENV["AWS_DEFAULT_#{name.upcase}"] ||
38
+ default
39
+ instance_variable_set("@aws_#{prefix}_#{name}", val)
40
+ end
41
+ end
42
+ private_class_method :aws_config_getter
43
+
44
+ def self.aws_config_setter(prefix:, name:)
45
+ define_method "aws_#{prefix}_#{name}=" do |val|
46
+ instance_variable_set("@aws_#{prefix}_#{name}", val)
47
+ end
48
+ end
49
+ private_class_method :aws_config_setter
50
+
51
+ def initialize
52
+ @logger = if defined?(Rails)
53
+ Rails.logger
54
+ else
55
+ # By default, minimize the chatter of the specs. Add ENV['DEBUG'] to expose the
56
+ # chatter.
57
+ Logger.new($stderr, level: Logger::FATAL)
58
+ end
59
+ yield self if block_given?
60
+ end
61
+
62
+ ##
63
+ # @return [Logger]
64
+ attr_accessor :logger
65
+
66
+ ##
67
+ # @!group AWS S3 Configuration
68
+ #
69
+ # Various AWS items for {StorageLocations::S3Location}. These can be set from the ENV or the configuration block
70
+ #
71
+ # @note
72
+ #
73
+ # The order we use is:
74
+ # * `config.aws_s3_<variable_name> = value`
75
+ # * `AWS_S3_<variable_name>`
76
+ # * `AWS_<variable_name>`
77
+ # * `AWS_DEFAULT_<variable_name>`
78
+ # * default
79
+ #
80
+ # @return [String]
81
+
82
+ aws_config prefix: 's3', name: 'region', default: 'us-east-1'
83
+ aws_config prefix: 's3', name: 'bucket'
84
+ aws_config prefix: 's3', name: 'access_key_id'
85
+ aws_config prefix: 's3', name: 'secret_access_key'
86
+
87
+ aws_config prefix: 'sqs', name: 'region', default: 'us-east-1'
88
+ aws_config prefix: 'sqs', name: 'queue'
89
+ aws_config prefix: 'sqs', name: 'account_id'
90
+ aws_config prefix: 'sqs', name: 'access_key_id'
91
+ aws_config prefix: 'sqs', name: 'secret_access_key'
92
+ aws_config prefix: 'sqs', name: 'max_batch_size', default: 10
93
+ # @!endgroup AWS SQS Configurations
94
+ end
95
+ end
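The note in the configuration class lists the lookup order for each AWS setting: an explicit `config.aws_s3_<variable_name> = value`, then `AWS_S3_<variable_name>`, then `AWS_<variable_name>`, then `AWS_DEFAULT_<variable_name>`, then the declared default. A minimal sketch of that precedence follows, assuming a stand-in class named `SketchConfig` (hypothetical, not the gem's `Configuration`). Unlike the gem's getter, this sketch does not memoize the resolved value.

```ruby
# Hypothetical stand-in for DerivativeRodeo::Configuration; it models only the
# lookup order and, unlike the gem, does not cache the resolved value.
class SketchConfig
  def self.aws_config(prefix:, name:, default: nil)
    ivar = "@aws_#{prefix}_#{name}"

    define_method("aws_#{prefix}_#{name}=") { |val| instance_variable_set(ivar, val) }

    define_method("aws_#{prefix}_#{name}") do
      instance_variable_get(ivar) ||                   # 1. explicit assignment
        ENV["AWS_#{prefix.upcase}_#{name.upcase}"] ||  # 2. AWS_S3_REGION
        ENV["AWS_#{name.upcase}"] ||                   # 3. AWS_REGION
        ENV["AWS_DEFAULT_#{name.upcase}"] ||           # 4. AWS_DEFAULT_REGION
        default                                        # 5. declared default
    end
  end

  aws_config prefix: 's3', name: 'region', default: 'us-east-1'
end
```

With none of the environment variables set, `SketchConfig.new.aws_s3_region` falls through to the declared default. Setting a more specific variable, or assigning the value directly, takes priority over the less specific ones.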
data/lib/derivative_rodeo/errors.rb ADDED
@@ -0,0 +1,56 @@
1
+ # frozen_string_literal: true
2
+ module DerivativeRodeo
3
+ ##
4
+ # A module namespace for establishing the possible errors that the {DerivativeRodeo} could raise.
5
+ # The rodeo could raise other errors, but these are the ones we've named.
6
+ module Errors
7
+ ##
8
+ # That which all DerivativeRodeo errors shall extend!
9
+ class Error < StandardError; end
10
+
11
+ ##
12
+ # Raised when a file uri is passed in that does not contain a storage adapter part before the ://
13
+ class StorageLocationMissing < Error
14
+ def initialize(file_uri: '')
15
+ super("#{file_uri} does not contain an adapter. Should look like file:///my_dir/myfile or s3://bucket_name/location/file_name. The part before the :// is used to select the storage adapter.") # rubocop:disable Layout/LineLength
16
+ end
17
+ end
18
+
19
+ ##
20
+ # Raised when a storage adapter is called for but does not exist in the registered adapter list
21
+ class StorageLocationNotFoundError < Error
22
+ def initialize(location_name: '')
23
+ super("Could not find the adapter #{location_name}. Make sure it is required and registered properly.")
24
+ end
25
+ end
26
+
27
+ ##
28
+ # Raised when the requested batch size is larger than the configured maximum SQS batch size
29
+ class MaxQueueSize < Error
30
+ def initialize(batch_size:)
31
+ super("Batch size #{batch_size} is larger than the max queue size #{DerivativeRodeo.config.aws_sqs_max_batch_size}")
32
+ end
33
+ end
34
+
35
+ ##
36
+ # Raised when AWS bucket does not exist or is not accessible by current permissions
37
+ class BucketMissingError < Error
38
+ def initialize(file_uri: '')
39
+ super("Bucket part missing #{file_uri}")
40
+ end
41
+ end
42
+
43
+ ##
44
+ # Raised when trying to write a tmp file that does not exist
45
+ class FileMissingError < Error
46
+ end
47
+
48
+ ##
49
+ # Raised because the Generator class must declare an extension for the output file extension
50
+ class ExtensionMissingError < Error
51
+ def initialize(klass: '')
52
+ super("Extension must be declared in the Generator class #{klass}")
53
+ end
54
+ end
55
+ end
56
+ end
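Because every named error extends one shared base class, calling code can rescue that base class to catch any of them. The sketch below mirrors that pattern under a hypothetical `SketchErrors` namespace so that it runs without the gem. `scheme_for` is likewise a hypothetical helper, standing in for however the gem selects a storage location from the part of the URI before `://`.

```ruby
# Hypothetical namespace mirroring the shape of DerivativeRodeo::Errors.
module SketchErrors
  class Error < StandardError; end

  class StorageLocationMissing < Error
    def initialize(file_uri: '')
      super("#{file_uri} does not contain an adapter.")
    end
  end
end

# Hypothetical helper: returns the scheme before "://" or raises a named error.
def scheme_for(file_uri)
  scheme, separator, _rest = file_uri.partition('://')
  raise SketchErrors::StorageLocationMissing.new(file_uri: file_uri) if separator.empty? || scheme.empty?

  scheme
end

begin
  scheme_for('/tmp/no-scheme.pdf')
rescue SketchErrors::Error => e
  # Rescuing the shared base class catches every named error in the namespace.
  warn e.message
end
```

Taking the context as a keyword argument in each error's initializer keeps the message wording in one place, so callers only supply the offending value.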