derivative-rodeo 0.2.0
- checksums.yaml +7 -0
- data/Gemfile +6 -0
- data/LICENSE +15 -0
- data/README.md +251 -0
- data/Rakefile +42 -0
- data/derivative_rodeo.gemspec +54 -0
- data/lib/derivative/rodeo.rb +3 -0
- data/lib/derivative-rodeo.rb +3 -0
- data/lib/derivative_rodeo/configuration.rb +95 -0
- data/lib/derivative_rodeo/errors.rb +56 -0
- data/lib/derivative_rodeo/generators/base_generator.rb +200 -0
- data/lib/derivative_rodeo/generators/concerns/copy_file_concern.rb +28 -0
- data/lib/derivative_rodeo/generators/copy_generator.rb +14 -0
- data/lib/derivative_rodeo/generators/hocr_generator.rb +112 -0
- data/lib/derivative_rodeo/generators/monochrome_generator.rb +39 -0
- data/lib/derivative_rodeo/generators/pdf_split_generator.rb +61 -0
- data/lib/derivative_rodeo/generators/thumbnail_generator.rb +38 -0
- data/lib/derivative_rodeo/generators/word_coordinates_generator.rb +39 -0
- data/lib/derivative_rodeo/services/base_service.rb +15 -0
- data/lib/derivative_rodeo/services/convert_uri_via_template_service.rb +87 -0
- data/lib/derivative_rodeo/services/extract_word_coordinates_from_hocr_sgml_service.rb +218 -0
- data/lib/derivative_rodeo/services/image_identify_service.rb +89 -0
- data/lib/derivative_rodeo/services/image_jp2_service.rb +112 -0
- data/lib/derivative_rodeo/services/image_service.rb +73 -0
- data/lib/derivative_rodeo/services/pdf_splitter/base.rb +177 -0
- data/lib/derivative_rodeo/services/pdf_splitter/jpg_page.rb +14 -0
- data/lib/derivative_rodeo/services/pdf_splitter/pages_summary.rb +130 -0
- data/lib/derivative_rodeo/services/pdf_splitter/png_page.rb +26 -0
- data/lib/derivative_rodeo/services/pdf_splitter/tiff_page.rb +52 -0
- data/lib/derivative_rodeo/services/pdf_splitter_service.rb +19 -0
- data/lib/derivative_rodeo/services/url_service.rb +42 -0
- data/lib/derivative_rodeo/storage_locations/base_location.rb +251 -0
- data/lib/derivative_rodeo/storage_locations/concerns/download_concern.rb +67 -0
- data/lib/derivative_rodeo/storage_locations/file_location.rb +39 -0
- data/lib/derivative_rodeo/storage_locations/http_location.rb +13 -0
- data/lib/derivative_rodeo/storage_locations/https_location.rb +13 -0
- data/lib/derivative_rodeo/storage_locations/s3_location.rb +103 -0
- data/lib/derivative_rodeo/storage_locations/sqs_location.rb +187 -0
- data/lib/derivative_rodeo/technical_metadata.rb +23 -0
- data/lib/derivative_rodeo/version.rb +5 -0
- data/lib/derivative_rodeo.rb +36 -0
- metadata +339 -0
checksums.yaml
ADDED
@@ -0,0 +1,7 @@
|
|
1
|
+
---
|
2
|
+
SHA256:
|
3
|
+
metadata.gz: 20e512d7162170875d60f90ff48bba694237ac7d31f38d806d2b87f570536c1c
|
4
|
+
data.tar.gz: a1311ea39a3994b4d24ffdbdbade62b7fbb15d2c326a3b510607f315fb4dd865
|
5
|
+
SHA512:
|
6
|
+
metadata.gz: 157a9e276c6cefe739137fbe17e783557d0317dcee531cd353ad86987bda33ad55c2ada8179e254b306a467d01f8f759a6e89fc91b8cbcf6e968cf5a28a9037b
|
7
|
+
data.tar.gz: 5bd45db467194cf1e8af7f7e1ed625c2b3898d011f20a581a9a55a2ccbb7be56ca8c276752b7bef41e2c1d5efa4737d1291f83211dd69cd18e4f7caeed25fef2
|
data/Gemfile
ADDED
data/LICENSE
ADDED
@@ -0,0 +1,15 @@
|
|
1
|
+
Copyright 2023 Software Services by Scientist.com
|
2
|
+
|
3
|
+
Additional copyright may be held by others, as reflected in the commit history.
|
4
|
+
|
5
|
+
Licensed under the Apache License, Version 2.0 (the "License");
|
6
|
+
you may not use this file except in compliance with the License.
|
7
|
+
You may obtain a copy of the License at
|
8
|
+
|
9
|
+
http://www.apache.org/licenses/LICENSE-2.0
|
10
|
+
|
11
|
+
Unless required by applicable law or agreed to in writing, software
|
12
|
+
distributed under the License is distributed on an "AS IS" BASIS,
|
13
|
+
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
14
|
+
See the License for the specific language governing permissions and
|
15
|
+
limitations under the License.
|
data/README.md
ADDED
@@ -0,0 +1,251 @@
|
|
1
|
+
<!-- START doctoc generated TOC please keep comment here to allow auto update -->
|
2
|
+
<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
|
3
|
+
**Table of Contents** *generated with [DocToc](https://github.com/thlorenz/doctoc)*
|
4
|
+
|
5
|
+
- [DerivativeRodeo](#derivativerodeo)
|
6
|
+
- [Process Life Cycle](#process-life-cycle)
|
7
|
+
- [Concepts](#concepts)
|
8
|
+
- [Common Storage](#common-storage)
|
9
|
+
- [Related Files](#related-files)
|
10
|
+
- [Sequence Diagram](#sequence-diagram)
|
11
|
+
- [Installation](#installation)
|
12
|
+
- [Usage](#usage)
|
13
|
+
- [Technical Overview of the DerivativeRodeo](#technical-overview-of-the-derivativerodeo)
|
14
|
+
- [Generators](#generators)
|
15
|
+
- [Interface(s)](#interfaces)
|
16
|
+
- [Supported Generators](#supported-generators)
|
17
|
+
- [Registered Generators](#registered-generators)
|
18
|
+
- [Storage Locations](#storage-locations)
|
19
|
+
- [Supported Storage Locations](#supported-storage-locations)
|
20
|
+
- [Development](#development)
|
21
|
+
- [Logging in Test Environment](#logging-in-test-environment)
|
22
|
+
- [Contributing](#contributing)
|
23
|
+
|
24
|
+
<!-- END doctoc generated TOC please keep comment here to allow auto update -->
|
25
|
+
|
26
|
+
# DerivativeRodeo
|
27
|
+
|
28
|
+
“This ain’t my first rodeo.” (American slang for “I’m prepared for what comes next.”)
|
29
|
+
|
30
|
+
The `DerivativeRodeo` "moves" files from one storage location (the *input*) to one or more storage locations (the *output*) via a generator.
|
31
|
+
|
32
|
+
- [Storage Location](./lib/derivative_rodeo/storage_locations/base_location.rb) :: where we can expect to find a file.
|
33
|
+
- [Generator](./lib/derivative_rodeo/generators/base_generator.rb) :: a process to transform a file into another file.
|
34
|
+
|
35
|
+
## Process Life Cycle
|
36
|
+
|
37
|
+
In the case of an *input* storage location, we expect that the underlying file it points at exists. After all, we can't move what we don't have.
|
38
|
+
|
39
|
+
In the case of an *output* storage location, we expect that the underlying file will exist after the generator has completed. The file at the *output* storage location *could* already exist, or we might need to generate it.
|
40
|
+
|
41
|
+
During the generator's process, we need working copies of both the *input* and *output* files. We get these by creating temporary files.
|
42
|
+
|
43
|
+
In the case of the *input*, the creation of that temporary file involves getting the file from the *input* storage location. In the case of the *output*, we create a temporary file that the *output* storage location then knows how to move to the resulting place.
|
44
|
+
|
45
|
+
![Storage Lifecycle](./artifacts/derivative_rodeo-generator_storage_lifecycle.png)
|
46
|
+
|
47
|
+
The above Storage Lifecycle diagram is as follows: `input location` to `input tmp file` to `generator` to `output tmp file` to `output location`.
|
48
|
+
|
49
|
+
*Note:* We've designed and implemented the data life cycle to automatically clean up the temporary files as the generator completes. In this way we can use the smallest working space possible. This design decision helps run `DerivativeRodeo` within distributed clusters (e.g. AWS Serverless).
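
The life cycle described above can be sketched with plain Ruby. This is a minimal illustration, not the gem's implementation: `with_working_copies` is a hypothetical helper, and it assumes both locations are local paths rather than storage location URIs.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical helper (not part of DerivativeRodeo). It copies the input to a
# temporary working file, lets the block act as the "generator", then copies
# the temporary output to its final place.
def with_working_copies(input_path:, output_path:)
  Dir.mktmpdir('rodeo-sketch') do |tmp_dir|
    input_tmp = File.join(tmp_dir, "input#{File.extname(input_path)}")
    output_tmp = File.join(tmp_dir, "output#{File.extname(output_path)}")

    # input location -> input tmp file
    FileUtils.cp(input_path, input_tmp)

    # generator: reads the input tmp file and writes the output tmp file
    yield(input_tmp, output_tmp)

    # output tmp file -> output location
    FileUtils.mkdir_p(File.dirname(output_path))
    FileUtils.cp(output_tmp, output_path)
  end
  # Dir.mktmpdir with a block removes tmp_dir, and both tmp files, on exit.
  output_path
end

Dir.mktmpdir do |dir|
  source = File.join(dir, 'original.txt')
  File.write(source, 'hello')
  target = File.join(dir, 'derivatives', 'original.upcase.txt')
  result = with_working_copies(input_path: source, output_path: target) do |input_tmp, output_tmp|
    File.write(output_tmp, File.read(input_tmp).upcase)
  end
  puts File.read(result) # prints "HELLO"
end
```

Scoping both temporary files to one block-managed directory is what keeps the working space small: nothing outlives the generator run except the file at the output location.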
|
50
|
+
|
51
|
+
## Concepts
|
52
|
+
|
53
|
+
![Overview](./artifacts/derivative_rodeo-overview.png)
|
54
|
+
|
55
|
+
<details>
|
56
|
+
<summary>The PlantUML Text for the Overview Diagram</summary>
|
57
|
+
|
58
|
+
```plantuml
|
59
|
+
@startuml
|
60
|
+
!theme amiga
|
61
|
+
|
62
|
+
cloud "Source 1" as S1
|
63
|
+
cloud "Source 2" as S2
|
64
|
+
cloud "Source 3" as S3
|
65
|
+
|
66
|
+
storage "IMAGEs" as IMAGEs
|
67
|
+
storage "HOCRs" as HOCRs
|
68
|
+
storage "TXTs" as TXTs
|
69
|
+
|
70
|
+
control Preprocess as G1
|
71
|
+
|
72
|
+
S1 -down-> G1
|
73
|
+
S2 -down-> G1
|
74
|
+
S3 -down-> G1
|
75
|
+
|
76
|
+
G1 -down-> IMAGEs
|
77
|
+
G1 -down-> HOCRs
|
78
|
+
G1 -down-> TXTs
|
79
|
+
|
80
|
+
control Import as I1
|
81
|
+
|
82
|
+
IMAGEs -down-> I1
|
83
|
+
HOCRs -down-> I1
|
84
|
+
TXTs -down-> I1
|
85
|
+
|
86
|
+
package FileSet as FileSet1 {
|
87
|
+
file Image1
|
88
|
+
file Hocr1
|
89
|
+
file Txt1
|
90
|
+
}
|
91
|
+
package FileSet as FileSet2 {
|
92
|
+
file Image2
|
93
|
+
file Hocr2
|
94
|
+
file Txt2
|
95
|
+
}
|
96
|
+
|
97
|
+
I1 -down-> FileSet1
|
98
|
+
I1 -down-> FileSet2
|
99
|
+
|
100
|
+
@enduml
|
101
|
+
|
102
|
+
```
|
103
|
+
|
104
|
+
</details>
|
105
|
+
|
106
|
+
### Common Storage
|
107
|
+
|
108
|
+
In this case, <dfn>common storage</dfn> could mean the storage where we write all pre-processed files. Or it could mean the storage where we write files for application access (e.g. [Fedora Commons](https://fedora.lyrasis.org) for a [Hyrax](https://github.com/samvera/hyrax) application).
|
109
|
+
|
110
|
+
In other words, the `DerivativeRodeo` is part of moving files from one location to another, and ensuring that at each step we have all of the expected files we want.
|
111
|
+
|
112
|
+
### Related Files
|
113
|
+
|
114
|
+
This is not strictly related to <dfn>Hyrax's FileSet</dfn>, which is a set of files in which one is considered the original and all others are _derivatives_ of the original.
|
115
|
+
|
116
|
+
However, it is helpful to think in those terms: files that have a significant relation to each other, one derived from the other. For example, an original PDF and its extracted text would be two significantly related files.
|
117
|
+
|
118
|
+
### Sequence Diagram
|
119
|
+
|
120
|
+
![Sequence Diagram](./artifacts/derivative_rodeo-sequence-diagram.png)
|
121
|
+
|
122
|
+
<details>
|
123
|
+
<summary>The PlantUML Text for the Sequence Diagram</summary>
|
124
|
+
|
125
|
+
```plantuml
|
126
|
+
@startuml
|
127
|
+
!theme amiga
|
128
|
+
|
129
|
+
actor Instigator
|
130
|
+
database S3
|
131
|
+
control AWS
|
132
|
+
queue SQS
|
133
|
+
control SpaceStone
|
134
|
+
control DerivativeRodeo
|
135
|
+
collections From
|
136
|
+
collections To
|
137
|
+
Instigator -> S3 : "Upload bucket\nof files associated\n with FileSet"
|
138
|
+
S3 -> AWS : "AWS enqueues\nthe bucket"
|
139
|
+
AWS -> SQS : "AWS adds to SQS"
|
140
|
+
SQS -> SpaceStone : "SQS invokes\nSpaceStone method"
|
141
|
+
SpaceStone -> DerivativeRodeo : "SpaceStone calls\n DerivativeRodeo"
|
142
|
+
DerivativeRodeo --> S3 : "Request file for\ntemporary processing"
|
143
|
+
S3 --> From : "Write requested\n file to\ntemporary storage"
|
144
|
+
DerivativeRodeo <-- From
|
145
|
+
DerivativeRodeo -> To : "Generate derivative\n writing to local\n processing storage."
|
146
|
+
To --> S3 : "Write file\n to S3 Bucket"
|
147
|
+
DerivativeRodeo <-- To : "Return to DerivativeRodeo\n with generated URIs"
|
148
|
+
SpaceStone <- DerivativeRodeo : "Return generated\n URIs"
|
149
|
+
SpaceStone -> SQS : "Optionally enqueue\nfurther work"
|
150
|
+
@enduml
|
151
|
+
```
|
152
|
+
</details>
|
153
|
+
|
154
|
+
Given a single original file in a previous home, we are copying that original file (and derivatives) to various locations:
|
155
|
+
|
156
|
+
- From previous home to S3.
|
157
|
+
- From S3 to local temporary storage (for processing).
|
158
|
+
- Create a derivative temporary file based on existing file.
|
159
|
+
- Copying derivative temporary file to S3.
|
160
|
+
|
161
|
+
## Installation
|
162
|
+
|
163
|
+
Add this line to your application's Gemfile:
|
164
|
+
|
165
|
+
```ruby
|
166
|
+
gem 'derivative-rodeo'
|
167
|
+
```
|
168
|
+
|
169
|
+
For historical reasons the gem name is `derivative-rodeo` even though the repository is `derivative_rodeo`. Any of the following `require` statements will work:
|
170
|
+
|
171
|
+
- `require 'derivative_rodeo'`
|
172
|
+
- `require 'derivative-rodeo'`
|
173
|
+
- `require 'derivative/rodeo'`
|
174
|
+
|
175
|
+
And then execute: `$ bundle install`
|
176
|
+
|
177
|
+
Be aware that you need the `pdfinfo` command-line tool installed to run this gem's specs or to use its PDF functionality.
|
178
|
+
|
179
|
+
## Usage
|
180
|
+
|
181
|
+
TODO
|
182
|
+
|
183
|
+
## Technical Overview of the DerivativeRodeo
|
184
|
+
|
185
|
+
### Generators
|
186
|
+
|
187
|
+
Generators are responsible for ensuring that we have the file associated with the generator. For example, the [HocrGenerator](./lib/derivative_rodeo/generators/hocr_generator.rb) is responsible for ensuring that we have the `.hocr` file in the expected storage location.
|
188
|
+
|
189
|
+
#### Interface(s)
|
190
|
+
|
191
|
+
Generators must have an initializer and build command:
|
192
|
+
|
193
|
+
- `.new(array_of_file_urls, output_location_template, preprocessed_location_template)`
|
194
|
+
- `#generated_files` executes the generator's actions and returns an array of files
|
195
|
+
- `#generated_uris` executes the generator's actions and returns an array of output URIs
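
To make the interface above concrete, here is a rough sketch of an object that responds to the same three messages. `UpcaseTextGenerator` is a hypothetical stand-in, not a DerivativeRodeo class: it works on local paths instead of URIs, assumes a Ruby format string (`%{basename}`) as its output template, and leaves out the preprocessed location template entirely.

```ruby
require 'tmpdir'

# Hypothetical stand-in for a generator; NOT part of DerivativeRodeo.
class UpcaseTextGenerator
  # @param input_paths [Array<String>] stand-in for array_of_file_urls
  # @param output_template [String] assumed format string, e.g. "/out/%{basename}.txt"
  def initialize(input_paths, output_template)
    @input_paths = input_paths
    @output_template = output_template
  end

  # Runs the generator (once) and returns the paths of the output files.
  def generated_files
    @generated_files ||= @input_paths.map do |input_path|
      output_path = format(@output_template, basename: File.basename(input_path, '.*'))
      # Reuse an output that already exists instead of regenerating it.
      File.write(output_path, File.read(input_path).upcase) unless File.exist?(output_path)
      output_path
    end
  end

  # Returns the same outputs expressed as file:// URIs.
  def generated_uris
    generated_files.map { |path| "file://#{path}" }
  end
end

Dir.mktmpdir do |dir|
  input = File.join(dir, 'page-1.txt')
  File.write(input, 'howdy')
  generator = UpcaseTextGenerator.new([input], File.join(dir, '%{basename}.upcase.txt'))
  puts generator.generated_uris
end
```

The `unless File.exist?` guard reflects the idea from the Process Life Cycle section that an output file may already exist and then does not need to be generated again.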
|
196
|
+
|
197
|
+
#### Supported Generators
|
198
|
+
|
199
|
+
Below is the current list of generators.
|
200
|
+
|
201
|
+
- [HocrGenerator](./lib/derivative_rodeo/generators/hocr_generator.rb) :: generates tesseract files from images, also creates monochrome files as a prestep
|
202
|
+
- [MonochromeGenerator](./lib/derivative_rodeo/generators/monochrome_generator.rb) :: converts images to monochrome
|
203
|
+
- [CopyGenerator](./lib/derivative_rodeo/generators/copy_generator.rb) :: sends a set of uris to another location. For example from <abbr title="Simple Storage Service">S3</abbr> to <abbr title="Simple Queue Service">SQS</abbr> or from filesystem to S3.
|
204
|
+
- [PdfSplitGenerator](./lib/derivative_rodeo/generators/pdf_split_generator.rb) :: split a PDF into one image per page
|
205
|
+
- [WordCoordinatesGenerator](./lib/derivative_rodeo/generators/word_coordinates_generator.rb) :: create a JSON file representing the words and coordinates (derived from the `.hocr` file).
|
206
|
+
|
207
|
+
#### Registered Generators
|
208
|
+
|
209
|
+
TODO: We want to expose a list of registered generators
|
210
|
+
|
211
|
+
### Storage Locations
|
212
|
+
|
213
|
+
Storage locations are where we put things. Each location has a specific implementation but is expected to inherit from [DerivativeRodeo::StorageLocations::BaseLocation](./lib/derivative_rodeo/storage_locations/base_location.rb).
|
214
|
+
|
215
|
+
The `DerivativeRodeo::StorageLocations::BaseLocation.locations` method tracks the registered locations.
|
216
|
+
|
217
|
+
The location represents where the file *should* be.
|
218
|
+
|
219
|
+
#### Supported Storage Locations
|
220
|
+
|
221
|
+
Storage locations follow a [URI pattern](https://en.wikipedia.org/wiki/Uniform_Resource_Identifier#Example_URIs)
|
222
|
+
|
223
|
+
- `file://` :: “local” file system storage
|
224
|
+
- `s3://` :: <abbr title="Amazon Web Services">AWS</abbr>’s <abbr title="Simple Storage Service">S3</abbr> storage system
|
225
|
+
- `sqs://` :: <abbr title="Amazon Web Services">AWS</abbr>’s <abbr title="Simple Queue Service">SQS</abbr>
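
As a sketch of how the part before `://` can select a storage location, consider the following. Everything here is illustrative: the `LocationSketch` module, its `LOCATIONS` hash, and the symbols it returns are assumptions, and the gem's actual registry on `BaseLocation` may work differently. The two error names mirror those in `errors.rb`.

```ruby
# Hypothetical lookup; NOT the gem's implementation.
module LocationSketch
  class StorageLocationMissing < StandardError; end
  class StorageLocationNotFoundError < StandardError; end

  # Assumed registry mapping a URI scheme to a location name.
  LOCATIONS = {
    'file' => :file_location,
    's3' => :s3_location,
    'sqs' => :sqs_location
  }.freeze

  # Returns the registered location name for the given URI string.
  def self.location_name_for(file_uri)
    scheme, separator, _rest = file_uri.to_s.partition('://')
    # No "://" (or nothing before it) means we cannot pick a location.
    raise StorageLocationMissing, "#{file_uri} does not contain a scheme" if separator.empty? || scheme.empty?

    LOCATIONS.fetch(scheme) do
      raise StorageLocationNotFoundError, "Could not find a location for #{scheme}"
    end
  end
end

puts LocationSketch.location_name_for('s3://bucket_name/location/file_name') # prints "s3_location"
```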
|
226
|
+
|
227
|
+
## Development
|
228
|
+
|
229
|
+
- Check out the repository: `git clone https://github.com/scientist-softserv/derivative_rodeo`
|
230
|
+
- Install dependencies: `cd derivative_rodeo; bundle install`
|
231
|
+
- Install git hooks: `rake install_hooks`
|
232
|
+
- Install binaries:
|
233
|
+
- `pdfinfo`: provided by poppler (e.g. `brew install poppler`)
|
234
|
+
- GhostScript (e.g. `gs`): run `brew install gs`
|
235
|
+
|
236
|
+
Then go about writing your code and documentation.
|
237
|
+
|
238
|
+
The git hooks call `rake default` which will:
|
239
|
+
|
240
|
+
- Amend the table of contents of this file
|
241
|
+
- Run `rubocop`
|
242
|
+
- Validate yard documentation (see http://rubydoc.info/gems/yard/file/docs/Tags.md#List_of_Available_Tags for help correcting warnings)
|
243
|
+
- Run `rspec` with `simplecov`
|
244
|
+
|
245
|
+
### Logging in Test Environment
|
246
|
+
|
247
|
+
Throughout the `DerivativeRodeo` we log some activity. In a typical test run those logs would be overly chatty, so they are minimized by default. If you want the more chatty logs, run the following: `DEBUG=t rspec`.
|
248
|
+
|
249
|
+
## Contributing
|
250
|
+
|
251
|
+
Bug reports and pull requests are welcome on GitHub at https://github.com/scientist-softserv/derivative_rodeo.
|
data/Rakefile
ADDED
@@ -0,0 +1,42 @@
|
|
1
|
+
# frozen_string_literal: true
|
2
|
+
|
3
|
+
require 'bundler/gem_tasks'
|
4
|
+
require 'rspec/core/rake_task'
|
5
|
+
require 'rubocop/rake_task'
|
6
|
+
|
7
|
+
desc 'Run style checker'
|
8
|
+
RuboCop::RakeTask.new(:rubocop) do |task|
|
9
|
+
task.fail_on_error = true
|
10
|
+
end
|
11
|
+
|
12
|
+
desc 'Install commit hooks to ensure better practices'
|
13
|
+
task :install_hooks do
|
14
|
+
require 'fileutils'
|
15
|
+
Dir.glob('./git-hooks/*').each do |hook|
|
16
|
+
next if File.file?("./.git/hooks/#{File.basename(hook)}")
|
17
|
+
|
18
|
+
puts "Installing #{File.basename(hook)} git hook"
|
19
|
+
FileUtils.cp(hook, './.git/hooks/')
|
20
|
+
end
|
21
|
+
end
|
22
|
+
|
23
|
+
require 'yard'
|
24
|
+
YARD::Rake::YardocTask.new do |t|
|
25
|
+
t.options = ['--fail-on-warning']
|
26
|
+
end
|
27
|
+
|
28
|
+
RSpec::Core::RakeTask.new(:spec)
|
29
|
+
|
30
|
+
desc 'Generate table of contents for README.md'
|
31
|
+
task :doctoc do
|
32
|
+
if `which doctoc`.strip.empty?
|
33
|
+
$stdout.puts 'Skipping doctoc generation; install via "npm install -g doctoc"'
|
34
|
+
else
|
35
|
+
$stdout.puts 'Generating table of contents for README.md'
|
36
|
+
`doctoc README.md`
|
37
|
+
end
|
38
|
+
end
|
39
|
+
|
40
|
+
task ci: %i[doctoc rubocop yard spec]
|
41
|
+
|
42
|
+
task default: %i[ci]
|
data/derivative_rodeo.gemspec
ADDED
@@ -0,0 +1,54 @@
|
|
1
|
+
# frozen_string_literal: true
|
2
|
+
|
3
|
+
require_relative 'lib/derivative_rodeo/version'
|
4
|
+
|
5
|
+
Gem::Specification.new do |spec|
|
6
|
+
# Renaming to reflect that we previously registered 'derivative-rodeo' and Rubygems guards against
|
7
|
+
# names that are close in resemblance.
|
8
|
+
spec.name = 'derivative-rodeo'
|
9
|
+
spec.version = DerivativeRodeo::VERSION
|
10
|
+
spec.authors = ['Rob Kaufman', 'Jeremy Friesen']
|
11
|
+
spec.email = ['rob@notch8.com', 'jeremy.n.friesen@gmail.com']
|
12
|
+
|
13
|
+
spec.summary = 'An ETL Ecosystem for Derivative Processing.'
|
14
|
+
spec.description = spec.summary
|
15
|
+
spec.homepage = 'https://github.com/scientist-softserv/derivative_rodeo'
|
16
|
+
spec.required_ruby_version = '>= 2.7.0'
|
17
|
+
spec.licenses = ['Apache-2.0'] # SPDX identifiers are case-sensitive
|
18
|
+
|
19
|
+
spec.metadata['homepage_uri'] = spec.homepage
|
20
|
+
spec.metadata['source_code_uri'] = spec.homepage
|
21
|
+
|
22
|
+
# Specify which files should be added to the gem when it is released.
|
23
|
+
# The `git ls-files -z` loads the files in the RubyGem that have been added into git.
|
24
|
+
# spec.files = Dir.chdir(__dir__) do
|
25
|
+
# `git ls-files -z`.split("\x0").reject do |f|
|
26
|
+
# (f == __FILE__) || f.match(%r{\A(?:(?:bin|test|spec|features)/|\.(?:git|travis|circleci)|appveyor)})
|
27
|
+
# end
|
28
|
+
# end
|
29
|
+
spec.files = Dir['lib/**/*'].keep_if { |file| File.file?(file) } + %w[Gemfile LICENSE README.md Rakefile derivative_rodeo.gemspec]
|
30
|
+
spec.bindir = 'exe'
|
31
|
+
spec.executables = spec.files.grep(%r{\Aexe/}) { |f| File.basename(f) }
|
32
|
+
spec.require_paths = ['lib']
|
33
|
+
|
34
|
+
spec.add_dependency 'activesupport', '>= 5'
|
35
|
+
spec.add_dependency 'aws-sdk-s3'
|
36
|
+
spec.add_dependency 'aws-sdk-sqs'
|
37
|
+
spec.add_dependency 'httparty'
|
38
|
+
spec.add_dependency 'marcel'
|
39
|
+
spec.add_dependency 'mime-types'
|
40
|
+
spec.add_dependency 'mini_magick'
|
41
|
+
spec.add_dependency 'nokogiri'
|
42
|
+
|
43
|
+
spec.add_development_dependency 'bixby'
|
44
|
+
spec.add_development_dependency 'byebug'
|
45
|
+
# spec.add_development_dependency 'hydra-file_characterization'
|
46
|
+
spec.add_development_dependency 'rspec', '~> 3.0'
|
47
|
+
spec.add_development_dependency 'rake', '~> 13.0'
|
48
|
+
spec.add_development_dependency 'simplecov'
|
49
|
+
spec.add_development_dependency 'yard-activerecord'
|
50
|
+
spec.add_development_dependency 'rspec-its'
|
51
|
+
spec.add_development_dependency 'shoulda-matchers'
|
52
|
+
spec.add_development_dependency 'solargraph'
|
53
|
+
spec.add_development_dependency 'yard'
|
54
|
+
end
|
data/lib/derivative_rodeo/configuration.rb
ADDED
@@ -0,0 +1,95 @@
|
|
1
|
+
# frozen_string_literal: true
|
2
|
+
|
3
|
+
require 'mime/types'
|
4
|
+
require 'logger'
|
5
|
+
module DerivativeRodeo
|
6
|
+
##
|
7
|
+
# @api public
|
8
|
+
#
|
9
|
+
# This class is responsible for the consistent configuration of the "application" that leverages
|
10
|
+
# the {DerivativeRodeo}.
|
11
|
+
#
|
12
|
+
# This configuration helps set defaults for storage locations and generators.
|
13
|
+
class Configuration
|
14
|
+
##
|
15
|
+
# Allows AWS configuration to be set via environment variables by declaring them in the configuration
|
16
|
+
# class as follows:
|
17
|
+
#
|
18
|
+
# @example
|
19
|
+
#
|
20
|
+
# aws_config prefix: 's3', name: 'region', default: 'us-east-1'
|
21
|
+
#
|
22
|
+
# @param prefix [String]
|
23
|
+
# @param name [String]
|
24
|
+
# @param default [String] (optional)
|
25
|
+
def self.aws_config(prefix:, name:, default: nil)
|
26
|
+
aws_config_getter(prefix: prefix, name: name, default: default)
|
27
|
+
aws_config_setter(prefix: prefix, name: name)
|
28
|
+
end
|
29
|
+
|
30
|
+
def self.aws_config_getter(prefix:, name:, default: nil)
|
31
|
+
define_method "aws_#{prefix}_#{name}" do
|
32
|
+
val = instance_variable_get("@aws_#{prefix}_#{name}")
|
33
|
+
return val if val
|
34
|
+
|
35
|
+
val = ENV["AWS_#{prefix.upcase}_#{name.upcase}"] ||
|
36
|
+
ENV["AWS_#{name.upcase}"] ||
|
37
|
+
ENV["AWS_DEFAULT_#{name.upcase}"] ||
|
38
|
+
default
|
39
|
+
instance_variable_set("@aws_#{prefix}_#{name}", val)
|
40
|
+
end
|
41
|
+
end
|
42
|
+
private_class_method :aws_config_getter
|
43
|
+
|
44
|
+
def self.aws_config_setter(prefix:, name:)
|
45
|
+
define_method "aws_#{prefix}_#{name}=" do |val|
|
46
|
+
instance_variable_set("@aws_#{prefix}_#{name}", val)
|
47
|
+
end
|
48
|
+
end
|
49
|
+
private_class_method :aws_config_setter
|
50
|
+
|
51
|
+
def initialize
|
52
|
+
@logger = if defined?(Rails)
|
53
|
+
Rails.logger
|
54
|
+
else
|
55
|
+
# By default, minimize the chatter of the specs. Add ENV['DEBUG'] to expose the
|
56
|
+
# chatter.
|
57
|
+
Logger.new($stderr, level: Logger::FATAL)
|
58
|
+
end
|
59
|
+
yield self if block_given?
|
60
|
+
end
|
61
|
+
|
62
|
+
##
|
63
|
+
# @return [Logger]
|
64
|
+
attr_accessor :logger
|
65
|
+
|
66
|
+
##
|
67
|
+
# @!group AWS S3 Configuration
|
68
|
+
#
|
69
|
+
# Various AWS items for {StorageLocations::S3Location}. These can be set from the ENV or the configuration block
|
70
|
+
#
|
71
|
+
# @note
|
72
|
+
#
|
73
|
+
# The order we use is:
|
74
|
+
# * `config.aws_s3_<variable_name> = value`
|
75
|
+
# * `AWS_S3_<variable_name>`
|
76
|
+
# * `AWS_<variable_name>`
|
77
|
+
# * `AWS_DEFAULT_<variable_name>`
|
78
|
+
# * default
|
79
|
+
#
|
80
|
+
# @return [String]
|
81
|
+
|
82
|
+
aws_config prefix: 's3', name: 'region', default: 'us-east-1'
|
83
|
+
aws_config prefix: 's3', name: 'bucket'
|
84
|
+
aws_config prefix: 's3', name: 'access_key_id'
|
85
|
+
aws_config prefix: 's3', name: 'secret_access_key'
|
86
|
+
|
87
|
+
aws_config prefix: 'sqs', name: 'region', default: 'us-east-1'
|
88
|
+
aws_config prefix: 'sqs', name: 'queue'
|
89
|
+
aws_config prefix: 'sqs', name: 'account_id'
|
90
|
+
aws_config prefix: 'sqs', name: 'access_key_id'
|
91
|
+
aws_config prefix: 'sqs', name: 'secret_access_key'
|
92
|
+
aws_config prefix: 'sqs', name: 'max_batch_size', default: 10
|
93
|
+
# @!endgroup AWS SQS Configurations
|
94
|
+
end
|
95
|
+
end
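
The lookup order documented in the `@note` above can be exercised in isolation. The following is only a sketch: `resolve_aws_setting` is a hypothetical helper that is not part of the gem, and its `env:` parameter is an assumption added so the example does not depend on the real process environment.

```ruby
# Hypothetical helper mirroring the documented order:
# explicit value, AWS_<PREFIX>_<NAME>, AWS_<NAME>, AWS_DEFAULT_<NAME>, default.
def resolve_aws_setting(prefix:, name:, explicit: nil, default: nil, env: ENV)
  explicit ||
    env["AWS_#{prefix.upcase}_#{name.upcase}"] ||
    env["AWS_#{name.upcase}"] ||
    env["AWS_DEFAULT_#{name.upcase}"] ||
    default
end

env = { 'AWS_DEFAULT_REGION' => 'eu-west-1' }

# Falls through to AWS_DEFAULT_REGION because nothing more specific is set.
resolve_aws_setting(prefix: 's3', name: 'region', default: 'us-east-1', env: env)
# => "eu-west-1"

# A prefix-specific variable wins over the shared ones.
resolve_aws_setting(prefix: 'sqs', name: 'region', env: env.merge('AWS_SQS_REGION' => 'us-west-2'))
# => "us-west-2"

# With nothing set, the declared default is used.
resolve_aws_setting(prefix: 's3', name: 'region', default: 'us-east-1', env: {})
# => "us-east-1"
```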
|
data/lib/derivative_rodeo/errors.rb
ADDED
@@ -0,0 +1,56 @@
|
|
1
|
+
# frozen_string_literal: true
|
2
|
+
module DerivativeRodeo
|
3
|
+
##
|
4
|
+
# A module namespace for establishing the possible errors that the {DerivativeRodeo} could raise.
|
5
|
+
# The rodeo could raise other errors, but these are the ones we've named.
|
6
|
+
module Errors
|
7
|
+
##
|
8
|
+
# That which all DerivativeRodeo errors shall extend!
|
9
|
+
class Error < StandardError; end
|
10
|
+
|
11
|
+
##
|
12
|
+
# Raised when a file uri is passed in that does not contain a storage adapter part before the ://
|
13
|
+
class StorageLocationMissing < Error
|
14
|
+
def initialize(file_uri: '')
|
15
|
+
super("#{file_uri} does not contain an adapter. Should look like file:///my_dir/myfile or s3://bucket_name/location/file_name. The part before the :// is used to select the storage adapter.") # rubocop:disable Layout/LineLength
|
16
|
+
end
|
17
|
+
end
|
18
|
+
|
19
|
+
##
|
20
|
+
# Raised when a storage adapter is called for but does not exist in the registered adapter list
|
21
|
+
class StorageLocationNotFoundError < Error
|
22
|
+
def initialize(location_name: '')
|
23
|
+
super("Could not find the adapter #{location_name}. Make sure it is required and registered properly.")
|
24
|
+
end
|
25
|
+
end
|
26
|
+
|
27
|
+
##
|
28
|
+
# Raised when a batch size is larger than the configured maximum SQS batch size
|
29
|
+
class MaxQueueSize < Error
|
30
|
+
def initialize(batch_size:)
|
31
|
+
super("Batch size #{batch_size} is larger than the max queue size #{DerivativeRodeo.config.aws_sqs_max_batch_size}")
|
32
|
+
end
|
33
|
+
end
|
34
|
+
|
35
|
+
##
|
36
|
+
# Raised when AWS bucket does not exist or is not accessible by current permissions
|
37
|
+
class BucketMissingError < Error
|
38
|
+
def initialize(file_uri: '')
|
39
|
+
super("Bucket part missing #{file_uri}")
|
40
|
+
end
|
41
|
+
end
|
42
|
+
|
43
|
+
##
|
44
|
+
# Raised when trying to write a tmp file that does not exist
|
45
|
+
class FileMissingError < Error
|
46
|
+
end
|
47
|
+
|
48
|
+
##
|
49
|
+
# Raised when a Generator class does not declare an extension for its output file
|
50
|
+
class ExtensionMissingError < Error
|
51
|
+
def initialize(klass: '')
|
52
|
+
super("Extension must be declared in the Generator class #{klass}")
|
53
|
+
end
|
54
|
+
end
|
55
|
+
end
|
56
|
+
end
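
Because every named error extends a single base class, a caller can rescue the whole family in one clause. The sketch below illustrates that design choice with a trimmed, hypothetical copy of the hierarchy (`ErrorsSketch`) so that it runs without the gem; `describe_failure` is likewise an invented helper.

```ruby
# Trimmed, hypothetical copy of the hierarchy above; NOT the gem's module.
module ErrorsSketch
  class Error < StandardError; end

  class ExtensionMissingError < Error
    def initialize(klass: '')
      super("Extension must be declared in the Generator class #{klass}")
    end
  end
end

# Hypothetical helper: runs the block and reports any error from the family.
def describe_failure
  yield
  'ok'
rescue ErrorsSketch::Error => e
  "rodeo error: #{e.message}"
end

puts describe_failure { raise ErrorsSketch::ExtensionMissingError.new(klass: 'HocrGenerator') }
# prints "rodeo error: Extension must be declared in the Generator class HocrGenerator"
```

Errors outside the family (for example an `ArgumentError`) are not swallowed by this rescue clause and propagate as usual.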
|