derivative-rodeo 0.2.0
- checksums.yaml +7 -0
- data/Gemfile +6 -0
- data/LICENSE +15 -0
- data/README.md +251 -0
- data/Rakefile +42 -0
- data/derivative_rodeo.gemspec +54 -0
- data/lib/derivative/rodeo.rb +3 -0
- data/lib/derivative-rodeo.rb +3 -0
- data/lib/derivative_rodeo/configuration.rb +95 -0
- data/lib/derivative_rodeo/errors.rb +56 -0
- data/lib/derivative_rodeo/generators/base_generator.rb +200 -0
- data/lib/derivative_rodeo/generators/concerns/copy_file_concern.rb +28 -0
- data/lib/derivative_rodeo/generators/copy_generator.rb +14 -0
- data/lib/derivative_rodeo/generators/hocr_generator.rb +112 -0
- data/lib/derivative_rodeo/generators/monochrome_generator.rb +39 -0
- data/lib/derivative_rodeo/generators/pdf_split_generator.rb +61 -0
- data/lib/derivative_rodeo/generators/thumbnail_generator.rb +38 -0
- data/lib/derivative_rodeo/generators/word_coordinates_generator.rb +39 -0
- data/lib/derivative_rodeo/services/base_service.rb +15 -0
- data/lib/derivative_rodeo/services/convert_uri_via_template_service.rb +87 -0
- data/lib/derivative_rodeo/services/extract_word_coordinates_from_hocr_sgml_service.rb +218 -0
- data/lib/derivative_rodeo/services/image_identify_service.rb +89 -0
- data/lib/derivative_rodeo/services/image_jp2_service.rb +112 -0
- data/lib/derivative_rodeo/services/image_service.rb +73 -0
- data/lib/derivative_rodeo/services/pdf_splitter/base.rb +177 -0
- data/lib/derivative_rodeo/services/pdf_splitter/jpg_page.rb +14 -0
- data/lib/derivative_rodeo/services/pdf_splitter/pages_summary.rb +130 -0
- data/lib/derivative_rodeo/services/pdf_splitter/png_page.rb +26 -0
- data/lib/derivative_rodeo/services/pdf_splitter/tiff_page.rb +52 -0
- data/lib/derivative_rodeo/services/pdf_splitter_service.rb +19 -0
- data/lib/derivative_rodeo/services/url_service.rb +42 -0
- data/lib/derivative_rodeo/storage_locations/base_location.rb +251 -0
- data/lib/derivative_rodeo/storage_locations/concerns/download_concern.rb +67 -0
- data/lib/derivative_rodeo/storage_locations/file_location.rb +39 -0
- data/lib/derivative_rodeo/storage_locations/http_location.rb +13 -0
- data/lib/derivative_rodeo/storage_locations/https_location.rb +13 -0
- data/lib/derivative_rodeo/storage_locations/s3_location.rb +103 -0
- data/lib/derivative_rodeo/storage_locations/sqs_location.rb +187 -0
- data/lib/derivative_rodeo/technical_metadata.rb +23 -0
- data/lib/derivative_rodeo/version.rb +5 -0
- data/lib/derivative_rodeo.rb +36 -0
- metadata +339 -0
checksums.yaml
ADDED
@@ -0,0 +1,7 @@
|
|
1
|
+
---
|
2
|
+
SHA256:
|
3
|
+
metadata.gz: 20e512d7162170875d60f90ff48bba694237ac7d31f38d806d2b87f570536c1c
|
4
|
+
data.tar.gz: a1311ea39a3994b4d24ffdbdbade62b7fbb15d2c326a3b510607f315fb4dd865
|
5
|
+
SHA512:
|
6
|
+
metadata.gz: 157a9e276c6cefe739137fbe17e783557d0317dcee531cd353ad86987bda33ad55c2ada8179e254b306a467d01f8f759a6e89fc91b8cbcf6e968cf5a28a9037b
|
7
|
+
data.tar.gz: 5bd45db467194cf1e8af7f7e1ed625c2b3898d011f20a581a9a55a2ccbb7be56ca8c276752b7bef41e2c1d5efa4737d1291f83211dd69cd18e4f7caeed25fef2
|
data/Gemfile
ADDED
data/LICENSE
ADDED
@@ -0,0 +1,15 @@
|
|
1
|
+
Copyright 2023 Software Services by Scientist.com
|
2
|
+
|
3
|
+
Additional copyright may be held by others, as reflected in the commit history.
|
4
|
+
|
5
|
+
Licensed under the Apache License, Version 2.0 (the "License");
|
6
|
+
you may not use this file except in compliance with the License.
|
7
|
+
You may obtain a copy of the License at
|
8
|
+
|
9
|
+
http://www.apache.org/licenses/LICENSE-2.0
|
10
|
+
|
11
|
+
Unless required by applicable law or agreed to in writing, software
|
12
|
+
distributed under the License is distributed on an "AS IS" BASIS,
|
13
|
+
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
14
|
+
See the License for the specific language governing permissions and
|
15
|
+
limitations under the License.
|
data/README.md
ADDED
@@ -0,0 +1,251 @@
|
|
1
|
+
<!-- START doctoc generated TOC please keep comment here to allow auto update -->
|
2
|
+
<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
|
3
|
+
**Table of Contents** *generated with [DocToc](https://github.com/thlorenz/doctoc)*
|
4
|
+
|
5
|
+
- [DerivativeRodeo](#derivativerodeo)
|
6
|
+
- [Process Life Cycle](#process-life-cycle)
|
7
|
+
- [Concepts](#concepts)
|
8
|
+
- [Common Storage](#common-storage)
|
9
|
+
- [Related Files](#related-files)
|
10
|
+
- [Sequence Diagram](#sequence-diagram)
|
11
|
+
- [Installation](#installation)
|
12
|
+
- [Usage](#usage)
|
13
|
+
- [Technical Overview of the DerivativeRodeo](#technical-overview-of-the-derivativerodeo)
|
14
|
+
- [Generators](#generators)
|
15
|
+
- [Interface(s)](#interfaces)
|
16
|
+
- [Supported Generators](#supported-generators)
|
17
|
+
- [Registered Generators](#registered-generators)
|
18
|
+
- [Storage Locations](#storage-locations)
|
19
|
+
- [Supported Storage Locations](#supported-storage-locations)
|
20
|
+
- [Development](#development)
|
21
|
+
- [Logging in Test Environment](#logging-in-test-environment)
|
22
|
+
- [Contributing](#contributing)
|
23
|
+
|
24
|
+
<!-- END doctoc generated TOC please keep comment here to allow auto update -->
|
25
|
+
|
26
|
+
# DerivativeRodeo
|
27
|
+
|
28
|
+
“This ain’t my first rodeo.” (an idiomatic American slang for “I’m prepared for what comes next.”)
|
29
|
+
|
30
|
+
The `DerivativeRodeo` "moves" files from one storage location (e.g. *input*) to one or more storage locations (e.g. *output*) via a generator.
|
31
|
+
|
32
|
+
- [Storage Location](./lib/derivative_rodeo/storage_locations/base_location.rb) :: where we can expect to find a file.
|
33
|
+
- [Generator](./lib/derivative_rodeo/generators/base_generator.rb) :: a process to transform a file into another file.
|
34
|
+
|
35
|
+
## Process Life Cycle
|
36
|
+
|
37
|
+
In the case of an *input* storage location, we expect that the underlying file pointed at by the *input* storage location exists. After all, we can't move what we don't have.
|
38
|
+
|
39
|
+
In the case of an *output* storage location, we expect that the underlying file will exist after the generator has completed. The file at the *output* storage location *could* already exist, or we might need to generate it.
|
40
|
+
|
41
|
+
During the generator's process, we need to have a working copy of both the *input* and *output* file. This is done by creating a temporary file.
|
42
|
+
|
43
|
+
In the case of the *input*, the creation of that temporary file involves getting the file from the *input* storage location. In the case of the *output*, we create a temporary file that the *output* storage location then knows how to move to the resulting place.
|
44
|
+
|
45
|
+

|
46
|
+
|
47
|
+
The above Storage Lifecycle diagram is as follows: `input location` to `input tmp file` to `generator` to `output tmp file` to `output location`.
|
48
|
+
|
49
|
+
*Note:* We've designed and implemented the data life cycle to automatically clean up the temporary files as the generator completes. In this way we can use the smallest working space possible, a design decision that helps run `DerivativeRodeo` within distributed clusters (e.g. AWS Serverless).
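The life cycle described above can be sketched in plain Ruby. This is a minimal illustration that uses only the standard library, not the gem's implementation; `with_working_copies` is a hypothetical helper name, and local file paths stand in for storage locations.

```ruby
require 'tempfile'
require 'fileutils'

# Hypothetical helper (not part of DerivativeRodeo): copies the input to a
# temporary file, yields both temporary paths to a "generator" block, then
# copies the result to the output path. Both temporary files are removed in
# the ensure clause, whether or not the block raised.
def with_working_copies(input_path, output_path)
  input_tmp = Tempfile.new('input')
  output_tmp = Tempfile.new('output')
  input_tmp.close
  output_tmp.close
  FileUtils.cp(input_path, input_tmp.path)    # input location -> input tmp file
  yield input_tmp.path, output_tmp.path       # generator
  FileUtils.cp(output_tmp.path, output_path)  # output tmp file -> output location
  output_path
ensure
  [input_tmp, output_tmp].each { |tmp| tmp&.unlink }
end
```

A caller would pass a block that reads the first path and writes the second, for example `with_working_copies(input, output) { |i, o| File.write(o, File.read(i).upcase) }`. Keeping the cleanup in `ensure` is what keeps the working space small even when a generator fails.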
|
50
|
+
|
51
|
+
## Concepts
|
52
|
+
|
53
|
+

|
54
|
+
|
55
|
+
<details>
|
56
|
+
<summary>The PlantUML Text for the Overview Diagram</summary>
|
57
|
+
|
58
|
+
```plantuml
|
59
|
+
@startuml
|
60
|
+
!theme amiga
|
61
|
+
|
62
|
+
cloud "Source 1" as S1
|
63
|
+
cloud "Source 2" as S2
|
64
|
+
cloud "Source 3" as S3
|
65
|
+
|
66
|
+
storage "IMAGEs" as IMAGEs
|
67
|
+
storage "HOCRs" as HOCRs
|
68
|
+
storage "TXTs" as TXTs
|
69
|
+
|
70
|
+
control Preprocess as G1
|
71
|
+
|
72
|
+
S1 -down-> G1
|
73
|
+
S2 -down-> G1
|
74
|
+
S3 -down-> G1
|
75
|
+
|
76
|
+
G1 -down-> IMAGEs
|
77
|
+
G1 -down-> HOCRs
|
78
|
+
G1 -down-> TXTs
|
79
|
+
|
80
|
+
control Import as I1
|
81
|
+
|
82
|
+
IMAGEs -down-> I1
|
83
|
+
HOCRs -down-> I1
|
84
|
+
TXTs -down-> I1
|
85
|
+
|
86
|
+
package FileSet as FileSet1 {
|
87
|
+
file Image1
|
88
|
+
file Hocr1
|
89
|
+
file Txt1
|
90
|
+
}
|
91
|
+
package FileSet as FileSet2 {
|
92
|
+
file Image2
|
93
|
+
file Hocr2
|
94
|
+
file Txt2
|
95
|
+
}
|
96
|
+
|
97
|
+
I1 -down-> FileSet1
|
98
|
+
I1 -down-> FileSet2
|
99
|
+
|
100
|
+
@enduml
|
101
|
+
|
102
|
+
```
|
103
|
+
|
104
|
+
</details>
|
105
|
+
|
106
|
+
### Common Storage
|
107
|
+
|
108
|
+
In this case, <dfn>common storage</dfn> could mean the storage where we're writing all pre-processing of files. Or it could mean the storage where we're writing for application access (e.g. [Fedora Commons](https://fedora.lyrasis.org) for a [Hyrax](https://github.com/samvera/hyrax) application).
|
109
|
+
|
110
|
+
In other words, the `DerivativeRodeo` is part of moving files from one location to another, and ensuring that at each step we have all of the expected files we want.
|
111
|
+
|
112
|
+
### Related Files
|
113
|
+
|
114
|
+
This is not strictly related to <dfn>Hyrax's FileSet</dfn>, which is a set of files in which one is considered the original and all others are _derivatives_ of the original.
|
115
|
+
|
116
|
+
However, it is helpful to think in those terms: files that have a significant relation to each other, one derived from the other. For example, an original PDF and its extracted text would be two significantly related files.
|
117
|
+
|
118
|
+
### Sequence Diagram
|
119
|
+
|
120
|
+

|
121
|
+
|
122
|
+
<details>
|
123
|
+
<summary>The PlantUML Text for the Sequence Diagram</summary>
|
124
|
+
|
125
|
+
```plantuml
|
126
|
+
@startuml
|
127
|
+
!theme amiga
|
128
|
+
|
129
|
+
actor Instigator
|
130
|
+
database S3
|
131
|
+
control AWS
|
132
|
+
queue SQS
|
133
|
+
control SpaceStone
|
134
|
+
control DerivativeRodeo
|
135
|
+
collections From
|
136
|
+
collections To
|
137
|
+
Instigator -> S3 : "Upload bucket\nof files associated\n with FileSet"
|
138
|
+
S3 -> AWS : "AWS enqueues\nthe bucket"
|
139
|
+
AWS -> SQS : "AWS adds to SQS"
|
140
|
+
SQS -> SpaceStone : "SQS invokes\nSpaceStone method"
|
141
|
+
SpaceStone -> DerivativeRodeo : "SpaceStone calls\n DerivativeRodeo"
|
142
|
+
DerivativeRodeo --> S3 : "Request file for\ntemporary processing"
|
143
|
+
S3 --> From : "Write requested\n file to\ntemporary storage"
|
144
|
+
DerivativeRodeo <-- From
|
145
|
+
DerivativeRodeo -> To : "Generate derivative\n writing to local\n processing storage."
|
146
|
+
To --> S3 : "Write file\n to S3 Bucket"
|
147
|
+
DerivativeRodeo <-- To : "Return to DerivativeRodeo\n with generated URIs"
|
148
|
+
SpaceStone <- DerivativeRodeo : "Return generated\n URIs"
|
149
|
+
SpaceStone -> SQS : "Optionally enqueue\nfurther work"
|
150
|
+
@enduml
|
151
|
+
```
|
152
|
+
</details>
|
153
|
+
|
154
|
+
Given a single original file in a previous home, we are copying that original file (and derivatives) to various locations:
|
155
|
+
|
156
|
+
- From previous home to S3.
|
157
|
+
- From S3 to local temporary storage (for processing).
|
158
|
+
- Create a derivative temporary file based on existing file.
|
159
|
+
- Copying derivative temporary file to S3.
|
160
|
+
|
161
|
+
## Installation
|
162
|
+
|
163
|
+
Add this line to your application's Gemfile:
|
164
|
+
|
165
|
+
```ruby
|
166
|
+
gem 'derivative-rodeo'
|
167
|
+
```
|
168
|
+
|
169
|
+
(For historical reasons the gem name is `derivative-rodeo` even though the repository is `derivative_rodeo`.) Any of the following `require` statements will work:
|
170
|
+
|
171
|
+
- `require 'derivative_rodeo'`
|
172
|
+
- `require 'derivative-rodeo'`
|
173
|
+
- `require 'derivative/rodeo'`
|
174
|
+
|
175
|
+
And then execute: `$ bundle install`
|
176
|
+
|
177
|
+
Be aware that you need the `pdfinfo` command line tool installed to run this gem's specs or to use its PDF functionality.
|
178
|
+
|
179
|
+
## Usage
|
180
|
+
|
181
|
+
TODO
|
182
|
+
|
183
|
+
## Technical Overview of the DerivativeRodeo
|
184
|
+
|
185
|
+
### Generators
|
186
|
+
|
187
|
+
Generators are responsible for ensuring that we have the file associated with the generator. For example, the [HocrGenerator](./lib/derivative_rodeo/generators/hocr_generator.rb) is responsible for ensuring that we have the `.hocr` file in the expected desired storage location.
|
188
|
+
|
189
|
+
#### Interface(s)
|
190
|
+
|
191
|
+
Generators must provide an initializer and the following build methods:
|
192
|
+
|
193
|
+
- `.new(array_of_file_urls, output_location_template, preprocessed_location_template)`
|
194
|
+
- `#generated_files` executes the generator's actions and returns an array of files
|
195
|
+
- `#generated_uris` executes the generator's actions and returns an array of output URIs
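The shape of that interface can be sketched as follows. This is a standalone illustration under stated assumptions, not one of the gem's generators: `SketchCopyGenerator` is a hypothetical class, it works on local paths rather than storage locations, the `%{basename}` placeholder is an invented template syntax, and the third argument is accepted only to mirror the signature above.

```ruby
require 'fileutils'

# Hypothetical class for illustration only; it does not inherit from the
# gem's base generator and only mirrors the interface described above.
class SketchCopyGenerator
  def initialize(array_of_file_urls, output_location_template, preprocessed_location_template = nil)
    @input_paths = array_of_file_urls
    @output_location_template = output_location_template
    # Accepted to mirror the documented signature; unused in this sketch.
    @preprocessed_location_template = preprocessed_location_template
  end

  # Executes the generator's action (here, a plain copy) and returns the
  # output paths. A file that already exists at the output is left alone.
  def generated_files
    @generated_files ||= @input_paths.map do |input_path|
      output_path = format(@output_location_template, basename: File.basename(input_path))
      FileUtils.mkdir_p(File.dirname(output_path))
      FileUtils.cp(input_path, output_path) unless File.exist?(output_path)
      output_path
    end
  end

  # Same work as #generated_files, reported as file:// URIs.
  def generated_uris
    generated_files.map { |path| "file://#{path}" }
  end
end
```

Memoizing `generated_files` means calling both methods performs the work once, which matches the idea that either method "executes" the generator.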
|
196
|
+
|
197
|
+
#### Supported Generators
|
198
|
+
|
199
|
+
Below is the current list of generators.
|
200
|
+
|
201
|
+
- [HocrGenerator](./lib/derivative_rodeo/generators/hocr_generator.rb) :: generates tesseract files from images; also creates monochrome files as a prestep
|
202
|
+
- [MonochromeGenerator](./lib/derivative_rodeo/generators/monochrome_generator.rb) :: converts images to monochrome
|
203
|
+
- [CopyGenerator](./lib/derivative_rodeo/generators/copy_generator.rb) :: sends a set of uris to another location. For example from <abbr title="Simple Storage Service">S3</abbr> to <abbr title="Simple Queue Service">SQS</abbr> or from filesystem to S3.
|
204
|
+
- [PdfSplitGenerator](./lib/derivative_rodeo/generators/pdf_split_generator.rb) :: split a PDF into one image per page
|
205
|
+
- [WordCoordinatesGenerator](./lib/derivative_rodeo/generators/word_coordinates_generator.rb) :: create a JSON file representing the words and coordinates (derived from the `.hocr` file).
|
206
|
+
|
207
|
+
#### Registered Generators
|
208
|
+
|
209
|
+
TODO: We want to expose a list of registered generators
|
210
|
+
|
211
|
+
### Storage Locations
|
212
|
+
|
213
|
+
Storage locations are where we put things. Each location has a specific implementation but is expected to inherit from the [DerivativeRodeo::StorageLocations::BaseLocation](./lib/derivative_rodeo/storage_locations/base_location.rb).
|
214
|
+
|
215
|
+
The `DerivativeRodeo::StorageLocations::BaseLocation.locations` method tracks the registered locations.
|
216
|
+
|
217
|
+
The location represents where the file *should* be.
|
218
|
+
|
219
|
+
#### Supported Storage Locations
|
220
|
+
|
221
|
+
Storage locations follow a [URI pattern](https://en.wikipedia.org/wiki/Uniform_Resource_Identifier#Example_URIs)
|
222
|
+
|
223
|
+
- `file://` :: “local” file system storage
|
224
|
+
- `s3://` :: <abbr title="Amazon Web Service">AWS</abbr>’s <abbr title="Simple Storage Service">S3</abbr> storage system
|
225
|
+
- `sqs://` :: <abbr title="Amazon Web Service">AWS</abbr>’s <abbr title="Simple Queue Service">SQS</abbr>
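As a rough sketch of how a URI scheme can select a storage location: the part before `://` is looked up in a registry. The `LOCATION_NAMES` hash and `location_name_for` method below are hypothetical, and the class names `FileLocation` and `SqsLocation` are inferred from the gem's file names rather than confirmed here.

```ruby
# Hypothetical registry for illustration; the gem keeps its own list of
# registered locations.
LOCATION_NAMES = {
  'file' => 'FileLocation',
  's3' => 'S3Location',
  'sqs' => 'SqsLocation'
}.freeze

# Returns the name of the location class that would handle the given URI.
def location_name_for(file_uri)
  scheme, separator, _rest = file_uri.partition('://')
  raise ArgumentError, "#{file_uri} does not contain a scheme before ://" if separator.empty?

  LOCATION_NAMES.fetch(scheme) { raise KeyError, "no location registered for #{scheme}" }
end
```

For example, `location_name_for('s3://bucket_name/location/file_name')` looks up the `s3` entry, while a string without `://` raises an `ArgumentError`.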
|
226
|
+
|
227
|
+
## Development
|
228
|
+
|
229
|
+
- Check out the repository: `git clone https://github.com/scientist-softserv/derivative_rodeo`
|
230
|
+
- Install dependencies: `cd derivative_rodeo; bundle install`
|
231
|
+
- Install git hooks: `rake install_hooks`
|
232
|
+
- Install binaries:
|
233
|
+
- `pdfinfo`: provided by poppler (e.g. `brew install poppler`)
|
234
|
+
- GhostScript (e.g. `gs`): run `brew install gs`
|
235
|
+
|
236
|
+
Then go about writing your code and documentation.
|
237
|
+
|
238
|
+
The git hooks call `rake default` which will:
|
239
|
+
|
240
|
+
- Amend the table of contents of this file
|
241
|
+
- Run `rubocop`
|
242
|
+
- Validate yard documentation (see http://rubydoc.info/gems/yard/file/docs/Tags.md#List_of_Available_Tags for help correcting warnings)
|
243
|
+
- Run `rspec` with `simplecov`
|
244
|
+
|
245
|
+
### Logging in Test Environment
|
246
|
+
|
247
|
+
Throughout the `DerivativeRodeo` we log some activity. In a typical test run those logs would be overly chatty, so they are minimized by default. If you want the more chatty logs, run the following: `DEBUG=t rspec`.
|
248
|
+
|
249
|
+
## Contributing
|
250
|
+
|
251
|
+
Bug reports and pull requests are welcome on GitHub at https://github.com/scientist-softserv/derivative_rodeo.
|
data/Rakefile
ADDED
@@ -0,0 +1,42 @@
|
|
1
|
+
# frozen_string_literal: true
|
2
|
+
|
3
|
+
require 'bundler/gem_tasks'
|
4
|
+
require 'rspec/core/rake_task'
|
5
|
+
require 'rubocop/rake_task'
|
6
|
+
|
7
|
+
desc 'Run style checker'
|
8
|
+
RuboCop::RakeTask.new(:rubocop) do |task|
|
9
|
+
task.fail_on_error = true
|
10
|
+
end
|
11
|
+
|
12
|
+
desc 'Install commit hooks to ensure better practices'
|
13
|
+
task :install_hooks do
|
14
|
+
require 'fileutils'
|
15
|
+
Dir.glob('./git-hooks/*').each do |hook|
|
16
|
+
next if File.file?("./.git/hooks/#{File.basename(hook)}")
|
17
|
+
|
18
|
+
puts "Installing #{File.basename(hook)} git hook"
|
19
|
+
FileUtils.cp(hook, './.git/hooks/')
|
20
|
+
end
|
21
|
+
end
|
22
|
+
|
23
|
+
require 'yard'
|
24
|
+
YARD::Rake::YardocTask.new do |t|
|
25
|
+
t.options = ['--fail-on-warning']
|
26
|
+
end
|
27
|
+
|
28
|
+
RSpec::Core::RakeTask.new(:spec)
|
29
|
+
|
30
|
+
desc 'Generate table of contents for README.md'
|
31
|
+
task :doctoc do
|
32
|
+
if `which doctoc`.strip.empty?
|
33
|
+
$stdout.puts 'Skipping doctoc generation; install via "npm install -g doctoc"'
|
34
|
+
else
|
35
|
+
$stdout.puts 'Generating table of contents for README.md'
|
36
|
+
`doctoc README.md`
|
37
|
+
end
|
38
|
+
end
|
39
|
+
|
40
|
+
task ci: %i[doctoc rubocop yard spec]
|
41
|
+
|
42
|
+
task default: %i[ci]
|
data/derivative_rodeo.gemspec
ADDED
@@ -0,0 +1,54 @@
|
|
1
|
+
# frozen_string_literal: true
|
2
|
+
|
3
|
+
require_relative 'lib/derivative_rodeo/version'
|
4
|
+
|
5
|
+
Gem::Specification.new do |spec|
|
6
|
+
# Renaming to reflect that we previously registered 'derivative-rodeo' and Rubygems guards against
|
7
|
+
# names that are close in resemblance.
|
8
|
+
spec.name = 'derivative-rodeo'
|
9
|
+
spec.version = DerivativeRodeo::VERSION
|
10
|
+
spec.authors = ['Rob Kaufman', 'Jeremy Friesen']
|
11
|
+
spec.email = ['rob@notch8.com', 'jeremy.n.friesen@gmail.com']
|
12
|
+
|
13
|
+
spec.summary = 'An ETL Ecosystem for Derivative Processing.'
|
14
|
+
spec.description = spec.summary
|
15
|
+
spec.homepage = 'https://github.com/scientist-softserv/derivative_rodeo'
|
16
|
+
spec.required_ruby_version = '>= 2.7.0'
|
17
|
+
spec.licenses = ['APACHE-2.0']
|
18
|
+
|
19
|
+
spec.metadata['homepage_uri'] = spec.homepage
|
20
|
+
spec.metadata['source_code_uri'] = spec.homepage
|
21
|
+
|
22
|
+
# Specify which files should be added to the gem when it is released.
|
23
|
+
# The `git ls-files -z` loads the files in the RubyGem that have been added into git.
|
24
|
+
# spec.files = Dir.chdir(__dir__) do
|
25
|
+
# `git ls-files -z`.split("\x0").reject do |f|
|
26
|
+
# (f == __FILE__) || f.match(%r{\A(?:(?:bin|test|spec|features)/|\.(?:git|travis|circleci)|appveyor)})
|
27
|
+
# end
|
28
|
+
# end
|
29
|
+
spec.files = Dir['lib/**/*'].keep_if { |file| File.file?(file) } + %w[Gemfile LICENSE README.md Rakefile derivative_rodeo.gemspec]
|
30
|
+
spec.bindir = 'exe'
|
31
|
+
spec.executables = spec.files.grep(%r{\Aexe/}) { |f| File.basename(f) }
|
32
|
+
spec.require_paths = ['lib']
|
33
|
+
|
34
|
+
spec.add_dependency 'activesupport', '>= 5'
|
35
|
+
spec.add_dependency 'aws-sdk-s3'
|
36
|
+
spec.add_dependency 'aws-sdk-sqs'
|
37
|
+
spec.add_dependency 'httparty'
|
38
|
+
spec.add_dependency 'marcel'
|
39
|
+
spec.add_dependency 'mime-types'
|
40
|
+
spec.add_dependency 'mini_magick'
|
41
|
+
spec.add_dependency 'nokogiri'
|
42
|
+
|
43
|
+
spec.add_development_dependency 'bixby'
|
44
|
+
spec.add_development_dependency 'byebug'
|
45
|
+
# spec.add_development_dependency 'hydra-file_characterization'
|
46
|
+
spec.add_development_dependency 'rspec', '~> 3.0'
|
47
|
+
spec.add_development_dependency 'rake', '~> 13.0'
|
48
|
+
spec.add_development_dependency 'simplecov'
|
49
|
+
spec.add_development_dependency 'yard-activerecord'
|
50
|
+
spec.add_development_dependency 'rspec-its'
|
51
|
+
spec.add_development_dependency 'shoulda-matchers'
|
52
|
+
spec.add_development_dependency 'solargraph'
|
53
|
+
spec.add_development_dependency 'yard'
|
54
|
+
end
|
data/lib/derivative_rodeo/configuration.rb
ADDED
@@ -0,0 +1,95 @@
|
|
1
|
+
# frozen_string_literal: true
|
2
|
+
|
3
|
+
require 'mime/types'
|
4
|
+
require 'logger'
|
5
|
+
module DerivativeRodeo
|
6
|
+
##
|
7
|
+
# @api public
|
8
|
+
#
|
9
|
+
# This class is responsible for the consistent configuration of the "application" that leverages
|
10
|
+
# the {DerivativeRodeo}.
|
11
|
+
#
|
12
|
+
# This configuration helps set defaults for storage locations and generators.
|
13
|
+
class Configuration
|
14
|
+
##
|
15
|
+
# Allows AWS configuration to be set via environment variables by declaring them in the configuration
|
16
|
+
# class as follows:
|
17
|
+
#
|
18
|
+
# @example
|
19
|
+
#
|
20
|
+
# aws_config prefix: 's3', name: 'region', default: 'us-east-1'
|
21
|
+
#
|
22
|
+
# @param prefix [String]
|
23
|
+
# @param name [String]
|
24
|
+
# @param default [String] (optional)
|
25
|
+
def self.aws_config(prefix:, name:, default: nil)
|
26
|
+
aws_config_getter(prefix: prefix, name: name, default: default)
|
27
|
+
aws_config_setter(prefix: prefix, name: name)
|
28
|
+
end
|
29
|
+
|
30
|
+
def self.aws_config_getter(prefix:, name:, default: nil)
|
31
|
+
define_method "aws_#{prefix}_#{name}" do
|
32
|
+
val = instance_variable_get("@aws_#{prefix}_#{name}")
|
33
|
+
return val if val
|
34
|
+
|
35
|
+
val = ENV["AWS_#{prefix.upcase}_#{name.upcase}"] ||
|
36
|
+
ENV["AWS_#{name.upcase}"] ||
|
37
|
+
ENV["AWS_DEFAULT_#{name.upcase}"] ||
|
38
|
+
default
|
39
|
+
instance_variable_set("@aws_#{prefix}_#{name}", val)
|
40
|
+
end
|
41
|
+
end
|
42
|
+
private_class_method :aws_config_getter
|
43
|
+
|
44
|
+
def self.aws_config_setter(prefix:, name:)
|
45
|
+
define_method "aws_#{prefix}_#{name}=" do |val|
|
46
|
+
instance_variable_set("@aws_#{prefix}_#{name}", val)
|
47
|
+
end
|
48
|
+
end
|
49
|
+
private_class_method :aws_config_setter
|
50
|
+
|
51
|
+
def initialize
|
52
|
+
@logger = if defined?(Rails)
|
53
|
+
Rails.logger
|
54
|
+
else
|
55
|
+
# By default, minimize the chatter of the specs. Add ENV['DEBUG'] to expose the
|
56
|
+
# chatter.
|
57
|
+
Logger.new($stderr, level: Logger::FATAL)
|
58
|
+
end
|
59
|
+
yield self if block_given?
|
60
|
+
end
|
61
|
+
|
62
|
+
##
|
63
|
+
# @return [Logger]
|
64
|
+
attr_accessor :logger
|
65
|
+
|
66
|
+
##
|
67
|
+
# @!group AWS S3 Configuration
|
68
|
+
#
|
69
|
+
# Various AWS items for {StorageLocations::S3Location}. These can be set from the ENV or the configuration block
|
70
|
+
#
|
71
|
+
# @note
|
72
|
+
#
|
73
|
+
# The order we use is:
|
74
|
+
# * `config.aws_s3_<variable_name> = value`
|
75
|
+
# * `AWS_S3_<variable_name>`
|
76
|
+
# * `AWS_<variable_name>`
|
77
|
+
# * `AWS_DEFAULT_<variable_name>`
|
78
|
+
# * default
|
79
|
+
#
|
80
|
+
# @return [String]
|
81
|
+
|
82
|
+
aws_config prefix: 's3', name: 'region', default: 'us-east-1'
|
83
|
+
aws_config prefix: 's3', name: 'bucket'
|
84
|
+
aws_config prefix: 's3', name: 'access_key_id'
|
85
|
+
aws_config prefix: 's3', name: 'secret_access_key'
|
86
|
+
|
87
|
+
aws_config prefix: 'sqs', name: 'region', default: 'us-east-1'
|
88
|
+
aws_config prefix: 'sqs', name: 'queue'
|
89
|
+
aws_config prefix: 'sqs', name: 'account_id'
|
90
|
+
aws_config prefix: 'sqs', name: 'access_key_id'
|
91
|
+
aws_config prefix: 'sqs', name: 'secret_access_key'
|
92
|
+
aws_config prefix: 'sqs', name: 'max_batch_size', default: 10
|
93
|
+
# @!endgroup AWS SQS Configurations
|
94
|
+
end
|
95
|
+
end
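The lookup order documented in the `@note` above can be sketched as a standalone method. `resolve_aws_setting` is a hypothetical helper written for illustration, not part of the gem; it accepts any hash-like `env` so the order can be exercised without touching the real environment.

```ruby
# Hypothetical helper mirroring the documented order: explicit value, then
# AWS_<PREFIX>_<NAME>, then AWS_<NAME>, then AWS_DEFAULT_<NAME>, then default.
def resolve_aws_setting(prefix:, name:, explicit: nil, default: nil, env: ENV)
  explicit ||
    env["AWS_#{prefix.upcase}_#{name.upcase}"] ||
    env["AWS_#{name.upcase}"] ||
    env["AWS_DEFAULT_#{name.upcase}"] ||
    default
end
```

With an empty environment, `resolve_aws_setting(prefix: 's3', name: 'region', default: 'us-east-1', env: {})` falls through to the default; setting `AWS_S3_REGION` in `env` would take precedence over `AWS_REGION`.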
|
data/lib/derivative_rodeo/errors.rb
ADDED
@@ -0,0 +1,56 @@
|
|
1
|
+
# frozen_string_literal: true
|
2
|
+
module DerivativeRodeo
|
3
|
+
##
|
4
|
+
# A module namespace for establishing the possible errors that the {DerivativeRodeo} could raise.
|
5
|
+
# The rodeo could raise other errors, but these are the ones we've named.
|
6
|
+
module Errors
|
7
|
+
##
|
8
|
+
# That which all DerivativeRodeo errors shall extend!
|
9
|
+
class Error < StandardError; end
|
10
|
+
|
11
|
+
##
|
12
|
+
# Raised when a file uri is passed in that does not contain a storage adapter part before the ://
|
13
|
+
class StorageLocationMissing < Error
|
14
|
+
def initialize(file_uri: '')
|
15
|
+
super("#{file_uri} does not contain an adapter. Should look like file:///my_dir/myfile or s3://bucket_name/location/file_name. The part before the :// is used to select the storage adapter.") # rubocop:disable Layout/LineLength
|
16
|
+
end
|
17
|
+
end
|
18
|
+
|
19
|
+
##
|
20
|
+
# Raised when a storage adapter is called for but does not exist in the registered adapter list
|
21
|
+
class StorageLocationNotFoundError < Error
|
22
|
+
def initialize(location_name: '')
|
23
|
+
super("Could not find the adapter #{location_name}. Make sure it is required and registered properly.")
|
24
|
+
end
|
25
|
+
end
|
26
|
+
|
27
|
+
##
|
28
|
+
# Raised when a batch size is larger than the configured maximum SQS batch size
|
29
|
+
class MaxQueueSize < Error
|
30
|
+
def initialize(batch_size:)
|
31
|
+
super("Batch size #{batch_size} is larger than the max queue size #{DerivativeRodeo.config.aws_sqs_max_batch_size}")
|
32
|
+
end
|
33
|
+
end
|
34
|
+
|
35
|
+
##
|
36
|
+
# Raised when AWS bucket does not exist or is not accessible by current permissions
|
37
|
+
class BucketMissingError < Error
|
38
|
+
def initialize(file_uri: '')
|
39
|
+
super("Bucket part missing #{file_uri}")
|
40
|
+
end
|
41
|
+
end
|
42
|
+
|
43
|
+
##
|
44
|
+
# Raised when trying to write a tmp file that does not exist
|
45
|
+
class FileMissingError < Error
|
46
|
+
end
|
47
|
+
|
48
|
+
##
|
49
|
+
# Raised because the Generator class must declare an extension for the output file
|
50
|
+
class ExtensionMissingError < Error
|
51
|
+
def initialize(klass: '')
|
52
|
+
super("Extension must be declared in the Generator class #{klass}")
|
53
|
+
end
|
54
|
+
end
|
55
|
+
end
|
56
|
+
end
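Because every named error extends a single base class, a caller can rescue that base class to handle any of them. The sketch below re-creates a small part of the hierarchy under a hypothetical `SketchRodeo` namespace so that it runs on its own; `describe_failure` is likewise an invented helper.

```ruby
# A standalone copy of part of the hierarchy, under a hypothetical namespace.
module SketchRodeo
  module Errors
    class Error < StandardError; end

    class ExtensionMissingError < Error
      def initialize(klass: '')
        super("Extension must be declared in the Generator class #{klass}")
      end
    end
  end
end

# Runs the block and reports any error from the hierarchy as a string.
def describe_failure
  yield
  'ok'
rescue SketchRodeo::Errors::Error => e
  "rodeo error: #{e.message}"
end
```

Errors outside the hierarchy, such as an `ArgumentError`, are not rescued by `describe_failure` and propagate to the caller.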
|