trove 0.1.0
- checksums.yaml +7 -0
- data/CHANGELOG.md +3 -0
- data/LICENSE.txt +21 -0
- data/README.md +237 -0
- data/exe/trove +14 -0
- data/lib/trove.rb +162 -0
- data/lib/trove/cli.rb +113 -0
- data/lib/trove/storage/s3.rb +120 -0
- data/lib/trove/utils.rb +33 -0
- data/lib/trove/version.rb +3 -0
- metadata +122 -0
checksums.yaml
ADDED
@@ -0,0 +1,7 @@
+---
+SHA256:
+  metadata.gz: a501c7fded530b2dc537d412eae735c19cfa25f04e2e21dd8d6861ae42b2a271
+  data.tar.gz: 0cf407da8dce52a446e053a7e03791622ecd303a6514bdf551ce12bd31ebe788
+SHA512:
+  metadata.gz: 4ad4fa70071eb4745046938a9b042b7df87eaa85964bc4d9a71fa90837335e40d8195f9de7ea27d70963c83a4facd163a1542527f7f88a5ffeb6aec8b4b71649
+  data.tar.gz: 73b4037bf7a50b498ef6d6bb640f2d62e237c7a1effba0aa298c9a54cf8d56edbd6efd539bdfc0768e1e6e1c85d3e8d2f524c60f5b996469881b6d67a66f9d1d
data/CHANGELOG.md
ADDED
data/LICENSE.txt
ADDED
@@ -0,0 +1,21 @@
+The MIT License (MIT)
+
+Copyright (c) 2020 Andrew Kane
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in
+all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+THE SOFTWARE.
data/README.md
ADDED
|
@@ -0,0 +1,237 @@
|
|
|
1
|
+
# Trove
|
|
2
|
+
|
|
3
|
+
:fire: Deploy machine learning models in Ruby (and Rails)
|
|
4
|
+
|
|
5
|
+
Works great with [XGBoost](https://github.com/ankane/xgboost), [Torch.rb](https://github.com/ankane/torch.rb), [fastText](https://github.com/ankane/fastText), and many other gems
|
|
6
|
+
|
|
7
|
+
## Installation
|
|
8
|
+
|
|
9
|
+
Add this line to your application’s Gemfile:
|
|
10
|
+
|
|
11
|
+
```ruby
|
|
12
|
+
gem 'trove'
|
|
13
|
+
```
|
|
14
|
+
|
|
15
|
+
And run:
|
|
16
|
+
|
|
17
|
+
```sh
|
|
18
|
+
bundle install
|
|
19
|
+
trove init
|
|
20
|
+
```
|
|
21
|
+
|
|
22
|
+
And [configure your storage](#storage) in `.trove.yml`.
|
|
23
|
+
|
|
24
|
+
## Storage
|
|
25
|
+
|
|
26
|
+
### Amazon S3
|
|
27
|
+
|
|
28
|
+
Create a bucket and enable object versioning.
|
|
29
|
+
|
|
30
|
+
Next, set up your AWS credentials. You can use the [AWS CLI](https://github.com/aws/aws-cli):
|
|
31
|
+
|
|
32
|
+
```sh
|
|
33
|
+
pip install awscli
|
|
34
|
+
aws configure
|
|
35
|
+
```
|
|
36
|
+
|
|
37
|
+
Or environment variables:
|
|
38
|
+
|
|
39
|
+
```sh
|
|
40
|
+
export AWS_ACCESS_KEY_ID=...
|
|
41
|
+
export AWS_SECRET_ACCESS_KEY=...
|
|
42
|
+
export AWS_REGION=...
|
|
43
|
+
```
|
|
44
|
+
|
|
45
|
+
IAM users need:
|
|
46
|
+
|
|
47
|
+
- `s3:GetObject` and `s3:GetObjectVersion` to pull files
|
|
48
|
+
- `s3:PutObject` to push files
|
|
49
|
+
- `s3:ListBucket` and `s3:ListBucketVersions` to list files and versions
|
|
50
|
+
- `s3:DeleteObject` and `s3:DeleteObjectVersion` to delete files
|
|
51
|
+
|
|
52
|
+
Here’s an example policy:
|
|
53
|
+
|
|
54
|
+
```json
|
|
55
|
+
{
|
|
56
|
+
"Version": "2012-10-17",
|
|
57
|
+
"Statement": [
|
|
58
|
+
{
|
|
59
|
+
"Sid": "Trove",
|
|
60
|
+
"Effect": "Allow",
|
|
61
|
+
"Action": [
|
|
62
|
+
"s3:GetObject",
|
|
63
|
+
"s3:GetObjectVersion",
|
|
64
|
+
"s3:PutObject",
|
|
65
|
+
"s3:ListBucket",
|
|
66
|
+
"s3:ListBucketVersions",
|
|
67
|
+
"s3:DeleteObject",
|
|
68
|
+
"s3:DeleteObjectVersion"
|
|
69
|
+
],
|
|
70
|
+
"Resource": [
|
|
71
|
+
"arn:aws:s3:::my-bucket",
|
|
72
|
+
"arn:aws:s3:::my-bucket/trove/*"
|
|
73
|
+
]
|
|
74
|
+
}
|
|
75
|
+
]
|
|
76
|
+
}
|
|
77
|
+
```
|
|
78
|
+
|
|
79
|
+
If your production servers only need to pull files, only give them `s3:GetObject` and `s3:GetObjectVersion` permissions.
|
|
80
|
+
|
|
81
|
+
## How It Works
|
|
82
|
+
|
|
83
|
+
Git is great for code, but it’s not ideal for large files like models. Instead, we use an object store like Amazon S3 to store and version them.
|
|
84
|
+
|
|
85
|
+
Trove creates a `trove` directory for you to use as a workspace. Files in the directory are ignored by Git, but can be pushed to and pulled from the object store. By default, files are tracked in `.trove.yml` to make it easy to deploy specific versions with code changes.
|
|
86
|
+
|
|
87
|
+
## Getting Started
|
|
88
|
+
|
|
89
|
+
Use the `trove` directory to save and load models.
|
|
90
|
+
|
|
91
|
+
```ruby
|
|
92
|
+
# training code
|
|
93
|
+
model.save_model("trove/model.bin")
|
|
94
|
+
|
|
95
|
+
# prediction code
|
|
96
|
+
model = FastText.load_model("trove/model.bin")
|
|
97
|
+
```
|
|
98
|
+
|
|
99
|
+
When a model is ready, push it to the object store with:
|
|
100
|
+
|
|
101
|
+
```sh
|
|
102
|
+
trove push model.bin
|
|
103
|
+
```
|
|
104
|
+
|
|
105
|
+
And commit the changes to `.trove.yml`. The model is now ready to be deployed.
|
|
106
|
+
|
|
107
|
+
## Deployment
|
|
108
|
+
|
|
109
|
+
We recommend pulling files during the build process.
|
|
110
|
+
|
|
111
|
+
- [Heroku and Dokku](#heroku-and-dokku)
|
|
112
|
+
- [Docker](#docker)
|
|
113
|
+
|
|
114
|
+
Make sure your storage credentials are available in the build environment.
|
|
115
|
+
|
|
116
|
+
### Heroku and Dokku
|
|
117
|
+
|
|
118
|
+
Add to your `Rakefile`:
|
|
119
|
+
|
|
120
|
+
```ruby
|
|
121
|
+
Rake::Task["assets:precompile"].enhance do
|
|
122
|
+
Trove.pull
|
|
123
|
+
end
|
|
124
|
+
```
|
|
125
|
+
|
|
126
|
+
This will pull files at the very end of the asset precompile. Check the build output for:
|
|
127
|
+
|
|
128
|
+
```text
|
|
129
|
+
remote: Pulling model.bin...
|
|
130
|
+
remote: Asset precompilation completed (30.00s)
|
|
131
|
+
```
|
|
132
|
+
|
|
133
|
+
### Docker
|
|
134
|
+
|
|
135
|
+
Add to your `Dockerfile`:
|
|
136
|
+
|
|
137
|
+
```Dockerfile
|
|
138
|
+
RUN bundle exec trove pull
|
|
139
|
+
```
|
|
140
|
+
|
|
141
|
+
## Commands
|
|
142
|
+
|
|
143
|
+
Push a file
|
|
144
|
+
|
|
145
|
+
```sh
|
|
146
|
+
trove push model.bin
|
|
147
|
+
```
|
|
148
|
+
|
|
149
|
+
Pull all files in `.trove.yml`
|
|
150
|
+
|
|
151
|
+
```sh
|
|
152
|
+
trove pull
|
|
153
|
+
```
|
|
154
|
+
|
|
155
|
+
Pull a specific file (uses the version in `.trove.yml` if present)
|
|
156
|
+
|
|
157
|
+
```sh
|
|
158
|
+
trove pull model.bin
|
|
159
|
+
```
|
|
160
|
+
|
|
161
|
+
Pull a specific version of a file
|
|
162
|
+
|
|
163
|
+
```sh
|
|
164
|
+
trove pull model.bin --version 123
|
|
165
|
+
```
|
|
166
|
+
|
|
167
|
+
Delete a file
|
|
168
|
+
|
|
169
|
+
```sh
|
|
170
|
+
trove delete model.bin
|
|
171
|
+
```
|
|
172
|
+
|
|
173
|
+
List files
|
|
174
|
+
|
|
175
|
+
```sh
|
|
176
|
+
trove list
|
|
177
|
+
```
|
|
178
|
+
|
|
179
|
+
List versions
|
|
180
|
+
|
|
181
|
+
```sh
|
|
182
|
+
trove versions model.bin
|
|
183
|
+
```
|
|
184
|
+
|
|
185
|
+
## Ruby API
|
|
186
|
+
|
|
187
|
+
You can use the Ruby API in addition to the CLI.
|
|
188
|
+
|
|
189
|
+
```ruby
|
|
190
|
+
Trove.push(filename)
|
|
191
|
+
Trove.pull
|
|
192
|
+
Trove.pull(filename)
|
|
193
|
+
Trove.pull(filename, version: version)
|
|
194
|
+
Trove.delete(filename)
|
|
195
|
+
Trove.list
|
|
196
|
+
Trove.versions(filename)
|
|
197
|
+
```
|
|
198
|
+
|
|
199
|
+
This makes it easy to perform operations from code, IRuby notebooks, and the Rails console.
|
|
200
|
+
|
|
201
|
+
## Automated Training
|
|
202
|
+
|
|
203
|
+
By default, Trove tracks files in `.trove.yml` so you can deploy specific versions with `trove pull`. However, this functionality is entirely optional. Disable it with:
|
|
204
|
+
|
|
205
|
+
```yml
|
|
206
|
+
vcs: false
|
|
207
|
+
```
|
|
208
|
+
|
|
209
|
+
This is useful if you want to automate training or build more complex workflows.
|
|
210
|
+
|
|
211
|
+
## History
|
|
212
|
+
|
|
213
|
+
View the [changelog](https://github.com/ankane/trove/blob/master/CHANGELOG.md)
|
|
214
|
+
|
|
215
|
+
## Contributing
|
|
216
|
+
|
|
217
|
+
Everyone is encouraged to help improve this project. Here are a few ways you can help:
|
|
218
|
+
|
|
219
|
+
- [Report bugs](https://github.com/ankane/trove/issues)
|
|
220
|
+
- Fix bugs and [submit pull requests](https://github.com/ankane/trove/pulls)
|
|
221
|
+
- Write, clarify, or fix documentation
|
|
222
|
+
- Suggest or add new features
|
|
223
|
+
|
|
224
|
+
To get started with development:
|
|
225
|
+
|
|
226
|
+
```sh
|
|
227
|
+
git clone https://github.com/ankane/trove.git
|
|
228
|
+
cd trove
|
|
229
|
+
bundle install
|
|
230
|
+
|
|
231
|
+
export AWS_ACCESS_KEY_ID=...
|
|
232
|
+
export AWS_SECRET_ACCESS_KEY=...
|
|
233
|
+
export AWS_REGION=...
|
|
234
|
+
export S3_BUCKET=my-bucket
|
|
235
|
+
|
|
236
|
+
bundle exec rake test
|
|
237
|
+
```
|
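The README says pushed files are tracked in `.trove.yml` so that a specific version can be deployed alongside a code change. The following is a minimal sketch of that bookkeeping, assuming the `files` list of `name`/`version` entries that `lib/trove.rb` reads and writes. The helper name `track_file` and the version string are hypothetical and only serve the illustration.

```ruby
require "yaml"

# Hypothetical helper: record (or update) the version of a file in a
# config hash shaped like the parsed contents of .trove.yml.
def track_file(config, filename, version)
  files = (config["files"] ||= [])

  # Reuse the existing entry if the file was pushed before.
  file = files.find { |f| f["name"] == filename }
  unless file
    file = {"name" => filename}
    files << file
  end

  file["version"] = version
  config
end

config = {"storage" => "s3://my-bucket/trove"}
track_file(config, "model.bin", "example-version-id")

# Print the YAML without the leading document marker.
puts config.to_yaml.sub(/\A---\n/, "")
```

Pushing the same file again updates the existing entry instead of appending a second one, which keeps the committed file small and makes version changes easy to spot in code review.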
data/exe/trove
ADDED
data/lib/trove.rb
ADDED
|
@@ -0,0 +1,162 @@
|
|
|
1
|
+
# stdlib
|
|
2
|
+
require "digest/md5"
|
|
3
|
+
require "yaml"
|
|
4
|
+
|
|
5
|
+
# modules
|
|
6
|
+
require "trove/utils"
|
|
7
|
+
require "trove/version"
|
|
8
|
+
|
|
9
|
+
module Trove
|
|
10
|
+
# storage
|
|
11
|
+
module Storage
|
|
12
|
+
autoload :S3, "trove/storage/s3"
|
|
13
|
+
end
|
|
14
|
+
|
|
15
|
+
# methods
|
|
16
|
+
class << self
|
|
17
|
+
# TODO use flock to prevent multiple concurrent downloads
|
|
18
|
+
def pull(filename = nil, version: nil)
|
|
19
|
+
if filename
|
|
20
|
+
pull_file(filename, version: version)
|
|
21
|
+
else
|
|
22
|
+
raise ArgumentError, "Specify filename for version" if version
|
|
23
|
+
|
|
24
|
+
(config["files"] || []).each do |file|
|
|
25
|
+
pull_file(file["name"], version: file["version"], all: true)
|
|
26
|
+
end
|
|
27
|
+
end
|
|
28
|
+
end
|
|
29
|
+
|
|
30
|
+
# could use upload_file method for multipart uploads over a certain size
|
|
31
|
+
# but multipart uploads have extra cost and cleanup, so keep it simple for now
|
|
32
|
+
def push(filename)
|
|
33
|
+
src = File.join(root, filename)
|
|
34
|
+
raise "File not found" unless File.exist?(src)
|
|
35
|
+
|
|
36
|
+
info = storage.info(filename)
|
|
37
|
+
upload = info.nil?
|
|
38
|
+
unless upload
|
|
39
|
+
version = info[:version]
|
|
40
|
+
if modified?(src, info)
|
|
41
|
+
upload = true
|
|
42
|
+
else
|
|
43
|
+
stream.puts "Already up-to-date"
|
|
44
|
+
end
|
|
45
|
+
end
|
|
46
|
+
|
|
47
|
+
if upload
|
|
48
|
+
stream.puts "Pushing #{filename}..." unless stream.tty?
|
|
49
|
+
resp = storage.upload(src, filename) do |current_size, total_size|
|
|
50
|
+
Utils.progress(stream, filename, current_size, total_size)
|
|
51
|
+
end
|
|
52
|
+
version = resp[:version]
|
|
53
|
+
end
|
|
54
|
+
|
|
55
|
+
if vcs?
|
|
56
|
+
# add files to yaml if needed
|
|
57
|
+
files = (config["files"] ||= [])
|
|
58
|
+
|
|
59
|
+
# find file
|
|
60
|
+
file = files.find { |f| f["name"] == filename }
|
|
61
|
+
unless file
|
|
62
|
+
file = {"name" => filename}
|
|
63
|
+
files << file
|
|
64
|
+
end
|
|
65
|
+
|
|
66
|
+
# update version
|
|
67
|
+
file["version"] = version
|
|
68
|
+
|
|
69
|
+
File.write(".trove.yml", config.to_yaml.sub(/\A---\n/, ""))
|
|
70
|
+
end
|
|
71
|
+
|
|
72
|
+
{
|
|
73
|
+
version: version
|
|
74
|
+
}
|
|
75
|
+
end
|
|
76
|
+
|
|
77
|
+
def delete(filename)
|
|
78
|
+
storage.delete(filename)
|
|
79
|
+
end
|
|
80
|
+
|
|
81
|
+
def list
|
|
82
|
+
storage.list
|
|
83
|
+
end
|
|
84
|
+
|
|
85
|
+
def versions(filename)
|
|
86
|
+
storage.versions(filename)
|
|
87
|
+
end
|
|
88
|
+
|
|
89
|
+
private
|
|
90
|
+
|
|
91
|
+
def pull_file(filename, version: nil, all: false)
|
|
92
|
+
dest = File.join(root, filename)
|
|
93
|
+
|
|
94
|
+
if !version
|
|
95
|
+
file = (config["files"] || []).find { |f| f["name"] == filename }
|
|
96
|
+
version = file["version"] if file
|
|
97
|
+
end
|
|
98
|
+
|
|
99
|
+
download = !File.exist?(dest)
|
|
100
|
+
unless download
|
|
101
|
+
info = storage.info(filename, version: version)
|
|
102
|
+
if info.nil? || modified?(dest, info)
|
|
103
|
+
download = true
|
|
104
|
+
else
|
|
105
|
+
stream.puts "Already up-to-date" unless all
|
|
106
|
+
end
|
|
107
|
+
end
|
|
108
|
+
|
|
109
|
+
if download
|
|
110
|
+
stream.puts "Pulling #{filename}..." unless stream.tty?
|
|
111
|
+
storage.download(filename, dest, version: version) do |current_size, total_size|
|
|
112
|
+
Utils.progress(stream, filename, current_size, total_size)
|
|
113
|
+
end
|
|
114
|
+
end
|
|
115
|
+
|
|
116
|
+
download
|
|
117
|
+
end
|
|
118
|
+
|
|
119
|
+
def modified?(src, info)
|
|
120
|
+
Digest::MD5.file(src).hexdigest != info[:md5]
|
|
121
|
+
end
|
|
122
|
+
|
|
123
|
+
# TODO test file not found
|
|
124
|
+
def config
|
|
125
|
+
@config ||= begin
|
|
126
|
+
begin
|
|
127
|
+
YAML.load_file(".trove.yml")
|
|
128
|
+
rescue Errno::ENOENT
|
|
129
|
+
raise "Config not found"
|
|
130
|
+
end
|
|
131
|
+
end
|
|
132
|
+
end
|
|
133
|
+
|
|
134
|
+
def root
|
|
135
|
+
@root ||= config["root"] || "trove"
|
|
136
|
+
end
|
|
137
|
+
|
|
138
|
+
def storage
|
|
139
|
+
@storage ||= begin
|
|
140
|
+
uri = URI.parse(config["storage"])
|
|
141
|
+
|
|
142
|
+
case uri.scheme
|
|
143
|
+
when "s3"
|
|
144
|
+
Storage::S3.new(
|
|
145
|
+
bucket: uri.host,
|
|
146
|
+
prefix: uri.path[1..-1]
|
|
147
|
+
)
|
|
148
|
+
else
|
|
149
|
+
raise "Invalid storage provider: #{uri.scheme}"
|
|
150
|
+
end
|
|
151
|
+
end
|
|
152
|
+
end
|
|
153
|
+
|
|
154
|
+
def vcs?
|
|
155
|
+
config.key?("vcs") ? config["vcs"] : true
|
|
156
|
+
end
|
|
157
|
+
|
|
158
|
+
def stream
|
|
159
|
+
$stderr
|
|
160
|
+
end
|
|
161
|
+
end
|
|
162
|
+
end
|
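Both `push` and `pull_file` above skip the transfer when the local file's MD5 digest matches the digest reported by storage. The sketch below isolates that comparison so it can be tried without an S3 bucket. It assumes the remote digest is already available as a hex string; the two-argument `modified?` signature here is this sketch's simplification, not the gem's private method.

```ruby
require "digest/md5"
require "tempfile"

# Returns true when the file at `path` no longer matches the recorded digest.
def modified?(path, remote_md5)
  Digest::MD5.file(path).hexdigest != remote_md5
end

Tempfile.create("trove-sketch") do |file|
  file.write("weights-v1")
  file.flush

  # Stand-in for the digest a storage backend would report.
  recorded = Digest::MD5.hexdigest("weights-v1")
  puts modified?(file.path, recorded)

  # Appending data changes the digest, so a transfer would be needed.
  file.write("-changed")
  file.flush
  puts modified?(file.path, recorded)
end
```

As the comment in `storage/s3.rb` notes, an S3 ETag is not always an MD5 digest (multipart uploads are one such case), so a mismatch there leads to an extra transfer rather than a skipped one.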
data/lib/trove/cli.rb
ADDED
|
@@ -0,0 +1,113 @@
|
|
|
1
|
+
require "thor"
|
|
2
|
+
|
|
3
|
+
module Trove
|
|
4
|
+
class CLI < Thor
|
|
5
|
+
include Thor::Actions
|
|
6
|
+
|
|
7
|
+
desc "init", "Initialize a project"
|
|
8
|
+
def init
|
|
9
|
+
create_file "trove/.keep", ""
|
|
10
|
+
|
|
11
|
+
if File.exist?(".gitignore")
|
|
12
|
+
contents = <<~EOS
|
|
13
|
+
|
|
14
|
+
# Ignore Trove storage
|
|
15
|
+
/trove/*
|
|
16
|
+
!/trove/.keep
|
|
17
|
+
EOS
|
|
18
|
+
unless File.read(".gitignore").include?(contents)
|
|
19
|
+
append_to_file(".gitignore", contents)
|
|
20
|
+
end
|
|
21
|
+
else
|
|
22
|
+
say "Check in trove/.keep and ignore trove/*"
|
|
23
|
+
end
|
|
24
|
+
|
|
25
|
+
create_file ".trove.yml", <<~EOS
|
|
26
|
+
storage: s3://my-bucket/trove
|
|
27
|
+
EOS
|
|
28
|
+
end
|
|
29
|
+
|
|
30
|
+
desc "push FILENAME", "Push a file"
|
|
31
|
+
def push(filename)
|
|
32
|
+
Trove.push(filename)
|
|
33
|
+
end
|
|
34
|
+
|
|
35
|
+
desc "pull [FILENAME]", "Pull files"
|
|
36
|
+
option :version
|
|
37
|
+
def pull(filename = nil)
|
|
38
|
+
Trove.pull(filename, version: options[:version])
|
|
39
|
+
end
|
|
40
|
+
|
|
41
|
+
desc "delete FILENAME", "Delete a file"
|
|
42
|
+
def delete(filename = nil)
|
|
43
|
+
Trove.delete(filename)
|
|
44
|
+
end
|
|
45
|
+
|
|
46
|
+
desc "list", "List files"
|
|
47
|
+
def list
|
|
48
|
+
say table(
|
|
49
|
+
Trove.list,
|
|
50
|
+
[:filename, :size, :updated_at]
|
|
51
|
+
)
|
|
52
|
+
end
|
|
53
|
+
|
|
54
|
+
desc "version", "Show the current version"
|
|
55
|
+
def version
|
|
56
|
+
say Trove::VERSION
|
|
57
|
+
end
|
|
58
|
+
|
|
59
|
+
desc "versions FILENAME", "List versions"
|
|
60
|
+
def versions(filename)
|
|
61
|
+
say table(
|
|
62
|
+
Trove.versions(filename),
|
|
63
|
+
[:version, :size, :updated_at]
|
|
64
|
+
)
|
|
65
|
+
end
|
|
66
|
+
|
|
67
|
+
private
|
|
68
|
+
|
|
69
|
+
def table(data, columns)
|
|
70
|
+
columns.each do |c|
|
|
71
|
+
if c == :size
|
|
72
|
+
data.each { |r| r[c] = Utils.human_size(r[c]) }
|
|
73
|
+
elsif c == :updated_at
|
|
74
|
+
data.each { |r| r[c] = "#{time_ago(r[c])} ago" }
|
|
75
|
+
elsif c == :version
|
|
76
|
+
data.each { |r| r[c] ||= "<none>" }
|
|
77
|
+
end
|
|
78
|
+
end
|
|
79
|
+
column_names = columns.map { |c| c.to_s.sub(/_at\z/, "").upcase }
|
|
80
|
+
widths = columns.map.with_index { |c, i| [column_names[i].size, data.map { |r| r[c].to_s.size }.max].max }
|
|
81
|
+
|
|
82
|
+
output = String.new("")
|
|
83
|
+
str = widths.map { |w| "%-#{w}s" }.join(" ") + "\n"
|
|
84
|
+
output << str % column_names
|
|
85
|
+
data.each do |row|
|
|
86
|
+
output << str % columns.map { |c| row[c] }
|
|
87
|
+
end
|
|
88
|
+
output
|
|
89
|
+
end
|
|
90
|
+
|
|
91
|
+
def time_ago(time)
|
|
92
|
+
diff = (Time.now - time).round
|
|
93
|
+
|
|
94
|
+
if diff < 60
|
|
95
|
+
pluralize(diff, "second")
|
|
96
|
+
elsif diff < 60 * 60
|
|
97
|
+
pluralize((diff / 60.0).floor, "minute")
|
|
98
|
+
elsif diff < 60 * 60 * 24
|
|
99
|
+
pluralize((diff / (60.0 * 60)).floor, "hour")
|
|
100
|
+
else
|
|
101
|
+
pluralize((diff / (60.0 * 60 * 24)).floor, "day")
|
|
102
|
+
end
|
|
103
|
+
end
|
|
104
|
+
|
|
105
|
+
def pluralize(value, str)
|
|
106
|
+
"#{value} #{value == 1 ? str : "#{str}s"}"
|
|
107
|
+
end
|
|
108
|
+
|
|
109
|
+
def self.exit_on_failure?
|
|
110
|
+
true
|
|
111
|
+
end
|
|
112
|
+
end
|
|
113
|
+
end
|
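The `table` helper in the CLI pads each column to the wider of its header and its longest value, then renders every row with one shared format string. Here is a standalone sketch of that approach. The method name `format_table` and the sample rows are hypothetical, and the splat in the width calculation is this sketch's own choice so that an empty row list still renders a header.

```ruby
# Build a left-aligned text table from an array of hashes.
def format_table(rows, columns)
  names = columns.map { |c| c.to_s.upcase }

  # Each column is as wide as its header or its longest cell.
  widths = columns.each_with_index.map do |c, i|
    [names[i].size, *rows.map { |r| r[c].to_s.size }].max
  end

  # One format string such as "%-9s %-5s\n" is reused for every line.
  line = widths.map { |w| "%-#{w}s" }.join(" ") + "\n"

  output = String.new
  output << line % names
  rows.each { |r| output << line % columns.map { |c| r[c] } }
  output
end

rows = [
  {filename: "model.bin", size: "1.2MB"},
  {filename: "a.txt", size: "10B"}
]
puts format_table(rows, [:filename, :size])
```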
data/lib/trove/storage/s3.rb
ADDED
|
@@ -0,0 +1,120 @@
|
|
|
1
|
+
require "aws-sdk-s3"
|
|
2
|
+
require "fileutils"
|
|
3
|
+
|
|
4
|
+
module Trove
|
|
5
|
+
module Storage
|
|
6
|
+
class S3
|
|
7
|
+
attr_reader :bucket, :prefix
|
|
8
|
+
|
|
9
|
+
def initialize(bucket:, prefix: nil)
|
|
10
|
+
@bucket = bucket
|
|
11
|
+
@prefix = prefix
|
|
12
|
+
end
|
|
13
|
+
|
|
14
|
+
def download(filename, dest, version: nil)
|
|
15
|
+
current_size = 0
|
|
16
|
+
total_size = nil
|
|
17
|
+
|
|
18
|
+
# TODO better path
|
|
19
|
+
tmp = "#{Dir.tmpdir}/trove-#{Time.now.to_f}"
|
|
20
|
+
begin
|
|
21
|
+
File.open(tmp, "wb") do |file|
|
|
22
|
+
options = {bucket: bucket, key: key(filename)}
|
|
23
|
+
options[:version_id] = version if version
|
|
24
|
+
client.get_object(**options) do |chunk, headers|
|
|
25
|
+
file.write(chunk)
|
|
26
|
+
|
|
27
|
+
current_size += chunk.bytesize
|
|
28
|
+
total_size ||= headers["content-length"].to_i
|
|
29
|
+
yield current_size, total_size
|
|
30
|
+
end
|
|
31
|
+
end
|
|
32
|
+
FileUtils.mv(tmp, dest)
|
|
33
|
+
ensure
|
|
34
|
+
# delete file if interrupted
|
|
35
|
+
File.unlink(tmp) if File.exist?(tmp)
|
|
36
|
+
end
|
|
37
|
+
rescue Aws::S3::Errors::ServiceError
|
|
38
|
+
raise "File not found"
|
|
39
|
+
end
|
|
40
|
+
|
|
41
|
+
def upload(src, filename, &block)
|
|
42
|
+
on_chunk_sent = lambda do |_, current_size, total_size|
|
|
43
|
+
block.call(current_size, total_size)
|
|
44
|
+
end
|
|
45
|
+
resp = nil
|
|
46
|
+
File.open(src, "rb") do |file|
|
|
47
|
+
resp = client.put_object(bucket: bucket, key: key(filename), body: file, on_chunk_sent: on_chunk_sent)
|
|
48
|
+
end
|
|
49
|
+
{version: resp.version_id}
|
|
50
|
+
end
|
|
51
|
+
|
|
52
|
+
# etag isn't always MD5, but low likelihood of match if not
|
|
53
|
+
# could alternatively add sha256 to metadata
|
|
54
|
+
def info(filename, version: nil)
|
|
55
|
+
options = {bucket: bucket, key: key(filename)}
|
|
56
|
+
options[:version_id] = version if version
|
|
57
|
+
resp = client.head_object(**options)
|
|
58
|
+
{
|
|
59
|
+
version: resp.version_id,
|
|
60
|
+
md5: resp.etag.gsub('"', "")
|
|
61
|
+
}
|
|
62
|
+
rescue Aws::S3::Errors::ServiceError
|
|
63
|
+
nil
|
|
64
|
+
end
|
|
65
|
+
|
|
66
|
+
def delete(filename, version: nil)
|
|
67
|
+
options = {bucket: bucket, key: key(filename)}
|
|
68
|
+
options[:version_id] = version if version
|
|
69
|
+
client.delete_object(**options)
|
|
70
|
+
true
|
|
71
|
+
rescue Aws::S3::Errors::ServiceError
|
|
72
|
+
false
|
|
73
|
+
end
|
|
74
|
+
|
|
75
|
+
def list
|
|
76
|
+
files = []
|
|
77
|
+
options = {bucket: bucket}
|
|
78
|
+
options[:prefix] = prefix if prefix
|
|
79
|
+
client.list_objects_v2(**options).each do |response|
|
|
80
|
+
response.contents.each do |object|
|
|
81
|
+
filename = prefix ? object.key[(prefix.size + 1)..-1] : object.key
|
|
82
|
+
files << {
|
|
83
|
+
filename: filename,
|
|
84
|
+
size: object.size,
|
|
85
|
+
updated_at: object.last_modified
|
|
86
|
+
}
|
|
87
|
+
end
|
|
88
|
+
end
|
|
89
|
+
files
|
|
90
|
+
end
|
|
91
|
+
|
|
92
|
+
def versions(filename)
|
|
93
|
+
versions = []
|
|
94
|
+
object_key = key(filename)
|
|
95
|
+
client.list_object_versions(bucket: bucket, prefix: object_key).each do |response|
|
|
96
|
+
response.versions.each do |version|
|
|
97
|
+
next if version.key != object_key
|
|
98
|
+
|
|
99
|
+
versions << {
|
|
100
|
+
version: version.version_id == "null" ? nil : version.version_id,
|
|
101
|
+
size: version.size,
|
|
102
|
+
updated_at: version.last_modified
|
|
103
|
+
}
|
|
104
|
+
end
|
|
105
|
+
end
|
|
106
|
+
versions
|
|
107
|
+
end
|
|
108
|
+
|
|
109
|
+
private
|
|
110
|
+
|
|
111
|
+
def client
|
|
112
|
+
@client ||= Aws::S3::Client.new
|
|
113
|
+
end
|
|
114
|
+
|
|
115
|
+
def key(filename)
|
|
116
|
+
prefix ? "#{prefix}/#{filename}" : filename
|
|
117
|
+
end
|
|
118
|
+
end
|
|
119
|
+
end
|
|
120
|
+
end
|
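The `download` method above writes to a temporary path first and only moves the result to its destination once the transfer has finished, so an interrupted download does not leave a partial file in place. The pattern can be sketched without any S3 calls as follows. The helper name `write_atomically` is hypothetical, and placing the temporary file next to the destination is this sketch's choice rather than what the gem does (it uses `Dir.tmpdir`).

```ruby
require "fileutils"
require "tmpdir"

# Yield a writable file; move it to `dest` only if the block succeeds.
def write_atomically(dest)
  # Keep the temp file in the same directory so the final move is a rename.
  tmp = "#{dest}.tmp-#{Process.pid}"
  begin
    File.open(tmp, "wb") { |file| yield file }
    FileUtils.mv(tmp, dest)
  ensure
    # Clean up if the block raised before the move happened.
    File.unlink(tmp) if File.exist?(tmp)
  end
end

Dir.mktmpdir do |dir|
  dest = File.join(dir, "model.bin")
  write_atomically(dest) { |file| file.write("model bytes") }
  puts File.read(dest)
end
```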
data/lib/trove/utils.rb
ADDED
@@ -0,0 +1,33 @@
+module Trove
+  module Utils
+    # TODO improve performance
+    def self.human_size(size)
+      if size < 2**10
+        units = "B"
+      elsif size < 2**20
+        size /= (2**10).to_f
+        units = "KB"
+      elsif size < 2**30
+        size /= (2**20).to_f
+        units = "MB"
+      else
+        size /= (2**30).to_f
+        units = "GB"
+      end
+
+      round = size < 9.95 ? 1 : 0
+      "#{size.round(round)}#{units}"
+    end
+
+    def self.progress(stream, filename, current_size, total_size)
+      return unless stream.tty?
+
+      width = 50
+      progress = (100.0 * current_size / total_size).floor
+      completed = (width / 100.0 * progress).round
+      remaining = width - completed
+      stream.print "\r#{filename} [#{"=" * completed}#{" " * remaining}] %3s%% %11s " % [progress, "#{Utils.human_size(current_size)}/#{Utils.human_size(total_size)}"]
+      stream.print "\n" if current_size == total_size
+    end
+  end
+end
|
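The arithmetic in `Utils.progress` first floors the completion percentage and then scales it to the bar width. A small sketch of just that calculation follows. The method name `progress_bar` and the `width:` keyword are hypothetical, and the string it returns is simplified compared with the gem's output, which also prints the filename and human-readable sizes.

```ruby
# Render a fixed-width bar for the given byte counts.
def progress_bar(current_size, total_size, width: 50)
  percent = (100.0 * current_size / total_size).floor
  completed = (width / 100.0 * percent).round
  remaining = width - completed
  "[#{"=" * completed}#{" " * remaining}] #{percent}%"
end

puts progress_bar(512, 2048, width: 20)
```

A `total_size` of zero is not handled in this sketch, since the percentage cannot be computed for an empty file.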
metadata
ADDED
|
@@ -0,0 +1,122 @@
|
|
|
1
|
+
--- !ruby/object:Gem::Specification
|
|
2
|
+
name: trove
|
|
3
|
+
version: !ruby/object:Gem::Version
|
|
4
|
+
version: 0.1.0
|
|
5
|
+
platform: ruby
|
|
6
|
+
authors:
|
|
7
|
+
- Andrew Kane
|
|
8
|
+
autorequire:
|
|
9
|
+
bindir: exe
|
|
10
|
+
cert_chain: []
|
|
11
|
+
date: 2020-10-31 00:00:00.000000000 Z
|
|
12
|
+
dependencies:
|
|
13
|
+
- !ruby/object:Gem::Dependency
|
|
14
|
+
name: aws-sdk-s3
|
|
15
|
+
requirement: !ruby/object:Gem::Requirement
|
|
16
|
+
requirements:
|
|
17
|
+
- - ">="
|
|
18
|
+
- !ruby/object:Gem::Version
|
|
19
|
+
version: '0'
|
|
20
|
+
type: :runtime
|
|
21
|
+
prerelease: false
|
|
22
|
+
version_requirements: !ruby/object:Gem::Requirement
|
|
23
|
+
requirements:
|
|
24
|
+
- - ">="
|
|
25
|
+
- !ruby/object:Gem::Version
|
|
26
|
+
version: '0'
|
|
27
|
+
- !ruby/object:Gem::Dependency
|
|
28
|
+
name: thor
|
|
29
|
+
requirement: !ruby/object:Gem::Requirement
|
|
30
|
+
requirements:
|
|
31
|
+
- - ">="
|
|
32
|
+
- !ruby/object:Gem::Version
|
|
33
|
+
version: '0'
|
|
34
|
+
type: :runtime
|
|
35
|
+
prerelease: false
|
|
36
|
+
version_requirements: !ruby/object:Gem::Requirement
|
|
37
|
+
requirements:
|
|
38
|
+
- - ">="
|
|
39
|
+
- !ruby/object:Gem::Version
|
|
40
|
+
version: '0'
|
|
41
|
+
- !ruby/object:Gem::Dependency
|
|
42
|
+
name: bundler
|
|
43
|
+
requirement: !ruby/object:Gem::Requirement
|
|
44
|
+
requirements:
|
|
45
|
+
- - ">="
|
|
46
|
+
- !ruby/object:Gem::Version
|
|
47
|
+
version: '0'
|
|
48
|
+
type: :development
|
|
49
|
+
prerelease: false
|
|
50
|
+
version_requirements: !ruby/object:Gem::Requirement
|
|
51
|
+
requirements:
|
|
52
|
+
- - ">="
|
|
53
|
+
- !ruby/object:Gem::Version
|
|
54
|
+
version: '0'
|
|
55
|
+
- !ruby/object:Gem::Dependency
|
|
56
|
+
name: rake
|
|
57
|
+
requirement: !ruby/object:Gem::Requirement
|
|
58
|
+
requirements:
|
|
59
|
+
- - ">="
|
|
60
|
+
- !ruby/object:Gem::Version
|
|
61
|
+
version: '0'
|
|
62
|
+
type: :development
|
|
63
|
+
prerelease: false
|
|
64
|
+
version_requirements: !ruby/object:Gem::Requirement
|
|
65
|
+
requirements:
|
|
66
|
+
- - ">="
|
|
67
|
+
- !ruby/object:Gem::Version
|
|
68
|
+
version: '0'
|
|
69
|
+
- !ruby/object:Gem::Dependency
|
|
70
|
+
name: minitest
|
|
71
|
+
requirement: !ruby/object:Gem::Requirement
|
|
72
|
+
requirements:
|
|
73
|
+
- - ">="
|
|
74
|
+
- !ruby/object:Gem::Version
|
|
75
|
+
version: '5'
|
|
76
|
+
type: :development
|
|
77
|
+
prerelease: false
|
|
78
|
+
version_requirements: !ruby/object:Gem::Requirement
|
|
79
|
+
requirements:
|
|
80
|
+
- - ">="
|
|
81
|
+
- !ruby/object:Gem::Version
|
|
82
|
+
version: '5'
|
|
83
|
+
description:
|
|
84
|
+
email: andrew@chartkick.com
|
|
85
|
+
executables:
|
|
86
|
+
- trove
|
|
87
|
+
extensions: []
|
|
88
|
+
extra_rdoc_files: []
|
|
89
|
+
files:
|
|
90
|
+
- CHANGELOG.md
|
|
91
|
+
- LICENSE.txt
|
|
92
|
+
- README.md
|
|
93
|
+
- exe/trove
|
|
94
|
+
- lib/trove.rb
|
|
95
|
+
- lib/trove/cli.rb
|
|
96
|
+
- lib/trove/storage/s3.rb
|
|
97
|
+
- lib/trove/utils.rb
|
|
98
|
+
- lib/trove/version.rb
|
|
99
|
+
homepage: https://github.com/ankane/trove
|
|
100
|
+
licenses:
|
|
101
|
+
- MIT
|
|
102
|
+
metadata: {}
|
|
103
|
+
post_install_message:
|
|
104
|
+
rdoc_options: []
|
|
105
|
+
require_paths:
|
|
106
|
+
- lib
|
|
107
|
+
required_ruby_version: !ruby/object:Gem::Requirement
|
|
108
|
+
requirements:
|
|
109
|
+
- - ">="
|
|
110
|
+
- !ruby/object:Gem::Version
|
|
111
|
+
version: '2.5'
|
|
112
|
+
required_rubygems_version: !ruby/object:Gem::Requirement
|
|
113
|
+
requirements:
|
|
114
|
+
- - ">="
|
|
115
|
+
- !ruby/object:Gem::Version
|
|
116
|
+
version: '0'
|
|
117
|
+
requirements: []
|
|
118
|
+
rubygems_version: 3.1.4
|
|
119
|
+
signing_key:
|
|
120
|
+
specification_version: 4
|
|
121
|
+
summary: Deploy machine learning models in Ruby (and Rails)
|
|
122
|
+
test_files: []
|