trove 0.1.0
- checksums.yaml +7 -0
- data/CHANGELOG.md +3 -0
- data/LICENSE.txt +21 -0
- data/README.md +237 -0
- data/exe/trove +14 -0
- data/lib/trove.rb +162 -0
- data/lib/trove/cli.rb +113 -0
- data/lib/trove/storage/s3.rb +120 -0
- data/lib/trove/utils.rb +33 -0
- data/lib/trove/version.rb +3 -0
- metadata +122 -0
checksums.yaml
ADDED
@@ -0,0 +1,7 @@
+---
+SHA256:
+  metadata.gz: a501c7fded530b2dc537d412eae735c19cfa25f04e2e21dd8d6861ae42b2a271
+  data.tar.gz: 0cf407da8dce52a446e053a7e03791622ecd303a6514bdf551ce12bd31ebe788
+SHA512:
+  metadata.gz: 4ad4fa70071eb4745046938a9b042b7df87eaa85964bc4d9a71fa90837335e40d8195f9de7ea27d70963c83a4facd163a1542527f7f88a5ffeb6aec8b4b71649
+  data.tar.gz: 73b4037bf7a50b498ef6d6bb640f2d62e237c7a1effba0aa298c9a54cf8d56edbd6efd539bdfc0768e1e6e1c85d3e8d2f524c60f5b996469881b6d67a66f9d1d
data/CHANGELOG.md
ADDED
data/LICENSE.txt
ADDED
@@ -0,0 +1,21 @@
+The MIT License (MIT)
+
+Copyright (c) 2020 Andrew Kane
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in
+all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+THE SOFTWARE.
data/README.md
ADDED
@@ -0,0 +1,237 @@
+# Trove
+
+:fire: Deploy machine learning models in Ruby (and Rails)
+
+Works great with [XGBoost](https://github.com/ankane/xgboost), [Torch.rb](https://github.com/ankane/torch.rb), [fastText](https://github.com/ankane/fastText), and many other gems
+
+## Installation
+
+Add this line to your application’s Gemfile:
+
+```ruby
+gem 'trove'
+```
+
+And run:
+
+```sh
+bundle install
+trove init
+```
+
+And [configure your storage](#storage) in `.trove.yml`.
+
+## Storage
+
+### Amazon S3
+
+Create a bucket and enable object versioning.
+
+Next, set up your AWS credentials. You can use the [AWS CLI](https://github.com/aws/aws-cli):
+
+```sh
+pip install awscli
+aws configure
+```
+
+Or environment variables:
+
+```sh
+export AWS_ACCESS_KEY_ID=...
+export AWS_SECRET_ACCESS_KEY=...
+export AWS_REGION=...
+```
+
+IAM users need:
+
+- `s3:GetObject` and `s3:GetObjectVersion` to pull files
+- `s3:PutObject` to push files
+- `s3:ListBucket` and `s3:ListBucketVersions` to list files and versions
+- `s3:DeleteObject` and `s3:DeleteObjectVersion` to delete files
+
+Here’s an example policy:
+
+```json
+{
+  "Version": "2012-10-17",
+  "Statement": [
+    {
+      "Sid": "Trove",
+      "Effect": "Allow",
+      "Action": [
+        "s3:GetObject",
+        "s3:GetObjectVersion",
+        "s3:PutObject",
+        "s3:ListBucket",
+        "s3:ListBucketVersions",
+        "s3:DeleteObject",
+        "s3:DeleteObjectVersion"
+      ],
+      "Resource": [
+        "arn:aws:s3:::my-bucket",
+        "arn:aws:s3:::my-bucket/trove/*"
+      ]
+    }
+  ]
+}
+```
+
+If your production servers only need to pull files, only give them `s3:GetObject` and `s3:GetObjectVersion` permissions.
+
+## How It Works
+
+Git is great for code, but it’s not ideal for large files like models. Instead, we use an object store like Amazon S3 to store and version them.
+
+Trove creates a `trove` directory for you to use as a workspace. Files in the directory are ignored by Git, but can be pushed to and pulled from the object store. By default, files are tracked in `.trove.yml` to make it easy to deploy specific versions with code changes.
+
+## Getting Started
+
+Use the `trove` directory to save and load models.
+
+```ruby
+# training code
+model.save_model("trove/model.bin")
+
+# prediction code
+model = FastText.load_model("trove/model.bin")
+```
+
+When a model is ready, push it to the object store with:
+
+```sh
+trove push model.bin
+```
+
+And commit the changes to `.trove.yml`. The model is now ready to be deployed.
+
+## Deployment
+
+We recommend pulling files during the build process.
+
+- [Heroku and Dokku](#heroku-and-dokku)
+- [Docker](#docker)
+
+Make sure your storage credentials are available in the build environment.
+
+### Heroku and Dokku
+
+Add to your `Rakefile`:
+
+```ruby
+Rake::Task["assets:precompile"].enhance do
+  Trove.pull
+end
+```
+
+This will pull files at the very end of the asset precompile. Check the build output for:
+
+```text
+remote: Pulling model.bin...
+remote: Asset precompilation completed (30.00s)
+```
+
+### Docker
+
+Add to your `Dockerfile`:
+
+```Dockerfile
+RUN bundle exec trove pull
+```
+
+## Commands
+
+Push a file
+
+```sh
+trove push model.bin
+```
+
+Pull all files in `.trove.yml`
+
+```sh
+trove pull
+```
+
+Pull a specific file (uses the version in `.trove.yml` if present)
+
+```sh
+trove pull model.bin
+```
+
+Pull a specific version of a file
+
+```sh
+trove pull model.bin --version 123
+```
+
+Delete a file
+
+```sh
+trove delete model.bin
+```
+
+List files
+
+```sh
+trove list
+```
+
+List versions
+
+```sh
+trove versions model.bin
+```
+
+## Ruby API
+
+You can use the Ruby API in addition to the CLI.
+
+```ruby
+Trove.push(filename)
+Trove.pull
+Trove.pull(filename)
+Trove.pull(filename, version: version)
+Trove.delete(filename)
+Trove.list
+Trove.versions(filename)
+```
+
+This makes it easy to perform operations from code, IRuby notebooks, and the Rails console.
+
+## Automated Training
+
+By default, Trove tracks files in `.trove.yml` so you can deploy specific versions with `trove pull`. However, this functionality is entirely optional. Disable it with:
+
+```yml
+vcs: false
+```
+
+This is useful if you want to automate training or build more complex workflows.
+
+## History
+
+View the [changelog](https://github.com/ankane/trove/blob/master/CHANGELOG.md)
+
+## Contributing
+
+Everyone is encouraged to help improve this project. Here are a few ways you can help:
+
+- [Report bugs](https://github.com/ankane/trove/issues)
+- Fix bugs and [submit pull requests](https://github.com/ankane/trove/pulls)
+- Write, clarify, or fix documentation
+- Suggest or add new features
+
+To get started with development:
+
+```sh
+git clone https://github.com/ankane/trove.git
+cd trove
+bundle install
+
+export AWS_ACCESS_KEY_ID=...
+export AWS_SECRET_ACCESS_KEY=...
+export AWS_REGION=...
+export S3_BUCKET=my-bucket
+
+bundle exec rake test
+```
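The README says files are tracked in `.trove.yml` so that a plain `trove pull` deploys pinned versions. A minimal sketch of that lookup follows, assuming the config holds a `files` list of `name`/`version` entries as `lib/trove.rb` in this release suggests. The YAML contents, the version ID, and the helper name `pinned_version` are hypothetical and only illustrate the idea.

```ruby
require "yaml"

# Hypothetical .trove.yml contents after one push; the version ID is made up.
TROVE_YML = <<~YAML
  storage: s3://my-bucket/trove
  files:
  - name: model.bin
    version: example-version-id
YAML

# Returns the pinned version for a tracked file, or nil if it is not listed.
def pinned_version(config, filename)
  entry = (config["files"] || []).find { |f| f["name"] == filename }
  entry && entry["version"]
end

config = YAML.safe_load(TROVE_YML)
puts pinned_version(config, "model.bin")         # the pinned version ID
puts pinned_version(config, "other.bin").inspect # nil: no version is pinned
```

When no version is pinned, the S3 code in this release omits `version_id` from the request, so the store's current version would be fetched.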
data/exe/trove
ADDED
data/lib/trove.rb
ADDED
@@ -0,0 +1,162 @@
+# stdlib
+require "digest/md5"
+require "yaml"
+
+# modules
+require "trove/utils"
+require "trove/version"
+
+module Trove
+  # storage
+  module Storage
+    autoload :S3, "trove/storage/s3"
+  end
+
+  # methods
+  class << self
+    # TODO use flock to prevent multiple concurrent downloads
+    def pull(filename = nil, version: nil)
+      if filename
+        pull_file(filename, version: version)
+      else
+        raise ArgumentError, "Specify filename for version" if version
+
+        (config["files"] || []).each do |file|
+          pull_file(file["name"], version: file["version"], all: true)
+        end
+      end
+    end
+
+    # could use upload_file method for multipart uploads over a certain size
+    # but multipart uploads have extra cost and cleanup, so keep it simple for now
+    def push(filename)
+      src = File.join(root, filename)
+      raise "File not found" unless File.exist?(src)
+
+      info = storage.info(filename)
+      upload = info.nil?
+      unless upload
+        version = info[:version]
+        if modified?(src, info)
+          upload = true
+        else
+          stream.puts "Already up-to-date"
+        end
+      end
+
+      if upload
+        stream.puts "Pushing #{filename}..." unless stream.tty?
+        resp = storage.upload(src, filename) do |current_size, total_size|
+          Utils.progress(stream, filename, current_size, total_size)
+        end
+        version = resp[:version]
+      end
+
+      if vcs?
+        # add files to yaml if needed
+        files = (config["files"] ||= [])
+
+        # find file
+        file = files.find { |f| f["name"] == filename }
+        unless file
+          file = {"name" => filename}
+          files << file
+        end
+
+        # update version
+        file["version"] = version
+
+        File.write(".trove.yml", config.to_yaml.sub(/\A---\n/, ""))
+      end
+
+      {
+        version: version
+      }
+    end
+
+    def delete(filename)
+      storage.delete(filename)
+    end
+
+    def list
+      storage.list
+    end
+
+    def versions(filename)
+      storage.versions(filename)
+    end
+
+    private
+
+    def pull_file(filename, version: nil, all: false)
+      dest = File.join(root, filename)
+
+      if !version
+        file = (config["files"] || []).find { |f| f["name"] == filename }
+        version = file["version"] if file
+      end
+
+      download = !File.exist?(dest)
+      unless download
+        info = storage.info(filename, version: version)
+        if info.nil? || modified?(dest, info)
+          download = true
+        else
+          stream.puts "Already up-to-date" unless all
+        end
+      end
+
+      if download
+        stream.puts "Pulling #{filename}..." unless stream.tty?
+        storage.download(filename, dest, version: version) do |current_size, total_size|
+          Utils.progress(stream, filename, current_size, total_size)
+        end
+      end
+
+      download
+    end
+
+    def modified?(src, info)
+      Digest::MD5.file(src).hexdigest != info[:md5]
+    end
+
+    # TODO test file not found
+    def config
+      @config ||= begin
+        begin
+          YAML.load_file(".trove.yml")
+        rescue Errno::ENOENT
+          raise "Config not found"
+        end
+      end
+    end
+
+    def root
+      @root ||= config["root"] || "trove"
+    end
+
+    def storage
+      @storage ||= begin
+        uri = URI.parse(config["storage"])
+
+        case uri.scheme
+        when "s3"
+          Storage::S3.new(
+            bucket: uri.host,
+            prefix: uri.path[1..-1]
+          )
+        else
+          raise "Invalid storage provider: #{uri.scheme}"
+        end
+      end
+    end
+
+    def vcs?
+      config.key?("vcs") ? config["vcs"] : true
+    end
+
+    def stream
+      $stderr
+    end
+  end
+end
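Two small steps in `lib/trove.rb` carry most of the logic: turning the `storage` setting into a bucket and prefix, and deciding whether a local file differs from the remote copy by comparing MD5 digests. The sketch below isolates both under stated assumptions. `parse_storage` is a hypothetical helper name, and unlike the gem's `modified?` (which takes an info hash) this version takes the digest string directly. It also requires `uri` explicitly rather than relying on another library to load it.

```ruby
require "digest/md5"
require "tempfile"
require "uri"

# Splits a storage URI such as "s3://my-bucket/trove" into bucket and prefix.
def parse_storage(storage)
  uri = URI.parse(storage)
  raise "Invalid storage provider: #{uri.scheme}" unless uri.scheme == "s3"

  {bucket: uri.host, prefix: uri.path[1..-1]}
end

# True when the local file's MD5 hex digest differs from the remote digest.
def modified?(path, remote_md5)
  Digest::MD5.file(path).hexdigest != remote_md5
end

location = parse_storage("s3://my-bucket/trove")
puts location.inspect

# Compare a temporary file against a matching and a non-matching digest.
unchanged, changed = Tempfile.create("model") do |f|
  f.write("hello")
  f.flush
  [
    modified?(f.path, Digest::MD5.hexdigest("hello")),
    modified?(f.path, Digest::MD5.hexdigest("other"))
  ]
end
puts "unchanged file modified? #{unchanged}, changed file modified? #{changed}"
```

Comparing digests before transferring is what lets `push` and `pull` report "Already up-to-date" without moving the file.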
data/lib/trove/cli.rb
ADDED
@@ -0,0 +1,113 @@
+require "thor"
+
+module Trove
+  class CLI < Thor
+    include Thor::Actions
+
+    desc "init", "Initialize a project"
+    def init
+      create_file "trove/.keep", ""
+
+      if File.exist?(".gitignore")
+        contents = <<~EOS
+
+          # Ignore Trove storage
+          /trove/*
+          !/trove/.keep
+        EOS
+        unless File.read(".gitignore").include?(contents)
+          append_to_file(".gitignore", contents)
+        end
+      else
+        say "Check in trove/.keep and ignore trove/*"
+      end
+
+      create_file ".trove.yml", <<~EOS
+        storage: s3://my-bucket/trove
+      EOS
+    end
+
+    desc "push FILENAME", "Push a file"
+    def push(filename)
+      Trove.push(filename)
+    end
+
+    desc "pull [FILENAME]", "Pull files"
+    option :version
+    def pull(filename = nil)
+      Trove.pull(filename, version: options[:version])
+    end
+
+    desc "delete FILENAME", "Delete a file"
+    def delete(filename = nil)
+      Trove.delete(filename)
+    end
+
+    desc "list", "List files"
+    def list
+      say table(
+        Trove.list,
+        [:filename, :size, :updated_at]
+      )
+    end
+
+    desc "version", "Show the current version"
+    def version
+      say Trove::VERSION
+    end
+
+    desc "versions FILENAME", "List versions"
+    def versions(filename)
+      say table(
+        Trove.versions(filename),
+        [:version, :size, :updated_at]
+      )
+    end
+
+    private
+
+    def table(data, columns)
+      columns.each do |c|
+        if c == :size
+          data.each { |r| r[c] = Utils.human_size(r[c]) }
+        elsif c == :updated_at
+          data.each { |r| r[c] = "#{time_ago(r[c])} ago" }
+        elsif c == :version
+          data.each { |r| r[c] ||= "<none>" }
+        end
+      end
+      column_names = columns.map { |c| c.to_s.sub(/_at\z/, "").upcase }
+      widths = columns.map.with_index { |c, i| [column_names[i].size, data.map { |r| r[c].to_s.size }.max].max }
+
+      output = String.new("")
+      str = widths.map { |w| "%-#{w}s" }.join(" ") + "\n"
+      output << str % column_names
+      data.each do |row|
+        output << str % columns.map { |c| row[c] }
+      end
+      output
+    end
+
+    def time_ago(time)
+      diff = (Time.now - time).round
+
+      if diff < 60
+        pluralize(diff, "second")
+      elsif diff < 60 * 60
+        pluralize((diff / 60.0).floor, "minute")
+      elsif diff < 60 * 60 * 24
+        pluralize((diff / (60.0 * 60)).floor, "hour")
+      else
+        pluralize((diff / (60.0 * 60 * 24)).floor, "day")
+      end
+    end
+
+    def pluralize(value, str)
+      "#{value} #{value == 1 ? str : "#{str}s"}"
+    end
+
+    def self.exit_on_failure?
+      true
+    end
+  end
+end
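The private `table` helper in the CLI pads each column to the wider of its header and its longest value, then renders every row with one shared format string. A standalone sketch of that layout step is shown below. `format_table` is a hypothetical name, and it omits the size, time, and version rewriting the CLI method performs first. It computes widths by concatenating the header length with the value lengths, which also works for an empty row list, a case where `[n, nil].max` as written in the diff would raise.

```ruby
# Formats rows (hashes) as left-aligned, space-separated columns.
def format_table(rows, columns)
  # Header names: drop a trailing "_at" and upcase, as the CLI does.
  names = columns.map { |c| c.to_s.sub(/_at\z/, "").upcase }

  # Each column is as wide as its header or its longest value.
  widths = columns.each_with_index.map do |c, i|
    ([names[i].size] + rows.map { |r| r[c].to_s.size }).max
  end

  line = widths.map { |w| "%-#{w}s" }.join(" ") + "\n"
  out = line % names
  rows.each { |r| out << line % columns.map { |c| r[c] } }
  out
end

rows = [
  {filename: "model.bin", size: "1.2MB"},
  {filename: "vocab.txt", size: "48KB"}
]
puts format_table(rows, [:filename, :size])
```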
data/lib/trove/storage/s3.rb
ADDED
@@ -0,0 +1,120 @@
+require "aws-sdk-s3"
+require "fileutils"
+
+module Trove
+  module Storage
+    class S3
+      attr_reader :bucket, :prefix
+
+      def initialize(bucket:, prefix: nil)
+        @bucket = bucket
+        @prefix = prefix
+      end
+
+      def download(filename, dest, version: nil)
+        current_size = 0
+        total_size = nil
+
+        # TODO better path
+        tmp = "#{Dir.tmpdir}/trove-#{Time.now.to_f}"
+        begin
+          File.open(tmp, "wb") do |file|
+            options = {bucket: bucket, key: key(filename)}
+            options[:version_id] = version if version
+            client.get_object(**options) do |chunk, headers|
+              file.write(chunk)
+
+              current_size += chunk.bytesize
+              total_size ||= headers["content-length"].to_i
+              yield current_size, total_size
+            end
+          end
+          FileUtils.mv(tmp, dest)
+        ensure
+          # delete file if interrupted
+          File.unlink(tmp) if File.exist?(tmp)
+        end
+      rescue Aws::S3::Errors::ServiceError
+        raise "File not found"
+      end
+
+      def upload(src, filename, &block)
+        on_chunk_sent = lambda do |_, current_size, total_size|
+          block.call(current_size, total_size)
+        end
+        resp = nil
+        File.open(src, "rb") do |file|
+          resp = client.put_object(bucket: bucket, key: key(filename), body: file, on_chunk_sent: on_chunk_sent)
+        end
+        {version: resp.version_id}
+      end
+
+      # etag isn't always MD5, but low likelihood of match if not
+      # could alternatively add sha256 to metadata
+      def info(filename, version: nil)
+        options = {bucket: bucket, key: key(filename)}
+        options[:version_id] = version if version
+        resp = client.head_object(**options)
+        {
+          version: resp.version_id,
+          md5: resp.etag.gsub('"', "")
+        }
+      rescue Aws::S3::Errors::ServiceError
+        nil
+      end
+
+      def delete(filename, version: nil)
+        options = {bucket: bucket, key: key(filename)}
+        options[:version_id] = version if version
+        client.delete_object(**options)
+        true
+      rescue Aws::S3::Errors::ServiceError
+        false
+      end
+
+      def list
+        files = []
+        options = {bucket: bucket}
+        options[:prefix] = prefix if prefix
+        client.list_objects_v2(**options).each do |response|
+          response.contents.each do |object|
+            filename = prefix ? object.key[(prefix.size + 1)..-1] : object.key
+            files << {
+              filename: filename,
+              size: object.size,
+              updated_at: object.last_modified
+            }
+          end
+        end
+        files
+      end
+
+      def versions(filename)
+        versions = []
+        object_key = key(filename)
+        client.list_object_versions(bucket: bucket, prefix: object_key).each do |response|
+          response.versions.each do |version|
+            next if version.key != object_key
+
+            versions << {
+              version: version.version_id == "null" ? nil : version.version_id,
+              size: version.size,
+              updated_at: version.last_modified
+            }
+          end
+        end
+        versions
+      end
+
+      private
+
+      def client
+        @client ||= Aws::S3::Client.new
+      end
+
+      def key(filename)
+        prefix ? "#{prefix}/#{filename}" : filename
+      end
+    end
+  end
+end
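The comment above `info` notes that an ETag is not always an MD5 digest. One common case is an object uploaded in multiple parts, where S3 reports a value of the form `<hex>-<part count>`. The sketch below shows one way to tell the two shapes apart before trusting the comparison. `md5_etag?` is a hypothetical helper that is not part of the gem, and the ETag values are made up for illustration.

```ruby
# True when the ETag looks like a plain MD5 hex digest (32 hex characters).
# Surrounding double quotes, as returned in the S3 response, are ignored.
def md5_etag?(etag)
  etag.delete('"').match?(/\A\h{32}\z/)
end

single_part = '"5d41402abc4b2a76b9719d911017c592"'
multipart   = '"5d41402abc4b2a76b9719d911017c592-3"'

puts md5_etag?(single_part)
puts md5_etag?(multipart)
```

A caller could fall back to always transferring the file when the check fails, which matches the effect the source comment describes: a non-MD5 ETag is unlikely to equal the local digest, so the file is treated as modified.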
data/lib/trove/utils.rb
ADDED
@@ -0,0 +1,33 @@
+module Trove
+  module Utils
+    # TODO improve performance
+    def self.human_size(size)
+      if size < 2**10
+        units = "B"
+      elsif size < 2**20
+        size /= (2**10).to_f
+        units = "KB"
+      elsif size < 2**30
+        size /= (2**20).to_f
+        units = "MB"
+      else
+        size /= (2**30).to_f
+        units = "GB"
+      end
+
+      round = size < 9.95 ? 1 : 0
+      "#{size.round(round)}#{units}"
+    end
+
+    def self.progress(stream, filename, current_size, total_size)
+      return unless stream.tty?
+
+      width = 50
+      progress = (100.0 * current_size / total_size).floor
+      completed = (width / 100.0 * progress).round
+      remaining = width - completed
+      stream.print "\r#{filename} [#{"=" * completed}#{" " * remaining}] %3s%% %11s " % [progress, "#{Utils.human_size(current_size)}/#{Utils.human_size(total_size)}"]
+      stream.print "\n" if current_size == total_size
+    end
+  end
+end
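The arithmetic behind `Utils.progress` can be checked in isolation: the percentage is floored, then scaled to the bar width and rounded. The sketch below is a simplified variant that returns the bar as a string instead of printing to a TTY. `progress_bar` and its `width:` keyword are hypothetical, the size columns are left out, and it assumes `total` is greater than zero.

```ruby
# Builds a fixed-width progress bar such as "[=====     ]  50%".
# Assumes total > 0; a zero total would make the percentage undefined.
def progress_bar(current, total, width: 50)
  percent = (100.0 * current / total).floor
  completed = (width / 100.0 * percent).round
  remaining = width - completed
  "[#{"=" * completed}#{" " * remaining}] %3d%%" % percent
end

puts progress_bar(0, 100, width: 10)
puts progress_bar(50, 100, width: 10)
puts progress_bar(100, 100, width: 10)
```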
metadata
ADDED
@@ -0,0 +1,122 @@
+--- !ruby/object:Gem::Specification
+name: trove
+version: !ruby/object:Gem::Version
+  version: 0.1.0
+platform: ruby
+authors:
+- Andrew Kane
+autorequire:
+bindir: exe
+cert_chain: []
+date: 2020-10-31 00:00:00.000000000 Z
+dependencies:
+- !ruby/object:Gem::Dependency
+  name: aws-sdk-s3
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+- !ruby/object:Gem::Dependency
+  name: thor
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+- !ruby/object:Gem::Dependency
+  name: bundler
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+- !ruby/object:Gem::Dependency
+  name: rake
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+- !ruby/object:Gem::Dependency
+  name: minitest
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '5'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '5'
+description:
+email: andrew@chartkick.com
+executables:
+- trove
+extensions: []
+extra_rdoc_files: []
+files:
+- CHANGELOG.md
+- LICENSE.txt
+- README.md
+- exe/trove
+- lib/trove.rb
+- lib/trove/cli.rb
+- lib/trove/storage/s3.rb
+- lib/trove/utils.rb
+- lib/trove/version.rb
+homepage: https://github.com/ankane/trove
+licenses:
+- MIT
+metadata: {}
+post_install_message:
+rdoc_options: []
+require_paths:
+- lib
+required_ruby_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      version: '2.5'
+required_rubygems_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      version: '0'
+requirements: []
+rubygems_version: 3.1.4
+signing_key:
+specification_version: 4
+summary: Deploy machine learning models in Ruby (and Rails)
+test_files: []