gitballs 0.1.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +7 -0
- data/.ruby-version +1 -0
- data/CHANGELOG.md +5 -0
- data/CODE_OF_CONDUCT.md +10 -0
- data/LICENSE.txt +22 -0
- data/Rakefile +8 -0
- data/Readme.md +66 -0
- data/download.rb +102 -0
- data/exe/gitballs +6 -0
- data/lib/gitballs/cli.rb +97 -0
- data/lib/gitballs/client.rb +59 -0
- data/lib/gitballs/compressor.rb +140 -0
- data/lib/gitballs/registry.rb +68 -0
- data/lib/gitballs/stats.rb +59 -0
- data/lib/gitballs/version.rb +5 -0
- data/lib/gitballs.rb +12 -0
- data/sig/gitballs.rbs +4 -0
- metadata +102 -0
checksums.yaml
ADDED
@@ -0,0 +1,7 @@
+---
+SHA256:
+  metadata.gz: 9a964e4849a7431071d704d40e3eecb45a0e6d1e8c98518630c2ebfd443b8393
+  data.tar.gz: 22d3fb51ff7153d84a971cff39140074599fbd9bf7197867ec1522ef483cf790
+SHA512:
+  metadata.gz: 6708210b22d2c27569414da6daba7b75f6e9b2a368fad074a22a6e9e70d88bdf10a10672e764d178d47f61b434ceb993006b8542215db616f1cffbc57a3ff450
+  data.tar.gz: f06e2b498c7b28b7beb878124b44866173defc4b4270f7558472303d67ab752ca6b52515cc8c1bcac11ef02688c87eb096c9b5d9c390e81d9f6f45f5c04033ff
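The digests in `checksums.yaml` can be compared against the archive members they describe. The following is a minimal sketch, assuming `metadata.gz` or `data.tar.gz` has already been unpacked from the `.gem` archive; `checksum_matches?` is a hypothetical helper and not part of gitballs.

```ruby
require "digest"

# Hypothetical helper (not part of gitballs): returns true when the
# SHA256 hex digest of the file at `path` equals `expected_hex`.
def checksum_matches?(path, expected_hex)
  Digest::SHA256.file(path).hexdigest == expected_hex.downcase
end
```

For example, `checksum_matches?("data.tar.gz", "22d3fb51...")` would be called with the full digest listed above for that member.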
data/.ruby-version
ADDED
@@ -0,0 +1 @@
+3.4.7
data/CHANGELOG.md
ADDED
data/CODE_OF_CONDUCT.md
ADDED
@@ -0,0 +1,10 @@
+# Code of Conduct
+
+gitballs follows [The Ruby Community Conduct Guideline](https://www.ruby-lang.org/en/conduct) in all "collaborative space", which is defined as community communications channels (such as mailing lists, submitted patches, commit comments, etc.):
+
+* Participants will be tolerant of opposing views.
+* Participants must ensure that their language and actions are free of personal attacks and disparaging personal remarks.
+* When interpreting the words and actions of others, participants should always assume good intentions.
+* Behaviour which can be reasonably considered harassment will not be tolerated.
+
+If you have any concerns about behaviour within this project, please contact us at [andrewnez@gmail.com](mailto:andrewnez@gmail.com).
data/LICENSE.txt
ADDED
@@ -0,0 +1,22 @@
+Copyright (c) 2016 Andrew Nesbitt
+
+MIT License
+
+Permission is hereby granted, free of charge, to any person obtaining
+a copy of this software and associated documentation files (the
+"Software"), to deal in the Software without restriction, including
+without limitation the rights to use, copy, modify, merge, publish,
+distribute, sublicense, and/or sell copies of the Software, and to
+permit persons to whom the Software is furnished to do so, subject to
+the following conditions:
+
+The above copyright notice and this permission notice shall be
+included in all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
+LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
+OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/Rakefile
ADDED
data/Readme.md
ADDED
@@ -0,0 +1,66 @@
+# Gitballs :8ball: :basketball: :soccer:
+
+An investigation into storing package manager release tarballs in a space-efficient way using git.
+
+## Theory
+
+Each release of a library to a package manager usually changes only a few lines, yet it is stored as a complete copy of the package. Git is good at storing diffs efficiently, so perhaps we can put each release into a git repo as a commit and see how much smaller the result is.
+
+## Results
+
+#### Rubygems
+
+| gem name | # of releases | tarball size | gitball size | % saving | saving |
+| ------------------ | ------------- | ------------ | ------------ | -------- | ------ |
+| split | 48 | 2.5M | 380K | 85 | 2.12MB |
+| redis | 52 | 1.7M | 400K | 76 | 1.3MB |
+| capistrano | 82 | 7.0M | 772K | 89 | 6.23MB |
+| rake | 71 | 7.0M | 624K | 91 | 6.38MB |
+| bundler | 225 | 42M | 1.9M | 95 | 40MB |
+| rails | 288 | 159M | 7.4M | 95 | 152MB |
+| nokogiri | 94 | 275M | 33M | 88 | 242MB |
+| sass | 309 | 74M | 2.0M | 97 | 72MB |
+| font-awesome-rails | 34 | 55M | 16M | 70 | 39MB |
+| i18n-active_record | 4 | 52K | 360K | -590 | -308KB |
+
+#### NPM
+
+| module name | # of releases | tarball size | gitball size | % saving | saving |
+| ----------- | ------------- | ------------ | ------------ | -------- | ------ |
+| base62 | 6 | 36K | 100K | -277 | -64KB |
+| express | 274 | 23M | 9.8M | 57 | 13.2MB |
+| mocha | 118 | 10M | 1.1M | 89 | 8.9MB |
+| node-sass | 94 | 111M | 22M | 80 | 89MB |
+| request | 111 | 6.7M | 844K | 87 | 5.86MB |
+| left-pad | 11 | 52K | 348K | -569 | -296KB |
+| react | 87 | 40M | 4.8M | 88 | 35.2MB |
+| chai | 64 | 3.4M | 780K | 77 | 2.62MB |
+| lodash | 88 | 79M | 8.1M | 90 | 70.9MB |
+| bootstrap | 13 | 15M | 4.4M | 71 | 10.6MB |
+
+## Installation
+
+```
+gem install gitballs
+```
+
+## CLI
+
+```
+gitballs init <purl> # Download and compress package versions into git repo
+gitballs stats <path> # Show stats for an existing gitballs repo
+gitballs version # Show version
+```
+
+Options:
+- `-o, --output DIR` - Output directory (default: ./gitballs/<package>)
+- `-q, --quiet` - Suppress progress output
+
+Examples:
+
+```
+gitballs init pkg:gem/rails
+gitballs init pkg:npm/lodash --output ./lodash-repo
+gitballs init pkg:gem/nokogiri --quiet
+gitballs stats ./gitballs/rails
+```
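The "% saving" column in the Readme tables relates the tarball size to the gitball size. A minimal sketch of that arithmetic follows, assuming both sizes are given in bytes and that a size such as 2.5M means 2.5 × 1024 KiB as printed by `du -sh`; `percent_saving` is a hypothetical name, and the Readme does not say whether its figures were rounded or truncated.

```ruby
# Percentage of space saved by the git repository relative to the raw
# tarballs. A negative result means the repository is larger.
def percent_saving(tarball_bytes, gitball_bytes)
  raise ArgumentError, "tarball size must be positive" unless tarball_bytes.positive?

  ((tarball_bytes - gitball_bytes).to_f / tarball_bytes * 100).round(1)
end

puts percent_saving(2560 * 1024, 380 * 1024) # split: 2.5M of tarballs, 380K gitball
puts percent_saving(52 * 1024, 348 * 1024)   # left-pad: 52K of tarballs, 348K gitball
```

Small packages with few releases come out negative because the fixed overhead of a git repository outweighs the few kilobytes of tarballs.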
data/download.rb
ADDED
@@ -0,0 +1,102 @@
+require 'bundler'
+Bundler.require
+require 'json'
+
+# for a given library (rubygem/node module etc)
+
+library_name = 'bootstrap'
+platform = 'npm'
+
+# download a list of every release number
+
+response = Typhoeus.get("https://libraries.io/api/#{platform}/#{library_name}")
+library = JSON.parse(response.body)
+
+versions = library['versions']
+
+version_numbers = versions.map{|v| v['number']}
+
+# delete any existing tarballs
+
+`rm -f ./tarballs/*.tar*`
+
+# delete any existing gitball repo
+
+`rm -rf ./gitballs/.git`
+
+# download each release tarball into /tarballs/:name
+
+version_numbers.each do |version_number|
+
+  case platform
+  when 'rubygems'
+    tarball_url = "https://rubygems.org/downloads/#{library_name}-#{version_number}.gem"
+    `wget -O ./tarballs/#{version_number}.tar '#{tarball_url}'`
+  when 'npm'
+    tarball_url = "https://registry.npmjs.org/#{library_name}/-/#{library_name}-#{version_number}.tgz"
+    `wget -O ./tarballs/#{version_number}.tar.gz '#{tarball_url}'`
+  else
+    raise "unknown tarball url for #{platform}"
+  end
+end
+
+# create a new git repository in /gitballs/:name
+
+`git init ./gitballs/`
+
+# sort releases by semver
+
+version_numbers.sort! do |a,b|
+  begin
+    Semantic::Version.new(a) <=> Semantic::Version.new(b)
+  rescue
+    a <=> b
+  end
+end
+
+# for each release
+version_numbers.each do |version_number|
+  # delete the contents of the directory
+
+  `cd gitballs && git rm -rf .`
+
+  # untar the release into the directory
+
+  case platform
+  when 'rubygems'
+    `tar -C ./gitballs -xvf ./tarballs/#{version_number}.tar`
+    `cd gitballs && tar -C . -zxvf data.tar.gz && rm -f data.tar.gz metadata.gz` # rubygems specific
+  when 'npm'
+    `tar -C ./gitballs -zxvf ./tarballs/#{version_number}.tar.gz`
+  else
+    raise "unknown tarball url for #{platform}"
+  end
+
+
+  # add and commit all files and folders with the release number as the message
+
+  `cd gitballs && git add . && git commit -am '#{version_number}'`
+end
+
+# optimize the git repo size
+
+`cd gitballs && git gc --aggressive --prune && git rm -rf .`
+
+# calculate the size of the git repository (git count-objects -vH)
+
+gitball_size = `du -sh ./gitballs | cut -f1`.strip
+
+# calculate the size of the folder of tarballs (du -sh .)
+
+tarball_size = `du -sh ./tarballs | cut -f1`.strip
+
+# output comparison
+
+puts "releases: #{version_numbers.length}"
+puts "tarballs: #{tarball_size}"
+puts "gitballs: #{gitball_size}"
+
+# Extras
+# make a branch for each major semver number
+# handle invalid semver numbers by sorting by publish date
+# how does it handle .gitignore files in tarballs?
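The sort block in `download.rb` falls back to plain string comparison whenever a version fails to parse, which can mix two different orderings within one sort. The sketch below shows one way to keep the ordering consistent. It swaps in `Gem::Version` from RubyGems rather than the `Semantic` gem the script loads, so the ordering follows RubyGems rules rather than strict semver, and `sort_versions` is a hypothetical name.

```ruby
require "rubygems"

# Sorts version strings. Strings that Gem::Version cannot parse are
# sorted lexically and placed first, so every pair of elements is
# always compared by the same rule.
def sort_versions(numbers)
  valid, invalid = numbers.partition { |n| Gem::Version.correct?(n) }
  invalid.sort + valid.sort_by { |n| Gem::Version.new(n) }
end

p sort_versions(["1.10.0", "1.2.0", "not-a-version", "1.9.0"])
```

Placing unparseable versions first is an arbitrary choice; the "Extras" note in the script suggests ordering them by publish date instead, which would need the release timestamps.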
data/exe/gitballs
ADDED
data/lib/gitballs/cli.rb
ADDED
@@ -0,0 +1,97 @@
+# frozen_string_literal: true
+
+require "optparse"
+
+module Gitballs
+  class CLI
+    def initialize(argv)
+      @argv = argv
+      @options = {}
+    end
+
+    def run
+      parser = build_parser
+      args = parser.parse(@argv)
+      command = args.shift
+
+      case command
+      when "init"
+        run_init(args)
+      when "stats"
+        run_stats(args)
+      when "version", "-v", "--version"
+        puts "gitballs #{VERSION}"
+      when "help", nil
+        puts parser
+      else
+        warn "Unknown command: #{command}"
+        puts parser
+        exit 1
+      end
+    end
+
+    def build_parser
+      OptionParser.new do |opts|
+        opts.banner = "Usage: gitballs <command> [options]"
+        opts.separator ""
+        opts.separator "Commands:"
+        opts.separator " init <purl> Download and compress package versions into git repo"
+        opts.separator " stats <path> Show stats for an existing gitballs repo"
+        opts.separator " version Show version"
+        opts.separator ""
+        opts.separator "Options:"
+
+        opts.on("-o", "--output DIR", "Output directory (default: ./gitballs/<package>)") do |dir|
+          @options[:output] = dir
+        end
+
+        opts.on("-q", "--quiet", "Suppress progress output") do
+          @options[:quiet] = true
+        end
+
+        opts.on("-h", "--help", "Show this help") do
+          puts opts
+          exit
+        end
+      end
+    end
+
+    def run_init(args)
+      purl = args.first
+      unless purl
+        warn "Error: purl argument required"
+        warn "Usage: gitballs init <purl>"
+        warn "Example: gitballs init pkg:gem/rails"
+        exit 1
+      end
+
+      compressor = Compressor.new(purl, output: @options[:output], quiet: @options[:quiet])
+      compressor.run
+      puts compressor.stats unless @options[:quiet]
+      puts "output: #{compressor.output_dir}"
+    rescue Error => e
+      warn "Error: #{e.message}"
+      exit 1
+    end
+
+    def run_stats(args)
+      path = args.first
+      unless path
+        warn "Error: path argument required"
+        warn "Usage: gitballs stats <path>"
+        exit 1
+      end
+
+      unless File.directory?(path)
+        warn "Error: #{path} is not a directory"
+        exit 1
+      end
+
+      stats = Stats.new(path)
+      puts stats
+    rescue Error => e
+      warn "Error: #{e.message}"
+      exit 1
+    end
+  end
+end
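`CLI#run` relies on `OptionParser#parse` returning the arguments that are not options, so the command and its operands can be read positionally even when flags appear between them. A minimal standalone sketch of that behaviour follows; `split_command` is a hypothetical helper, and it assumes the default permuting mode (that is, `POSIXLY_CORRECT` is not set in the environment).

```ruby
require "optparse"

# Hypothetical helper: separates a gitballs-style argument list into
# the command name, its positional operands, and the parsed options.
def split_command(argv)
  options = {}
  parser = OptionParser.new do |opts|
    opts.on("-o", "--output DIR") { |dir| options[:output] = dir }
    opts.on("-q", "--quiet") { options[:quiet] = true }
  end

  # parse leaves argv untouched and returns the non-option arguments.
  positional = parser.parse(argv)
  [positional.first, positional.drop(1), options]
end

p split_command(%w[init pkg:npm/lodash --output ./lodash-repo])
```

An unknown flag raises `OptionParser::InvalidOption`, which the sketch, like `CLI#run`, does not rescue.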
data/lib/gitballs/client.rb
ADDED
@@ -0,0 +1,59 @@
+# frozen_string_literal: true
+
+require "typhoeus"
+require "json"
+require "purl"
+
+module Gitballs
+  class Client
+    BASE_URL = "https://packages.ecosyste.ms/api/v1"
+
+    REGISTRY_MAP = {
+      "gem" => "rubygems.org",
+      "npm" => "npmjs.org",
+      "pypi" => "pypi.org",
+      "cargo" => "crates.io",
+      "nuget" => "nuget.org",
+      "maven" => "repo1.maven.org",
+      "go" => "proxy.golang.org",
+      "hex" => "hex.pm",
+      "packagist" => "packagist.org"
+    }.freeze
+
+    def initialize
+      @hydra = Typhoeus::Hydra.new
+    end
+
+    def fetch_versions(purl_string)
+      purl = Purl.parse(purl_string)
+      registry = registry_for(purl.type)
+      package_name = package_name_for(purl)
+
+      url = "#{BASE_URL}/registries/#{registry}/packages/#{package_name}/versions"
+      response = Typhoeus.get(url)
+
+      raise Error, "Failed to fetch versions: #{response.code}" unless response.success?
+
+      JSON.parse(response.body)
+    end
+
+    def download_tarball(url, destination)
+      response = Typhoeus.get(url, followlocation: true)
+      raise Error, "Failed to download #{url}: #{response.code}" unless response.success?
+
+      File.binwrite(destination, response.body)
+    end
+
+    def registry_for(purl_type)
+      REGISTRY_MAP[purl_type] || raise(Error, "Unsupported purl type: #{purl_type}")
+    end
+
+    def package_name_for(purl)
+      if purl.namespace
+        "#{purl.namespace}%2F#{purl.name}"
+      else
+        purl.name
+      end
+    end
+  end
+end
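`Client#fetch_versions` builds its request URL from the purl type, namespace, and name. The sketch below isolates that string construction so it can be inspected without a network call. `versions_url` is a hypothetical function, only two registry entries from `REGISTRY_MAP` are copied in, and the namespace is assumed to be passed exactly as the purl parser reports it.

```ruby
# Hypothetical helper mirroring the URL built in Client#fetch_versions.
def versions_url(type, name, namespace = nil)
  base_url = "https://packages.ecosyste.ms/api/v1"
  registries = { "gem" => "rubygems.org", "npm" => "npmjs.org" } # subset of REGISTRY_MAP
  registry = registries.fetch(type) { raise ArgumentError, "Unsupported purl type: #{type}" }

  # A namespaced package is joined with an encoded slash, as in the client.
  package = namespace ? "#{namespace}%2F#{name}" : name
  "#{base_url}/registries/#{registry}/packages/#{package}/versions"
end

puts versions_url("gem", "rails")
```

Other characters in the namespace or name are not percent-encoded here, which matches the client code shown in the diff.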
data/lib/gitballs/compressor.rb
ADDED
@@ -0,0 +1,140 @@
+# frozen_string_literal: true
+
+require "fileutils"
+require "purl"
+require "vers"
+
+module Gitballs
+  class Compressor
+    attr_reader :purl, :output_dir, :quiet
+
+    def initialize(purl_string, output: nil, quiet: false)
+      @purl = Purl.parse(purl_string)
+      @output_dir = output || File.join(".", "gitballs", @purl.name)
+      @quiet = quiet
+      @client = Client.new
+      @registry = Registry.new(@purl.type)
+      @tarball_dir = File.join(@output_dir, ".tarballs")
+    end
+
+    def run
+      setup_directories
+      versions = fetch_and_sort_versions
+      download_tarballs(versions)
+      @tarball_size = dir_size(@tarball_dir)
+      init_git_repo
+      commit_versions(versions)
+      optimize_repo
+      cleanup_tarballs
+      stats
+    end
+
+    def stats
+      Stats.new(@output_dir, @tarball_size)
+    end
+
+    def dir_size(path)
+      return 0 unless File.directory?(path)
+
+      IO.popen(["du", "-sk", path], &:read).strip.split.first.to_i * 1024 # no shell, so paths with spaces work
+    end
+
+    def setup_directories
+      FileUtils.rm_rf(@output_dir)
+      FileUtils.mkdir_p(@output_dir)
+      FileUtils.mkdir_p(@tarball_dir)
+    end
+
+    def fetch_and_sort_versions
+      log "Fetching versions..."
+      versions = @client.fetch_versions(@purl.to_s)
+
+      versions.sort_by do |v|
+        Vers::Version.new(v["number"])
+      rescue ArgumentError
+        Vers::Version.new("0.0.0")
+      end
+    end
+
+    def download_tarballs(versions)
+      log "Downloading #{versions.size} versions..."
+      versions.each_with_index do |version, index|
+        number = version["number"]
+        url = version["download_url"]
+        next unless url
+
+        log " [#{index + 1}/#{versions.size}] #{number}"
+        extension = tarball_extension(url)
+        destination = File.join(@tarball_dir, "#{number}#{extension}")
+        @client.download_tarball(url, destination)
+      rescue Error => e
+        log " skipping: #{e.message}"
+      end
+    end
+
+    def init_git_repo
+      log "Initializing git repository..."
+      system("git", "init", @output_dir, out: File::NULL, err: File::NULL, exception: true)
+    end
+
+    def commit_versions(versions)
+      log "Committing versions..."
+      versions.each_with_index do |version, index|
+        number = version["number"]
+        tarball = find_tarball(number)
+        next unless tarball
+
+        log " [#{index + 1}/#{versions.size}] #{number}"
+        clear_working_dir
+        @registry.extract(tarball, @output_dir)
+        git_add_and_commit(number)
+      end
+    end
+
+    def optimize_repo
+      log "Optimizing repository..."
+      Dir.chdir(@output_dir) do
+        system("git", "gc", "--aggressive", "--prune=now", out: File::NULL, err: File::NULL, exception: true)
+        system("git", "rm", "-rf", ".", out: File::NULL, err: File::NULL, exception: true)
+      end
+    end
+
+    def cleanup_tarballs
+      FileUtils.rm_rf(@tarball_dir)
+    end
+
+    def clear_working_dir
+      Dir.chdir(@output_dir) do
+        Dir.children(".").reject { |f| f == ".git" || f == ".tarballs" }.each do |entry|
+          FileUtils.rm_rf(entry)
+        end
+      end
+    end
+
+    def git_add_and_commit(version_number)
+      Dir.chdir(@output_dir) do
+        system("git", "add", "-A", out: File::NULL, err: File::NULL, exception: true)
+        system("git", "commit", "-m", version_number, "--allow-empty", out: File::NULL, err: File::NULL, exception: true)
+      end
+    end
+
+    def find_tarball(version_number)
+      Dir.glob(File.join(@tarball_dir, "#{version_number}.{gem,tgz,tar.gz,zip,nupkg}")).first # "1.0.*" would also match 1.0.1.gem
+    end
+
+    def tarball_extension(url)
+      case url
+      when /\.gem$/ then ".gem"
+      when /\.tgz$/ then ".tgz"
+      when /\.tar\.gz$/ then ".tar.gz"
+      when /\.zip$/ then ".zip"
+      when /\.nupkg$/ then ".nupkg"
+      else ".tar.gz"
+      end
+    end
+
+    def log(message)
+      puts message unless @quiet
+    end
+  end
+end
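`Compressor#dir_size` shells out to `du -sk`, which reports disk blocks in use. Where an external command is undesirable, the size can be approximated in pure Ruby. The sketch below is an alternative technique, not what the gem does: it sums apparent file sizes, so its result will generally differ from `du`, and `apparent_dir_size` is a hypothetical name.

```ruby
require "find"

# Sums the apparent size in bytes of every regular file under `path`,
# including files in hidden directories. Symlinks are skipped.
def apparent_dir_size(path)
  return 0 unless File.directory?(path)

  Find.find(path)
      .select { |entry| File.file?(entry) && !File.symlink?(entry) }
      .sum { |entry| File.size(entry) }
end
```

Because git stores many small objects, block-based and apparent sizes can diverge noticeably for a repository, so the two measures should not be mixed when comparing tarball and gitball sizes.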
data/lib/gitballs/registry.rb
ADDED
@@ -0,0 +1,68 @@
+# frozen_string_literal: true
+
+module Gitballs
+  class Registry
+    EXTRACTORS = {
+      "gem" => :extract_gem,
+      "npm" => :extract_tgz,
+      "pypi" => :extract_tgz,
+      "cargo" => :extract_tgz,
+      "nuget" => :extract_nupkg,
+      "go" => :extract_zip,
+      "hex" => :extract_tgz,
+      "packagist" => :extract_zip
+    }.freeze
+
+    def initialize(purl_type)
+      @purl_type = purl_type
+      @extractor = EXTRACTORS[purl_type] || raise(Error, "Unsupported type: #{purl_type}")
+    end
+
+    def extract(tarball_path, destination)
+      send(@extractor, tarball_path, destination)
+    end
+
+    def extract_gem(tarball_path, destination)
+      system("tar", "-C", destination, "-xf", tarball_path, exception: true)
+      data_tar = File.join(destination, "data.tar.gz")
+      if File.exist?(data_tar)
+        system("tar", "-C", destination, "-xzf", data_tar, exception: true)
+        FileUtils.rm_f([data_tar, File.join(destination, "metadata.gz"), File.join(destination, "checksums.yaml.gz")])
+      end
+    end
+
+    def extract_tgz(tarball_path, destination)
+      system("tar", "-C", destination, "-xzf", tarball_path, exception: true)
+      move_nested_package_dir(destination)
+    end
+
+    def extract_zip(tarball_path, destination)
+      system("unzip", "-q", "-o", tarball_path, "-d", destination, exception: true)
+      move_nested_package_dir(destination)
+    end
+
+    def extract_nupkg(tarball_path, destination)
+      system("unzip", "-q", "-o", tarball_path, "-d", destination, exception: true)
+      cleanup_nupkg_metadata(destination)
+    end
+
+    def move_nested_package_dir(destination)
+      entries = Dir.children(destination).reject { |f| f.start_with?(".") }
+      return unless entries.size == 1
+
+      nested = File.join(destination, entries.first)
+      return unless File.directory?(nested)
+
+      Dir.children(nested).each do |child|
+        FileUtils.mv(File.join(nested, child), destination)
+      end
+      FileUtils.rmdir(nested)
+    end
+
+    def cleanup_nupkg_metadata(destination)
+      FileUtils.rm_rf(Dir.glob(File.join(destination, "_rels")))
+      FileUtils.rm_rf(File.join(destination, "[Content_Types].xml")) # literal path: in a glob, [...] is a character class
+      FileUtils.rm_rf(Dir.glob(File.join(destination, "*.nuspec")))
+    end
+  end
+end
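`move_nested_package_dir` handles archives that wrap their contents in a single top-level directory, as npm tarballs do with `package/`. The standalone sketch below reproduces that flattening step so its effect can be tried on a scratch directory. `flatten_single_top_level_dir` is a hypothetical name, and the sketch assumes no child of the wrapper directory shares the wrapper's own name.

```ruby
require "fileutils"

# If `destination` holds exactly one visible entry and it is a
# directory, moves that directory's children up one level and removes
# it. Returns true when flattening happened, false otherwise.
def flatten_single_top_level_dir(destination)
  visible = Dir.children(destination).reject { |name| name.start_with?(".") }
  return false unless visible.size == 1

  nested = File.join(destination, visible.first)
  return false unless File.directory?(nested)

  Dir.children(nested).each do |child|
    FileUtils.mv(File.join(nested, child), destination)
  end
  FileUtils.rmdir(nested)
  true
end
```

Hidden entries such as `.git` are ignored when counting, which is what lets the step run inside a working tree that already contains the repository metadata.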
data/lib/gitballs/stats.rb
ADDED
@@ -0,0 +1,59 @@
+# frozen_string_literal: true
+
+module Gitballs
+  class Stats
+    attr_reader :repo_dir, :tarball_size
+
+    def initialize(repo_dir, tarball_size = nil)
+      @repo_dir = repo_dir
+      @tarball_size = tarball_size
+    end
+
+    def release_count
+      return 0 unless File.directory?(File.join(@repo_dir, ".git"))
+
+      IO.popen(["git", "-C", @repo_dir, "rev-list", "--count", "HEAD"], err: File::NULL, &:read).strip.to_i # no shell
+    end
+
+    def repo_size
+      return 0 unless File.directory?(@repo_dir)
+
+      IO.popen(["du", "-sk", @repo_dir], &:read).strip.split.first.to_i * 1024 # no shell, so paths with spaces work
+    end
+
+    def compression_ratio
+      return nil unless @tarball_size && @tarball_size > 0
+
+      ((@tarball_size - repo_size).to_f / @tarball_size * 100).round(1)
+    end
+
+    def to_h
+      {
+        releases: release_count,
+        repo_size: repo_size,
+        tarball_size: tarball_size,
+        compression_ratio: compression_ratio
+      }
+    end
+
+    def to_s
+      lines = []
+      lines << "releases: #{release_count}"
+      lines << "repo size: #{format_size(repo_size)}"
+      if @tarball_size && @tarball_size > 0
+        lines << "tarball size: #{format_size(@tarball_size)}"
+        lines << "savings: #{compression_ratio}%"
+      end
+      lines.join("\n")
+    end
+
+    def format_size(bytes)
+      return "0B" if bytes.zero?
+
+      units = %w[B KB MB GB]
+      exp = (Math.log(bytes) / Math.log(1024)).to_i
+      exp = units.size - 1 if exp >= units.size
+      "%.1f%s" % [bytes.to_f / (1024**exp), units[exp]]
+    end
+  end
+end
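`Stats#format_size` picks a unit by taking the base-1024 logarithm of the byte count and capping it at the largest unit. The sketch below copies that logic into a standalone function so a few inputs can be tried directly; the only change is using `format` in place of `String#%`.

```ruby
# Standalone copy of the size formatting logic from Stats#format_size.
def format_size(bytes)
  return "0B" if bytes.zero?

  units = %w[B KB MB GB]
  exp = (Math.log(bytes) / Math.log(1024)).to_i
  exp = units.size - 1 if exp >= units.size # anything above GB is still shown in GB
  format("%.1f%s", bytes.to_f / (1024**exp), units[exp])
end

puts format_size(512)
puts format_size(1536)
```

Sizes of a terabyte or more are therefore printed as a large number of gigabytes rather than switching to a further unit.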
data/lib/gitballs.rb
ADDED
@@ -0,0 +1,12 @@
+# frozen_string_literal: true
+
+require_relative "gitballs/version"
+require_relative "gitballs/client"
+require_relative "gitballs/registry"
+require_relative "gitballs/stats"
+require_relative "gitballs/compressor"
+require_relative "gitballs/cli"
+
+module Gitballs
+  class Error < StandardError; end
+end
data/sig/gitballs.rbs
ADDED
metadata
ADDED
@@ -0,0 +1,102 @@
+--- !ruby/object:Gem::Specification
+name: gitballs
+version: !ruby/object:Gem::Version
+  version: 0.1.0
+platform: ruby
+authors:
+- Andrew Nesbitt
+bindir: exe
+cert_chain: []
+date: 1980-01-02 00:00:00.000000000 Z
+dependencies:
+- !ruby/object:Gem::Dependency
+  name: typhoeus
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '1.4'
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '1.4'
+- !ruby/object:Gem::Dependency
+  name: purl
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '1.6'
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '1.6'
+- !ruby/object:Gem::Dependency
+  name: vers
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '1.0'
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '1.0'
+description: Downloads all versions of a package and commits them into a git repository,
+  leveraging git delta compression to reduce storage.
+email:
+- andrewnez@gmail.com
+executables:
+- gitballs
+extensions: []
+extra_rdoc_files: []
+files:
+- ".ruby-version"
+- CHANGELOG.md
+- CODE_OF_CONDUCT.md
+- LICENSE.txt
+- Rakefile
+- Readme.md
+- download.rb
+- exe/gitballs
+- lib/gitballs.rb
+- lib/gitballs/cli.rb
+- lib/gitballs/client.rb
+- lib/gitballs/compressor.rb
+- lib/gitballs/registry.rb
+- lib/gitballs/stats.rb
+- lib/gitballs/version.rb
+- sig/gitballs.rbs
+homepage: https://github.com/andrew/gitballs
+licenses:
+- MIT
+metadata:
+  homepage_uri: https://github.com/andrew/gitballs
+  source_code_uri: https://github.com/andrew/gitballs
+rdoc_options: []
+require_paths:
+- lib
+required_ruby_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      version: 3.2.0
+required_rubygems_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      version: '0'
+requirements: []
+rubygems_version: 3.6.9
+specification_version: 4
+summary: Space-efficient storage of package release tarballs using git
+test_files: []
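The dependency entries in the gem metadata use the pessimistic operator `~>`, which allows the last given version segment to increase but not the one before it. The sketch below checks a few versions against such constraints with the RubyGems classes named in the metadata; `allowed?` is a hypothetical helper.

```ruby
require "rubygems"

# Hypothetical helper: does `version` satisfy the RubyGems `constraint`?
def allowed?(constraint, version)
  Gem::Requirement.new(constraint).satisfied_by?(Gem::Version.new(version))
end

puts allowed?("~> 1.4", "1.9.2") # within the 1.x series, at or above 1.4
puts allowed?("~> 1.4", "2.0.0") # next major version
```

Under this rule the `typhoeus` constraint `~> 1.4` accepts any 1.x release from 1.4 upward and rejects 2.0 and later.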