puppet-community-mvp 0.0.4 → 0.0.8
- checksums.yaml +5 -5
- data/README.md +78 -0
- data/bin/mvp +31 -16
- data/bin/pftest.rb +22 -0
- data/lib/mvp/{uploader.rb → bigquery.rb} +81 -74
- data/lib/mvp/{downloader.rb → forge.rb} +51 -126
- data/lib/mvp/itemizer.rb +49 -6
- data/lib/mvp/puppetfile_parser.rb +171 -0
- data/lib/mvp/runner.rb +122 -26
- data/lib/mvp/stats.rb +27 -10
- data/lib/mvp.rb +1 -3
- metadata +8 -8
- data/lib/mvp/monkeypatches.rb +0 -8
checksums.yaml
CHANGED
```diff
@@ -1,7 +1,7 @@
 ---
-
-  metadata.gz:
-  data.tar.gz:
+SHA256:
+  metadata.gz: 07c7d574ba02a7e51c842a54436af6b1c3d146fb94221f32c0eddf3566203d4f
+  data.tar.gz: e43f84c474532abeeeb017746b481f2bef0ab6880cbd3de7fce96de3cb69a9cd
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 04f75c646cdace930a54cf3f5312f4fb054abcd7350206fb8193b96192ebcd7a231005282a7ebd67b912f7ad468c5882b49a112258cc9d62fad6447143629d81
+  data.tar.gz: 9f33f85545f88cf3a2a7cf477c82644b7a2f8ba8dfe22883fe86a89d4e200ad601e91099274eae9e680ee598fc07e2bc734a9055e8089c7cb049eac094cef580
```
data/README.md
ADDED
@@ -0,0 +1,78 @@

# Puppet Community MVP tool

This is a simple tool to generate stats about the Puppet community. It was
originally intended to show the "most valuable players" but has since morphed to
show a lot of other things too. We primarily use it in a weekly cron job to
gather information using the Forge APIs and normalize it so that it can be
easily combined with simple SQL queries to generate usage information.

## Interactive usage

If you're not working on our community stats pipeline, then there are only three
subcommands you'll be interested in.

### `stats`

This subcommand will use cached data to generate a report of Forge community
statistics. For example, it will generate distributions of module quality
scores, releases per module, or modules per author. It will also generate
sparklines showing the contributions over time of the most prolific Forge
authors, along with authors who aren't as active as they used to be.

Unfortunately, this report is not customizable or templatable at this point.

You will need cached data before you can generate this report. See the `get` subcommand.

### `get`

This subcommand will download and cache a local mirror of the data stored in our
BigQuery database. This data is used for the `stats` command.

### `analyze`

This subcommand is maybe the most interesting. Many interesting bits of
information can be gathered by inspecting the source code of modules rather
than by running SQL queries against their statistics. For example, `find
manifests/ -name '*.pp' | wc -l` will tell you how many manifests any given
module includes, and `grep -rn '--no-external-facts' facts.d/` will tell you
how many external facts are invoking `facter` to gather and use _other_ facts
while running.

This command lets you write that little bit of analysis code as a script, and
then systematically run that script against the current release of every single
module on the Forge and collate the generated output.

A script can be written in any language and will be executed from the root of
the unpacked module. It will be invoked with an environment containing the following
variables:

* `mvp_owner` -- the Forge namespace of the module, aka the author's username
* `mvp_name` -- the name of the module itself
* `mvp_version` -- the current version of the module
* `mvp_downloads` -- the number of downloads this module has; a *rough* estimation of popularity

The script should print an array of arrays in JSON format to STDOUT. These will be
combined to make a CSV file, the columns of which are defined by the data you
return. In other words, the items in the inner array(s) are totally up to you.
They will become the columns of the generated CSV file.

The parameters relevant to this subcommand are:

```
-o, --output_file OUTPUT_FILE    The path to save a csv report.
    --script SCRIPT              The script file to analyze a module. See docs for interface.
    --count N                    For debugging. Select a random list of this many modules to analyze.
-d, --debug                      Display extra debugging information.
```

See files in the `scripts/` directory for examples of analysis scripts. To use
one, just pass the path of a script, like

```
$ mvp analyze --script scripts/manifest_count.rb --count 5
[✔] stdlib (OK)
$ cat analyzed.csv
...
```
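The script interface described for `analyze` is easy to satisfy. As a minimal sketch (a hypothetical script, not one shipped with the gem), a manifest-counting analyzer might look like this:

```ruby
#! /usr/bin/env ruby
# Hypothetical analysis script for `mvp analyze`: counts Puppet manifests
# in the module unpacked at the current working directory.
require 'json'

# The runner exports these variables for each module; default them so the
# script can also be exercised standalone outside the runner.
owner   = ENV['mvp_owner']   || 'unknown'
name    = ENV['mvp_name']    || 'unknown'
version = ENV['mvp_version'] || '0.0.0'

# Count the *.pp manifests in this unpacked module.
manifest_count = Dir.glob('manifests/**/*.pp').length

# STDOUT must be a JSON array of arrays; each inner array becomes a CSV row.
rows = [[owner, name, version, manifest_count]]
puts JSON.generate(rows)
```

Each inner array's items become the columns of `analyzed.csv`, so the script alone decides the report's shape.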
data/bin/mvp
CHANGED
```diff
@@ -13,19 +13,21 @@ optparse = OptionParser.new { |opts|
 opts.banner = "Usage : #{NAME} [command] [target] [options]
 
 This tool will scrape the Puppet Forge API for interesting module & author stats.
-
+It can also mirror public BigQuery tables or views into our dataset for efficiency,
+or download and itemize each Forge module.
 
-* get | retrieve | download [target]
-    * Downloads and caches all Forge metadata.
-    * Optional targets: all, authors, modules, releases
-* upload | insert [target]
-    * Uploads data to BigQuery
-    * Optional targets: all, authors, modules, releases, mirrors
 * mirror [target]
     * Runs the download & then upload tasks.
+    * Optional targets: all, authors, modules, releases, validations, itemizations, puppetfiles, tables
+* get | retrieve | download [target]
+    * Downloads and caches data locally so you can run the stats task.
     * Optional targets: all, authors, modules, releases
 * stats
     * Print out a summary of interesting stats.
+* analyze <script file>
+    * Run a specified script to analyze each module to generate arbitrary stats
+    * Writes output to a csv file, analyzed.csv by default
 
 "
 
 opts.on("-f FORGEAPI", "--forgeapi FORGEAPI", "Forge API server. Rarely needed.") do |arg|
@@ -60,10 +62,22 @@ The following CLI commands are available.
   options[:output_file] = arg
 end
 
+opts.on("--script SCRIPT", "The script file to analyze a module. See docs for interface.") do |arg|
+  options[:script] = arg
+end
+
+opts.on("--count N", "For debugging. Select a random list of this many modules to analyze.") do |arg|
+  options[:count] = arg.to_i
+end
+
 opts.on("-d", "--debug", "Display extra debugging information.") do
   options[:debug] = true
 end
 
+opts.on("-n", "--noop", "Don't actually upload data.") do
+  options[:noop] = true
+end
+
 opts.separator('')
 
 opts.on("-h", "--help", "Displays this help") do
@@ -83,31 +97,29 @@ options[:gcloud][:dataset] ||= 'community'
 options[:gcloud][:project] ||= 'puppet'
 options[:gcloud][:keyfile] ||= '~/.mvp/credentials.json'
 
+options[:script]           = File.expand_path(options[:script]) if options[:script]
 options[:cachedir]         = File.expand_path(options[:cachedir])
 options[:github_data]      = File.expand_path(options[:github_data])
 options[:gcloud][:keyfile] = File.expand_path(options[:gcloud][:keyfile])
 FileUtils.mkdir_p(options[:cachedir])
 
+command, target = ARGV
+case command
+when 'analyze'
+  options[:output_file] ||= 'analyzed.csv'
+end
+
 $logger = Logger::new(STDOUT)
 $logger.level = options[:debug] ? Logger::DEBUG : Logger::INFO
 $logger.formatter = proc { |severity,datetime,progname,msg| "#{severity}: #{msg}\n" }
 
 runner = Mvp::Runner.new(options)
 
-command, target = ARGV
 case command
 when 'get', 'retrieve', 'download'
   target ||= :all
   runner.retrieve(target.to_sym)
 
-when 'transform'
-  target ||= :all
-  runner.retrieve(target.to_sym, false)
-
-when 'insert', 'upload'
-  target ||= :all
-  runner.upload(target.to_sym)
-
 when 'mirror'
   target ||= :all
   runner.mirror(target.to_sym)
@@ -116,6 +128,9 @@ when 'stats'
   target ||= :all
   runner.stats(target.to_sym)
 
+when 'analyze'
+  runner.analyze
+
 when 'test'
   runner.test
 
```
data/bin/pftest.rb
ADDED
@@ -0,0 +1,22 @@

```ruby
#! /usr/bin/env ruby

require 'mvp/puppetfile_parser'
require 'open-uri'
require 'json'
require 'logger'

$logger = Logger::new(STDOUT)
$logger.level = Logger::INFO
$logger.formatter = proc { |severity,datetime,progname,msg| "#{severity}: #{msg}\n" }

pf     = open(ARGV.first)
parser = Mvp::PuppetfileParser.new()

repo = {
  :repo_name => 'testing',
  :md5       => 'wakka wakka',
  :content   => pf.read,
}

puts JSON.pretty_generate(parser.parse(repo))
```
data/lib/mvp/{uploader.rb → bigquery.rb}
RENAMED

```diff
@@ -3,10 +3,10 @@ require 'tty-spinner'
 require "google/cloud/bigquery"
 
 class Mvp
-  class
+  class Bigquery
     def initialize(options = {})
+      @options  = options
       @cachedir = options[:cachedir]
-      @mirrors  = options[:gcloud][:mirrors]
       @bigquery = Google::Cloud::Bigquery.new(
         :project_id  => options[:gcloud][:project],
         :credentials => Google::Cloud::Bigquery::Credentials.new(options[:gcloud][:keyfile]),
@@ -27,9 +27,24 @@ class Mvp
           s.integer "count", mode: :required
         end
       end
+
+      @puppetfile_usage = @dataset.table('github_puppetfile_usage') || @dataset.create_table('github_puppetfile_usage') do |table|
+        table.name        = 'Puppetfile Module Usage'
+        table.description = 'A list of all modules referenced in public Puppetfiles'
+        table.schema do |s|
+          s.string "repo_name", mode: :required
+          s.string "module", mode: :required
+          s.string "type", mode: :required
+          s.string "source"
+          s.string "version"
+          s.string "md5", mode: :required
+        end
+      end
     end
 
     def truncate(entity)
+      return if @options[:noop]
+
       begin
         case entity
         when :authors
@@ -65,6 +80,7 @@ class Mvp
           s.timestamp "created_at", mode: :required
           s.timestamp "updated_at", mode: :required
           s.string "tasks", mode: :repeated
+          s.string "plans", mode: :repeated
           s.string "homepage_url"
           s.string "project_page"
           s.string "issues_url"
@@ -72,6 +88,7 @@ class Mvp
           s.boolean "supported"
           s.string "endorsement"
           s.string "module_group"
+          s.boolean "premium"
           s.boolean "pdk"
           s.string "operatingsystem", mode: :repeated
           s.integer "release_count", mode: :required
@@ -125,6 +142,7 @@ class Mvp
           s.timestamp "deleted_at"
           s.string "deleted_for"
           s.string "tasks", mode: :repeated
+          s.string "plans", mode: :repeated
           s.string "project_page"
           s.string "issues_url"
           s.string "source"
@@ -144,11 +162,9 @@ class Mvp
           s.boolean "puppet_99x"
           s.string "dependencies", mode: :repeated
           s.string "file_uri", mode: :required
-          s.string "file_md5"
+          s.string "file_md5"
+          s.string "file_sha256"
           s.integer "file_size", mode: :required
-          s.string "changelog"
-          s.string "reference"
-          s.string "readme"
           s.string "license"
           s.string "metadata", mode: :required
         end
@@ -163,95 +179,86 @@ class Mvp
       end
     end
 
-    def
-
-    end
-
-    def modules()
-      upload('modules')
+    def retrieve(entity)
+      get(entity, ['*'])
     end
 
-    def
-
-    end
+    def mirror_table(entity)
+      return if @options[:noop]
 
-
-
-
-
-
-      @mirrors.each do |entity|
-        begin
-          spinner = TTY::Spinner.new("[:spinner] :title")
-          spinner.update(title: "Mirroring #{entity[:type]} #{entity[:name]} to BigQuery...")
-          spinner.auto_spin
-
-          case entity[:type]
-          when :view
-            @dataset.table(entity[:name]).delete rescue nil # delete if exists
-            @dataset.create_view(entity[:name], entity[:query],
-              :legacy_sql => true)
-
-          when :table
-            job = @dataset.query_job(entity[:query],
-              :legacy_sql => true,
-              :write => 'truncate',
-              :table => @dataset.table(entity[:name], :skip_lookup => true))
-            job.wait_until_done!
+      begin
+        case entity[:type]
+        when :view
+          @dataset.table(entity[:name]).delete rescue nil # delete if exists
+          @dataset.create_view(entity[:name], entity[:query])
 
-
-
-
+        when :table
+          job = @dataset.query_job(entity[:query],
+            :write => 'truncate',
+            :table => @dataset.table(entity[:name], :skip_lookup => true))
+          job.wait_until_done!
 
-
-
-          spinner.error("(Google Cloud error: #{e.message})")
-          $logger.error e.backtrace.join("\n")
+        else
+          $logger.error "Unknown mirror type: #{entity[:type]}"
         end
+      rescue => e
+        $logger.error("(Google Cloud error: #{e.message})")
+        $logger.debug e.backtrace.join("\n")
       end
     end
 
-    def insert(entity, data)
-
+    def insert(entity, data, suite = 'forge')
+      return if @options[:noop]
+      return if data.empty?
+
+      table    = @dataset.table("#{suite}_#{entity}")
       response = table.insert(data)
 
       unless response.success?
-
+        $logger.error '========================================================================='
         response.insert_errors.each do |err|
-
+          $logger.debug JSON.pretty_generate(err.row.reject {|k,v| ['metadata'].include? k})
+          $logger.error JSON.pretty_generate(err.errors)
         end
-        $logger.error JSON.pretty_generate(errors)
       end
     end
 
-    def
-
-
-      spinner.update(title: "Uploading #{entity} to BigQuery ...")
-      spinner.auto_spin
+    def delete(entity, field, match, suite = 'forge')
+      @dataset.query("DELETE FROM #{suite}_#{entity} WHERE #{field} = '#{match}'")
+    end
 
-
-
-
+    def get(entity, fields, suite = 'forge')
+      raise 'pass fields as an array' unless fields.is_a? Array
+      @dataset.query("SELECT #{fields.join(', ')} FROM #{suite}_#{entity}")
+    end
 
-
-
-
-      #
-      # begin
-      #   table.insert data
-      # rescue
-      #   require 'pry'
-      #   binding.pry
-      # end
-    # end
+    def module_sources()
+      get('modules', ['slug', 'source'])
+    end
 
+    def puppetfiles()
+      sql = 'SELECT f.repo_name, f.path, c.content, c.md5
+               FROM github_puppetfile_files AS f
+               JOIN github_puppetfile_contents AS c
+                 ON c.id = f.id
 
-
-
-
-
+              WHERE c.md5 NOT IN (
+                      SELECT u.md5
+                        FROM github_puppetfile_usage AS u
+                       WHERE u.repo_name = f.repo_name
+                    ) AND LOWER(repo_name) NOT LIKE "%boxen%"'
+      @dataset.query(sql)
+    end
+
+    def unitemized()
+      sql = 'SELECT m.name, m.slug, m.version, m.dependencies
+               FROM forge_modules AS m
+              WHERE m.version NOT IN (
+                      SELECT i.version
+                        FROM forge_itemized AS i
+                       WHERE module = m.slug
+                    )'
+      @dataset.query(sql)
    end
 
    def version_itemized?(mod, version)
```
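The new `get`, `insert`, and `delete` helpers all address tables by a suite-prefixed name, `"#{suite}_#{entity}"`, with the suite defaulting to `forge`. A standalone sketch of the query string that convention produces (the real methods hand the SQL to `@dataset.query` instead of returning it):

```ruby
# Sketch of the suite-prefixed table naming used by the Bigquery helpers.
# select_sql is an illustrative stand-in, not a method of the gem.
def select_sql(entity, fields, suite = 'forge')
  raise 'pass fields as an array' unless fields.is_a? Array
  "SELECT #{fields.join(', ')} FROM #{suite}_#{entity}"
end

sql = select_sql('modules', ['slug', 'source'])
# → "SELECT slug, source FROM forge_modules"
```

This is why `module_sources` above can simply call `get('modules', ['slug', 'source'])` and hit the `forge_modules` table.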
data/lib/mvp/{downloader.rb → forge.rb}
RENAMED

```diff
@@ -2,151 +2,82 @@ require 'json'
 require 'httparty'
 require 'tty-spinner'
 require 'semantic_puppet'
-require 'mvp/monkeypatches'
-require 'mvp/itemizer'
 
 class Mvp
-  class
+  class Forge
     def initialize(options = {})
       @useragent = 'Puppet Community Stats Monitor'
-      @cachedir  = options[:cachedir]
       @forgeapi  = options[:forgeapi] || 'https://forgeapi.puppet.com'
-      @itemizer  = Mvp::Itemizer.new(options)
     end
 
-    def
-
-      item = (entity == :authors) ? 'users' : entity.to_s
-      download(item) do |data|
-        case entity
-        when :modules
-          uploader.insert(:validations, flatten_validations(retrieve_validations(data)))
-          data = flatten_modules(data)
-
-          @itemizer.run!(data, uploader)
-        when :releases
-          data = flatten_releases(data)
-        end
-
-        uploader.insert(entity, data)
-      end
-    end
-
-    def retrieve(entity, download = true)
-      if download
-        # I am focusing on authorship rather than just users, so for now I'm using the word authors
-        item = (entity == :authors) ? 'users' : entity.to_s
-        data = []
-        download(item) do |resp|
-          data.concat resp
-        end
-        save_json(entity, data)
-      else
-        data = File.read("#{@cachedir}/#{entity}.json")
-      end
-
-      case entity
-      when :modules
-        data = flatten_modules(data)
-      when :releases
-        data = flatten_releases(data)
-      end
-      save_nld_json(entity.to_s, data)
-    end
-
-    def retrieve_validations(modules, period = 25)
-      results = {}
+    def retrieve(entity)
+      raise 'Please process downloaded data by passing a block' unless block_given?
 
+      # using authors for git repo terminology consistency
+      entity = :users if entity == :authors
       begin
         offset = 0
-        endpoint = "/
-
-
-        response = HTTParty.get("#{@forgeapi}#{endpoint}
+        endpoint = "/v3/#{entity}?sort_by=downloads&limit=50"
+
+        while endpoint do
+          response = HTTParty.get("#{@forgeapi}#{endpoint}", headers: {"User-Agent" => @useragent})
           raise "Forge Error: #{@response.body}" unless response.code == 200
+          data    = JSON.parse(response.body)
+          results = munge_dates(data['results'])
+
+          case entity
+          when :modules
+            results = flatten_modules(results)
+          when :releases
+            results = flatten_releases(results)
+          end
 
-        results
-        offset += 1
+          yield results, offset
 
-
-
+          offset  += 50
+          endpoint = data['pagination']['next']
+          if (endpoint and (offset % 250 == 0))
             GC.start
           end
         end
+
       rescue => e
         $logger.error e.message
         $logger.debug e.backtrace.join("\n")
       end
-
+      nil
     end
 
-    def
-
-
-      if File.exist? cache
-        module_data = JSON.parse(File.read(cache))
-      else
-        module_data = retrieve(:modules)
-      end
+    def retrieve_validations(modules, period = 25)
+      raise 'Please process validations by passing a block' unless block_given?
 
+      offset = 0
       begin
-
-
-
+        modules.each_slice(period) do |group|
+          offset += period
+          results = group.map { |mod| validations(mod[:slug]) }
 
-
-
+          yield results, offset
+          GC.start
         end
-
-      spinner.success('(OK)')
       rescue => e
-      spinner.error('API error')
         $logger.error e.message
         $logger.debug e.backtrace.join("\n")
       end
-
-      save_nld_json('validations', flatten_validations(results))
-      results
+      nil
     end
 
-    def
-
-
-
-      offset = 0
-      endpoint = "/v3/#{entity}?sort_by=downloads&limit=50"
-      spinner = TTY::Spinner.new("[:spinner] :title")
-      spinner.update(title: "Downloading #{entity} ...")
-      spinner.auto_spin
-
-      while endpoint do
-        response = HTTParty.get("#{@forgeapi}#{endpoint}", headers: {"User-Agent" => @useragent})
-        raise "Forge Error: #{@response.body}" unless response.code == 200
-        data = JSON.parse(response.body)
-
-        offset += 50
-        endpoint = data['pagination']['next']
-
-        yield munge_dates(data['results'])
-
-        if (endpoint and (offset % 250 == 0))
-          spinner.update(title: "Downloading #{entity} [#{offset}]...")
-          GC.start
-        end
-      end
-
-      spinner.success('(OK)')
-    rescue => e
-      spinner.error('API error')
-      $logger.error e.message
-      $logger.debug e.backtrace.join("\n")
-    end
+    def validations(name)
+      endpoint = "/private/validations/"
+      response = HTTParty.get("#{@forgeapi}#{endpoint}#{name}", headers: {'User-Agent' => @useragent})
+      raise "Forge Error: #{@response.body}" unless response.code == 200
 
-
+      flatten_validations(name, JSON.parse(response.body))
     end
 
+
     # transform dates into a format that bigquery knows
     def munge_dates(object)
       ["created_at", "updated_at", "deprecated_at", "deleted_at"].each do |field|
@@ -160,16 +91,6 @@ class Mvp
       object
     end
 
-    def save_json(thing, data)
-      File.write("#{@cachedir}/#{thing}.json", data.to_json)
-    end
-
-    # store data in a way that bigquery can grok
-    # uploading files is far easier than streaming data, when replacing a dataset
-    def save_nld_json(thing, data)
-      File.write("#{@cachedir}/nld_#{thing}.json", data.to_newline_delimited_json)
-    end
-
     def flatten_modules(data)
       data.each do |row|
         row['owner'] = row['owner']['username']
@@ -183,6 +104,7 @@ class Mvp
         row['project_page'] = row['current_release']['metadata']['project_page']
         row['issues_url']   = row['current_release']['metadata']['issues_url']
         row['tasks']        = row['current_release']['tasks'].map{|task| task['name']} rescue []
+        row['plans']        = row['current_release']['plans'].map{|task| task['name']} rescue []
 
         row['release_count'] = row['releases'].count rescue 0
         row['releases']      = row['releases'].map{|r| r['version']} rescue []
@@ -202,21 +124,24 @@ class Mvp
         row['project_page'] = row['metadata']['project_page']
         row['issues_url']   = row['metadata']['issues_url']
         row['tasks']        = row['tasks'].map{|task| task['name']} rescue []
+        row['plans']        = row['plans'].map{|task| task['name']} rescue []
 
         simplify_metadata(row, row['metadata'])
-
+
+        # These items are just too big to store in the table, and the malware scan isn't done yet
+        ['module', 'changelog', 'readme', 'reference', 'malware_scan'].each do |column|
+          row.delete(column)
+        end
       end
       data
     end
 
-    def flatten_validations(
-
-
-
-        row[entry['name']] = entry['score']
-      end
-      row
+    def flatten_validations(name, scores)
+      row = { 'name' => name }
+      scores.each do |entry|
+        row[entry['name']] = entry['score']
       end
+      row
     end
 
     def simplify_metadata(data, metadata)
```
data/lib/mvp/itemizer.rb
CHANGED
```diff
@@ -12,7 +12,7 @@ class Mvp
 
     def run!(data, uploader)
       data.each do |mod|
-        modname = mod['
+        modname = mod['name']
         version = mod['version']
         return if uploader.version_itemized?(modname, version)
 
@@ -27,13 +27,23 @@ class Mvp
       end
     end
 
+    def itemized(mod)
+      modname = mod[:slug]
+      version = mod[:version]
+      baserow = { :module => modname, :version => version, :kind => 'admin', :element => 'version', :count => 0}
+
+      table(itemize(modname, version), mod) << baserow
+    end
+
     def download(path, modname, version)
       filename = "#{modname}-#{version}.tar.gz"
       Dir.chdir(path) do
         File.open(filename, "w") do |file|
           file << HTTParty.get( "#{@forge}/v3/files/#(unknown)" )
         end
-
+        # Why is tar terrible?
+        FileUtils.mkdir("#{modname}-#{version}")
+        system("tar -xf #(unknown) -C #{modname}-#{version} --strip-components=1")
         FileUtils.rm(filename)
       end
     end
@@ -55,13 +65,46 @@ class Mvp
       end
     end
 
+    def analyze(mod, script, debug)
+      require 'open3'
+      require 'json'
+
+      # sanitize an environment
+      env = {'mvp_script' => script}
+      mod.each do |key, value|
+        env["mvp_#{key}"] = value.to_s
+      end
+
+      downloads = mod[:downloads]
+      Dir.mktmpdir('mvp') do |path|
+        download(path, "#{mod[:owner]}-#{mod[:name]}", mod[:version])
+
+        rows = []
+        Dir.chdir("#{path}/#{mod[:owner]}-#{mod[:name]}-#{mod[:version]}") do
+          if debug
+            exit(1) unless system(env, ENV['SHELL'])
+          end
+
+          stdout, stderr, status = Open3.capture3(env, script)
+
+          if status.success?
+            rows = JSON.parse(stdout)
+          else
+            $logger.error stderr
+          end
+        end
+
+        return rows unless rows.empty?
+      end
+    end
+
     # Build a table with this schema
     #  module | version | source | kind | element | count
     def table(itemized, data)
-      modname      = data[
-      slug         = data[
-      version      = data[
-      dependencies = data[
+      modname      = data[:name]
+      slug         = data[:slug]
+      version      = data[:version]
+      dependencies = data[:dependencies]
 
       itemized.map do |kind, elements|
         # the kind of element comes pluralized from puppet-itemize
```
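The new `Itemizer#analyze` drives each analysis script through `Open3.capture3`, passing the module's metadata in as environment variables and parsing the child's JSON stdout. A minimal sketch of that pattern, using an inline `ruby -e` command in place of a script file (the env values here are illustrative):

```ruby
# Sketch of the env-plus-capture pattern used by Itemizer#analyze.
require 'open3'
require 'json'

# Metadata is handed to the child process as mvp_* environment variables.
env = { 'mvp_owner' => 'puppetlabs', 'mvp_name' => 'stdlib' }
cmd = %q{ruby -rjson -e 'puts JSON.generate([[ENV["mvp_owner"], ENV["mvp_name"]]])'}

stdout, stderr, status = Open3.capture3(env, cmd)
rows = status.success? ? JSON.parse(stdout) : []
```

Because the environment is built fresh for each module, scripts see only the `mvp_*` values they are given, and a failing script surfaces through `status` and `stderr` rather than crashing the run.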
@@ -0,0 +1,171 @@
|
|
1
|
+
class Mvp
|
2
|
+
class PuppetfileParser
|
3
|
+
def initialize(options = {})
|
4
|
+
@sources = {}
|
5
|
+
@modules = []
|
6
|
+
@repo = nil
|
7
|
+
end
|
8
|
+
|
9
|
+
def suitable?
|
10
|
+
defined?(RubyVM::AbstractSyntaxTree)
|
11
|
+
end
|
12
|
+
|
13
|
+
def sources=(modules)
|
14
|
+
modules.each do |row|
|
15
|
+
next unless row[:source]
|
16
|
+
next if row[:source] == 'UNKNOWN'
|
17
|
+
|
18
|
+
@sources[canonical_git_repo(row[:source])] = row[:slug]
|
19
|
+
end
|
20
|
+
end
|
21
|
+
|
22
|
+
def parse(repo)
|
23
|
+
# This only works on Ruby 2.6+
|
24
|
+
return unless suitable?
|
25
|
+
|
26
|
+
begin
|
27
|
+
root = RubyVM::AbstractSyntaxTree.parse(repo[:content])
|
28
|
+
rescue SyntaxError => e
|
29
|
+
$logger.warn "Syntax error in #{repo[:repo_name]}/Puppetfile"
|
30
|
+
$logger.warn e.message
|
31
|
+
end
|
32
|
+
|
33
|
+
@repo = repo
|
34
|
+
@modules = []
|
35
|
+
traverse(root)
|
36
|
+
@modules.compact.map do |row|
|
37
|
+
row[:repo_name] = repo[:repo_name]
|
38
|
+
row[:md5] = repo[:md5]
|
39
|
+
row[:module] = canonical_name(row[:module], row[:source])
|
40
|
+
stringify(row)
|
41
|
+
end
|
42
|
+
end
|
43
|
+
|
44
|
+
def stringify(row)
|
45
|
+
row.each do |key, value|
|
46
|
+
if value.is_a? RubyVM::AbstractSyntaxTree::Node
|
47
|
+
row[key] = :'#<programmatically generated via ruby code>'
|
48
|
+
end
|
49
|
+
end
|
50
|
+
end
|
51
|
+
|
52
|
+
def canonical_name(name, repo)
|
53
|
+
return name if name.include?('-')
|
54
|
+
repo = canonical_git_repo(repo)
|
55
|
+
|
56
|
+
return @sources[repo] if @sources.include?(repo)
|
57
|
+
name
|
58
|
+
end
|
59
|
+
|
60
|
+
def canonical_git_repo(repo)
|
61
|
+
return unless repo
|
62
|
+
return unless repo.is_a? String
|
63
|
+
repo.sub(/^git@github.com\:/, 'github.com/')
|
64
|
+
.sub(/^(git|https?)\:\/\//, '')
|
65
|
+
.sub(/\.git$/, '')
|
66
|
+
end
|
67
|
+
|
68
|
+
def add_module(name, args)
|
69
|
+
unless name.is_a? String
|
70
|
+
$logger.warn "Non string module name in #{@repo[:repo_name]}/Puppetfile"
|
71
|
+
return nil
|
72
|
+
end
|
73
|
+
name.gsub!('/', '-')
|
74
|
+
case args
|
75
|
+
when String, Symbol, NilClass
|
76
|
+
@modules << {
|
77
|
+
:module => name,
|
78
|
+
:type => :forge,
|
79
|
+
:source => :forge,
|
80
|
+
:version => args,
|
81
|
+
}
|
82
|
+
+      when Hash
+        @modules << parse_args(name, args)
+      else
+        $logger.warn "#{@repo[:repo_name]}/Puppetfile: Unknown format: mod('#{name}', #{args.inspect})"
+      end
+    end
+
+    def parse_args(name, args)
+      data = {:module => name}
+
+      if args.include? :git
+        data[:type]    = :git
+        data[:source]  = args[:git]
+        data[:version] = args[:ref] || args[:tag] || args[:commit] || args[:branch] || :latest
+      elsif args.include? :svn
+        data[:type]    = :svn
+        data[:source]  = args[:svn]
+        data[:version] = args[:rev] || args[:revision] || :latest
+      elsif args.include? :boxen
+        data[:type]    = :boxen
+        data[:source]  = args[:repo]
+        data[:version] = args[:version] || :latest
+      else
+        $logger.warn "#{@repo[:repo_name]}/Puppetfile: Unknown args format: mod('#{name}', #{args.inspect})"
+        return nil
+      end
+
+      data
+    end
+
+    def traverse(node)
+      begin
+        if node.type == :FCALL
+          name = node.children.first
+          args = node.children.last.children.map do |item|
+            next if item.nil?
+
+            case item.type
+            when :HASH
+              Hash[*item.children.first.children.compact.map {|n| n.children.first }]
+            else
+              item.children.first
+            end
+          end.compact
+
+          case name
+          when :mod
+            add_module(args.shift, args.shift)
+          when :forge
+            # noop
+          when :moduledir
+            # noop
+          when :github
+            # oh boxen, you so silly.
+            # The order of the unpacking below *is* important.
+            modname = args.shift
+            version = args.shift
+            data    = args.shift || {}
+
+            # this is gross but I'm not sure I actually care right now.
+            if (modname.is_a? String and [String, NilClass].include? version.class and data.is_a? Hash)
+              data[:boxen]   = :boxen
+              data[:version] = version
+              add_module(modname, data)
+            else
+              $logger.warn "#{@repo[:repo_name]}/Puppetfile: malformed boxen"
+            end
+          else
+            # Should we record unexpected Ruby code or just log it to stdout?
+            args = args.map {|a| a.is_a?(String) ? "'#{a}'" : a}.join(', ')
+            $logger.warn "#{@repo[:repo_name]}/Puppetfile: Unexpected invocation of #{name}(#{args})"
+          end
+        end
+
+        node.children.each do |n|
+          next unless n.is_a? RubyVM::AbstractSyntaxTree::Node
+
+          traverse(n)
+        end
+      rescue => e
+        puts e.message
+      end
+    end
+
+    def test()
+      require 'pry'
+      binding.pry
+    end
+  end
+end
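The parser above never executes a Puppetfile; it walks the Ruby syntax tree instead, which is why `suitable?` gates it on the interpreter version. A minimal, self-contained sketch of the same technique, using `RubyVM::AbstractSyntaxTree` (MRI 2.6+) and made-up module names — receiverless calls like `mod '...'` surface as `:FCALL` nodes, and a literal hash argument arrives as a flat `[key, value, ...]` list:

```ruby
puppetfile = <<~PUPPETFILE
  mod 'puppetlabs-stdlib', '6.0.0'
  mod 'example', :git => 'https://github.com/example/example.git', :tag => 'v1.0.0'
PUPPETFILE

mods = []

# Recursively walk the AST, collecting the literal arguments of every
# `mod` call without ever evaluating the Puppetfile's Ruby code.
walk = lambda do |node|
  next unless node.is_a?(RubyVM::AbstractSyntaxTree::Node)

  if node.type == :FCALL && node.children.first == :mod
    args = node.children.last.children.compact.map do |item|
      case item.type
      when :HASH
        # unpack the flat [key, value, key, value] child list into a Hash
        Hash[*item.children.first.children.compact.map { |n| n.children.first }]
      else
        item.children.first
      end
    end
    mods << args
  end

  node.children.each { |n| walk.call(n) }
end

walk.call(RubyVM::AbstractSyntaxTree.parse(puppetfile))
```

After the walk, `mods` holds one entry per `mod` call: the module name followed by its version string or options hash, mirroring what `add_module`/`parse_args` above consume.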
data/lib/mvp/runner.rb
CHANGED

@@ -1,6 +1,10 @@
-require 'mvp/downloader'
-require 'mvp/uploader'
+require 'mvp/forge'
+require 'mvp/bigquery'
 require 'mvp/stats'
+require 'mvp/itemizer'
+require 'mvp/puppetfile_parser'
+
+require 'tty-spinner'
 
 class Mvp
   class Runner
@@ -11,52 +15,144 @@ class Mvp
     end
 
     def retrieve(target = :all, download = true)
-
+      bigquery = Mvp::Bigquery.new(@options)
 
-
-
-
-
+      begin
+        [:authors, :modules, :releases, :validations].each do |thing|
+          next unless [:all, thing].include? target
+          spinner = mkspinner("Retrieving #{thing} ...")
+          data    = bigquery.retrieve(thing)
+          save_json(thing, data)
+          spinner.success('(OK)')
+        end
 
-
-
+      rescue => e
+        spinner.error("API error: #{e.message}")
+        $logger.error "API error: #{e.message}"
+        $logger.debug e.backtrace.join("\n")
+        sleep 10
       end
     end
 
-    def
-
+    def mirror(target = :all)
+      forge    = Mvp::Forge.new(@options)
+      bigquery = Mvp::Bigquery.new(@options)
+      itemizer = Mvp::Itemizer.new(@options)
+      pfparser = Mvp::PuppetfileParser.new(@options)
 
-
-
-
+      begin
+        [:authors, :modules, :releases].each do |thing|
+          next unless [:all, thing].include? target
+          spinner = mkspinner("Mirroring #{thing}...")
+          bigquery.truncate(thing)
+          forge.retrieve(thing) do |data, offset|
+            spinner.update(title: "Mirroring #{thing} [#{offset}]...")
+            bigquery.insert(thing, data)
+          end
+          spinner.success('(OK)')
+        end
+
+        if [:all, :validations].include? target
+          spinner = mkspinner("Mirroring validations...")
+          modules = bigquery.get(:modules, [:slug])
+          bigquery.truncate(:validations)
+          forge.retrieve_validations(modules) do |data, offset|
+            spinner.update(title: "Mirroring validations [#{offset}]...")
+            bigquery.insert(:validations, data)
+          end
+          spinner.success('(OK)')
+        end
+
+        if [:all, :itemizations].include? target
+          spinner = mkspinner("Itemizing modules...")
+          bigquery.unitemized.each do |mod|
+            spinner.update(title: "Itemizing [#{mod[:slug]}]...")
+            rows = itemizer.itemized(mod)
+            bigquery.delete(:itemized, :module, mod[:slug])
+            bigquery.insert(:itemized, rows)
+          end
+          spinner.success('(OK)')
+        end
+
+        if [:all, :mirrors, :tables].include? target
+          @options[:gcloud][:mirrors].each do |entity|
+            spinner = mkspinner("Mirroring #{entity[:type]} #{entity[:name]} to BigQuery...")
+            bigquery.mirror_table(entity)
+            spinner.success('(OK)')
+          end
+        end
+
+        if [:all, :puppetfiles].include? target
+          spinner = mkspinner("Analyzing Puppetfile module references...")
+          if pfparser.suitable?
+            pfparser.sources = bigquery.module_sources
+            bigquery.puppetfiles.each do |repo|
+              spinner.update(title: "Analyzing [#{repo[:repo_name]}/Puppetfile]...")
+              rows = pfparser.parse(repo)
+              bigquery.delete(:puppetfile_usage, :repo_name, repo[:repo_name], :github)
+              bigquery.insert(:puppetfile_usage, rows, :github)
+            end
+            spinner.success('(OK)')
+          else
+            spinner.error("(Not functional on Ruby #{RUBY_VERSION})")
+          end
+        end
+
+      rescue => e
+        spinner.error("API error: #{e.message}")
+        $logger.error "API error: #{e.message}"
+        $logger.debug e.backtrace.join("\n")
+        sleep 10
       end
     end
 
-    def
-
-
+    def analyze
+      bigquery = Mvp::Bigquery.new(@options)
+      itemizer = Mvp::Itemizer.new(@options)
 
-
-
-
-
-
-
+      begin
+        spinner = mkspinner("Analyzing modules...")
+        modules = bigquery.get(:modules, [:owner, :name, :version, :downloads])
+        modules = modules.sample(@options[:count]) if @options[:count]
+
+        require 'csv'
+        csv_string = CSV.generate do |csv|
+          modules.each do |mod|
+            spinner.stop if @options[:debug]
+            rows = itemizer.analyze(mod, @options[:script], @options[:debug])
+            spinner.start if @options[:debug]
+
+            next unless rows
+            spinner.update(title: mod[:name])
+            rows.each {|row| csv << row}
+          end
+        end
 
-
-
+        File.write(@options[:output_file], csv_string)
+        spinner.success('(OK)')
       end
     end
 
     def stats(target)
       stats = Mvp::Stats.new(@options)
 
-      [:authors, :modules, :releases, :relationships, :
+      [:authors, :modules, :releases, :relationships, :validations].each do |thing|
        next unless [:all, thing].include? target
        stats.send(thing)
      end
    end
 
+    def mkspinner(title)
+      spinner = TTY::Spinner.new("[:spinner] :title")
+      spinner.update(title: title)
+      spinner.auto_spin
+      spinner
+    end
+
+    def save_json(thing, data)
+      File.write("#{@cachedir}/#{thing}.json", data.to_json)
+    end
+
     def test()
       require 'pry'
       binding.pry
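The new `analyze` method streams the itemizer's rows into a string with Ruby's stdlib `CSV.generate` before writing the file in one go. A minimal sketch of that pattern, with made-up rows standing in for `itemizer.analyze` output:

```ruby
require 'csv'

# Hypothetical itemized rows: module, version, item type, item name
rows = [
  ['puppetlabs-stdlib', '9.0.0', 'function', 'ensure_packages'],
  ['puppetlabs-stdlib', '9.0.0', 'function', 'merge'],
]

# CSV.generate yields a CSV object; each << appends one quoted/escaped line
csv_string = CSV.generate do |csv|
  rows.each { |row| csv << row }
end
```

The resulting `csv_string` is what `analyze` hands to `File.write(@options[:output_file], ...)`; building the whole string first keeps the file write atomic from the caller's point of view.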
data/lib/mvp/stats.rb
CHANGED

@@ -19,7 +19,8 @@ class Mvp
 
     def draw_graph(series, width, title = nil)
       series.compact!
-
+      width = [width, series.size].min
+      graph = []
       (bins, freqs) = series.histogram(:bin_width => width)
 
       bins.each_with_index do |item, index|
@@ -44,6 +45,20 @@ class Mvp
       days_ago(datestr)/365
     end
 
+    def current_releases
+      return @current_releases if @current_releases
+
+      data_m = load('modules').reject {|m| m['owner'] == 'puppetlabs' }
+      data_r = load('releases').reject {|m| m['owner'] == 'puppetlabs' }
+
+      @current_releases = data_m.map {|mod|
+        name = mod['slug']
+        curr = mod['releases'].first
+
+        data_r.find {|r| r['slug'] == "#{name}-#{curr}" }
+      }.compact
+    end
+
     def tally_author_info(releases, target, scope='module_count')
       # update the author records with the fields we need
       target.each do |author|
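The memoized `current_releases` helper above joins the two cached datasets on a composed key: a release slug is `"<module slug>-<version>"`, and the Forge lists a module's releases newest-first, so `mod['releases'].first` is the current version. A small self-contained sketch with invented records:

```ruby
# One cached module record and its two cached release records
modules  = [ {'slug' => 'bob-apache', 'releases' => ['2.0.0', '1.0.0']} ]
releases = [ {'slug' => 'bob-apache-1.0.0'}, {'slug' => 'bob-apache-2.0.0'} ]

# Pair each module with its newest release record; compact drops modules
# whose current release is missing from the releases dataset
current = modules.map {|mod|
  name = mod['slug']
  curr = mod['releases'].first
  releases.find {|r| r['slug'] == "#{name}-#{curr}" }
}.compact
```

This is why the refactor pays off: `modules()` and `relationships()` can now share one memoized join instead of each rebuilding it from the raw JSON caches.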
@@ -52,7 +67,7 @@ class Mvp
       end
 
       releases.each do |mod|
-        username = mod['
+        username = mod['owner']
         score  = mod['validation_score']
         author = target.select{|m| m['username'] == username}.first
 
@@ -111,9 +126,10 @@ class Mvp
     end
 
     def modules()
-      data_m = load('modules').reject {|m| m['owner']
+      data_m = load('modules').reject {|m| m['owner'] == 'puppetlabs' }
       data_a = load('authors').reject {|u| u['username'] == 'puppetlabs' or u['module_count'] == 0}
-
+
+      current = current_releases
 
       tally_author_info(current, data_a, 'module_count')
 
@@ -155,7 +171,7 @@ class Mvp
     end
 
     def releases()
-      data_r = load('releases').reject {|m| m['
+      data_r = load('releases').reject {|m| m['owner'] == 'puppetlabs' }
       data_a = load('authors').reject {|u| u['username'] == 'puppetlabs' or u['module_count'] == 0}
 
       tally_author_info(data_r, data_a, 'release_count')
@@ -236,12 +252,12 @@ class Mvp
     end
 
     def relationships()
-      data_m = load('modules').reject {|m| m['owner']['username'] == 'puppetlabs' }
       data_a = load('authors').reject {|u| u['username'] == 'puppetlabs' or u['module_count'] == 0}
-      current =
+      current = current_releases.dup
 
       current.each do |mod|
-
+        mod['metadata'] = JSON.parse(mod['metadata'])
+        mod['metadata']['dependants'] = []
       end
       current.each do |mod|
         mod['metadata']['dependencies'].each do |dependency|
@@ -257,7 +273,7 @@ class Mvp
       count = mod['metadata']['dependants'].count
       next unless count > 0
 
-      author = data_a.select{|m| m['username'] == mod['
+      author = data_a.select{|m| m['username'] == mod['owner']}.first
       author['dependants'] << count
     end
     data_a.each { |a| a['average_dependants'] = average(a['dependants']) }
@@ -280,6 +296,7 @@ class Mvp
                author['module_count'],
                author['release_count'] ]
       end
+      puts
     end
 
     def github()
@@ -328,7 +345,7 @@ class Mvp
     end
 
     def validations()
-      puts '
+      puts 'No validations yet'
     end
 
     def test()
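The reworked `relationships()` inverts the dependency graph: a first pass parses each module's metadata JSON and seeds an empty `dependants` list, then a second pass credits every dependency edge back to the depended-upon module. A sketch with invented modules — the `owner/name` → `owner-name` slug normalization is an assumption about the elided middle of the method, which this diff does not show:

```ruby
require 'json'

# Made-up module records; metadata arrives as a JSON string, as it does
# from the cached datasets
current = [
  {'slug' => 'alice-apache', 'owner' => 'alice',
   'metadata' => '{"dependencies": []}'},
  {'slug' => 'bob-vhosts',   'owner' => 'bob',
   'metadata' => '{"dependencies": [{"name": "alice/apache"}]}'},
]

# First pass: parse metadata and prepare an empty dependants list
current.each do |mod|
  mod['metadata'] = JSON.parse(mod['metadata'])
  mod['metadata']['dependants'] = []
end

# Second pass: invert each dependency edge. The '/'-to-'-' translation is
# a hypothetical normalization, since Forge metadata names dependencies
# as 'owner/name' while slugs use 'owner-name'.
current.each do |mod|
  mod['metadata']['dependencies'].each do |dependency|
    slug   = dependency['name'].tr('/', '-')
    target = current.find {|m| m['slug'] == slug }
    target['metadata']['dependants'] << mod['slug'] if target
  end
end
```

With the edges inverted, the later loop in the method only has to count `mod['metadata']['dependants']` and attribute the total to the module's author.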
data/lib/mvp.rb
CHANGED
metadata
CHANGED

@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: puppet-community-mvp
 version: !ruby/object:Gem::Version
-  version: 0.0.4
+  version: 0.0.8
 platform: ruby
 authors:
 - Ben Ford
 autorequire:
 bindir: bin
 cert_chain: []
-date:
+date: 2022-02-01 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: json
@@ -151,16 +151,17 @@ files:
 - LICENSE
 - README.md
 - bin/mvp
+- bin/pftest.rb
 - lib/mvp.rb
-- lib/mvp/downloader.rb
+- lib/mvp/bigquery.rb
+- lib/mvp/forge.rb
 - lib/mvp/itemizer.rb
-- lib/mvp/monkeypatches.rb
+- lib/mvp/puppetfile_parser.rb
 - lib/mvp/runner.rb
 - lib/mvp/stats.rb
-- lib/mvp/uploader.rb
 homepage:
 licenses:
-- Apache
+- Apache-2.0
 metadata: {}
 post_install_message:
 rdoc_options: []
@@ -177,8 +178,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
 - !ruby/object:Gem::Version
   version: '0'
 requirements: []
-
-rubygems_version: 2.5.2.3
+rubygems_version: 3.0.3.1
 signing_key:
 specification_version: 4
 summary: Generate some stats about the Puppet Community.