puppet-community-mvp 0.0.6 → 0.0.7

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 645c5ea1f55ad14ff3a0aa0f7a41e436cb17570d647f44d00b3ae0b27aab606c
4
- data.tar.gz: 6068f00d22ce28a045ed5361422e33f1027d4f5aaed138a7de8768f4f9deece0
3
+ metadata.gz: dd83202b003a900b8744b0fc8da5bb14b6024ca37f1419c841d68afaa4b487dd
4
+ data.tar.gz: c69cfa9c035b30136593d10dbad96d588fd0fe8dd870ba0763068f0c756af5cc
5
5
  SHA512:
6
- metadata.gz: 62570f5212319c8fc8fbb683230b00d5fe34c3ba8d5cda86eab7d0fd1379efe65d894c0803bc6e2db15cdebea29a07043dbb2fbbc1195fbbe6508b0ff430706e
7
- data.tar.gz: a55c36b38b69025933e880c3ef2f9e8f0612f36e28396a328b60d4de617f11ff68cf7c76230195fa7e9c340cd780c00716d085cf38b2ef9d0f52c9481feebdb1
6
+ metadata.gz: 5370badaaa4208281fa6e864a398482cd6403aeacaa8ef1ec7ac661d39287cff944a483c3f16ed352feb27b3c230573d627ebf75c43d5be4d496313553706f11
7
+ data.tar.gz: 1418192f0b6adc010b7982b34c2b8f1654b6a3fa6fbd88533a2bf766aea2a4ad3ffb63ba6d3d3481c67c1cf74fa0ec9796dc520474558750bda62fd32d05d24a
data/README.md CHANGED
@@ -0,0 +1,78 @@
1
+ # Puppet Community MVP tool
2
+
3
+ This is a simple tool to generate stats about the Puppet community. It was
4
+ originally intended to show the "most valuable players" but has since morphed to
5
+ show a lot of other things too. We primarily use it on a weekly cron job to
6
+ gather information using the Forge APIs and normalizing them so that they can be
7
+ easily combined with simple SQL queries to generate usage information.
8
+
9
+ ## Interactive usage
10
+
11
+ If you're not working on our community stats pipeline, then there are only three
12
+ subcommands you'll be interested in.
13
+
14
+ ### `stats`
15
+
16
+ This subcommand will use cached data to generate a report of Forge community
17
+ statistics. For example, it will generate distributions of module quality
18
+ scores, or releases per module, or modules per author, etc. And it will generate
19
+ sparklines showing the contributions over time of the most prolific Forge
20
+ authors and it will show authors who aren't as active as they used to be.
21
+
22
+ Unfortunately, this report is not customizable or templatable at this point.
23
+
24
+ You will need cached data before you can generate this report. See the `get` subcommand.
25
+
26
+
27
+ ### `get`
28
+
29
+ This subcommand will download and cache a local mirror of the data stored in our
30
+ BigQuery database. This data is used for the `stats` command.
31
+
32
+
33
+ ### `analyze`
34
+
35
+ This subcommand is maybe the most interesting. Many interesting bits of
36
+ information can be gathered by inspecting the source code of modules, not by
37
+ running SQL queries about their statistics. For example, `find manifests/ -name
38
+ '*.pp' | wc -l` will tell you how many manifests any given module includes, and
39
+ `grep -rn '--no-external-facts' facts.d/` will tell you how many external facts
40
+ are invoking `facter` to gather and use _other_ facts while running.
41
+
42
+ This command lets you write that little bit of analysis code as a script, and
43
+ then systematically run that script against the current release of every single
44
+ module on the Forge and collate the generated output.
45
+
46
+ A script can be written in any language and will be executed from the root of
47
+ the unpacked module. It will be invoked with an environment containing the following
48
+ variables:
49
+
50
+ * `mvp_owner` -- the Forge namespace of the module, aka the author's username
51
+ * `mvp_name` -- the name of the module itself
52
+ * `mvp_version` -- the current version of the module
53
+ * `mvp_downloads` -- the number of downloads this module has. A *rough* estimation of popularity
54
+
55
+ The script should print an array of arrays in JSON format to STDOUT. These will be
56
+ combined to make a CSV file, the columns of which are defined by the data you
57
+ return. In other words, the items in the inner array(s) are totally up to you.
58
+ They will become the columns of the generated CSV file.
59
+
60
+ The parameters relevant to this subcommand are:
61
+
62
+ ```
63
+ -o, --output_file OUTPUT_FILE The path to save a csv report.
64
+ --script SCRIPT The script file to analyze a module. See docs for interface.
65
+ --count N For debugging. Select a random list of this many modules to analyze.
66
+ -d, --debug Display extra debugging information.
67
+ ```
68
+
69
+ See files in the `scripts/` directory for examples of analysis scripts. To use,
70
+ just path of a script, like
71
+
72
+ ```
73
+ $ mvp analyze --script scripts/manifest_count.rb --count 5
74
+ [✔] stdlib (OK)
75
+ $ cat analyzed.csv
76
+ ...
77
+ ```
78
+
data/bin/mvp CHANGED
@@ -24,6 +24,10 @@ or download and itemize each Forge module.
24
24
  * Optional targets: all, authors, modules, releases
25
25
  * stats
26
26
  * Print out a summary of interesting stats.
27
+ * analyze <script file>
28
+ * Run a specified script to analyze each module to generate arbitrary stats
29
+ * Writes output to a csv file, analyzed.csv by default
30
+
27
31
  "
28
32
 
29
33
  opts.on("-f FORGEAPI", "--forgeapi FORGEAPI", "Forge API server. Rarely needed.") do |arg|
@@ -58,6 +62,14 @@ or download and itemize each Forge module.
58
62
  options[:output_file] = arg
59
63
  end
60
64
 
65
+ opts.on("--script SCRIPT", "The script file to analyze a module. See docs for interface.") do |arg|
66
+ options[:script] = arg
67
+ end
68
+
69
+ opts.on("--count N", "For debugging. Select a random list of this many modules to analyze.") do |arg|
70
+ options[:count] = arg.to_i
71
+ end
72
+
61
73
  opts.on("-d", "--debug", "Display extra debugging information.") do
62
74
  options[:debug] = true
63
75
  end
@@ -85,18 +97,24 @@ options[:gcloud][:dataset] ||= 'community'
85
97
  options[:gcloud][:project] ||= 'puppet'
86
98
  options[:gcloud][:keyfile] ||= '~/.mvp/credentials.json'
87
99
 
100
+ options[:script] = File.expand_path(options[:script]) if options[:script]
88
101
  options[:cachedir] = File.expand_path(options[:cachedir])
89
102
  options[:github_data] = File.expand_path(options[:github_data])
90
103
  options[:gcloud][:keyfile] = File.expand_path(options[:gcloud][:keyfile])
91
104
  FileUtils.mkdir_p(options[:cachedir])
92
105
 
106
+ command, target = ARGV
107
+ case command
108
+ when 'analyze'
109
+ options[:output_file] ||= 'analyzed.csv'
110
+ end
111
+
93
112
  $logger = Logger::new(STDOUT)
94
113
  $logger.level = options[:debug] ? Logger::DEBUG : Logger::INFO
95
114
  $logger.formatter = proc { |severity,datetime,progname,msg| "#{severity}: #{msg}\n" }
96
115
 
97
116
  runner = Mvp::Runner.new(options)
98
117
 
99
- command, target = ARGV
100
118
  case command
101
119
  when 'get', 'retrieve', 'download'
102
120
  target ||= :all
@@ -110,6 +128,9 @@ when 'stats'
110
128
  target ||= :all
111
129
  runner.stats(target.to_sym)
112
130
 
131
+ when 'analyze'
132
+ runner.analyze
133
+
113
134
  when 'test'
114
135
  runner.test
115
136
 
data/lib/mvp/forge.rb CHANGED
@@ -128,8 +128,8 @@ class Mvp
128
128
 
129
129
  simplify_metadata(row, row['metadata'])
130
130
 
131
- # These items are just too big to store in the table
132
- ['module', 'changelog', 'readme', 'reference'].each do |column|
131
+ # These items are just too big to store in the table, and the malware scan isn't done yet
132
+ ['module', 'changelog', 'readme', 'reference', 'malware_scan'].each do |column|
133
133
  row.delete(column)
134
134
  end
135
135
  end
data/lib/mvp/itemizer.rb CHANGED
@@ -12,7 +12,7 @@ class Mvp
12
12
 
13
13
  def run!(data, uploader)
14
14
  data.each do |mod|
15
- modname = mod['slug']
15
+ modname = mod['name']
16
16
  version = mod['version']
17
17
  return if uploader.version_itemized?(modname, version)
18
18
 
@@ -41,7 +41,9 @@ class Mvp
41
41
  File.open(filename, "w") do |file|
42
42
  file << HTTParty.get( "#{@forge}/v3/files/#{filename}" )
43
43
  end
44
- system("tar -xf #{filename}")
44
+ # Why is tar terrible?
45
+ FileUtils.mkdir("#{modname}-#{version}")
46
+ system("tar -xf #{filename} -C #{modname}-#{version} --strip-components=1")
45
47
  FileUtils.rm(filename)
46
48
  end
47
49
  end
@@ -63,6 +65,39 @@ class Mvp
63
65
  end
64
66
  end
65
67
 
68
+ def analyze(mod, script, debug)
69
+ require 'open3'
70
+ require 'json'
71
+
72
+ # sanitize an environment
73
+ env = {'mvp_script' => script}
74
+ mod.each do |key, value|
75
+ env["mvp_#{key}"] = value.to_s
76
+ end
77
+
78
+ downloads = mod[:downloads]
79
+ Dir.mktmpdir('mvp') do |path|
80
+ download(path, "#{mod[:owner]}-#{mod[:name]}", mod[:version])
81
+
82
+ rows = []
83
+ Dir.chdir("#{path}/#{mod[:owner]}-#{mod[:name]}-#{mod[:version]}") do
84
+ if debug
85
+ exit(1) unless system(env, ENV['SHELL'])
86
+ end
87
+
88
+ stdout, stderr, status = Open3.capture3(env, script)
89
+
90
+ if status.success?
91
+ rows = JSON.parse(stdout)
92
+ else
93
+ $logger.error stderr
94
+ end
95
+ end
96
+
97
+ return rows unless rows.empty?
98
+ end
99
+ end
100
+
66
101
  # Build a table with this schema
67
102
  # module | version | source | kind | element | count
68
103
  def table(itemized, data)
data/lib/mvp/runner.rb CHANGED
@@ -106,6 +106,33 @@ class Mvp
106
106
  end
107
107
  end
108
108
 
109
+ def analyze
110
+ bigquery = Mvp::Bigquery.new(@options)
111
+ itemizer = Mvp::Itemizer.new(@options)
112
+
113
+ begin
114
+ spinner = mkspinner("Analyzing modules...")
115
+ modules = bigquery.get(:modules, [:owner, :name, :version, :downloads])
116
+ modules = modules.sample(@options[:count]) if @options[:count]
117
+
118
+ require 'csv'
119
+ csv_string = CSV.generate do |csv|
120
+ modules.each do |mod|
121
+ spinner.stop if @options[:debug]
122
+ rows = itemizer.analyze(mod, @options[:script], @options[:debug])
123
+ spinner.start if @options[:debug]
124
+
125
+ next unless rows
126
+ spinner.update(title: mod[:name])
127
+ rows.each {|row| csv << row}
128
+ end
129
+ end
130
+
131
+ File.write(@options[:output_file], csv_string)
132
+ spinner.success('(OK)')
133
+ end
134
+ end
135
+
109
136
  def stats(target)
110
137
  stats = Mvp::Stats.new(@options)
111
138
 
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: puppet-community-mvp
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.0.6
4
+ version: 0.0.7
5
5
  platform: ruby
6
6
  authors:
7
7
  - Ben Ford
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2020-02-28 00:00:00.000000000 Z
11
+ date: 2021-08-16 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: json
@@ -178,7 +178,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
178
178
  - !ruby/object:Gem::Version
179
179
  version: '0'
180
180
  requirements: []
181
- rubygems_version: 3.0.6
181
+ rubygems_version: 3.0.3
182
182
  signing_key:
183
183
  specification_version: 4
184
184
  summary: Generate some stats about the Puppet Community.