puppet-community-mvp 0.0.6 → 0.0.7 (RubyGems)


Files changed (7)

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 645c5ea1f55ad14ff3a0aa0f7a41e436cb17570d647f44d00b3ae0b27aab606c
-  data.tar.gz: 6068f00d22ce28a045ed5361422e33f1027d4f5aaed138a7de8768f4f9deece0
+  metadata.gz: dd83202b003a900b8744b0fc8da5bb14b6024ca37f1419c841d68afaa4b487dd
+  data.tar.gz: c69cfa9c035b30136593d10dbad96d588fd0fe8dd870ba0763068f0c756af5cc
 SHA512:
-  metadata.gz: 62570f5212319c8fc8fbb683230b00d5fe34c3ba8d5cda86eab7d0fd1379efe65d894c0803bc6e2db15cdebea29a07043dbb2fbbc1195fbbe6508b0ff430706e
-  data.tar.gz: a55c36b38b69025933e880c3ef2f9e8f0612f36e28396a328b60d4de617f11ff68cf7c76230195fa7e9c340cd780c00716d085cf38b2ef9d0f52c9481feebdb1
+  metadata.gz: 5370badaaa4208281fa6e864a398482cd6403aeacaa8ef1ec7ac661d39287cff944a483c3f16ed352feb27b3c230573d627ebf75c43d5be4d496313553706f11
+  data.tar.gz: 1418192f0b6adc010b7982b34c2b8f1654b6a3fa6fbd88533a2bf766aea2a4ad3ffb63ba6d3d3481c67c1cf74fa0ec9796dc520474558750bda62fd32d05d24a

data/README.md CHANGED

@@ -0,0 +1,78 @@
+# Puppet Community MVP tool
+This is a simple tool to generate stats about the Puppet community. It was
+originally intended to show the "most valuable players" but has since morphed to
+show a lot of other things too. We primarily use it in a weekly cron job to
+gather information from the Forge APIs and normalize it so that it can be
+easily combined with simple SQL queries to generate usage information.
+## Interactive usage
+If you're not working on our community stats pipeline, then there are only three
+subcommands you'll be interested in.
+### `stats`
+This subcommand will use cached data to generate a report of Forge community
+statistics. For example, it will generate distributions of module quality
+scores, or releases per module, or modules per author, etc. And it will generate
+sparklines showing the contributions over time of the most prolific Forge
+authors and it will show authors who aren't as active as they used to be.
+Unfortunately, this report is not customizable or templatable at this point.
+You will need cached data before you can generate this report. See the `get` subcommand.
+### `get`
+This subcommand will download and cache a local mirror of the data stored in our
+BigQuery database. This data is used for the `stats` command.
+### `analyze`
+This subcommand is maybe the most interesting. Many interesting bits of
+information can be gathered by inspecting the source code of modules, not by
+running SQL queries about their statistics. For example, `find manifests/ -name
+'*.pp' | wc -l` will tell you how many manifests any given module includes, and
+`grep -rn -e '--no-external-facts' facts.d/` will show you which external facts
+are invoking `facter` to gather and use _other_ facts while running.
+This command lets you write that little bit of analysis code as a script, and
+then systematically run that script against the current release of every single
+module on the Forge and collate the generated output.
+A script can be written in any language and will be executed from the root of
+the unpacked module. It will be invoked with an environment containing the following
+variables:
+* `mvp_owner` -- the Forge namespace of the module, aka the author's username
+* `mvp_name` -- the name of the module itself
+* `mvp_version` -- the current version of the module
+* `mvp_downloads` -- the number of downloads this module has. A *rough* estimation of popularity
+The script should print an array of arrays in JSON format to STDOUT. These will be
+combined to make a CSV file, the columns of which are defined by the data you
+return. In other words, the items in the inner array(s) are totally up to you.
+They will become the columns of the generated CSV file.
+The parameters relevant to this subcommand are:
+```
+    -o, --output_file OUTPUT_FILE    The path to save a csv report.
+        --script SCRIPT              The script file to analyze a module. See docs for interface.
+        --count N                    For debugging. Select a random list of this many modules to analyze.
+    -d, --debug                      Display extra debugging information.
+```
+See files in the `scripts/` directory for examples of analysis scripts. To use one,
+just pass the path of a script, like
+```
+$ mvp analyze --script scripts/manifest_count.rb --count 5
+[✔] stdlib (OK)
+$ cat analyzed.csv
+...
+```
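
The script interface described in the README above can be illustrated with a minimal sketch. This is not one of the scripts shipped in the gem's `scripts/` directory; the helper name `manifest_rows` and the choice of columns are assumptions made for illustration. It reads the `mvp_*` environment variables, counts `.pp` files under `manifests/`, and prints an array of arrays as JSON.

```ruby
#!/usr/bin/env ruby
# Hypothetical analysis script; the helper name and columns are illustrative only.
require 'json'

# Count Puppet manifests under manifests/ relative to the given root and
# return one row (inner array) per module. Each row becomes a CSV line.
def manifest_rows(env, root = Dir.pwd)
  count = Dir.glob(File.join(root, 'manifests', '**', '*.pp')).size
  [[env['mvp_owner'], env['mvp_name'], env['mvp_version'], env['mvp_downloads'], count]]
end

# mvp runs the script from the root of the unpacked module, so the
# current directory is the module root when this executes.
puts JSON.generate(manifest_rows(ENV)) if $PROGRAM_NAME == __FILE__
```

Assuming the file is saved somewhere and made executable, it would be passed to `mvp analyze` with `--script`, the same way as in the README example.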

data/bin/mvp CHANGED

@@ -24,6 +24,10 @@ or download and itemize each Forge module.
       * Optional targets: all, authors, modules, releases
   * stats
       * Print out a summary of interesting stats.
+  * analyze <script file>
+      * Run a specified script to analyze each module to generate arbitrary stats
+      * Writes output to a csv file, analyzed.csv by default
 "
   opts.on("-f FORGEAPI", "--forgeapi FORGEAPI", "Forge API server. Rarely needed.") do |arg|
@@ -58,6 +62,14 @@ or download and itemize each Forge module.
     options[:output_file] = arg
   end
+  opts.on("--script SCRIPT", "The script file to analyze a module. See docs for interface.") do |arg|
+    options[:script] = arg
+  end
+  opts.on("--count N", "For debugging. Select a random list of this many modules to analyze.") do |arg|
+    options[:count] = arg.to_i
+  end
   opts.on("-d", "--debug", "Display extra debugging information.") do
     options[:debug] = true
   end
@@ -85,18 +97,24 @@ options[:gcloud][:dataset] ||= 'community'
 options[:gcloud][:project] ||= 'puppet'
 options[:gcloud][:keyfile] ||= '~/.mvp/credentials.json'
+options[:script]             = File.expand_path(options[:script]) if options[:script]
 options[:cachedir]           = File.expand_path(options[:cachedir])
 options[:github_data]        = File.expand_path(options[:github_data])
 options[:gcloud][:keyfile]   = File.expand_path(options[:gcloud][:keyfile])
 FileUtils.mkdir_p(options[:cachedir])
+command, target = ARGV
+case command
+when 'analyze'
+  options[:output_file] ||= 'analyzed.csv'
+end
 $logger           = Logger::new(STDOUT)
 $logger.level     = options[:debug] ? Logger::DEBUG : Logger::INFO
 $logger.formatter = proc { |severity,datetime,progname,msg| "#{severity}: #{msg}\n" }
 runner = Mvp::Runner.new(options)
-command, target = ARGV
 case command
 when 'get', 'retrieve', 'download'
   target ||= :all
@@ -110,6 +128,9 @@ when 'stats'
   target ||= :all
   runner.stats(target.to_sym)
+when 'analyze'
+  runner.analyze
 when 'test'
   runner.test

data/lib/mvp/forge.rb CHANGED

@@ -128,8 +128,8 @@ class Mvp
         simplify_metadata(row, row['metadata'])
-        # These items are just too big to store in the table
-        ['module', 'changelog', 'readme', 'reference'].each do |column|
+        # These items are just too big to store in the table, and the malware scan isn't done yet
+        ['module', 'changelog', 'readme', 'reference', 'malware_scan'].each do |column|
           row.delete(column)
         end
       end

data/lib/mvp/itemizer.rb CHANGED

@@ -12,7 +12,7 @@ class Mvp
     def run!(data, uploader)
       data.each do |mod|
-        modname = mod['slug']
+        modname = mod['name']
         version = mod['version']
         return if uploader.version_itemized?(modname, version)
@@ -41,7 +41,9 @@ class Mvp
         File.open(filename, "w") do |file|
           file << HTTParty.get( "#{@forge}/v3/files/#{filename}" )
         end
-        system("tar -xf #{filename}")
+        # Why is tar terrible?
+        FileUtils.mkdir("#{modname}-#{version}")
+        system("tar -xf #{filename} -C #{modname}-#{version} --strip-components=1")
         FileUtils.rm(filename)
       end
     end
@@ -63,6 +65,39 @@ class Mvp
       end
     end
+    def analyze(mod, script, debug)
+      require 'open3'
+      require 'json'
+      # sanitize an environment
+      env = {'mvp_script' => script}
+      mod.each do |key, value|
+        env["mvp_#{key}"] = value.to_s
+      end
+      downloads = mod[:downloads]
+      Dir.mktmpdir('mvp') do |path|
+        download(path, "#{mod[:owner]}-#{mod[:name]}", mod[:version])
+        rows = []
+        Dir.chdir("#{path}/#{mod[:owner]}-#{mod[:name]}-#{mod[:version]}") do
+          if debug
+            exit(1) unless system(env, ENV['SHELL'])
+          end
+          stdout, stderr, status = Open3.capture3(env, script)
+          if status.success?
+            rows = JSON.parse(stdout)
+          else
+            $logger.error stderr
+          end
+        end
+        return rows unless rows.empty?
+      end
+    end
     # Build a table with this schema
     # module | version | source | kind | element | count
     def table(itemized, data)
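
The core of the new `analyze` method is running an external script with extra environment variables and parsing its JSON output. A standalone sketch of that pattern follows. The method name `run_analysis` is hypothetical, and unlike the released code this version also rescues `JSON::ParserError`, which is an addition for illustration rather than part of the diff.

```ruby
require 'open3'
require 'json'

# Run a command with extra environment variables and parse its stdout as JSON.
# Returns an empty array when the command fails or prints invalid JSON.
def run_analysis(env, *command)
  # Open3.capture3 accepts an optional env hash as its first argument;
  # the values must be strings, which is why the diff calls value.to_s.
  stdout, stderr, status = Open3.capture3(env, *command)
  unless status.success?
    warn stderr
    return []
  end
  JSON.parse(stdout)
rescue JSON::ParserError => e
  # Assumption: treat unparseable output as "no rows" instead of raising.
  warn "invalid JSON: #{e.message}"
  []
end
```

A call such as `run_analysis({'mvp_name' => 'stdlib'}, '/path/to/script')` would hand the script the variable `mvp_name` in addition to the parent process environment.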

data/lib/mvp/runner.rb CHANGED

@@ -106,6 +106,33 @@ class Mvp
       end
     end
+    def analyze
+      bigquery = Mvp::Bigquery.new(@options)
+      itemizer = Mvp::Itemizer.new(@options)
+      begin
+        spinner = mkspinner("Analyzing modules...")
+        modules = bigquery.get(:modules, [:owner, :name, :version, :downloads])
+        modules = modules.sample(@options[:count]) if @options[:count]
+        require 'csv'
+        csv_string = CSV.generate do |csv|
+          modules.each do |mod|
+            spinner.stop if @options[:debug]
+            rows = itemizer.analyze(mod, @options[:script], @options[:debug])
+            spinner.start if @options[:debug]
+            next unless rows
+            spinner.update(title: mod[:name])
+            rows.each {|row| csv << row}
+          end
+        end
+        File.write(@options[:output_file], csv_string)
+        spinner.success('(OK)')
+      end
+    end
     def stats(target)
       stats = Mvp::Stats.new(@options)
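
The collation step in `Runner#analyze` above can be sketched in isolation: each module yields either `nil` or an array of rows, and every row is appended to a single CSV string. The method name `collate` is hypothetical and the input is assumed to be already-parsed script output.

```ruby
require 'csv'

# Combine per-module results into one CSV string.
# Each element of results is either nil (nothing to report for that module)
# or an array of rows, where a row is itself an array of column values.
def collate(results)
  CSV.generate do |csv|
    results.each do |rows|
      next unless rows
      rows.each { |row| csv << row }
    end
  end
end
```

Because the columns come straight from the inner arrays, a script that returns rows of differing lengths would produce a ragged CSV. Nothing in this sketch enforces a consistent width.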

metadata CHANGED

@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: puppet-community-mvp
 version: !ruby/object:Gem::Version
-  version: 0.0.6
+  version: 0.0.7
 platform: ruby
 authors:
 - Ben Ford
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2020-02-28 00:00:00.000000000 Z
+date: 2021-08-16 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: json
@@ -178,7 +178,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
     - !ruby/object:Gem::Version
       version: '0'
 requirements: []
-rubygems_version: 3.0.6
+rubygems_version: 3.0.3
 signing_key:
 specification_version: 4
 summary: Generate some stats about the Puppet Community.