dropsonde 0.0.2 → 0.0.3

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 4e4919cf3d11feb2d8a1a546bbc8ffd4cb4ebe182fcfe056aa39e7d382c88a9b
- data.tar.gz: c0bee1c43b2e3b9d317fd79ca3f3dd568b8893ff99378c14e0f6ed53a7b934ef
+ metadata.gz: 0ac6f28ed9d64982c400f6a7460b57f44396020e6f17f04f587cc6e5007ff115
+ data.tar.gz: ef5a2ac1957b4b45c756b0e4bbc5ae66291706afabd3a7b80e899702f8aa6757
  SHA512:
- metadata.gz: b49fc015d8cecde93318d475a25e48c4eba104e454b3d701ba995d381df865907d10f44fc842ab2ce6dca0678a62eb15c3452453b55fe2292689e790fba30eb0
- data.tar.gz: 337be044607d02a10ec7cd84f31560aab72d6c0a78f15f14292a9c1e528d1380c4a73be48a1677943d30bee728c9b88e6672158c123d2118941bab8af1efcd69
+ metadata.gz: d9bda687091ec5a1d6bd8ca7e5f1414e2c489e3a1c66a970b879d7f6a2d6ab82e11f079eebfea13ed7a710ebe20b4c0aefd0a08f1af17547e3a5cd120c587847
+ data.tar.gz: e96d699c38c0d158bce376f13d77d9bafa3ef5d4020e104d32694df04fcc68c91bb458d2bb858f9d120dc5b9146ffd3bfde450535cfe0b971b331b349e7dc78b
data/README.md CHANGED
@@ -63,7 +63,7 @@ For example, this aggregated data might include records that show a count of how
  many sites are using various combinations of modules together, but it will never
  include a record showing the full list of modules that any single site is using.

- With your own Google Cloud account, you can use that [dataset](https://console.cloud.google.com/bigquery?project=puppetlabs.com:api-project-53122606061)
+ With your own Google Cloud account, you can use that [dataset](https://console.cloud.google.com/bigquery?p=dataops-puppet-public-data&d=community&t=forge_modules&page=table)
  in your own tooling and you can see/contribute to the aggregation queries in its
  own [repository](https://github.com/puppetlabs/dropsonde-aggregation).

@@ -85,7 +85,9 @@ possible: [privacy@puppet.com](mailto:privacy@puppet.com)

  ## Installation

- This is distributed as a Ruby gem. Simply `gem install dropsonde`
+ This is distributed as a Ruby gem. Simply `gem install dropsonde`. There's a
+ [Puppet module](https://github.com/puppetlabs/puppetlabs-dropsonde) to manage it
+ if that's more your thing.


  ## Configuration
@@ -112,8 +114,6 @@ Run `dropsonde --help` to see usage information.
  * `preview`
  * Generate and print out an example telemetry report in human readable form
  * Annotated with descriptions of each plugin and each metric gathered.
- * `schema`
- * Generate and print out the complete combined schema.
  * `list`
  * See a quick list of the available metrics and what they do.
  * `submit`
@@ -123,6 +123,17 @@ Run `dropsonde --help` to see usage information.
  * Once a week, the list of public modules on the Forge will be updated. This
  command will manually force that cache update to happen.

+ Developer comands
+
+ * `dev example`
+ * To make writing aggregation queries possible without access to the private
+ database, this will generate a randomized example of the dataset. This is
+ in JSONL format, so it can be imported directly into BigQuery.
+ * `dev schema`
+ * Generate and print out the complete combined schema of all metrics.
+ * `dev shell`
+ * Open up a Pry shell with all the relevant connections open and initialized.
+

  ## Architecture

@@ -1,15 +1,12 @@
  #!/usr/bin/env ruby
  require 'gli'
  require 'dropsonde'
- require 'puppet'

  class Dropsonde
  extend GLI::App

- Puppet.initialize_settings
-
  program_desc 'A simple telemetry tool for Puppet infrastructures'
- config_file '/etc/puppetlabs/telemetry.yaml'
+ config_file "#{File.dirname(Puppet.settings[:confdir])}/telemetry.yaml"
  version Dropsonde::VERSION

  desc 'Verbose logging'
@@ -42,13 +39,6 @@ class Dropsonde
  end
  end

- desc 'Generate a complete schema set'
- command :schema do |c|
- c.action do |global, options, args|
- Dropsonde.generate_schema
- end
- end
-
  desc 'List all available metrics'
  command :list do |c|
  c.action do |global, options, args|
@@ -56,7 +46,7 @@ class Dropsonde
  end
  end

- desc 'Generate an example telemetry report'
+ desc 'Preview the telemetry report that will be submitted'
  command :preview do |c|
  c.desc 'The output format to use'
  c.flag [:format], :default_value => 'human'
@@ -80,6 +70,50 @@ class Dropsonde
  Dropsonde.submit_report(options[:endpoint], options[:port])
  end
  end
+
+ desc "Commands useful for developers"
+ command :dev do |t|
+ t.desc 'Open a Pry shell for debugging'
+ t.command :shell do |c|
+ c.action do |global, options, args|
+ require 'pry'
+ binding.pry
+ end
+ end
+
+ t.desc 'Generate a complete schema for all metrics'
+ t.long_desc "This generates the schema that is used to create or update the BigQuery
+ database. Every report is also validated against this schema before
+ submission, so you can be assured that this is a complete representation
+ of what data is collected and run through aggregation filters."
+ t.command :schema do |c|
+ c.action do |global, options, args|
+ Dropsonde.generate_schema
+ end
+ end
+
+ t.desc 'Generate an example of random data to simulate actual reports'
+ t.long_desc "The submitted telemetry reports are treated as sensitive material. Very
+ few people have access to that raw data. Instead, it's run through some
+ data aggregation filters to generate the published statistics we share.
+ Writing those aggregation queries is difficult without data to work with,
+ so this command generates a representative example of random data.
+
+ This is in jsonl format for direct upload to BigQuery."
+ t.command :example do |c|
+ c.desc 'How many rows to generate'
+ c.flag [:size], :default_value => 100
+
+ c.desc 'Filename for the output (in jsonl format).'
+ c.flag [:filename], :default_value => 'example.jsonl'
+
+ c.action do |global, options, args|
+ Dropsonde::Cache.autoupdate
+ Dropsonde.generate_example(options[:size], options[:filename])
+ end
+ end
+ end
+
  end

  exit Dropsonde.run(ARGV)
@@ -2,6 +2,7 @@ require 'json'
  require 'httpclient'
  require 'puppetdb'
  require 'inifile'
+ require 'puppet'

  class Dropsonde
  require 'dropsonde/cache'
@@ -9,6 +10,8 @@ class Dropsonde
  require 'dropsonde/monkeypatches'
  require 'dropsonde/version'

+ Puppet.initialize_settings
+
  @@pdbclient = nil
  @@settings = {}
  def self.settings=(arg)
@@ -63,6 +66,16 @@ class Dropsonde
  end
  end

+ def self.generate_example(size, filename)
+ metrics = Dropsonde::Metrics.new
+ File.open(filename, 'w') do |file|
+ for i in 0...size
+ file.write(metrics.example.to_json)
+ file.write("\n")
+ end
+ end
+ end
+
  def self.puppetDB
  return @@pdbclient if @@pdbclient

@@ -43,20 +43,22 @@ class Dropsonde::Cache
  end

  def self.update
+ puts "Updating module cache..."
  iter = PuppetForge::Module.all(:sort_by => 'latest_release')
  newest = DateTime.parse(@@cache['timestamp'])

- @@cache['timestamp'] = iter.first.created_at
+ @@cache['timestamp'] = iter.first.updated_at

  until iter.next.nil?
  # stop once we reach modules we've already cached
- break if DateTime.parse(iter.first.created_at) <= newest
+ break if DateTime.parse(iter.first.updated_at) <= newest

  @@cache['modules'].concat iter.map {|mod| mod.slug }

  iter = iter.next
  print '.'
  end
+ puts
  @@cache['modules'].sort!
  @@cache['modules'].uniq!

@@ -66,7 +68,12 @@ class Dropsonde::Cache
  def self.autoupdate
  return unless @@autoupdate

- update unless File.file? @@path
+ unless File.file? @@path
+ puts "Dropsonde caches a list of all Forge modules to ensure that it only reports"
+ puts "usage data on public modules. Generating this cache may take some time on"
+ puts "the first run and you'll see your screen fill up with dots."
+ update
+ end

  if (Date.today - File.mtime(@@path).to_date).to_i > @@ttl
  update
@@ -78,7 +78,7 @@ class Dropsonde::Metrics
  snapshots = {}
  Dropsonde::Metrics.plugins.each do |name, plugin|
  plugin.setup
- sanity_check_data(plugin).each do |row|
+ sanity_check_data(plugin, plugin.run).each do |row|
  snapshots[row.keys.first] = {
  'value' => row.values.first,
  'timestamp' => Time.now.iso8601,
@@ -92,8 +92,26 @@ class Dropsonde::Metrics
  results
  end

- def sanity_check_data(plugin)
- data = plugin.run
+ def example
+ require 'ipaddr'
+ results = skeleton_report
+ results[:message_id] = generate_guid
+ results[:timestamp] = rand((Time.now - 60 * 60 * 24 * 365)..Time.now).utc
+ results[:ip] = IPAddr.new(rand(2**32), Socket::AF_INET)
+ results.delete(:'self-service-analytics')
+
+ Dropsonde::Metrics.plugins.each do |name, plugin|
+ sanity_check_data(plugin, plugin.example).each do |row|
+ results.merge!(row)
+ end
+ end
+
+ results
+ end
+
+ # We accept both the plugin and data gathered from the plugin so that
+ # we can sanitize both data and example data
+ def sanity_check_data(plugin, data)
  keys_data = data.map {|item| item.keys }.flatten.map(&:to_s)
  keys_schema = plugin.schema.map {|item| item[:name] }

@@ -181,4 +199,14 @@ class Dropsonde::Metrics
  }
  }
  end
+
+ def generate_guid
+ "%s-%s-%s-%s-%s" % [
+ (0..8).to_a.map{|a| rand(16).to_s(16)}.join,
+ (0..4).to_a.map{|a| rand(16).to_s(16)}.join,
+ (0..4).to_a.map{|a| rand(16).to_s(16)}.join,
+ (0..4).to_a.map{|a| rand(16).to_s(16)}.join,
+ (0..12).to_a.map{|a| rand(16).to_s(16)}.join
+ ]
+ end
  end
@@ -64,6 +64,22 @@ class Dropsonde::Metrics::Dependencies

  end

+ def self.example
+ # this method is used to generate a table filled with randomized data to
+ # make it easier to write data aggregation queries without access to the
+ # actual private data that users have submitted.
+
+ versions = ['>= 1.5.2', '>= 4.3.2', '>= 3.0.0 < 4.0.0', '>= 2.2.1 < 5.0.0', '>= 5.0.0 < 7.0.0', '>= 4.11.0']
+ [
+ :dependencies => Dropsonde::Cache.modules
+ .sample(rand(250))
+ .map {|item| {
+ :name => item,
+ :version_requirement => versions.sample,
+ }},
+ ]
+ end
+
  def self.cleanup
  # run just after generating this metric
  end
@@ -108,6 +108,30 @@ class Dropsonde::Metrics::Modules
  ]
  end

+ def self.example
+ # this method is used to generate a table filled with randomized data to
+ # make it easier to write data aggregation queries without access to the
+ # actual private data that users have submitted.
+
+ versions = ['1.3.2', '0.0.1', '0.1.2', '1.0.0', '3.0.2', '7.10', '6.1.0', '2.1.0', '1.4.0']
+ classes = ['', '::Config', '::Service', '::Server', '::Client', '::Packages']
+ [
+ :modules => Dropsonde::Cache.modules
+ .sample(rand(100))
+ .map {|item| {
+ :name => item.split('-').last,
+ :slug => item,
+ :version => versions.sample,
+ }},
+ :classes => Dropsonde::Cache.modules
+ .sample(rand(500))
+ .map {|item| {
+ :name => item.split('-').last.capitalize + classes.sample,
+ :count => rand(750),
+ }},
+ ]
+ end
+
  def self.cleanup
  # run just after generating this metric
  end
@@ -70,6 +70,22 @@ class Dropsonde::Metrics::Puppetfiles
  ]
  end

+ def self.example
+ # this method is used to generate a table filled with randomized data to
+ # make it easier to write data aggregation queries without access to the
+ # actual private data that users have submitted.
+ [
+ :puppetfile_ruby_methods => [
+ {:name => 'require', :count => rand(200)},
+ {:name => 'each', :count => rand(200)},
+ {:name => 'puts', :count => rand(200)},
+ {:name => 'select', :count => rand(200)},
+ {:name => 'reject', :count => rand(200)},
+ {:name => 'read', :count => rand(200)},
+ ].shuffle,
+ ]
+ end
+
  def self.cleanup
  # run just after generating this metric
  end
@@ -1,3 +1,3 @@
  class Dropsonde
- VERSION = '0.0.2'
+ VERSION = '0.0.3'
  end
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: dropsonde
  version: !ruby/object:Gem::Version
- version: 0.0.2
+ version: 0.0.3
  platform: ruby
  authors:
  - Ben Ford
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2020-05-07 00:00:00.000000000 Z
+ date: 2020-05-24 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: json