dropsonde 0.0.2 → 0.0.3

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 4e4919cf3d11feb2d8a1a546bbc8ffd4cb4ebe182fcfe056aa39e7d382c88a9b
4
- data.tar.gz: c0bee1c43b2e3b9d317fd79ca3f3dd568b8893ff99378c14e0f6ed53a7b934ef
3
+ metadata.gz: 0ac6f28ed9d64982c400f6a7460b57f44396020e6f17f04f587cc6e5007ff115
4
+ data.tar.gz: ef5a2ac1957b4b45c756b0e4bbc5ae66291706afabd3a7b80e899702f8aa6757
5
5
  SHA512:
6
- metadata.gz: b49fc015d8cecde93318d475a25e48c4eba104e454b3d701ba995d381df865907d10f44fc842ab2ce6dca0678a62eb15c3452453b55fe2292689e790fba30eb0
7
- data.tar.gz: 337be044607d02a10ec7cd84f31560aab72d6c0a78f15f14292a9c1e528d1380c4a73be48a1677943d30bee728c9b88e6672158c123d2118941bab8af1efcd69
6
+ metadata.gz: d9bda687091ec5a1d6bd8ca7e5f1414e2c489e3a1c66a970b879d7f6a2d6ab82e11f079eebfea13ed7a710ebe20b4c0aefd0a08f1af17547e3a5cd120c587847
7
+ data.tar.gz: e96d699c38c0d158bce376f13d77d9bafa3ef5d4020e104d32694df04fcc68c91bb458d2bb858f9d120dc5b9146ffd3bfde450535cfe0b971b331b349e7dc78b
data/README.md CHANGED
@@ -63,7 +63,7 @@ For example, this aggregated data might include records that show a count of how
63
63
  many sites are using various combinations of modules together, but it will never
64
64
  include a record showing the full list of modules that any single site is using.
65
65
 
66
- With your own Google Cloud account, you can use that [dataset](https://console.cloud.google.com/bigquery?project=puppetlabs.com:api-project-53122606061)
66
+ With your own Google Cloud account, you can use that [dataset](https://console.cloud.google.com/bigquery?p=dataops-puppet-public-data&d=community&t=forge_modules&page=table)
67
67
  in your own tooling and you can see/contribute to the aggregation queries in its
68
68
  own [repository](https://github.com/puppetlabs/dropsonde-aggregation).
69
69
 
@@ -85,7 +85,9 @@ possible: [privacy@puppet.com](mailto:privacy@puppet.com)
85
85
 
86
86
  ## Installation
87
87
 
88
- This is distributed as a Ruby gem. Simply `gem install dropsonde`
88
+ This is distributed as a Ruby gem. Simply `gem install dropsonde`. There's a
89
+ [Puppet module](https://github.com/puppetlabs/puppetlabs-dropsonde) to manage it
90
+ if that's more your thing.
89
91
 
90
92
 
91
93
  ## Configuration
@@ -112,8 +114,6 @@ Run `dropsonde --help` to see usage information.
112
114
  * `preview`
113
115
  * Generate and print out an example telemetry report in human readable form
114
116
  * Annotated with descriptions of each plugin and each metric gathered.
115
- * `schema`
116
- * Generate and print out the complete combined schema.
117
117
  * `list`
118
118
  * See a quick list of the available metrics and what they do.
119
119
  * `submit`
@@ -123,6 +123,17 @@ Run `dropsonde --help` to see usage information.
123
123
  * Once a week, the list of public modules on the Forge will be updated. This
124
124
  command will manually force that cache update to happen.
125
125
 
126
+ Developer commands
127
+
128
+ * `dev example`
129
+ * To make writing aggregation queries possible without access to the private
130
+ database, this will generate a randomized example of the dataset. This is
131
+ in JSONL format, so it can be imported directly into BigQuery.
132
+ * `dev schema`
133
+ * Generate and print out the complete combined schema of all metrics.
134
+ * `dev shell`
135
+ * Open up a Pry shell with all the relevant connections open and initialized.
136
+
126
137
 
127
138
  ## Architecture
128
139
 
@@ -1,15 +1,12 @@
1
1
  #!/usr/bin/env ruby
2
2
  require 'gli'
3
3
  require 'dropsonde'
4
- require 'puppet'
5
4
 
6
5
  class Dropsonde
7
6
  extend GLI::App
8
7
 
9
- Puppet.initialize_settings
10
-
11
8
  program_desc 'A simple telemetry tool for Puppet infrastructures'
12
- config_file '/etc/puppetlabs/telemetry.yaml'
9
+ config_file "#{File.dirname(Puppet.settings[:confdir])}/telemetry.yaml"
13
10
  version Dropsonde::VERSION
14
11
 
15
12
  desc 'Verbose logging'
@@ -42,13 +39,6 @@ class Dropsonde
42
39
  end
43
40
  end
44
41
 
45
- desc 'Generate a complete schema set'
46
- command :schema do |c|
47
- c.action do |global, options, args|
48
- Dropsonde.generate_schema
49
- end
50
- end
51
-
52
42
  desc 'List all available metrics'
53
43
  command :list do |c|
54
44
  c.action do |global, options, args|
@@ -56,7 +46,7 @@ class Dropsonde
56
46
  end
57
47
  end
58
48
 
59
- desc 'Generate an example telemetry report'
49
+ desc 'Preview the telemetry report that will be submitted'
60
50
  command :preview do |c|
61
51
  c.desc 'The output format to use'
62
52
  c.flag [:format], :default_value => 'human'
@@ -80,6 +70,50 @@ class Dropsonde
80
70
  Dropsonde.submit_report(options[:endpoint], options[:port])
81
71
  end
82
72
  end
73
+
74
+ desc "Commands useful for developers"
75
+ command :dev do |t|
76
+ t.desc 'Open a Pry shell for debugging'
77
+ t.command :shell do |c|
78
+ c.action do |global, options, args|
79
+ require 'pry'
80
+ binding.pry
81
+ end
82
+ end
83
+
84
+ t.desc 'Generate a complete schema for all metrics'
85
+ t.long_desc "This generates the schema that is used to create or update the BigQuery
86
+ database. Every report is also validated against this schema before
87
+ submission, so you can be assured that this is a complete representation
88
+ of what data is collected and run through aggregation filters."
89
+ t.command :schema do |c|
90
+ c.action do |global, options, args|
91
+ Dropsonde.generate_schema
92
+ end
93
+ end
94
+
95
+ t.desc 'Generate an example of random data to simulate actual reports'
96
+ t.long_desc "The submitted telemetry reports are treated as sensitive material. Very
97
+ few people have access to that raw data. Instead, it's run through some
98
+ data aggregation filters to generate the published statistics we share.
99
+ Writing those aggregation queries is difficult without data to work with,
100
+ so this command generates a representative example of random data.
101
+
102
+ This is in jsonl format for direct upload to BigQuery."
103
+ t.command :example do |c|
104
+ c.desc 'How many rows to generate'
105
+ c.flag [:size], :default_value => 100
106
+
107
+ c.desc 'Filename for the output (in jsonl format).'
108
+ c.flag [:filename], :default_value => 'example.jsonl'
109
+
110
+ c.action do |global, options, args|
111
+ Dropsonde::Cache.autoupdate
112
+ Dropsonde.generate_example(options[:size], options[:filename])
113
+ end
114
+ end
115
+ end
116
+
83
117
  end
84
118
 
85
119
  exit Dropsonde.run(ARGV)
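
Review note on `dev example`: the `:size` flag defaults to the Integer `100`, but as far as I can tell GLI hands over a flag value typed on the command line as a String unless a type is declared for it. If so, a String would not work as the bound of the `0...size` range in `generate_example`. Below is a minimal sketch of the JSONL write loop with explicit coercion, plus a reader for checking the result. `write_jsonl` and `read_jsonl` are hypothetical helpers, and the block stands in for `metrics.example`; neither is part of the gem.

```ruby
require 'json'

# Hypothetical helper: writes `size` records, one JSON document per line.
# `size` may be an Integer (the flag default) or a String (typed by the user).
def write_jsonl(filename, size)
  count = Integer(size) # raises ArgumentError on non-numeric input
  File.open(filename, 'w') do |file|
    count.times do |i|
      # The block plays the role of metrics.example in the diff above.
      file.puts(yield(i).to_json)
    end
  end
  count
end

# Reads the file back, parsing each non-empty line as its own JSON document.
def read_jsonl(filename)
  File.foreach(filename)
      .map(&:strip)
      .reject(&:empty?)
      .map { |line| JSON.parse(line) }
end
```

Calling `write_jsonl('example.jsonl', '100') { |i| { 'row' => i } }` would write 100 lines; each line parses independently, which is what makes the format convenient for bulk import.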
@@ -2,6 +2,7 @@ require 'json'
2
2
  require 'httpclient'
3
3
  require 'puppetdb'
4
4
  require 'inifile'
5
+ require 'puppet'
5
6
 
6
7
  class Dropsonde
7
8
  require 'dropsonde/cache'
@@ -9,6 +10,8 @@ class Dropsonde
9
10
  require 'dropsonde/monkeypatches'
10
11
  require 'dropsonde/version'
11
12
 
13
+ Puppet.initialize_settings
14
+
12
15
  @@pdbclient = nil
13
16
  @@settings = {}
14
17
  def self.settings=(arg)
@@ -63,6 +66,16 @@ class Dropsonde
63
66
  end
64
67
  end
65
68
 
69
+ def self.generate_example(size, filename)
70
+ metrics = Dropsonde::Metrics.new
71
+ File.open(filename, 'w') do |file|
72
+ for i in 0...size
73
+ file.write(metrics.example.to_json)
74
+ file.write("\n")
75
+ end
76
+ end
77
+ end
78
+
66
79
  def self.puppetDB
67
80
  return @@pdbclient if @@pdbclient
68
81
 
@@ -43,20 +43,22 @@ class Dropsonde::Cache
43
43
  end
44
44
 
45
45
  def self.update
46
+ puts "Updating module cache..."
46
47
  iter = PuppetForge::Module.all(:sort_by => 'latest_release')
47
48
  newest = DateTime.parse(@@cache['timestamp'])
48
49
 
49
- @@cache['timestamp'] = iter.first.created_at
50
+ @@cache['timestamp'] = iter.first.updated_at
50
51
 
51
52
  until iter.next.nil?
52
53
  # stop once we reach modules we've already cached
53
- break if DateTime.parse(iter.first.created_at) <= newest
54
+ break if DateTime.parse(iter.first.updated_at) <= newest
54
55
 
55
56
  @@cache['modules'].concat iter.map {|mod| mod.slug }
56
57
 
57
58
  iter = iter.next
58
59
  print '.'
59
60
  end
61
+ puts
60
62
  @@cache['modules'].sort!
61
63
  @@cache['modules'].uniq!
62
64
 
@@ -66,7 +68,12 @@ class Dropsonde::Cache
66
68
  def self.autoupdate
67
69
  return unless @@autoupdate
68
70
 
69
- update unless File.file? @@path
71
+ unless File.file? @@path
72
+ puts "Dropsonde caches a list of all Forge modules to ensure that it only reports"
73
+ puts "usage data on public modules. Generating this cache may take some time on"
74
+ puts "the first run and you'll see your screen fill up with dots."
75
+ update
76
+ end
70
77
 
71
78
  if (Date.today - File.mtime(@@path).to_date).to_i > @@ttl
72
79
  update
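
Review note on `autoupdate`: the age check subtracts two `Date` values, so `@@ttl` is effectively measured in whole days. A minimal sketch of that check as a standalone predicate follows. `cache_stale?` and its parameters are hypothetical names used for illustration, and treating a missing file as stale is a simplification of the two separate branches in the diff.

```ruby
require 'date'

# Hypothetical helper mirroring the age check in `autoupdate`.
# Returns true when the file is missing or older than `ttl_days` days.
# `today` is injectable so the check can be exercised without waiting.
def cache_stale?(path, ttl_days, today = Date.today)
  return true unless File.file?(path)

  # Date - Date yields a Rational number of days; to_i truncates it.
  (today - File.mtime(path).to_date).to_i > ttl_days
end
```

With a seven-day TTL, a file written today is reported as fresh, and the same file is reported as stale once `today` is more than seven days past its modification date.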
@@ -78,7 +78,7 @@ class Dropsonde::Metrics
78
78
  snapshots = {}
79
79
  Dropsonde::Metrics.plugins.each do |name, plugin|
80
80
  plugin.setup
81
- sanity_check_data(plugin).each do |row|
81
+ sanity_check_data(plugin, plugin.run).each do |row|
82
82
  snapshots[row.keys.first] = {
83
83
  'value' => row.values.first,
84
84
  'timestamp' => Time.now.iso8601,
@@ -92,8 +92,26 @@ class Dropsonde::Metrics
92
92
  results
93
93
  end
94
94
 
95
- def sanity_check_data(plugin)
96
- data = plugin.run
95
+ def example
96
+ require 'ipaddr'
97
+ results = skeleton_report
98
+ results[:message_id] = generate_guid
99
+ results[:timestamp] = rand((Time.now - 60 * 60 * 24 * 365)..Time.now).utc
100
+ results[:ip] = IPAddr.new(rand(2**32), Socket::AF_INET)
101
+ results.delete(:'self-service-analytics')
102
+
103
+ Dropsonde::Metrics.plugins.each do |name, plugin|
104
+ sanity_check_data(plugin, plugin.example).each do |row|
105
+ results.merge!(row)
106
+ end
107
+ end
108
+
109
+ results
110
+ end
111
+
112
+ # We accept both the plugin and data gathered from the plugin so that
113
+ # we can sanitize both data and example data
114
+ def sanity_check_data(plugin, data)
97
115
  keys_data = data.map {|item| item.keys }.flatten.map(&:to_s)
98
116
  keys_schema = plugin.schema.map {|item| item[:name] }
99
117
 
@@ -181,4 +199,14 @@ class Dropsonde::Metrics
181
199
  }
182
200
  }
183
201
  end
202
+
203
+ def generate_guid
204
+ "%s-%s-%s-%s-%s" % [
205
+ (0..8).to_a.map{|a| rand(16).to_s(16)}.join,
206
+ (0..4).to_a.map{|a| rand(16).to_s(16)}.join,
207
+ (0..4).to_a.map{|a| rand(16).to_s(16)}.join,
208
+ (0..4).to_a.map{|a| rand(16).to_s(16)}.join,
209
+ (0..12).to_a.map{|a| rand(16).to_s(16)}.join
210
+ ]
211
+ end
184
212
  end
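
Review note on `generate_guid`: Ruby's two-dot ranges are inclusive, so `(0..8)` has nine elements and `(0..4)` has five. As written, the groups come out as 9-5-5-5-13 hex digits rather than the conventional 8-4-4-4-12 layout. A minimal sketch of one way to get the conventional layout is below; `GUID_GROUP_LENGTHS` and `generate_uuid_v4` are hypothetical names, not part of the gem.

```ruby
require 'securerandom'

# Digits per group in the conventional GUID text layout.
GUID_GROUP_LENGTHS = [8, 4, 4, 4, 12].freeze

# Hypothetical replacement for generate_guid: random hex in 8-4-4-4-12 layout.
# Like the original, this sets no version or variant bits, so it only looks
# like a UUID; that is probably fine for randomized example data.
def generate_guid
  GUID_GROUP_LENGTHS
    .map { |len| Array.new(len) { rand(16).to_s(16) }.join }
    .join('-')
end

# Standard-library alternative that returns a real version 4 UUID.
def generate_uuid_v4
  SecureRandom.uuid
end
```

Either method returns a 36-character string; the second has the advantage of needing no hand-written digit counts.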
@@ -64,6 +64,22 @@ class Dropsonde::Metrics::Dependencies
64
64
 
65
65
  end
66
66
 
67
+ def self.example
68
+ # this method is used to generate a table filled with randomized data to
69
+ # make it easier to write data aggregation queries without access to the
70
+ # actual private data that users have submitted.
71
+
72
+ versions = ['>= 1.5.2', '>= 4.3.2', '>= 3.0.0 < 4.0.0', '>= 2.2.1 < 5.0.0', '>= 5.0.0 < 7.0.0', '>= 4.11.0']
73
+ [
74
+ :dependencies => Dropsonde::Cache.modules
75
+ .sample(rand(250))
76
+ .map {|item| {
77
+ :name => item,
78
+ :version_requirement => versions.sample,
79
+ }},
80
+ ]
81
+ end
82
+
67
83
  def self.cleanup
68
84
  # run just after generating this metric
69
85
  end
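
Review note on `self.example`: the method returns an Array holding a single Hash, keyed by metric name, whose value is a random sample of cached Forge slugs. For anyone who wants to try the shape without building the Forge cache, here is a minimal sketch. `SAMPLE_MODULES` is a hypothetical stand-in for `Dropsonde::Cache.modules`, and `example_dependencies` is an illustrative name; the slugs and version strings are placeholders.

```ruby
# Hypothetical stand-in for Dropsonde::Cache.modules (a list of Forge slugs).
SAMPLE_MODULES = %w[
  puppetlabs-stdlib puppetlabs-apache puppetlabs-ntp puppetlabs-concat
].freeze

# Placeholder requirement strings, following the style used in the diff.
VERSION_REQUIREMENTS = ['>= 1.5.2', '>= 4.3.2', '>= 3.0.0 < 4.0.0'].freeze

# Returns one row shaped like the Dependencies plugin's example output.
def example_dependencies(modules = SAMPLE_MODULES, max = 250)
  # rand(max) picks a count below max; sample never returns more
  # elements than the array holds, so small lists are safe.
  picked = modules.sample(rand(max))
  deps = picked.map do |slug|
    { :name => slug, :version_requirement => VERSION_REQUIREMENTS.sample }
  end
  [{ :dependencies => deps }]
end
```

Because `rand(max)` can return `0`, an empty `:dependencies` list is a valid result, so aggregation queries written against the example data should tolerate empty arrays.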
@@ -108,6 +108,30 @@ class Dropsonde::Metrics::Modules
108
108
  ]
109
109
  end
110
110
 
111
+ def self.example
112
+ # this method is used to generate a table filled with randomized data to
113
+ # make it easier to write data aggregation queries without access to the
114
+ # actual private data that users have submitted.
115
+
116
+ versions = ['1.3.2', '0.0.1', '0.1.2', '1.0.0', '3.0.2', '7.10', '6.1.0', '2.1.0', '1.4.0']
117
+ classes = ['', '::Config', '::Service', '::Server', '::Client', '::Packages']
118
+ [
119
+ :modules => Dropsonde::Cache.modules
120
+ .sample(rand(100))
121
+ .map {|item| {
122
+ :name => item.split('-').last,
123
+ :slug => item,
124
+ :version => versions.sample,
125
+ }},
126
+ :classes => Dropsonde::Cache.modules
127
+ .sample(rand(500))
128
+ .map {|item| {
129
+ :name => item.split('-').last.capitalize + classes.sample,
130
+ :count => rand(750),
131
+ }},
132
+ ]
133
+ end
134
+
111
135
  def self.cleanup
112
136
  # run just after generating this metric
113
137
  end
@@ -70,6 +70,22 @@ class Dropsonde::Metrics::Puppetfiles
70
70
  ]
71
71
  end
72
72
 
73
+ def self.example
74
+ # this method is used to generate a table filled with randomized data to
75
+ # make it easier to write data aggregation queries without access to the
76
+ # actual private data that users have submitted.
77
+ [
78
+ :puppetfile_ruby_methods => [
79
+ {:name => 'require', :count => rand(200)},
80
+ {:name => 'each', :count => rand(200)},
81
+ {:name => 'puts', :count => rand(200)},
82
+ {:name => 'select', :count => rand(200)},
83
+ {:name => 'reject', :count => rand(200)},
84
+ {:name => 'read', :count => rand(200)},
85
+ ].shuffle,
86
+ ]
87
+ end
88
+
73
89
  def self.cleanup
74
90
  # run just after generating this metric
75
91
  end
@@ -1,3 +1,3 @@
1
1
  class Dropsonde
2
- VERSION = '0.0.2'
2
+ VERSION = '0.0.3'
3
3
  end
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: dropsonde
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.0.2
4
+ version: 0.0.3
5
5
  platform: ruby
6
6
  authors:
7
7
  - Ben Ford
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2020-05-07 00:00:00.000000000 Z
11
+ date: 2020-05-24 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: json