sensu-plugins-influxdb-metrics-checker 0.3.4 → 0.4.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz: 86fe26cfb281849f4b73356b47011b08bf1a7960
- data.tar.gz: dca6df28cc658d7faff6274090ecb5b050b5654a
+ metadata.gz: ed89e4ad1b835c997279a3afc967efc91e6579fa
+ data.tar.gz: 5813fe0fa863a0e278620862bad7aeeae75651be
  SHA512:
- metadata.gz: 861427b6a1b16058eb9bfc51c0186eed75d765942325762932933ec8d67efb81f0192c2804be9c4c443acd73545ce8a134ae6f5e96de219504a23ebda7b32524
- data.tar.gz: 74d9af1f5b6de576c991bcece54f24041ca43d2b7128c60e7c5f7f825b4ad90eb6e1dfb49e83f711b3130c95d6c799328b35c694aab67cd7319593a2e82d1f3e
+ metadata.gz: b8638d8b0b9a343b0233a5e39e2425cb5c238c58a97124d81452dc0da5478ae7b3886393b507b4be6a8ac4aa8680fe6faf02e41bd03e955fb96f6724dbb6e139
+ data.tar.gz: 0cacd9d9ee2257904253403682edb271ed8f7431ca6b4bdeb0d17d3d847baae81c682e57647f9d0b616eb89a60de9d1fc2382becea3df2c96ec5bd710486d9b4
data/CHANGELOG.md CHANGED
@@ -3,6 +3,11 @@ This project adheres to [Semantic Versioning](http://semver.org/).
 
  This CHANGELOG follows the format listed at [Keep A Changelog](http://keepachangelog.com/)
 
+ # [0.4.0] - 2017-01-20
+ - seventh release
+ New feature: Triangulation. Added the ability to get the percentage of difference of metric A and of metric B, and to compare the distance between them. Useful when the metrics are related by some business rule.
+ Improved the feedback messages returned by the check.
+
  # [0.3.4] - 2017-01-17
  - sixth release
  Allow the usage of regular expressions, identified as "/^[your_regex]$/". I strongly recommend using this only for exceptions and always aiming for zero exceptions; otherwise it won't be accurate. At the moment it fires when the number of exceptions today is bigger than the number of exceptions yesterday.
data/README.md CHANGED
@@ -11,10 +11,10 @@ We chose to do it as a Sensu plugin because it comes with Handlers that will all
  The result is that now we are able to experiment with our metrics and alerts, giving us a better understanding of what's going on in our systems.
 
  ## What it does
- The script will compare the values of yesterday at this time minus 10 minutes with the values of today at this time minus 10 minutes.
+ The script will compare the values of yesterday at this time minus 25 minutes with the values of today at this time minus 25 minutes.
  It will calculate the percentage of difference and act on it.
  You can set the warning and critical thresholds at which the check will act.
- It will also leave a 5-minute margin for InfluxDB to aggregate the data, so the results are more precise.
+ It will also leave a 10-minute margin for InfluxDB to aggregate the data, so the results are more precise.
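The comparison described above can be illustrated with a minimal Ruby sketch that computes the two query windows and the percentage of difference. It assumes a 25-minute period (the `--period` default in this release), a 10-minute aggregation margin, and an exact 24-hour (1440-minute) offset for yesterday's window. The helper names (`time_windows`, `where_clause`, `percentage_difference`) are hypothetical, and the percentage formula is an assumption, since the body of the plugin's `difference_between_two_metrics` is not shown here.

```ruby
# Hypothetical sketch of the time windows described above (names are
# illustrative, not the plugin's API). Offsets are minutes before now().
AGGREGATION_DELAY_MINUTES = 10 # margin left for InfluxDB to aggregate
MINUTES_PER_DAY = 1440

# Returns [start_offset, end_offset] pairs for today and for yesterday.
def time_windows(period)
  today_start = AGGREGATION_DELAY_MINUTES
  yesterday_start = MINUTES_PER_DAY + AGGREGATION_DELAY_MINUTES
  {
    today: [today_start, today_start + period],
    yesterday: [yesterday_start, yesterday_start + period]
  }
end

def where_clause(start_offset, end_offset)
  " WHERE time > now() - #{end_offset}m AND time < now() - #{start_offset}m"
end

# Assumed formula: percentage change of today's sum relative to yesterday's
# (yesterday must be non-zero).
def percentage_difference(today, yesterday)
  (today - yesterday) * 100.0 / yesterday
end

windows = time_windows(25)
puts where_clause(*windows[:today])     # today: from 35 to 10 minutes ago
puts where_clause(*windows[:yesterday]) # same time slot, 24 hours earlier
puts percentage_difference(90, 100)     # prints -10.0
```

With these numbers a drop from 100 to 90 yields -10.0, which is the value the warning and critical thresholds would be compared against.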
 
  ## Components
  There is just one script that you can find at
@@ -22,8 +22,7 @@ There is just one script that you can find at
 
  ## Getting started
 
- At the moment there is just one script
- **check-influxdb-metrics** which you can run in a bash doing:
+ To run **check-influxdb-metrics** from a shell:
 
  ```
  ruby check-influxdb-metrics.rb --host=metrics-influxdb.internal.com --port=8086 --user=admin --password=password -c -3 -w -10 --db=statsd_metrics --metric=api.request.counter
@@ -48,9 +47,13 @@ ruby check-influxdb-metrics.rb --host=metrics-influxdb.internal.com --port=8086
 
  ## Advanced Queries
 
- You can use Regex in your metrics. The spirit behind this feature is to gather information about exceptions only, always aiming for a zero exception policy. So I'll advise against using it for other purposes.
+ **Regex**
+
+ You can use regular expressions in your metric names. This feature is meant for gathering information about exceptions only. Beware: a regex could match every metric in your InfluxDB cluster, which may cause some unintended pain, so always aim to query exceptions only, ideally under a zero-exception policy.
+ I strongly advise against using it for other purposes.
 
  **How it works**
+
  1. It treats the metric name as a regex only when it contains '/'. In the shell you'll need to wrap the metric in double quotes.
  2. It compares the number of metrics gathered today with the number of metrics gathered yesterday.
  3. If today we read more than yesterday, it will blow up as **Critical**.
@@ -61,6 +64,38 @@ ruby check-influxdb-metrics.rb --host=metrics-influxdb.internal.com --port=8086
 
  ```
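The regex steps described above can be sketched in Ruby as follows. This is a minimal illustration, not the plugin's code: `regex_metric?` and `regex_status` are hypothetical names, the statuses are returned as symbols instead of calling Sensu's `ok`/`critical`, and the sample series are made up.

```ruby
# Hypothetical helpers illustrating the regex steps; not the plugin's API.

# Step 1: a metric name is treated as a regex only when it contains '/'.
def regex_metric?(metric)
  metric.include?('/')
end

# Steps 2-3: compare how many series matched today versus yesterday.
# Returns a symbol instead of calling Sensu's ok/critical.
def regex_status(today_count, yesterday_count)
  if today_count > yesterday_count
    :critical            # more exception series than yesterday
  elsif today_count == yesterday_count
    :compare_each_metric # same count: the script then compares each series
  else
    :ok                  # fewer series matched today
  end
end

# Made-up series matched by the regex, keyed by metric name.
today = { 'exceptions.timeout' => 4, 'exceptions.db' => 1 }
yesterday = { 'exceptions.timeout' => 3 }

puts regex_metric?('/^exceptions\..*$/')     # prints true
puts regex_status(today.size, yesterday.size) # prints critical
```

A new exception series appearing today (here `exceptions.db`) is enough to turn the check critical, which is why a zero-exception policy keeps this check meaningful.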
 
+ **Triangulation**
+
+ In trigonometry and geometry, triangulation is the process of determining the location of a point by forming triangles to it from known points. This feature of the script is inspired by that idea.
+
+
+ [![triangulation_01.png](https://s24.postimg.org/kjihvilvp/triangulation_01.png)](https://postimg.org/image/hcnybw1fl/)
+
+ Given a metric A (e.g. messages.sent), we normally compare it with yesterday's value A', get the percentage of difference and, according to our thresholds, fire an alert. Cool. Now let's go one step further.
+ We may have a metric B (e.g. sessions.generated) that has a business dependency on A. If we dig further into our metrics, we may even relate the two by, let's say, an average of 5 As for each B (in this example, it takes 5 sent messages to build a session).
+
+ If every 5 As always produced exactly 1 B, the percentage of difference for A and B would always be the same. Realistically, production applications don't behave like that: sometimes you need 7 messages, other times only 4, so your average would be something like 5.333. Therefore, we can't expect the percentage of difference to be identical for both, but if we look at the *distance* between these percentages, we'll see that they are pretty close. That's the spirit of this feature: the ability to diagnose when the distance is higher than expected.
+
+ Let's say the system that sends messages shows an increase of 150%. If you are only checking that metric with this tool, you won't get any alert, because there is no drop. Meanwhile, the system that processes sessions stays at its usual 2% increase, which is a distance of 148. We clearly have a problem here. Maybe there is a bottleneck somewhere, or maybe some messages are lost due to the huge increase; hopefully this feature will let you spot that something fishy is going on.
+
+ **How it works**
+
+ The following command gets the distance between the percentage of difference of "messages.counter" and that of "sessions.generated". By default it fires a critical alert if the distance is bigger than 2.
+
+ ```
+ ruby check-influxdb-metrics.rb --host=metrics-influxdb.internal.com --port=8086 --user=admin --password=password -c -30 -w -10 --db=statsd_metrics --metric=messages.counter --triangulate=sessions.generated
+ ```
+ To change the allowed distance, use --distance:
+
+ ```
+ ruby check-influxdb-metrics.rb --host=metrics-influxdb.internal.com --port=8086 --user=admin --password=password -c -30 -w -10 --db=statsd_metrics --metric=messages.counter --triangulate=sessions.generated --distance=10
+ ```
+
+ Tags and filters work as usual, but bear in mind that by default they apply only to the first metric, not to both. To apply them to the second metric as well, add --applyfilterbothqueries=yes:
+
+ ```
+ ruby check-influxdb-metrics.rb --host=metrics-influxdb.internal.com --port=8086 --user=admin --password=password -c -30 -w -10 --db=statsd_metrics --metric=messages.counter --tag=datacenter --filter=pro-westeurope --triangulate=sessions.generated --distance=10 --applyfilterbothqueries=yes
+ ```
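A minimal Ruby sketch of the triangulation arithmetic, using the 150% vs 2% scenario described above. It assumes the percentage of difference is `(today - yesterday) * 100 / yesterday` (the plugin's `difference_between_two_metrics` body is not shown here) and mirrors the `--distance` default of 2. `percentage_change` and `triangulate` are hypothetical helpers, and the counts are invented for illustration.

```ruby
# Hypothetical sketch of triangulation; helper names are illustrative only.

# Assumed formula: day-over-day percentage change (yesterday must be non-zero).
def percentage_change(today, yesterday)
  (today - yesterday) * 100.0 / yesterday
end

# base and triangulated are [today, yesterday] pairs of summed values.
# Returns [status, distance]; max_distance mirrors the --distance default of 2.
def triangulate(base, triangulated, max_distance = 2)
  distance = (percentage_change(*base) - percentage_change(*triangulated)).abs
  status = distance > max_distance.to_f ? :critical : :ok
  [status, distance]
end

# messages.counter up 150%, sessions.generated up only 2% (made-up sums)
status, distance = triangulate([2500, 1000], [204, 200])
puts "#{status}, distance #{distance}" # prints: critical, distance 148.0

# Both metrics up 5%: the distance is 0, so the check passes
puts triangulate([1050, 1000], [210, 200]).first # prints ok
```

Because the distance is an absolute value, a gap in either direction (A outgrowing B, or B outgrowing A) trips the same threshold.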
 
  ## Lessons learnt
  The InfluxDB query language version that we used is not the latest; you can find it here:
@@ -72,6 +107,11 @@ What matters for this program is that:
  ```
  -24h will turn into a bad request
  - 24h is good
+
+ - session.certified will turn into a bad request
+ - "session.certified" is good. Notice that when you use Grafana or the InfluxDB console you don't need the quotes,
+ but when you query through the script you do.
+ When using a regex, it works both with and without quotes, because what matters is `/^[metric]$/`
  ```
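The quoting and spacing notes above can be illustrated with a small query builder. This is a sketch under stated assumptions, not the plugin's implementation: `build_query` and `query_string` are hypothetical helpers, and URL-encoding with Ruby's standard `URI.encode_www_form_component` stands in for whatever the script's `encode_parameters` does internally.

```ruby
require 'uri'

# Hypothetical helper: builds the InfluxQL query for one time window.
# Offsets are minutes before now().
def build_query(metric, start_offset, end_offset)
  # A dotted metric name must be double-quoted when sent through the script;
  # a regex (anything containing '/') is passed through as-is.
  source = metric.include?('/') ? metric : %("#{metric}")
  # Keep the space after '-' ("now() - 35m"), per the "- 24h" note above.
  %(SELECT sum("value") from #{source}) +
    " WHERE time > now() - #{end_offset}m AND time < now() - #{start_offset}m"
end

# Hypothetical helper: URL-encodes the database name and query for an HTTP GET.
def query_string(db, query)
  "db=#{URI.encode_www_form_component(db)}&q=#{URI.encode_www_form_component(query)}"
end

query = build_query('session.certified', 10, 35)
puts query
# prints: SELECT sum("value") from "session.certified" WHERE time > now() - 35m AND time < now() - 10m
puts query_string('statsd_metrics', query)
```

Quoting the plain name while leaving a regex unquoted mirrors the behaviour described in the notes; URL-encoding keeps the double quotes and spaces intact when the query travels in the HTTP query string.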
 
  **When passing parameters**
@@ -83,22 +83,39 @@ class CheckInfluxDbMetrics < Sensu::Plugin::Check::CLI
  long: '--period=VALUE',
  description: 'Filter by a given day period in minutes',
  proc: proc { |l| l.to_i },
- default: 10
+ default: 25
 
- def filter_by_environment_when_needed
- config[:tag].nil? && config[:filter].nil? ? '' : " AND \"#{config[:tag]}\" =~ /#{config[:filter]}/"
+ option :triangulate,
+ long: '--triangulate=VALUE',
+ description: 'Triangulate with this metric'
+
+ option :applyfilterbothqueries,
+ long: '--applyfilterbothqueries=VALUE',
+ description: 'Set if you want to apply tag and filter also for the query that you are about to triangulate with'
+
+ option :distance,
+ long: '--distance=VALUE',
+ description: 'Set the distance threshold to alert in case of triangulation',
+ default: 2
+
+ BASE_QUERY = 'SELECT sum("value") from '.freeze
+ TODAY_START_PERIOD = 10 # starts counting 10 minutes before now() to give InfluxDB time to aggregate the data
+ YESTERDAY_START_PERIOD = 1450 # starts counting 1450 minutes before now() [yesterday - 10 minutes] to match TODAY_START_PERIOD
+
+ def yesterday_end_period
+ config[:period] + YESTERDAY_START_PERIOD
  end
 
- def base_query
- 'SELECT sum("value") from '
+ def today_end_period
+ config[:period] + TODAY_START_PERIOD
  end
 
- def base_query_with_metricname
- base_query + clean_quotes_when_regex
+ def base_query_with_metricname(metric)
+ BASE_QUERY + clean_quotes_when_regex(metric)
  end
 
- def clean_quotes_when_regex
- metric = " \"#{config[:metric]}\""
+ def clean_quotes_when_regex(metric_to_clean)
+ metric = ' "' + metric_to_clean + '"'
  clean_metric = ''
  if metric.include?('/')
  clean_metric = metric.tr '\"', ''
@@ -110,23 +127,36 @@ class CheckInfluxDbMetrics < Sensu::Plugin::Check::CLI
  clean_metric
  end
 
- def today_query_for_a_period
- start_period = 5; # starts counting 5 minutes before now() to let influxdb time to aggregate the data
- end_period = config[:period] + 5; # adds 5 minutes to match with start_period
- query = query_for_a_period(start_period, end_period)
- query + filter_by_environment_when_needed
+ def filter_by_environment_when_needed
+ config[:tag].nil? && config[:filter].nil? ? '' : " AND \"#{config[:tag]}\" =~ /#{config[:filter]}/"
+ end
+
+ def filter_for_triangulate_when_needed
+ config[:applyfilterbothqueries].nil? ? '' : " AND \"#{config[:tag]}\" =~ /#{config[:filter]}/"
+ end
+
+ def query_for_a_period(metric, start_period, end_period, istriangulated)
+ query = base_query_with_metricname(metric) + ' WHERE time > now() - ' + end_period.to_s + 'm AND time < now() - ' + start_period.to_s + 'm'
+ query + add_filter_when_needed(istriangulated)
+ end
+
+ def add_filter_when_needed(istriangulated)
+ if istriangulated == true
+ filter_for_triangulate_when_needed
+ else
+ filter_by_environment_when_needed
+ end
  end
 
- def yesterday_query_for_a_period
- start_period = 1445; # starts counting 1445 minutes before now() [yesterday - 5 minutes] to match with today_query_for_a_period start_period
- end_period = config[:period] + 1445; # adds 1445 minutes to match with start_period
- query = query_for_a_period(start_period, end_period)
- query + filter_by_environment_when_needed
+ def query_encoded_for_a_period(metric, start_period, end_period, istriangulated)
+ query = query_for_a_period(metric, start_period, end_period, istriangulated)
+ encode_parameters(query)
  end
 
- def query_for_a_period(start_period, end_period)
- query = base_query_with_metricname + ' WHERE time > now() - ' + end_period.to_s + 'm AND time < now() - ' + start_period.to_s + 'm'
- query + filter_by_environment_when_needed
+ def metrics(metric, start_period, end_period, istriangulated)
+ query = query_encoded_for_a_period(metric, start_period, end_period, istriangulated)
+ response = request(query)
+ parse_json(response)
  end
 
  def encode_parameters(parameters)
@@ -141,46 +171,48 @@ class CheckInfluxDbMetrics < Sensu::Plugin::Check::CLI
  "#{config[:db]}&q=" + encode_for_regex
  end
 
- def yesterday_query_encoded
- query = yesterday_query_for_a_period
- encode_parameters(query)
- end
-
- def today_query_encoded
- query = today_query_for_a_period
- encode_parameters(query)
- end
-
- def today_value
- response = request(today_query_encoded)
- metrics = parse_json(response)
- @today_metric_count = validate_metrics_and_count(metrics)
+ def today_metrics
+ today_info = metrics(config[:metric], TODAY_START_PERIOD, today_end_period, false)
+ @today_metric_count = validate_metrics_and_count(today_info)
  value = if @today_metric_count > 0
- series = read_series_from_metrics(metrics)
+ series = read_series_from_metrics(today_info)
  @today_metrics = store_metrics(series)
  read_value_from_series(series)
  end
  value
  end
 
- def yesterday_value
- response = request(yesterday_query_encoded)
- metrics = parse_json(response)
- @yesterday_metric_count = validate_metrics_and_count(metrics)
- value = if @today_metric_count > 0
- series = read_series_from_metrics(metrics)
+ def yesterday_metrics
+ yesterday_info = metrics(config[:metric], YESTERDAY_START_PERIOD, yesterday_end_period, false)
+ @yesterday_metric_count = validate_metrics_and_count(yesterday_info)
+ value = if @yesterday_metric_count > 0
+ series = read_series_from_metrics(yesterday_info)
  @yesterday_metrics = store_metrics(series)
  read_value_from_series(series)
  end
  value
  end
 
- def metric_bigger_than_zero?(metric)
- metric > 0
+ def today_triangulated_metrics
+ today_triangulated_info = metrics(config[:triangulate], TODAY_START_PERIOD, today_end_period, true)
+ @today_triangulated_metric_count = validate_metrics_and_count(today_triangulated_info)
+ value = if @today_triangulated_metric_count > 0
+ series = read_series_from_metrics(today_triangulated_info)
+ @today_triangulated_metrics = store_metrics(series)
+ read_value_from_series(series)
+ end
+ value
  end
 
- def using_regex?(using_regex)
- using_regex == true
+ def yesterday_triangulated_metrics
+ yesterday_triangulated_info = metrics(config[:triangulate], YESTERDAY_START_PERIOD, yesterday_end_period, true)
+ @yesterday_triangulated_metric_count = validate_metrics_and_count(yesterday_triangulated_info)
+ value = if @yesterday_triangulated_metric_count > 0
+ series = read_series_from_metrics(yesterday_triangulated_info)
+ @yesterday_triangulated_metrics = store_metrics(series)
+ read_value_from_series(series)
+ end
+ value
  end
 
  def read_series_from_metrics(metrics)
@@ -234,14 +266,19 @@ class CheckInfluxDbMetrics < Sensu::Plugin::Check::CLI
  ok 'no metrics found'
  elsif @today_metric_count > @yesterday_metric_count
  display_metrics
- critical "For \"#{config[:metric]}\" more metrics were tracked today than yesterday. Check them out above"
+ critical 'For ' + config[:metric] + ' more metrics tracked today (' + @today_metric_count.to_s + ') than yesterday (' + @yesterday_metric_count.to_s + '). See above'
  elsif @today_metric_count == @yesterday_metric_count
  compare_each_metric_in_regex
  else
- ok 'regex seems ok ' + @today_metric_count.to_s + ' metrics found today vs ' + @yesterday_metric_count.to_s + ' metrics found yesterday'
+ ok 'regex seems ok! Fewer metrics today (' + @today_metric_count.to_s + ') than yesterday (' + @yesterday_metric_count.to_s + ').'
  end
  end
 
+ def difference_for_standard_queries(today, yesterday)
+ difference = difference_between_two_metrics(today, yesterday)
+ evaluate_percentage_and_notify(difference)
+ end
+
  def compare_each_metric_in_regex
  @today_metrics.each do |today_key, today_value|
  @yesterday_metrics.each do |yesterday_key, yesterday_value|
@@ -271,7 +308,7 @@ class CheckInfluxDbMetrics < Sensu::Plugin::Check::CLI
  end
 
  def evaluate_percentage_and_notify(difference)
- puts 'Difference of: ' + difference.round(4).to_s + ' % for a period of ' + config[:period].to_s + 'm'
+ puts 'Difference of: ' + difference.round(3).to_s + ' % for a period of ' + config[:period].to_s + 'm'
  if difference < config[:crit]
  critical "\"#{config[:metric]}\" difference is below allowed minimum of #{config[:crit]} %"
  elsif difference < config[:warn]
@@ -281,33 +318,84 @@ class CheckInfluxDbMetrics < Sensu::Plugin::Check::CLI
  end
  end
 
+ def evaluate_distance_and_notify(distance)
+ if distance > config[:distance].to_f
+ critical config[:metric] + ' vs ' + config[:triangulate] + ' distance is greater than allowed maximum of ' + config[:distance].to_s
+ else
+ ok 'distance ok'
+ end
+ end
+
  def calculate_difference_and_display_result(today, yesterday)
- difference = if @is_using_regex
- difference_for_regex_and_notify
- else
- difference_for_standard_queries(today, yesterday)
- end
- difference
+ if @is_using_regex
+ difference_for_regex_and_notify
+ else
+ difference_between_two_metrics(today, yesterday)
+ end
  end
 
- def difference_for_standard_queries(today, yesterday)
- difference = difference_between_two_metrics(today, yesterday)
- evaluate_percentage_and_notify(difference)
+ def difference_between_percentages_of_two_metrics
+ validate_base_metrics
+ validate_triangulated_metrics
+ base = difference_between_two_metrics(today_metrics, yesterday_metrics)
+ triangulated = difference_between_two_metrics(today_triangulated_metrics, yesterday_triangulated_metrics)
+ puts 'difference for ' + config[:metric] + ' ' + base.round(3).to_s + '% vs ' + config[:triangulate] + ' ' + triangulated.round(3).to_s + '%'
+ distance = distance_between_two_numbers(base, triangulated)
+ evaluate_distance_and_notify(distance)
+ end
+
+ def distance_between_two_numbers(a, b)
+ (a - b).abs
+ end
+
+ def validate_triangulated_metrics
+ today = today_triangulated_metrics
+ yesterday = yesterday_triangulated_metrics
+ if today.nil? && yesterday.nil?
+ puts 'No metrics found to triangulate'
+ exit
+ else
+ 0
+ end
+ end
+
+ def validate_base_metrics
+ today = today_metrics
+ yesterday = yesterday_metrics
+ if today.nil? && yesterday.nil?
+ puts 'No metrics found in base to triangulate'
+ exit
+ else
+ 0
+ end
  end
 
- def difference_in_metrics
- today = today_value
- yesterday = yesterday_value
+ def difference_between_metrics
+ today = today_metrics
+ yesterday = yesterday_metrics
  if today.nil? && yesterday.nil?
  puts 'No results coming from InfluxDB for either Today or Yesterday. Please check your query or try again'
  else
- calculate_difference_and_display_result(today, yesterday)
+ difference = calculate_difference_and_display_result(today, yesterday)
+ evaluate_percentage_and_notify(difference)
  end
  exit
  end
 
+ def triangulation?
+ !config[:triangulate].nil?
+ end
+
+ def check_metrics_in_influxdb
+ if triangulation?
+ difference_between_percentages_of_two_metrics
+ else
+ difference_between_metrics
+ end
+ end
+
  def run
- difference_in_metrics
+ check_metrics_in_influxdb
 
  rescue Errno::ECONNREFUSED => e
  critical 'InfluxDB is not responding: ' + e.message
@@ -1,8 +1,8 @@
  module SensuPluginsInfluxDbMetricsChecker
  module Version
  MAJOR = 0
- MINOR = 3
- PATCH = 4
+ MINOR = 4
+ PATCH = 0
 
  VER_STRING = [MAJOR, MINOR, PATCH].compact.join('.')
  end
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: sensu-plugins-influxdb-metrics-checker
  version: !ruby/object:Gem::Version
- version: 0.3.4
+ version: 0.4.0
  platform: ruby
  authors:
  - Juanjo Guerrero Cerezuela
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2017-01-17 00:00:00.000000000 Z
+ date: 2017-01-20 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: sensu-plugin