sensu-plugins-influxdb-metrics-checker 0.3.4 → 0.4.0
- checksums.yaml +4 -4
- data/CHANGELOG.md +5 -0
- data/README.md +45 -5
- data/bin/check-influxdb-metrics.rb +152 -64
- data/lib/sensu-plugins-influxdb-metrics-checker/version.rb +2 -2
- metadata +2 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: ed89e4ad1b835c997279a3afc967efc91e6579fa
+  data.tar.gz: 5813fe0fa863a0e278620862bad7aeeae75651be
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: b8638d8b0b9a343b0233a5e39e2425cb5c238c58a97124d81452dc0da5478ae7b3886393b507b4be6a8ac4aa8680fe6faf02e41bd03e955fb96f6724dbb6e139
+  data.tar.gz: 0cacd9d9ee2257904253403682edb271ed8f7431ca6b4bdeb0d17d3d847baae81c682e57647f9d0b616eb89a60de9d1fc2382becea3df2c96ec5bd710486d9b4
data/CHANGELOG.md
CHANGED
@@ -3,6 +3,11 @@ This project adheres to [Semantic Versioning](http://semver.org/).
 
 This CHANGELOG follows the format listed at [Keep A Changelog](http://keepachangelog.com/)
 
+# [0.4.0] - 2017-01-20
+- seventh release
+New feature: Triangulation. Added the ability to get percentage of metric A, get percentage of metric B, and compare the distance between them. Useful when the metrics are related together by some business rule.
+Improved feedback when returning to customer.
+
 # [0.3.4] - 2017-01-17
 - sixth release
 Allow the usage of regex expressions that we can identify as "/^[your_regex]$". I'll strongly recommend to use this only for exceptions, and always aim for zero-exceptions, or it wouldn't be accurate. At the moment it will fire when the number of exception today is bigger than the number of exceptions yesterday.
data/README.md
CHANGED
@@ -11,10 +11,10 @@ We chose to do it as a Sensu plugin because it comes with Handlers that will all
|
|
11
11
|
The result is that now we are able to experiment with our metrics and alerts, giving us a better understanding of what's going on in our systems.
|
12
12
|
|
13
13
|
## What it does
|
14
|
-
The script will compare the values of yesterday at this time minus
|
14
|
+
The script compares the values from yesterday at this time minus 25 minutes with the values from today at this time minus 25 minutes.
|
15
15
|
It will calculate the percentage of difference and will act on that.
|
16
16
|
You will be able to set a threshold of warning and critical values where your program will act.
|
17
|
-
It will also leave it
|
17
|
+
It also allows InfluxDB 10 minutes to aggregate the data, which makes the comparison more precise.
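As a rough illustration of those two time windows, here is a minimal sketch of the kind of query the script builds. The constants 10 and 1455 and the query layout follow the script in this release; the helper name `window_query` is hypothetical and only used for this example.

```ruby
# Illustrative only: mirrors the constants and query layout used by the script.
TODAY_START = 10        # minutes left for InfluxDB to aggregate recent data
YESTERDAY_START = 1455  # start offset the script uses for yesterday's window

# Hypothetical helper: builds the InfluxQL query for one window.
# start_min is how far back the window ends, period is its length in minutes.
def window_query(metric, start_min, period)
  end_min = start_min + period
  "SELECT sum(\"value\") from \"#{metric}\" " \
    "WHERE time > now() - #{end_min}m AND time < now() - #{start_min}m"
end

puts window_query('api.request.counter', TODAY_START, 25)
puts window_query('api.request.counter', YESTERDAY_START, 25)
```

With the default period of 25, today's window covers `now() - 35m` to `now() - 10m`, and yesterday's window has the same length, shifted back by the script's yesterday offset.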
|
18
18
|
|
19
19
|
## Components
|
20
20
|
There is just one script that you can find at
|
@@ -22,8 +22,7 @@ There is just one script that you can find at
|
|
22
22
|
|
23
23
|
## Getting started
|
24
24
|
|
25
|
-
|
26
|
-
**check-influxdb-metrics** which you can run in a bash doing:
|
25
|
+
Go to **check-influxdb-metrics**, which you can run from a bash shell with:
|
27
26
|
|
28
27
|
```
|
29
28
|
ruby check-influxdb-metrics.rb --host=metrics-influxdb.internal.com --port=8086 --user=admin --password=password -c -3 -w -10 --db=statsd_metrics --metric=api.request.counter
|
@@ -48,9 +47,13 @@ ruby check-influxdb-metrics.rb --host=metrics-influxdb.internal.com --port=8086
|
|
48
47
|
|
49
48
|
## Advanced Queries
|
50
49
|
|
51
|
-
|
50
|
+
**Regex**
|
51
|
+
|
52
|
+
You can use a regex in your metric name. This feature is meant for gathering information about exceptions only. Beware that a broad regex could match every metric in your InfluxDB cluster, which may cause some unintended pain, so always aim to query exceptions only, ideally under a zero-exception policy.
|
53
|
+
I strongly advise against using it for other purposes.
|
52
54
|
|
53
55
|
**How it works**
|
56
|
+
|
54
57
|
1. It treats the metric as a regex only when the metric name contains '/'. In bash you'll need to wrap your metric in double quotes.
|
55
58
|
2. It will compare the number of metrics gathered today vs the number of metrics gathered yesterday.
|
56
59
|
3. If today we read more than yesterday, it will blow up as **Critical**.
|
@@ -61,6 +64,38 @@ ruby check-influxdb-metrics.rb --host=metrics-influxdb.internal.com --port=8086
|
|
61
64
|
|
62
65
|
```
|
63
66
|
|
67
|
+
**Triangulation**
|
68
|
+
|
69
|
+
In trigonometry and geometry, triangulation is the process of determining the location of a point by forming triangles to it from known points. This feature of the script is inspired by that idea.
|
70
|
+
|
71
|
+
|
72
|
+
[![triangulation_01.png](https://s24.postimg.org/kjihvilvp/triangulation_01.png (2KB))](https://postimg.org/image/hcnybw1fl/)
|
73
|
+
|
74
|
+
Once we have a given metric A (e.g. messages.sent), we'll normally compare it to yesterday's value A', get the percentage of difference, and fire an alert according to our threshold. Cool. Now let's go one step further.
|
75
|
+
We may have a metric B (e.g. sessions.generated) that has a business dependency on A. If we dig further into our metrics, we may even relate the two by, let's say, an average of 5 metrics A for each metric B. (In this example, you'll need 5 messages sent to build a session.)
|
76
|
+
|
77
|
+
If we could say that every 5 As relate to 1 B, then the % of difference for A and B would always be the same. Realistically, production applications are not always like that: sometimes you may need 7 messages, other times only 4, so your average would be something around 5.333. Therefore, we can't say that the % of difference will always be the same, but once we look at the *distance* between these percentages, we'll see that they are pretty close. That's the spirit of this feature: the ability to diagnose when the distance is higher than expected.
|
78
|
+
|
79
|
+
Let's say that the system that sends messages has an increase of 150%. If you are only using this tool to check that metric, you won't get any alert, because there is no drop in the metrics. Meanwhile, the system that processes sessions stays at its usual 2% increase, which leaves a big distance of 148. We clearly have a problem here. Maybe a bottleneck is forming somewhere, or maybe some messages are being lost due to this huge increase. Hopefully this feature will allow you to identify that something fishy is going on.
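The 150% vs 2% example can be sketched in a few lines of Ruby. This is only an illustration: `percentage_difference` is a hypothetical helper with an assumed formula (change relative to yesterday), and the sample counts are made up. Only the absolute-difference step matches `distance_between_two_numbers` in the script.

```ruby
# Hypothetical helper, assumed formula: percentage change of today vs yesterday.
def percentage_difference(today, yesterday)
  ((today - yesterday) / yesterday.to_f) * 100
end

# Absolute gap between two percentages, as the script does with (a - b).abs.
def distance(a, b)
  (a - b).abs
end

# Made-up sample counts chosen to reproduce the example in the text.
messages = percentage_difference(2500, 1000) # messages went up by about 150 %
sessions = percentage_difference(204, 200)   # sessions went up by about 2 %

gap = distance(messages, sessions)
allowed = 2 # default value of --distance

puts "distance: #{gap.round(3)}"
puts gap > allowed ? 'CRITICAL: distance too large' : 'OK'
```

Neither metric dropped, so a plain threshold check on either one would stay quiet. Only the gap between the two percentages exposes the problem.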
|
80
|
+
|
81
|
+
**How it works**
|
82
|
+
|
83
|
+
This query will get the distance between "messages.counter" % vs "sessions.generated" %. By default it's set to fire an alert if that turns out to be bigger than 2.
|
84
|
+
|
85
|
+
```
|
86
|
+
ruby check-influxdb-metrics.rb --host=metrics-influxdb.internal.com --port=8086 --user=admin --password=password -c -30 -w -10 --db=statsd_metrics --metric=messages.counter --triangulate=sessions.generated
|
87
|
+
```
|
88
|
+
To allow a larger distance, set it with --distance.
|
89
|
+
|
90
|
+
```
|
91
|
+
ruby check-influxdb-metrics.rb --host=metrics-influxdb.internal.com --port=8086 --user=admin --password=password -c -30 -w -10 --db=statsd_metrics --metric=messages.counter --triangulate=sessions.generated --distance=10
|
92
|
+
```
|
93
|
+
|
94
|
+
If you want to apply tags and filters, you can do it as you normally would. Just bear in mind that by default they apply only to the first metric, not to both. To apply them to the second metric as well, add --applyfilterbothqueries=yes
|
95
|
+
|
96
|
+
```
|
97
|
+
ruby check-influxdb-metrics.rb --host=metrics-influxdb.internal.com --port=8086 --user=admin --password=password -c -30 -w -10 --db=statsd_metrics --metric=messages.counter --tag=datacenter --filter=pro-westeurope --triangulate=sessions.generated --distance=10 --applyfilterbothqueries=yes
|
98
|
+
```
|
64
99
|
|
65
100
|
## Lessons learnt
|
66
101
|
The InfluxDb query language that we used is not the latest, you can find it here:
|
@@ -72,6 +107,11 @@ What matters for this program is that:
|
|
72
107
|
```
|
73
108
|
-24h will turn into bad request
|
74
109
|
- 24h good
|
110
|
+
|
111
|
+
- session.certified will turn into a bad request
|
112
|
+
- "session.certified" is good. Notice that when you use grafana or influx db console you don't need the quotes
|
113
|
+
but when you query through the script you'll need them.
|
114
|
+
When using regex both with and without quotes will work, because what matters is `/^[metric]$/`
|
75
115
|
```
|
76
116
|
|
77
117
|
**When passing parameters**
|
data/bin/check-influxdb-metrics.rb
CHANGED
@@ -83,22 +83,39 @@ class CheckInfluxDbMetrics < Sensu::Plugin::Check::CLI
|
|
83
83
|
long: '--period=VALUE',
|
84
84
|
description: 'Filter by a given day period in minutes',
|
85
85
|
proc: proc { |l| l.to_i },
|
86
|
-
default:
|
86
|
+
default: 25
|
87
87
|
|
88
|
-
|
89
|
-
|
88
|
+
option :triangulate,
|
89
|
+
long: '--triangulate=VALUE',
|
90
|
+
description: 'Triangulate with this metric'
|
91
|
+
|
92
|
+
option :applyfilterbothqueries,
|
93
|
+
long: '--applyfilterbothqueries=VALUE',
|
94
|
+
description: 'Set if you want to apply tag and filter also for the query that you are about to triangulate with'
|
95
|
+
|
96
|
+
option :distance,
|
97
|
+
long: '--distance=VALUE',
|
98
|
+
description: 'Set the distance threshold to alert in case of triangulation',
|
99
|
+
default: 2
|
100
|
+
|
101
|
+
BASE_QUERY = 'SELECT sum("value") from '.freeze
|
102
|
+
TODAY_START_PERIOD = 10
|
103
|
+
YESTERDAY_START_PERIOD = 1455 # starts counting 1455 minutes before now() [ yesterday - 10 minutes] to match with today_query_for_a_period start_period
|
104
|
+
|
105
|
+
def yesterday_end_period
|
106
|
+
config[:period] + YESTERDAY_START_PERIOD
|
90
107
|
end
|
91
108
|
|
92
|
-
def
|
93
|
-
|
109
|
+
def today_end_period
|
110
|
+
config[:period] + TODAY_START_PERIOD
|
94
111
|
end
|
95
112
|
|
96
|
-
def base_query_with_metricname
|
97
|
-
|
113
|
+
def base_query_with_metricname(metric)
|
114
|
+
BASE_QUERY + clean_quotes_when_regex(metric)
|
98
115
|
end
|
99
116
|
|
100
|
-
def clean_quotes_when_regex
|
101
|
-
metric = "
|
117
|
+
def clean_quotes_when_regex(metric_to_clean)
|
118
|
+
metric = ' "' + metric_to_clean + '"'
|
102
119
|
clean_metric = ''
|
103
120
|
if metric.include?('/')
|
104
121
|
clean_metric = metric.tr '\"', ''
|
@@ -110,23 +127,36 @@ class CheckInfluxDbMetrics < Sensu::Plugin::Check::CLI
|
|
110
127
|
clean_metric
|
111
128
|
end
|
112
129
|
|
113
|
-
def
|
114
|
-
|
115
|
-
|
116
|
-
|
117
|
-
|
130
|
+
def filter_by_environment_when_needed
|
131
|
+
config[:tag].nil? && config[:filter].nil? ? '' : " AND \"#{config[:tag]}\" =~ /#{config[:filter]}/"
|
132
|
+
end
|
133
|
+
|
134
|
+
def filter_for_triangulate_when_needed
|
135
|
+
config[:applyfilterbothqueries].nil? ? '' : " AND \"#{config[:tag]}\" =~ /#{config[:filter]}/"
|
136
|
+
end
|
137
|
+
|
138
|
+
def query_for_a_period(metric, start_period, end_period, istriangulated)
|
139
|
+
query = base_query_with_metricname(metric) + ' WHERE time > now() - ' + end_period.to_s + 'm AND time < now() - ' + start_period.to_s + 'm'
|
140
|
+
query + add_filter_when_needed(istriangulated)
|
141
|
+
end
|
142
|
+
|
143
|
+
def add_filter_when_needed(istriangulated)
|
144
|
+
if istriangulated == true
|
145
|
+
filter_for_triangulate_when_needed
|
146
|
+
else
|
147
|
+
filter_by_environment_when_needed
|
148
|
+
end
|
118
149
|
end
|
119
150
|
|
120
|
-
def
|
121
|
-
|
122
|
-
|
123
|
-
query = query_for_a_period(start_period, end_period)
|
124
|
-
query + filter_by_environment_when_needed
|
151
|
+
def query_encoded_for_a_period(metric, start_period, end_period, istriangulated)
|
152
|
+
query = query_for_a_period(metric, start_period, end_period, istriangulated)
|
153
|
+
encode_parameters(query)
|
125
154
|
end
|
126
155
|
|
127
|
-
def
|
128
|
-
query =
|
129
|
-
|
156
|
+
def metrics(metric, start_period, end_period, istriangulated)
|
157
|
+
query = query_encoded_for_a_period(metric, start_period, end_period, istriangulated)
|
158
|
+
response = request(query)
|
159
|
+
parse_json(response)
|
130
160
|
end
|
131
161
|
|
132
162
|
def encode_parameters(parameters)
|
@@ -141,46 +171,48 @@ class CheckInfluxDbMetrics < Sensu::Plugin::Check::CLI
|
|
141
171
|
"#{config[:db]}&q=" + encode_for_regex
|
142
172
|
end
|
143
173
|
|
144
|
-
def
|
145
|
-
|
146
|
-
|
147
|
-
end
|
148
|
-
|
149
|
-
def today_query_encoded
|
150
|
-
query = today_query_for_a_period
|
151
|
-
encode_parameters(query)
|
152
|
-
end
|
153
|
-
|
154
|
-
def today_value
|
155
|
-
response = request(today_query_encoded)
|
156
|
-
metrics = parse_json(response)
|
157
|
-
@today_metric_count = validate_metrics_and_count(metrics)
|
174
|
+
def today_metrics
|
175
|
+
today_info = metrics(config[:metric], TODAY_START_PERIOD, today_end_period, false)
|
176
|
+
@today_metric_count = validate_metrics_and_count(today_info)
|
158
177
|
value = if @today_metric_count > 0
|
159
|
-
series = read_series_from_metrics(
|
178
|
+
series = read_series_from_metrics(today_info)
|
160
179
|
@today_metrics = store_metrics(series)
|
161
180
|
read_value_from_series(series)
|
162
181
|
end
|
163
182
|
value
|
164
183
|
end
|
165
184
|
|
166
|
-
def
|
167
|
-
|
168
|
-
|
169
|
-
@yesterday_metric_count
|
170
|
-
|
171
|
-
series = read_series_from_metrics(metrics)
|
185
|
+
def yesterday_metrics
|
186
|
+
yesterday_info = metrics(config[:metric], YESTERDAY_START_PERIOD, yesterday_end_period, false)
|
187
|
+
@yesterday_metric_count = validate_metrics_and_count(yesterday_info)
|
188
|
+
value = if @yesterday_metric_count > 0
|
189
|
+
series = read_series_from_metrics(yesterday_info)
|
172
190
|
@yesterday_metrics = store_metrics(series)
|
173
191
|
read_value_from_series(series)
|
174
192
|
end
|
175
193
|
value
|
176
194
|
end
|
177
195
|
|
178
|
-
def
|
179
|
-
|
196
|
+
def today_triangulated_metrics
|
197
|
+
today_triangulated_info = metrics(config[:triangulate], TODAY_START_PERIOD, today_end_period, true)
|
198
|
+
@today_triangulated_metric_count = validate_metrics_and_count(today_triangulated_info)
|
199
|
+
value = if @today_triangulated_metric_count > 0
|
200
|
+
series = read_series_from_metrics(today_triangulated_info)
|
201
|
+
@today_triangulated_metrics = store_metrics(series)
|
202
|
+
read_value_from_series(series)
|
203
|
+
end
|
204
|
+
value
|
180
205
|
end
|
181
206
|
|
182
|
-
def
|
183
|
-
|
207
|
+
def yesterday_triangulated_metrics
|
208
|
+
yesterday_triangulated_info = metrics(config[:triangulate], YESTERDAY_START_PERIOD, yesterday_end_period, true)
|
209
|
+
@yesterday_triangulated_metric_count = validate_metrics_and_count(yesterday_triangulated_info)
|
210
|
+
value = if @yesterday_triangulated_metric_count > 0
|
211
|
+
series = read_series_from_metrics(yesterday_triangulated_info)
|
212
|
+
@yesterday_triangulated_metrics = store_metrics(series)
|
213
|
+
read_value_from_series(series)
|
214
|
+
end
|
215
|
+
value
|
184
216
|
end
|
185
217
|
|
186
218
|
def read_series_from_metrics(metrics)
|
@@ -234,14 +266,19 @@ class CheckInfluxDbMetrics < Sensu::Plugin::Check::CLI
|
|
234
266
|
ok 'no metrics found'
|
235
267
|
elsif @today_metric_count > @yesterday_metric_count
|
236
268
|
display_metrics
|
237
|
-
critical
|
269
|
+
critical 'For ' + config[:metric] + ' more metrics tracked today (' + @today_metric_count.to_s + ') than yesterday (' + @yesterday_metric_count.to_s + '). See above'
|
238
270
|
elsif @today_metric_count == @yesterday_metric_count
|
239
271
|
compare_each_metric_in_regex
|
240
272
|
else
|
241
|
-
ok 'regex seems ok ' + @
|
273
|
+
ok 'regex seems ok! Today metrics dropped. Yesterday (' + @yesterday_metric_count.to_s + ') vs (' + @today_metric_count.to_s + ') found today.'
|
242
274
|
end
|
243
275
|
end
|
244
276
|
|
277
|
+
def difference_for_standard_queries(today, yesterday)
|
278
|
+
difference = difference_between_two_metrics(today, yesterday)
|
279
|
+
evaluate_percentage_and_notify(difference)
|
280
|
+
end
|
281
|
+
|
245
282
|
def compare_each_metric_in_regex
|
246
283
|
@today_metrics.each do |today_key, today_value|
|
247
284
|
@yesterday_metrics.each do |yesterday_key, yesterday_value|
|
@@ -271,7 +308,7 @@ class CheckInfluxDbMetrics < Sensu::Plugin::Check::CLI
|
|
271
308
|
end
|
272
309
|
|
273
310
|
def evaluate_percentage_and_notify(difference)
|
274
|
-
puts 'Difference of: ' + difference.round(
|
311
|
+
puts 'Difference of: ' + difference.round(3).to_s + ' % for a period of ' + config[:period].to_s + 'm'
|
275
312
|
if difference < config[:crit]
|
276
313
|
critical "\"#{config[:metric]}\" difference is below allowed minimum of #{config[:crit]} %"
|
277
314
|
elsif difference < config[:warn]
|
@@ -281,33 +318,84 @@ class CheckInfluxDbMetrics < Sensu::Plugin::Check::CLI
|
|
281
318
|
end
|
282
319
|
end
|
283
320
|
|
321
|
+
def evaluate_distance_and_notify(distance)
|
322
|
+
if distance > config[:distance].to_f
|
323
|
+
critical config[:metric] + ' vs ' + config[:triangulate] + ' distance is greater than allowed maximum of ' + config[:distance].to_s
|
324
|
+
else
|
325
|
+
ok 'distance ok'
|
326
|
+
end
|
327
|
+
end
|
328
|
+
|
284
329
|
def calculate_difference_and_display_result(today, yesterday)
|
285
|
-
|
286
|
-
|
287
|
-
|
288
|
-
|
289
|
-
|
290
|
-
difference
|
330
|
+
if @is_using_regex
|
331
|
+
difference_for_regex_and_notify
|
332
|
+
else
|
333
|
+
difference_between_two_metrics(today, yesterday)
|
334
|
+
end
|
291
335
|
end
|
292
336
|
|
293
|
-
def
|
294
|
-
|
295
|
-
|
337
|
+
def difference_between_percentages_of_two_metrics
|
338
|
+
validate_base_metrics
|
339
|
+
validate_triangulated_metrics
|
340
|
+
base = difference_between_two_metrics(today_metrics, yesterday_metrics)
|
341
|
+
triangulated = difference_between_two_metrics(today_triangulated_metrics, yesterday_triangulated_metrics)
|
342
|
+
puts 'difference for ' + config[:metric] + ' ' + base.round(3).to_s + '% vs ' + config[:triangulate] + ' ' + triangulated.round(3).to_s + '%'
|
343
|
+
distance = distance_between_two_numbers(base, triangulated)
|
344
|
+
evaluate_distance_and_notify(distance)
|
345
|
+
end
|
346
|
+
|
347
|
+
def distance_between_two_numbers(a, b)
|
348
|
+
(a - b).abs
|
349
|
+
end
|
350
|
+
|
351
|
+
def validate_triangulated_metrics
|
352
|
+
today = today_triangulated_metrics
|
353
|
+
yesterday = yesterday_triangulated_metrics
|
354
|
+
if today.nil? && yesterday.nil?
|
355
|
+
puts 'No metrics found to triangulate'
|
356
|
+
exit
|
357
|
+
else
|
358
|
+
0
|
359
|
+
end
|
360
|
+
end
|
361
|
+
|
362
|
+
def validate_base_metrics
|
363
|
+
today = today_metrics
|
364
|
+
yesterday = yesterday_metrics
|
365
|
+
if today.nil? && yesterday.nil?
|
366
|
+
puts 'No metrics found in base to triangulate'
|
367
|
+
exit
|
368
|
+
else
|
369
|
+
0
|
370
|
+
end
|
296
371
|
end
|
297
372
|
|
298
|
-
def
|
299
|
-
today =
|
300
|
-
yesterday =
|
373
|
+
def difference_between_metrics
|
374
|
+
today = today_metrics
|
375
|
+
yesterday = yesterday_metrics
|
301
376
|
if today.nil? && yesterday.nil?
|
302
377
|
puts 'No results coming from InfluxDB for either today or yesterday. Please check your query or try again'
|
303
378
|
else
|
304
|
-
calculate_difference_and_display_result(today, yesterday)
|
379
|
+
difference = calculate_difference_and_display_result(today, yesterday)
|
380
|
+
evaluate_percentage_and_notify(difference)
|
305
381
|
end
|
306
382
|
exit
|
307
383
|
end
|
308
384
|
|
385
|
+
def triangulation?
|
386
|
+
config[:triangulate].nil?
|
387
|
+
end
|
388
|
+
|
389
|
+
def check_metrics_in_influxdb
|
390
|
+
if triangulation?
|
391
|
+
difference_between_metrics
|
392
|
+
else
|
393
|
+
difference_between_percentages_of_two_metrics
|
394
|
+
end
|
395
|
+
end
|
396
|
+
|
309
397
|
def run
|
310
|
-
|
398
|
+
check_metrics_in_influxdb
|
311
399
|
|
312
400
|
rescue Errno::ECONNREFUSED => e
|
313
401
|
critical 'InfluxDB is not responding: ' + e.message
|
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: sensu-plugins-influxdb-metrics-checker
 version: !ruby/object:Gem::Version
-  version: 0.
+  version: 0.4.0
 platform: ruby
 authors:
 - Juanjo Guerrero Cerezuela
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2017-01-
+date: 2017-01-20 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: sensu-plugin