sensu-plugins-mongodb-wt 2.2.0
- checksums.yaml +7 -0
- data/CHANGELOG.md +156 -0
- data/LICENSE +22 -0
- data/README.md +26 -0
- data/bin/check-mongodb-metric.rb +144 -0
- data/bin/check-mongodb.py +1471 -0
- data/bin/check-mongodb.rb +6 -0
- data/bin/metrics-mongodb-replication.rb +268 -0
- data/bin/metrics-mongodb.rb +133 -0
- data/lib/sensu-plugins-mongodb.rb +1 -0
- data/lib/sensu-plugins-mongodb/metrics.rb +449 -0
- data/lib/sensu-plugins-mongodb/version.rb +9 -0
- metadata +237 -0
checksums.yaml
ADDED
@@ -0,0 +1,7 @@
---
SHA256:
  metadata.gz: c1cebf301a79e401bb146049f805077f2b21ecd6d45a7a05c1173cc3ac804a66
  data.tar.gz: d8b3757d1d6e2d7f813b4af224964949d4218c5354d45acbbe56ae5a5d178f41
SHA512:
  metadata.gz: 0476135d459f5a6d3da206adfd0a9540fdeaefab5907200f3adfe27458884912408737426e1b02923c0ed54476bb2b7e757d4b75c8be2d1abaca21ae2a80351f
  data.tar.gz: 3382cca4570b6987fa8eba474c62c06f56fde9b8ee497157c82123b7884c8e2388af6cb7104db56a85816cf08372757c9071efc6833bec3a1ddb8919147b9f80
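checksums.yaml records a SHA-256 and a SHA-512 hex digest for each member of the `.gem` archive (`metadata.gz` and `data.tar.gz`). A minimal Python sketch of how such a digest is produced; the payload here is stand-in bytes, not a real archive member:

```python
import hashlib

# checksums.yaml stores hex digests of the gem's archive members.
# Computing one is a single pass over the file's bytes; real
# verification would hash data.tar.gz extracted from the .gem tar.
payload = b"stand-in for data.tar.gz"
sha256 = hashlib.sha256(payload).hexdigest()
sha512 = hashlib.sha512(payload).hexdigest()
print(len(sha256), len(sha512))  # 64 128 (hex characters)
```

Comparing the computed digest to the recorded one detects any corruption or tampering of the packaged files.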
data/CHANGELOG.md
ADDED
@@ -0,0 +1,156 @@
# Change Log
This project adheres to [Semantic Versioning](http://semver.org/).

This CHANGELOG follows the format located [here](https://github.com/sensu-plugins/community/blob/master/HOW_WE_CHANGELOG.md)

## [Unreleased]
### Added
- added WiredTiger metrics (@jonathanschlue-as)

## [2.1.0] - 2018-12-27
### Added
- `bin/metrics-mongodb.rb`: added `--exclude-db-sizes` option that removes database sizes (which can be quite large) from the payload sent to the message broker (RabbitMQ), which often needs special tuning for large payloads (@mdzidic)

## [2.0.2] - 2018-03-17
### Fixed
- renamed library file `metics` to `metrics` and updated all references in code to it (@majormoses)

## [2.0.1] - 2017-10-19
### Fixed
- updating the read preferences for `2.2`-`2.8` pymongo clients (@urg)

## [2.0.0] - 2017-09-23
### Breaking Change
- bumped requirement of `sensu-plugin` [to 2.0](https://github.com/sensu-plugins/sensu-plugin/blob/master/CHANGELOG.md#v200---2017-03-29) (@majormoses)

### Fixed
- check-mongodb-metric.rb: make `--metric` required since it is (@majormoses)

## [1.4.1] - 2017-09-23
### Fixed
- Support for database size metrics (@fandrews)

### Changed
- updated changelog guidelines location (@majormoses)

## [1.4.0] - 2017-09-05
### Added
- Support for returning replicaset state metrics (@naemono)
- Tests covering returning replicaset state metrics (@naemono)
- Ruby 2.4.1 testing

## [1.3.0] - 2017-05-22
### Added
- Support for database size metrics (@naemono)
- Tests covering returning database size metrics (@naemono)

## [1.2.2] - 2017-05-08
### Fixed
- `check-mongodb.py`: will now correctly crit on connection issues (@majormoses)

## [1.2.1] - 2017-05-07
### Fixed
- `check-mongodb.py`: fixed an issue with parameter building when connecting with or without SSL (@s-schweer)

## [1.2.0] - 2017-03-06
### Fixed
- `check-mongodb.py`: Set read preference for pymongo 2.2+ to fix 'General MongoDB Error: can't set attribute' (@boutetnico)
- `check-mongodb.py`: Fix mongo replication lag percent check showing password in plain text (@furbiesandbeans)
- `metrics-mongodb-replication.rb`: Sort replication members to ensure the primary is the first element (@gonzalo-radio)

### Changed
- Update `mongo` gem to 2.4.1, which adds support for MongoDB 3.4 (@eheydrick)

## [1.1.0] - 2016-10-17
### Added
- Inclusion of check-mongodb-metrics.rb to perform checks against the same data metrics-mongodb.rb produces. (@stefano-pogliani)
- Inclusion of lib/sensu-plugins-mongodb/metics.rb to share metric collection logic. (@stefano-pogliani)
- Tests for the shared metrics-processing code. (@stefano-pogliani)
- Support for SSL certificates for clients. (@b0d0nne11)
- Inclusion of metrics-mongodb-replication.rb to produce replication metrics including lag statistics (@stefano-pogliani)
- Updated metrics-mongodb.rb to include version checks to ensure execution in mongodb > 3.2.x (@RycroftSolutions)
- Additional metrics not included in original metrics-mongodb.rb (@RycroftSolutions)

### Changed
- Moved most of metrics-mongodb.rb code to shared library. (@stefano-pogliani)
- MongoDB version checks to skip missing metrics. (@stefano-pogliani)
- Renamed some metrics to become standard with MongoDB 3.2 equivalents (so checks/queries don't have to bother with version detection). (@stefano-pogliani)

## [1.0.0] - 2016-06-03
### Removed
- support for Rubies 1.9.3 and 2.0

### Added
- support for Ruby 2.3

### Changed
- Update to rubocop 0.40 and cleanup
- Update to mongo gem 2.2.x and bson 4.x for MongoDB 3.2 support

### Fixed
- Long was added as a numeric type
- metrics-mongodb.rb: fix typo

## [0.0.8] - 2016-03-04
### Added
- Add a ruby wrapper script for check-mongodb.py

### Changed
- Rubocop upgrade and cleanup

## [0.0.7] - 2015-11-12
### Fixed
- Stopped trying to gather indexCounters data from mongo 3 (metrics-mongodb.rb)

### Changed
- Updated mongo gem to 1.12.3

## [0.0.6] - 2015-10-13
### Fixed
- Rename option to avoid naming conflict with class variable name
- Add message for replica set state 9 (rollback)
- Installation fix

## [0.0.5] - 2015-09-04
### Fixed
- Fixed non-SSL mongo connections

## [0.0.4] - 2015-08-12
### Changed
- general gem cleanup
- bump rubocop

## [0.0.3] - 2015-07-14
### Changed
- updated sensu-plugin gem to 1.2.0

## [0.0.2] - 2015-06-03
### Fixed
- added binstubs

### Changed
- removed cruft from /lib

## 0.0.1 - 2015-05-20
### Added
- initial release

[Unreleased]: https://github.com/sensu-plugins/sensu-plugins-mongodb/compare/2.1.0...HEAD
[2.1.0]: https://github.com/sensu-plugins/sensu-plugins-mongodb/compare/2.0.2...2.1.0
[2.0.2]: https://github.com/sensu-plugins/sensu-plugins-mongodb/compare/2.0.1...2.0.2
[2.0.1]: https://github.com/sensu-plugins/sensu-plugins-mongodb/compare/2.0.0...2.0.1
[2.0.0]: https://github.com/sensu-plugins/sensu-plugins-mongodb/compare/1.4.1...2.0.0
[1.4.1]: https://github.com/sensu-plugins/sensu-plugins-mongodb/compare/1.4.0...1.4.1
[1.4.0]: https://github.com/sensu-plugins/sensu-plugins-mongodb/compare/1.3.0...1.4.0
[1.3.0]: https://github.com/sensu-plugins/sensu-plugins-mongodb/compare/1.2.2...1.3.0
[1.2.2]: https://github.com/sensu-plugins/sensu-plugins-mongodb/compare/1.2.1...1.2.2
[1.2.1]: https://github.com/sensu-plugins/sensu-plugins-mongodb/compare/1.2.0...1.2.1
[1.2.0]: https://github.com/sensu-plugins/sensu-plugins-mongodb/compare/1.1.0...1.2.0
[1.1.0]: https://github.com/sensu-plugins/sensu-plugins-mongodb/compare/1.0.0...1.1.0
[1.0.0]: https://github.com/sensu-plugins/sensu-plugins-mongodb/compare/0.0.8...1.0.0
[0.0.8]: https://github.com/sensu-plugins/sensu-plugins-mongodb/compare/0.0.7...0.0.8
[0.0.7]: https://github.com/sensu-plugins/sensu-plugins-mongodb/compare/0.0.6...0.0.7
[0.0.6]: https://github.com/sensu-plugins/sensu-plugins-mongodb/compare/0.0.5...0.0.6
[0.0.5]: https://github.com/sensu-plugins/sensu-plugins-mongodb/compare/0.0.4...0.0.5
[0.0.4]: https://github.com/sensu-plugins/sensu-plugins-mongodb/compare/0.0.3...0.0.4
[0.0.3]: https://github.com/sensu-plugins/sensu-plugins-mongodb/compare/0.0.2...0.0.3
[0.0.2]: https://github.com/sensu-plugins/sensu-plugins-mongodb/compare/0.0.1...0.0.2
data/LICENSE
ADDED
@@ -0,0 +1,22 @@
Copyright (c) 2015 Sensu-Plugins

MIT License

Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
"Software"), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:

The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.md
ADDED
@@ -0,0 +1,26 @@
## Sensu-Plugins-mongodb

[![Build Status](https://travis-ci.org/sensu-plugins/sensu-plugins-mongodb.svg?branch=master)](https://travis-ci.org/sensu-plugins/sensu-plugins-mongodb)
[![Gem Version](https://badge.fury.io/rb/sensu-plugins-mongodb.svg)](http://badge.fury.io/rb/sensu-plugins-mongodb)
[![Code Climate](https://codeclimate.com/github/sensu-plugins/sensu-plugins-mongodb/badges/gpa.svg)](https://codeclimate.com/github/sensu-plugins/sensu-plugins-mongodb)
[![Test Coverage](https://codeclimate.com/github/sensu-plugins/sensu-plugins-mongodb/badges/coverage.svg)](https://codeclimate.com/github/sensu-plugins/sensu-plugins-mongodb)
[![Dependency Status](https://gemnasium.com/sensu-plugins/sensu-plugins-mongodb.svg)](https://gemnasium.com/sensu-plugins/sensu-plugins-mongodb)

## Functionality

## Files
* bin/check-mongodb.py
* bin/check-mongodb.rb - wrapper for check-mongodb.py
* bin/check-mongodb-metric.rb
* bin/metrics-mongodb.rb
* bin/metrics-mongodb-replication.rb

## Usage

## Installation

[Installation and Setup](http://sensu-plugins.io/docs/installation_instructions.html)

## Notes

The `pymongo` python package needs to be installed to use `check-mongodb`.
data/bin/check-mongodb-metric.rb
ADDED
@@ -0,0 +1,144 @@
#! /usr/bin/env ruby
#
#   check-mongodb-metric.rb
#
# DESCRIPTION:
#
# OUTPUT:
#   plain text
#
# PLATFORMS:
#   Linux
#
# DEPENDENCIES:
#   gem: sensu-plugin
#   gem: mongo
#   gem: bson
#   gem: bson_ext
#
# USAGE:
#   #YELLOW
#
# NOTES:
#
# LICENSE:
#   Copyright 2016 Conversocial https://github.com/conversocial
#   Released under the same terms as Sensu (the MIT license); see LICENSE
#   for details.
#

require 'sensu-plugin/check/cli'
require 'sensu-plugins-mongodb/metrics'
require 'mongo'
include Mongo

#
# Mongodb
#

class CheckMongodbMetric < Sensu::Plugin::Check::CLI
  option :host,
         description: 'MongoDB host',
         long: '--host HOST',
         default: 'localhost'

  option :port,
         description: 'MongoDB port',
         long: '--port PORT',
         default: 27_017

  option :user,
         description: 'MongoDB user',
         long: '--user USER',
         default: nil

  option :password,
         description: 'MongoDB password',
         long: '--password PASSWORD',
         default: nil

  option :ssl,
         description: 'Connect using SSL',
         long: '--ssl',
         default: false

  option :ssl_cert,
         description: 'The certificate file used to identify the local connection against mongod',
         long: '--ssl-cert SSL_CERT',
         default: ''

  option :ssl_key,
         description: 'The private key used to identify the local connection against mongod',
         long: '--ssl-key SSL_KEY',
         default: ''

  option :ssl_ca_cert,
         description: 'The set of concatenated CA certificates, which are used to validate certificates passed from the other end of the connection',
         long: '--ssl-ca-cert SSL_CA_CERT',
         default: ''

  option :ssl_verify,
         description: 'Whether or not to do peer certification validation',
         long: '--ssl-verify',
         default: false

  option :debug,
         description: 'Enable debug',
         long: '--debug',
         default: false

  option :require_master,
         description: 'Require the node to be a master node',
         long: '--require-master',
         default: false

  option :metric,
         description: 'Name of the metric to check',
         long: '--metric METRIC',
         short: '-m METRIC',
         required: true

  option :warn,
         description: 'Warn if values are above this threshold',
         short: '-w WARN',
         proc: proc(&:to_i),
         default: 0

  option :crit,
         description: 'Fail if values are above this threshold',
         short: '-c CRIT',
         proc: proc(&:to_i),
         default: 0

  def run
    Mongo::Logger.logger.level = Logger::FATAL
    @debug = config[:debug]
    if @debug
      Mongo::Logger.logger.level = Logger::DEBUG
      config_debug = config.clone
      config_debug[:password] = '***'
      puts 'Arguments: ' + config_debug.inspect
    end

    # Get the metrics.
    collector = SensuPluginsMongoDB::Metrics.new(config)
    collector.connect_mongo_db('admin')
    exit(1) if config[:require_master] && !collector.master?
    metrics = collector.server_metrics

    # Make sure the requested value is available.
    unless metrics.key?(config[:metric])
      unknown "Unable to find a value for metric '#{config[:metric]}'"
    end

    # Check the requested value against the thresholds.
    value = metrics[config[:metric]]
    if value >= config[:crit]
      critical "The value of '#{config[:metric]}' exceeds #{config[:crit]}."
    end
    if value >= config[:warn]
      warning "The value of '#{config[:metric]}' exceeds #{config[:warn]}."
    end
    ok "The value of '#{config[:metric]}' is below all thresholds."
  end
end
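check-mongodb-metric.rb compares one server metric against the `-w`/`-c` thresholds and exits through Sensu's `critical`/`warning`/`ok` helpers, checking the critical threshold first so the most severe status wins. Those helpers map to the Nagios exit-code convention (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN), which Sensu also understands. A minimal Python sketch of that threshold logic (the function name is illustrative, not part of the plugin):

```python
# Sketch of the crit-before-warn threshold check used by
# check-mongodb-metric.rb; 0/1/2 are the Nagios/Sensu exit codes.
def status_for(value, warn, crit):
    if value >= crit:
        return 2, "CRITICAL"
    if value >= warn:
        return 1, "WARNING"
    return 0, "OK"

print(status_for(120, warn=50, crit=100))  # (2, 'CRITICAL')
print(status_for(75, warn=50, crit=100))   # (1, 'WARNING')
print(status_for(10, warn=50, crit=100))   # (0, 'OK')
```

Checking critical before warning matters: a value above both thresholds must report CRITICAL, not WARNING.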
@@ -0,0 +1,1471 @@
|
|
1
|
+
#!/usr/bin/env python
|
2
|
+
|
3
|
+
#
|
4
|
+
# A MongoDB Nagios check script
|
5
|
+
#
|
6
|
+
|
7
|
+
# Script idea taken from a Tag1 script I found and I modified it a lot
|
8
|
+
#
|
9
|
+
# Main Author
|
10
|
+
# - Mike Zupan <mike@zcentric.com>
|
11
|
+
# Contributers
|
12
|
+
# - Frank Brandewiede <brande@travel-iq.com> <brande@bfiw.de> <brande@novolab.de>
|
13
|
+
# - Sam Perman <sam@brightcove.com>
|
14
|
+
# - Shlomo Priymak <shlomoid@gmail.com>
|
15
|
+
# - @jhoff909 on github
|
16
|
+
# - @jbraeuer on github
|
17
|
+
# - Dag Stockstad <dag.stockstad@gmail.com>
|
18
|
+
# - @Andor on github
|
19
|
+
# - Steven Richards - Captainkrtek on Github <sbrichards@mit.edu>
|
20
|
+
#
|
21
|
+
|
22
|
+
# License: BSD
|
23
|
+
# Copyright (c) 2012, Mike Zupan <mike@zcentric.com>
|
24
|
+
# All rights reserved.
|
25
|
+
# Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
|
26
|
+
#
|
27
|
+
# Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
|
28
|
+
# Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the
|
29
|
+
# documentation and/or other materials provided with the distribution.
|
30
|
+
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
|
31
|
+
# THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS
|
32
|
+
# BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE
|
33
|
+
# GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
|
34
|
+
# STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
35
|
+
#
|
36
|
+
# README: https://github.com/mzupan/nagios-plugin-mongodb/blob/master/LICENSE
|
37
|
+
|
38
|
+
# #RED
|
39
|
+
import sys
|
40
|
+
import time
|
41
|
+
import optparse
|
42
|
+
import textwrap
|
43
|
+
import re
|
44
|
+
import os
|
45
|
+
|
46
|
+
try:
|
47
|
+
import pymongo
|
48
|
+
except ImportError, e:
|
49
|
+
print e
|
50
|
+
sys.exit(2)
|
51
|
+
|
52
|
+
# As of pymongo v 1.9 the SON API is part of the BSON package, therefore attempt
|
53
|
+
# to import from there and fall back to pymongo in cases of older pymongo
|
54
|
+
if pymongo.version >= "1.9":
|
55
|
+
import bson.son as son
|
56
|
+
else:
|
57
|
+
import pymongo.son as son
|
58
|
+
|
59
|
+
|
60
|
+
#
|
61
|
+
# thanks to http://stackoverflow.com/a/1229667/72987
|
62
|
+
#
|
63
|
+
def optional_arg(arg_default):
|
64
|
+
def func(option, opt_str, value, parser):
|
65
|
+
if parser.rargs and not parser.rargs[0].startswith('-'):
|
66
|
+
val = parser.rargs[0]
|
67
|
+
parser.rargs.pop(0)
|
68
|
+
else:
|
69
|
+
val = arg_default
|
70
|
+
setattr(parser.values, option.dest, val)
|
71
|
+
return func
|
72
|
+
|
73
|
+
|
74
|
+
def performance_data(perf_data, params):
|
75
|
+
data = ''
|
76
|
+
if perf_data:
|
77
|
+
data = " |"
|
78
|
+
for p in params:
|
79
|
+
p += (None, None, None, None)
|
80
|
+
param, param_name, warning, critical = p[0:4]
|
81
|
+
data += "%s=%s" % (param_name, str(param))
|
82
|
+
if warning or critical:
|
83
|
+
warning = warning or 0
|
84
|
+
critical = critical or 0
|
85
|
+
data += ";%s;%s" % (warning, critical)
|
86
|
+
|
87
|
+
data += " "
|
88
|
+
|
89
|
+
return data
|
90
|
+
|
91
|
+
|
92
|
+
def numeric_type(param):
|
93
|
+
if ((type(param) == float or type(param) == int or type(param) == long or param == None)):
|
94
|
+
return True
|
95
|
+
return False
|
96
|
+
|
97
|
+
|
98
|
+
def check_levels(param, warning, critical, message, ok=[]):
|
99
|
+
if (numeric_type(critical) and numeric_type(warning)):
|
100
|
+
if param >= critical:
|
101
|
+
print "CRITICAL - " + message
|
102
|
+
sys.exit(2)
|
103
|
+
elif param >= warning:
|
104
|
+
print "WARNING - " + message
|
105
|
+
sys.exit(1)
|
106
|
+
else:
|
107
|
+
print "OK - " + message
|
108
|
+
sys.exit(0)
|
109
|
+
else:
|
110
|
+
if param in critical:
|
111
|
+
print "CRITICAL - " + message
|
112
|
+
sys.exit(2)
|
113
|
+
|
114
|
+
if param in warning:
|
115
|
+
print "WARNING - " + message
|
116
|
+
sys.exit(1)
|
117
|
+
|
118
|
+
if param in ok:
|
119
|
+
print "OK - " + message
|
120
|
+
sys.exit(0)
|
121
|
+
|
122
|
+
# unexpected param value
|
123
|
+
print "CRITICAL - Unexpected value : %d" % param + "; " + message
|
124
|
+
return 2
|
125
|
+
|
126
|
+
|
127
|
+
def get_server_status(con):
|
128
|
+
try:
|
129
|
+
set_read_preference(con.admin)
|
130
|
+
data = con.admin.command(pymongo.son_manipulator.SON([('serverStatus', 1)]))
|
131
|
+
except:
|
132
|
+
data = con.admin.command(son.SON([('serverStatus', 1)]))
|
133
|
+
return data
|
134
|
+
|
135
|
+
|
136
|
+
def main(argv):
|
137
|
+
p = optparse.OptionParser(conflict_handler="resolve", description="This Nagios plugin checks the health of mongodb.")
|
138
|
+
|
139
|
+
p.add_option('-H', '--host', action='store', type='string', dest='host', default='127.0.0.1', help='The hostname you want to connect to')
|
140
|
+
p.add_option('-P', '--port', action='store', type='int', dest='port', default=27017, help='The port mongodb is runnung on')
|
141
|
+
p.add_option('-u', '--user', action='store', type='string', dest='user', default=None, help='The username you want to login as')
|
142
|
+
p.add_option('-p', '--pass', action='store', type='string', dest='passwd', default=None, help='The password you want to use for that user')
|
143
|
+
p.add_option('-W', '--warning', action='store', dest='warning', default=None, help='The warning threshold we want to set')
|
144
|
+
p.add_option('-C', '--critical', action='store', dest='critical', default=None, help='The critical threshold we want to set')
|
145
|
+
p.add_option('-A', '--action', action='store', type='choice', dest='action', default='connect', help='The action you want to take',
|
146
|
+
choices=['connect', 'connections', 'replication_lag', 'replication_lag_percent', 'replset_state', 'memory', 'memory_mapped', 'lock',
|
147
|
+
'flushing', 'last_flush_time', 'index_miss_ratio', 'databases', 'collections', 'database_size', 'database_indexes', 'collection_indexes', 'collection_size',
|
148
|
+
'queues', 'oplog', 'journal_commits_in_wl', 'write_data_files', 'journaled', 'opcounters', 'current_lock', 'replica_primary', 'page_faults',
|
149
|
+
'asserts', 'queries_per_second', 'page_faults', 'chunks_balance', 'connect_primary', 'collection_state', 'row_count', 'replset_quorum'])
|
150
|
+
p.add_option('--max-lag', action='store_true', dest='max_lag', default=False, help='Get max replication lag (for replication_lag action only)')
|
151
|
+
p.add_option('--mapped-memory', action='store_true', dest='mapped_memory', default=False, help='Get mapped memory instead of resident (if resident memory can not be read)')
|
152
|
+
p.add_option('-D', '--perf-data', action='store_true', dest='perf_data', default=False, help='Enable output of Nagios performance data')
|
153
|
+
p.add_option('-d', '--database', action='store', dest='database', default='admin', help='Specify the database to check')
|
154
|
+
p.add_option('--all-databases', action='store_true', dest='all_databases', default=False, help='Check all databases (action database_size)')
|
155
|
+
p.add_option('-s', '--ssl-enabled', dest='ssl_enabled', default=False, action='callback', callback=optional_arg(True), help='Connect using SSL')
|
156
|
+
p.add_option('-e', '--ssl-certfile', dest='ssl_certfile', default=None, action='store', help='The certificate file used to identify the local connection against mongod')
|
157
|
+
p.add_option('-k', '--ssl-keyfile', dest='ssl_keyfile', default=None, action='store', help='The private key used to identify the local connection against mongod')
|
158
|
+
p.add_option('-a', '--ssl-ca-certs', dest='ssl_ca_certs', default=None, action='store', help='The set of concatenated CA certificates, which are used to validate certificates passed from the other end of the connection')
|
159
|
+
p.add_option('-r', '--replicaset', dest='replicaset', default=None, action='callback', callback=optional_arg(True), help='Connect to replicaset')
|
160
|
+
p.add_option('-q', '--querytype', action='store', dest='query_type', default='query', help='The query type to check [query|insert|update|delete|getmore|command] from queries_per_second')
|
161
|
+
p.add_option('-c', '--collection', action='store', dest='collection', default='admin', help='Specify the collection to check')
|
162
|
+
p.add_option('-T', '--time', action='store', type='int', dest='sample_time', default=1, help='Time used to sample number of pages faults')
|
163
|
+
|
164
|
+
options, arguments = p.parse_args()
|
165
|
+
host = options.host
|
166
|
+
port = options.port
|
167
|
+
user = options.user
|
168
|
+
passwd = options.passwd
|
169
|
+
query_type = options.query_type
|
170
|
+
collection = options.collection
|
171
|
+
sample_time = options.sample_time
|
172
|
+
if (options.action == 'replset_state'):
|
173
|
+
warning = str(options.warning or "")
|
174
|
+
critical = str(options.critical or "")
|
175
|
+
else:
|
176
|
+
warning = float(options.warning or 0)
|
177
|
+
critical = float(options.critical or 0)
|
178
|
+
|
179
|
+
action = options.action
|
180
|
+
perf_data = options.perf_data
|
181
|
+
max_lag = options.max_lag
|
182
|
+
database = options.database
|
183
|
+
ssl_enabled = options.ssl_enabled
|
184
|
+
ssl_certfile = options.ssl_certfile
|
185
|
+
ssl_keyfile = options.ssl_keyfile
|
186
|
+
ssl_ca_certs = options.ssl_ca_certs
|
187
|
+
replicaset = options.replicaset
|
188
|
+
|
189
|
+
if action == 'replica_primary' and replicaset is None:
|
190
|
+
return "replicaset must be passed in when using replica_primary check"
|
191
|
+
elif not action == 'replica_primary' and replicaset:
|
192
|
+
return "passing a replicaset while not checking replica_primary does not work"
|
193
|
+
|
194
|
+
#
|
195
|
+
# moving the login up here and passing in the connection
|
196
|
+
#
|
197
|
+
start = time.time()
|
198
|
+
err, con = mongo_connect(host, port, ssl_enabled, ssl_certfile, ssl_keyfile, ssl_ca_certs, user, passwd, replicaset)
|
199
|
+
if err != 0:
|
200
|
+
return err
|
201
|
+
|
202
|
+
conn_time = time.time() - start
|
203
|
+
conn_time = round(conn_time, 0)
|
204
|
+
|
205
|
+
if action == "connections":
|
206
|
+
return check_connections(con, warning, critical, perf_data)
|
207
|
+
elif action == "replication_lag":
|
208
|
+
return check_rep_lag(con, host, port, warning, critical, False, perf_data, max_lag, user, passwd)
|
209
|
+
elif action == "replication_lag_percent":
|
210
|
+
return check_rep_lag(con, host, port, warning, critical, True, perf_data, max_lag, user, passwd)
|
211
|
+
elif action == "replset_state":
|
212
|
+
return check_replset_state(con, perf_data, warning, critical)
|
213
|
+
elif action == "memory":
|
214
|
+
return check_memory(con, warning, critical, perf_data, options.mapped_memory)
|
215
|
+
elif action == "memory_mapped":
|
216
|
+
return check_memory_mapped(con, warning, critical, perf_data)
|
217
|
+
elif action == "queues":
|
218
|
+
return check_queues(con, warning, critical, perf_data)
|
219
|
+
elif action == "lock":
|
220
|
+
return check_lock(con, warning, critical, perf_data)
|
221
|
+
elif action == "current_lock":
|
222
|
+
return check_current_lock(con, host, warning, critical, perf_data)
|
223
|
+
elif action == "flushing":
|
224
|
+
return check_flushing(con, warning, critical, True, perf_data)
|
225
|
+
elif action == "last_flush_time":
|
226
|
+
return check_flushing(con, warning, critical, False, perf_data)
|
227
|
+
elif action == "index_miss_ratio":
|
228
|
+
index_miss_ratio(con, warning, critical, perf_data)
|
229
|
+
elif action == "databases":
|
230
|
+
return check_databases(con, warning, critical, perf_data)
|
231
|
+
elif action == "collections":
|
232
|
+
return check_collections(con, warning, critical, perf_data)
|
233
|
+
elif action == "oplog":
|
234
|
+
return check_oplog(con, warning, critical, perf_data)
|
235
|
+
elif action == "journal_commits_in_wl":
|
236
|
+
return check_journal_commits_in_wl(con, warning, critical, perf_data)
|
237
|
+
elif action == "database_size":
|
238
|
+
if options.all_databases:
|
239
|
+
return check_all_databases_size(con, warning, critical, perf_data)
|
240
|
+
else:
|
241
|
+
return check_database_size(con, database, warning, critical, perf_data)
|
242
|
+
elif action == "database_indexes":
|
243
|
+
return check_database_indexes(con, database, warning, critical, perf_data)
|
244
|
+
elif action == "collection_indexes":
|
245
|
+
return check_collection_indexes(con, database, collection, warning, critical, perf_data)
|
246
|
+
elif action == "collection_size":
|
247
|
+
return check_collection_size(con, database, collection, warning, critical, perf_data)
|
248
|
+
elif action == "journaled":
|
249
|
+
return check_journaled(con, warning, critical, perf_data)
|
250
|
+
elif action == "write_data_files":
|
251
|
+
return check_write_to_datafiles(con, warning, critical, perf_data)
|
252
|
+
elif action == "opcounters":
|
253
|
+
return check_opcounters(con, host, warning, critical, perf_data)
|
254
|
+
elif action == "asserts":
|
255
|
+
return check_asserts(con, host, warning, critical, perf_data)
|
256
|
+
elif action == "replica_primary":
|
257
|
+
return check_replica_primary(con, host, warning, critical, perf_data, replicaset)
|
258
|
+
elif action == "queries_per_second":
|
259
|
+
return check_queries_per_second(con, query_type, warning, critical, perf_data)
|
260
|
+
elif action == "page_faults":
|
261
|
+
check_page_faults(con, sample_time, warning, critical, perf_data)
|
262
|
+
elif action == "chunks_balance":
|
263
|
+
chunks_balance(con, database, collection, warning, critical)
|
264
|
+
elif action == "connect_primary":
|
265
|
+
return check_connect_primary(con, warning, critical, perf_data)
|
266
|
+
elif action == "collection_state":
|
267
|
+
return check_collection_state(con, database, collection)
|
268
|
+
elif action == "row_count":
|
269
|
+
return check_row_count(con, database, collection, warning, critical, perf_data)
|
270
|
+
elif action == "replset_quorum":
|
271
|
+
return check_replset_quorum(con, perf_data)
|
272
|
+
else:
|
273
|
+
return check_connect(host, port, warning, critical, perf_data, user, passwd, conn_time)
|
274
|
+
|
275
|
+
|
276
|
+
def mongo_connect(host=None, port=None, ssl_enabled=False, ssl_certfile=None, ssl_keyfile=None, ssl_ca_certs=None, user=None, passwd=None, replica=None):
    try:
        # ssl connection for pymongo > 2.3
        if pymongo.version >= "2.3":
            if replica is None:
                if ssl_enabled:
                    con = pymongo.MongoClient(host, port, ssl=ssl_enabled, ssl_certfile=ssl_certfile, ssl_keyfile=ssl_keyfile, ssl_ca_certs=ssl_ca_certs)
                else:
                    con = pymongo.MongoClient(host, port)
            else:
                if ssl_enabled:
                    con = pymongo.Connection(host, port, read_preference=pymongo.ReadPreference.SECONDARY, ssl=ssl_enabled, ssl_certfile=ssl_certfile, ssl_keyfile=ssl_keyfile, ssl_ca_certs=ssl_ca_certs, replicaSet=replica, network_timeout=10)
                else:
                    con = pymongo.Connection(host, port, read_preference=pymongo.ReadPreference.SECONDARY, replicaSet=replica, network_timeout=10)
            try:
                # https://api.mongodb.com/python/current/api/pymongo/mongo_client.html
                # The ismaster command is cheap and does not require auth.
                con.admin.command('ismaster', connectTimeoutMS=10000)
            except Exception, e:
                return exit_with_general_critical(e), None
        else:
            if replica is None:
                con = pymongo.Connection(host, port, slave_okay=True, network_timeout=10)
            else:
                con = pymongo.Connection(host, port, slave_okay=True, network_timeout=10)
                #con = pymongo.Connection(host, port, slave_okay=True, replicaSet=replica, network_timeout=10)

        if user and passwd:
            db = con["admin"]
            if not db.authenticate(user, passwd):
                sys.exit("Username/Password incorrect")
    except Exception, e:
        if isinstance(e, pymongo.errors.AutoReconnect) and str(e).find(" is an arbiter") != -1:
            # We got a pymongo AutoReconnect exception that tells us we connected to an Arbiter Server
            # This means: Arbiter is reachable and can answer requests/votes - this is all we need to know from an arbiter
            print "OK - State: 7 (Arbiter)"
            sys.exit(0)
        return exit_with_general_critical(e), None
    return 0, con


def exit_with_general_warning(e):
    if isinstance(e, SystemExit):
        return e
    else:
        print "WARNING - General MongoDB warning:", e
    return 1


def exit_with_general_critical(e):
    if isinstance(e, SystemExit):
        return e
    else:
        print "CRITICAL - General MongoDB Error:", e
    return 2


def set_read_preference(db):
    if pymongo.version >= "2.2" and pymongo.version < "2.8":
        pymongo.read_preferences.Secondary
    else:
        db.read_preference = pymongo.ReadPreference.SECONDARY


def check_connect(host, port, warning, critical, perf_data, user, passwd, conn_time):
    warning = warning or 3
    critical = critical or 6
    message = "Connection took %i seconds" % conn_time
    message += performance_data(perf_data, [(conn_time, "connection_time", warning, critical)])

    return check_levels(conn_time, warning, critical, message)


def check_connections(con, warning, critical, perf_data):
    warning = warning or 80
    critical = critical or 95
    try:
        data = get_server_status(con)

        current = float(data['connections']['current'])
        available = float(data['connections']['available'])

        used_percent = int(float(current / (available + current)) * 100)
        message = "%i percent (%i of %i connections) used" % (used_percent, current, current + available)
        message += performance_data(perf_data, [(used_percent, "used_percent", warning, critical),
                                                (current, "current_connections"),
                                                (available, "available_connections")])
        return check_levels(used_percent, warning, critical, message)

    except Exception, e:
        return exit_with_general_critical(e)


def check_rep_lag(con, host, port, warning, critical, percent, perf_data, max_lag, user, passwd):
    # Get mongo to tell us replica set member name when connecting locally
    if "127.0.0.1" == host:
        host = con.admin.command("ismaster", "1")["me"].split(':')[0]

    if percent:
        warning = warning or 50
        critical = critical or 75
    else:
        warning = warning or 600
        critical = critical or 3600
    rs_status = {}
    slaveDelays = {}
    try:
        set_read_preference(con.admin)

        # Get replica set status
        try:
            rs_status = con.admin.command("replSetGetStatus")
        except pymongo.errors.OperationFailure, e:
            if e.code == None and str(e).find('failed: not running with --replSet"'):
                print "OK - Not running with replSet"
                return 0

        serverVersion = tuple(con.server_info()['version'].split('.'))
        if serverVersion >= tuple("2.0.0".split(".")):
            #
            # check for version greater then 2.0
            #
            rs_conf = con.local.system.replset.find_one()
            for member in rs_conf['members']:
                if member.get('slaveDelay') is not None:
                    slaveDelays[member['host']] = member.get('slaveDelay')
                else:
                    slaveDelays[member['host']] = 0

            # Find the primary and/or the current node
            primary_node = None
            host_node = None

            for member in rs_status["members"]:
                if member["stateStr"] == "PRIMARY":
                    primary_node = member
                if member["name"].split(':')[0] == host and int(member["name"].split(':')[1]) == port:
                    host_node = member

            # Check if we're in the middle of an election and don't have a primary
            if primary_node is None:
                print "WARNING - No primary defined. In an election?"
                return 1

            # Check if we failed to find the current host
            # below should never happen
            if host_node is None:
                print "CRITICAL - Unable to find host '" + host + "' in replica set."
                return 2

            # Is the specified host the primary?
            if host_node["stateStr"] == "PRIMARY":
                if max_lag == False:
                    print "OK - This is the primary."
                    return 0
                else:
                    #get the maximal replication lag
                    data = ""
                    maximal_lag = 0
                    for member in rs_status['members']:
                        if not member['stateStr'] == "ARBITER":
                            lastSlaveOpTime = member['optimeDate']
                            replicationLag = abs(primary_node["optimeDate"] - lastSlaveOpTime).seconds - slaveDelays[member['name']]
                            data = data + member['name'] + " lag=%d;" % replicationLag
                            maximal_lag = max(maximal_lag, replicationLag)
                    if percent:
                        err, con = mongo_connect(primary_node['name'].split(':')[0], int(primary_node['name'].split(':')[1]), False, user=user, passwd=passwd)
                        if err != 0:
                            return err
                        primary_timediff = replication_get_time_diff(con)
                        maximal_lag = int(float(maximal_lag) / float(primary_timediff) * 100)
                        message = "Maximal lag is " + str(maximal_lag) + " percents"
                        message += performance_data(perf_data, [(maximal_lag, "replication_lag_percent", warning, critical)])
                    else:
                        message = "Maximal lag is " + str(maximal_lag) + " seconds"
                        message += performance_data(perf_data, [(maximal_lag, "replication_lag", warning, critical)])
                    return check_levels(maximal_lag, warning, critical, message)
            elif host_node["stateStr"] == "ARBITER":
                print "OK - This is an arbiter"
                return 0

            # Find the difference in optime between current node and PRIMARY

            optime_lag = abs(primary_node["optimeDate"] - host_node["optimeDate"])

            if host_node['name'] in slaveDelays:
                slave_delay = slaveDelays[host_node['name']]
            elif host_node['name'].endswith(':27017') and host_node['name'][:-len(":27017")] in slaveDelays:
                slave_delay = slaveDelays[host_node['name'][:-len(":27017")]]
            else:
                raise Exception("Unable to determine slave delay for {0}".format(host_node['name']))

            try:  # work starting from python2.7
                lag = optime_lag.total_seconds()
            except:
                lag = float(optime_lag.seconds + optime_lag.days * 24 * 3600)

            if percent:
                err, con = mongo_connect(primary_node['name'].split(':')[0], int(primary_node['name'].split(':')[1]), False, user=user, passwd=passwd)
                if err != 0:
                    return err
                primary_timediff = replication_get_time_diff(con)
                if primary_timediff != 0:
                    lag = int(float(lag) / float(primary_timediff) * 100)
                else:
                    lag = 0
                message = "Lag is " + str(lag) + " percents"
                message += performance_data(perf_data, [(lag, "replication_lag_percent", warning, critical)])
            else:
                message = "Lag is " + str(lag) + " seconds"
                message += performance_data(perf_data, [(lag, "replication_lag", warning, critical)])
            return check_levels(lag, warning + slaveDelays[host_node['name']], critical + slaveDelays[host_node['name']], message)
        else:
            #
            # less than 2.0 check
            #
            # Get replica set status
            rs_status = con.admin.command("replSetGetStatus")

            # Find the primary and/or the current node
            primary_node = None
            host_node = None
            for member in rs_status["members"]:
                if member["stateStr"] == "PRIMARY":
                    primary_node = (member["name"], member["optimeDate"])
                if member["name"].split(":")[0].startswith(host):
                    host_node = member

            # Check if we're in the middle of an election and don't have a primary
            if primary_node is None:
                print "WARNING - No primary defined. In an election?"
                sys.exit(1)

            # Is the specified host the primary?
            if host_node["stateStr"] == "PRIMARY":
                print "OK - This is the primary."
                sys.exit(0)

            # Find the difference in optime between current node and PRIMARY
            optime_lag = abs(primary_node[1] - host_node["optimeDate"])
            lag = optime_lag.seconds
            if percent:
                err, con = mongo_connect(primary_node['name'].split(':')[0], int(primary_node['name'].split(':')[1]))
                if err != 0:
                    return err
                primary_timediff = replication_get_time_diff(con)
                lag = int(float(lag) / float(primary_timediff) * 100)
                message = "Lag is " + str(lag) + " percents"
                message += performance_data(perf_data, [(lag, "replication_lag_percent", warning, critical)])
            else:
                message = "Lag is " + str(lag) + " seconds"
                message += performance_data(perf_data, [(lag, "replication_lag", warning, critical)])
            return check_levels(lag, warning, critical, message)

    except Exception, e:
        return exit_with_general_critical(e)


def check_memory(con, warning, critical, perf_data, mapped_memory):
    #
    # These thresholds are basically meaningless, and must be customized to your system's ram
    #
    warning = warning or 8
    critical = critical or 16
    try:
        data = get_server_status(con)
        if not data['mem']['supported'] and not mapped_memory:
            print "OK - Platform not supported for memory info"
            return 0
        #
        # convert to gigs
        #
        message = "Memory Usage:"
        try:
            mem_resident = float(data['mem']['resident']) / 1024.0
            message += " %.2fGB resident," % (mem_resident)
        except:
            mem_resident = 0
            message += " resident unsupported,"
        try:
            mem_virtual = float(data['mem']['virtual']) / 1024.0
            message += " %.2fGB virtual," % mem_virtual
        except:
            mem_virtual = 0
            message += " virtual unsupported,"
        try:
            mem_mapped = float(data['mem']['mapped']) / 1024.0
            message += " %.2fGB mapped," % mem_mapped
        except:
            mem_mapped = 0
            message += " mapped unsupported,"
        try:
            mem_mapped_journal = float(data['mem']['mappedWithJournal']) / 1024.0
            message += " %.2fGB mappedWithJournal" % mem_mapped_journal
        except:
            mem_mapped_journal = 0
        message += performance_data(perf_data, [("%.2f" % mem_resident, "memory_usage", warning, critical),
                                               ("%.2f" % mem_mapped, "memory_mapped"), ("%.2f" % mem_virtual, "memory_virtual"), ("%.2f" % mem_mapped_journal, "mappedWithJournal")])
        #added for unsupported systems like Solaris
        if mapped_memory and mem_resident == 0:
            return check_levels(mem_mapped, warning, critical, message)
        else:
            return check_levels(mem_resident, warning, critical, message)

    except Exception, e:
        return exit_with_general_critical(e)


def check_memory_mapped(con, warning, critical, perf_data):
    #
    # These thresholds are basically meaningless, and must be customized to your application
    #
    warning = warning or 8
    critical = critical or 16
    try:
        data = get_server_status(con)
        if not data['mem']['supported']:
            print "OK - Platform not supported for memory info"
            return 0
        #
        # convert to gigs
        #
        message = "Memory Usage:"
        try:
            mem_mapped = float(data['mem']['mapped']) / 1024.0
            message += " %.2fGB mapped," % mem_mapped
        except:
            mem_mapped = -1
            message += " mapped unsupported,"
        try:
            mem_mapped_journal = float(data['mem']['mappedWithJournal']) / 1024.0
            message += " %.2fGB mappedWithJournal" % mem_mapped_journal
        except:
            mem_mapped_journal = 0
        message += performance_data(perf_data, [("%.2f" % mem_mapped, "memory_mapped"), ("%.2f" % mem_mapped_journal, "mappedWithJournal")])

        if not mem_mapped == -1:
            return check_levels(mem_mapped, warning, critical, message)
        else:
            print "OK - Server does not provide mem.mapped info"
            return 0

    except Exception, e:
        return exit_with_general_critical(e)


def check_lock(con, warning, critical, perf_data):
    warning = warning or 10
    critical = critical or 30
    try:
        data = get_server_status(con)
        #
        # calculate percentage
        #
        lock_percentage = float(data['globalLock']['lockTime']) / float(data['globalLock']['totalTime']) * 100
        message = "Lock Percentage: %.2f%%" % lock_percentage
        message += performance_data(perf_data, [("%.2f" % lock_percentage, "lock_percentage", warning, critical)])
        return check_levels(lock_percentage, warning, critical, message)

    except Exception, e:
        return exit_with_general_critical(e)


def check_flushing(con, warning, critical, avg, perf_data):
    #
    # These thresholds mean it's taking 5 seconds to perform a background flush to issue a warning
    # and 15 seconds to issue a critical.
    #
    warning = warning or 5000
    critical = critical or 15000
    try:
        data = get_server_status(con)
        if avg:
            flush_time = float(data['backgroundFlushing']['average_ms'])
            stat_type = "Average"
        else:
            flush_time = float(data['backgroundFlushing']['last_ms'])
            stat_type = "Last"

        message = "%s Flush Time: %.2fms" % (stat_type, flush_time)
        message += performance_data(perf_data, [("%.2fms" % flush_time, "%s_flush_time" % stat_type.lower(), warning, critical)])

        return check_levels(flush_time, warning, critical, message)

    except Exception, e:
        return exit_with_general_critical(e)


def index_miss_ratio(con, warning, critical, perf_data):
    warning = warning or 10
    critical = critical or 30
    try:
        data = get_server_status(con)

        try:
            serverVersion = tuple(con.server_info()['version'].split('.'))
            if serverVersion >= tuple("2.4.0".split(".")):
                miss_ratio = float(data['indexCounters']['missRatio'])
            else:
                miss_ratio = float(data['indexCounters']['btree']['missRatio'])
        except KeyError:
            not_supported_msg = "not supported on this platform"
            if data['indexCounters'].has_key('note'):
                print "OK - MongoDB says: " + not_supported_msg
                return 0
            else:
                print "WARNING - Can't get counter from MongoDB"
                return 1

        message = "Miss Ratio: %.2f" % miss_ratio
        message += performance_data(perf_data, [("%.2f" % miss_ratio, "index_miss_ratio", warning, critical)])

        return check_levels(miss_ratio, warning, critical, message)

    except Exception, e:
        return exit_with_general_critical(e)

def check_replset_quorum(con, perf_data):
    db = con['admin']
    warning = 1
    critical = 2
    primary = 0

    try:
        rs_members = db.command("replSetGetStatus")['members']

        for member in rs_members:
            if member['state'] == 1:
                primary += 1

        if primary == 1:
            state = 0
            message = "Cluster is quorate"
        else:
            state = 2
            message = "Cluster is not quorate and cannot operate"

        return check_levels(state, warning, critical, message)
    except Exception, e:
        return exit_with_general_critical(e)



def check_replset_state(con, perf_data, warning="", critical=""):
    try:
        warning = [int(x) for x in warning.split(",")]
    except:
        warning = [0, 3, 5, 9]
    try:
        critical = [int(x) for x in critical.split(",")]
    except:
        critical = [8, 4, -1]

    ok = range(-1, 8)  # should include the range of all possible values
    try:
        try:
            try:
                set_read_preference(con.admin)
                data = con.admin.command(pymongo.son_manipulator.SON([('replSetGetStatus', 1)]))
            except:
                data = con.admin.command(son.SON([('replSetGetStatus', 1)]))
            state = int(data['myState'])
        except pymongo.errors.OperationFailure, e:
            if e.code == None and str(e).find('failed: not running with --replSet"'):
                state = -1

        if state == 8:
            message = "State: %i (Down)" % state
        elif state == 4:
            message = "State: %i (Fatal error)" % state
        elif state == 0:
            message = "State: %i (Starting up, phase1)" % state
        elif state == 3:
            message = "State: %i (Recovering)" % state
        elif state == 5:
            message = "State: %i (Starting up, phase2)" % state
        elif state == 1:
            message = "State: %i (Primary)" % state
        elif state == 2:
            message = "State: %i (Secondary)" % state
        elif state == 7:
            message = "State: %i (Arbiter)" % state
        elif state == 9:
            message = "State: %i (Rollback)" % state
        elif state == -1:
            message = "Not running with replSet"
        else:
            message = "State: %i (Unknown state)" % state
        message += performance_data(perf_data, [(state, "state")])
        return check_levels(state, warning, critical, message, ok)
    except Exception, e:
        return exit_with_general_critical(e)


def check_databases(con, warning, critical, perf_data=None):
    try:
        try:
            set_read_preference(con.admin)
            data = con.admin.command(pymongo.son_manipulator.SON([('listDatabases', 1)]))
        except:
            data = con.admin.command(son.SON([('listDatabases', 1)]))

        count = len(data['databases'])
        message = "Number of DBs: %.0f" % count
        message += performance_data(perf_data, [(count, "databases", warning, critical, message)])
        return check_levels(count, warning, critical, message)
    except Exception, e:
        return exit_with_general_critical(e)


def check_collections(con, warning, critical, perf_data=None):
    try:
        try:
            set_read_preference(con.admin)
            data = con.admin.command(pymongo.son_manipulator.SON([('listDatabases', 1)]))
        except:
            data = con.admin.command(son.SON([('listDatabases', 1)]))

        count = 0
        for db in data['databases']:
            dbase = con[db['name']]
            set_read_preference(dbase)
            count += len(dbase.collection_names())

        message = "Number of collections: %.0f" % count
        message += performance_data(perf_data, [(count, "collections", warning, critical, message)])
        return check_levels(count, warning, critical, message)

    except Exception, e:
        return exit_with_general_critical(e)


def check_all_databases_size(con, warning, critical, perf_data):
    warning = warning or 100
    critical = critical or 1000
    try:
        set_read_preference(con.admin)
        all_dbs_data = con.admin.command(pymongo.son_manipulator.SON([('listDatabases', 1)]))
    except:
        all_dbs_data = con.admin.command(son.SON([('listDatabases', 1)]))

    total_storage_size = 0
    message = ""
    perf_data_param = [()]
    for db in all_dbs_data['databases']:
        database = db['name']
        data = con[database].command('dbstats')
        storage_size = round(data['storageSize'] / 1024 / 1024, 1)
        message += "; Database %s size: %.0f MB" % (database, storage_size)
        perf_data_param.append((storage_size, database + "_database_size"))
        total_storage_size += storage_size

    perf_data_param[0] = (total_storage_size, "total_size", warning, critical)
    message += performance_data(perf_data, perf_data_param)
    message = "Total size: %.0f MB" % total_storage_size + message
    return check_levels(total_storage_size, warning, critical, message)


def check_database_size(con, database, warning, critical, perf_data):
    warning = warning or 100
    critical = critical or 1000
    perfdata = ""
    try:
        set_read_preference(con.admin)
        data = con[database].command('dbstats')
        storage_size = data['storageSize'] / 1024 / 1024
        if perf_data:
            perfdata += " | database_size=%i;%i;%i" % (storage_size, warning, critical)
            #perfdata += " database=%s" %(database)

        if storage_size >= critical:
            print "CRITICAL - Database size: %.0f MB, Database: %s%s" % (storage_size, database, perfdata)
            return 2
        elif storage_size >= warning:
            print "WARNING - Database size: %.0f MB, Database: %s%s" % (storage_size, database, perfdata)
            return 1
        else:
            print "OK - Database size: %.0f MB, Database: %s%s" % (storage_size, database, perfdata)
            return 0
    except Exception, e:
        return exit_with_general_critical(e)


def check_database_indexes(con, database, warning, critical, perf_data):
    #
    # These thresholds are basically meaningless, and must be customized to your application
    #
    warning = warning or 100
    critical = critical or 1000
    perfdata = ""
    try:
        set_read_preference(con.admin)
        data = con[database].command('dbstats')
        index_size = data['indexSize'] / 1024 / 1024
        if perf_data:
            perfdata += " | database_indexes=%i;%i;%i" % (index_size, warning, critical)

        if index_size >= critical:
            print "CRITICAL - %s indexSize: %.0f MB %s" % (database, index_size, perfdata)
            return 2
        elif index_size >= warning:
            print "WARNING - %s indexSize: %.0f MB %s" % (database, index_size, perfdata)
            return 1
        else:
            print "OK - %s indexSize: %.0f MB %s" % (database, index_size, perfdata)
            return 0
    except Exception, e:
        return exit_with_general_critical(e)


def check_collection_indexes(con, database, collection, warning, critical, perf_data):
    #
    # These thresholds are basically meaningless, and must be customized to your application
    #
    warning = warning or 100
    critical = critical or 1000
    perfdata = ""
    try:
        set_read_preference(con.admin)
        data = con[database].command('collstats', collection)
        total_index_size = data['totalIndexSize'] / 1024 / 1024
        if perf_data:
            perfdata += " | collection_indexes=%i;%i;%i" % (total_index_size, warning, critical)

        if total_index_size >= critical:
            print "CRITICAL - %s.%s totalIndexSize: %.0f MB %s" % (database, collection, total_index_size, perfdata)
            return 2
        elif total_index_size >= warning:
            print "WARNING - %s.%s totalIndexSize: %.0f MB %s" % (database, collection, total_index_size, perfdata)
            return 1
        else:
            print "OK - %s.%s totalIndexSize: %.0f MB %s" % (database, collection, total_index_size, perfdata)
            return 0
    except Exception, e:
        return exit_with_general_critical(e)


def check_queues(con, warning, critical, perf_data):
    warning = warning or 10
    critical = critical or 30
    try:
        data = get_server_status(con)

        total_queues = float(data['globalLock']['currentQueue']['total'])
        readers_queues = float(data['globalLock']['currentQueue']['readers'])
        writers_queues = float(data['globalLock']['currentQueue']['writers'])
        message = "Current queue is : total = %d, readers = %d, writers = %d" % (total_queues, readers_queues, writers_queues)
        message += performance_data(perf_data, [(total_queues, "total_queues", warning, critical), (readers_queues, "readers_queues"), (writers_queues, "writers_queues")])
        return check_levels(total_queues, warning, critical, message)

    except Exception, e:
        return exit_with_general_critical(e)

def check_collection_size(con, database, collection, warning, critical, perf_data):
    warning = warning or 100
    critical = critical or 1000
    perfdata = ""
    try:
        set_read_preference(con.admin)
        data = con[database].command('collstats', collection)
        size = data['size'] / 1024 / 1024
        if perf_data:
            perfdata += " | collection_size=%i;%i;%i" % (size, warning, critical)

        if size >= critical:
            print "CRITICAL - %s.%s size: %.0f MB %s" % (database, collection, size, perfdata)
            return 2
        elif size >= warning:
            print "WARNING - %s.%s size: %.0f MB %s" % (database, collection, size, perfdata)
            return 1
        else:
            print "OK - %s.%s size: %.0f MB %s" % (database, collection, size, perfdata)
            return 0
    except Exception, e:
        return exit_with_general_critical(e)

def check_queries_per_second(con, query_type, warning, critical, perf_data):
    warning = warning or 250
    critical = critical or 500

    if query_type not in ['insert', 'query', 'update', 'delete', 'getmore', 'command']:
        return exit_with_general_critical("The query type of '%s' is not valid" % query_type)

    try:
        db = con.local
        data = get_server_status(con)

        # grab the count
        num = int(data['opcounters'][query_type])

        # do the math
        last_count = db.nagios_check.find_one({'check': 'query_counts'})
        try:
            ts = int(time.time())
            diff_query = num - last_count['data'][query_type]['count']
            diff_ts = ts - last_count['data'][query_type]['ts']

            query_per_sec = float(diff_query) / float(diff_ts)

            # update the count now
            db.nagios_check.update({u'_id': last_count['_id']}, {'$set': {"data.%s" % query_type: {'count': num, 'ts': int(time.time())}}})

            message = "Queries / Sec: %f" % query_per_sec
            message += performance_data(perf_data, [(query_per_sec, "%s_per_sec" % query_type, warning, critical, message)])
        except KeyError:
            #
            # since it is the first run insert it
            query_per_sec = 0
            message = "First run of check.. no data"
            db.nagios_check.update({u'_id': last_count['_id']}, {'$set': {"data.%s" % query_type: {'count': num, 'ts': int(time.time())}}})
        except TypeError:
            #
            # since it is the first run insert it
            query_per_sec = 0
            message = "First run of check.. no data"
            db.nagios_check.insert({'check': 'query_counts', 'data': {query_type: {'count': num, 'ts': int(time.time())}}})

        return check_levels(query_per_sec, warning, critical, message)

    except Exception, e:
        return exit_with_general_critical(e)


def check_oplog(con, warning, critical, perf_data):
    """ Checking the oplog time - the time of the log currently saved in the oplog collection
    defaults:
    critical 4 hours
    warning 24 hours
    those can be changed as usual with -C and -W parameters"""
    warning = warning or 24
    critical = critical or 4
    try:
        db = con.local
        ol = db.system.namespaces.find_one({"name": "local.oplog.rs"})
        if (db.system.namespaces.find_one({"name": "local.oplog.rs"}) != None):
            oplog = "oplog.rs"
        else:
            ol = db.system.namespaces.find_one({"name": "local.oplog.$main"})
            if (db.system.namespaces.find_one({"name": "local.oplog.$main"}) != None):
                oplog = "oplog.$main"
            else:
                message = "neither master/slave nor replica set replication detected"
                return check_levels(None, warning, critical, message)

        try:
            set_read_preference(con.admin)
            data = con.local.command(pymongo.son_manipulator.SON([('collstats', oplog)]))
        except:
            data = con.admin.command(son.SON([('collstats', oplog)]))

        ol_size = data['size']
        ol_storage_size = data['storageSize']
        ol_used_storage = int(float(ol_size) / ol_storage_size * 100 + 1)
        ol = con.local[oplog]
        firstc = ol.find().sort("$natural", pymongo.ASCENDING).limit(1)[0]['ts']
        lastc = ol.find().sort("$natural", pymongo.DESCENDING).limit(1)[0]['ts']
        time_in_oplog = (lastc.as_datetime() - firstc.as_datetime())
        message = "Oplog saves " + str(time_in_oplog) + " %d%% used" % ol_used_storage
        try:  # work starting from python2.7
            hours_in_oplog = time_in_oplog.total_seconds() / 60 / 60
        except:
            hours_in_oplog = float(time_in_oplog.seconds + time_in_oplog.days * 24 * 3600) / 60 / 60
        approx_level = hours_in_oplog * 100 / ol_used_storage
        message += performance_data(perf_data, [("%.2f" % hours_in_oplog, 'oplog_time', warning, critical), ("%.2f " % approx_level, 'oplog_time_100_percent_used')])
        return check_levels(-approx_level, -warning, -critical, message)

    except Exception, e:
        return exit_with_general_critical(e)


def check_journal_commits_in_wl(con, warning, critical, perf_data):
    """ Checking the number of commits which occurred in the db's write lock.
    Most commits are performed outside of this lock; committing while in the write lock is undesirable.
    Under very high write situations it is normal for this value to be nonzero. """

    warning = warning or 10
    critical = critical or 40
    try:
        data = get_server_status(con)
        j_commits_in_wl = data['dur']['commitsInWriteLock']
        message = "Journal commits in DB write lock : %d" % j_commits_in_wl
        message += performance_data(perf_data, [(j_commits_in_wl, "j_commits_in_wl", warning, critical)])
        return check_levels(j_commits_in_wl, warning, critical, message)

    except Exception, e:
        return exit_with_general_critical(e)

def check_journaled(con, warning, critical, perf_data):
    """ Checking the average amount of data in megabytes written to the recovery log in the last four seconds"""

    warning = warning or 20
    critical = critical or 40
    try:
        data = get_server_status(con)
        journaled = data['dur']['journaledMB']
        message = "Journaled : %.2f MB" % journaled
        message += performance_data(perf_data, [("%.2f" % journaled, "journaled", warning, critical)])
        return check_levels(journaled, warning, critical, message)

    except Exception, e:
        return exit_with_general_critical(e)

def check_write_to_datafiles(con, warning, critical, perf_data):
    """ Checking the average amount of data in megabytes written to the database's datafiles in the last four seconds.
    As these writes are already journaled, they can occur lazily, and thus the number indicated here may be lower
    than the amount physically written to disk."""
    warning = warning or 20
    critical = critical or 40
    try:
        data = get_server_status(con)
        writes = data['dur']['writeToDataFilesMB']
        message = "Write to data files : %.2f MB" % writes
        message += performance_data(perf_data, [("%.2f" % writes, "write_to_data_files", warning, critical)])
        return check_levels(writes, warning, critical, message)

    except Exception, e:
        return exit_with_general_critical(e)

def get_opcounters(data, opcounters_name, host):
    try:
        insert = data[opcounters_name]['insert']
        query = data[opcounters_name]['query']
        update = data[opcounters_name]['update']
        delete = data[opcounters_name]['delete']
        getmore = data[opcounters_name]['getmore']
        command = data[opcounters_name]['command']
    except KeyError, e:
        return 0, [0] * 100
    total_commands = insert + query + update + delete + getmore + command
    new_vals = [total_commands, insert, query, update, delete, getmore, command]
    return maintain_delta(new_vals, host, opcounters_name)

def check_opcounters(con, host, warning, critical, perf_data):
    """ A function to get all opcounters delta per minute. In case of a replication - gets the opcounters+opcountersRepl"""
    warning = warning or 10000
    critical = critical or 15000

    data = get_server_status(con)
    err1, delta_opcounters = get_opcounters(data, 'opcounters', host)
    err2, delta_opcounters_repl = get_opcounters(data, 'opcountersRepl', host)
    if err1 == 0 and err2 == 0:
        delta = [(x + y) for x, y in zip(delta_opcounters, delta_opcounters_repl)]
        delta[0] = delta_opcounters[0]  # only the time delta shouldn't be summarized
        per_minute_delta = [int(x / delta[0] * 60) for x in delta[1:]]
        message = "Opcounters: total=%d,insert=%d,query=%d,update=%d,delete=%d,getmore=%d,command=%d" % tuple(per_minute_delta)
        message += performance_data(perf_data, ([(per_minute_delta[0], "total", warning, critical), (per_minute_delta[1], "insert"),
                    (per_minute_delta[2], "query"), (per_minute_delta[3], "update"), (per_minute_delta[4], "delete"),
                    (per_minute_delta[5], "getmore"), (per_minute_delta[6], "command")]))
        return check_levels(per_minute_delta[0], warning, critical, message)
    else:
        return exit_with_general_critical("problem reading data from temp file")

def check_current_lock(con, host, warning, critical, perf_data):
    """ A function to get the current lock percentage, rather than the global one as the check_lock function does"""
    warning = warning or 10
    critical = critical or 30
    data = get_server_status(con)

    lockTime = float(data['globalLock']['lockTime'])
    totalTime = float(data['globalLock']['totalTime'])

    err, delta = maintain_delta([totalTime, lockTime], host, "locktime")
    if err == 0:
        lock_percentage = delta[2] / delta[1] * 100  # lockTime/totalTime*100
        message = "Current Lock Percentage: %.2f%%" % lock_percentage
        message += performance_data(perf_data, [("%.2f" % lock_percentage, "current_lock_percentage", warning, critical)])
        return check_levels(lock_percentage, warning, critical, message)
    else:
        return exit_with_general_warning("problem reading data from temp file")

def check_page_faults(con, host, warning, critical, perf_data):
    """ A function to get page faults per second from the system"""
    warning = warning or 10
    critical = critical or 30
    data = get_server_status(con)

    try:
        page_faults = float(data['extra_info']['page_faults'])
    except:
        # page_faults unsupported on the underlying system
        return exit_with_general_critical("page_faults unsupported on the underlying system")

    err, delta = maintain_delta([page_faults], host, "page_faults")
    if err == 0:
        page_faults_ps = delta[1] / delta[0]
        message = "Page faults : %.2f ps" % page_faults_ps
        message += performance_data(perf_data, [("%.2f" % page_faults_ps, "page_faults_ps", warning, critical)])
        return check_levels(page_faults_ps, warning, critical, message)
    else:
        return exit_with_general_warning("problem reading data from temp file")

def check_asserts(con, host, warning, critical, perf_data):
    """ A function to get asserts from the system"""
    warning = warning or 1
    critical = critical or 10
    data = get_server_status(con)

    asserts = data['asserts']

    #{ "regular" : 0, "warning" : 6, "msg" : 0, "user" : 12, "rollovers" : 0 }
    regular = asserts['regular']
    warning_asserts = asserts['warning']
    msg = asserts['msg']
    user = asserts['user']
    rollovers = asserts['rollovers']

    err, delta = maintain_delta([regular, warning_asserts, msg, user, rollovers], host, "asserts")

    if err == 0:
        if delta[5] != 0:
            # the number of rollovers has increased
            warning = -1  # no matter the metrics, this situation should raise a warning
            # if this is a normal rollover the warning will not appear again, but if
            # there are a lot of asserts the warning will stay for a long period of time,
            # although this is not a usual situation
        regular_ps = delta[1] / delta[0]
        warning_ps = delta[2] / delta[0]
        msg_ps = delta[3] / delta[0]
        user_ps = delta[4] / delta[0]
        rollovers_ps = delta[5] / delta[0]
        total_ps = regular_ps + warning_ps + msg_ps + user_ps
        message = "Total asserts : %.2f ps" % total_ps
        message += performance_data(perf_data, [(total_ps, "asserts_ps", warning, critical), (regular_ps, "regular"),
                    (warning_ps, "warning"), (msg_ps, "msg"), (user_ps, "user")])
        return check_levels(total_ps, warning, critical, message)
    else:
        return exit_with_general_warning("problem reading data from temp file")

def get_stored_primary_server_name(db):
    """ Get the stored primary server name from the db. """
    if "last_primary_server" in db.collection_names():
        stored_primary_server = db.last_primary_server.find_one()["server"]
    else:
        stored_primary_server = None

    return stored_primary_server

def check_replica_primary(con, host, warning, critical, perf_data, replicaset):
    """ A function to check if the primary server of a replica set has changed """
    if warning is None and critical is None:
        warning = 1
    warning = warning or 2
    critical = critical or 2

    primary_status = 0
    message = "Primary server has not changed"
    db = con["nagios"]
    data = get_server_status(con)
    if replicaset != data['repl'].get('setName'):
        message = "Replica set requested: %s differs from the one found: %s" % (replicaset, data['repl'].get('setName'))
        primary_status = 2
        return check_levels(primary_status, warning, critical, message)
    current_primary = data['repl'].get('primary')
    saved_primary = get_stored_primary_server_name(db)
    if current_primary is None:
        current_primary = "None"
    if saved_primary is None:
        saved_primary = "None"
    if current_primary != saved_primary:
        last_primary_server_record = {"server": current_primary}
        db.last_primary_server.update({"_id": "last_primary"}, {"$set": last_primary_server_record}, upsert=True, safe=True)
        message = "Primary server has changed from %s to %s" % (saved_primary, current_primary)
        primary_status = 1
    return check_levels(primary_status, warning, critical, message)

def check_page_faults(con, sample_time, warning, critical, perf_data):
    warning = warning or 10
    critical = critical or 20
    try:
        try:
            set_read_preference(con.admin)
            data1 = con.admin.command(pymongo.son_manipulator.SON([('serverStatus', 1)]))
            time.sleep(sample_time)
            data2 = con.admin.command(pymongo.son_manipulator.SON([('serverStatus', 1)]))
        except:
            data1 = con.admin.command(son.SON([('serverStatus', 1)]))
            time.sleep(sample_time)
            data2 = con.admin.command(son.SON([('serverStatus', 1)]))

        try:
            # on linux servers only
            page_faults = (int(data2['extra_info']['page_faults']) - int(data1['extra_info']['page_faults'])) / sample_time
        except KeyError:
            print "WARNING - Can't get extra_info.page_faults counter from MongoDB"
            sys.exit(1)

        message = "Page Faults: %i" % (page_faults)

        message += performance_data(perf_data, [(page_faults, "page_faults", warning, critical)])
        check_levels(page_faults, warning, critical, message)

    except Exception, e:
        exit_with_general_critical(e)

def chunks_balance(con, database, collection, warning, critical):
    warning = warning or 10
    critical = critical or 20
    nsfilter = database + "." + collection
    try:
        try:
            set_read_preference(con.admin)
            col = con.config.chunks
            nscount = col.find({"ns": nsfilter}).count()
            shards = col.distinct("shard")

        except:
            print "WARNING - Can't get chunks infos from MongoDB"
            sys.exit(1)

        if nscount == 0:
            print "WARNING - Namespace %s is not sharded" % (nsfilter)
            sys.exit(1)

        avgchunksnb = nscount / len(shards)
        warningnb = avgchunksnb * warning / 100
        criticalnb = avgchunksnb * critical / 100

        for shard in shards:
            delta = abs(avgchunksnb - col.find({"ns": nsfilter, "shard": shard}).count())
            message = "Namespace: %s, Shard name: %s, Chunk delta: %i" % (nsfilter, shard, delta)

            if delta >= criticalnb and delta > 0:
                print "CRITICAL - Chunks not well balanced " + message
                sys.exit(2)
            elif delta >= warningnb and delta > 0:
                print "WARNING - Chunks not well balanced " + message
                sys.exit(1)

        print "OK - Chunks well balanced across shards"
        sys.exit(0)

    except Exception, e:
        exit_with_general_critical(e)

def check_connect_primary(con, warning, critical, perf_data):
    warning = warning or 3
    critical = critical or 6

    try:
        try:
            set_read_preference(con.admin)
            data = con.admin.command(pymongo.son_manipulator.SON([('isMaster', 1)]))
        except:
            data = con.admin.command(son.SON([('isMaster', 1)]))

        if data['ismaster'] == True:
            print "OK - This server is primary"
            return 0

        phost = data['primary'].split(':')[0]
        pport = int(data['primary'].split(':')[1])
        start = time.time()

        err, con = mongo_connect(phost, pport)
        if err != 0:
            return err

        pconn_time = time.time() - start
        pconn_time = round(pconn_time, 0)
        message = "Connection to primary server " + data['primary'] + " took %i seconds" % pconn_time
        message += performance_data(perf_data, [(pconn_time, "connection_time", warning, critical)])

        return check_levels(pconn_time, warning, critical, message)

    except Exception, e:
        return exit_with_general_critical(e)

def check_collection_state(con, database, collection):
    try:
        con[database][collection].find_one()
        print "OK - Collection %s.%s is reachable " % (database, collection)
        return 0

    except Exception, e:
        return exit_with_general_critical(e)

def check_row_count(con, database, collection, warning, critical, perf_data):
    try:
        count = con[database][collection].count()
        message = "Row count: %i" % (count)
        message += performance_data(perf_data, [(count, "row_count", warning, critical)])

        return check_levels(count, warning, critical, message)

    except Exception, e:
        return exit_with_general_critical(e)

def build_file_name(host, action):
    # done this way so it will work when run independently and from shell
    module_name = re.match('(.*//*)*(.*)\..*', __file__).group(2)
    return "/tmp/" + module_name + "_data/" + host + "-" + action + ".data"

def ensure_dir(f):
    d = os.path.dirname(f)
    if not os.path.exists(d):
        os.makedirs(d)

def write_values(file_name, string):
    f = None
    try:
        f = open(file_name, 'w')
    except IOError, e:
        # try creating the containing directory
        if (e.errno == 2):
            ensure_dir(file_name)
            f = open(file_name, 'w')
        else:
            raise IOError(e)
    f.write(string)
    f.close()
    return 0

def read_values(file_name):
    data = None
    try:
        f = open(file_name, 'r')
        data = f.read()
        f.close()
        return 0, data
    except IOError, e:
        if (e.errno == 2):
            # no previous data
            return 1, ''
    except Exception, e:
        return 2, None

def calc_delta(old, new):
    delta = []
    if (len(old) != len(new)):
        raise Exception("unequal number of parameters")
    for i in range(0, len(old)):
        val = float(new[i]) - float(old[i])
        if val < 0:
            val = new[i]
        delta.append(val)
    return 0, delta

+
def maintain_delta(new_vals, host, action):
    file_name = build_file_name(host, action)
    err, data = read_values(file_name)
    old_vals = data.split(';')
    new_vals = [str(int(time.time()))] + new_vals
    delta = None
    try:
        err, delta = calc_delta(old_vals, new_vals)
    except:
        err = 2
    write_res = write_values(file_name, ";".join(str(x) for x in new_vals))
    return err + write_res, delta

+
def replication_get_time_diff(con):
    col = 'oplog.rs'
    local = con.local
    ol = local.system.namespaces.find_one({"name": "local.oplog.$main"})
    if ol:
        col = 'oplog.$main'
    firstc = local[col].find().sort("$natural", 1).limit(1)
    lastc = local[col].find().sort("$natural", -1).limit(1)
    first = firstc.next()
    last = lastc.next()
    tfirst = first["ts"]
    tlast = last["ts"]
    delta = tlast.time - tfirst.time
    return delta

#
# main app
#
if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))