sensu-plugins-mongodb-wt 2.2.0

checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA256:
3
+ metadata.gz: c1cebf301a79e401bb146049f805077f2b21ecd6d45a7a05c1173cc3ac804a66
4
+ data.tar.gz: d8b3757d1d6e2d7f813b4af224964949d4218c5354d45acbbe56ae5a5d178f41
5
+ SHA512:
6
+ metadata.gz: 0476135d459f5a6d3da206adfd0a9540fdeaefab5907200f3adfe27458884912408737426e1b02923c0ed54476bb2b7e757d4b75c8be2d1abaca21ae2a80351f
7
+ data.tar.gz: 3382cca4570b6987fa8eba474c62c06f56fde9b8ee497157c82123b7884c8e2388af6cb7104db56a85816cf08372757c9071efc6833bec3a1ddb8919147b9f80
data/CHANGELOG.md ADDED
@@ -0,0 +1,156 @@
1
+ # Change Log
2
+ This project adheres to [Semantic Versioning](http://semver.org/).
3
+
4
+ This CHANGELOG follows the format located [here](https://github.com/sensu-plugins/community/blob/master/HOW_WE_CHANGELOG.md)
5
+
6
+ ## [Unreleased]
7
+ ### Added
8
+ - added WiredTiger metrics (@jonathanschlue-as)
9
+
10
+ ## [2.1.0] - 2018-12-27
11
+ ### Added
12
+ - `bin/metrics-mongodb.rb`: added an `--exclude-db-sizes` option that removes database sizes from the payload; these values can be quite large, and the message broker (e.g. RabbitMQ) often needs special tuning to handle them (@mdzidic)
13
+
14
+ ## [2.0.2] - 2018-03-17
15
+ ### Fixed
16
+ - renamed library file `metics` to `metrics` and updated all references to it in the code (@majormoses)
17
+
18
+ ## [2.0.1] - 2017-10-19
19
+ ### Fixed
20
+ - updating the read preferences for `2.2`-`2.8` pymongo clients (@urg)
21
+
22
+ ## [2.0.0] - 2017-09-23
23
+ ### Breaking Change
24
+ - bumped requirement of `sensu-plugin` [to 2.0](https://github.com/sensu-plugins/sensu-plugin/blob/master/CHANGELOG.md#v200---2017-03-29) (@majormoses)
25
+
26
+ ### Fixed
27
+ - check-mongodb-metric.rb: make `--metric` required since it is (@majormoses)
28
+
29
+ ## [1.4.1] - 2017-09-23
30
+ ### Fixed
31
+ - Support for database size metrics (@fandrews)
32
+
33
+ ### Changed
34
+ - updated changelog guidelines location (@majormoses)
35
+
36
+ ## [1.4.0] - 2017-09-05
37
+ ### Added
38
+ - Support for returning replicaset state metrics (@naemono)
39
+ - Tests covering returning replicaset state metrics (@naemono)
40
+ - Ruby 2.4.1 testing
41
+
42
+ ## [1.3.0] - 2017-05-22
43
+ ### Added
44
+ - Support for database size metrics (@naemono)
45
+ - Tests covering returning database size metrics (@naemono)
46
+
47
+ ## [1.2.2] - 2017-05-08
48
+ ### Fixed
49
+ - `check-mongodb.py`: will now correctly crit on connection issues (@majormoses)
50
+ ## [1.2.1] - 2017-05-07
51
+ ### Fixed
52
+ - `check-mongodb.py`: fixed an issue with parameter building when connecting with or without SSL (@s-schweer)
53
+
54
+ ## [1.2.0] - 2017-03-06
55
+ ### Fixed
56
+ - `check-mongodb.py`: Set read preference for pymongo 2.2+ to fix 'General MongoDB Error: can't set attribute' (@boutetnico)
57
+ - `check-mongodb.py`: Fix mongo replication lag percent check showing password in plain text (@furbiesandbeans)
58
+ - `metrics-mongodb-replication.rb`: Sort replication members to ensure the primary is the first element (@gonzalo-radio)
59
+
60
+ ### Changed
61
+ - Update `mongo` gem to 2.4.1, which adds support for MongoDB 3.4 (@eheydrick)
62
+
63
+ ## [1.1.0] - 2016-10-17
64
+ ### Added
65
+ - Inclusion of check-mongodb-metrics.rb to perform checks against the same data metrics-mongodb.rb produces. (@stefano-pogliani)
66
+ - Inclusion of lib/sensu-plugins-mongodb/metics.rb to share metric collection logic. (@stefano-pogliani)
67
+ - Tests to the metrics processing shared code. (@stefano-pogliani)
68
+ - Support for SSL certificates for clients. (@b0d0nne11)
69
+ - Inclusion of metrics-mongodb-replication.rb to produce replication metrics including lag statistics (@stefano-pogliani)
70
+ - Updated metrics-mongodb.rb to include version checks to ensure execution in mongodb > 3.2.x (@RycroftSolutions)
71
+ - Additional metrics not included in original metrics-mongodb.rb (@RycroftSolutions)
72
+
73
+ ### Changed
74
+ - Moved most of metrics-mongodb.rb code to shared library. (@stefano-pogliani)
75
+ - MongoDB version checks to skip missing metrics. (@stefano-pogliani)
76
+ - Renamed some metrics to become standard with MongoDB 3.2 equivalent
77
+ (so checks/queries don't have to bother with version detection). (@stefano-pogliani)
78
+
79
+ ## [1.0.0] - 2016-06-03
80
+ ### Removed
81
+ - support for Rubies 1.9.3 and 2.0
82
+
83
+ ### Added
84
+ - support for Ruby 2.3
85
+
86
+ ### Changed
87
+ - Update to rubocop 0.40 and cleanup
88
+ - Update to mongo gem 2.2.x and bson 4.x for MongoDB 3.2 support
89
+
90
+ ### Fixed
91
+ - Long was added as a numeric type
92
+ - metrics-mongodb.rb: fix typo
93
+
94
+ ## [0.0.8] - 2016-03-04
95
+ ### Added
96
+ - Add a ruby wrapper script for check-mongodb.py
97
+
98
+ ### Changed
99
+ - Rubocop upgrade and cleanup
100
+
101
+ ## [0.0.7] - 2015-11-12
102
+ ### Fixed
103
+ - Stopped trying to gather indexCounters data from mongo 3 (metrics-mongodb.rb)
104
+
105
+ ### Changed
106
+ - Updated mongo gem to 1.12.3
107
+
108
+ ## [0.0.6] - 2015-10-13
109
+ ### Fixed
110
+ - Rename option to avoid naming conflict with class variable name
111
+ - Add message for replica set state 9 (rollback)
112
+ - Installation fix
113
+
114
+ ## [0.0.5] - 2015-09-04
115
+ ### Fixed
116
+ - Fixed non ssl mongo connections
117
+
118
+ ## [0.0.4] - 2015-08-12
119
+ ### Changed
120
+ - general gem cleanup
121
+ - bump rubocop
122
+
123
+ ## [0.0.3] - 2015-07-14
124
+ ### Changed
125
+ - updated sensu-plugin gem to 1.2.0
126
+
127
+ ## [0.0.2] - 2015-06-03
128
+ ### Fixed
129
+ - added binstubs
130
+
131
+ ### Changed
132
+ - removed cruft from /lib
133
+
134
+ ## 0.0.1 - 2015-05-20
135
+ ### Added
136
+ - initial release
137
+
138
+ [Unreleased]: https://github.com/sensu-plugins/sensu-plugins-mongodb/compare/2.1.0...HEAD
139
+ [2.1.0]: https://github.com/sensu-plugins/sensu-plugins-mongodb/compare/2.0.2...2.1.0
140
+ [2.0.2]: https://github.com/sensu-plugins/sensu-plugins-mongodb/compare/2.0.1...2.0.2
141
+ [2.0.1]: https://github.com/sensu-plugins/sensu-plugins-mongodb/compare/2.0.0...2.0.1
142
+ [2.0.0]: https://github.com/sensu-plugins/sensu-plugins-mongodb/compare/1.4.1...2.0.0
143
+ [1.4.1]: https://github.com/sensu-plugins/sensu-plugins-mongodb/compare/1.4.0...1.4.1
144
+ [1.4.0]: https://github.com/sensu-plugins/sensu-plugins-mongodb/compare/1.3.0...1.4.0
145
+ [1.3.0]: https://github.com/sensu-plugins/sensu-plugins-mongodb/compare/1.2.2...1.3.0
+ [1.2.2]: https://github.com/sensu-plugins/sensu-plugins-mongodb/compare/1.2.1...1.2.2
146
+ [1.2.1]: https://github.com/sensu-plugins/sensu-plugins-mongodb/compare/1.2.0...1.2.1
147
+ [1.2.0]: https://github.com/sensu-plugins/sensu-plugins-mongodb/compare/1.1.0...1.2.0
148
+ [1.1.0]: https://github.com/sensu-plugins/sensu-plugins-mongodb/compare/1.0.0...1.1.0
149
+ [1.0.0]: https://github.com/sensu-plugins/sensu-plugins-mongodb/compare/0.0.8...1.0.0
150
+ [0.0.8]: https://github.com/sensu-plugins/sensu-plugins-mongodb/compare/0.0.7...0.0.8
151
+ [0.0.7]: https://github.com/sensu-plugins/sensu-plugins-mongodb/compare/0.0.6...0.0.7
152
+ [0.0.6]: https://github.com/sensu-plugins/sensu-plugins-mongodb/compare/0.0.5...0.0.6
153
+ [0.0.5]: https://github.com/sensu-plugins/sensu-plugins-mongodb/compare/0.0.4...0.0.5
154
+ [0.0.4]: https://github.com/sensu-plugins/sensu-plugins-mongodb/compare/0.0.3...0.0.4
155
+ [0.0.3]: https://github.com/sensu-plugins/sensu-plugins-mongodb/compare/0.0.2...0.0.3
156
+ [0.0.2]: https://github.com/sensu-plugins/sensu-plugins-mongodb/compare/0.0.1...0.0.2
data/LICENSE ADDED
@@ -0,0 +1,22 @@
1
+ Copyright (c) 2015 Sensu-Plugins
2
+
3
+ MIT License
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining
6
+ a copy of this software and associated documentation files (the
7
+ "Software"), to deal in the Software without restriction, including
8
+ without limitation the rights to use, copy, modify, merge, publish,
9
+ distribute, sublicense, and/or sell copies of the Software, and to
10
+ permit persons to whom the Software is furnished to do so, subject to
11
+ the following conditions:
12
+
13
+ The above copyright notice and this permission notice shall be
14
+ included in all copies or substantial portions of the Software.
15
+
16
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
17
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
18
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
19
+ NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
20
+ LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
21
+ OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
22
+ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,26 @@
1
+ ## Sensu-Plugins-mongodb
2
+
3
+ [![Build Status](https://travis-ci.org/sensu-plugins/sensu-plugins-mongodb.svg?branch=master)](https://travis-ci.org/sensu-plugins/sensu-plugins-mongodb)
4
+ [![Gem Version](https://badge.fury.io/rb/sensu-plugins-mongodb.svg)](http://badge.fury.io/rb/sensu-plugins-mongodb)
5
+ [![Code Climate](https://codeclimate.com/github/sensu-plugins/sensu-plugins-mongodb/badges/gpa.svg)](https://codeclimate.com/github/sensu-plugins/sensu-plugins-mongodb)
6
+ [![Test Coverage](https://codeclimate.com/github/sensu-plugins/sensu-plugins-mongodb/badges/coverage.svg)](https://codeclimate.com/github/sensu-plugins/sensu-plugins-mongodb)
7
+ [![Dependency Status](https://gemnasium.com/sensu-plugins/sensu-plugins-mongodb.svg)](https://gemnasium.com/sensu-plugins/sensu-plugins-mongodb)
8
+
9
+ ## Functionality
10
+
11
+ ## Files
12
+ * bin/check-mongodb.py
13
+ * bin/check-mongodb.rb - wrapper for check-mongodb.py
14
+ * bin/check-mongodb-metric.rb
15
+ * bin/metrics-mongodb.rb
16
+ * bin/metrics-mongodb-replication.rb
17
+
18
+ ## Usage
19
+
20
+ ## Installation
21
+
22
+ [Installation and Setup](http://sensu-plugins.io/docs/installation_instructions.html)
23
+
24
+ ## Notes
25
+
26
+ The `pymongo` Python package needs to be installed to use `check-mongodb`.
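The `bin/check-mongodb.rb` wrapper referenced in the file list above is not included in this excerpt. As a purely illustrative sketch (not the gem's actual code), such a wrapper could do little more than forward its arguments to the bundled Python plugin and relay the Nagios-style exit status back to Sensu:

```ruby
#!/usr/bin/env ruby
# Hypothetical wrapper sketch only: forwards all arguments to check-mongodb.py
# and propagates its exit status (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).
script = File.expand_path('check-mongodb.py', __dir__)
ran = system('python', script, *ARGV) # nil means the interpreter could not be started
exit(ran.nil? ? 3 : $?.exitstatus)
```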
data/bin/check-mongodb-metric.rb ADDED
@@ -0,0 +1,144 @@
1
+ #! /usr/bin/env ruby
2
+ #
3
+ # check-mongodb-metric.rb
4
+ #
5
+ # DESCRIPTION:
6
+ #
7
+ # OUTPUT:
8
+ # plain text
9
+ #
10
+ # PLATFORMS:
11
+ # Linux
12
+ #
13
+ # DEPENDENCIES:
14
+ # gem: sensu-plugin
15
+ # gem: mongo
16
+ # gem: bson
17
+ # gem: bson_ext
18
+ #
19
+ # USAGE:
20
+ # #YELLOW
21
+ #
22
+ # NOTES:
23
+ #
24
+ # LICENSE:
25
+ # Copyright 2016 Conversocial https://github.com/conversocial
26
+ # Released under the same terms as Sensu (the MIT license); see LICENSE
27
+ # for details.
28
+ #
29
+
30
+ require 'sensu-plugin/check/cli'
31
+ require 'sensu-plugins-mongodb/metrics'
32
+ require 'mongo'
33
+ include Mongo
34
+
35
+ #
36
+ # Mongodb
37
+ #
38
+
39
+ class CheckMongodbMetric < Sensu::Plugin::Check::CLI
40
+ option :host,
41
+ description: 'MongoDB host',
42
+ long: '--host HOST',
43
+ default: 'localhost'
44
+
45
+ option :port,
46
+ description: 'MongoDB port',
47
+ long: '--port PORT',
48
+ default: 27_017
49
+
50
+ option :user,
51
+ description: 'MongoDB user',
52
+ long: '--user USER',
53
+ default: nil
54
+
55
+ option :password,
56
+ description: 'MongoDB password',
57
+ long: '--password PASSWORD',
58
+ default: nil
59
+
60
+ option :ssl,
61
+ description: 'Connect using SSL',
62
+ long: '--ssl',
63
+ default: false
64
+
65
+ option :ssl_cert,
66
+ description: 'The certificate file used to identify the local connection against mongod',
67
+ long: '--ssl-cert SSL_CERT',
68
+ default: ''
69
+
70
+ option :ssl_key,
71
+ description: 'The private key used to identify the local connection against mongod',
72
+ long: '--ssl-key SSL_KEY',
73
+ default: ''
74
+
75
+ option :ssl_ca_cert,
76
+ description: 'The set of concatenated CA certificates, which are used to validate certificates passed from the other end of the connection',
77
+ long: '--ssl-ca-cert SSL_CA_CERT',
78
+ default: ''
79
+
80
+ option :ssl_verify,
81
+ description: 'Whether or not to do peer certification validation',
82
+ long: '--ssl-verify',
83
+ default: false
84
+
85
+ option :debug,
86
+ description: 'Enable debug',
87
+ long: '--debug',
88
+ default: false
89
+
90
+ option :require_master,
91
+ description: 'Require the node to be a master node',
92
+ long: '--require-master',
93
+ default: false
94
+
95
+ option :metric,
96
+ description: 'Name of the metric to check',
97
+ long: '--metric METRIC',
98
+ short: '-m METRIC',
99
+ required: true
100
+
101
+ option :warn,
102
+ description: 'Warn if values are above this threshold',
103
+ short: '-w WARN',
104
+ proc: proc(&:to_i),
105
+ default: 0
106
+
107
+ option :crit,
108
+ description: 'Fail if values are above this threshold',
109
+ short: '-c CRIT',
110
+ proc: proc(&:to_i),
111
+ default: 0
112
+
113
+ def run
114
+ Mongo::Logger.logger.level = Logger::FATAL
115
+ @debug = config[:debug]
116
+ if @debug
117
+ Mongo::Logger.logger.level = Logger::DEBUG
118
+ config_debug = config.clone
119
+ config_debug[:password] = '***'
120
+ puts 'Arguments: ' + config_debug.inspect
121
+ end
122
+
123
+ # Get the metrics.
124
+ collector = SensuPluginsMongoDB::Metrics.new(config)
125
+ collector.connect_mongo_db('admin')
126
+ exit(1) if config[:require_master] && !collector.master?
127
+ metrics = collector.server_metrics
128
+
129
+ # Make sure the requested value is available.
130
+ unless metrics.key?(config[:metric])
131
+ unknown "Unable to find a value for metric '#{config[:metric]}'"
132
+ end
133
+
134
+ # Check the requested value against the thresholds.
135
+ value = metrics[config[:metric]]
136
+ if value >= config[:crit]
137
+ critical "The value of '#{config[:metric]}' exceeds #{config[:crit]}."
138
+ end
139
+ if value >= config[:warn]
140
+ warning "The value of '#{config[:metric]}' exceeds #{config[:warn]}."
141
+ end
142
+ ok "The value of '#{config[:metric]}' is below all threshold."
143
+ end
144
+ end
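The check above delegates all data collection to the shared `SensuPluginsMongoDB::Metrics` library. A minimal sketch of driving that collector directly, using only the calls the check itself makes (`connect_mongo_db`, `master?`, `server_metrics`) and assuming the same option-hash layout as the script's defaults:

```ruby
require 'sensu-plugins-mongodb/metrics'

# Assumed option hash mirroring the defaults of check-mongodb-metric.rb above.
config = { host: 'localhost', port: 27_017, user: nil, password: nil,
           ssl: false, debug: false }

collector = SensuPluginsMongoDB::Metrics.new(config)
collector.connect_mongo_db('admin')       # connect against the admin database

puts "master: #{collector.master?}"       # true when this node is the primary
collector.server_metrics.each do |name, value|
  puts "#{name} #{value}"                 # flat hash of metric name => value
end
```

This is the same flow `check-mongodb-metric.rb` follows before comparing a single metric against its `-w`/`-c` thresholds.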
data/bin/check-mongodb.py ADDED
@@ -0,0 +1,1471 @@
1
+ #!/usr/bin/env python
2
+
3
+ #
4
+ # A MongoDB Nagios check script
5
+ #
6
+
7
+ # Script idea taken from a Tag1 script I found and I modified it a lot
8
+ #
9
+ # Main Author
10
+ # - Mike Zupan <mike@zcentric.com>
11
+ # Contributors
12
+ # - Frank Brandewiede <brande@travel-iq.com> <brande@bfiw.de> <brande@novolab.de>
13
+ # - Sam Perman <sam@brightcove.com>
14
+ # - Shlomo Priymak <shlomoid@gmail.com>
15
+ # - @jhoff909 on github
16
+ # - @jbraeuer on github
17
+ # - Dag Stockstad <dag.stockstad@gmail.com>
18
+ # - @Andor on github
19
+ # - Steven Richards - Captainkrtek on Github <sbrichards@mit.edu>
20
+ #
21
+
22
+ # License: BSD
23
+ # Copyright (c) 2012, Mike Zupan <mike@zcentric.com>
24
+ # All rights reserved.
25
+ # Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
26
+ #
27
+ # Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
28
+ # Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the
29
+ # documentation and/or other materials provided with the distribution.
30
+ # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
31
+ # THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS
32
+ # BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE
33
+ # GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
34
+ # STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
35
+ #
36
+ # README: https://github.com/mzupan/nagios-plugin-mongodb/blob/master/LICENSE
37
+
38
+ # #RED
39
+ import sys
40
+ import time
41
+ import optparse
42
+ import textwrap
43
+ import re
44
+ import os
45
+
46
+ try:
47
+ import pymongo
48
+ except ImportError, e:
49
+ print e
50
+ sys.exit(2)
51
+
52
+ # As of pymongo v 1.9 the SON API is part of the BSON package, therefore attempt
53
+ # to import from there and fall back to pymongo in cases of older pymongo
54
+ if pymongo.version >= "1.9":
55
+ import bson.son as son
56
+ else:
57
+ import pymongo.son as son
58
+
59
+
60
+ #
61
+ # thanks to http://stackoverflow.com/a/1229667/72987
62
+ #
63
+ def optional_arg(arg_default):
64
+ def func(option, opt_str, value, parser):
65
+ if parser.rargs and not parser.rargs[0].startswith('-'):
66
+ val = parser.rargs[0]
67
+ parser.rargs.pop(0)
68
+ else:
69
+ val = arg_default
70
+ setattr(parser.values, option.dest, val)
71
+ return func
72
+
73
+
74
+ def performance_data(perf_data, params):
75
+ data = ''
76
+ if perf_data:
77
+ data = " |"
78
+ for p in params:
79
+ p += (None, None, None, None)
80
+ param, param_name, warning, critical = p[0:4]
81
+ data += "%s=%s" % (param_name, str(param))
82
+ if warning or critical:
83
+ warning = warning or 0
84
+ critical = critical or 0
85
+ data += ";%s;%s" % (warning, critical)
86
+
87
+ data += " "
88
+
89
+ return data
90
+
91
+
92
+ def numeric_type(param):
93
+ if ((type(param) == float or type(param) == int or type(param) == long or param == None)):
94
+ return True
95
+ return False
96
+
97
+
98
+ def check_levels(param, warning, critical, message, ok=[]):
99
+ if (numeric_type(critical) and numeric_type(warning)):
100
+ if param >= critical:
101
+ print "CRITICAL - " + message
102
+ sys.exit(2)
103
+ elif param >= warning:
104
+ print "WARNING - " + message
105
+ sys.exit(1)
106
+ else:
107
+ print "OK - " + message
108
+ sys.exit(0)
109
+ else:
110
+ if param in critical:
111
+ print "CRITICAL - " + message
112
+ sys.exit(2)
113
+
114
+ if param in warning:
115
+ print "WARNING - " + message
116
+ sys.exit(1)
117
+
118
+ if param in ok:
119
+ print "OK - " + message
120
+ sys.exit(0)
121
+
122
+ # unexpected param value
123
+ print "CRITICAL - Unexpected value : %d" % param + "; " + message
124
+ return 2
125
+
126
+
127
+ def get_server_status(con):
128
+ try:
129
+ set_read_preference(con.admin)
130
+ data = con.admin.command(pymongo.son_manipulator.SON([('serverStatus', 1)]))
131
+ except:
132
+ data = con.admin.command(son.SON([('serverStatus', 1)]))
133
+ return data
134
+
135
+
136
+ def main(argv):
137
+ p = optparse.OptionParser(conflict_handler="resolve", description="This Nagios plugin checks the health of mongodb.")
138
+
139
+ p.add_option('-H', '--host', action='store', type='string', dest='host', default='127.0.0.1', help='The hostname you want to connect to')
140
+ p.add_option('-P', '--port', action='store', type='int', dest='port', default=27017, help='The port mongodb is running on')
141
+ p.add_option('-u', '--user', action='store', type='string', dest='user', default=None, help='The username you want to login as')
142
+ p.add_option('-p', '--pass', action='store', type='string', dest='passwd', default=None, help='The password you want to use for that user')
143
+ p.add_option('-W', '--warning', action='store', dest='warning', default=None, help='The warning threshold we want to set')
144
+ p.add_option('-C', '--critical', action='store', dest='critical', default=None, help='The critical threshold we want to set')
145
+ p.add_option('-A', '--action', action='store', type='choice', dest='action', default='connect', help='The action you want to take',
146
+ choices=['connect', 'connections', 'replication_lag', 'replication_lag_percent', 'replset_state', 'memory', 'memory_mapped', 'lock',
147
+ 'flushing', 'last_flush_time', 'index_miss_ratio', 'databases', 'collections', 'database_size', 'database_indexes', 'collection_indexes', 'collection_size',
148
+ 'queues', 'oplog', 'journal_commits_in_wl', 'write_data_files', 'journaled', 'opcounters', 'current_lock', 'replica_primary', 'page_faults',
149
+ 'asserts', 'queries_per_second', 'page_faults', 'chunks_balance', 'connect_primary', 'collection_state', 'row_count', 'replset_quorum'])
150
+ p.add_option('--max-lag', action='store_true', dest='max_lag', default=False, help='Get max replication lag (for replication_lag action only)')
151
+ p.add_option('--mapped-memory', action='store_true', dest='mapped_memory', default=False, help='Get mapped memory instead of resident (if resident memory can not be read)')
152
+ p.add_option('-D', '--perf-data', action='store_true', dest='perf_data', default=False, help='Enable output of Nagios performance data')
153
+ p.add_option('-d', '--database', action='store', dest='database', default='admin', help='Specify the database to check')
154
+ p.add_option('--all-databases', action='store_true', dest='all_databases', default=False, help='Check all databases (action database_size)')
155
+ p.add_option('-s', '--ssl-enabled', dest='ssl_enabled', default=False, action='callback', callback=optional_arg(True), help='Connect using SSL')
156
+ p.add_option('-e', '--ssl-certfile', dest='ssl_certfile', default=None, action='store', help='The certificate file used to identify the local connection against mongod')
157
+ p.add_option('-k', '--ssl-keyfile', dest='ssl_keyfile', default=None, action='store', help='The private key used to identify the local connection against mongod')
158
+ p.add_option('-a', '--ssl-ca-certs', dest='ssl_ca_certs', default=None, action='store', help='The set of concatenated CA certificates, which are used to validate certificates passed from the other end of the connection')
159
+ p.add_option('-r', '--replicaset', dest='replicaset', default=None, action='callback', callback=optional_arg(True), help='Connect to replicaset')
160
+ p.add_option('-q', '--querytype', action='store', dest='query_type', default='query', help='The query type to check [query|insert|update|delete|getmore|command] from queries_per_second')
161
+ p.add_option('-c', '--collection', action='store', dest='collection', default='admin', help='Specify the collection to check')
162
+ p.add_option('-T', '--time', action='store', type='int', dest='sample_time', default=1, help='Time used to sample the number of page faults')
163
+
164
+ options, arguments = p.parse_args()
165
+ host = options.host
166
+ port = options.port
167
+ user = options.user
168
+ passwd = options.passwd
169
+ query_type = options.query_type
170
+ collection = options.collection
171
+ sample_time = options.sample_time
172
+ if (options.action == 'replset_state'):
173
+ warning = str(options.warning or "")
174
+ critical = str(options.critical or "")
175
+ else:
176
+ warning = float(options.warning or 0)
177
+ critical = float(options.critical or 0)
178
+
179
+ action = options.action
180
+ perf_data = options.perf_data
181
+ max_lag = options.max_lag
182
+ database = options.database
183
+ ssl_enabled = options.ssl_enabled
184
+ ssl_certfile = options.ssl_certfile
185
+ ssl_keyfile = options.ssl_keyfile
186
+ ssl_ca_certs = options.ssl_ca_certs
187
+ replicaset = options.replicaset
188
+
189
+ if action == 'replica_primary' and replicaset is None:
190
+ return "replicaset must be passed in when using replica_primary check"
191
+ elif not action == 'replica_primary' and replicaset:
192
+ return "passing a replicaset while not checking replica_primary does not work"
193
+
194
+ #
195
+ # moving the login up here and passing in the connection
196
+ #
197
+ start = time.time()
198
+ err, con = mongo_connect(host, port, ssl_enabled, ssl_certfile, ssl_keyfile, ssl_ca_certs, user, passwd, replicaset)
199
+ if err != 0:
200
+ return err
201
+
202
+ conn_time = time.time() - start
203
+ conn_time = round(conn_time, 0)
204
+
205
+ if action == "connections":
206
+ return check_connections(con, warning, critical, perf_data)
207
+ elif action == "replication_lag":
208
+ return check_rep_lag(con, host, port, warning, critical, False, perf_data, max_lag, user, passwd)
209
+ elif action == "replication_lag_percent":
210
+ return check_rep_lag(con, host, port, warning, critical, True, perf_data, max_lag, user, passwd)
211
+ elif action == "replset_state":
212
+ return check_replset_state(con, perf_data, warning, critical)
213
+ elif action == "memory":
214
+ return check_memory(con, warning, critical, perf_data, options.mapped_memory)
215
+ elif action == "memory_mapped":
216
+ return check_memory_mapped(con, warning, critical, perf_data)
217
+ elif action == "queues":
218
+ return check_queues(con, warning, critical, perf_data)
219
+ elif action == "lock":
220
+ return check_lock(con, warning, critical, perf_data)
221
+ elif action == "current_lock":
222
+ return check_current_lock(con, host, warning, critical, perf_data)
223
+ elif action == "flushing":
224
+ return check_flushing(con, warning, critical, True, perf_data)
225
+ elif action == "last_flush_time":
226
+ return check_flushing(con, warning, critical, False, perf_data)
227
+ elif action == "index_miss_ratio":
228
+ index_miss_ratio(con, warning, critical, perf_data)
229
+ elif action == "databases":
230
+ return check_databases(con, warning, critical, perf_data)
231
+ elif action == "collections":
232
+ return check_collections(con, warning, critical, perf_data)
233
+ elif action == "oplog":
234
+ return check_oplog(con, warning, critical, perf_data)
235
+ elif action == "journal_commits_in_wl":
236
+ return check_journal_commits_in_wl(con, warning, critical, perf_data)
237
+ elif action == "database_size":
238
+ if options.all_databases:
239
+ return check_all_databases_size(con, warning, critical, perf_data)
240
+ else:
241
+ return check_database_size(con, database, warning, critical, perf_data)
242
+ elif action == "database_indexes":
243
+ return check_database_indexes(con, database, warning, critical, perf_data)
244
+ elif action == "collection_indexes":
245
+ return check_collection_indexes(con, database, collection, warning, critical, perf_data)
246
+ elif action == "collection_size":
247
+ return check_collection_size(con, database, collection, warning, critical, perf_data)
248
+ elif action == "journaled":
249
+ return check_journaled(con, warning, critical, perf_data)
250
+ elif action == "write_data_files":
251
+ return check_write_to_datafiles(con, warning, critical, perf_data)
252
+ elif action == "opcounters":
253
+ return check_opcounters(con, host, warning, critical, perf_data)
254
+ elif action == "asserts":
255
+ return check_asserts(con, host, warning, critical, perf_data)
256
+ elif action == "replica_primary":
257
+ return check_replica_primary(con, host, warning, critical, perf_data, replicaset)
258
+ elif action == "queries_per_second":
259
+ return check_queries_per_second(con, query_type, warning, critical, perf_data)
260
+ elif action == "page_faults":
261
+ check_page_faults(con, sample_time, warning, critical, perf_data)
262
+ elif action == "chunks_balance":
263
+ chunks_balance(con, database, collection, warning, critical)
264
+ elif action == "connect_primary":
265
+ return check_connect_primary(con, warning, critical, perf_data)
266
+ elif action == "collection_state":
267
+ return check_collection_state(con, database, collection)
268
+ elif action == "row_count":
269
+ return check_row_count(con, database, collection, warning, critical, perf_data)
270
+ elif action == "replset_quorum":
271
+ return check_replset_quorum(con, perf_data)
272
+ else:
273
+ return check_connect(host, port, warning, critical, perf_data, user, passwd, conn_time)
274
+
275
+
276
+ def mongo_connect(host=None, port=None, ssl_enabled=False, ssl_certfile=None, ssl_keyfile=None, ssl_ca_certs=None, user=None, passwd=None, replica=None):
277
+ try:
278
+ # ssl connection for pymongo > 2.3
279
+ if pymongo.version >= "2.3":
280
+ if replica is None:
281
+ if ssl_enabled:
282
+ con = pymongo.MongoClient(host, port, ssl=ssl_enabled, ssl_certfile=ssl_certfile, ssl_keyfile=ssl_keyfile, ssl_ca_certs=ssl_ca_certs)
283
+ else:
284
+ con = pymongo.MongoClient(host, port)
285
+ else:
286
+ if ssl_enabled:
287
+ con = pymongo.Connection(host, port, read_preference=pymongo.ReadPreference.SECONDARY, ssl=ssl_enabled, ssl_certfile=ssl_certfile, ssl_keyfile=ssl_keyfile, ssl_ca_certs=ssl_ca_certs, replicaSet=replica, network_timeout=10)
288
+ else:
289
+ con = pymongo.Connection(host, port, read_preference=pymongo.ReadPreference.SECONDARY, replicaSet=replica, network_timeout=10)
290
+ try:
291
+ # https://api.mongodb.com/python/current/api/pymongo/mongo_client.html
292
+ # The ismaster command is cheap and does not require auth.
293
+ con.admin.command('ismaster', connectTimeoutMS=10000)
294
+ except Exception, e:
295
+ return exit_with_general_critical(e), None
296
+ else:
297
+ if replica is None:
298
+ con = pymongo.Connection(host, port, slave_okay=True, network_timeout=10)
299
+ else:
300
+ con = pymongo.Connection(host, port, slave_okay=True, network_timeout=10)
301
+ #con = pymongo.Connection(host, port, slave_okay=True, replicaSet=replica, network_timeout=10)
302
+
303
+ if user and passwd:
304
+ db = con["admin"]
305
+ if not db.authenticate(user, passwd):
306
+ sys.exit("Username/Password incorrect")
307
+ except Exception, e:
308
+ if isinstance(e, pymongo.errors.AutoReconnect) and str(e).find(" is an arbiter") != -1:
309
+ # We got a pymongo AutoReconnect exception that tells us we connected to an Arbiter Server
310
+ # This means: Arbiter is reachable and can answer requests/votes - this is all we need to know from an arbiter
311
+ print "OK - State: 7 (Arbiter)"
312
+ sys.exit(0)
313
+ return exit_with_general_critical(e), None
314
+ return 0, con
315
+
316
+
317
+ def exit_with_general_warning(e):
318
+ if isinstance(e, SystemExit):
319
+ return e
320
+ else:
321
+ print "WARNING - General MongoDB warning:", e
322
+ return 1
323
+
324
+
325
+ def exit_with_general_critical(e):
326
+ if isinstance(e, SystemExit):
327
+ return e
328
+ else:
329
+ print "CRITICAL - General MongoDB Error:", e
330
+ return 2
331
+
332
+
333
+ def set_read_preference(db):
334
+ if pymongo.version >= "2.2" and pymongo.version < "2.8":
335
+ pymongo.read_preferences.Secondary
336
+ else:
337
+ db.read_preference = pymongo.ReadPreference.SECONDARY
338
+
339
+
340
+ def check_connect(host, port, warning, critical, perf_data, user, passwd, conn_time):
341
+ warning = warning or 3
342
+ critical = critical or 6
343
+ message = "Connection took %i seconds" % conn_time
344
+ message += performance_data(perf_data, [(conn_time, "connection_time", warning, critical)])
345
+
346
+ return check_levels(conn_time, warning, critical, message)
347
+
348
+
349
+ def check_connections(con, warning, critical, perf_data):
350
+ warning = warning or 80
351
+ critical = critical or 95
352
+ try:
353
+ data = get_server_status(con)
354
+
355
+ current = float(data['connections']['current'])
356
+ available = float(data['connections']['available'])
357
+
358
+ used_percent = int(float(current / (available + current)) * 100)
359
+ message = "%i percent (%i of %i connections) used" % (used_percent, current, current + available)
360
+ message += performance_data(perf_data, [(used_percent, "used_percent", warning, critical),
361
+ (current, "current_connections"),
362
+ (available, "available_connections")])
363
+ return check_levels(used_percent, warning, critical, message)
364
+
365
+ except Exception, e:
366
+ return exit_with_general_critical(e)
367
+
368
+
369
+ def check_rep_lag(con, host, port, warning, critical, percent, perf_data, max_lag, user, passwd):
370
+ # Get mongo to tell us replica set member name when connecting locally
371
+ if "127.0.0.1" == host:
372
+ host = con.admin.command("ismaster","1")["me"].split(':')[0]
373
+
374
+ if percent:
375
+ warning = warning or 50
376
+ critical = critical or 75
377
+ else:
378
+ warning = warning or 600
379
+ critical = critical or 3600
380
+ rs_status = {}
381
+ slaveDelays = {}
382
+ try:
383
+ set_read_preference(con.admin)
384
+
385
+ # Get replica set status
386
+ try:
387
+ rs_status = con.admin.command("replSetGetStatus")
388
+ except pymongo.errors.OperationFailure, e:
389
+ if e.code == None and str(e).find('failed: not running with --replSet"'):
390
+ print "OK - Not running with replSet"
391
+ return 0
392
+
393
+ serverVersion = tuple(con.server_info()['version'].split('.'))
394
+ if serverVersion >= tuple("2.0.0".split(".")):
395
+ #
396
+ # check for version greater then 2.0
397
+ #
398
+ rs_conf = con.local.system.replset.find_one()
399
+ for member in rs_conf['members']:
400
+ if member.get('slaveDelay') is not None:
401
+ slaveDelays[member['host']] = member.get('slaveDelay')
402
+ else:
403
+ slaveDelays[member['host']] = 0
404
+
405
+ # Find the primary and/or the current node
406
+ primary_node = None
407
+ host_node = None
408
+
409
+ for member in rs_status["members"]:
410
+ if member["stateStr"] == "PRIMARY":
411
+ primary_node = member
412
+ if member["name"].split(':')[0] == host and int(member["name"].split(':')[1]) == port:
413
+ host_node = member
414
+
415
+ # Check if we're in the middle of an election and don't have a primary
416
+ if primary_node is None:
417
+ print "WARNING - No primary defined. In an election?"
418
+ return 1
419
+
420
+ # Check if we failed to find the current host
421
+ # below should never happen
422
+ if host_node is None:
423
+ print "CRITICAL - Unable to find host '" + host + "' in replica set."
424
+ return 2
425
+
426
+ # Is the specified host the primary?
427
+ if host_node["stateStr"] == "PRIMARY":
428
+ if max_lag == False:
429
+ print "OK - This is the primary."
430
+ return 0
431
+ else:
432
+ #get the maximal replication lag
433
+ data = ""
434
+ maximal_lag = 0
435
+ for member in rs_status['members']:
436
+ if not member['stateStr'] == "ARBITER":
437
+ lastSlaveOpTime = member['optimeDate']
438
+ replicationLag = abs(primary_node["optimeDate"] - lastSlaveOpTime).seconds - slaveDelays[member['name']]
439
+ data = data + member['name'] + " lag=%d;" % replicationLag
440
+ maximal_lag = max(maximal_lag, replicationLag)
441
+ if percent:
442
+ err, con = mongo_connect(primary_node['name'].split(':')[0], int(primary_node['name'].split(':')[1]), False, user=user, passwd=passwd)
443
+ if err != 0:
444
+ return err
445
+ primary_timediff = replication_get_time_diff(con)
446
+ maximal_lag = int(float(maximal_lag) / float(primary_timediff) * 100)
447
+ message = "Maximal lag is " + str(maximal_lag) + " percents"
448
+ message += performance_data(perf_data, [(maximal_lag, "replication_lag_percent", warning, critical)])
449
+ else:
450
+ message = "Maximal lag is " + str(maximal_lag) + " seconds"
451
+ message += performance_data(perf_data, [(maximal_lag, "replication_lag", warning, critical)])
452
+ return check_levels(maximal_lag, warning, critical, message)
453
+ elif host_node["stateStr"] == "ARBITER":
454
+ print "OK - This is an arbiter"
455
+ return 0
456
+
457
+ # Find the difference in optime between current node and PRIMARY
458
+
459
+ optime_lag = abs(primary_node["optimeDate"] - host_node["optimeDate"])
460
+
461
+ if host_node['name'] in slaveDelays:
462
+ slave_delay = slaveDelays[host_node['name']]
463
+ elif host_node['name'].endswith(':27017') and host_node['name'][:-len(":27017")] in slaveDelays:
464
+ slave_delay = slaveDelays[host_node['name'][:-len(":27017")]]
465
+ else:
466
+ raise Exception("Unable to determine slave delay for {0}".format(host_node['name']))
467
+
468
+ try: # work starting from python2.7
469
+ lag = optime_lag.total_seconds()
470
+ except:
471
+ lag = float(optime_lag.seconds + optime_lag.days * 24 * 3600)
472
+
473
+ if percent:
474
+ err, con = mongo_connect(primary_node['name'].split(':')[0], int(primary_node['name'].split(':')[1]), False, user=user, passwd=passwd)
475
+ if err != 0:
476
+ return err
477
+ primary_timediff = replication_get_time_diff(con)
478
+ if primary_timediff != 0:
479
+ lag = int(float(lag) / float(primary_timediff) * 100)
480
+ else:
481
+ lag = 0
482
+ message = "Lag is " + str(lag) + " percents"
483
+ message += performance_data(perf_data, [(lag, "replication_lag_percent", warning, critical)])
484
+ else:
485
+ message = "Lag is " + str(lag) + " seconds"
486
+ message += performance_data(perf_data, [(lag, "replication_lag", warning, critical)])
487
+ return check_levels(lag, warning + slaveDelays[host_node['name']], critical + slaveDelays[host_node['name']], message)
488
+ else:
489
+ #
490
+ # less than 2.0 check
491
+ #
492
+ # Get replica set status
493
+ rs_status = con.admin.command("replSetGetStatus")
494
+
495
+ # Find the primary and/or the current node
496
+ primary_node = None
497
+ host_node = None
498
+ for member in rs_status["members"]:
499
+ if member["stateStr"] == "PRIMARY":
500
+ primary_node = (member["name"], member["optimeDate"])
501
+ if member["name"].split(":")[0].startswith(host):
502
+ host_node = member
503
+
504
+ # Check if we're in the middle of an election and don't have a primary
505
+ if primary_node is None:
506
+ print "WARNING - No primary defined. In an election?"
507
+ sys.exit(1)
508
+
509
+ # Is the specified host the primary?
510
+ if host_node["stateStr"] == "PRIMARY":
511
+ print "OK - This is the primary."
512
+ sys.exit(0)
513
+
514
+ # Find the difference in optime between current node and PRIMARY
515
+ optime_lag = abs(primary_node[1] - host_node["optimeDate"])
516
+ lag = optime_lag.seconds
517
+ if percent:
518
+ err, con = mongo_connect(primary_node['name'].split(':')[0], int(primary_node['name'].split(':')[1]))
519
+ if err != 0:
520
+ return err
521
+ primary_timediff = replication_get_time_diff(con)
522
+ lag = int(float(lag) / float(primary_timediff) * 100)
523
+ message = "Lag is " + str(lag) + " percents"
524
+ message += performance_data(perf_data, [(lag, "replication_lag_percent", warning, critical)])
525
+ else:
526
+ message = "Lag is " + str(lag) + " seconds"
527
+ message += performance_data(perf_data, [(lag, "replication_lag", warning, critical)])
528
+ return check_levels(lag, warning, critical, message)
529
+
530
+ except Exception, e:
531
+ return exit_with_general_critical(e)
532
+
533
+
534
+ def check_memory(con, warning, critical, perf_data, mapped_memory):
535
+ #
536
+ # These thresholds are basically meaningless, and must be customized to your system's ram
537
+ #
538
+ warning = warning or 8
539
+ critical = critical or 16
540
+ try:
541
+ data = get_server_status(con)
542
+ if not data['mem']['supported'] and not mapped_memory:
543
+ print "OK - Platform not supported for memory info"
544
+ return 0
545
+ #
546
+ # convert to gigs
547
+ #
548
+ message = "Memory Usage:"
549
+ try:
550
+ mem_resident = float(data['mem']['resident']) / 1024.0
551
+ message += " %.2fGB resident," % (mem_resident)
552
+ except:
553
+ mem_resident = 0
554
+ message += " resident unsupported,"
555
+ try:
556
+ mem_virtual = float(data['mem']['virtual']) / 1024.0
557
+ message += " %.2fGB virtual," % mem_virtual
558
+ except:
559
+ mem_virtual = 0
560
+ message += " virtual unsupported,"
561
+ try:
562
+ mem_mapped = float(data['mem']['mapped']) / 1024.0
563
+ message += " %.2fGB mapped," % mem_mapped
564
+ except:
565
+ mem_mapped = 0
566
+ message += " mapped unsupported,"
567
+ try:
568
+ mem_mapped_journal = float(data['mem']['mappedWithJournal']) / 1024.0
569
+ message += " %.2fGB mappedWithJournal" % mem_mapped_journal
570
+ except:
571
+ mem_mapped_journal = 0
572
+ message += performance_data(perf_data, [("%.2f" % mem_resident, "memory_usage", warning, critical),
573
+ ("%.2f" % mem_mapped, "memory_mapped"), ("%.2f" % mem_virtual, "memory_virtual"), ("%.2f" % mem_mapped_journal, "mappedWithJournal")])
574
+ #added for unsupported systems like Solaris
575
+ if mapped_memory and mem_resident == 0:
576
+ return check_levels(mem_mapped, warning, critical, message)
577
+ else:
578
+ return check_levels(mem_resident, warning, critical, message)
579
+
580
+ except Exception, e:
581
+ return exit_with_general_critical(e)
582
+
583
+
584
+ def check_memory_mapped(con, warning, critical, perf_data):
585
+ #
586
+ # These thresholds are basically meaningless, and must be customized to your application
587
+ #
588
+ warning = warning or 8
589
+ critical = critical or 16
590
+ try:
591
+ data = get_server_status(con)
592
+ if not data['mem']['supported']:
593
+ print "OK - Platform not supported for memory info"
594
+ return 0
595
+ #
596
+ # convert to gigs
597
+ #
598
+ message = "Memory Usage:"
599
+ try:
600
+ mem_mapped = float(data['mem']['mapped']) / 1024.0
601
+ message += " %.2fGB mapped," % mem_mapped
602
+ except:
603
+ mem_mapped = -1
604
+ message += " mapped unsupported,"
605
+ try:
606
+ mem_mapped_journal = float(data['mem']['mappedWithJournal']) / 1024.0
607
+ message += " %.2fGB mappedWithJournal" % mem_mapped_journal
608
+ except:
609
+ mem_mapped_journal = 0
610
+ message += performance_data(perf_data, [("%.2f" % mem_mapped, "memory_mapped"), ("%.2f" % mem_mapped_journal, "mappedWithJournal")])
611
+
612
+ if not mem_mapped == -1:
613
+ return check_levels(mem_mapped, warning, critical, message)
614
+ else:
615
+ print "OK - Server does not provide mem.mapped info"
616
+ return 0
617
+
618
+ except Exception, e:
619
+ return exit_with_general_critical(e)
620
+
621
+
622
+ def check_lock(con, warning, critical, perf_data):
623
+ warning = warning or 10
624
+ critical = critical or 30
625
+ try:
626
+ data = get_server_status(con)
627
+ #
628
+ # calculate percentage
629
+ #
630
+ lock_percentage = float(data['globalLock']['lockTime']) / float(data['globalLock']['totalTime']) * 100
631
+ message = "Lock Percentage: %.2f%%" % lock_percentage
632
+ message += performance_data(perf_data, [("%.2f" % lock_percentage, "lock_percentage", warning, critical)])
633
+ return check_levels(lock_percentage, warning, critical, message)
634
+
635
+ except Exception, e:
636
+ return exit_with_general_critical(e)
637
+
638
+
639
+ def check_flushing(con, warning, critical, avg, perf_data):
640
+ #
641
+ # These thresholds mean it's taking 5 seconds to perform a background flush to issue a warning
642
+ # and 15 seconds to issue a critical.
643
+ #
644
+ warning = warning or 5000
645
+ critical = critical or 15000
646
+ try:
647
+ data = get_server_status(con)
648
+ if avg:
649
+ flush_time = float(data['backgroundFlushing']['average_ms'])
650
+ stat_type = "Average"
651
+ else:
652
+ flush_time = float(data['backgroundFlushing']['last_ms'])
653
+ stat_type = "Last"
654
+
655
+ message = "%s Flush Time: %.2fms" % (stat_type, flush_time)
656
+ message += performance_data(perf_data, [("%.2fms" % flush_time, "%s_flush_time" % stat_type.lower(), warning, critical)])
657
+
658
+ return check_levels(flush_time, warning, critical, message)
659
+
660
+ except Exception, e:
661
+ return exit_with_general_critical(e)
662
+
663
+
664
+ def index_miss_ratio(con, warning, critical, perf_data):
665
+ warning = warning or 10
666
+ critical = critical or 30
667
+ try:
668
+ data = get_server_status(con)
669
+
670
+ try:
671
+ serverVersion = tuple(con.server_info()['version'].split('.'))
672
+ if serverVersion >= tuple("2.4.0".split(".")):
673
+ miss_ratio = float(data['indexCounters']['missRatio'])
674
+ else:
675
+ miss_ratio = float(data['indexCounters']['btree']['missRatio'])
676
+ except KeyError:
677
+ not_supported_msg = "not supported on this platform"
678
+ if data['indexCounters'].has_key('note'):
679
+ print "OK - MongoDB says: " + not_supported_msg
680
+ return 0
681
+ else:
682
+ print "WARNING - Can't get counter from MongoDB"
683
+ return 1
684
+
685
+ message = "Miss Ratio: %.2f" % miss_ratio
686
+ message += performance_data(perf_data, [("%.2f" % miss_ratio, "index_miss_ratio", warning, critical)])
687
+
688
+ return check_levels(miss_ratio, warning, critical, message)
689
+
690
+ except Exception, e:
691
+ return exit_with_general_critical(e)
692
+
693
+ def check_replset_quorum(con, perf_data):
694
+ db = con['admin']
695
+ warning = 1
696
+ critical = 2
697
+ primary = 0
698
+
699
+ try:
700
+ rs_members = db.command("replSetGetStatus")['members']
701
+
702
+ for member in rs_members:
703
+ if member['state'] == 1:
704
+ primary += 1
705
+
706
+ if primary == 1:
707
+ state = 0
708
+ message = "Cluster is quorate"
709
+ else:
710
+ state = 2
711
+ message = "Cluster is not quorate and cannot operate"
712
+
713
+ return check_levels(state, warning, critical, message)
714
+ except Exception, e:
715
+ return exit_with_general_critical(e)
716
+
717
+
718
+
719
+ def check_replset_state(con, perf_data, warning="", critical=""):
720
+ try:
721
+ warning = [int(x) for x in warning.split(",")]
722
+ except:
723
+ warning = [0, 3, 5, 9]
724
+ try:
725
+ critical = [int(x) for x in critical.split(",")]
726
+ except:
727
+ critical = [8, 4, -1]
728
+
729
+ ok = range(-1, 8) # should include the range of all possible values
730
+ try:
731
+ try:
732
+ try:
733
+ set_read_preference(con.admin)
734
+ data = con.admin.command(pymongo.son_manipulator.SON([('replSetGetStatus', 1)]))
735
+ except:
736
+ data = con.admin.command(son.SON([('replSetGetStatus', 1)]))
737
+ state = int(data['myState'])
738
+ except pymongo.errors.OperationFailure, e:
739
+ if e.code == None and str(e).find('failed: not running with --replSet"'):
740
+ state = -1
741
+
742
+ if state == 8:
743
+ message = "State: %i (Down)" % state
744
+ elif state == 4:
745
+ message = "State: %i (Fatal error)" % state
746
+ elif state == 0:
747
+ message = "State: %i (Starting up, phase1)" % state
748
+ elif state == 3:
749
+ message = "State: %i (Recovering)" % state
750
+ elif state == 5:
751
+ message = "State: %i (Starting up, phase2)" % state
752
+ elif state == 1:
753
+ message = "State: %i (Primary)" % state
754
+ elif state == 2:
755
+ message = "State: %i (Secondary)" % state
756
+ elif state == 7:
757
+ message = "State: %i (Arbiter)" % state
758
+ elif state == 9:
759
+ message = "State: %i (Rollback)" % state
760
+ elif state == -1:
761
+ message = "Not running with replSet"
762
+ else:
763
+ message = "State: %i (Unknown state)" % state
764
+ message += performance_data(perf_data, [(state, "state")])
765
+ return check_levels(state, warning, critical, message, ok)
766
+ except Exception, e:
767
+ return exit_with_general_critical(e)
768
+
769
+
770
+ def check_databases(con, warning, critical, perf_data=None):
771
+ try:
772
+ try:
773
+ set_read_preference(con.admin)
774
+ data = con.admin.command(pymongo.son_manipulator.SON([('listDatabases', 1)]))
775
+ except:
776
+ data = con.admin.command(son.SON([('listDatabases', 1)]))
777
+
778
+ count = len(data['databases'])
779
+ message = "Number of DBs: %.0f" % count
780
+ message += performance_data(perf_data, [(count, "databases", warning, critical, message)])
781
+ return check_levels(count, warning, critical, message)
782
+ except Exception, e:
783
+ return exit_with_general_critical(e)
784
+
785
+
786
+ def check_collections(con, warning, critical, perf_data=None):
787
+ try:
788
+ try:
789
+ set_read_preference(con.admin)
790
+ data = con.admin.command(pymongo.son_manipulator.SON([('listDatabases', 1)]))
791
+ except:
792
+ data = con.admin.command(son.SON([('listDatabases', 1)]))
793
+
794
+ count = 0
795
+ for db in data['databases']:
796
+ dbase = con[db['name']]
797
+ set_read_preference(dbase)
798
+ count += len(dbase.collection_names())
799
+
800
+ message = "Number of collections: %.0f" % count
801
+ message += performance_data(perf_data, [(count, "collections", warning, critical, message)])
802
+ return check_levels(count, warning, critical, message)
803
+
804
+ except Exception, e:
805
+ return exit_with_general_critical(e)
806
+
807
+
808
+ def check_all_databases_size(con, warning, critical, perf_data):
809
+ warning = warning or 100
810
+ critical = critical or 1000
811
+ try:
812
+ set_read_preference(con.admin)
813
+ all_dbs_data = con.admin.command(pymongo.son_manipulator.SON([('listDatabases', 1)]))
814
+ except:
815
+ all_dbs_data = con.admin.command(son.SON([('listDatabases', 1)]))
816
+
817
+ total_storage_size = 0
818
+ message = ""
819
+ perf_data_param = [()]
820
+ for db in all_dbs_data['databases']:
821
+ database = db['name']
822
+ data = con[database].command('dbstats')
823
+ storage_size = round(data['storageSize'] / 1024 / 1024, 1)
824
+ message += "; Database %s size: %.0f MB" % (database, storage_size)
825
+ perf_data_param.append((storage_size, database + "_database_size"))
826
+ total_storage_size += storage_size
827
+
828
+ perf_data_param[0] = (total_storage_size, "total_size", warning, critical)
829
+ message += performance_data(perf_data, perf_data_param)
830
+ message = "Total size: %.0f MB" % total_storage_size + message
831
+ return check_levels(total_storage_size, warning, critical, message)
832
+
833
+
834
+ def check_database_size(con, database, warning, critical, perf_data):
835
+ warning = warning or 100
836
+ critical = critical or 1000
837
+ perfdata = ""
838
+ try:
839
+ set_read_preference(con.admin)
840
+ data = con[database].command('dbstats')
841
+ storage_size = data['storageSize'] / 1024 / 1024
842
+ if perf_data:
843
+ perfdata += " | database_size=%i;%i;%i" % (storage_size, warning, critical)
844
+ #perfdata += " database=%s" %(database)
845
+
846
+ if storage_size >= critical:
847
+ print "CRITICAL - Database size: %.0f MB, Database: %s%s" % (storage_size, database, perfdata)
848
+ return 2
849
+ elif storage_size >= warning:
850
+ print "WARNING - Database size: %.0f MB, Database: %s%s" % (storage_size, database, perfdata)
851
+ return 1
852
+ else:
853
+ print "OK - Database size: %.0f MB, Database: %s%s" % (storage_size, database, perfdata)
854
+ return 0
855
+ except Exception, e:
856
+ return exit_with_general_critical(e)
857
+
858
+
859
+ def check_database_indexes(con, database, warning, critical, perf_data):
860
+ #
861
+ # These thresholds are basically meaningless, and must be customized to your application
862
+ #
863
+ warning = warning or 100
864
+ critical = critical or 1000
865
+ perfdata = ""
866
+ try:
867
+ set_read_preference(con.admin)
868
+ data = con[database].command('dbstats')
869
+ index_size = data['indexSize'] / 1024 / 1024
870
+ if perf_data:
871
+ perfdata += " | database_indexes=%i;%i;%i" % (index_size, warning, critical)
872
+
873
+ if index_size >= critical:
874
+ print "CRITICAL - %s indexSize: %.0f MB %s" % (database, index_size, perfdata)
875
+ return 2
876
+ elif index_size >= warning:
877
+ print "WARNING - %s indexSize: %.0f MB %s" % (database, index_size, perfdata)
878
+ return 1
879
+ else:
880
+ print "OK - %s indexSize: %.0f MB %s" % (database, index_size, perfdata)
881
+ return 0
882
+ except Exception, e:
883
+ return exit_with_general_critical(e)
884
+
885
+
886
+ def check_collection_indexes(con, database, collection, warning, critical, perf_data):
887
+ #
888
+ # These thresholds are basically meaningless, and must be customized to your application
889
+ #
890
+ warning = warning or 100
891
+ critical = critical or 1000
892
+ perfdata = ""
893
+ try:
894
+ set_read_preference(con.admin)
895
+ data = con[database].command('collstats', collection)
896
+ total_index_size = data['totalIndexSize'] / 1024 / 1024
897
+ if perf_data:
898
+ perfdata += " | collection_indexes=%i;%i;%i" % (total_index_size, warning, critical)
899
+
900
+ if total_index_size >= critical:
901
+ print "CRITICAL - %s.%s totalIndexSize: %.0f MB %s" % (database, collection, total_index_size, perfdata)
902
+ return 2
903
+ elif total_index_size >= warning:
904
+ print "WARNING - %s.%s totalIndexSize: %.0f MB %s" % (database, collection, total_index_size, perfdata)
905
+ return 1
906
+ else:
907
+ print "OK - %s.%s totalIndexSize: %.0f MB %s" % (database, collection, total_index_size, perfdata)
908
+ return 0
909
+ except Exception, e:
910
+ return exit_with_general_critical(e)
911
+
912
+
913
+ def check_queues(con, warning, critical, perf_data):
914
+ warning = warning or 10
915
+ critical = critical or 30
916
+ try:
917
+ data = get_server_status(con)
918
+
919
+ total_queues = float(data['globalLock']['currentQueue']['total'])
920
+ readers_queues = float(data['globalLock']['currentQueue']['readers'])
921
+ writers_queues = float(data['globalLock']['currentQueue']['writers'])
922
+ message = "Current queue is : total = %d, readers = %d, writers = %d" % (total_queues, readers_queues, writers_queues)
923
+ message += performance_data(perf_data, [(total_queues, "total_queues", warning, critical), (readers_queues, "readers_queues"), (writers_queues, "writers_queues")])
924
+ return check_levels(total_queues, warning, critical, message)
925
+
926
+ except Exception, e:
927
+ return exit_with_general_critical(e)
928
+
929
+ def check_collection_size(con, database, collection, warning, critical, perf_data):
930
+ warning = warning or 100
931
+ critical = critical or 1000
932
+ perfdata = ""
933
+ try:
934
+ set_read_preference(con.admin)
935
+ data = con[database].command('collstats', collection)
936
+ size = data['size'] / 1024 / 1024
937
+ if perf_data:
938
+ perfdata += " | collection_size=%i;%i;%i" % (size, warning, critical)
939
+
940
+ if size >= critical:
941
+ print "CRITICAL - %s.%s size: %.0f MB %s" % (database, collection, size, perfdata)
942
+ return 2
943
+ elif size >= warning:
944
+ print "WARNING - %s.%s size: %.0f MB %s" % (database, collection, size, perfdata)
945
+ return 1
946
+ else:
947
+ print "OK - %s.%s size: %.0f MB %s" % (database, collection, size, perfdata)
948
+ return 0
949
+ except Exception, e:
950
+ return exit_with_general_critical(e)
951
+
952
+ def check_queries_per_second(con, query_type, warning, critical, perf_data):
953
+ warning = warning or 250
954
+ critical = critical or 500
955
+
956
+ if query_type not in ['insert', 'query', 'update', 'delete', 'getmore', 'command']:
957
+ return exit_with_general_critical("The query type of '%s' is not valid" % query_type)
958
+
959
+ try:
960
+ db = con.local
961
+ data = get_server_status(con)
962
+
963
+ # grab the count
964
+ num = int(data['opcounters'][query_type])
965
+
966
+ # do the math
967
+ last_count = db.nagios_check.find_one({'check': 'query_counts'})
968
+ try:
969
+ ts = int(time.time())
970
+ diff_query = num - last_count['data'][query_type]['count']
971
+ diff_ts = ts - last_count['data'][query_type]['ts']
972
+
973
+ query_per_sec = float(diff_query) / float(diff_ts)
974
+
975
+ # update the count now
976
+ db.nagios_check.update({u'_id': last_count['_id']}, {'$set': {"data.%s" % query_type: {'count': num, 'ts': int(time.time())}}})
977
+
978
+ message = "Queries / Sec: %f" % query_per_sec
979
+ message += performance_data(perf_data, [(query_per_sec, "%s_per_sec" % query_type, warning, critical, message)])
980
+ except KeyError:
981
+ #
982
+ # since it is the first run insert it
983
+ query_per_sec = 0
984
+ message = "First run of check.. no data"
985
+ db.nagios_check.update({u'_id': last_count['_id']}, {'$set': {"data.%s" % query_type: {'count': num, 'ts': int(time.time())}}})
986
+ except TypeError:
987
+ #
988
+ # since it is the first run insert it
989
+ query_per_sec = 0
990
+ message = "First run of check.. no data"
991
+ db.nagios_check.insert({'check': 'query_counts', 'data': {query_type: {'count': num, 'ts': int(time.time())}}})
992
+
993
+ return check_levels(query_per_sec, warning, critical, message)
994
+
995
+ except Exception, e:
996
+ return exit_with_general_critical(e)
997
+
998
+
999
+ def check_oplog(con, warning, critical, perf_data):
1000
+ """ Checking the oplog time - the time of the log currntly saved in the oplog collection
1001
+ defaults:
1002
+ critical 4 hours
1003
+ warning 24 hours
1004
+ those can be changed as usual with -C and -W parameters"""
1005
+ warning = warning or 24
1006
+ critical = critical or 4
1007
+ try:
1008
+ db = con.local
1009
+ ol = db.system.namespaces.find_one({"name": "local.oplog.rs"})
1010
+ if (db.system.namespaces.find_one({"name": "local.oplog.rs"}) != None):
1011
+ oplog = "oplog.rs"
1012
+ else:
1013
+ ol = db.system.namespaces.find_one({"name": "local.oplog.$main"})
1014
+ if (db.system.namespaces.find_one({"name": "local.oplog.$main"}) != None):
1015
+ oplog = "oplog.$main"
1016
+ else:
1017
+ message = "neither master/slave nor replica set replication detected"
1018
+ return check_levels(None, warning, critical, message)
1019
+
1020
+ try:
1021
+ set_read_preference(con.admin)
1022
+ data = con.local.command(pymongo.son_manipulator.SON([('collstats', oplog)]))
1023
+ except:
1024
+ data = con.admin.command(son.SON([('collstats', oplog)]))
1025
+
1026
+ ol_size = data['size']
1027
+ ol_storage_size = data['storageSize']
1028
+ ol_used_storage = int(float(ol_size) / ol_storage_size * 100 + 1)
1029
+ ol = con.local[oplog]
1030
+ firstc = ol.find().sort("$natural", pymongo.ASCENDING).limit(1)[0]['ts']
1031
+ lastc = ol.find().sort("$natural", pymongo.DESCENDING).limit(1)[0]['ts']
1032
+ time_in_oplog = (lastc.as_datetime() - firstc.as_datetime())
1033
+ message = "Oplog saves " + str(time_in_oplog) + " %d%% used" % ol_used_storage
1034
+ try: # work starting from python2.7
1035
+ hours_in_oplog = time_in_oplog.total_seconds() / 60 / 60
1036
+ except:
1037
+ hours_in_oplog = float(time_in_oplog.seconds + time_in_oplog.days * 24 * 3600) / 60 / 60
1038
+ approx_level = hours_in_oplog * 100 / ol_used_storage
1039
+ message += performance_data(perf_data, [("%.2f" % hours_in_oplog, 'oplog_time', warning, critical), ("%.2f " % approx_level, 'oplog_time_100_percent_used')])
1040
+ return check_levels(-approx_level, -warning, -critical, message)
1041
+
1042
+ except Exception, e:
1043
+ return exit_with_general_critical(e)
1044
+
1045
+
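+ # Illustrative sketch (not part of the original plugin and never called by it):
+ # check_oplog() above measures the span between the first and last oplog entry and
+ # extrapolates the span a completely full oplog would hold (hours * 100 / used
+ # percent).  The names below are hypothetical.
+ def _example_oplog_window_hours(first_ts_seconds, last_ts_seconds, used_percent):
+     hours = (last_ts_seconds - first_ts_seconds) / 3600.0
+     return hours, hours * 100.0 / used_percent
+
+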
1046
+ def check_journal_commits_in_wl(con, warning, critical, perf_data):
1047
+ """ Checking the number of commits which occurred in the db's write lock.
1048
+ Most commits are performed outside of this lock; commits made while holding the write lock are undesirable.
1049
+ Under very heavy write load it is normal for this value to be nonzero. """
1050
+
1051
+ warning = warning or 10
1052
+ critical = critical or 40
1053
+ try:
1054
+ data = get_server_status(con)
1055
+ j_commits_in_wl = data['dur']['commitsInWriteLock']
1056
+ message = "Journal commits in DB write lock : %d" % j_commits_in_wl
1057
+ message += performance_data(perf_data, [(j_commits_in_wl, "j_commits_in_wl", warning, critical)])
1058
+ return check_levels(j_commits_in_wl, warning, critical, message)
1059
+
1060
+ except Exception, e:
1061
+ return exit_with_general_critical(e)
1062
+
1063
+
1064
+ def check_journaled(con, warning, critical, perf_data):
1065
+ """ Checking the average amount of data in megabytes written to the recovery log in the last four seconds"""
1066
+
1067
+ warning = warning or 20
1068
+ critical = critical or 40
1069
+ try:
1070
+ data = get_server_status(con)
1071
+ journaled = data['dur']['journaledMB']
1072
+ message = "Journaled : %.2f MB" % journaled
1073
+ message += performance_data(perf_data, [("%.2f" % journaled, "journaled", warning, critical)])
1074
+ return check_levels(journaled, warning, critical, message)
1075
+
1076
+ except Exception, e:
1077
+ return exit_with_general_critical(e)
1078
+
1079
+
1080
+ def check_write_to_datafiles(con, warning, critical, perf_data):
1081
+ """ Checking the average amount of data in megabytes written to the database's data files in the last four seconds.
1082
+ As these writes are already journaled, they can occur lazily, and thus the number indicated here may be lower
1083
+ than the amount physically written to disk."""
1084
+ warning = warning or 20
1085
+ critical = critical or 40
1086
+ try:
1087
+ data = get_server_status(con)
1088
+ writes = data['dur']['writeToDataFilesMB']
1089
+ message = "Write to data files : %.2f MB" % writes
1090
+ message += performance_data(perf_data, [("%.2f" % writes, "write_to_data_files", warning, critical)])
1091
+ return check_levels(writes, warning, critical, message)
1092
+
1093
+ except Exception, e:
1094
+ return exit_with_general_critical(e)
1095
+
1096
+
1097
+ def get_opcounters(data, opcounters_name, host):
1098
+ try:
1099
+ insert = data[opcounters_name]['insert']
1100
+ query = data[opcounters_name]['query']
1101
+ update = data[opcounters_name]['update']
1102
+ delete = data[opcounters_name]['delete']
1103
+ getmore = data[opcounters_name]['getmore']
1104
+ command = data[opcounters_name]['command']
1105
+ except KeyError, e:
1106
+ return 0, [0] * 100
1107
+ total_commands = insert + query + update + delete + getmore + command
1108
+ new_vals = [total_commands, insert, query, update, delete, getmore, command]
1109
+ return maintain_delta(new_vals, host, opcounters_name)
1110
+
1111
+
1112
+ def check_opcounters(con, host, warning, critical, perf_data):
1113
+ """ A function to get all opcounters delta per minute. In case of a replication - gets the opcounters+opcountersRepl"""
1114
+ warning = warning or 10000
1115
+ critical = critical or 15000
1116
+
1117
+ data = get_server_status(con)
1118
+ err1, delta_opcounters = get_opcounters(data, 'opcounters', host)
1119
+ err2, delta_opcounters_repl = get_opcounters(data, 'opcountersRepl', host)
1120
+ if err1 == 0 and err2 == 0:
1121
+ delta = [(x + y) for x, y in zip(delta_opcounters, delta_opcounters_repl)]
1122
+ delta[0] = delta_opcounters[0] # only the time delta shouldn't be summarized
1123
+ per_minute_delta = [int(x / delta[0] * 60) for x in delta[1:]]
1124
+ message = "Test succeeded , old values missing"
1125
+ message = "Opcounters: total=%d,insert=%d,query=%d,update=%d,delete=%d,getmore=%d,command=%d" % tuple(per_minute_delta)
1126
+ message += performance_data(perf_data, ([(per_minute_delta[0], "total", warning, critical), (per_minute_delta[1], "insert"),
1127
+ (per_minute_delta[2], "query"), (per_minute_delta[3], "update"), (per_minute_delta[4], "delete"),
1128
+ (per_minute_delta[5], "getmore"), (per_minute_delta[6], "command")]))
1129
+ return check_levels(per_minute_delta[0], warning, critical, message)
1130
+ else:
1131
+ return exit_with_general_critical("problem reading data from temp file")
1132
+
1133
+
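+ # Illustrative sketch (not part of the original plugin and never called by it):
+ # check_opcounters() above sums the opcounters and opcountersRepl deltas and
+ # normalises each counter to a per-minute rate using the elapsed seconds kept in
+ # the first delta element.  Hypothetical helper:
+ def _example_per_minute(counter_delta, elapsed_seconds):
+     # e.g. _example_per_minute(300, 90) -> 200
+     return int(float(counter_delta) / elapsed_seconds * 60)
+
+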
1134
+ def check_current_lock(con, host, warning, critical, perf_data):
1135
+ """ A function to get current lock percentage and not a global one, as check_lock function does"""
1136
+ warning = warning or 10
1137
+ critical = critical or 30
1138
+ data = get_server_status(con)
1139
+
1140
+ lockTime = float(data['globalLock']['lockTime'])
1141
+ totalTime = float(data['globalLock']['totalTime'])
1142
+
1143
+ err, delta = maintain_delta([totalTime, lockTime], host, "locktime")
1144
+ if err == 0:
1145
+ lock_percentage = delta[2] / delta[1] * 100 # lockTime/totalTime*100
1146
+ message = "Current Lock Percentage: %.2f%%" % lock_percentage
1147
+ message += performance_data(perf_data, [("%.2f" % lock_percentage, "current_lock_percentage", warning, critical)])
1148
+ return check_levels(lock_percentage, warning, critical, message)
1149
+ else:
1150
+ return exit_with_general_warning("problem reading data from temp file")
1151
+
1152
+
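+ # Illustrative sketch (not part of the original plugin and never called by it):
+ # check_current_lock() above reports the lock ratio accumulated since the previous
+ # run rather than since server start: lockTime delta over totalTime delta.
+ def _example_lock_percentage(lock_time_delta, total_time_delta):
+     # e.g. _example_lock_percentage(5.0, 100.0) -> 5.0
+     return lock_time_delta / total_time_delta * 100
+
+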
1153
+ def check_page_faults(con, host, warning, critical, perf_data):
1154
+ """ A function to get page_faults per second from the system"""
1155
+ warning = warning or 10
1156
+ critical = critical or 30
1157
+ data = get_server_status(con)
1158
+
1159
+ try:
1160
+ page_faults = float(data['extra_info']['page_faults'])
1161
+ except:
1162
+ # page_faults unsupported on the underlying system
1163
+ return exit_with_general_critical("page_faults unsupported on the underlying system")
1164
+
1165
+ err, delta = maintain_delta([page_faults], host, "page_faults")
1166
+ if err == 0:
1167
+ page_faults_ps = delta[1] / delta[0]
1168
+ message = "Page faults : %.2f ps" % page_faults_ps
1169
+ message += performance_data(perf_data, [("%.2f" % page_faults_ps, "page_faults_ps", warning, critical)])
1170
+ return check_levels(page_faults_ps, warning, critical, message)
1171
+ else:
1172
+ return exit_with_general_warning("problem reading data from temp file")
1173
+
1174
+
1175
+ def check_asserts(con, host, warning, critical, perf_data):
1176
+ """ A function to get asserts from the system"""
1177
+ warning = warning or 1
1178
+ critical = critical or 10
1179
+ data = get_server_status(con)
1180
+
1181
+ asserts = data['asserts']
1182
+
1183
+ #{ "regular" : 0, "warning" : 6, "msg" : 0, "user" : 12, "rollovers" : 0 }
1184
+ regular = asserts['regular']
1185
+ warning_asserts = asserts['warning']
1186
+ msg = asserts['msg']
1187
+ user = asserts['user']
1188
+ rollovers = asserts['rollovers']
1189
+
1190
+ err, delta = maintain_delta([regular, warning_asserts, msg, user, rollovers], host, "asserts")
1191
+
1192
+ if err == 0:
1193
+ if delta[5] != 0:
1194
+ # the rollovers counter has increased
1195
+ warning = -1 # regardless of the other metrics, this situation should raise a warning
1196
+ # if this is a normal rollover the warning will not appear again, but if asserts keep accumulating
1197
+ # the warning will persist for a long time,
1198
+ # although that is not a usual situation
1199
+
1200
+ regular_ps = delta[1] / delta[0]
1201
+ warning_ps = delta[2] / delta[0]
1202
+ msg_ps = delta[3] / delta[0]
1203
+ user_ps = delta[4] / delta[0]
1204
+ rollovers_ps = delta[5] / delta[0]
1205
+ total_ps = regular_ps + warning_ps + msg_ps + user_ps
1206
+ message = "Total asserts : %.2f ps" % total_ps
1207
+ message += performance_data(perf_data, [(total_ps, "asserts_ps", warning, critical), (regular_ps, "regular"),
1208
+ (warning_ps, "warning"), (msg_ps, "msg"), (user_ps, "user")])
1209
+ return check_levels(total_ps, warning, critical, message)
1210
+ else:
1211
+ return exit_with_general_warning("problem reading data from temp file")
1212
+
1213
+
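+ # Illustrative sketch (not part of the original plugin and never called by it):
+ # check_asserts() above alerts on the combined asserts-per-second rate and forces a
+ # warning (threshold -1) whenever the rollovers counter moved.  Hypothetical helper
+ # mirroring that decision:
+ def _example_asserts_warning_threshold(rollovers_delta, configured_warning):
+     # any rollover makes every rate exceed the warning threshold
+     return -1 if rollovers_delta != 0 else configured_warning
+
+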
1214
+ def get_stored_primary_server_name(db):
1215
+ """ get the stored primary server name from db. """
1216
+ if "last_primary_server" in db.collection_names():
1217
+ stored_primary_server = db.last_primary_server.find_one()["server"]
1218
+ else:
1219
+ stored_primary_server = None
1220
+
1221
+ return stored_primary_server
1222
+
1223
+
1224
+ def check_replica_primary(con, host, warning, critical, perf_data, replicaset):
1225
+ """ A function to check if the primary server of a replica set has changed """
1226
+ if warning is None and critical is None:
1227
+ warning = 1
1228
+ warning = warning or 2
1229
+ critical = critical or 2
1230
+
1231
+ primary_status = 0
1232
+ message = "Primary server has not changed"
1233
+ db = con["nagios"]
1234
+ data = get_server_status(con)
1235
+ if replicaset != data['repl'].get('setName'):
1236
+ message = "Replica set requested: %s differs from the one found: %s" % (replicaset, data['repl'].get('setName'))
1237
+ primary_status = 2
1238
+ return check_levels(primary_status, warning, critical, message)
1239
+ current_primary = data['repl'].get('primary')
1240
+ saved_primary = get_stored_primary_server_name(db)
1241
+ if current_primary is None:
1242
+ current_primary = "None"
1243
+ if saved_primary is None:
1244
+ saved_primary = "None"
1245
+ if current_primary != saved_primary:
1246
+ last_primary_server_record = {"server": current_primary}
1247
+ db.last_primary_server.update({"_id": "last_primary"}, {"$set": last_primary_server_record}, upsert=True, safe=True)
1248
+ message = "Primary server has changed from %s to %s" % (saved_primary, current_primary)
1249
+ primary_status = 1
1250
+ return check_levels(primary_status, warning, critical, message)
1251
+
1252
+
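+ # NOTE: the definition below reuses the name check_page_faults; because Python keeps
+ # only the last binding of a name, it shadows the host/maintain_delta based variant
+ # defined earlier in this file.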
1253
+ def check_page_faults(con, sample_time, warning, critical, perf_data):
1254
+ warning = warning or 10
1255
+ critical = critical or 20
1256
+ try:
1257
+ try:
1258
+ set_read_preference(con.admin)
1259
+ data1 = con.admin.command(pymongo.son_manipulator.SON([('serverStatus', 1)]))
1260
+ time.sleep(sample_time)
1261
+ data2 = con.admin.command(pymongo.son_manipulator.SON([('serverStatus', 1)]))
1262
+ except:
1263
+ data1 = con.admin.command(son.SON([('serverStatus', 1)]))
1264
+ time.sleep(sample_time)
1265
+ data2 = con.admin.command(son.SON([('serverStatus', 1)]))
1266
+
1267
+ try:
1268
+ #on linux servers only
1269
+ page_faults = (int(data2['extra_info']['page_faults']) - int(data1['extra_info']['page_faults'])) / sample_time
1270
+ except KeyError:
1271
+ print "WARNING - Can't get extra_info.page_faults counter from MongoDB"
1272
+ sys.exit(1)
1273
+
1274
+ message = "Page Faults: %i" % (page_faults)
1275
+
1276
+ message += performance_data(perf_data, [(page_faults, "page_faults", warning, critical)])
1277
+ return check_levels(page_faults, warning, critical, message)
1278
+
1279
+ except Exception, e:
1280
+ return exit_with_general_critical(e)
1281
+
1282
+
1283
+ def chunks_balance(con, database, collection, warning, critical):
1284
+ warning = warning or 10
1285
+ critical = critical or 20
1286
+ nsfilter = database + "." + collection
1287
+ try:
1288
+ try:
1289
+ set_read_preference(con.admin)
1290
+ col = con.config.chunks
1291
+ nscount = col.find({"ns": nsfilter}).count()
1292
+ shards = col.distinct("shard")
1293
+
1294
+ except:
1295
+ print "WARNING - Can't get chunk info from MongoDB"
1296
+ sys.exit(1)
1297
+
1298
+ if nscount == 0:
1299
+ print "WARNING - Namespace %s is not sharded" % (nsfilter)
1300
+ sys.exit(1)
1301
+
1302
+ avgchunksnb = nscount / len(shards)
1303
+ warningnb = avgchunksnb * warning / 100
1304
+ criticalnb = avgchunksnb * critical / 100
1305
+
1306
+ for shard in shards:
1307
+ delta = abs(avgchunksnb - col.find({"ns": nsfilter, "shard": shard}).count())
1308
+ message = "Namespace: %s, Shard name: %s, Chunk delta: %i" % (nsfilter, shard, delta)
1309
+
1310
+ if delta >= criticalnb and delta > 0:
1311
+ print "CRITICAL - Chunks not well balanced " + message
1312
+ sys.exit(2)
1313
+ elif delta >= warningnb and delta > 0:
1314
+ print "WARNING - Chunks not well balanced " + message
1315
+ sys.exit(1)
1316
+
1317
+ print "OK - Chunks well balanced across shards"
1318
+ sys.exit(0)
1319
+
1320
+ except Exception, e:
1321
+ return exit_with_general_critical(e)
1322
+
1325
+
1326
+
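+ # Illustrative sketch (not part of the original plugin and never called by it):
+ # chunks_balance() above flags a shard whose chunk count deviates from the per-shard
+ # average by more than warning/critical percent of that average.  Hypothetical helper
+ # mirroring the integer arithmetic used for the thresholds:
+ def _example_chunk_thresholds(total_chunks, shard_count, warning_pct, critical_pct):
+     avg = total_chunks / shard_count
+     return avg * warning_pct / 100, avg * critical_pct / 100
+
+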
1327
+ def check_connect_primary(con, warning, critical, perf_data):
1328
+ warning = warning or 3
1329
+ critical = critical or 6
1330
+
1331
+ try:
1332
+ try:
1333
+ set_read_preference(con.admin)
1334
+ data = con.admin.command(pymongo.son_manipulator.SON([('isMaster', 1)]))
1335
+ except:
1336
+ data = con.admin.command(son.SON([('isMaster', 1)]))
1337
+
1338
+ if data['ismaster'] == True:
1339
+ print "OK - This server is primary"
1340
+ return 0
1341
+
1342
+ phost = data['primary'].split(':')[0]
1343
+ pport = int(data['primary'].split(':')[1])
1344
+ start = time.time()
1345
+
1346
+ err, con = mongo_connect(phost, pport)
1347
+ if err != 0:
1348
+ return err
1349
+
1350
+ pconn_time = time.time() - start
1351
+ pconn_time = round(pconn_time, 0)
1352
+ message = "Connection to primary server " + data['primary'] + " took %i seconds" % pconn_time
1353
+ message += performance_data(perf_data, [(pconn_time, "connection_time", warning, critical)])
1354
+
1355
+ return check_levels(pconn_time, warning, critical, message)
1356
+
1357
+ except Exception, e:
1358
+ return exit_with_general_critical(e)
1359
+
1360
+
1361
+ def check_collection_state(con, database, collection):
1362
+ try:
1363
+ con[database][collection].find_one()
1364
+ print "OK - Collection %s.%s is reachable " % (database, collection)
1365
+ return 0
1366
+
1367
+ except Exception, e:
1368
+ return exit_with_general_critical(e)
1369
+
1370
+
1371
+ def check_row_count(con, database, collection, warning, critical, perf_data):
1372
+ try:
1373
+ count = con[database][collection].count()
1374
+ message = "Row count: %i" % (count)
1375
+ message += performance_data(perf_data, [(count, "row_count", warning, critical)])
1376
+
1377
+ return check_levels(count, warning, critical, message)
1378
+
1379
+ except Exception, e:
1380
+ return exit_with_general_critical(e)
1381
+
1382
+
1383
+ def build_file_name(host, action):
1384
+ # done this way so it works both when run independently and when invoked from a shell
1385
+ module_name = re.match('(.*//*)*(.*)\..*', __file__).group(2)
1386
+ return "/tmp/" + module_name + "_data/" + host + "-" + action + ".data"
1387
+
1388
+
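+ # Illustrative note (not part of the original plugin): assuming this file is installed
+ # as check-mongodb.py, the path built above resolves to something like
+ #   /tmp/check-mongodb_data/localhost-asserts.data
+ # for build_file_name("localhost", "asserts").
+
+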
1389
+ def ensure_dir(f):
1390
+ d = os.path.dirname(f)
1391
+ if not os.path.exists(d):
1392
+ os.makedirs(d)
1393
+
1394
+
1395
+ def write_values(file_name, string):
1396
+ f = None
1397
+ try:
1398
+ f = open(file_name, 'w')
1399
+ except IOError, e:
1400
+ #try creating
1401
+ if (e.errno == 2):
1402
+ ensure_dir(file_name)
1403
+ f = open(file_name, 'w')
1404
+ else:
1405
+ raise IOError(e)
1406
+ f.write(string)
1407
+ f.close()
1408
+ return 0
1409
+
1410
+
1411
+ def read_values(file_name):
1412
+ data = None
1413
+ try:
1414
+ f = open(file_name, 'r')
1415
+ data = f.read()
1416
+ f.close()
1417
+ return 0, data
1418
+ except IOError, e:
1419
+ if (e.errno == 2):
1420
+ #no previous data
1421
+ return 1, ''
+ # any other read error: report it like the generic failure below
+ return 2, None
1422
+ except Exception, e:
1423
+ return 2, None
1424
+
1425
+
1426
+ def calc_delta(old, new):
1427
+ delta = []
1428
+ if (len(old) != len(new)):
1429
+ raise Exception("unequal number of parameters")
1430
+ for i in range(0, len(old)):
1431
+ val = float(new[i]) - float(old[i])
1432
+ if val < 0:
1433
+ val = new[i]
1434
+ delta.append(val)
1435
+ return 0, delta
1436
+
1437
+
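+ # Illustrative note (not part of the original plugin): calc_delta() subtracts
+ # element-wise and, when a counter went backwards (for example after a mongod
+ # restart), falls back to the new raw value, e.g.
+ #   calc_delta([100, 50], [160, 10]) -> (0, [60.0, 10])
+
+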
1438
+ def maintain_delta(new_vals, host, action):
1439
+ file_name = build_file_name(host, action)
1440
+ err, data = read_values(file_name)
1441
+ old_vals = data.split(';')
1442
+ new_vals = [str(int(time.time()))] + new_vals
1443
+ delta = None
1444
+ try:
1445
+ err, delta = calc_delta(old_vals, new_vals)
1446
+ except:
1447
+ err = 2
1448
+ write_res = write_values(file_name, ";".join(str(x) for x in new_vals))
1449
+ return err + write_res, delta
1450
+
1451
+
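+ # Illustrative sketch (not part of the original plugin and never called by it):
+ # maintain_delta() above stores "timestamp;value1;value2;..." in a temp file and, on
+ # the next run, returns the element-wise difference, so delta[0] is always the
+ # elapsed seconds.  Hypothetical round trip using the real helpers:
+ def _example_delta_round_trip():
+     # the first call only seeds the temp file and reports an error code
+     maintain_delta([10, 20], "localhost", "_example")
+     err, delta = maintain_delta([15, 26], "localhost", "_example")
+     return err, delta  # e.g. (0, [1.0, 5.0, 6.0]) when run one second apart
+
+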
1452
+ def replication_get_time_diff(con):
1453
+ col = 'oplog.rs'
1454
+ local = con.local
1455
+ ol = local.system.namespaces.find_one({"name": "local.oplog.$main"})
1456
+ if ol:
1457
+ col = 'oplog.$main'
1458
+ firstc = local[col].find().sort("$natural", 1).limit(1)
1459
+ lastc = local[col].find().sort("$natural", -1).limit(1)
1460
+ first = firstc.next()
1461
+ last = lastc.next()
1462
+ tfirst = first["ts"]
1463
+ tlast = last["ts"]
1464
+ delta = tlast.time - tfirst.time
1465
+ return delta
1466
+
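+ # Illustrative sketch (not part of the original plugin and never called by it):
+ # replication_get_time_diff() above returns the oplog window in seconds (newest
+ # minus oldest oplog timestamp).  Hypothetical wrapper converting it to hours:
+ def _example_oplog_window(con):
+     return replication_get_time_diff(con) / 3600.0
+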
1467
+ #
1468
+ # main app
1469
+ #
1470
+ if __name__ == "__main__":
1471
+ sys.exit(main(sys.argv[1:]))