bosh-monitor 1.2624.0 → 1.2640.0

Files changed (5)
  1. checksums.yaml +7 -0
  2. data/README.md +105 -0
  3. data/lib/bosh/monitor/version.rb +1 -1
  4. metadata +30 -69
  5. data/README +0 -80
checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA1:
3
+ metadata.gz: 235888ed3668f9879fc1115fdee790c8be428d97
4
+ data.tar.gz: aaa803354916d1e53d6a8131886d5d5bfb330a27
5
+ SHA512:
6
+ metadata.gz: 8725456e1287862c434326a0e5ac7e0c1b06378bac2f972bfe8878a2372fa4220e06ce9e98f74ef204a77a0d0bcba486a530dc6df1c8dc3e4e2eaf822a5e6af9
7
+ data.tar.gz: a37746a0e2906592874766567857032e5b5cc8d1a373d5e05d26f11cf89a8e81f74371e5ea21f918dd55e4deac4c11d2ae2dc3926eef63b6c00cce437770f643
data/README.md ADDED
@@ -0,0 +1,105 @@
1
+ ## Synopsis
2
+
3
+ BOSH Monitor is a component that listens to and responds to events (Heartbeats & Alerts) on the message bus (NATS).
4
+
5
+ The Monitor also includes a few primary components:
6
+ - The Agent Monitor maintains a record of known agents (by heartbeat event subscription)
7
+ - The Director Monitor maintains a record of known agents (by director HTTP polling).
8
+ - The Agent Analyzer analyzes agent state periodically and generates Alerts.
9
+ - The HTTP Server responds to the /varz endpoint.
10
+
11
+ The Monitor also supports generic event processing plugins that respond to Heartbeats & Alerts.
12
+
13
+ ## Heartbeat Events
14
+
15
+ The Agent on each VM sends periodic heartbeats to the BOSH Monitor via the message bus (NATS).
16
+
17
+ The message syntax is as follows:
18
+
19
+ | *Subject* | *Payload* |
20
+ |-----------|-----------|
21
+ | hm.agent.heartbeat.\<agent_id\> | none |
22
+
23
+ ## Alert Events
24
+
25
+ A BOSH Alert is a specific type of event sent by BOSH components via the message bus.
26
+
27
+ Alerts include the following data:
28
+
29
+ - Id
30
+ - Severity
31
+ - Source (usually deployment/job/index tuple)
32
+ - Timestamp
33
+ - Description
34
+ - Long description (optional)
35
+ - Tags (optional)
36
+
37
+ ## Event Handling Plugins
38
+
39
+ Alerts are processed by a number of plugins that register to receive incoming alerts.
40
+
41
+ Among the included plugins are:
42
+ - Event Logger - Logs all events
43
+ - Resurrector - Restarts VMs that have stopped heartbeating
44
+ - PagerDuty - Sends various events to PagerDuty.com using their API
45
+ - DataDog - Sends various events to DataDog.com using their API
46
+ - AWS CloudWatch - Sends various events to Amazon's CloudWatch using their API
47
+ - Emailer - Sends configurable emails on event receipt
48
+
49
+ Plugins should conform to the following interface:
50
+
51
+ | *Method* | *Arguments* | *Description* |
52
+ |----------|-------------|---------------|
53
+ | *validate_options* | | Validates the plugin configuration options |
54
+ | *run* | | Initializes the plugin process |
55
+ | *process* | event | Processes an event (Bosh::Monitor::Events::Heartbeat or Bosh::Monitor::Events::Alert) |
56
+
57
+ The event processor deduplicates events.
58
+
59
+ Plugins are notified in the order that they were registered (based on configuration order).
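The plugin interface and the ordering and deduplication rules above can be sketched roughly as follows. This is a minimal illustration, not the gem's code: `SketchEvent`, `RecordingPlugin`, `SketchEventProcessor` and the `'name'` option are hypothetical names, and keying deduplication on the event id is an assumption.

```ruby
require 'set'

# Hypothetical stand-in for Bosh::Monitor::Events::Heartbeat / ::Alert.
SketchEvent = Struct.new(:id, :kind)

# Hypothetical plugin that implements the three documented methods.
class RecordingPlugin
  attr_reader :seen

  def initialize(options = {})
    @options = options
    @seen = []
  end

  # Validates the plugin configuration options (the 'name' key is an assumption).
  def validate_options
    @options.key?('name')
  end

  # Initializes the plugin process.
  def run
    @running = true
  end

  # Processes a single event by remembering its id.
  def process(event)
    @seen << event.id
  end
end

# Hypothetical processor: drops duplicate event ids and notifies
# plugins in the order in which they were registered.
class SketchEventProcessor
  def initialize
    @plugins = []
    @seen_ids = Set.new
  end

  def register(plugin)
    raise ArgumentError, 'invalid plugin options' unless plugin.validate_options

    plugin.run
    @plugins << plugin
  end

  # Returns true if the event was dispatched, false if it was a duplicate.
  def process(event)
    return false unless @seen_ids.add?(event.id) # add? returns nil for a known id

    @plugins.each { |plugin| plugin.process(event) }
    true
  end
end
```

Registering two plugins and processing the same event twice would deliver it to each plugin once, in registration order.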
60
+
61
+ ## Agent Monitor - Heartbeat Event Processing
62
+
63
+ The Agent Monitor listens for heartbeat events on the message bus and handles them in the following way:
64
+
65
+ - If the Agent is known to the Monitor then the last heartbeat timestamp gets updated.
66
+ - If the Agent is unknown to the Monitor then it is recorded with a flag that marks it as a "rogue agent".
67
+
68
+ No analysis is performed when a heartbeat is received. The Agent Analyzer process and Director Monitor polling are asynchronous to heartbeat event processing by the Agent Monitor.
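A rough sketch of the heartbeat handling described above, under stated assumptions: `AgentRecord` and `SketchAgentMonitor` are hypothetical names, and the NATS subscription itself is left out so that only the bookkeeping is shown.

```ruby
# Hypothetical in-memory record of one agent.
AgentRecord = Struct.new(:id, :last_heartbeat_at, :rogue)

class SketchAgentMonitor
  # Matches the documented subject hm.agent.heartbeat.<agent_id>.
  HEARTBEAT_SUBJECT = /\Ahm\.agent\.heartbeat\.(?<agent_id>.+)\z/

  attr_reader :agents

  def initialize
    @agents = {}
  end

  # Updates the timestamp of a known agent, or records an unknown
  # agent with the rogue flag set. No analysis happens here.
  def handle_heartbeat(subject, now = Time.now)
    match = HEARTBEAT_SUBJECT.match(subject)
    return nil unless match

    agent_id = match[:agent_id]
    agent = (@agents[agent_id] ||= AgentRecord.new(agent_id, nil, true))
    agent.last_heartbeat_at = now
    agent
  end
end
```

Keeping the handler free of analysis mirrors the text: the timestamp is only stored, and a separate periodic process decides later whether an agent is missing.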
69
+
70
+ ## Director Monitor - Agent Discovery
71
+
72
+ The Director Monitor polls the Director periodically via HTTP to get the list of managed VMs.
73
+
74
+ The message syntax is as follows:
75
+
76
+ | *Method* | *Endpoint* | *Response* |
77
+ |----------|------------|------------|
78
+ | GET | /deployments/\<deployment_name\>/vms | JSON including agent ids, job names and indices for all managed VMs |
79
+
80
+ - If a new agent is discovered via polling then it is recorded by the Monitor as part of the managed deployment.
81
+ - If a "rogue agent" is discovered via polling then its "rogue agent" flag is cleared.
82
+
83
+ The Director Monitor does not actively poll the agents themselves, just the Director. The Director Monitor simply remembers the state of the world as reported by polling and event processing so that the difference can be analyzed.
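The effect of one polling response on the Monitor's record can be sketched as below. The HTTP call is omitted, and the JSON key names `agent_id`, `job` and `index` are assumptions (the README only says the response includes agent ids, job names and indices); `ManagedAgent` and `sync_from_director` are hypothetical names.

```ruby
require 'json'

# Hypothetical record of an agent as known to the Monitor.
ManagedAgent = Struct.new(:id, :deployment, :job, :index, :rogue)

# Applies one Director polling response to the table of known agents:
# new agents are recorded under the deployment, and agents previously
# seen only via heartbeat have their rogue flag cleared.
def sync_from_director(agents, deployment, response_body)
  JSON.parse(response_body).each do |vm|
    agent = (agents[vm['agent_id']] ||= ManagedAgent.new(vm['agent_id']))
    agent.deployment = deployment
    agent.job = vm['job']
    agent.index = vm['index']
    agent.rogue = false
  end
  agents
end
```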
84
+
85
+ ## Agent Analyzer
86
+
87
+ The Agent Analyzer is a periodic process that generates "Agent Missing" alerts.
88
+
89
+ If an agent's heartbeat timestamp is not updated within the configured time period, the Agent Analyzer process will generate an "Agent Missing" alert.
90
+
91
+ "Agent Missing" alerts may be generated for both known VM agents and rogue agents, but the two have different configurable time periods.
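The periodic check can be sketched as a pure function over the recorded agents. The names `TrackedAgent` and `missing_agent_alerts`, the default timeouts, and the shape of the returned alert hashes are all assumptions made for illustration.

```ruby
# Hypothetical record of one agent's last heartbeat.
TrackedAgent = Struct.new(:id, :last_heartbeat_at, :rogue)

# Returns one "Agent Missing" alert per agent whose last heartbeat is
# older than its timeout. Known and rogue agents use separate timeouts;
# the default values (in seconds) are placeholders.
def missing_agent_alerts(agents, now, known_timeout: 60, rogue_timeout: 120)
  overdue = agents.select do |agent|
    timeout = agent.rogue ? rogue_timeout : known_timeout
    now - agent.last_heartbeat_at > timeout
  end

  overdue.map do |agent|
    { source: agent.id, description: 'Agent Missing', timestamp: now }
  end
end
```

A scheduler would call this at a fixed interval and hand the resulting alerts to the event processor.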
92
+
93
+ ## Alerts from BOSH Agent
94
+
95
+ The Monitor subscribes to Agent alerts of the following format:
96
+
97
+ | *Subject* | *Payload* |
98
+ |-----------|-----------|
99
+ | hm.agent.alert.\<agent_id\> | JSON containing the following keys: id, service, event, action, description, timestamp, tags |
100
+
101
+ BOSH Agent is responsible for mapping any underlying supervisor alert format to the expected JSON payload and sending it to BOSH Monitor.
102
+
103
+ The Monitor is responsible for interpreting the JSON payload and mapping it to a sequence of Monitor & Plugin actions, possibly generating new alerts that bypass the message bus. Malformed payloads are ignored.
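Interpreting an agent alert message and ignoring malformed payloads might look like the sketch below. `parse_agent_alert` is a hypothetical helper; treating every listed key except `tags` as required is an assumption, since the README does not say which keys are mandatory.

```ruby
require 'json'

# Matches the documented subject hm.agent.alert.<agent_id>.
ALERT_SUBJECT = /\Ahm\.agent\.alert\.(?<agent_id>.+)\z/

# Keys from the payload table; which of them are required is an assumption.
REQUIRED_ALERT_KEYS = %w[id service event action description timestamp].freeze

# Returns the parsed payload with the agent id added, or nil when the
# subject or the payload is malformed.
def parse_agent_alert(subject, payload)
  match = ALERT_SUBJECT.match(subject)
  return nil unless match

  data = JSON.parse(payload)
  return nil unless data.is_a?(Hash)
  return nil unless REQUIRED_ALERT_KEYS.all? { |key| data.key?(key) }

  data.merge('agent_id' => match[:agent_id])
rescue JSON::ParserError
  nil
end
```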
104
+
105
+ Job name and index are not part of alerts from the Agent; those are looked up in the Director. If a heartbeat came from a rogue agent and we have no job name and/or index, then we note that fact in the alert description but do not worry further about it (service name and agent id should be enough). We might consider including the agent IP address as part of the heartbeat so we can track down rogue agents.
data/lib/bosh/monitor/version.rb CHANGED
@@ -1,5 +1,5 @@
1
1
  module Bosh
2
2
  module Monitor
3
- VERSION = '1.2624.0'
3
+ VERSION = '1.2640.0'
4
4
  end
5
5
  end
metadata CHANGED
@@ -1,68 +1,60 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: bosh-monitor
3
3
  version: !ruby/object:Gem::Version
4
- version: 1.2624.0
5
- prerelease:
4
+ version: 1.2640.0
6
5
  platform: ruby
7
6
  authors:
8
7
  - VMware
9
8
  autorequire:
10
9
  bindir: bin
11
10
  cert_chain: []
12
- date: 2014-07-01 00:00:00.000000000 Z
11
+ date: 2014-07-08 00:00:00.000000000 Z
13
12
  dependencies:
14
13
  - !ruby/object:Gem::Dependency
15
14
  name: eventmachine
16
15
  requirement: !ruby/object:Gem::Requirement
17
- none: false
18
16
  requirements:
19
- - - ~>
17
+ - - "~>"
20
18
  - !ruby/object:Gem::Version
21
19
  version: 1.0.0
22
20
  type: :runtime
23
21
  prerelease: false
24
22
  version_requirements: !ruby/object:Gem::Requirement
25
- none: false
26
23
  requirements:
27
- - - ~>
24
+ - - "~>"
28
25
  - !ruby/object:Gem::Version
29
26
  version: 1.0.0
30
27
  - !ruby/object:Gem::Dependency
31
28
  name: logging
32
29
  requirement: !ruby/object:Gem::Requirement
33
- none: false
34
30
  requirements:
35
- - - ~>
31
+ - - "~>"
36
32
  - !ruby/object:Gem::Version
37
33
  version: 1.5.0
38
34
  type: :runtime
39
35
  prerelease: false
40
36
  version_requirements: !ruby/object:Gem::Requirement
41
- none: false
42
37
  requirements:
43
- - - ~>
38
+ - - "~>"
44
39
  - !ruby/object:Gem::Version
45
40
  version: 1.5.0
46
41
  - !ruby/object:Gem::Dependency
47
42
  name: em-http-request
48
43
  requirement: !ruby/object:Gem::Requirement
49
- none: false
50
44
  requirements:
51
- - - ~>
45
+ - - "~>"
52
46
  - !ruby/object:Gem::Version
53
47
  version: 0.3.0
54
48
  type: :runtime
55
49
  prerelease: false
56
50
  version_requirements: !ruby/object:Gem::Requirement
57
- none: false
58
51
  requirements:
59
- - - ~>
52
+ - - "~>"
60
53
  - !ruby/object:Gem::Version
61
54
  version: 0.3.0
62
55
  - !ruby/object:Gem::Dependency
63
56
  name: nats
64
57
  requirement: !ruby/object:Gem::Requirement
65
- none: false
66
58
  requirements:
67
59
  - - '='
68
60
  - !ruby/object:Gem::Version
@@ -70,7 +62,6 @@ dependencies:
70
62
  type: :runtime
71
63
  prerelease: false
72
64
  version_requirements: !ruby/object:Gem::Requirement
73
- none: false
74
65
  requirements:
75
66
  - - '='
76
67
  - !ruby/object:Gem::Version
@@ -78,102 +69,76 @@ dependencies:
78
69
  - !ruby/object:Gem::Dependency
79
70
  name: yajl-ruby
80
71
  requirement: !ruby/object:Gem::Requirement
81
- none: false
82
72
  requirements:
83
- - - ~>
73
+ - - "~>"
84
74
  - !ruby/object:Gem::Version
85
75
  version: 1.1.0
86
76
  type: :runtime
87
77
  prerelease: false
88
78
  version_requirements: !ruby/object:Gem::Requirement
89
- none: false
90
79
  requirements:
91
- - - ~>
80
+ - - "~>"
92
81
  - !ruby/object:Gem::Version
93
82
  version: 1.1.0
94
83
  - !ruby/object:Gem::Dependency
95
84
  name: thin
96
85
  requirement: !ruby/object:Gem::Requirement
97
- none: false
98
86
  requirements:
99
- - - ~>
87
+ - - "~>"
100
88
  - !ruby/object:Gem::Version
101
89
  version: 1.5.0
102
90
  type: :runtime
103
91
  prerelease: false
104
92
  version_requirements: !ruby/object:Gem::Requirement
105
- none: false
106
93
  requirements:
107
- - - ~>
94
+ - - "~>"
108
95
  - !ruby/object:Gem::Version
109
96
  version: 1.5.0
110
97
  - !ruby/object:Gem::Dependency
111
98
  name: sinatra
112
99
  requirement: !ruby/object:Gem::Requirement
113
- none: false
114
100
  requirements:
115
- - - ~>
101
+ - - "~>"
116
102
  - !ruby/object:Gem::Version
117
103
  version: 1.4.2
118
104
  type: :runtime
119
105
  prerelease: false
120
106
  version_requirements: !ruby/object:Gem::Requirement
121
- none: false
122
107
  requirements:
123
- - - ~>
108
+ - - "~>"
124
109
  - !ruby/object:Gem::Version
125
110
  version: 1.4.2
126
111
  - !ruby/object:Gem::Dependency
127
112
  name: aws-sdk
128
113
  requirement: !ruby/object:Gem::Requirement
129
- none: false
130
114
  requirements:
131
115
  - - '='
132
116
  - !ruby/object:Gem::Version
133
- version: 1.32.0
117
+ version: 1.44.0
134
118
  type: :runtime
135
119
  prerelease: false
136
120
  version_requirements: !ruby/object:Gem::Requirement
137
- none: false
138
121
  requirements:
139
122
  - - '='
140
123
  - !ruby/object:Gem::Version
141
- version: 1.32.0
124
+ version: 1.44.0
142
125
  - !ruby/object:Gem::Dependency
143
126
  name: dogapi
144
127
  requirement: !ruby/object:Gem::Requirement
145
- none: false
146
128
  requirements:
147
- - - ~>
129
+ - - "~>"
148
130
  - !ruby/object:Gem::Version
149
131
  version: 1.6.0
150
132
  type: :runtime
151
133
  prerelease: false
152
134
  version_requirements: !ruby/object:Gem::Requirement
153
- none: false
154
135
  requirements:
155
- - - ~>
136
+ - - "~>"
156
137
  - !ruby/object:Gem::Version
157
138
  version: 1.6.0
158
- - !ruby/object:Gem::Dependency
159
- name: uuidtools
160
- requirement: !ruby/object:Gem::Requirement
161
- none: false
162
- requirements:
163
- - - ~>
164
- - !ruby/object:Gem::Version
165
- version: '2.1'
166
- type: :runtime
167
- prerelease: false
168
- version_requirements: !ruby/object:Gem::Requirement
169
- none: false
170
- requirements:
171
- - - ~>
172
- - !ruby/object:Gem::Version
173
- version: '2.1'
174
- description: ! 'BOSH Health Monitor
175
-
176
- 03f604'
139
+ description: |-
140
+ BOSH Health Monitor
141
+ 986896
177
142
  email: support@cloudfoundry.com
178
143
  executables:
179
144
  - bosh-monitor-console
@@ -182,6 +147,10 @@ executables:
182
147
  extensions: []
183
148
  extra_rdoc_files: []
184
149
  files:
150
+ - README.md
151
+ - bin/bosh-monitor
152
+ - bin/bosh-monitor-console
153
+ - bin/listener
185
154
  - lib/bosh/monitor.rb
186
155
  - lib/bosh/monitor/agent.rb
187
156
  - lib/bosh/monitor/agent_manager.rb
@@ -214,36 +183,28 @@ files:
214
183
  - lib/bosh/monitor/runner.rb
215
184
  - lib/bosh/monitor/version.rb
216
185
  - lib/bosh/monitor/yaml_helper.rb
217
- - README
218
- - bin/bosh-monitor-console
219
- - bin/bosh-monitor
220
- - bin/listener
221
186
  homepage: https://github.com/cloudfoundry/bosh
222
187
  licenses:
223
188
  - Apache 2.0
189
+ metadata: {}
224
190
  post_install_message:
225
191
  rdoc_options: []
226
192
  require_paths:
227
193
  - lib
228
194
  required_ruby_version: !ruby/object:Gem::Requirement
229
- none: false
230
195
  requirements:
231
- - - ! '>='
196
+ - - ">="
232
197
  - !ruby/object:Gem::Version
233
198
  version: 1.9.3
234
199
  required_rubygems_version: !ruby/object:Gem::Requirement
235
- none: false
236
200
  requirements:
237
- - - ! '>='
201
+ - - ">="
238
202
  - !ruby/object:Gem::Version
239
203
  version: '0'
240
- segments:
241
- - 0
242
- hash: -3444887689356418361
243
204
  requirements: []
244
205
  rubyforge_project:
245
- rubygems_version: 1.8.23.2
206
+ rubygems_version: 2.2.2
246
207
  signing_key:
247
- specification_version: 3
208
+ specification_version: 4
248
209
  summary: BOSH Health Monitor
249
210
  test_files: []
data/README DELETED
@@ -1,80 +0,0 @@
1
- h4. Synopsis
2
-
3
- BOSH Health Monitor (BHM) is a component that monitors health of one or multiple BOSH deployments. It processes heartbeats and alerts from BOSH agents and notifies interested parties if something goes wrong.
4
-
5
- h4. Heartbeats
6
-
7
- Agent sends periodic heartbeats to HM. Heartbeats are sent via message bus and have the following format:
8
-
9
- | *Subject* | hm.agent.heartbeat.<agent_id> |
10
- | *Payload* | none |
11
-
12
- h6. Heartbeat processing
13
-
14
- # If the agent is known to HM the last heartbeat timestamp gets updated. No analysis is attempted at this point, analyze agents routine is asynchronous to heartbeat processing.
15
- # If the agent is unknown it gets registered with HM with a warning flag set (we call them rogue agents). Next director poll will possibly include this agent to a list of managed agents and clear the flag. We might generate the alert if the flag hasn't been cleared for some (configurable) time.
16
-
17
- h4. Agents discovery
18
-
19
- HM polls director periodically to get the list of managed VMs:
20
-
21
- | *Endpoint* | GET /deployments/<deployment_name>/vms |
22
- | *Response* | JSON including agent ids, job names and indices for all managed VMs |
23
-
24
- When new agent is discovered it gets registered and added to a managed deployment. No active operations are performed to reach the agent and query it, we only rely on heartbeats and agent alerts.
25
-
26
- h4. Agents analysis
27
-
28
- This is a periodic operation that goes through all known agents. First it tries to go through all managed deployments, then analyzes rogue agents as well. The following procedure is used:
29
-
30
- # If agent missed more than N heartbeats the "Agent Missing" alert is generated.
31
-
32
- h4. Alerts
33
-
34
- Alert is a concept used by HM to flag and deliver information about important events. It includes the following data:
35
-
36
- # Id
37
- # Severity
38
- # Source (usually deployment/job/index tuple)
39
- # Timestamp
40
- # Description
41
- # Long description (optional)
42
- # Tags (optional)
43
-
44
- h6. Alert Processor
45
-
46
- Alert Processor is a module that registers incoming alerts and routes them to interested parties via appropriate delivery agent. It should conform to the following interface:
47
-
48
- | *Method* | *Arguments* | *Description* |
49
- | *register_alert* | alert (object responding to :id, :severity, :timestamp, :description, :long_description, :source and :tags) | Registers an alert and invokes a delivery agent. Delivery agent might or might not deliver alert immediately depending on the implementation, so Alert Processor shouldn't make any assumptions about delivery (i.e. agent might queue up several alerts and send them asynchronously. |
50
- | *add_delivery_agent* | delivery_agent, options | Adds a delivery agent to a processor |
51
-
52
- Alert id can be an arbitrary string however Alert Processor might use it to keep track of registered alerts and don't process the same alert twice. This way other HM modules can just blindly register any incoming alerts and leave the dedup step to the alert processor).
53
-
54
- Alerts are only persisted in HM memory (at least in the initial version) so losing HM leads to losing any undelivered alerts that might have been queued by a delivery agent or alert processor).
55
-
56
- If alert processor has more than one delivery agents associated with it then it notifies all of them in order (i.e. we want to notify both Zabbix and Pager Duty).
57
-
58
- h6. Delivery Agent
59
-
60
- Delivery Agent is a module that takes care of an alert delivery mechanism (such as an email, Pager Duty alert, writing to a journal or even silently discarding the alert). It should conform to the following interface:
61
-
62
- | *Method* | *Arguments* | *Description* |
63
- | *deliver* | alert | Delivers alert or queues it for delivery. |
64
-
65
- The initial implementation will have email and Pager Duty delivery agents.
66
-
67
- Alert Processor is not pluggable, it's just one of HM classes. Delivery agents are pluggable but generally not changed in a runtime but initialized using an HM configuration file on HM startup.
68
-
69
- h4. Alerts from agent
70
-
71
- HM subscribes to agent alerts on a message bus:
72
-
73
- | *Subject* | hm.agent.alert.<agent_id> |
74
- | *Payload* | JSON containing the following keys: id, service, event, action, description, timestamp, tags |
75
-
76
- BOSH Agent is responsible for mapping any underlying supervisor alert format to the expected JSON payload and send it to HM.
77
-
78
- HM is responsible for interpreting JSON payload and mapping it to a sequence of HM actions and possibly creating an HM alert compatible with Alert Processor module. HM never dedups incoming alerts outside of Alert Processor (this adds some overhead to an incoming alert parser but shouldn't be too bad). Malformed payloads are ignored.
79
-
80
- Job name and index are not featured in agent incoming alert, those are looked up in director. If heartbeat came from a rogue agent and we have no job name and/or index then we note that fact in alert description but don't try to be too worried about that (service name and agent id should be enough). We might consider including agent IP address as a part of heartbeat so we can track down rogue agents.