scout_agent 3.0.7 → 3.1.0
- data/CHANGELOG +13 -0
- data/README +44 -2
- data/Rakefile +1 -1
- data/TODO +7 -1
- data/lib/scout_agent.rb +26 -1
- data/lib/scout_agent/agent/communication_agent.rb +75 -2
- data/lib/scout_agent/agent/master_agent.rb +132 -31
- data/lib/scout_agent/assignment/queue.rb +3 -0
- data/lib/scout_agent/assignment/snapshot.rb +4 -1
- data/lib/scout_agent/assignment/start.rb +3 -4
- data/lib/scout_agent/assignment/stop.rb +64 -30
- data/lib/scout_agent/id_card.rb +23 -12
- data/lib/scout_agent/lifeline.rb +129 -19
- data/lib/scout_agent/mission.rb +221 -68
- data/lib/scout_agent/order.rb +2 -2
- data/lib/scout_agent/plan.rb +38 -18
- metadata +2 -2
data/CHANGELOG
CHANGED
@@ -1,3 +1,16 @@
|
|
1
|
+
== 3.1.0
|
2
|
+
|
3
|
+
* Fixed a bug where the monitor might not properly signal all subprocesses
|
4
|
+
* Made the master agent forward shutdown requests to running missions
|
5
|
+
* Enhanced the stop command to ensure all processes exit and to skip the waits
|
6
|
+
on KILL signals since they cannot be responded to
|
7
|
+
* Improved the error message for forced stops to give next steps
|
8
|
+
* Fixed a bug that would prevent the agent from daemonizing itself at start-up
|
9
|
+
if it had to clean out a stale PID file first
|
10
|
+
* Added status database and log file maintenance to the snapshot and queue
|
11
|
+
commands
|
12
|
+
* Completed full code documentation and clean-up pass
|
13
|
+
|
1
14
|
== 3.0.7
|
2
15
|
|
3
16
|
* Added a log() method (with a logger() alias) plugins can access to add
|
data/README
CHANGED
@@ -1,3 +1,45 @@
|
|
1
|
-
=
|
1
|
+
= The Scout Agent
|
2
2
|
|
3
|
-
Scout,
|
3
|
+
This is the agent software installed on servers to work with {the Scout monitoring application}[http://scoutapp.com/]. See the sections below for details on how to install and use the agent, how to build your own plugins for it to track the data you care about, and how to add features to the agent itself.
|
4
|
+
|
5
|
+
== How do I use the Scout agent?
|
6
|
+
|
7
|
+
Installing the agent is just a simple gem install:
|
8
|
+
|
9
|
+
$ sudo gem install scout_agent
|
10
|
+
|
11
|
+
Note that the gem requires Ruby 1.8.6 or higher and Rubygems 1.3.1 or higher. It also doesn't run on Windows due to Ruby not supporting fork() there.
|
12
|
+
|
13
|
+
Once the gem is installed, you need to identify yourself with the agent key (which looks like a7349498-bec3-4ddf-963c-149a666433a4) that you get from the Web application. Just issue this command and have your key ready when it asks for it:
|
14
|
+
|
15
|
+
$ sudo scout_agent id
|
16
|
+
|
17
|
+
At this point, you should be all set to run the agent. You start it up with this command:
|
18
|
+
|
19
|
+
$ sudo scout_agent start
|
20
|
+
|
21
|
+
The agent is a daemon, so it should return your prompt after it moves into the land of background processes. It will be running though. You can issue the following command if you want to check up on it:
|
22
|
+
|
23
|
+
$ scout_agent status
|
24
|
+
|
25
|
+
With the agent running, you should be able to log into your account on the {Web application}[http://scoutapp.com/] to set up your list of plugins and see the agent delivering data.
|
26
|
+
|
27
|
+
== How do I build my own plugins?
|
28
|
+
|
29
|
+
Scout makes it very easy to build your own plugins for anything you need to monitor. In a matter of minutes you could be tracking the user sign-ups in your application or anything else that's important to you. Once you pipe some data into Scout you can take advantage of all the graphing and trend analysis we use for more traditional monitoring, like Rails applications.
|
30
|
+
|
31
|
+
We have {a tutorial}[http://scoutapp.com/plugin_urls/static/creating_a_plugin] on the Web site that walks you through building a Scout plugin.
|
32
|
+
|
33
|
+
== How do I hack on the agent?
|
34
|
+
|
35
|
+
We try to keep the agent code fairly clean and documented, so it's hopefully not too tough to poke around in. However, it is a big code base. Let me give you the dime tour of where to look for things. All paths below are relative to <tt>lib/scout_agent</tt>.
|
36
|
+
|
37
|
+
<tt>dispatcher.rb</tt>, <tt>assignment.rb</tt>, and <tt>assignment/*</tt>:: This is the code the agent uses to interpret commands users give on the command-line. You'll find configuration file loading (<tt>plan.rb</tt> is a configuration), switch parsing, command selection, and invocation in here.
|
38
|
+
<tt>api.rb</tt>:: The ScoutAgent::API is the external interface for the queue and snapshot commands. This can be used to push data into Scout, without even building a Plugin, or just to request an updated snapshot of the environment.
|
39
|
+
<tt>lifeline.rb</tt> and <tt>agent.rb</tt>:: The ScoutAgent::Lifeline object monitors a ScoutAgent::Agent class, which is a major function of Scout, namely the plugin runner and the XMPP communication module. This is a pretty typical multi-process heartbeat setup where the Agent is fork()ed into a separate process and then monitored for regular check-ins written to a shared pipe.
|
40
|
+
<tt>agent/master_agent.rb</tt> and <tt>mission.rb</tt>:: Together these two pieces make up the heart of the agent. The ScoutAgent::Agent::MasterAgent is the main event loop and ScoutAgent::Mission (aliased Plugin) are the pieces of code that get run in that loop.
|
41
|
+
<tt>agent/communication_agent.rb</tt>, <tt>order.rb</tt>, and <tt>order/*</tt>:: This code is used to listen for supported commands over an XMPP connection. The ScoutAgent::Agent::CommunicationAgent manages all the XMPP talking and ScoutAgent::Order and subclasses are the commands.
|
42
|
+
<tt>database.rb</tt> and <tt>database/*</tt>:: This is a thin wrapper over Amalgalite[http://copiousfreetime.rubyforge.org/amalgalite/] (and SQLite databases by extension). These are the memory of the agent and, with locking, the primary IPC used by the agent.
|
43
|
+
<tt>core_extensions.rb</tt>:: This file holds a handful of extensions that make sense in the context of the agent. This is not an ActiveSupport size library, but just some simple niceties. These extensions can be used in your own Plugins.
|
44
|
+
|
45
|
+
We welcome additions to the agent and will incorporate patches if we feel they add to the platform as a whole. Obviously, the easier we can understand what you did the easier it is to judge that, so tests and documentation are plusses to us.
|
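The heartbeat arrangement described for <tt>lifeline.rb</tt> above can be sketched in a few lines of plain Ruby. This is only an illustration under stated assumptions, not the agent's own code: +launch_child+ and +monitor+ are hypothetical names, and the child here simply writes a fixed number of check-ins to a pipe before exiting.

```ruby
# Hypothetical helper (not part of the agent): forks a child that writes a
# check-in line to a shared pipe every +interval+ seconds, +beats+ times.
def launch_child(interval, beats)
  reader, writer = IO.pipe
  pid = fork do
    reader.close
    beats.times do
      writer.puts("check-in")
      writer.flush
      sleep interval
    end
    exit!(0)
  end
  writer.close  # the parent only reads
  [pid, reader]
end

# Hypothetical monitor: counts check-ins until the child goes quiet for
# +patience+ seconds or closes its end of the pipe.
def monitor(reader, patience)
  seen = 0
  loop do
    ready = IO.select([reader], nil, nil, patience)
    break if ready.nil?  # no check-in arrived in time
    line = reader.gets
    break if line.nil?   # pipe closed: the child has exited
    seen += 1
  end
  seen
end

pid, reader = launch_child(0.05, 3)
count = monitor(reader, 1.0)
Process.wait(pid)
puts "Saw #{count} check-ins"
```

A real monitor would presumably restart or signal the child when the check-ins stop. This sketch only counts them, which keeps the pipe mechanics visible.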
data/Rakefile
CHANGED
@@ -118,7 +118,7 @@ task :upload_docs => :rdoc do
|
|
118
118
|
)
|
119
119
|
host = "#{config['username']}@rubyforge.org"
|
120
120
|
remote_dir = "/var/www/gforge-projects/#{SA_SPEC.rubyforge_project}"
|
121
|
-
local_dir =
|
121
|
+
local_dir = "doc"
|
122
122
|
|
123
123
|
sh "rsync -av --delete #{local_dir}/ #{host}:#{remote_dir}"
|
124
124
|
end
|
data/TODO
CHANGED
@@ -1,3 +1,9 @@
|
|
1
1
|
= To Do List
|
2
2
|
|
3
|
-
|
3
|
+
* Build a `scout_agent help` command that provides general and command specific
|
4
|
+
help, perhaps by parsing the comments on assignments
|
5
|
+
* Add SSL certificate verification so we ensure we are always talking with the
|
6
|
+
official Scout server
|
7
|
+
* Add something like a SHA signature of the code to reports to help developers
|
8
|
+
see which version they are looking at data from
|
9
|
+
* Improve test coverage all over the agent
|
data/lib/scout_agent.rb
CHANGED
@@ -64,9 +64,34 @@ module ScoutAgent
|
|
64
64
|
wire_tap.tap = $stdout unless skip_stdout or Plan.run_as_daemon?
|
65
65
|
wire_tap
|
66
66
|
end
|
67
|
+
|
68
|
+
#
|
69
|
+
# A maintenance method used to remove log files written more than seven days
|
70
|
+
# ago. This prevents the hard drive from slowly filling with log files and
|
71
|
+
# thus is called as part of the main event loop as well as after snapshot and
|
72
|
+
# queue commands. A recent +log+ must be provided to notify of rotation
|
73
|
+
# errors.
|
74
|
+
#
|
75
|
+
def self.remove_old_log_files(log)
|
76
|
+
Plan.log_dir.each_entry do |log_file|
|
77
|
+
if log_file.to_s =~ /\.(\d{4})(\d{2})(\d{2})\z/
|
78
|
+
log_day = Time.local(*$~.captures.map { |n| n.to_i })
|
79
|
+
if Time.now - log_day > 60 * 60 * 24 * 7
|
80
|
+
begin
|
81
|
+
(Plan.log_dir + log_file).unlink
|
82
|
+
rescue Exception => error # file cannot be unlinked
|
83
|
+
log.error( "Failed to unlink old log file '#{log_file}': " +
|
84
|
+
"#{error.message} (#{error.class})." )
|
85
|
+
next
|
86
|
+
end
|
87
|
+
log.debug("Successfully unlinked old log file '#{log_file}'.")
|
88
|
+
end
|
89
|
+
end
|
90
|
+
end
|
91
|
+
end
|
67
92
|
|
68
93
|
# The version of this agent.
|
69
|
-
VERSION = "3.0
|
94
|
+
VERSION = "3.1.0".freeze
|
70
95
|
# A Pathname reference to the agent code directory, used in dynamic loading.
|
71
96
|
LIB_DIR = Pathname.new(File.dirname(__FILE__)) + agent_name
|
72
97
|
end
|
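The age test inside remove_old_log_files() above can be isolated into a small predicate, which makes the date arithmetic easier to check. The sketch below is an assumption-laden illustration: +old_log_file?+, +SEVEN_DAYS+, and the <tt>scout_agent.log</tt> file names are hypothetical, and only the regular expression and the seven day threshold come from the diff.

```ruby
SEVEN_DAYS = 60 * 60 * 24 * 7

# Hypothetical predicate: true if +file_name+ ends in a .YYYYMMDD stamp that
# is more than seven days before +now+.  Names without a stamp are never old.
def old_log_file?(file_name, now = Time.now)
  match = /\.(\d{4})(\d{2})(\d{2})\z/.match(file_name)
  return false unless match
  log_day = Time.local(*match.captures.map { |n| n.to_i })
  now - log_day > SEVEN_DAYS
end

now = Time.local(2009, 4, 20)
puts old_log_file?("scout_agent.log.20090401", now)  # stamped 19 days earlier
puts old_log_file?("scout_agent.log.20090419", now)  # stamped 1 day earlier
puts old_log_file?("scout_agent.log", now)           # no date stamp
```

Passing +now+ as a parameter is a choice made for this sketch so the check can be exercised with fixed dates.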
data/lib/scout_agent/agent/communication_agent.rb
CHANGED
@@ -6,9 +6,20 @@ require "scout_agent/order"
|
|
6
6
|
|
7
7
|
module ScoutAgent
|
8
8
|
class Agent
|
9
|
+
#
|
10
|
+
# This agent manages the XMPP connection with the server. It mainly just
|
11
|
+
# listens for messages from the server and passes them on to matching Order
|
12
|
+
# instances.
|
13
|
+
#
|
9
14
|
class CommunicationAgent < Agent
|
15
|
+
# The number of seconds to wait before attempting another connection.
|
10
16
|
RECONNECT_WAIT = 60
|
11
17
|
|
18
|
+
#
|
19
|
+
# Prepares a log() and passes it down to the Orders, which are also
|
20
|
+
# loaded here. A list of trusted XMPP users is also prepared as part of
|
21
|
+
# this start-up.
|
22
|
+
#
|
12
23
|
def initialize
|
13
24
|
super # setup our log and status
|
14
25
|
|
@@ -26,6 +37,11 @@ module ScoutAgent
|
|
26
37
|
}
|
27
38
|
end
|
28
39
|
|
40
|
+
#
|
41
|
+
# This method encapsulates the process of the XMPP listener, which is
|
42
|
+
# pretty much: login, setup, listen for commands until told to stop, and
|
43
|
+
# exit.
|
44
|
+
#
|
29
45
|
def run
|
30
46
|
login
|
31
47
|
update_status("Online since #{Time.now.utc.to_db_s}")
|
@@ -35,7 +51,8 @@ module ScoutAgent
|
|
35
51
|
listen
|
36
52
|
close_connection
|
37
53
|
end
|
38
|
-
|
54
|
+
|
55
|
+
# Triggers the shutdown process.
|
39
56
|
def finish
|
40
57
|
log.info("Shutting down.")
|
41
58
|
if @shutdown_thread
|
@@ -45,8 +62,18 @@ module ScoutAgent
|
|
45
62
|
end
|
46
63
|
end
|
47
64
|
|
65
|
+
#######
|
48
66
|
private
|
67
|
+
#######
|
49
68
|
|
69
|
+
#######################
|
70
|
+
### XMPP Operations ###
|
71
|
+
#######################
|
72
|
+
|
73
|
+
#
|
74
|
+
# Prepares a Jabber::Client with identification and Exception handling,
|
75
|
+
# then hands off to try_connection().
|
76
|
+
#
|
50
77
|
def login
|
51
78
|
Thread.abort_on_exception = true # make XMPP4R fail fast
|
52
79
|
@agent_jid = Jabber::JID.new("#{agent_key}@#{jabber_server}/agent")
|
@@ -58,6 +85,10 @@ module ScoutAgent
|
|
58
85
|
try_connection
|
59
86
|
end
|
60
87
|
|
88
|
+
#
|
89
|
+
# Loops over connection attempts until successfully reaching the server.
|
90
|
+
# There's a pause of <tt>RECONNECT_WAIT</tt> seconds between each attempt.
|
91
|
+
#
|
61
92
|
def try_connection
|
62
93
|
@connecting = true
|
63
94
|
until connect_and_authenticate?
|
@@ -68,13 +99,18 @@ module ScoutAgent
|
|
68
99
|
@connecting = false
|
69
100
|
end
|
70
101
|
|
102
|
+
#
|
103
|
+
# Attempts an XMPP connection and login. The process will be cancelled if
|
104
|
+
# either part fails. Returns +true+ if the entire process completes as
|
105
|
+
# expected, or +false+ otherwise.
|
106
|
+
#
|
71
107
|
def connect_and_authenticate?
|
72
108
|
status("Connecting")
|
73
109
|
close_connection
|
74
110
|
begin
|
75
111
|
no_warnings { @jabber.connect }
|
76
112
|
rescue Exception => error # connection failure
|
77
|
-
log.error("Failed to connect
|
113
|
+
log.error("Failed to connect to XMPP server.")
|
78
114
|
return false
|
79
115
|
end
|
80
116
|
begin
|
@@ -87,6 +123,10 @@ module ScoutAgent
|
|
87
123
|
true
|
88
124
|
end
|
89
125
|
|
126
|
+
#
|
127
|
+
# Builds an XMPP status with +message+ and arranges for it to be sent to
|
128
|
+
# the server in a separate Thread.
|
129
|
+
#
|
90
130
|
def update_status(message, status = nil)
|
91
131
|
status("Queuing status change")
|
92
132
|
presence = Jabber::Presence.new
|
@@ -104,11 +144,19 @@ module ScoutAgent
|
|
104
144
|
end
|
105
145
|
end
|
106
146
|
|
147
|
+
#
|
148
|
+
# Grabs a _roster_ that can be used to manage the subscription requests of
|
149
|
+
# this agent's XMPP user.
|
150
|
+
#
|
107
151
|
def fetch_roster
|
108
152
|
status("Preparing connection")
|
109
153
|
@roster = Jabber::Roster::Helper.new(@jabber)
|
110
154
|
end
|
111
155
|
|
156
|
+
#
|
157
|
+
# Installs a callback that will accept all subscription requests from
|
158
|
+
# trusted XMPP identities.
|
159
|
+
#
|
112
160
|
def install_subscriptions_callback
|
113
161
|
@roster.add_subscription_request_callback do |_, presence|
|
114
162
|
log.info("Subscription request: #{presence.from}")
|
@@ -121,6 +169,15 @@ module ScoutAgent
|
|
121
169
|
end
|
122
170
|
end
|
123
171
|
|
172
|
+
#
|
173
|
+
# Installs a callback that reads XMPP messages looking for supported
|
174
|
+
# commands. The identified commands are handed off to an Order subclass
|
175
|
+
# for execution.
|
176
|
+
#
|
177
|
+
# Before processing a command, this process will strip a leading message
|
178
|
+
# ID, if provided. A response is sent to the server acknowledging the
|
179
|
+
# receipt of the message in such cases.
|
180
|
+
#
|
124
181
|
def install_messages_callback
|
125
182
|
@jabber.add_message_callback do |message|
|
126
183
|
log.info("Received message from #{message.from}: #{message.body}")
|
@@ -144,6 +201,10 @@ module ScoutAgent
|
|
144
201
|
end
|
145
202
|
end
|
146
203
|
|
204
|
+
#
|
205
|
+
# Sends a chat message with +body+ to the XMPP identity named by +who+.
|
206
|
+
# Messages are sent in a separate Thread.
|
207
|
+
#
|
147
208
|
def send_chat_message(who, body)
|
148
209
|
status("Queuing message")
|
149
210
|
message = Jabber::Message.new(who, body)
|
@@ -160,6 +221,7 @@ module ScoutAgent
|
|
160
221
|
end
|
161
222
|
end
|
162
223
|
|
224
|
+
# Stops this main Thread to allow the listening Thread to take over.
|
163
225
|
def listen
|
164
226
|
log.info("Listening for commands.")
|
165
227
|
status("Listening for commands")
|
@@ -167,6 +229,7 @@ module ScoutAgent
|
|
167
229
|
Thread.stop
|
168
230
|
end
|
169
231
|
|
232
|
+
# Closes our XMPP connection, if it's still open.
|
170
233
|
def close_connection
|
171
234
|
@jabber.close! if @jabber.is_connected?
|
172
235
|
rescue Exception # connection already closed
|
@@ -174,17 +237,27 @@ module ScoutAgent
|
|
174
237
|
log.warn("Failed to close connection.")
|
175
238
|
end
|
176
239
|
|
240
|
+
###############
|
241
|
+
### Helpers ###
|
242
|
+
###############
|
243
|
+
|
244
|
+
# Returns an appropriate key for the environment.
|
177
245
|
def agent_key
|
178
246
|
@agent_key ||= Plan.test_mode? ?
|
179
247
|
"a7349498-bec3-4ddf-963c-149a666433a4" :
|
180
248
|
Plan.agent_key
|
181
249
|
end
|
182
250
|
|
251
|
+
#
|
252
|
+
# Returns an appropriate server for the environment. In non-test mode
|
253
|
+
# this will be the host() parsed out of the check-in URL.
|
254
|
+
#
|
183
255
|
def jabber_server
|
184
256
|
@jabber_server ||= Plan.test_mode? ? "jabber.org" :
|
185
257
|
URI.parse(Plan.server_url).host
|
186
258
|
end
|
187
259
|
|
260
|
+
# Returns +true+ if +user+ matches any of our trusted XMPP identities.
|
188
261
|
def trusted?(user)
|
189
262
|
id = user.to_s
|
190
263
|
@trusted.any? { |trusted| id =~ trusted }
|
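The trusted?() check above matches a sender against a list of patterns. How that list is built is not shown in this diff, so the sketch below uses a hypothetical +TRUSTED+ constant with made-up example.com and example.org identities. It is meant only to show why anchoring such patterns matters.

```ruby
# Hypothetical trusted identities.  Each pattern is anchored at the start and
# must be followed by an XMPP resource ("/...") or the end of the string.
TRUSTED = [ /\Aadmin@example\.com(?:\/|\z)/,
            /\Abot@example\.org(?:\/|\z)/ ]

# Returns +true+ if +user+ matches any of the +trusted+ patterns.
def trusted?(user, trusted = TRUSTED)
  id = user.to_s
  trusted.any? { |pattern| id =~ pattern }
end

puts trusted?("admin@example.com/laptop")    # known identity with a resource
puts trusted?("mallory@example.com")         # unknown user
puts trusted?("admin@example.com.evil.net")  # look-alike domain
```

Without the trailing <tt>(?:\/|\z)</tt>, the third identity would match the first pattern. That is the sort of mistake an unanchored pattern invites.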
data/lib/scout_agent/agent/master_agent.rb
CHANGED
@@ -6,16 +6,31 @@ require "scout_agent/mission"
|
|
6
6
|
|
7
7
|
module ScoutAgent
|
8
8
|
class Agent
|
9
|
+
#
|
10
|
+
# This agent is the main event loop for the platform. Its primary function
|
11
|
+
# is to run each Mission downloaded from the server at the correct time. As
|
12
|
+
# part of that, it regularly updates the Mission list from the server and
|
13
|
+
# also prepares and sends check-ins to the server after a set of mission
|
14
|
+
# runs. Snapshots are also periodically run here and added to check-ins.
|
15
|
+
#
|
16
|
+
# This loop also manages regular maintenance like <tt>VACUUM</tt>ing SQLite
|
17
|
+
# databases and cleaning out old log files.
|
18
|
+
#
|
9
19
|
class MasterAgent < Agent
|
20
|
+
#
|
21
|
+
# Prepares the primary event loop for execution. The main function at
|
22
|
+
# this point is to ensure that we can load all of the needed databases.
|
23
|
+
#
|
10
24
|
def initialize
|
11
25
|
super # setup our log and status
|
12
26
|
|
13
|
-
@running
|
14
|
-
@main_loop
|
15
|
-
@
|
16
|
-
@
|
17
|
-
@
|
18
|
-
@
|
27
|
+
@running = true
|
28
|
+
@main_loop = nil
|
29
|
+
@mission_pid = nil
|
30
|
+
@server = Server.new(log)
|
31
|
+
@db = Database.load(:mission_log, log)
|
32
|
+
@queue = Database.load(:queue, log)
|
33
|
+
@snapshots = Database.load(:snapshots, log)
|
19
34
|
|
20
35
|
if [@db, @queue, @snapshots].any? { |db| db.nil? }
|
21
36
|
log.fatal("Could not load all required databases.")
|
@@ -23,6 +38,11 @@ module ScoutAgent
|
|
23
38
|
end
|
24
39
|
end
|
25
40
|
|
41
|
+
#
|
42
|
+
# This method outlines the steps of the event loop: update our plan from
|
43
|
+
# the server, run Missions and snapshots, check-in, handle maintenance,
|
44
|
+
# and rest.
|
45
|
+
#
|
26
46
|
def run
|
27
47
|
log.info("Running.")
|
28
48
|
@main_loop = Thread.new do
|
@@ -42,31 +62,64 @@ module ScoutAgent
|
|
42
62
|
@main_loop.join
|
43
63
|
end
|
44
64
|
|
65
|
+
#
|
66
|
+
# This method is called automatically when this Agent process receives an
|
67
|
+
# +ALRM+ signal and all it does is to wake up the event loop Thread, if it
|
68
|
+
# is not already running. This allows the Agent to notice external
|
69
|
+
# changes quicker.
|
70
|
+
#
|
45
71
|
def notice_changes
|
46
72
|
@main_loop.run if @main_loop
|
47
73
|
rescue ThreadError # Thread was already killed
|
48
74
|
# do nothing: we're shutting down and can't notice new things
|
49
75
|
end
|
50
76
|
|
77
|
+
#
|
78
|
+
# This method is called automatically when this Agent process receives a
|
79
|
+
# stop request, like a +TERM+ signal. It prepares the Agent to shut down,
|
80
|
+
# but doesn't trigger an immediate stop. Instead, the Agent will check to
|
81
|
+
# see if this has been called after each phase of the main event loop.
|
82
|
+
# This allows it to avoid repeating work when it relaunches.
|
83
|
+
#
|
84
|
+
# This request is also forwarded to a currently running Mission, if there
|
85
|
+
# is one.
|
86
|
+
#
|
51
87
|
def finish
|
52
88
|
if @running
|
53
89
|
log.info("Shutting down.")
|
90
|
+
if @mission_pid
|
91
|
+
log.info("Forwarding shutdown request to the running mission.")
|
92
|
+
begin
|
93
|
+
Process.kill("TERM", @mission_pid)
|
94
|
+
rescue Exception # unable to signal mission
|
95
|
+
log.warn("Mission could not be signaled.")
|
96
|
+
# do nothing: mission already ended
|
97
|
+
end
|
98
|
+
end
|
54
99
|
else
|
55
100
|
log.warn("Received multiple shutdown signals.")
|
56
101
|
end
|
57
102
|
@running = false
|
58
103
|
notice_changes
|
59
104
|
end
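The forwarding step added to finish() above can be tried in isolation. This is a minimal sketch under stated assumptions: +launch_mission+ and +forward_shutdown+ are hypothetical names, the child is a stand-in that only sleeps, and the rescue is narrowed to Errno::ESRCH where the agent rescues Exception.

```ruby
# Hypothetical stand-in for a running Mission: a child that exits cleanly on
# TERM.  The pipe makes the parent wait until the handler is installed.
def launch_mission
  reader, writer = IO.pipe
  pid = fork do
    reader.close
    trap("TERM") { exit!(0) }
    writer.puts("ready")
    writer.close
    sleep 30
    exit!(1)
  end
  writer.close
  reader.gets  # block until the child reports it is ready
  reader.close
  pid
end

# Forwards a shutdown request to +pid+.  Returns +true+ if the signal was
# delivered, or +false+ if the process had already ended.
def forward_shutdown(pid)
  Process.kill("TERM", pid)
  true
rescue Errno::ESRCH  # no such process
  false
end

pid = launch_mission
puts forward_shutdown(pid)  # signal delivered
Process.wait(pid)
puts forward_shutdown(pid)  # mission already ended
```

Rescuing only Errno::ESRCH lets other failures, such as a permissions error, surface instead of being logged as a mission that had already ended.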
|
60
|
-
|
105
|
+
|
106
|
+
#######
|
61
107
|
private
|
108
|
+
#######
|
62
109
|
|
63
110
|
#############
|
64
111
|
### Agent ###
|
65
112
|
#############
|
66
113
|
|
114
|
+
#
|
115
|
+
# Updates our plan from the server, if it has changed. This includes
|
116
|
+
# both the list of plugins to run as well as our list of commands involved
|
117
|
+
# in a snapshot.
|
118
|
+
#
|
67
119
|
def fetch_plan
|
68
120
|
log.info("Fetching plan from server.")
|
69
121
|
status("Fetching plan from server")
|
122
|
+
# read the plan
|
70
123
|
headers = {}
|
71
124
|
if not Plan.test_mode? and (old_plan = @db.current_plan)
|
72
125
|
log.debug( "Adding If-Modified-Since for plan fetch: " +
|
@@ -74,6 +127,7 @@ module ScoutAgent
|
|
74
127
|
headers[:if_modified_since] = old_plan[:last_modified]
|
75
128
|
end
|
76
129
|
json_plan = @server.get_plan(headers)
|
130
|
+
# skip missing or empty plans
|
77
131
|
if json_plan.nil? # failed to retrieve plan
|
78
132
|
log.warn("Could not retrieve plan from server.")
|
79
133
|
return
|
@@ -83,24 +137,38 @@ module ScoutAgent
|
|
83
137
|
else
|
84
138
|
log.info("Received plan (#{json_plan.to_s.size} bytes).")
|
85
139
|
end
|
140
|
+
# parse the plan
|
86
141
|
begin
|
87
142
|
ruby_plan = JSON.parse(json_plan.to_s)
|
88
143
|
rescue JSON::ParserError # bad JSON
|
89
144
|
log.error("Plan from server was malformed JSON.")
|
90
145
|
return # skip plan update
|
91
146
|
end
|
147
|
+
# update the local databases with the changes
|
92
148
|
@db.update_plan( json_plan.headers[:last_modified],
|
93
149
|
Array(ruby_plan["plugins"]) )
|
94
150
|
@snapshots.update_commands(Array(ruby_plan["commands"]))
|
95
151
|
end
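The conditional fetch in fetch_plan() above relies on an If-Modified-Since header. The agent's Server class is not shown in this diff, so the sketch below is an assumption throughout: it uses Ruby's standard Net::HTTP, and +plan_request+ and the example.com URL are hypothetical. It builds a request without sending anything over the network.

```ruby
require "net/http"
require "time"
require "uri"

# Hypothetical helper: builds (but does not send) a GET request for +url+.
# When +last_modified+ is given, the request carries an If-Modified-Since
# header so a server can reply 304 Not Modified for an unchanged plan.
def plan_request(url, last_modified = nil)
  request = Net::HTTP::Get.new(URI.parse(url))
  request["If-Modified-Since"] = last_modified.httpdate if last_modified
  request
end

stamp   = Time.utc(2009, 4, 1, 12, 0, 0)
request = plan_request("http://example.com/plan", stamp)
puts request["If-Modified-Since"]
```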
|
96
152
|
|
153
|
+
#
|
154
|
+
# This loop runs all Missions that have exceeded their run time wait, one
|
155
|
+
# at a time. Missions are run in a child process to make it easy to time
|
156
|
+
# them out and clean up after them, as well as to ensure the agent isn't
|
157
|
+
# affected by their code.
|
158
|
+
#
|
159
|
+
# The execution process outlined for a Mission here is: create a child
|
160
|
+
# process, compile the Mission code in that process, run the code in that
|
161
|
+
# process, and have that process record the results in the database so
|
162
|
+
# this process can access them.
|
163
|
+
#
|
97
164
|
def execute_missions
|
98
165
|
status("Running missions")
|
99
166
|
ran_a_mission = false
|
100
|
-
while mission = @db.current_mission
|
167
|
+
while mission = @db.current_mission # loop over pending Missions
|
101
168
|
log.info("Running #{mission[:name]} mission.")
|
102
169
|
ran_a_mission = true
|
103
|
-
|
170
|
+
# run a Mission
|
171
|
+
@mission_pid = fork do
|
104
172
|
reset_environment
|
105
173
|
compile_mission(mission)
|
106
174
|
run_mission(mission)
|
@@ -108,10 +176,12 @@ module ScoutAgent
|
|
108
176
|
end
|
109
177
|
|
110
178
|
begin
|
179
|
+
# wait for the Mission to complete, or the timeout to expire
|
111
180
|
Timeout.timeout(mission[:timeout]) do
|
112
|
-
Process.wait(
|
181
|
+
Process.wait(@mission_pid)
|
113
182
|
end
|
114
|
-
|
183
|
+
@mission_pid = nil
|
184
|
+
unless $?.success? # record that the Mission exited with an error
|
115
185
|
log.warn( "#{mission[:name]} exited with an error: " +
|
116
186
|
"#{$?.exitstatus}." )
|
117
187
|
@db.write_report(
|
@@ -121,8 +191,9 @@ module ScoutAgent
|
|
121
191
|
:body => "Exit status: #{$?.exitstatus}"
|
122
192
|
)
|
123
193
|
end
|
124
|
-
rescue Timeout::Error #
|
125
|
-
status
|
194
|
+
rescue Timeout::Error # Mission exceeded allowed execution
|
195
|
+
status = Process.term_or_kill(@mission_pid)
|
196
|
+
@mission_pid = nil
|
126
197
|
log.error( "#{mission[:name]} took too long to run: " +
|
127
198
|
"#{status && status.exitstatus}." )
|
128
199
|
@db.write_report(
|
@@ -138,9 +209,16 @@ module ScoutAgent
|
|
138
209
|
break
|
139
210
|
end
|
140
211
|
end
|
212
|
+
# we shouldn't wake up with nothing to do, so check for that
|
141
213
|
log.warn("No missions to run.") unless ran_a_mission
|
142
214
|
end
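The fork, wait, and timeout pattern in execute_missions() above can be reduced to a small stand-alone sketch. The names here are hypothetical: +run_with_timeout+ is invented for illustration, and +term_or_kill+ only approximates what the agent's Process.term_or_kill() extension might do, since that code is not part of this diff.

```ruby
require "timeout"

# Hypothetical helper: ask +pid+ to stop with TERM, then fall back to KILL if
# it has not exited within +grace+ seconds.  Returns the exit status, or nil
# if the process was already gone.
def term_or_kill(pid, grace = 1)
  Process.kill("TERM", pid)
  begin
    Timeout.timeout(grace) { Process.wait2(pid) }.last
  rescue Timeout::Error
    Process.kill("KILL", pid)
    Process.wait2(pid).last
  end
rescue Errno::ESRCH, Errno::ECHILD
  nil
end

# Runs the block in a child process and allows it +limit+ seconds.  Returns
# [:finished, status] or [:timed_out, status].
def run_with_timeout(limit)
  pid = fork { yield; exit!(0) }
  begin
    Timeout.timeout(limit) { Process.wait(pid) }
    [:finished, $?]
  rescue Timeout::Error
    [:timed_out, term_or_kill(pid)]
  end
end

outcome, status = run_with_timeout(0.2) { sleep 30 }
puts "#{outcome}: signaled=#{status && status.signaled?}"
```

Running the work in a child process is what makes the timeout enforceable: the parent can always signal and reap the child, whatever the child's code is doing.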
|
143
215
|
|
216
|
+
#
|
217
|
+
# Request a snapshot via the API. This is a normal (non-forced) request,
|
218
|
+
# so only commands that have passed their interval will be run and it's
|
219
|
+
# likely that nothing at all will be done (because the last trip through
|
220
|
+
# the event loop ran a full snapshot, for example).
|
221
|
+
#
|
144
222
|
def prepare_snapshot
|
145
223
|
if Plan.periodic_snapshots?
|
146
224
|
status("Preparing a system snapshot")
|
@@ -150,10 +228,18 @@ module ScoutAgent
|
|
150
228
|
end
|
151
229
|
end
|
152
230
|
|
231
|
+
#
|
232
|
+
# Sends all generated data up to the Scout server. This is not mission
|
233
|
+
# critical data so it is removed from the databases as it is sent. This
|
234
|
+
# prevents something like a slow send that times out on our end but does
|
235
|
+
# eventually complete from causing us to later send duplicated data.
|
236
|
+
#
|
153
237
|
def checkin
|
238
|
+
# get the data from the databases
|
154
239
|
reports = @db.current_reports
|
155
240
|
queued = @queue.queued_reports
|
156
241
|
snapshots = @snapshots.current_runs
|
242
|
+
# ensure we have something to send
|
157
243
|
if reports.empty? and queued.empty? and snapshots.empty?
|
158
244
|
log.warn("No data to report to the server.")
|
159
245
|
return
|
@@ -161,16 +247,18 @@ module ScoutAgent
|
|
161
247
|
|
162
248
|
log.info("Checking in with server.")
|
163
249
|
status("Checking in with server")
|
250
|
+
# prepare the data for transport to the server
|
164
251
|
checkin = { :reports => Array.new,
|
165
252
|
:hints => Array.new,
|
166
253
|
:alerts => Array.new,
|
167
254
|
:errors => Array.new }
|
168
255
|
(reports + queued).each do |report|
|
169
|
-
type
|
256
|
+
type = report.delete_at(:type)
|
170
257
|
checkin["#{type}s".to_sym] << report.to_hash
|
171
258
|
end
|
172
259
|
checkin[:snapshots] = snapshots.map { |run| run.to_hash }
|
173
260
|
|
261
|
+
# log some details about what we are sending
|
174
262
|
report_dates = String.new
|
175
263
|
if reports.first or queued.first
|
176
264
|
dates = [ [reports.first, queued.first],
|
@@ -191,6 +279,7 @@ module ScoutAgent
|
|
191
279
|
"#{checkin[:alerts].size} alerts, " +
|
192
280
|
"and #{checkin[:errors].size} errors)#{report_dates} " +
|
193
281
|
"and #{snapshots.size} snapshot runs#{snapshot_dates}." )
|
282
|
+
# transmit the data and record the results
|
194
283
|
if @server.post_checkin(checkin)
|
195
284
|
log.info("Server received data.")
|
196
285
|
else
|
@@ -198,6 +287,11 @@ module ScoutAgent
|
|
198
287
|
end
|
199
288
|
end
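The bucketing step in checkin() above, where each report is filed under a key derived from its type, can be illustrated with plain Hashes. This is a sketch only: +build_checkin+ is a hypothetical name, and the agent works with database rows rather than the literal Hashes used here.

```ruby
# Hypothetical helper: groups +reports+ into the four check-in buckets by
# pluralizing each report's :type ("alert" goes into :alerts, and so on).
def build_checkin(reports)
  checkin = { :reports => [], :hints => [], :alerts => [], :errors => [] }
  reports.each do |report|
    fields = report.dup           # leave the caller's data untouched
    type   = fields.delete(:type)
    checkin.fetch("#{type}s".to_sym) << fields  # KeyError on unknown types
  end
  checkin
end

rows = [ { :type => "alert",  :body => "disk full" },
         { :type => "report", :body => "ok" } ]
p build_checkin(rows)
```

Using fetch() is a choice made for this sketch so that an unexpected type fails loudly with a KeyError.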
|
200
289
|
|
290
|
+
#
|
291
|
+
# Performs the regular maintenance needed to keep the agent from slowly
|
292
|
+
# filling the hard drive with data. It <tt>VACUUM</tt>s databases to
|
293
|
+
# reclaim space and removes old log files.
|
294
|
+
#
|
201
295
|
def perform_maintenance
|
202
296
|
log.info("Running maintenance tasks.")
|
203
297
|
status("Running maintenance tasks")
|
@@ -213,23 +307,10 @@ module ScoutAgent
|
|
213
307
|
end
|
214
308
|
|
215
309
|
# clean out old logs
|
216
|
-
|
217
|
-
if log_file.to_s =~ /\.(\d{4})(\d{2})(\d{2})\z/
|
218
|
-
log_day = Time.local(*$~.captures.map { |n| n.to_i })
|
219
|
-
if Time.now - log_day > 60 * 60 * 24 * 7
|
220
|
-
begin
|
221
|
-
(Plan.log_dir + log_file).unlink
|
222
|
-
rescue Exception => error # file cannot be unlinked
|
223
|
-
log.error( "Failed to unlink old log file '#{log_file}': " +
|
224
|
-
"#{error.message} (#{error.class})." )
|
225
|
-
next
|
226
|
-
end
|
227
|
-
log.debug("Successfully unlinked old log file '#{log_file}'.")
|
228
|
-
end
|
229
|
-
end
|
230
|
-
end
|
310
|
+
ScoutAgent.remove_old_log_files(log)
|
231
311
|
end
|
232
312
|
|
313
|
+
# Rest for however much time we have before more work is needed.
|
233
314
|
def wait_for_orders
|
234
315
|
pause = @db.seconds_to_next_mission
|
235
316
|
log.info("Waiting #{pause} seconds for next mission run.")
|
@@ -237,6 +318,10 @@ module ScoutAgent
|
|
237
318
|
sleep pause
|
238
319
|
end
|
239
320
|
|
321
|
+
#
|
322
|
+
# Finish the shutdown process, if it was started while we were doing the
|
323
|
+
# last step of the main event loop.
|
324
|
+
#
|
240
325
|
def check_running_status
|
241
326
|
exit unless @running
|
242
327
|
end
|
@@ -245,9 +330,18 @@ module ScoutAgent
|
|
245
330
|
### Mission ###
|
246
331
|
###############
|
247
332
|
|
333
|
+
#
|
334
|
+
# Reset our parent's signal handlers, authorize the Mission identity,
|
335
|
+
# prepare a log, and reset our status.
|
336
|
+
#
|
248
337
|
def reset_environment
|
249
338
|
# swap out our parent's signal handlers
|
250
|
-
install_shutdown_handler
|
339
|
+
install_shutdown_handler do
|
340
|
+
Thread.new do
|
341
|
+
log.info("Shutting down.")
|
342
|
+
exit
|
343
|
+
end
|
344
|
+
end
|
251
345
|
|
252
346
|
# clear the parent's identity and assume mine
|
253
347
|
IDCard.me = nil
|
@@ -263,12 +357,13 @@ module ScoutAgent
|
|
263
357
|
end
|
264
358
|
end
|
265
359
|
|
360
|
+
# Build the +mission+ code or exit() with an error if it cannot be built.
|
266
361
|
def compile_mission(mission)
|
267
362
|
log.info("Compiling #{mission[:name]} mission.")
|
268
363
|
status("Compiling")
|
269
364
|
begin
|
270
365
|
eval(mission[:code], TOPLEVEL_BINDING, mission[:name])
|
271
|
-
rescue Exception => error
|
366
|
+
rescue Exception => error # any compile error
|
272
367
|
raise if $!.is_a? SystemExit # don't catch exit() calls
|
273
368
|
log.error( "#{mission[:name]} could not be compiled: " +
|
274
369
|
"#{error.message} (#{error.class})." )
|
@@ -282,6 +377,11 @@ module ScoutAgent
|
|
282
377
|
end
|
283
378
|
end
|
284
379
|
|
380
|
+
#
|
381
|
+
# Create a Mission object from the code previously prepared, passing in
|
382
|
+
# details like <tt>:memory</tt> and <tt>:options</tt> from +mission+.
|
383
|
+
# Once this Mission is created, it is run().
|
384
|
+
#
|
285
385
|
def run_mission(mission)
|
286
386
|
log.info("Preparing #{mission[:name]} mission.")
|
287
387
|
if prepared = Mission.prepared
|
@@ -304,6 +404,7 @@ module ScoutAgent
|
|
304
404
|
end
|
305
405
|
end
|
306
406
|
|
407
|
+
# Report that +mission+ is now complete.
|
307
408
|
def complete_mission(mission)
|
308
409
|
log.info("#{mission[:name]} mission complete.")
|
309
410
|
end
|