resque-telework 0.2.0
- data/Gemfile +6 -0
- data/MIT-LICENSE +20 -0
- data/README.md +151 -0
- data/doc/screenshots/view_overview.png +0 -0
- data/lib/resque-telework/global.rb +10 -0
- data/lib/resque-telework/manager.rb +205 -0
- data/lib/resque-telework/railtie.rb +11 -0
- data/lib/resque-telework/redis.rb +354 -0
- data/lib/resque-telework/server/views/misc.erb +21 -0
- data/lib/resque-telework/server/views/revision.erb +25 -0
- data/lib/resque-telework/server/views/stopit.erb +35 -0
- data/lib/resque-telework/server/views/telework.erb +262 -0
- data/lib/resque-telework/server/views/worker.erb +38 -0
- data/lib/resque-telework/server.rb +220 -0
- data/lib/resque-telework.rb +10 -0
- data/lib/tasks/telework.rake +98 -0
- data/resque-telework.gemspec +26 -0
- metadata +93 -0
data/Gemfile
ADDED
data/MIT-LICENSE
ADDED
@@ -0,0 +1,20 @@
Copyright (c) Gilles Pirio

Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
"Software"), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:

The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.md
ADDED
@@ -0,0 +1,151 @@
Resque Telework
===============

[github.com/gip/resque-telework](https://github.com/gip/resque-telework)

Telework depends on Resque 1.20+ and Redis 2.2+

Description
-----------

Telework is a [Resque](https://github.com/defunkt/resque) plugin aimed at controlling Resque workers from the web UI. It makes it easy to manage workers on a complex system that includes several hosts, different queues and an evolving codebase that is deployed several times a day. Beyond starting and stopping workers on remote hosts, the plugin makes it easy to switch between code revisions, gives a partial view of each worker's log (stdout and stderr) and maintains a status for each worker.

Telework comes with three main components:

* A web interface that smoothly integrates with Resque and adds its own tab
* A daemon process to be started on each host (`rake telework:start_daemon` starts a new daemon and returns, while `rake telework:daemon` runs the daemon interactively)
* A registration command (`rake telework:register_revision`) to be called by the deployment script when a new revision is added on the host

Note that currently (Telework 0.0.1), the daemon process is included in the main app, which is not really elegant as the full Rails environment needs to be loaded to run the daemon. A lightweight daemon is currently being developed and should be ready in the coming weeks.

Overview of the WebUI
---------------------

![Main Telework Window](https://github.com/gip/resque-telework/raw/master/doc/screenshots/view_overview.png)

The screenshot above shows the initial version of the Telework main window. The top table shows the active hosts, the different revisions and the running workers. The bottom table shows the status messages received from the hosts. Note that the layout is being improved and will look better soon :)

Installation
------------

Install as a gem:

```
gilles@myapphost $ gem install resque-telework
```

You may also add the following line to your Gemfile:

```
gem 'resque-telework'
```

Configuration
-------------

Some external configuration is necessary when working with Telework, as the gem needs a way to retrieve information about the code revision being deployed (git hash or SVN revision number), its path, the location of the log files and so on. When a Telework rake task starts (`telework:register_revision`, `telework:start_daemon` or `telework:daemon`), it tries to open the file named by the environment variable `TELEWORK_CONFIG_FILE`. If this variable doesn't exist, it tries to open the `telework.conf` file in the local directory.

The configuration file should contain information about the revision being deployed, in JSON format. A simple way of achieving this is to add a task to the deployment script. For instance, if you are using [Capistrano](https://github.com/capistrano/capistrano), the new task could look like this:

```ruby
# ...

namespace :deploy do

  # ... other deployment tasks here

  # Telework registration task (example for GitHub)
  task :telework_register do
    repo= 'john/reputedly' # <<< Change your GitHub repo name here
    github_repo= "https://github.com/#{repo}"
    log_path= "#{deploy_to}/shared/worker_log" # <<< Change paths to the log files here
    run "mkdir -p #{log_path}" # Making sure the log directory exists
    begin
      require 'octokit' # Gem to access the GitHub API
      client = Octokit::Client.new(:login => ACCOUNT, :password => PASSWORD ) # <<< Put your GitHub credentials here
      commit= client.commit(repo, latest_revision)
      rev_date= commit['commit']['committer']['date']
      rev_name= commit['commit']['committer']['name']
      rev_info= commit['commit']['message']
    rescue # No big deal if there is a problem accessing GitHub,
           # the info fields will just remain empty
    end
    cfg= { :revision => latest_revision,             # latest_revision, current_release, branch,...
           :revision_small => latest_revision[0..6], # are defined by Capistrano
           :revision_path => "#{current_release}",
           :revision_link => "#{github_repo}/commit/#{latest_revision}",
           :revision_branch => branch,
           :revision_date => rev_date,
           :revision_committer => rev_name,
           :revision_deployement_date => Time.now,
           :revision_info => rev_info,
           :revision_log_path => log_path,
           :daemon_pooling_interval => 2,
           :daemon_log_path => deploy_to }

    # Create the config file
    require 'json'
    put cfg.to_json, "#{deploy_to}/current/telework.conf"

    # Start the registration rake task
    run "cd #{deploy_to}/current && bundle exec rake telework:register_revision --trace"
  end
  after "deploy:more_symlinks", "deploy:telework_register" # <<< Schedule the task at the end of deployment

end
```
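For reference, a minimal `telework.conf` can also be produced without Capistrano. The sketch below writes one in plain Ruby; the paths and revision hash are placeholders, and only the key names are taken from the Capistrano example above:

```ruby
# Minimal telework.conf generator for a local, non-Capistrano setup.
# All values below are placeholders to be adapted to your deployment.
require 'json'

cfg = {
  :revision                => 'abc1234def5678',         # full revision hash (placeholder)
  :revision_small          => 'abc1234',                # short form shown in the UI
  :revision_path           => '/app/current',           # where this revision's code lives
  :revision_log_path       => '/app/shared/worker_log', # where worker logs are written
  :daemon_pooling_interval => 2,                        # seconds between daemon iterations
  :daemon_log_path         => '/app/shared'
}

File.write('telework.conf', cfg.to_json)
```

The rake tasks then pick this file up from the current directory (or from the path in `TELEWORK_CONFIG_FILE`).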
Workflow
--------

After Telework is installed and the `TeleworkConfig` class modified to match your environment, the code may be deployed to all the relevant hosts. If you're using [Capistrano](https://github.com/capistrano/capistrano), it may look like:

```
gilles@myapphost $ cap deploy -S servers=myapphost,myworkhost0,myworkhost1,myworkhost2
```

The command above deploys the code to the main app box (`myapphost`) and all the other 'worker' hosts. On each of these hosts, it is now necessary to register the new revision with Telework and start the Telework daemon. For instance, on `myworkhost0` this is done using the following commands:

```
gilles@myworkhost0 $ rake telework:register_revision
gilles@myworkhost0 $ rake telework:start_daemon
```

The main Telework tab should now show the new box as alive. It is now possible to seamlessly start new workers on these boxes using the web-based UI.

Going forward, when a new version of the app is deployed on a host, it is necessary to register the new revision using the following command:

```
gilles@myworkhost0 $ rake telework:register_revision
```

Note that it is not necessary to stop and restart the daemon. Restarting the daemon is only required when the Telework gem is updated.

Known Issues
------------

For version 0.2:

* Buttons are not aligned in the web UI
* The daemon crashes if any of the log directories do not exist

Bugs
----

Please report bugs on [github](https://github.com/gip/resque-telework/issues) or directly to [gip.github@gmail.com](mailto:gip.github@gmail.com)

Todo
----

The following features are being developed and should be available shortly:

* Improved window layout

The following features are planned for future releases:

* Lightweight daemon in Haskell
* Worker statistics

Thanks
------

I would like to thank [Entelo](http://www.entelo.com/) for the awesome environment and support for open-source development
data/doc/screenshots/view_overview.png
ADDED
Binary file (no diff shown)
data/lib/resque-telework/manager.rb
ADDED
@@ -0,0 +1,205 @@
module Resque
  module Plugins
    module Telework
      class Manager

        include Resque::Plugins::Telework::Redis

        def initialize(cfg)
          @RUN_DAEMON= true
          @HOST= cfg['hostname']
          @SLEEP= cfg['daemon_pooling_interval']
          @WORKERS= {}
          @STOPPED= []
        end

        # The manager (i.e. daemon) main loop
        def start
          send_status( 'Info', "Daemon (PID #{Process.pid}) starting on host #{@HOST}" )
          unless check_redis # Check the Redis interface version
            err= "Telework: Error: Redis interface version mismatch, exiting"
            puts err # We can't use send_status() as it relies on Redis, so we just show a message
            raise err
          end
          if is_alive(@HOST) # Only one daemon can be run on a given host at the moment (this may change)
            send_status( 'Error', "There is already a daemon running on #{@HOST}")
            send_status( 'Error', "This daemon (PID #{Process.pid}) cannot be started and will terminate now")
            exit
          end
          loop do # The main loop
            while @RUN_DAEMON do # While there is no request to stop
              i_am_alive(health_info) # Notify the system that the daemon is alive
              check_processes # Check the status of the child processes (to catch zombies)
              while cmd= cmds_pop( @HOST ) do # Pop a command from the command queue
                do_command(cmd) # Execute it
              end
              sleep @SLEEP # Sleep
            end
            # A stop request has been received
            send_status( 'Info', "A stop request has been received and the #{@HOST} daemon will now terminate") if @WORKERS.empty?
            break if @WORKERS.empty?
            send_status( 'Error', "A stop request has been received by the #{@HOST} daemon but there are still running worker(s) so it will keep running") unless @WORKERS.empty?
            @RUN_DAEMON= true
          end
        rescue Interrupt # Control-C
          send_status( 'Info', "Interruption for #{@HOST} daemon, exiting gracefully") if @WORKERS.empty?
          send_status( 'Error', "Interruption for #{@HOST} daemon, exiting, running workers may now unexpectedly terminate") unless @WORKERS.empty?
        rescue SystemExit # Exit has been called
          send_status( 'Info', "Exit called in #{@HOST} daemon") if @WORKERS.empty?
          send_status( 'Error', "Exit called in #{@HOST} daemon but workers are still running") unless @WORKERS.empty?
        rescue Exception => e # Other exceptions
          send_status( 'Error', "Exception #{e.message}")
          puts "Backtrace: #{e.backtrace}"
          send_status( 'Error', "Exceptions should not be raised in the #{@HOST} daemon, please submit a bug report")
        end

        # Health info
        def health_info
          require "sys/cpu"
          load= Sys::CPU.load_avg
          { :cpu_load_1mins => load[0],
            :cpu_load_5mins => load[1],
            :cpu_load_15mins => load[2] }
        rescue
          {}
        end

        # Add a status message to the status queue
        def send_status( severity, message )
          puts "Telework: #{severity}: #{message}"
          info= { 'host'=> @HOST, 'severity' => severity, 'message'=> message,
                  'date'=> Time.now }
          status_push(info)
        end

        # Execute a command synchronously
        def do_command( cmd )
          case cmd['command']
          when 'start_worker'
            start_worker( cmd, find_revision(cmd['revision']) )
          when 'signal_worker'
            manage_worker( cmd )
          when 'stop_daemon'
            @RUN_DAEMON= false
          when 'kill_daemon'
            send_status( 'Error', "A kill request has been received, the daemon on #{@HOST} is now brutally terminating by calling exit()")
            exit # Bye
          else
            send_status( 'Error', "Unknown command '#{cmd['command']}'" )
          end
        end

        # Start a task
        def start_worker( cmd, rev_info )
          # Retrieving args
          path= rev_info['revision_path']
          log_path= rev_info['revision_log_path']
          log_path||= "."
          rev= rev_info['revision']
          id= cmd['worker_id']
          queuel= cmd['queue'].gsub(/,/, '_').gsub(/\*/, 'STAR')
          # Starting the job
          env= {}
          env["QUEUE"]= cmd['queue']
          # env["COUNT"]= cmd['worker_count'] if cmd['worker_count']
          env["RAILS_ENV"]= cmd['rails_env'] if "(default)" != cmd['rails_env']
          env["BUNDLE_GEMFILE"] = path+"/Gemfile" if ENV["BUNDLE_GEMFILE"] # To make sure we use the new gems
          opt= { :in => "/dev/null",
                 :out => "#{log_path}/telework_#{id}_#{queuel}_stdout.log",
                 :err => "#{log_path}/telework_#{id}_#{queuel}_stderr.log",
                 :chdir => path,
                 :unsetenv_others => false }
          exec= cmd['exec']
          pid= spawn( env, exec, opt ) # Start it!
          info= { 'pid' => pid, 'status' => 'RUN', 'environment' => env, 'options' => opt, 'revision_info' => rev_info }
          # Log snapshot
          info['log_snapshot_period']= cmd['log_snapshot_period'] if cmd['log_snapshot_period']
          info['log_snapshot_lines']= cmd['log_snapshot_lines'] if cmd['log_snapshot_lines']
          @WORKERS[id]= info
          workers_add( @HOST, id, info )
          send_status( 'Info', "Starting worker #{id} (PID #{pid})" )
          # Create a helper file
          intro = "# Telework: starting worker #{id} on host #{@HOST} at #{Time.now.strftime("%a %b %e %R %Y")}"
          env.keys.each { |v| intro+= "\n# Telework: environment variable '#{v}' set to '#{env[v]}'" }
          intro+= "\n# Telework: command line is: #{exec}"
          intro+= "\n# Telework: path is: #{path}"
          intro+= "\n# Telework: log file for stdout is: #{opt[:out]}"
          intro+= "\n# Telework: log file for stderr is: #{opt[:err]}"
          intro+= "\n# Telework: PID is: #{pid}"
          intro+= "\n"
          File.open("#{log_path}/telework_#{id}.log", 'w') { |f| f.write(intro) }
        end

        def manage_worker( cmd )
          id= cmd['worker_id']
          sig= cmd['action'] # Can be QUIT, KILL, CONT, PAUSE
          info= @WORKERS[id]
          send_status( 'Error', "Worker #{id} was not found on this host" ) unless info
          return unless info
          status= sig
          sig= 'USR2' if 'PAUSE'==sig # Pause a Resque worker using the USR2 signal
          status= 'RUN' if status=='CONT'
          send_status( 'Info', "Signaling worker #{id} (PID #{info['pid']}) using signal #{sig}" )
          Process.kill( sig, info['pid'] ) # Signaling...
          @STOPPED << id if 'QUIT'==sig || 'KILL'==sig
          info['status']= status
          workers_add( @HOST, id, info )
          @WORKERS[id]= info
        end

        def check_processes
          #workers_delall( @HOST )
          @WORKERS.keys.each do |id|
            remove= false
            unexpected_death= false
            begin # Zombie hunt..
              res= Process.waitpid(@WORKERS[id]['pid'], Process::WNOHANG)
              remove= true if res
            rescue # Not a child.. so the process is already dead (we don't know why, maybe someone did a kill -9)
              unexpected_death= true
              remove= true
            end
            if remove
              workers_rem( @HOST, id )
              if unexpected_death
                send_status( 'Error', "Worker #{id} (PID #{@WORKERS[id]['pid']}) has unexpectedly ended" )
              else
                send_status( 'Info', "Worker #{id} (PID #{@WORKERS[id]['pid']}) has exited" ) if @STOPPED.index(id)
                send_status( 'Error', "Worker #{id} (PID #{@WORKERS[id]['pid']}) has unexpectedly exited" ) unless @STOPPED.index(id)
                @STOPPED.delete(id)
              end
              @WORKERS.delete(id)
            else
              update_log_snapshot(id)
              workers_add( @HOST, id, @WORKERS[id] )
            end
          end
        end

        def update_log_snapshot( id )
          ls= @WORKERS[id]['log_snapshot_period']
          return unless ls
          last= @WORKERS[id]['last_log_snapshot']
          last||= 0
          now= Time.now.to_i
          if now >= last+ls
            size= @WORKERS[id]['log_snapshot_lines']
            size||= 20
            # Getting the logs
            logerr= get_tail( @WORKERS[id]['options'][:err], size )
            logout= get_tail( @WORKERS[id]['options'][:out], size )
            # Write back
            info= { :date => Time.now, :log_stderr => logerr, :log_stdout => logout }
            logs_add( @HOST, id, info )
            @WORKERS[id]['last_log_snapshot']= now
          end
        end

        def get_tail( f, size )
          `tail -n #{size} #{f}`
        end

      end
    end
  end
end
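The heart of `start_worker` above is a single `Process.spawn` call: an environment hash, the command string, and an options hash redirecting stdin/stdout/stderr to per-worker log files. A minimal runnable sketch (the log file names are placeholders, and a trivial shell command stands in for the actual `rake resque:work` invocation):

```ruby
# Sketch of the daemon's worker-launch pattern: spawn a child with a
# custom environment and redirected streams, then reap it.
env = { 'QUEUE' => 'critical,high' }
opt = { :in  => '/dev/null',
        :out => 'telework_demo_stdout.log',   # placeholder log file
        :err => 'telework_demo_stderr.log',   # placeholder log file
        :unsetenv_others => false }           # keep the parent's environment

pid = Process.spawn(env, 'echo "queue is $QUEUE"', opt)
Process.waitpid(pid)  # the daemon itself polls with Process::WNOHANG instead
puts File.read('telework_demo_stdout.log')
```

Because the command is a string, Ruby runs it through the shell, so `$QUEUE` expands to the value set in `env`; the daemon catches exited children later via the non-blocking `waitpid` in `check_processes`.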
data/lib/resque-telework/redis.rb
ADDED
@@ -0,0 +1,354 @@
module Resque
  module Plugins
    module Telework
      module Redis

        def key_prefix
          "plugins:#{Resque::Plugins::Telework::Nickname}"
        end

        def redis_interface_key # String
          "#{key_prefix}:redisif"
        end

        def ids_key # String
          "#{key_prefix}:ids"
        end

        def hosts_key # Set
          "#{key_prefix}:hosts"
        end

        def revisions_key( h ) # List
          "#{key_prefix}:host:#{h}:revisions"
        end

        def workers_key( h ) # Hash
          "#{key_prefix}:host:#{h}:workers"
        end

        def tasks_key( h ) # Hash
          "#{key_prefix}:host:#{h}:tasks"
        end

        def logs_key( h ) # Hash
          "#{key_prefix}:host:#{h}:logs"
        end

        def cmds_key( h ) # List
          "#{key_prefix}:host:#{h}:cmds"
        end

        def acks_key( h ) # List
          "#{key_prefix}:host:#{h}:acks"
        end

        def status_key # List
          "#{key_prefix}:status"
        end

        def alive_key( h ) # String, with TTL
          "#{key_prefix}:host:#{h}:alive"
        end

        def last_seen_key( h ) # String, no TTL
          "#{key_prefix}:host:#{h}:last_seen"
        end

        def notes_key # List
          "#{key_prefix}:notes"
        end

        # Checks
        def check_redis
          res= true
          v0= Resque::Plugins::Telework::REDIS_INTERFACE_VERSION
          v= Resque.redis.get(redis_interface_key)
          if v!=v0
            Resque.redis.set(redis_interface_key, v0) unless v
            res= false if v
          end
          res
        end

        # Client (host) side

        def i_am_alive( info= {}, ttl=10 )
          h= @HOST
          t= Time.now
          info= info.merge( { 'date' => t, 'version' => Resque::Plugins::Telework::Version } )
          k= alive_key(h)
          hosts_add(h)
          Resque.redis.set(k, info.to_json )
          Resque.redis.expire(k, ttl)
          Resque.redis.set(last_seen_key(h), t)
        end

        def register_revision( h, rev, lim=9 )
          k= revisions_key(h)
          Resque.redis.ltrim(k, 0, lim-1)
          rem= []
          Resque.redis.lrange(k, 0, lim-1).each do |s|
            info= ActiveSupport::JSON.decode(s)
            if info['revision']==rev['revision']
              rem << s
              puts "Telework: Info: Revision #{rev['revision']} was already registered for this host, so the previous one will be unregistered"
            end
            if info['revision_path']==rev['revision_path']
              rem << s
              puts "Telework: Info: Path for revision #{rev['revision']} was already registered by another revision, which will therefore be removed"
            end
          end
          rem.each { |r| Resque.redis.lrem(k, 0, r) }
          revisions_add( h, rev )
        end

        def find_revision( rev )
          revisions(@HOST).each do |r|
            return r if rev==r['revision']
          end
          nil
        end

        def hosts_add( h )
          Resque.redis.sadd(hosts_key, h)
        end

        def revisions_add( h, v )
          hosts_add(h)
          k= revisions_key(h)
          Resque.redis.lpush(k, v.to_json )
        end

        def workers_delall( h )
          Resque.redis.del(workers_key(h))
        end

        def workers_add( h, id, info, ttl=10 )
          k= workers_key(h)
          Resque.redis.hset(k, id, info.to_json )
          Resque.redis.expire(k, ttl)
        end

        def workers_rem( h, id )
          k= workers_key(h)
          Resque.redis.hdel(k, id)
        end

        def tasks_add( h, id, info )
          k= tasks_key(h)
          Resque.redis.hset(k, id, info.to_json )
        end

        def tasks_rem( h, id )
          k= tasks_key(h)
          Resque.redis.hdel(k, id)
        end

        def cmds_pop( h )
          info= Resque.redis.rpop(cmds_key(h))
          info ? ActiveSupport::JSON.decode(info) : nil
        end

        def logs_add( h, id, info )
          k= logs_key(h)
          Resque.redis.hset(k, id, info.to_json )
        end

        def acks_push( h, info, lim=10 )
          Resque.redis.lpush(acks_key(h), info)
          Resque.redis.ltrim(acks_key(h), 0, lim-1)
        end

        def status_push( info, lim=100 )
          Resque.redis.lpush(status_key, info.to_json )
          Resque.redis.ltrim(status_key, 0, lim-1)
        end

        # Server side

        def daemons_state( clean = 30000000 )
          alive= []
          dead= []
          unknown= []
          hosts.each do |h|
            life= is_alive(h)
            alive << [h, "Alive", life] if life
            unless life
              ls= last_seen(h)
              dead << [h, "Last seen #{fmt_date(ls, true)}", {} ] if ls
              unknown << [h, 'Unknown', {} ] unless ls
            end
          end
          alive+dead+unknown
        end

        def configuration
          c= {}
          hosts.each do |h|
            c[h]= tasks(h).map{ |id, info| info }
          end
          c.to_json
        end

        # This function updates the status of the tasks depending on what is found in workers
        # This function must be idempotent
        def reconcile
          hosts.each do |h|
            tasks(h).each do |id, info|
              statuses= []
              pids= []
              tstatus= info['worker_status'] # Task status
              info['worker_id'].each do |id|
                worker= workers_by_id( h, id )
                wstatus= worker ? worker['status'] : 'STOP' # Worker status
                # wstatus: QUIT, KILL, CONT, PAUSE, RUN, STOP
                # tstatus: Running, Starting, Stopped, Paused
                ws= case wstatus
                    when "QUIT"
                      "Quitting"
                    when "KILL"
                      "Killing"
                    when "CONT"
                      "Resuming"
                    when "PAUSE"
                      "Paused"
                    when "RUN"
                      "Running"
                    when "STOP"
                      "Stopped"
                    else
                      "Unknown"
                    end
                statuses << ws
                pids << worker['pid'] if worker
              end
              ts= statuses.uniq * ","
              if ts!=tstatus #&& (tstatus!="Starting" || wstatus!="STOP")
                info['worker_status']= ts
                info['worker_pid']= pids
                tasks_add( h, id, info )
              end
            end
          end
        end

        def workers( h )
          Resque.redis.hgetall(workers_key(h)).collect { |id, info| [id, ActiveSupport::JSON.decode(info)] }
        end

        def workers_by_id( h, id )
          k= workers_key(h)
          info= Resque.redis.hget(k, id)
          info ? ActiveSupport::JSON.decode(info) : nil
        end

        def tasks( h )
          Resque.redis.hgetall(tasks_key(h)).collect { |id, info| [id, ActiveSupport::JSON.decode(info)] }
        end

        def tasks_by_id( h, id )
          k= tasks_key(h)
          info= Resque.redis.hget(k, id)
          info ? ActiveSupport::JSON.decode(info) : nil
        end

        def logs_by_id( h, id )
          k= logs_key(h)
          info= Resque.redis.hget(k, id)
          info ? ActiveSupport::JSON.decode(info) : nil
        end

        def unique_id
          Resque.redis.incr(ids_key)
        end

        def cmds_push( h, info, ttl=300 )
          k= cmds_key(h)
          Resque.redis.lpush(k, info.to_json)
          Resque.redis.expire(k, ttl)
        end

        def notes_push( info )
          Resque.redis.lpush(notes_key, info.to_json)
        end

        def notes_pop( lim= 100 )
          Resque.redis.lrange(notes_key, 0, lim-1).collect { |s| ActiveSupport::JSON.decode(s) }
        end

        def notes_del( id )
          info= Resque.redis.lindex(notes_key, id)
          Resque.redis.lrem(notes_key, 0, info)
        end

        def acks_pop( h )
          Resque.redis.rpop(acks_key(h))
        end

        def statuses( lim=100 )
          Resque.redis.lrange(status_key, 0, lim-1).collect { |s| ActiveSupport::JSON.decode(s) }
        end

        def hosts_rem( h )
          [ revisions_key(h), workers_key(h),
            cmds_key(h), alive_key(h), last_seen_key(h) ].each do |k|
            Resque.redis.del(k)
          end
          Resque.redis.srem( hosts_key, h )
        end

        def hosts
          Resque.redis.smembers(hosts_key)
        end

        def revisions( h, lim=30 )
          k= revisions_key(h)
          Resque.redis.ltrim(k, 0, lim-1)
          Resque.redis.lrange(k, 0, lim-1).map { |s| ActiveSupport::JSON.decode(s) }
        end

        def is_alive( h )
          v= Resque.redis.get(alive_key(h))
          return nil unless v
          begin
            ActiveSupport::JSON.decode(v)
          rescue
            {}
          end
        end

        def last_seen( h )
          Resque.redis.get(last_seen_key(h))
        end

        def nb_keys
          Resque.redis.keys("#{key_prefix}:*").length
        end

        def fmt_date( t, rel=false ) # This is not redis-specific and should be moved to another class!
          begin
            if rel
              "#{time_ago_in_words(Time.parse(t))} ago"
            else
              Time.parse(t).strftime("%a %b %e %R %Y")
            end
          rescue
            "(unknown date)"
          end
        end

        def text_to_html(s)
          return "" unless s
          ss= s.gsub(/\n/, '<br>')
        end

      end
    end
  end
end

class TeleworkRedis
  include ActionView::Helpers::DateHelper
  include Resque::Plugins::Telework::Redis
end
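The key helpers above all build names under a single namespace, which is what lets the web UI and the per-host daemons share one Redis instance. A sketch of the resulting layout, computed without a live connection (`'telework'` stands in for `Resque::Plugins::Telework::Nickname`, and the host name is a placeholder):

```ruby
# Sketch of the Redis key namespace used by the module above.
nickname = 'telework'                 # stands in for Resque::Plugins::Telework::Nickname
prefix   = "plugins:#{nickname}"
host     = 'myworkhost0'              # placeholder host name

keys = {
  :hosts     => "#{prefix}:hosts",                    # Set of registered hosts
  :revisions => "#{prefix}:host:#{host}:revisions",   # List, newest revision first
  :workers   => "#{prefix}:host:#{host}:workers",     # Hash of worker id => info, short TTL
  :cmds      => "#{prefix}:host:#{host}:cmds",        # List, pushed by the UI, popped by the daemon
  :alive     => "#{prefix}:host:#{host}:alive"        # String heartbeat, expires if the daemon dies
}
keys.each { |name, k| puts "#{name}: #{k}" }
```

The TTL on the `alive` key is the whole liveness mechanism: the daemon re-sets it every loop iteration, so if the key is gone the server side reports the host via `last_seen` instead.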