einhorn 0.7.4 → 0.8.2
- checksums.yaml +7 -0
- data/README.md +37 -0
- data/README.md.in +21 -3
- data/bin/einhorn +17 -2
- data/example/pool_worker.rb +1 -1
- data/lib/einhorn.rb +40 -6
- data/lib/einhorn/client.rb +2 -3
- data/lib/einhorn/command.rb +103 -15
- data/lib/einhorn/command/interface.rb +11 -0
- data/lib/einhorn/event.rb +10 -1
- data/lib/einhorn/prctl.rb +26 -0
- data/lib/einhorn/prctl_linux.rb +49 -0
- data/lib/einhorn/version.rb +1 -1
- data/lib/einhorn/worker.rb +47 -25
- data/test/integration/_lib/fixtures/exit_during_upgrade/exiting_server.rb +1 -0
- data/test/integration/_lib/fixtures/pdeathsig_printer/pdeathsig_printer.rb +29 -0
- data/test/integration/_lib/fixtures/signal_timeout/sleepy_server.rb +23 -0
- data/test/integration/_lib/fixtures/upgrade_project/upgrading_server.rb +2 -0
- data/test/integration/_lib/helpers/einhorn_helpers.rb +5 -0
- data/test/integration/pdeathsig.rb +26 -0
- data/test/integration/upgrading.rb +47 -0
- data/test/unit/_lib/bad_worker.rb +7 -0
- data/test/unit/_lib/sleep_worker.rb +5 -0
- data/test/unit/einhorn.rb +41 -3
- data/test/unit/einhorn/command.rb +114 -0
- metadata +36 -47
checksums.yaml
ADDED
@@ -0,0 +1,7 @@
+---
+SHA256:
+  metadata.gz: f3f9c31b861db9b8b7ab5d2345be06f60223d60e1243bc2618ec5ef1db2b72e5
+  data.tar.gz: 38144bb080c8719b4d164bcbf8d96d498844626a66b9a522c75fb8bab6309c4f
+SHA512:
+  metadata.gz: 3370ff020a249f5af7be26bfb48392a0b5721d139b895684e37aa92f612bb00d10e48eeb1b95acc15c74448af96c1df039fc841f6ff45ffdf91d6a16bcc614ab
+  data.tar.gz: 3e6b93f1ed82a46a9578dd3dd59cdb0c5c962772d285e59f1e78e531b7577fa4debbc7e3a28a0e4f9cddf053e4481979b37229ce40a6ed9e476dbba6e7985f1e
data/README.md
CHANGED
@@ -194,6 +194,17 @@ library.
|
|
194
194
|
You can set the name that Einhorn and your workers show in PS. Just
|
195
195
|
pass `-c <name>`.
|
196
196
|
|
197
|
+
### Re exec
|
198
|
+
|
199
|
+
You can use the `--reexec-as` option to replace the `einhorn` command with a command or script of your own. This might be useful for those with a Capistrano like deploy process that has changing symlinks. To ensure that you are following the symlinks you could use a bash script like this.
|
200
|
+
|
201
|
+
#!/bin/bash
|
202
|
+
|
203
|
+
cd <symlinked directory>
|
204
|
+
exec /usr/local/bin/einhorn "$@"
|
205
|
+
|
206
|
+
Then you could set `--reexec-as=` to the name of your bash script and it will run in place of the plain einhorn command.
|
207
|
+
|
197
208
|
### Options
|
198
209
|
|
199
210
|
-b, --bind ADDR Bind an address and add the corresponding FD via the environment
|
@@ -217,11 +228,18 @@ pass `-c <name>`.
|
|
217
228
|
Unix nice level at which to run the einhorn processes. If not running as root, make sure to ulimit -e as appropriate.
|
218
229
|
--with-state-fd STATE [Internal option] With file descriptor containing state
|
219
230
|
--upgrade-check [Internal option] Check if Einhorn can exec itself and exit with status 0 before loading code
|
231
|
+
-t, --signal-timeout=T If children do not react to signals after T seconds, escalate to SIGKILL
|
220
232
|
--version Show version
|
221
233
|
|
222
234
|
|
223
235
|
## Contributing
|
224
236
|
|
237
|
+
### Development Status
|
238
|
+
|
239
|
+
Einhorn is still in active operation at Stripe, but we are not maintaining
|
240
|
+
Einhorn actively. PRs are very welcome, and we will review and merge,
|
241
|
+
but we are unlikely to triage and fix reported issues without code.
|
242
|
+
|
225
243
|
Contributions are definitely welcome. To contribute, just follow the
|
226
244
|
usual workflow:
|
227
245
|
|
@@ -251,6 +269,25 @@ EventMachine-LE to support file-descriptor passing. Check out
|
|
251
269
|
|
252
270
|
Einhorn runs in Ruby 2.0, 2.1, and 2.2
|
253
271
|
|
272
|
+
The following libraries ease integration with Einhorn with languages other than
|
273
|
+
Ruby:
|
274
|
+
|
275
|
+
- **[go-einhorn](https://github.com/stripe/go-einhorn)**: Stripe's own library
|
276
|
+
for *talking* to an einhorn master (doesn't wrap socket code).
|
277
|
+
- **[goji](https://github.com/zenazn/goji/)**: Go (golang) server framework. The
|
278
|
+
[`bind`](https://godoc.org/github.com/zenazn/goji/bind) and
|
279
|
+
[`graceful`](https://godoc.org/github.com/zenazn/goji/graceful)
|
280
|
+
packages provide helpers and HTTP/TCP connection wrappers for Einhorn
|
281
|
+
integration.
|
282
|
+
- **[github.com/CHH/einhorn](https://github.com/CHH/einhorn)**: PHP library
|
283
|
+
- **[thin-attach\_socket](https://github.com/ConradIrwin/thin-attach_socket)**:
|
284
|
+
run `thin` behind Einhorn
|
285
|
+
- **[baseplate](https://reddit.github.io/baseplate/cli/serve.html)**: a
|
286
|
+
collection of Python helpers and libraries, with support for running behind
|
287
|
+
Einhorn
|
288
|
+
|
289
|
+
*NB: this list should not imply any official endorsement or vetting!*
|
290
|
+
|
254
291
|
## About
|
255
292
|
|
256
293
|
Einhorn is a project of [Stripe](https://stripe.com), led by [Carl Jackson](https://github.com/zenazn). Feel free to get in touch at
|
data/README.md.in
CHANGED
@@ -67,10 +67,28 @@ EventMachine-LE to support file-descriptor passing. Check out
|
|
67
67
|
|
68
68
|
## Compatibility
|
69
69
|
|
70
|
-
Einhorn
|
70
|
+
Einhorn runs in Ruby 2.0, 2.1, and 2.2
|
71
|
+
|
72
|
+
The following libraries ease integration with Einhorn with languages other than
|
73
|
+
Ruby:
|
74
|
+
|
75
|
+
- **[go-einhorn](https://github.com/stripe/go-einhorn)**: Stripe's own library
|
76
|
+
for *talking* to an einhorn master (doesn't wrap socket code).
|
77
|
+
- **[goji](https://github.com/zenazn/goji/)**: Go (golang) server framework. The
|
78
|
+
[`bind`](https://godoc.org/github.com/zenazn/goji/bind) and
|
79
|
+
[`graceful`](https://godoc.org/github.com/zenazn/goji/graceful)
|
80
|
+
packages provide helpers and HTTP/TCP connection wrappers for Einhorn
|
81
|
+
integration.
|
82
|
+
- **[github.com/CHH/einhorn](https://github.com/CHH/einhorn)**: PHP library
|
83
|
+
- **[thin-attach\_socket](https://github.com/ConradIrwin/thin-attach_socket)**:
|
84
|
+
run `thin` behind Einhorn
|
85
|
+
- **[baseplate](https://reddit.github.io/baseplate/cli/serve.html)**: a
|
86
|
+
collection of Python helpers and libraries, with support for running behind
|
87
|
+
Einhorn
|
88
|
+
|
89
|
+
*NB: this list should not imply any official endorsement or vetting!*
|
71
90
|
|
72
91
|
## About
|
73
92
|
|
74
|
-
Einhorn is a project of [Stripe](https://stripe.com), led by [
|
75
|
-
Brockman](https://twitter.com/thegdb). Feel free to get in touch at
|
93
|
+
Einhorn is a project of [Stripe](https://stripe.com), led by [Carl Jackson](https://github.com/zenazn). Feel free to get in touch at
|
76
94
|
info@stripe.com.
|
data/bin/einhorn
CHANGED
@@ -266,8 +266,11 @@ if true # $0 == __FILE__
|
|
266
266
|
Einhorn::Command.quieter(false)
|
267
267
|
end
|
268
268
|
|
269
|
-
opts.on('-s', '--seconds N', 'Number of seconds to wait until respawning') do |
|
270
|
-
|
269
|
+
opts.on('-s', '--seconds N', 'Number of seconds to wait until respawning') do |s|
|
270
|
+
seconds = Float(s)
|
271
|
+
raise ArgumentError, 'seconds must be > 0' if seconds.zero?
|
272
|
+
|
273
|
+
Einhorn::State.config[:seconds] = seconds
|
271
274
|
end
|
272
275
|
|
273
276
|
opts.on('-v', '--verbose', 'Make output verbose (can be reconfigured on the fly)') do
|
@@ -310,6 +313,18 @@ if true # $0 == __FILE__
|
|
310
313
|
Einhorn::State.signal_timeout = Integer(t)
|
311
314
|
end
|
312
315
|
|
316
|
+
opts.on('--max-unacked=N', 'Maximum number of workers that can be unacked when gracefully upgrading.') do |n|
|
317
|
+
Einhorn::State.config[:max_unacked] = Integer(n)
|
318
|
+
end
|
319
|
+
|
320
|
+
opts.on('--max-upgrade-additional=N', 'Maximum number of additional workers that can be running during an upgrade.') do |n|
|
321
|
+
Einhorn::State.config[:max_upgrade_additional] = Integer(n)
|
322
|
+
end
|
323
|
+
|
324
|
+
opts.on('--gc-before-fork', 'Run the GC three times before forking to improve memory sharing for copy-on-write.') do
|
325
|
+
Einhorn::State.config[:gc_before_fork] = true
|
326
|
+
end
|
327
|
+
|
313
328
|
opts.on('--version', 'Show version') do
|
314
329
|
puts Einhorn::VERSION
|
315
330
|
exit
|
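The option handlers added in this file can be exercised in isolation. The following is a minimal sketch, not Einhorn's actual entry point: `parse_einhorn_flags_sketch` and the plain `config` hash (including its default values) are hypothetical stand-ins for `Einhorn::State.config`. Note one deliberate difference: the diff rejects only `seconds.zero?`, whereas this sketch rejects any non-positive value so that the check agrees with its own error message.

```ruby
require 'optparse'

# Hypothetical helper: parses the flags touched by this diff into a plain hash.
# The default values below are assumptions made for illustration only.
def parse_einhorn_flags_sketch(argv)
  config = {
    seconds: 1.0,
    max_unacked: nil,
    max_upgrade_additional: nil,
    gc_before_fork: false
  }

  parser = OptionParser.new do |opts|
    opts.on('-s', '--seconds N', 'Number of seconds to wait until respawning') do |s|
      seconds = Float(s) # raises ArgumentError on non-numeric input
      # Differs from the diff: rejects negative values as well as zero.
      raise ArgumentError, 'seconds must be > 0' unless seconds > 0
      config[:seconds] = seconds
    end

    opts.on('--max-unacked=N', 'Maximum number of unacked workers during an upgrade') do |n|
      config[:max_unacked] = Integer(n)
    end

    opts.on('--max-upgrade-additional=N', 'Maximum number of additional workers during an upgrade') do |n|
      config[:max_upgrade_additional] = Integer(n)
    end

    opts.on('--gc-before-fork', 'Run the GC before forking') do
      config[:gc_before_fork] = true
    end
  end

  parser.parse(argv) # non-destructive: argv itself is left untouched
  config
end
```

Parsing with `Float` rather than `Integer` is what allows fractional respawn intervals such as `-s 0.5`.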
data/example/pool_worker.rb
CHANGED
data/lib/einhorn.rb
CHANGED
@@ -45,6 +45,7 @@ module Einhorn
|
|
45
45
|
:orig_cmd => nil,
|
46
46
|
:bind => [],
|
47
47
|
:bind_fds => [],
|
48
|
+
:bound_ports => [],
|
48
49
|
:cmd => nil,
|
49
50
|
:script_name => nil,
|
50
51
|
:respawn => true,
|
@@ -68,6 +69,7 @@ module Einhorn
|
|
68
69
|
:reexec_commandline => nil,
|
69
70
|
:drop_environment_variables => [],
|
70
71
|
:signal_timeout => nil,
|
72
|
+
:preloaded => false
|
71
73
|
}
|
72
74
|
end
|
73
75
|
end
|
@@ -77,7 +79,6 @@ module Einhorn
|
|
77
79
|
def self.default_state
|
78
80
|
{
|
79
81
|
:whatami => :master,
|
80
|
-
:preloaded => false,
|
81
82
|
:script_name => nil,
|
82
83
|
:argv => [],
|
83
84
|
:environ => {},
|
@@ -120,7 +121,7 @@ module Einhorn
|
|
120
121
|
end
|
121
122
|
end
|
122
123
|
Einhorn::Event::Timer.open(0) do
|
123
|
-
dead.each {|pid| Einhorn::Command.
|
124
|
+
dead.each {|pid| Einhorn::Command.cleanup(pid)}
|
124
125
|
end
|
125
126
|
end
|
126
127
|
|
@@ -162,20 +163,23 @@ module Einhorn
|
|
162
163
|
end
|
163
164
|
|
164
165
|
Einhorn::TransientState.socket_handles << sd
|
165
|
-
sd.fileno
|
166
|
+
[sd.fileno, sd.local_address.ip_port]
|
166
167
|
end
|
167
168
|
|
168
169
|
# Implement these ourselves so it plays nicely with state persistence
|
169
170
|
def self.log_debug(msg, tag=nil)
|
170
171
|
$stderr.puts("#{log_tag} DEBUG: #{msg}\n") if Einhorn::State.verbosity <= 0
|
172
|
+
$stderr.flush
|
171
173
|
self.send_tagged_message(tag, msg) if tag
|
172
174
|
end
|
173
175
|
def self.log_info(msg, tag=nil)
|
174
176
|
$stderr.puts("#{log_tag} INFO: #{msg}\n") if Einhorn::State.verbosity <= 1
|
177
|
+
$stderr.flush
|
175
178
|
self.send_tagged_message(tag, msg) if tag
|
176
179
|
end
|
177
180
|
def self.log_error(msg, tag=nil)
|
178
181
|
$stderr.puts("#{log_tag} ERROR: #{msg}\n") if Einhorn::State.verbosity <= 2
|
182
|
+
$stderr.flush
|
179
183
|
self.send_tagged_message(tag, "ERROR: #{msg}") if tag
|
180
184
|
end
|
181
185
|
|
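The change to `bind` above returns the bound port alongside the file descriptor, which matters when the caller asks for port 0 and the kernel picks a free port. A minimal sketch of that idea, assuming IPv4 TCP only; `bind_sketch` is a hypothetical name and this is not Einhorn's own `bind`, which also handles flags and keeps socket handles in `Einhorn::TransientState`:

```ruby
require 'socket'

# Hypothetical helper: binds a listening TCP socket and reports the port
# the kernel actually assigned (useful when port 0 is requested).
def bind_sketch(host, port)
  sd = Socket.new(:INET, :STREAM)
  sd.setsockopt(Socket::SOL_SOCKET, Socket::SO_REUSEADDR, 1)
  sd.bind(Addrinfo.tcp(host, port))
  sd.listen(5)
  # Return the socket object itself (not just its fileno) so it is not
  # garbage-collected and closed while the caller still needs the FD.
  [sd, sd.local_address.ip_port]
end
```

Calling `bind_sketch('127.0.0.1', 0)` yields a socket together with whatever ephemeral port was assigned, which is the value the diff records in `bound_ports`.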
@@ -226,6 +230,8 @@ module Einhorn
|
|
226
230
|
set_argv(Einhorn::State.cmd, false)
|
227
231
|
|
228
232
|
begin
|
233
|
+
# Reset preloaded state to false - this allows us to monitor for failed preloads during reloads.
|
234
|
+
Einhorn::State.preloaded = false
|
229
235
|
# If it's not going to be requireable, then load it.
|
230
236
|
if !path.end_with?('.rb') && File.exists?(path)
|
231
237
|
log_info("Loading #{path} (if this hangs, make sure your code can be properly loaded as a library)", :upgrade)
|
@@ -233,13 +239,15 @@ module Einhorn
|
|
233
239
|
else
|
234
240
|
log_info("Requiring #{path} (if this hangs, make sure your code can be properly loaded as a library)", :upgrade)
|
235
241
|
require path
|
242
|
+
|
243
|
+
force_move_to_oldgen if Einhorn::State.config[:gc_before_fork]
|
236
244
|
end
|
237
245
|
rescue Exception => e
|
238
246
|
log_info("Proceeding with postload -- could not load #{path}: #{e} (#{e.class})\n #{e.backtrace.join("\n ")}", :upgrade)
|
239
247
|
else
|
240
248
|
if defined?(einhorn_main)
|
241
249
|
log_info("Successfully loaded #{path}", :upgrade)
|
242
|
-
Einhorn::
|
250
|
+
Einhorn::State.preloaded = true
|
243
251
|
else
|
244
252
|
log_info("Proceeding with postload -- loaded #{path}, but no einhorn_main method was defined", :upgrade)
|
245
253
|
end
|
@@ -247,6 +255,22 @@ module Einhorn
|
|
247
255
|
end
|
248
256
|
end
|
249
257
|
|
258
|
+
# Make the GC more copy-on-write friendly by forcibly incrementing the generation
|
259
|
+
# counter on all objects to its maximum value. Learn more at: https://github.com/ko1/nakayoshi_fork
|
260
|
+
def self.force_move_to_oldgen
|
261
|
+
log_info("Starting GC to improve copy-on-write memory sharing", :upgrade)
|
262
|
+
|
263
|
+
GC.start
|
264
|
+
3.times do
|
265
|
+
GC.start(full_mark: false)
|
266
|
+
end
|
267
|
+
|
268
|
+
GC.compact if GC.respond_to?(:compact)
|
269
|
+
|
270
|
+
log_info("Finished GC after preloading", :upgrade)
|
271
|
+
end
|
272
|
+
private_class_method :force_move_to_oldgen
|
273
|
+
|
250
274
|
def self.set_argv(cmd, set_ps_name)
|
251
275
|
# TODO: clean up this hack
|
252
276
|
idx = 0
|
@@ -304,8 +328,9 @@ module Einhorn
|
|
304
328
|
|
305
329
|
def self.socketify_env!
|
306
330
|
Einhorn::State.bind.each do |host, port, flags|
|
307
|
-
fd = bind(host, port, flags)
|
331
|
+
fd, actual_port = bind(host, port, flags)
|
308
332
|
Einhorn::State.bind_fds << fd
|
333
|
+
Einhorn::State.bound_ports << actual_port
|
309
334
|
end
|
310
335
|
end
|
311
336
|
|
@@ -319,7 +344,8 @@ module Einhorn
|
|
319
344
|
host = $2
|
320
345
|
port = $3
|
321
346
|
flags = $4.split(',').select {|flag| flag.length > 0}.map {|flag| flag.downcase}
|
322
|
-
|
347
|
+
Einhorn::State.sockets[[host, port]] ||= bind(host, port, flags)[0]
|
348
|
+
fd = Einhorn::State.sockets[[host, port]]
|
323
349
|
"#{opt}#{fd}"
|
324
350
|
else
|
325
351
|
arg
|
@@ -411,6 +437,14 @@ module Einhorn
|
|
411
437
|
Einhorn::State.reloading_for_upgrade = false
|
412
438
|
end
|
413
439
|
|
440
|
+
# If setting a signal-timeout, timeout the event loop
|
441
|
+
# in the same timeframe, ensuring processes are culled
|
442
|
+
# on a regular basis.
|
443
|
+
if Einhorn::State.signal_timeout
|
444
|
+
Einhorn::Event.default_timeout = Einhorn::Event.default_timeout.nil? ?
|
445
|
+
Einhorn::State.signal_timeout : [Einhorn::State.signal_timeout, Einhorn::Event.default_timeout].min
|
446
|
+
end
|
447
|
+
|
414
448
|
while Einhorn::State.respawn || Einhorn::State.children.size > 0
|
415
449
|
log_debug("Entering event loop")
|
416
450
|
|
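The event-loop timeout adjustment in the last hunk reduces to a small pure function: when a signal timeout is configured, the loop wakes up at least that often. A sketch under that reading, where `effective_loop_timeout` is a hypothetical name and not part of Einhorn:

```ruby
# Hypothetical helper mirroring the ternary added in the hunk above.
# Returns the timeout the event loop should use, in seconds (nil = no timeout).
def effective_loop_timeout(default_timeout, signal_timeout)
  return default_timeout if signal_timeout.nil? # feature disabled: leave as-is
  return signal_timeout if default_timeout.nil? # no prior timeout: adopt it
  [signal_timeout, default_timeout].min         # otherwise take the shorter one
end
```

Taking the minimum means an existing shorter loop timeout is never lengthened by enabling `--signal-timeout`.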
data/lib/einhorn/client.rb
CHANGED
@@ -1,5 +1,4 @@
 require 'set'
-require 'uri'
 require 'yaml'
 
 module Einhorn
@@ -22,12 +21,12 @@ module Einhorn
 
     def self.serialize_message(message)
       serialized = YAML.dump(message)
-      escaped =
+      escaped = serialized.gsub(/%|\n/, '%' => '%25', "\n" => '%0A')
       escaped + "\n"
     end
 
     def self.deserialize_message(line)
-      serialized =
+      serialized = line.gsub(/%(25|0A)/, '%25' => '%', '%0A' => "\n")
       YAML.load(serialized)
     end
   end
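The new escaping replaces only `%` and newline, which is enough to keep each message on a single line of the wire protocol without depending on `uri`. A round-trip sketch follows; `WireSketch` is a hypothetical module name used for illustration, while the two method bodies follow the added lines of the diff:

```ruby
require 'yaml'

# Hypothetical wrapper module; the method bodies follow the added diff lines.
module WireSketch
  def self.serialize_message(message)
    serialized = YAML.dump(message)
    # Escape '%' first-class so that a literal "%0A" in the payload survives.
    escaped = serialized.gsub(/%|\n/, '%' => '%25', "\n" => '%0A')
    escaped + "\n" # the only raw newline is the message terminator
  end

  def self.deserialize_message(line)
    # Single left-to-right pass: "%250A" becomes "%0A", not a newline.
    serialized = line.gsub(/%(25|0A)/, '%25' => '%', '%0A' => "\n")
    YAML.load(serialized)
  end
end
```

Because both directions use a single `gsub` pass, a payload that already contains the text `%0A` is escaped to `%250A` and restored unchanged.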
data/lib/einhorn/command.rb
CHANGED
@@ -3,6 +3,7 @@ require 'set'
|
|
3
3
|
require 'tmpdir'
|
4
4
|
|
5
5
|
require 'einhorn/command/interface'
|
6
|
+
require 'einhorn/prctl'
|
6
7
|
|
7
8
|
module Einhorn
|
8
9
|
module Command
|
@@ -10,18 +11,16 @@ module Einhorn
|
|
10
11
|
begin
|
11
12
|
while true
|
12
13
|
Einhorn.log_debug('Going to reap a child process')
|
13
|
-
|
14
14
|
pid = Process.wait(-1, Process::WNOHANG)
|
15
15
|
return unless pid
|
16
|
-
|
16
|
+
cleanup(pid)
|
17
17
|
Einhorn::Event.break_loop
|
18
18
|
end
|
19
19
|
rescue Errno::ECHILD
|
20
20
|
end
|
21
21
|
end
|
22
22
|
|
23
|
-
|
24
|
-
def self.mourn(pid)
|
23
|
+
def self.cleanup(pid)
|
25
24
|
unless spec = Einhorn::State.children[pid]
|
26
25
|
Einhorn.log_error("Could not find any config for exited child #{pid.inspect}! This probably indicates a bug in Einhorn.")
|
27
26
|
return
|
@@ -47,6 +46,16 @@ module Einhorn
|
|
47
46
|
end
|
48
47
|
end
|
49
48
|
|
49
|
+
def self.register_ping(pid, request_id)
|
50
|
+
unless spec = Einhorn::State.children[pid]
|
51
|
+
Einhorn.log_error("Could not find state for PID #{pid.inspect}; ignoring ACK.")
|
52
|
+
return
|
53
|
+
end
|
54
|
+
|
55
|
+
spec[:pinged_at] = Time.now
|
56
|
+
spec[:pinged_request_id] = request_id
|
57
|
+
end
|
58
|
+
|
50
59
|
def self.register_manual_ack(pid)
|
51
60
|
ack_mode = Einhorn::State.ack_mode
|
52
61
|
unless ack_mode[:type] == :manual
|
@@ -100,8 +109,8 @@ module Einhorn
|
|
100
109
|
|
101
110
|
def self.signal_all(signal, children=nil, record=true)
|
102
111
|
children ||= Einhorn::WorkerPool.workers
|
103
|
-
|
104
112
|
signaled = {}
|
113
|
+
|
105
114
|
Einhorn.log_info("Sending #{signal} to #{children.inspect}", :upgrade)
|
106
115
|
|
107
116
|
children.each do |child|
|
@@ -115,11 +124,13 @@ module Einhorn
|
|
115
124
|
Einhorn.log_error("Re-sending #{signal} to already-signaled child #{child.inspect}. It may be slow to spin down, or it may be swallowing #{signal}s.", :upgrade)
|
116
125
|
end
|
117
126
|
spec[:signaled].add(signal)
|
127
|
+
spec[:last_signaled_at] = Time.now
|
118
128
|
end
|
119
129
|
|
120
130
|
begin
|
121
131
|
Process.kill(signal, child)
|
122
132
|
rescue Errno::ESRCH
|
133
|
+
Einhorn.log_debug("Attempted to #{signal} child #{child.inspect} but the process does not exist", :upgrade)
|
123
134
|
else
|
124
135
|
signaled[child] = spec
|
125
136
|
end
|
@@ -129,7 +140,7 @@ module Einhorn
|
|
129
140
|
Einhorn::Event::Timer.open(Einhorn::State.signal_timeout) do
|
130
141
|
children.each do |child|
|
131
142
|
spec = Einhorn::State.children[child]
|
132
|
-
next unless spec # Process is already dead and removed by
|
143
|
+
next unless spec # Process is already dead and removed by cleanup
|
133
144
|
signaled_spec = signaled[child]
|
134
145
|
next unless signaled_spec # We got ESRCH when trying to signal
|
135
146
|
if spec[:spinup_time] != signaled_spec[:spinup_time]
|
@@ -145,11 +156,12 @@ module Einhorn
|
|
145
156
|
spec[:signaled].add('KILL')
|
146
157
|
end
|
147
158
|
end
|
148
|
-
end
|
149
159
|
|
150
|
-
|
160
|
+
Einhorn.log_info("Successfully sent #{signal}s to #{signaled.length} processes: #{signaled.keys}")
|
161
|
+
end
|
151
162
|
end
|
152
163
|
|
164
|
+
|
153
165
|
def self.increment
|
154
166
|
Einhorn::Event.break_loop
|
155
167
|
old = Einhorn::State.config[:number]
|
@@ -266,7 +278,8 @@ module Einhorn
|
|
266
278
|
def self.spinup(cmd=nil)
|
267
279
|
cmd ||= Einhorn::State.cmd
|
268
280
|
index = next_index
|
269
|
-
|
281
|
+
expected_ppid = Process.pid
|
282
|
+
if Einhorn::State.preloaded
|
270
283
|
pid = fork do
|
271
284
|
Einhorn::TransientState.whatami = :worker
|
272
285
|
prepare_child_process
|
@@ -278,6 +291,8 @@ module Einhorn
|
|
278
291
|
|
279
292
|
reseed_random
|
280
293
|
|
294
|
+
setup_parent_watch(expected_ppid)
|
295
|
+
|
281
296
|
prepare_child_environment(index)
|
282
297
|
einhorn_main
|
283
298
|
end
|
@@ -287,6 +302,7 @@ module Einhorn
|
|
287
302
|
prepare_child_process
|
288
303
|
|
289
304
|
Einhorn.log_info("About to exec #{cmd.inspect}")
|
305
|
+
Einhorn::Command::Interface.uninit
|
290
306
|
# Here's the only case where cloexec would help. Since we
|
291
307
|
# have to track and manually close FDs for other cases, we
|
292
308
|
# may as well just reuse close_all rather than also set
|
@@ -295,6 +311,8 @@ module Einhorn
|
|
295
311
|
# Note that Ruby 1.9's close_others option is useful here.
|
296
312
|
Einhorn::Event.close_all_for_worker
|
297
313
|
|
314
|
+
setup_parent_watch(expected_ppid)
|
315
|
+
|
298
316
|
prepare_child_environment(index)
|
299
317
|
Einhorn::Compat.exec(cmd[0], cmd[1..-1], :close_others => false)
|
300
318
|
end
|
@@ -307,6 +325,7 @@ module Einhorn
|
|
307
325
|
:version => Einhorn::State.version,
|
308
326
|
:acked => false,
|
309
327
|
:signaled => Set.new,
|
328
|
+
:last_signaled_at => nil,
|
310
329
|
:index => index,
|
311
330
|
:spinup_time => Einhorn::State.last_spinup,
|
312
331
|
}
|
@@ -379,6 +398,24 @@ module Einhorn
|
|
379
398
|
Einhorn.renice_self
|
380
399
|
end
|
381
400
|
|
401
|
+
def self.setup_parent_watch(expected_ppid)
|
402
|
+
if Einhorn::State.kill_children_on_exit then
|
403
|
+
begin
|
404
|
+
# NB: Having the USR2 signal handler set to terminate (the default) at
|
405
|
+
# this point is required. If it's set to a ruby handler, there are
|
406
|
+
# race conditions that could cause the worker to leak.
|
407
|
+
|
408
|
+
Einhorn::Prctl.set_pdeathsig("USR2")
|
409
|
+
if Process.ppid != expected_ppid then
|
410
|
+
Einhorn.log_error("Parent process died before we set pdeathsig; cowardly refusing to exec child process.")
|
411
|
+
exit(1)
|
412
|
+
end
|
413
|
+
rescue NotImplementedError
|
414
|
+
# Unsupported OS; silently continue.
|
415
|
+
end
|
416
|
+
end
|
417
|
+
end
|
418
|
+
|
382
419
|
# @param options [Hash]
|
383
420
|
#
|
384
421
|
# @option options [Boolean] :smooth (false) Whether to perform a smooth or
|
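The reason `spinup` captures `expected_ppid` before forking is that the parent may die between `fork` and the moment the child arms its parent-death signal; re-checking `Process.ppid` afterwards closes that window. The sketch below shows only the PID comparison and leaves out the `prctl` call itself, which is Linux-specific. `parent_unchanged?` and `demo_parent_watch` are hypothetical names, and the example assumes a platform where `fork` is available:

```ruby
# Hypothetical helper: true when the calling process still has the parent it expects.
def parent_unchanged?(expected_ppid)
  Process.ppid == expected_ppid
end

# Forks a child that performs the check and reports the verdict over a pipe.
def demo_parent_watch
  expected_ppid = Process.pid # captured BEFORE fork, as in the diff
  reader, writer = IO.pipe

  child = fork do
    reader.close
    # In Einhorn this check runs right after set_pdeathsig; here we only compare PIDs.
    writer.puts(parent_unchanged?(expected_ppid) ? 'parent-ok' : 'orphaned')
    writer.close
    exit!(0)
  end

  writer.close
  verdict = reader.read.strip
  Process.wait(child)
  verdict
end
```

If the parent had already exited, the child would have been re-parented and the comparison would fail, which is the case where the diff logs an error and exits instead of running the worker.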
@@ -463,6 +500,41 @@ module Einhorn
|
|
463
500
|
Einhorn.log_info("Have too many workers at the current version, so killing off #{excess.length} of them.")
|
464
501
|
signal_all("USR2", excess)
|
465
502
|
end
|
503
|
+
|
504
|
+
# Ensure all signaled workers that have outlived signal_timeout get killed.
|
505
|
+
kill_expired_signaled_workers if Einhorn::State.signal_timeout
|
506
|
+
end
|
507
|
+
|
508
|
+
def self.kill_expired_signaled_workers
|
509
|
+
now = Time.now
|
510
|
+
children = Einhorn::State.children.select do |_,c|
|
511
|
+
# Only interested in USR2 signaled workers
|
512
|
+
next unless c[:signaled] && c[:signaled].length > 0
|
513
|
+
next unless c[:signaled].include?('USR2')
|
514
|
+
|
515
|
+
# Ignore processes that have received KILL since it can't be trapped.
|
516
|
+
next if c[:signaled].include?('KILL')
|
517
|
+
|
518
|
+
# Filter out those children that have not reached signal_timeout yet.
|
519
|
+
next unless c[:last_signaled_at]
|
520
|
+
expires_at = c[:last_signaled_at] + Einhorn::State.signal_timeout
|
521
|
+
next unless now >= expires_at
|
522
|
+
|
523
|
+
true
|
524
|
+
end
|
525
|
+
|
526
|
+
Einhorn.log_info("#{children.size} expired signaled workers found.") if children.size > 0
|
527
|
+
children.each do |pid, child|
|
528
|
+
Einhorn.log_info("Child #{pid.inspect} was signaled #{(child[:last_signaled_at] - now).abs.to_i}s ago. Sending SIGKILL as it is still active after #{Einhorn::State.signal_timeout}s timeout.", :upgrade)
|
529
|
+
begin
|
530
|
+
Process.kill('KILL', pid)
|
531
|
+
rescue Errno::ESRCH
|
532
|
+
Einhorn.log_debug("Attempted to SIGKILL child #{pid.inspect} but the process does not exist.")
|
533
|
+
end
|
534
|
+
|
535
|
+
child[:signaled].add('KILL')
|
536
|
+
child[:last_signaled_at] = Time.now
|
537
|
+
end
|
466
538
|
end
|
467
539
|
|
468
540
|
def self.stop_respawning
|
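The selection logic inside `kill_expired_signaled_workers` can be read as a filter over the children table. The sketch below isolates that filter as a pure function so it can be tried without running processes; `expired_signaled_workers` is a hypothetical name, and the `children` hash only imitates the `:signaled` and `:last_signaled_at` keys shown in the diff:

```ruby
require 'set'

# Hypothetical pure-function version of the filter in the hunk above.
# children: { pid => { signaled: Set, last_signaled_at: Time or nil } }
def expired_signaled_workers(children, signal_timeout, now = Time.now)
  children.select do |_pid, spec|
    signaled = spec[:signaled]
    # Only workers that were asked to shut down via USR2 are candidates.
    next false unless signaled && signaled.include?('USR2')
    # KILL cannot be trapped, so a second KILL would add nothing.
    next false if signaled.include?('KILL')
    # Without a timestamp we cannot tell whether the deadline has passed.
    next false unless spec[:last_signaled_at]
    now >= spec[:last_signaled_at] + signal_timeout
  end
end
```

In the diff, each selected PID then receives `SIGKILL`, and `'KILL'` is added to its `:signaled` set so it is skipped on the next pass.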
@@ -500,6 +572,8 @@ module Einhorn
|
|
500
572
|
return if Einhorn::TransientState.has_outstanding_spinup_timer
|
501
573
|
return unless Einhorn::WorkerPool.missing_worker_count > 0
|
502
574
|
|
575
|
+
max_unacked ||= Einhorn::State.config[:max_unacked]
|
576
|
+
|
503
577
|
# default to spinning up at most NCPU workers at once
|
504
578
|
unless max_unacked
|
505
579
|
begin
|
@@ -522,11 +596,8 @@ module Einhorn
|
|
522
596
|
seconds_ago = (Time.now - Einhorn::State.last_spinup).to_f
|
523
597
|
|
524
598
|
if seconds_ago > spinup_interval
|
525
|
-
|
526
|
-
|
527
|
-
Einhorn.log_debug("There are #{unacked} unacked new workers, and max_unacked is #{max_unacked}, so not spinning up a new process")
|
528
|
-
else
|
529
|
-
msg = "Last spinup was #{seconds_ago}s ago, and spinup_interval is #{spinup_interval}s, so spinning up a new process"
|
599
|
+
if trigger_spinup?(max_unacked)
|
600
|
+
msg = "Last spinup was #{seconds_ago}s ago, and spinup_interval is #{spinup_interval}s, so spinning up a new process."
|
530
601
|
|
531
602
|
if Einhorn::State.consecutive_deaths_before_ack > 0
|
532
603
|
Einhorn.log_info("#{msg} (there have been #{Einhorn::State.consecutive_deaths_before_ack} consecutive unacked worker deaths)", :upgrade)
|
@@ -537,7 +608,7 @@ module Einhorn
|
|
537
608
|
spinup
|
538
609
|
end
|
539
610
|
else
|
540
|
-
Einhorn.log_debug("Last spinup was #{seconds_ago}s ago, and spinup_interval is #{spinup_interval}s, so not spinning up a new process")
|
611
|
+
Einhorn.log_debug("Last spinup was #{seconds_ago}s ago, and spinup_interval is #{spinup_interval}s, so not spinning up a new process.")
|
541
612
|
end
|
542
613
|
|
543
614
|
Einhorn::TransientState.has_outstanding_spinup_timer = true
|
@@ -560,5 +631,22 @@ module Einhorn
|
|
560
631
|
Einhorn.log_info(output) if log
|
561
632
|
output
|
562
633
|
end
|
634
|
+
|
635
|
+
def self.trigger_spinup?(max_unacked)
|
636
|
+
unacked = Einhorn::WorkerPool.unacked_unsignaled_modern_workers.length
|
637
|
+
if unacked >= max_unacked
|
638
|
+
Einhorn.log_info("There are #{unacked} unacked new workers, and max_unacked is #{max_unacked}, so not spinning up a new process.")
|
639
|
+
return false
|
640
|
+
elsif Einhorn::State.config[:max_upgrade_additional]
|
641
|
+
capacity_exceeded = (Einhorn::State.config[:number] + Einhorn::State.config[:max_upgrade_additional]) - Einhorn::WorkerPool.workers_with_state.length
|
642
|
+
if capacity_exceeded < 0
|
643
|
+
Einhorn.log_info("Over worker capacity by #{capacity_exceeded.abs} during upgrade, #{Einhorn::WorkerPool.modern_workers.length} new workers of #{Einhorn::WorkerPool.workers_with_state.length} total. Waiting for old workers to exit before spinning up a process.")
|
644
|
+
|
645
|
+
return false
|
646
|
+
end
|
647
|
+
end
|
648
|
+
|
649
|
+
true
|
650
|
+
end
|
563
651
|
end
|
564
652
|
end
|
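The decision made by `trigger_spinup?` depends on four numbers, so it can be sketched as a pure function. `spinup_allowed?` and its keyword arguments are hypothetical names that stand in for the `Einhorn::WorkerPool` and `Einhorn::State.config` lookups in the diff:

```ruby
# Hypothetical pure-function version of trigger_spinup?.
# unacked:                new-version workers that have not acked yet
# max_unacked:            cap on such workers
# target_workers:         configured worker count (config[:number] in the diff)
# max_upgrade_additional: extra workers tolerated during an upgrade, or nil
# live_workers:           all workers currently tracked, old and new
def spinup_allowed?(unacked:, max_unacked:, target_workers:, max_upgrade_additional:, live_workers:)
  return false if unacked >= max_unacked

  if max_upgrade_additional
    headroom = (target_workers + max_upgrade_additional) - live_workers
    # As in the diff, only a strictly negative value blocks the spinup.
    return false if headroom < 0
  end

  true
end
```

Since the comparison is strict, a pool sitting exactly at `target_workers + max_upgrade_additional` is still allowed one more spinup under this reading of the diff; only a pool already beyond that size waits for old workers to exit.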