einhorn 0.7.4 → 0.8.2
- checksums.yaml +7 -0
- data/README.md +37 -0
- data/README.md.in +21 -3
- data/bin/einhorn +17 -2
- data/example/pool_worker.rb +1 -1
- data/lib/einhorn.rb +40 -6
- data/lib/einhorn/client.rb +2 -3
- data/lib/einhorn/command.rb +103 -15
- data/lib/einhorn/command/interface.rb +11 -0
- data/lib/einhorn/event.rb +10 -1
- data/lib/einhorn/prctl.rb +26 -0
- data/lib/einhorn/prctl_linux.rb +49 -0
- data/lib/einhorn/version.rb +1 -1
- data/lib/einhorn/worker.rb +47 -25
- data/test/integration/_lib/fixtures/exit_during_upgrade/exiting_server.rb +1 -0
- data/test/integration/_lib/fixtures/pdeathsig_printer/pdeathsig_printer.rb +29 -0
- data/test/integration/_lib/fixtures/signal_timeout/sleepy_server.rb +23 -0
- data/test/integration/_lib/fixtures/upgrade_project/upgrading_server.rb +2 -0
- data/test/integration/_lib/helpers/einhorn_helpers.rb +5 -0
- data/test/integration/pdeathsig.rb +26 -0
- data/test/integration/upgrading.rb +47 -0
- data/test/unit/_lib/bad_worker.rb +7 -0
- data/test/unit/_lib/sleep_worker.rb +5 -0
- data/test/unit/einhorn.rb +41 -3
- data/test/unit/einhorn/command.rb +114 -0
- metadata +36 -47
checksums.yaml
ADDED
@@ -0,0 +1,7 @@
+---
+SHA256:
+  metadata.gz: f3f9c31b861db9b8b7ab5d2345be06f60223d60e1243bc2618ec5ef1db2b72e5
+  data.tar.gz: 38144bb080c8719b4d164bcbf8d96d498844626a66b9a522c75fb8bab6309c4f
+SHA512:
+  metadata.gz: 3370ff020a249f5af7be26bfb48392a0b5721d139b895684e37aa92f612bb00d10e48eeb1b95acc15c74448af96c1df039fc841f6ff45ffdf91d6a16bcc614ab
+  data.tar.gz: 3e6b93f1ed82a46a9578dd3dd59cdb0c5c962772d285e59f1e78e531b7577fa4debbc7e3a28a0e4f9cddf053e4481979b37229ce40a6ed9e476dbba6e7985f1e
data/README.md
CHANGED
@@ -194,6 +194,17 @@ library.
 You can set the name that Einhorn and your workers show in PS. Just
 pass `-c <name>`.
 
+### Re exec
+
+You can use the `--reexec-as` option to replace the `einhorn` command with a command or script of your own. This might be useful for those with a Capistrano like deploy process that has changing symlinks. To ensure that you are following the symlinks you could use a bash script like this.
+
+    #!/bin/bash
+
+    cd <symlinked directory>
+    exec /usr/local/bin/einhorn "$@"
+
+Then you could set `--reexec-as=` to the name of your bash script and it will run in place of the plain einhorn command.
+
 ### Options
 
     -b, --bind ADDR Bind an address and add the corresponding FD via the environment
@@ -217,11 +228,18 @@ pass `-c <name>`.
     Unix nice level at which to run the einhorn processes. If not running as root, make sure to ulimit -e as appopriate.
     --with-state-fd STATE [Internal option] With file descriptor containing state
     --upgrade-check [Internal option] Check if Einhorn can exec itself and exit with status 0 before loading code
+    -t, --signal-timeout=T If children do not react to signals after T seconds, escalate to SIGKILL
     --version Show version
 
 
 ## Contributing
 
+### Development Status
+
+Einhorn is still in active operation at Stripe, but we are not maintaining
+Einhorn actively. PRs are very welcome, and we will review and merge,
+but we are unlikely to triage and fix reported issues without code.
+
 Contributions are definitely welcome. To contribute, just follow the
 usual workflow:
 
@@ -251,6 +269,25 @@ EventMachine-LE to support file-descriptor passing. Check out
 
 Einhorn runs in Ruby 2.0, 2.1, and 2.2
 
+The following libraries ease integration with Einhorn with languages other than
+Ruby:
+
+- **[go-einhorn](https://github.com/stripe/go-einhorn)**: Stripe's own library
+  for *talking* to an einhorn master (doesn't wrap socket code).
+- **[goji](https://github.com/zenazn/goji/)**: Go (golang) server framework. The
+  [`bind`](https://godoc.org/github.com/zenazn/goji/bind) and
+  [`graceful`](https://godoc.org/github.com/zenazn/goji/graceful)
+  packages provide helpers and HTTP/TCP connection wrappers for Einhorn
+  integration.
+- **[github.com/CHH/einhorn](https://github.com/CHH/einhorn)**: PHP library
+- **[thin-attach\_socket](https://github.com/ConradIrwin/thin-attach_socket)**:
+  run `thin` behind Einhorn
+- **[baseplate](https://reddit.github.io/baseplate/cli/serve.html)**: a
+  collection of Python helpers and libraries, with support for running behind
+  Einhorn
+
+*NB: this list should not imply any official endorsement or vetting!*
+
 ## About
 
 Einhorn is a project of [Stripe](https://stripe.com), led by [Carl Jackson](https://github.com/zenazn). Feel free to get in touch at
data/README.md.in
CHANGED
@@ -67,10 +67,28 @@ EventMachine-LE to support file-descriptor passing. Check out
 
 ## Compatibility
 
-Einhorn
+Einhorn runs in Ruby 2.0, 2.1, and 2.2
+
+The following libraries ease integration with Einhorn with languages other than
+Ruby:
+
+- **[go-einhorn](https://github.com/stripe/go-einhorn)**: Stripe's own library
+  for *talking* to an einhorn master (doesn't wrap socket code).
+- **[goji](https://github.com/zenazn/goji/)**: Go (golang) server framework. The
+  [`bind`](https://godoc.org/github.com/zenazn/goji/bind) and
+  [`graceful`](https://godoc.org/github.com/zenazn/goji/graceful)
+  packages provide helpers and HTTP/TCP connection wrappers for Einhorn
+  integration.
+- **[github.com/CHH/einhorn](https://github.com/CHH/einhorn)**: PHP library
+- **[thin-attach\_socket](https://github.com/ConradIrwin/thin-attach_socket)**:
+  run `thin` behind Einhorn
+- **[baseplate](https://reddit.github.io/baseplate/cli/serve.html)**: a
+  collection of Python helpers and libraries, with support for running behind
+  Einhorn
+
+*NB: this list should not imply any official endorsement or vetting!*
 
 ## About
 
-Einhorn is a project of [Stripe](https://stripe.com), led by [
-Brockman](https://twitter.com/thegdb). Feel free to get in touch at
+Einhorn is a project of [Stripe](https://stripe.com), led by [Carl Jackson](https://github.com/zenazn). Feel free to get in touch at
 info@stripe.com.
data/bin/einhorn
CHANGED
@@ -266,8 +266,11 @@ if true # $0 == __FILE__
       Einhorn::Command.quieter(false)
     end
 
-    opts.on('-s', '--seconds N', 'Number of seconds to wait until respawning') do |
-
+    opts.on('-s', '--seconds N', 'Number of seconds to wait until respawning') do |s|
+      seconds = Float(s)
+      raise ArgumentError, 'seconds must be > 0' if seconds.zero?
+
+      Einhorn::State.config[:seconds] = seconds
     end
 
     opts.on('-v', '--verbose', 'Make output verbose (can be reconfigured on the fly)') do
@@ -310,6 +313,18 @@ if true # $0 == __FILE__
       Einhorn::State.signal_timeout = Integer(t)
     end
 
+    opts.on('--max-unacked=N', 'Maximum number of workers that can be unacked when gracefully upgrading.') do |n|
+      Einhorn::State.config[:max_unacked] = Integer(n)
+    end
+
+    opts.on('--max-upgrade-additional=N', 'Maximum number of additional workers that can be running during an upgrade.') do |n|
+      Einhorn::State.config[:max_upgrade_additional] = Integer(n)
+    end
+
+    opts.on('--gc-before-fork', 'Run the GC three times before forking to improve memory sharing for copy-on-write.') do
+      Einhorn::State.config[:gc_before_fork] = true
+    end
+
     opts.on('--version', 'Show version') do
       puts Einhorn::VERSION
       exit
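The new `-s` handler converts its argument with `Float` and rejects zero. A minimal standalone sketch of that pattern follows. The `parse_seconds` helper and the plain `config` hash are hypothetical stand-ins for `Einhorn::State.config`, and the `> 0` check is an assumption that is stricter than the diff, which only rejects `0` and would let a negative value through.

```ruby
require 'optparse'

# Hypothetical helper: parses argv and returns a plain hash in place of
# Einhorn::State.config (an assumption made for illustration only).
def parse_seconds(argv)
  config = {}
  parser = OptionParser.new do |opts|
    opts.on('-s', '--seconds N', 'Number of seconds to wait until respawning') do |s|
      seconds = Float(s) # raises ArgumentError on non-numeric input
      # Assumption: reject negatives too; the diff itself only checks seconds.zero?
      raise ArgumentError, 'seconds must be > 0' unless seconds > 0

      config[:seconds] = seconds
    end
  end
  parser.parse(argv)
  config
end

puts parse_seconds(['--seconds=1.5']).inspect
```

Because the value goes through `Float`, fractional respawn delays such as `1.5` are accepted.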
data/example/pool_worker.rb
CHANGED
data/lib/einhorn.rb
CHANGED
@@ -45,6 +45,7 @@ module Einhorn
         :orig_cmd => nil,
         :bind => [],
         :bind_fds => [],
+        :bound_ports => [],
         :cmd => nil,
         :script_name => nil,
         :respawn => true,
@@ -68,6 +69,7 @@ module Einhorn
         :reexec_commandline => nil,
         :drop_environment_variables => [],
         :signal_timeout => nil,
+        :preloaded => false
       }
     end
   end
@@ -77,7 +79,6 @@ module Einhorn
     def self.default_state
       {
         :whatami => :master,
-        :preloaded => false,
         :script_name => nil,
         :argv => [],
         :environ => {},
@@ -120,7 +121,7 @@ module Einhorn
       end
     end
     Einhorn::Event::Timer.open(0) do
-      dead.each {|pid| Einhorn::Command.
+      dead.each {|pid| Einhorn::Command.cleanup(pid)}
     end
   end
 
@@ -162,20 +163,23 @@ module Einhorn
     end
 
     Einhorn::TransientState.socket_handles << sd
-    sd.fileno
+    [sd.fileno, sd.local_address.ip_port]
   end
 
   # Implement these ourselves so it plays nicely with state persistence
   def self.log_debug(msg, tag=nil)
     $stderr.puts("#{log_tag} DEBUG: #{msg}\n") if Einhorn::State.verbosity <= 0
+    $stderr.flush
     self.send_tagged_message(tag, msg) if tag
   end
   def self.log_info(msg, tag=nil)
     $stderr.puts("#{log_tag} INFO: #{msg}\n") if Einhorn::State.verbosity <= 1
+    $stderr.flush
     self.send_tagged_message(tag, msg) if tag
   end
   def self.log_error(msg, tag=nil)
     $stderr.puts("#{log_tag} ERROR: #{msg}\n") if Einhorn::State.verbosity <= 2
+    $stderr.flush
     self.send_tagged_message(tag, "ERROR: #{msg}") if tag
   end
 
@@ -226,6 +230,8 @@ module Einhorn
     set_argv(Einhorn::State.cmd, false)
 
     begin
+      # Reset preloaded state to false - this allows us to monitor for failed preloads during reloads.
+      Einhorn::State.preloaded = false
       # If it's not going to be requireable, then load it.
       if !path.end_with?('.rb') && File.exists?(path)
         log_info("Loading #{path} (if this hangs, make sure your code can be properly loaded as a library)", :upgrade)
@@ -233,13 +239,15 @@ module Einhorn
       else
         log_info("Requiring #{path} (if this hangs, make sure your code can be properly loaded as a library)", :upgrade)
         require path
+
+        force_move_to_oldgen if Einhorn::State.config[:gc_before_fork]
       end
     rescue Exception => e
       log_info("Proceeding with postload -- could not load #{path}: #{e} (#{e.class})\n #{e.backtrace.join("\n ")}", :upgrade)
     else
       if defined?(einhorn_main)
         log_info("Successfully loaded #{path}", :upgrade)
-        Einhorn::
+        Einhorn::State.preloaded = true
       else
         log_info("Proceeding with postload -- loaded #{path}, but no einhorn_main method was defined", :upgrade)
       end
@@ -247,6 +255,22 @@ module Einhorn
     end
   end
 
+  # Make the GC more copy-on-write friendly by forcibly incrementing the generation
+  # counter on all objects to its maximum value. Learn more at: https://github.com/ko1/nakayoshi_fork
+  def self.force_move_to_oldgen
+    log_info("Starting GC to improve copy-on-write memory sharing", :upgrade)
+
+    GC.start
+    3.times do
+      GC.start(full_mark: false)
+    end
+
+    GC.compact if GC.respond_to?(:compact)
+
+    log_info("Finished GC after preloading", :upgrade)
+  end
+  private_class_method :force_move_to_oldgen
+
   def self.set_argv(cmd, set_ps_name)
     # TODO: clean up this hack
     idx = 0
@@ -304,8 +328,9 @@ module Einhorn
 
   def self.socketify_env!
     Einhorn::State.bind.each do |host, port, flags|
-      fd = bind(host, port, flags)
+      fd, actual_port = bind(host, port, flags)
       Einhorn::State.bind_fds << fd
+      Einhorn::State.bound_ports << actual_port
     end
   end
 
@@ -319,7 +344,8 @@ module Einhorn
         host = $2
         port = $3
         flags = $4.split(',').select {|flag| flag.length > 0}.map {|flag| flag.downcase}
-
+        Einhorn::State.sockets[[host, port]] ||= bind(host, port, flags)[0]
+        fd = Einhorn::State.sockets[[host, port]]
         "#{opt}#{fd}"
       else
         arg
@@ -411,6 +437,14 @@ module Einhorn
       Einhorn::State.reloading_for_upgrade = false
     end
 
+    # If setting a signal-timeout, timeout the event loop
+    # in the same timeframe, ensuring processes are culled
+    # on a regular basis.
+    if Einhorn::State.signal_timeout
+      Einhorn::Event.default_timeout = Einhorn::Event.default_timeout.nil? ?
+        Einhorn::State.signal_timeout : [Einhorn::State.signal_timeout, Einhorn::Event.default_timeout].min
+    end
+
     while Einhorn::State.respawn || Einhorn::State.children.size > 0
       log_debug("Entering event loop")
 
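The last hunk caps the event loop timeout at `signal_timeout` so that expired workers are checked at least that often. The selection rule can be sketched as a small pure function. `effective_loop_timeout` is a hypothetical name used only for illustration; the diff inlines the same logic as a ternary on `Einhorn::Event.default_timeout`.

```ruby
# Hypothetical helper (not part of einhorn): picks the event loop timeout
# given an optional signal timeout and an optional existing default.
def effective_loop_timeout(signal_timeout, default_timeout)
  # No signal timeout configured: leave the default untouched.
  return default_timeout if signal_timeout.nil?
  # No default yet: the signal timeout becomes the loop timeout.
  return signal_timeout if default_timeout.nil?

  # Both set: wake up at the shorter of the two intervals.
  [signal_timeout, default_timeout].min
end

puts effective_loop_timeout(5, nil)
puts effective_loop_timeout(5, 2)
```

Taking the minimum means an existing shorter timeout is never lengthened by enabling `--signal-timeout`.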
data/lib/einhorn/client.rb
CHANGED
@@ -1,5 +1,4 @@
 require 'set'
-require 'uri'
 require 'yaml'
 
 module Einhorn
@@ -22,12 +21,12 @@ module Einhorn
 
       def self.serialize_message(message)
         serialized = YAML.dump(message)
-        escaped =
+        escaped = serialized.gsub(/%|\n/, '%' => '%25', "\n" => '%0A')
         escaped + "\n"
       end
 
       def self.deserialize_message(line)
-        serialized =
+        serialized = line.gsub(/%(25|0A)/, '%25' => '%', '%0A' => "\n")
         YAML.load(serialized)
       end
     end
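The replacement escaping drops the `uri` dependency and encodes only `%` and newline, so each message stays on one line of the wire protocol. A rough round-trip sketch using the same two `gsub` calls is shown below. The `escape_line` and `unescape_line` names and the sample message are hypothetical, chosen only for illustration.

```ruby
require 'yaml'

# Hypothetical wrappers around the two gsub calls shown in the diff.
def escape_line(serialized)
  # Encode '%' first-class so that a literal "%0A" in the payload survives.
  serialized.gsub(/%|\n/, '%' => '%25', "\n" => '%0A')
end

def unescape_line(line)
  # Only the two sequences produced by escape_line are decoded.
  line.gsub(/%(25|0A)/, '%25' => '%', '%0A' => "\n")
end

# Sample payload (an assumption, not a documented einhorn message).
message = { 'command' => 'example', 'note' => "100% done\nsecond line" }

wire = escape_line(YAML.dump(message)) + "\n" # one message per line
decoded = YAML.load(unescape_line(wire))

puts wire.count("\n")
puts decoded == message
```

Since `gsub` scans left to right without overlapping matches, an escaped `%250A` decodes back to the literal text `%0A` rather than to a newline.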
data/lib/einhorn/command.rb
CHANGED
@@ -3,6 +3,7 @@ require 'set'
 require 'tmpdir'
 
 require 'einhorn/command/interface'
+require 'einhorn/prctl'
 
 module Einhorn
   module Command
@@ -10,18 +11,16 @@ module Einhorn
       begin
         while true
           Einhorn.log_debug('Going to reap a child process')
-
           pid = Process.wait(-1, Process::WNOHANG)
           return unless pid
-
+          cleanup(pid)
           Einhorn::Event.break_loop
         end
       rescue Errno::ECHILD
       end
     end
 
-
-    def self.mourn(pid)
+    def self.cleanup(pid)
       unless spec = Einhorn::State.children[pid]
         Einhorn.log_error("Could not find any config for exited child #{pid.inspect}! This probably indicates a bug in Einhorn.")
         return
@@ -47,6 +46,16 @@ module Einhorn
       end
     end
 
+    def self.register_ping(pid, request_id)
+      unless spec = Einhorn::State.children[pid]
+        Einhorn.log_error("Could not find state for PID #{pid.inspect}; ignoring ACK.")
+        return
+      end
+
+      spec[:pinged_at] = Time.now
+      spec[:pinged_request_id] = request_id
+    end
+
     def self.register_manual_ack(pid)
       ack_mode = Einhorn::State.ack_mode
       unless ack_mode[:type] == :manual
@@ -100,8 +109,8 @@ module Einhorn
 
     def self.signal_all(signal, children=nil, record=true)
       children ||= Einhorn::WorkerPool.workers
-
       signaled = {}
+
       Einhorn.log_info("Sending #{signal} to #{children.inspect}", :upgrade)
 
       children.each do |child|
@@ -115,11 +124,13 @@ module Einhorn
             Einhorn.log_error("Re-sending #{signal} to already-signaled child #{child.inspect}. It may be slow to spin down, or it may be swallowing #{signal}s.", :upgrade)
           end
           spec[:signaled].add(signal)
+          spec[:last_signaled_at] = Time.now
         end
 
         begin
           Process.kill(signal, child)
         rescue Errno::ESRCH
+          Einhorn.log_debug("Attempted to #{signal} child #{child.inspect} but the process does not exist", :upgrade)
         else
           signaled[child] = spec
         end
@@ -129,7 +140,7 @@ module Einhorn
         Einhorn::Event::Timer.open(Einhorn::State.signal_timeout) do
           children.each do |child|
             spec = Einhorn::State.children[child]
-            next unless spec # Process is already dead and removed by
+            next unless spec # Process is already dead and removed by cleanup
             signaled_spec = signaled[child]
             next unless signaled_spec # We got ESRCH when trying to signal
             if spec[:spinup_time] != signaled_spec[:spinup_time]
@@ -145,11 +156,12 @@ module Einhorn
             spec[:signaled].add('KILL')
           end
         end
-      end
 
-
+        Einhorn.log_info("Successfully sent #{signal}s to #{signaled.length} processes: #{signaled.keys}")
+      end
     end
 
+
     def self.increment
       Einhorn::Event.break_loop
       old = Einhorn::State.config[:number]
@@ -266,7 +278,8 @@ module Einhorn
     def self.spinup(cmd=nil)
       cmd ||= Einhorn::State.cmd
       index = next_index
-
+      expected_ppid = Process.pid
+      if Einhorn::State.preloaded
         pid = fork do
           Einhorn::TransientState.whatami = :worker
           prepare_child_process
@@ -278,6 +291,8 @@ module Einhorn
 
           reseed_random
 
+          setup_parent_watch(expected_ppid)
+
           prepare_child_environment(index)
           einhorn_main
         end
@@ -287,6 +302,7 @@ module Einhorn
           prepare_child_process
 
           Einhorn.log_info("About to exec #{cmd.inspect}")
+          Einhorn::Command::Interface.uninit
           # Here's the only case where cloexec would help. Since we
           # have to track and manually close FDs for other cases, we
           # may as well just reuse close_all rather than also set
@@ -295,6 +311,8 @@ module Einhorn
           # Note that Ruby 1.9's close_others option is useful here.
           Einhorn::Event.close_all_for_worker
 
+          setup_parent_watch(expected_ppid)
+
           prepare_child_environment(index)
           Einhorn::Compat.exec(cmd[0], cmd[1..-1], :close_others => false)
         end
@@ -307,6 +325,7 @@ module Einhorn
         :version => Einhorn::State.version,
         :acked => false,
         :signaled => Set.new,
+        :last_signaled_at => nil,
         :index => index,
         :spinup_time => Einhorn::State.last_spinup,
       }
@@ -379,6 +398,24 @@ module Einhorn
       Einhorn.renice_self
     end
 
+    def self.setup_parent_watch(expected_ppid)
+      if Einhorn::State.kill_children_on_exit then
+        begin
+          # NB: Having the USR2 signal handler set to terminate (the default) at
+          # this point is required. If it's set to a ruby handler, there are
+          # race conditions that could cause the worker to leak.
+
+          Einhorn::Prctl.set_pdeathsig("USR2")
+          if Process.ppid != expected_ppid then
+            Einhorn.log_error("Parent process died before we set pdeathsig; cowardly refusing to exec child process.")
+            exit(1)
+          end
+        rescue NotImplementedError
+          # Unsupported OS; silently continue.
+        end
+      end
+    end
+
     # @param options [Hash]
     #
     # @option options [Boolean] :smooth (false) Whether to perform a smooth or
@@ -463,6 +500,41 @@ module Einhorn
         Einhorn.log_info("Have too many workers at the current version, so killing off #{excess.length} of them.")
         signal_all("USR2", excess)
       end
+
+      # Ensure all signaled workers that have outlived signal_timeout get killed.
+      kill_expired_signaled_workers if Einhorn::State.signal_timeout
+    end
+
+    def self.kill_expired_signaled_workers
+      now = Time.now
+      children = Einhorn::State.children.select do |_,c|
+        # Only interested in USR2 signaled workers
+        next unless c[:signaled] && c[:signaled].length > 0
+        next unless c[:signaled].include?('USR2')
+
+        # Ignore processes that have received KILL since it can't be trapped.
+        next if c[:signaled].include?('KILL')
+
+        # Filter out those children that have not reached signal_timeout yet.
+        next unless c[:last_signaled_at]
+        expires_at = c[:last_signaled_at] + Einhorn::State.signal_timeout
+        next unless now >= expires_at
+
+        true
+      end
+
+      Einhorn.log_info("#{children.size} expired signaled workers found.") if children.size > 0
+      children.each do |pid, child|
+        Einhorn.log_info("Child #{pid.inspect} was signaled #{(child[:last_signaled_at] - now).abs.to_i}s ago. Sending SIGKILL as it is still active after #{Einhorn::State.signal_timeout}s timeout.", :upgrade)
+        begin
+          Process.kill('KILL', pid)
+        rescue Errno::ESRCH
+          Einhorn.log_debug("Attempted to SIGKILL child #{pid.inspect} but the process does not exist.")
+        end
+
+        child[:signaled].add('KILL')
+        child[:last_signaled_at] = Time.now
+      end
     end
 
     def self.stop_respawning
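The selection rules in `kill_expired_signaled_workers` can be exercised without real processes. The sketch below is a simplified, hypothetical pure function, `expired_signaled_pids`, that takes the children hash and timeout as arguments instead of reading `Einhorn::State`. The spec hashes mimic only the `:signaled` and `:last_signaled_at` keys seen in the diff.

```ruby
require 'set'

# Hypothetical helper (not part of einhorn): returns the PIDs that were sent
# USR2, have not yet been sent KILL, and were signaled at least
# signal_timeout seconds before `now`.
def expired_signaled_pids(children, signal_timeout, now = Time.now)
  children.select do |_pid, spec|
    signaled = spec[:signaled]
    next false unless signaled && signaled.include?('USR2')
    next false if signaled.include?('KILL')       # KILL cannot be trapped
    next false unless spec[:last_signaled_at]     # never recorded a signal time

    now >= spec[:last_signaled_at] + signal_timeout
  end.keys
end

now = Time.at(1000)
children = {
  1 => { signaled: Set['USR2'], last_signaled_at: Time.at(990) },         # expired
  2 => { signaled: Set['USR2'], last_signaled_at: Time.at(998) },         # still within timeout
  3 => { signaled: Set['USR2', 'KILL'], last_signaled_at: Time.at(900) }, # already killed
  4 => { signaled: Set.new, last_signaled_at: nil }                       # never signaled
}

puts expired_signaled_pids(children, 5, now).inspect
```

Passing `now` explicitly keeps the check deterministic, which is the main reason this sketch departs from the `Time.now` call inside the original method.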
@@ -500,6 +572,8 @@ module Einhorn
       return if Einhorn::TransientState.has_outstanding_spinup_timer
       return unless Einhorn::WorkerPool.missing_worker_count > 0
 
+      max_unacked ||= Einhorn::State.config[:max_unacked]
+
       # default to spinning up at most NCPU workers at once
       unless max_unacked
         begin
@@ -522,11 +596,8 @@ module Einhorn
       seconds_ago = (Time.now - Einhorn::State.last_spinup).to_f
 
       if seconds_ago > spinup_interval
-
-
-          Einhorn.log_debug("There are #{unacked} unacked new workers, and max_unacked is #{max_unacked}, so not spinning up a new process")
-        else
-          msg = "Last spinup was #{seconds_ago}s ago, and spinup_interval is #{spinup_interval}s, so spinning up a new process"
+        if trigger_spinup?(max_unacked)
+          msg = "Last spinup was #{seconds_ago}s ago, and spinup_interval is #{spinup_interval}s, so spinning up a new process."
 
           if Einhorn::State.consecutive_deaths_before_ack > 0
             Einhorn.log_info("#{msg} (there have been #{Einhorn::State.consecutive_deaths_before_ack} consecutive unacked worker deaths)", :upgrade)
@@ -537,7 +608,7 @@ module Einhorn
           spinup
         end
       else
-        Einhorn.log_debug("Last spinup was #{seconds_ago}s ago, and spinup_interval is #{spinup_interval}s, so not spinning up a new process")
+        Einhorn.log_debug("Last spinup was #{seconds_ago}s ago, and spinup_interval is #{spinup_interval}s, so not spinning up a new process.")
       end
 
       Einhorn::TransientState.has_outstanding_spinup_timer = true
@@ -560,5 +631,22 @@ module Einhorn
       Einhorn.log_info(output) if log
       output
     end
+
+    def self.trigger_spinup?(max_unacked)
+      unacked = Einhorn::WorkerPool.unacked_unsignaled_modern_workers.length
+      if unacked >= max_unacked
+        Einhorn.log_info("There are #{unacked} unacked new workers, and max_unacked is #{max_unacked}, so not spinning up a new process.")
+        return false
+      elsif Einhorn::State.config[:max_upgrade_additional]
+        capacity_exceeded = (Einhorn::State.config[:number] + Einhorn::State.config[:max_upgrade_additional]) - Einhorn::WorkerPool.workers_with_state.length
+        if capacity_exceeded < 0
+          Einhorn.log_info("Over worker capacity by #{capacity_exceeded.abs} during upgrade, #{Einhorn::WorkerPool.modern_workers.length} new workers of #{Einhorn::WorkerPool.workers_with_state.length} total. Waiting for old workers to exit before spinning up a process.")
+
+          return false
+        end
+      end
+
+      true
+    end
   end
 end
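The two gates in `trigger_spinup?` (the `--max-unacked` limit and the `--max-upgrade-additional` capacity check) reduce to simple arithmetic. A minimal sketch with every input passed explicitly is given below. `spinup_allowed?` and its keyword names are hypothetical; the real method reads these values from `Einhorn::State.config` and `Einhorn::WorkerPool`.

```ruby
# Hypothetical pure version of the decision (not einhorn's API).
#   unacked:                new workers that have not acked yet
#   max_unacked:            limit on simultaneously unacked workers
#   target:                 configured worker count (config[:number] in the diff)
#   max_upgrade_additional: extra workers tolerated during an upgrade, or nil
#   running:                workers currently tracked, old and new
def spinup_allowed?(unacked:, max_unacked:, target:, max_upgrade_additional:, running:)
  return false if unacked >= max_unacked

  if max_upgrade_additional
    headroom = (target + max_upgrade_additional) - running
    # As in the diff, only a strictly negative headroom blocks the spinup.
    return false if headroom < 0
  end

  true
end

puts spinup_allowed?(unacked: 0, max_unacked: 2, target: 4, max_upgrade_additional: 1, running: 5)
puts spinup_allowed?(unacked: 0, max_unacked: 2, target: 4, max_upgrade_additional: 1, running: 6)
```

When `max_upgrade_additional` is unset, the capacity check is skipped entirely and only the unacked limit applies, which matches the `elsif` in the hunk above.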