einhorn 0.7.0 → 0.8.2
- checksums.yaml +7 -0
- data/.travis.yml +6 -4
- data/README.md +39 -3
- data/README.md.in +21 -3
- data/bin/einhorn +17 -2
- data/bin/einhornsh +7 -15
- data/einhorn.gemspec +1 -0
- data/example/pool_worker.rb +1 -1
- data/lib/einhorn.rb +55 -41
- data/lib/einhorn/client.rb +2 -3
- data/lib/einhorn/command.rb +127 -20
- data/lib/einhorn/command/interface.rb +21 -10
- data/lib/einhorn/event.rb +10 -1
- data/lib/einhorn/event/connection.rb +2 -2
- data/lib/einhorn/prctl.rb +26 -0
- data/lib/einhorn/prctl_linux.rb +49 -0
- data/lib/einhorn/version.rb +1 -1
- data/lib/einhorn/worker.rb +47 -25
- data/test/integration/_lib/fixtures/exit_during_upgrade/exiting_server.rb +1 -0
- data/test/integration/_lib/fixtures/pdeathsig_printer/pdeathsig_printer.rb +29 -0
- data/test/integration/_lib/fixtures/signal_timeout/sleepy_server.rb +23 -0
- data/test/integration/_lib/fixtures/upgrade_project/upgrading_server.rb +2 -0
- data/test/integration/_lib/helpers/einhorn_helpers.rb +14 -9
- data/test/integration/pdeathsig.rb +26 -0
- data/test/integration/upgrading.rb +47 -0
- data/test/unit/_lib/bad_worker.rb +7 -0
- data/test/unit/_lib/sleep_worker.rb +5 -0
- data/test/unit/einhorn.rb +41 -3
- data/test/unit/einhorn/command.rb +114 -0
- metadata +48 -38
checksums.yaml
ADDED
@@ -0,0 +1,7 @@
+ ---
+ SHA256:
+ metadata.gz: f3f9c31b861db9b8b7ab5d2345be06f60223d60e1243bc2618ec5ef1db2b72e5
+ data.tar.gz: 38144bb080c8719b4d164bcbf8d96d498844626a66b9a522c75fb8bab6309c4f
+ SHA512:
+ metadata.gz: 3370ff020a249f5af7be26bfb48392a0b5721d139b895684e37aa92f612bb00d10e48eeb1b95acc15c74448af96c1df039fc841f6ff45ffdf91d6a16bcc614ab
+ data.tar.gz: 3e6b93f1ed82a46a9578dd3dd59cdb0c5c962772d285e59f1e78e531b7577fa4debbc7e3a28a0e4f9cddf053e4481979b37229ce40a6ed9e476dbba6e7985f1e
data/.travis.yml
CHANGED
@@ -1,8 +1,10 @@
  language: ruby
  rvm:
- - 1.8.7
- - 1.9.2
- - 1.9.3
  - 2.0.0
  - 2.1
- -
+ - 2.2
+
+ # This is to work around the version of bundler installed in Travis and
+ # https://github.com/bundler/bundler/issues/3558
+ before_install:
+ - gem update bundler
data/README.md
CHANGED
@@ -194,6 +194,17 @@ library.
|
|
194
194
|
You can set the name that Einhorn and your workers show in PS. Just
|
195
195
|
pass `-c <name>`.
|
196
196
|
|
197
|
+
### Re exec
|
198
|
+
|
199
|
+
You can use the `--reexec-as` option to replace the `einhorn` command with a command or script of your own. This might be useful for those with a Capistrano like deploy process that has changing symlinks. To ensure that you are following the symlinks you could use a bash script like this.
|
200
|
+
|
201
|
+
#!/bin/bash
|
202
|
+
|
203
|
+
cd <symlinked directory>
|
204
|
+
exec /usr/local/bin/einhorn "$@"
|
205
|
+
|
206
|
+
Then you could set `--reexec-as=` to the name of your bash script and it will run in place of the plain einhorn command.
|
207
|
+
|
197
208
|
### Options
|
198
209
|
|
199
210
|
-b, --bind ADDR Bind an address and add the corresponding FD via the environment
|
@@ -217,11 +228,18 @@ pass `-c <name>`.
|
|
217
228
|
Unix nice level at which to run the einhorn processes. If not running as root, make sure to ulimit -e as appopriate.
|
218
229
|
--with-state-fd STATE [Internal option] With file descriptor containing state
|
219
230
|
--upgrade-check [Internal option] Check if Einhorn can exec itself and exit with status 0 before loading code
|
231
|
+
-t, --signal-timeout=T If children do not react to signals after T seconds, escalate to SIGKILL
|
220
232
|
--version Show version
|
221
233
|
|
222
234
|
|
223
235
|
## Contributing
|
224
236
|
|
237
|
+
### Development Status
|
238
|
+
|
239
|
+
Einhorn is still in active operation at Stripe, but we are not maintaining
|
240
|
+
Einhorn actively. PRs are very welcome, and we will review and merge,
|
241
|
+
but we are unlikely to triage and fix reported issues without code.
|
242
|
+
|
225
243
|
Contributions are definitely welcome. To contribute, just follow the
|
226
244
|
usual workflow:
|
227
245
|
|
@@ -249,10 +267,28 @@ EventMachine-LE to support file-descriptor passing. Check out
|
|
249
267
|
|
250
268
|
## Compatibility
|
251
269
|
|
252
|
-
Einhorn
|
270
|
+
Einhorn runs in Ruby 2.0, 2.1, and 2.2
|
271
|
+
|
272
|
+
The following libraries ease integration with Einhorn with languages other than
|
273
|
+
Ruby:
|
274
|
+
|
275
|
+
- **[go-einhorn](https://github.com/stripe/go-einhorn)**: Stripe's own library
|
276
|
+
for *talking* to an einhorn master (doesn't wrap socket code).
|
277
|
+
- **[goji](https://github.com/zenazn/goji/)**: Go (golang) server framework. The
|
278
|
+
[`bind`](https://godoc.org/github.com/zenazn/goji/bind) and
|
279
|
+
[`graceful`](https://godoc.org/github.com/zenazn/goji/graceful)
|
280
|
+
packages provide helpers and HTTP/TCP connection wrappers for Einhorn
|
281
|
+
integration.
|
282
|
+
- **[github.com/CHH/einhorn](https://github.com/CHH/einhorn)**: PHP library
|
283
|
+
- **[thin-attach\_socket](https://github.com/ConradIrwin/thin-attach_socket)**:
|
284
|
+
run `thin` behind Einhorn
|
285
|
+
- **[baseplate](https://reddit.github.io/baseplate/cli/serve.html)**: a
|
286
|
+
collection of Python helpers and libraries, with support for running behind
|
287
|
+
Einhorn
|
288
|
+
|
289
|
+
*NB: this list should not imply any official endorsement or vetting!*
|
253
290
|
|
254
291
|
## About
|
255
292
|
|
256
|
-
Einhorn is a project of [Stripe](https://stripe.com), led by [
|
257
|
-
Brockman](https://twitter.com/thegdb). Feel free to get in touch at
|
293
|
+
Einhorn is a project of [Stripe](https://stripe.com), led by [Carl Jackson](https://github.com/zenazn). Feel free to get in touch at
|
258
294
|
info@stripe.com.
|
data/README.md.in
CHANGED
@@ -67,10 +67,28 @@ EventMachine-LE to support file-descriptor passing. Check out
|
|
67
67
|
|
68
68
|
## Compatibility
|
69
69
|
|
70
|
-
Einhorn
|
70
|
+
Einhorn runs in Ruby 2.0, 2.1, and 2.2
|
71
|
+
|
72
|
+
The following libraries ease integration with Einhorn with languages other than
|
73
|
+
Ruby:
|
74
|
+
|
75
|
+
- **[go-einhorn](https://github.com/stripe/go-einhorn)**: Stripe's own library
|
76
|
+
for *talking* to an einhorn master (doesn't wrap socket code).
|
77
|
+
- **[goji](https://github.com/zenazn/goji/)**: Go (golang) server framework. The
|
78
|
+
[`bind`](https://godoc.org/github.com/zenazn/goji/bind) and
|
79
|
+
[`graceful`](https://godoc.org/github.com/zenazn/goji/graceful)
|
80
|
+
packages provide helpers and HTTP/TCP connection wrappers for Einhorn
|
81
|
+
integration.
|
82
|
+
- **[github.com/CHH/einhorn](https://github.com/CHH/einhorn)**: PHP library
|
83
|
+
- **[thin-attach\_socket](https://github.com/ConradIrwin/thin-attach_socket)**:
|
84
|
+
run `thin` behind Einhorn
|
85
|
+
- **[baseplate](https://reddit.github.io/baseplate/cli/serve.html)**: a
|
86
|
+
collection of Python helpers and libraries, with support for running behind
|
87
|
+
Einhorn
|
88
|
+
|
89
|
+
*NB: this list should not imply any official endorsement or vetting!*
|
71
90
|
|
72
91
|
## About
|
73
92
|
|
74
|
-
Einhorn is a project of [Stripe](https://stripe.com), led by [
|
75
|
-
Brockman](https://twitter.com/thegdb). Feel free to get in touch at
|
93
|
+
Einhorn is a project of [Stripe](https://stripe.com), led by [Carl Jackson](https://github.com/zenazn). Feel free to get in touch at
|
76
94
|
info@stripe.com.
|
data/bin/einhorn
CHANGED
@@ -266,8 +266,11 @@ if true # $0 == __FILE__
|
|
266
266
|
Einhorn::Command.quieter(false)
|
267
267
|
end
|
268
268
|
|
269
|
-
opts.on('-s', '--seconds N', 'Number of seconds to wait until respawning') do |
|
270
|
-
|
269
|
+
opts.on('-s', '--seconds N', 'Number of seconds to wait until respawning') do |s|
|
270
|
+
seconds = Float(s)
|
271
|
+
raise ArgumentError, 'seconds must be > 0' if seconds.zero?
|
272
|
+
|
273
|
+
Einhorn::State.config[:seconds] = seconds
|
271
274
|
end
|
272
275
|
|
273
276
|
opts.on('-v', '--verbose', 'Make output verbose (can be reconfigured on the fly)') do
|
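The `--seconds` handler above converts its argument with `Float` and rejects only an exact zero, although its error message says the value must be greater than zero. A minimal sketch of a stricter check follows; `parse_seconds` is a hypothetical helper name and is not part of Einhorn.

```ruby
# Hypothetical helper (not in Einhorn): validate a --seconds argument.
# Float() raises ArgumentError on non-numeric input, matching the
# conversion used in the hunk above.
def parse_seconds(raw)
  seconds = Float(raw)
  # Unlike `seconds.zero?`, this also rejects negative values,
  # which is what the error message promises.
  raise ArgumentError, 'seconds must be > 0' unless seconds > 0
  seconds
end
```

Whether negative values should be rejected is a design choice; the released code only guards against zero.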
@@ -310,6 +313,18 @@ if true # $0 == __FILE__
|
|
310
313
|
Einhorn::State.signal_timeout = Integer(t)
|
311
314
|
end
|
312
315
|
|
316
|
+
opts.on('--max-unacked=N', 'Maximum number of workers that can be unacked when gracefully upgrading.') do |n|
|
317
|
+
Einhorn::State.config[:max_unacked] = Integer(n)
|
318
|
+
end
|
319
|
+
|
320
|
+
opts.on('--max-upgrade-additional=N', 'Maximum number of additional workers that can be running during an upgrade.') do |n|
|
321
|
+
Einhorn::State.config[:max_upgrade_additional] = Integer(n)
|
322
|
+
end
|
323
|
+
|
324
|
+
opts.on('--gc-before-fork', 'Run the GC three times before forking to improve memory sharing for copy-on-write.') do
|
325
|
+
Einhorn::State.config[:gc_before_fork] = true
|
326
|
+
end
|
327
|
+
|
313
328
|
opts.on('--version', 'Show version') do
|
314
329
|
puts Einhorn::VERSION
|
315
330
|
exit
|
data/bin/einhornsh
CHANGED
@@ -21,22 +21,14 @@ module Einhorn
|
|
21
21
|
end
|
22
22
|
|
23
23
|
def send_command(hash)
|
24
|
-
|
25
|
-
|
26
|
-
|
27
|
-
|
28
|
-
|
29
|
-
|
30
|
-
|
31
|
-
puts "Invalid response type #{response.class}: #{response.inspect}"
|
32
|
-
end
|
24
|
+
@client.send_command(hash)
|
25
|
+
while response = @client.receive_message
|
26
|
+
if response.kind_of?(Hash)
|
27
|
+
yield response['message']
|
28
|
+
return unless response['wait']
|
29
|
+
else
|
30
|
+
puts "Invalid response type #{response.class}: #{response.inspect}"
|
33
31
|
end
|
34
|
-
rescue Errno::EPIPE => e
|
35
|
-
emit("einhornsh: Error communicating with Einhorn: #{e} (#{e.class})")
|
36
|
-
emit("einhornsh: Attempting to reconnect...")
|
37
|
-
reconnect
|
38
|
-
|
39
|
-
retry
|
40
32
|
end
|
41
33
|
end
|
42
34
|
|
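The new `send_command` body in `einhornsh` keeps reading responses until one arrives without a truthy `'wait'` key. The sketch below reproduces that loop against a stand-in client; `FakeClient` and `drain_responses` are hypothetical names used only for illustration, and the real `Einhorn::Client` reads from a socket instead of an array.

```ruby
# Hypothetical stand-in for the real client: replays canned responses.
class FakeClient
  def initialize(responses)
    @responses = responses.dup
  end

  def send_command(hash)
    @sent = hash
  end

  # Returns the next canned response, or nil when none are left.
  def receive_message
    @responses.shift
  end
end

# Mirrors the loop in the hunk: yield each message, stop at the first
# response that does not ask the caller to wait for more.
def drain_responses(client, command)
  client.send_command(command)
  while (response = client.receive_message)
    if response.kind_of?(Hash)
      yield response['message']
      return unless response['wait']
    else
      warn "Invalid response type #{response.class}: #{response.inspect}"
    end
  end
end
```

With this shape, a long-running command can stream several progress messages and mark only the last one as final.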
data/einhorn.gemspec
CHANGED
@@ -15,6 +15,7 @@ Gem::Specification.new do |gem|
  gem.name = 'einhorn'
  gem.require_paths = ['lib']

+ gem.add_development_dependency 'rack', '~> 1.6'
  gem.add_development_dependency 'rake'
  gem.add_development_dependency 'pry'
  gem.add_development_dependency 'minitest', '< 5.0'
data/example/pool_worker.rb
CHANGED
data/lib/einhorn.rb
CHANGED
@@ -45,6 +45,7 @@ module Einhorn
|
|
45
45
|
:orig_cmd => nil,
|
46
46
|
:bind => [],
|
47
47
|
:bind_fds => [],
|
48
|
+
:bound_ports => [],
|
48
49
|
:cmd => nil,
|
49
50
|
:script_name => nil,
|
50
51
|
:respawn => true,
|
@@ -68,14 +69,9 @@ module Einhorn
|
|
68
69
|
:reexec_commandline => nil,
|
69
70
|
:drop_environment_variables => [],
|
70
71
|
:signal_timeout => nil,
|
72
|
+
:preloaded => false
|
71
73
|
}
|
72
74
|
end
|
73
|
-
|
74
|
-
def self.dumpable_state
|
75
|
-
dump = state
|
76
|
-
dump[:reloading_for_preload_upgrade] = dump[:reloading_for_upgrade]
|
77
|
-
dump
|
78
|
-
end
|
79
75
|
end
|
80
76
|
|
81
77
|
module TransientState
|
@@ -83,7 +79,6 @@ module Einhorn
|
|
83
79
|
def self.default_state
|
84
80
|
{
|
85
81
|
:whatami => :master,
|
86
|
-
:preloaded => false,
|
87
82
|
:script_name => nil,
|
88
83
|
:argv => [],
|
89
84
|
:environ => {},
|
@@ -110,38 +105,24 @@ module Einhorn
|
|
110
105
|
updated_state = old_state.dup
|
111
106
|
|
112
107
|
# Handle changes in state format updates from previous einhorn versions
|
113
|
-
if store == Einhorn::State
|
114
|
-
#
|
115
|
-
|
116
|
-
|
117
|
-
|
118
|
-
|
119
|
-
|
120
|
-
|
121
|
-
|
122
|
-
|
123
|
-
|
124
|
-
|
125
|
-
|
126
|
-
|
127
|
-
# Depending on what is passed for --reexec-as, it's possible
|
128
|
-
# that the process received a SIGCHLD while something other
|
129
|
-
# than einhorn was the active executable. If that happened,
|
130
|
-
# einhorn might not know about a dead child, so let's check
|
131
|
-
# them all
|
132
|
-
dead = []
|
133
|
-
updated_state[:children].each do |pid, v|
|
134
|
-
begin
|
135
|
-
pid = Process.wait(pid, Process::WNOHANG)
|
136
|
-
dead << pid if pid
|
137
|
-
rescue Errno::ECHILD
|
138
|
-
dead << pid
|
139
|
-
end
|
140
|
-
end
|
141
|
-
Einhorn::Event::Timer.open(0) do
|
142
|
-
dead.each {|pid| Einhorn::Command.mourn(pid)}
|
108
|
+
if store == Einhorn::State && updated_state[:children]
|
109
|
+
# Depending on what is passed for --reexec-as, it's possible
|
110
|
+
# that the process received a SIGCHLD while something other
|
111
|
+
# than einhorn was the active executable. If that happened,
|
112
|
+
# einhorn might not know about a dead child, so let's check
|
113
|
+
# them all
|
114
|
+
dead = []
|
115
|
+
updated_state[:children].each do |pid, v|
|
116
|
+
begin
|
117
|
+
pid = Process.wait(pid, Process::WNOHANG)
|
118
|
+
dead << pid if pid
|
119
|
+
rescue Errno::ECHILD
|
120
|
+
dead << pid
|
143
121
|
end
|
144
122
|
end
|
123
|
+
Einhorn::Event::Timer.open(0) do
|
124
|
+
dead.each {|pid| Einhorn::Command.cleanup(pid)}
|
125
|
+
end
|
145
126
|
end
|
146
127
|
|
147
128
|
default = store.default_state
|
@@ -182,20 +163,23 @@ module Einhorn
|
|
182
163
|
end
|
183
164
|
|
184
165
|
Einhorn::TransientState.socket_handles << sd
|
185
|
-
sd.fileno
|
166
|
+
[sd.fileno, sd.local_address.ip_port]
|
186
167
|
end
|
187
168
|
|
188
169
|
# Implement these ourselves so it plays nicely with state persistence
|
189
170
|
def self.log_debug(msg, tag=nil)
|
190
171
|
$stderr.puts("#{log_tag} DEBUG: #{msg}\n") if Einhorn::State.verbosity <= 0
|
172
|
+
$stderr.flush
|
191
173
|
self.send_tagged_message(tag, msg) if tag
|
192
174
|
end
|
193
175
|
def self.log_info(msg, tag=nil)
|
194
176
|
$stderr.puts("#{log_tag} INFO: #{msg}\n") if Einhorn::State.verbosity <= 1
|
177
|
+
$stderr.flush
|
195
178
|
self.send_tagged_message(tag, msg) if tag
|
196
179
|
end
|
197
180
|
def self.log_error(msg, tag=nil)
|
198
181
|
$stderr.puts("#{log_tag} ERROR: #{msg}\n") if Einhorn::State.verbosity <= 2
|
182
|
+
$stderr.flush
|
199
183
|
self.send_tagged_message(tag, "ERROR: #{msg}") if tag
|
200
184
|
end
|
201
185
|
|
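In the hunk above, `bind` now returns the port alongside the file descriptor, read from `sd.local_address.ip_port`. This matters when the requested port is 0 and the kernel picks one. A minimal sketch of that lookup, assuming a plain `TCPServer` on the loopback interface; `bind_ephemeral` is a hypothetical name and Einhorn's own `bind` sets up its socket differently.

```ruby
require 'socket'

# Hypothetical helper: bind to an OS-assigned port and report which
# port was actually chosen, as [server, fd, port].
def bind_ephemeral(host = '127.0.0.1')
  server = TCPServer.new(host, 0) # port 0 asks the OS for a free port
  [server, server.fileno, server.local_address.ip_port]
end
```

The caller is responsible for closing the returned server object.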
@@ -246,6 +230,8 @@ module Einhorn
|
|
246
230
|
set_argv(Einhorn::State.cmd, false)
|
247
231
|
|
248
232
|
begin
|
233
|
+
# Reset preloaded state to false - this allows us to monitor for failed preloads during reloads.
|
234
|
+
Einhorn::State.preloaded = false
|
249
235
|
# If it's not going to be requireable, then load it.
|
250
236
|
if !path.end_with?('.rb') && File.exists?(path)
|
251
237
|
log_info("Loading #{path} (if this hangs, make sure your code can be properly loaded as a library)", :upgrade)
|
@@ -253,13 +239,15 @@ module Einhorn
|
|
253
239
|
else
|
254
240
|
log_info("Requiring #{path} (if this hangs, make sure your code can be properly loaded as a library)", :upgrade)
|
255
241
|
require path
|
242
|
+
|
243
|
+
force_move_to_oldgen if Einhorn::State.config[:gc_before_fork]
|
256
244
|
end
|
257
245
|
rescue Exception => e
|
258
246
|
log_info("Proceeding with postload -- could not load #{path}: #{e} (#{e.class})\n #{e.backtrace.join("\n ")}", :upgrade)
|
259
247
|
else
|
260
248
|
if defined?(einhorn_main)
|
261
249
|
log_info("Successfully loaded #{path}", :upgrade)
|
262
|
-
Einhorn::
|
250
|
+
Einhorn::State.preloaded = true
|
263
251
|
else
|
264
252
|
log_info("Proceeding with postload -- loaded #{path}, but no einhorn_main method was defined", :upgrade)
|
265
253
|
end
|
@@ -267,6 +255,22 @@ module Einhorn
|
|
267
255
|
end
|
268
256
|
end
|
269
257
|
|
258
|
+
# Make the GC more copy-on-write friendly by forcibly incrementing the generation
|
259
|
+
# counter on all objects to its maximum value. Learn more at: https://github.com/ko1/nakayoshi_fork
|
260
|
+
def self.force_move_to_oldgen
|
261
|
+
log_info("Starting GC to improve copy-on-write memory sharing", :upgrade)
|
262
|
+
|
263
|
+
GC.start
|
264
|
+
3.times do
|
265
|
+
GC.start(full_mark: false)
|
266
|
+
end
|
267
|
+
|
268
|
+
GC.compact if GC.respond_to?(:compact)
|
269
|
+
|
270
|
+
log_info("Finished GC after preloading", :upgrade)
|
271
|
+
end
|
272
|
+
private_class_method :force_move_to_oldgen
|
273
|
+
|
270
274
|
def self.set_argv(cmd, set_ps_name)
|
271
275
|
# TODO: clean up this hack
|
272
276
|
idx = 0
|
@@ -324,8 +328,9 @@ module Einhorn
|
|
324
328
|
|
325
329
|
def self.socketify_env!
|
326
330
|
Einhorn::State.bind.each do |host, port, flags|
|
327
|
-
fd = bind(host, port, flags)
|
331
|
+
fd, actual_port = bind(host, port, flags)
|
328
332
|
Einhorn::State.bind_fds << fd
|
333
|
+
Einhorn::State.bound_ports << actual_port
|
329
334
|
end
|
330
335
|
end
|
331
336
|
|
@@ -339,7 +344,8 @@ module Einhorn
|
|
339
344
|
host = $2
|
340
345
|
port = $3
|
341
346
|
flags = $4.split(',').select {|flag| flag.length > 0}.map {|flag| flag.downcase}
|
342
|
-
|
347
|
+
Einhorn::State.sockets[[host, port]] ||= bind(host, port, flags)[0]
|
348
|
+
fd = Einhorn::State.sockets[[host, port]]
|
343
349
|
"#{opt}#{fd}"
|
344
350
|
else
|
345
351
|
arg
|
@@ -431,6 +437,14 @@ module Einhorn
|
|
431
437
|
Einhorn::State.reloading_for_upgrade = false
|
432
438
|
end
|
433
439
|
|
440
|
+
# If setting a signal-timeout, timeout the event loop
|
441
|
+
# in the same timeframe, ensuring processes are culled
|
442
|
+
# on a regular basis.
|
443
|
+
if Einhorn::State.signal_timeout
|
444
|
+
Einhorn::Event.default_timeout = Einhorn::Event.default_timeout.nil? ?
|
445
|
+
Einhorn::State.signal_timeout : [Einhorn::State.signal_timeout, Einhorn::Event.default_timeout].min
|
446
|
+
end
|
447
|
+
|
434
448
|
while Einhorn::State.respawn || Einhorn::State.children.size > 0
|
435
449
|
log_debug("Entering event loop")
|
436
450
|
|
data/lib/einhorn/client.rb
CHANGED
@@ -1,5 +1,4 @@
  require 'set'
- require 'uri'
  require 'yaml'

  module Einhorn
@@ -22,12 +21,12 @@ module Einhorn

  def self.serialize_message(message)
  serialized = YAML.dump(message)
- escaped =
+ escaped = serialized.gsub(/%|\n/, '%' => '%25', "\n" => '%0A')
  escaped + "\n"
  end

  def self.deserialize_message(line)
- serialized =
+ serialized = line.gsub(/%(25|0A)/, '%25' => '%', '%0A' => "\n")
  YAML.load(serialized)
  end
  end
|
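The replacement for the `uri`-based escaping encodes only `%` and newline, so each message fits on one line and a literal `%0A` in the payload survives the round trip. The sketch below copies the two `gsub` calls from the hunk into top-level methods to show this; the top-level method names are for illustration only, and string keys are used because newer Psych versions restrict what `YAML.load` will deserialize.

```ruby
require 'yaml'

# Same escaping as the hunk: '%' becomes '%25', newline becomes '%0A'.
def serialize_message(message)
  serialized = YAML.dump(message)
  escaped = serialized.gsub(/%|\n/, '%' => '%25', "\n" => '%0A')
  escaped + "\n"
end

# Inverse mapping. The scan is left to right, so '%250A' decodes to
# the literal text '%0A' and not to a newline.
def deserialize_message(line)
  serialized = line.gsub(/%(25|0A)/, '%25' => '%', '%0A' => "\n")
  YAML.load(serialized)
end
```

Because the only raw newline in the output is the terminator, a reader can frame messages with a simple line read.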
data/lib/einhorn/command.rb
CHANGED
@@ -3,6 +3,7 @@ require 'set'
|
|
3
3
|
require 'tmpdir'
|
4
4
|
|
5
5
|
require 'einhorn/command/interface'
|
6
|
+
require 'einhorn/prctl'
|
6
7
|
|
7
8
|
module Einhorn
|
8
9
|
module Command
|
@@ -10,18 +11,16 @@ module Einhorn
|
|
10
11
|
begin
|
11
12
|
while true
|
12
13
|
Einhorn.log_debug('Going to reap a child process')
|
13
|
-
|
14
14
|
pid = Process.wait(-1, Process::WNOHANG)
|
15
15
|
return unless pid
|
16
|
-
|
16
|
+
cleanup(pid)
|
17
17
|
Einhorn::Event.break_loop
|
18
18
|
end
|
19
19
|
rescue Errno::ECHILD
|
20
20
|
end
|
21
21
|
end
|
22
22
|
|
23
|
-
|
24
|
-
def self.mourn(pid)
|
23
|
+
def self.cleanup(pid)
|
25
24
|
unless spec = Einhorn::State.children[pid]
|
26
25
|
Einhorn.log_error("Could not find any config for exited child #{pid.inspect}! This probably indicates a bug in Einhorn.")
|
27
26
|
return
|
@@ -40,11 +39,23 @@ module Einhorn
|
|
40
39
|
case type = spec[:type]
|
41
40
|
when :worker
|
42
41
|
Einhorn.log_info("===> Exited worker #{pid.inspect}#{extra}", :upgrade)
|
42
|
+
when :state_passer
|
43
|
+
Einhorn.log_debug("===> Exited state passing process #{pid.inspect}", :upgrade)
|
43
44
|
else
|
44
45
|
Einhorn.log_error("===> Exited process #{pid.inspect} has unrecgonized type #{type.inspect}: #{spec.inspect}", :upgrade)
|
45
46
|
end
|
46
47
|
end
|
47
48
|
|
49
|
+
def self.register_ping(pid, request_id)
|
50
|
+
unless spec = Einhorn::State.children[pid]
|
51
|
+
Einhorn.log_error("Could not find state for PID #{pid.inspect}; ignoring ACK.")
|
52
|
+
return
|
53
|
+
end
|
54
|
+
|
55
|
+
spec[:pinged_at] = Time.now
|
56
|
+
spec[:pinged_request_id] = request_id
|
57
|
+
end
|
58
|
+
|
48
59
|
def self.register_manual_ack(pid)
|
49
60
|
ack_mode = Einhorn::State.ack_mode
|
50
61
|
unless ack_mode[:type] == :manual
|
@@ -98,8 +109,8 @@ module Einhorn
|
|
98
109
|
|
99
110
|
def self.signal_all(signal, children=nil, record=true)
|
100
111
|
children ||= Einhorn::WorkerPool.workers
|
112
|
+
signaled = {}
|
101
113
|
|
102
|
-
signaled = []
|
103
114
|
Einhorn.log_info("Sending #{signal} to #{children.inspect}", :upgrade)
|
104
115
|
|
105
116
|
children.each do |child|
|
@@ -113,22 +124,31 @@ module Einhorn
|
|
113
124
|
Einhorn.log_error("Re-sending #{signal} to already-signaled child #{child.inspect}. It may be slow to spin down, or it may be swallowing #{signal}s.", :upgrade)
|
114
125
|
end
|
115
126
|
spec[:signaled].add(signal)
|
127
|
+
spec[:last_signaled_at] = Time.now
|
116
128
|
end
|
117
129
|
|
118
130
|
begin
|
119
131
|
Process.kill(signal, child)
|
120
132
|
rescue Errno::ESRCH
|
133
|
+
Einhorn.log_debug("Attempted to #{signal} child #{child.inspect} but the process does not exist", :upgrade)
|
121
134
|
else
|
122
|
-
signaled
|
135
|
+
signaled[child] = spec
|
123
136
|
end
|
124
137
|
end
|
125
138
|
|
126
|
-
if Einhorn::State.signal_timeout
|
139
|
+
if Einhorn::State.signal_timeout && record
|
127
140
|
Einhorn::Event::Timer.open(Einhorn::State.signal_timeout) do
|
128
141
|
children.each do |child|
|
129
|
-
|
142
|
+
spec = Einhorn::State.children[child]
|
143
|
+
next unless spec # Process is already dead and removed by cleanup
|
144
|
+
signaled_spec = signaled[child]
|
145
|
+
next unless signaled_spec # We got ESRCH when trying to signal
|
146
|
+
if spec[:spinup_time] != signaled_spec[:spinup_time]
|
147
|
+
Einhorn.log_info("Different spinup time recorded for #{child} after #{Einhorn::State.signal_timeout}s. This probably indicates a PID rollover.", :upgrade)
|
148
|
+
next
|
149
|
+
end
|
130
150
|
|
131
|
-
Einhorn.log_info("Child #{child.inspect} is still active after #{Einhorn::State.signal_timeout}. Sending SIGKILL.")
|
151
|
+
Einhorn.log_info("Child #{child.inspect} is still active after #{Einhorn::State.signal_timeout}s. Sending SIGKILL.")
|
132
152
|
begin
|
133
153
|
Process.kill('KILL', child)
|
134
154
|
rescue Errno::ESRCH
|
@@ -136,11 +156,12 @@ module Einhorn
|
|
136
156
|
spec[:signaled].add('KILL')
|
137
157
|
end
|
138
158
|
end
|
139
|
-
end
|
140
159
|
|
141
|
-
|
160
|
+
Einhorn.log_info("Successfully sent #{signal}s to #{signaled.length} processes: #{signaled.keys}")
|
161
|
+
end
|
142
162
|
end
|
143
163
|
|
164
|
+
|
144
165
|
def self.increment
|
145
166
|
Einhorn::Event.break_loop
|
146
167
|
old = Einhorn::State.config[:number]
|
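The timer in `signal_all` now skips a child if it has already been cleaned up, if the original signal hit `ESRCH`, or if the recorded `:spinup_time` has changed, which suggests the PID was reused by a newer worker. A sketch of that filter as a pure function, assuming both arguments are hashes of pid to spec; `kill_candidates` is a hypothetical name.

```ruby
# Hypothetical pure version of the timer's checks. `signaled` is the
# pid => spec snapshot taken when the signal was sent;
# `current_children` is the live pid => spec table at timeout.
def kill_candidates(signaled, current_children)
  signaled.select do |pid, signaled_spec|
    spec = current_children[pid]
    # Skip pids that are gone, and pids whose spinup time differs
    # (probably a PID rollover onto a new worker).
    spec && spec[:spinup_time] == signaled_spec[:spinup_time]
  end.keys
end
```

Only the pids returned here would be escalated to SIGKILL.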
@@ -211,6 +232,7 @@ module Einhorn
|
|
211
232
|
|
212
233
|
fork do
|
213
234
|
Einhorn::TransientState.whatami = :state_passer
|
235
|
+
Einhorn::State.children[Process.pid] = {type: :state_passer}
|
214
236
|
Einhorn::State.generation += 1
|
215
237
|
read.close
|
216
238
|
|
@@ -256,7 +278,8 @@ module Einhorn
|
|
256
278
|
def self.spinup(cmd=nil)
|
257
279
|
cmd ||= Einhorn::State.cmd
|
258
280
|
index = next_index
|
259
|
-
|
281
|
+
expected_ppid = Process.pid
|
282
|
+
if Einhorn::State.preloaded
|
260
283
|
pid = fork do
|
261
284
|
Einhorn::TransientState.whatami = :worker
|
262
285
|
prepare_child_process
|
@@ -268,6 +291,8 @@ module Einhorn
|
|
268
291
|
|
269
292
|
reseed_random
|
270
293
|
|
294
|
+
setup_parent_watch(expected_ppid)
|
295
|
+
|
271
296
|
prepare_child_environment(index)
|
272
297
|
einhorn_main
|
273
298
|
end
|
@@ -277,6 +302,7 @@ module Einhorn
|
|
277
302
|
prepare_child_process
|
278
303
|
|
279
304
|
Einhorn.log_info("About to exec #{cmd.inspect}")
|
305
|
+
Einhorn::Command::Interface.uninit
|
280
306
|
# Here's the only case where cloexec would help. Since we
|
281
307
|
# have to track and manually close FDs for other cases, we
|
282
308
|
# may as well just reuse close_all rather than also set
|
@@ -285,20 +311,24 @@ module Einhorn
|
|
285
311
|
# Note that Ruby 1.9's close_others option is useful here.
|
286
312
|
Einhorn::Event.close_all_for_worker
|
287
313
|
|
314
|
+
setup_parent_watch(expected_ppid)
|
315
|
+
|
288
316
|
prepare_child_environment(index)
|
289
317
|
Einhorn::Compat.exec(cmd[0], cmd[1..-1], :close_others => false)
|
290
318
|
end
|
291
319
|
end
|
292
320
|
|
293
321
|
Einhorn.log_info("===> Launched #{pid} (index: #{index})", :upgrade)
|
322
|
+
Einhorn::State.last_spinup = Time.now
|
294
323
|
Einhorn::State.children[pid] = {
|
295
324
|
:type => :worker,
|
296
325
|
:version => Einhorn::State.version,
|
297
326
|
:acked => false,
|
298
327
|
:signaled => Set.new,
|
299
|
-
:
|
328
|
+
:last_signaled_at => nil,
|
329
|
+
:index => index,
|
330
|
+
:spinup_time => Einhorn::State.last_spinup,
|
300
331
|
}
|
301
|
-
Einhorn::State.last_spinup = Time.now
|
302
332
|
|
303
333
|
# Set up whatever's needed for ACKing
|
304
334
|
ack_mode = Einhorn::State.ack_mode
|
@@ -364,9 +394,28 @@ module Einhorn
|
|
364
394
|
end
|
365
395
|
|
366
396
|
def self.prepare_child_process
|
397
|
+
Process.setpgrp
|
367
398
|
Einhorn.renice_self
|
368
399
|
end
|
369
400
|
|
401
|
+
def self.setup_parent_watch(expected_ppid)
|
402
|
+
if Einhorn::State.kill_children_on_exit then
|
403
|
+
begin
|
404
|
+
# NB: Having the USR2 signal handler set to terminate (the default) at
|
405
|
+
# this point is required. If it's set to a ruby handler, there are
|
406
|
+
# race conditions that could cause the worker to leak.
|
407
|
+
|
408
|
+
Einhorn::Prctl.set_pdeathsig("USR2")
|
409
|
+
if Process.ppid != expected_ppid then
|
410
|
+
Einhorn.log_error("Parent process died before we set pdeathsig; cowardly refusing to exec child process.")
|
411
|
+
exit(1)
|
412
|
+
end
|
413
|
+
rescue NotImplementedError
|
414
|
+
# Unsupported OS; silently continue.
|
415
|
+
end
|
416
|
+
end
|
417
|
+
end
|
418
|
+
|
370
419
|
# @param options [Hash]
|
371
420
|
#
|
372
421
|
# @option options [Boolean] :smooth (false) Whether to perform a smooth or
|
@@ -451,6 +500,41 @@ module Einhorn
|
|
451
500
|
Einhorn.log_info("Have too many workers at the current version, so killing off #{excess.length} of them.")
|
452
501
|
signal_all("USR2", excess)
|
453
502
|
end
|
503
|
+
|
504
|
+
# Ensure all signaled workers that have outlived signal_timeout get killed.
|
505
|
+
kill_expired_signaled_workers if Einhorn::State.signal_timeout
|
506
|
+
end
|
507
|
+
|
508
|
+
def self.kill_expired_signaled_workers
|
509
|
+
now = Time.now
|
510
|
+
children = Einhorn::State.children.select do |_,c|
|
511
|
+
# Only interested in USR2 signaled workers
|
512
|
+
next unless c[:signaled] && c[:signaled].length > 0
|
513
|
+
next unless c[:signaled].include?('USR2')
|
514
|
+
|
515
|
+
# Ignore processes that have received KILL since it can't be trapped.
|
516
|
+
next if c[:signaled].include?('KILL')
|
517
|
+
|
518
|
+
# Filter out those children that have not reached signal_timeout yet.
|
519
|
+
next unless c[:last_signaled_at]
|
520
|
+
expires_at = c[:last_signaled_at] + Einhorn::State.signal_timeout
|
521
|
+
next unless now >= expires_at
|
522
|
+
|
523
|
+
true
|
524
|
+
end
|
525
|
+
|
526
|
+
Einhorn.log_info("#{children.size} expired signaled workers found.") if children.size > 0
|
527
|
+
children.each do |pid, child|
|
528
|
+
Einhorn.log_info("Child #{pid.inspect} was signaled #{(child[:last_signaled_at] - now).abs.to_i}s ago. Sending SIGKILL as it is still active after #{Einhorn::State.signal_timeout}s timeout.", :upgrade)
|
529
|
+
begin
|
530
|
+
Process.kill('KILL', pid)
|
531
|
+
rescue Errno::ESRCH
|
532
|
+
Einhorn.log_debug("Attempted to SIGKILL child #{pid.inspect} but the process does not exist.")
|
533
|
+
end
|
534
|
+
|
535
|
+
child[:signaled].add('KILL')
|
536
|
+
child[:last_signaled_at] = Time.now
|
537
|
+
end
|
454
538
|
end
|
455
539
|
|
456
540
|
def self.stop_respawning
|
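`kill_expired_signaled_workers` selects children that received USR2, have not yet been sent KILL, and were last signaled at least `signal_timeout` seconds ago. The selection can be sketched without touching real processes; `expired_signaled_pids` is a hypothetical name, and the spec hashes carry only the two keys the filter reads.

```ruby
require 'set'

# Hypothetical pure version of the filter: returns the pids that the
# method in the hunk would SIGKILL.
def expired_signaled_pids(children, signal_timeout, now = Time.now)
  children.select do |_pid, c|
    signaled = c[:signaled]
    next false unless signaled && signaled.include?('USR2')
    next false if signaled.include?('KILL') # already escalated
    next false unless c[:last_signaled_at]
    now >= c[:last_signaled_at] + signal_timeout
  end.keys
end
```

Passing `now` explicitly keeps the function deterministic, which makes the timeout boundary easy to test.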
@@ -478,10 +562,18 @@ module Einhorn
|
|
478
562
|
missing.times {spinup}
|
479
563
|
end
|
480
564
|
|
565
|
+
# Unbounded exponential backoff is not a thing: we run into problems if
|
566
|
+
# e.g., each of our hundred workers simultaneously fail to boot for the same
|
567
|
+
# ephemeral reason. Instead cap backoff by some reasonable maximum, so we
|
568
|
+
# don't wait until the heat death of the universe to spin up new capacity.
|
569
|
+
MAX_SPINUP_INTERVAL = 30.0
|
570
|
+
|
481
571
|
def self.replenish_gradually(max_unacked=nil)
|
482
572
|
return if Einhorn::TransientState.has_outstanding_spinup_timer
|
483
573
|
return unless Einhorn::WorkerPool.missing_worker_count > 0
|
484
574
|
|
575
|
+
max_unacked ||= Einhorn::State.config[:max_unacked]
|
576
|
+
|
485
577
|
# default to spinning up at most NCPU workers at once
|
486
578
|
unless max_unacked
|
487
579
|
begin
|
@@ -500,14 +592,12 @@ module Einhorn
|
|
500
592
|
# Exponentially backoff automated spinup if we're just having
|
501
593
|
# things die before ACKing
|
502
594
|
spinup_interval = Einhorn::State.config[:seconds] * (1.5 ** Einhorn::State.consecutive_deaths_before_ack)
|
595
|
+
spinup_interval = [spinup_interval, MAX_SPINUP_INTERVAL].min
|
503
596
|
seconds_ago = (Time.now - Einhorn::State.last_spinup).to_f
|
504
597
|
|
505
598
|
if seconds_ago > spinup_interval
|
506
|
-
|
507
|
-
|
508
|
-
Einhorn.log_debug("There are #{unacked} unacked new workers, and max_unacked is #{max_unacked}, so not spinning up a new process")
|
509
|
-
else
|
510
|
-
msg = "Last spinup was #{seconds_ago}s ago, and spinup_interval is #{spinup_interval}s, so spinning up a new process"
|
599
|
+
if trigger_spinup?(max_unacked)
|
600
|
+
msg = "Last spinup was #{seconds_ago}s ago, and spinup_interval is #{spinup_interval}s, so spinning up a new process."
|
511
601
|
|
512
602
|
if Einhorn::State.consecutive_deaths_before_ack > 0
|
513
603
|
Einhorn.log_info("#{msg} (there have been #{Einhorn::State.consecutive_deaths_before_ack} consecutive unacked worker deaths)", :upgrade)
|
@@ -518,7 +608,7 @@ module Einhorn
|
|
518
608
|
spinup
|
519
609
|
end
|
520
610
|
else
|
521
|
-
Einhorn.log_debug("Last spinup was #{seconds_ago}s ago, and spinup_interval is #{spinup_interval}s, so not spinning up a new process")
|
611
|
+
Einhorn.log_debug("Last spinup was #{seconds_ago}s ago, and spinup_interval is #{spinup_interval}s, so not spinning up a new process.")
|
522
612
|
end
|
523
613
|
|
524
614
|
Einhorn::TransientState.has_outstanding_spinup_timer = true
|
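The hunk above caps the exponential backoff at `MAX_SPINUP_INTERVAL` (30.0 seconds). A small sketch of the resulting schedule, with the cap passed as a keyword argument for illustration; `spinup_interval` as a standalone method is hypothetical.

```ruby
# Backoff as computed in the hunk: base * 1.5^deaths, capped at `max`.
# The 30.0 default mirrors MAX_SPINUP_INTERVAL.
def spinup_interval(base_seconds, consecutive_deaths, max: 30.0)
  uncapped = base_seconds * (1.5**consecutive_deaths)
  [uncapped, max].min
end
```

With a base of one second, the cap takes effect at the ninth consecutive unacked death, since 1.5 to the ninth power is roughly 38.4.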
@@ -541,5 +631,22 @@ module Einhorn
|
|
541
631
|
Einhorn.log_info(output) if log
|
542
632
|
output
|
543
633
|
end
|
634
|
+
|
635
|
+
def self.trigger_spinup?(max_unacked)
|
636
|
+
unacked = Einhorn::WorkerPool.unacked_unsignaled_modern_workers.length
|
637
|
+
if unacked >= max_unacked
|
638
|
+
Einhorn.log_info("There are #{unacked} unacked new workers, and max_unacked is #{max_unacked}, so not spinning up a new process.")
|
639
|
+
return false
|
640
|
+
elsif Einhorn::State.config[:max_upgrade_additional]
|
641
|
+
capacity_exceeded = (Einhorn::State.config[:number] + Einhorn::State.config[:max_upgrade_additional]) - Einhorn::WorkerPool.workers_with_state.length
|
642
|
+
if capacity_exceeded < 0
|
643
|
+
Einhorn.log_info("Over worker capacity by #{capacity_exceeded.abs} during upgrade, #{Einhorn::WorkerPool.modern_workers.length} new workers of #{Einhorn::WorkerPool.workers_with_state.length} total. Waiting for old workers to exit before spinning up a process.")
|
644
|
+
|
645
|
+
return false
|
646
|
+
end
|
647
|
+
end
|
648
|
+
|
649
|
+
true
|
650
|
+
end
|
544
651
|
end
|
545
652
|
end
|
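`trigger_spinup?` combines two limits: the number of unacked new workers (`--max-unacked`) and the number of extra workers allowed during an upgrade (`--max-upgrade-additional`). The sketch below restates the decision with plain integers in place of `Einhorn::State` and `Einhorn::WorkerPool` lookups; the keyword names are assumptions made for readability.

```ruby
# Hypothetical pure version of the decision in the hunk.
#   target  - configured worker count (config[:number])
#   running - workers currently tracked, old and new
def trigger_spinup?(unacked:, max_unacked:, target:, running:,
                    max_upgrade_additional: nil)
  return false if unacked >= max_unacked

  if max_upgrade_additional
    headroom = (target + max_upgrade_additional) - running
    # As in the hunk, only a negative value blocks the spinup, so a
    # headroom of exactly 0 still allows one more worker.
    return false if headroom < 0
  end

  true
end
```

When `max_upgrade_additional` is unset, only the unacked limit applies, which matches the behavior before this option existed.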