einhorn 0.7.0 → 0.8.2
- checksums.yaml +7 -0
- data/.travis.yml +6 -4
- data/README.md +39 -3
- data/README.md.in +21 -3
- data/bin/einhorn +17 -2
- data/bin/einhornsh +7 -15
- data/einhorn.gemspec +1 -0
- data/example/pool_worker.rb +1 -1
- data/lib/einhorn.rb +55 -41
- data/lib/einhorn/client.rb +2 -3
- data/lib/einhorn/command.rb +127 -20
- data/lib/einhorn/command/interface.rb +21 -10
- data/lib/einhorn/event.rb +10 -1
- data/lib/einhorn/event/connection.rb +2 -2
- data/lib/einhorn/prctl.rb +26 -0
- data/lib/einhorn/prctl_linux.rb +49 -0
- data/lib/einhorn/version.rb +1 -1
- data/lib/einhorn/worker.rb +47 -25
- data/test/integration/_lib/fixtures/exit_during_upgrade/exiting_server.rb +1 -0
- data/test/integration/_lib/fixtures/pdeathsig_printer/pdeathsig_printer.rb +29 -0
- data/test/integration/_lib/fixtures/signal_timeout/sleepy_server.rb +23 -0
- data/test/integration/_lib/fixtures/upgrade_project/upgrading_server.rb +2 -0
- data/test/integration/_lib/helpers/einhorn_helpers.rb +14 -9
- data/test/integration/pdeathsig.rb +26 -0
- data/test/integration/upgrading.rb +47 -0
- data/test/unit/_lib/bad_worker.rb +7 -0
- data/test/unit/_lib/sleep_worker.rb +5 -0
- data/test/unit/einhorn.rb +41 -3
- data/test/unit/einhorn/command.rb +114 -0
- metadata +48 -38
checksums.yaml
ADDED
@@ -0,0 +1,7 @@
+ ---
+ SHA256:
+ metadata.gz: f3f9c31b861db9b8b7ab5d2345be06f60223d60e1243bc2618ec5ef1db2b72e5
+ data.tar.gz: 38144bb080c8719b4d164bcbf8d96d498844626a66b9a522c75fb8bab6309c4f
+ SHA512:
+ metadata.gz: 3370ff020a249f5af7be26bfb48392a0b5721d139b895684e37aa92f612bb00d10e48eeb1b95acc15c74448af96c1df039fc841f6ff45ffdf91d6a16bcc614ab
+ data.tar.gz: 3e6b93f1ed82a46a9578dd3dd59cdb0c5c962772d285e59f1e78e531b7577fa4debbc7e3a28a0e4f9cddf053e4481979b37229ce40a6ed9e476dbba6e7985f1e
data/.travis.yml
CHANGED
@@ -1,8 +1,10 @@
  language: ruby
  rvm:
- - 1.8.7
- - 1.9.2
- - 1.9.3
  - 2.0.0
  - 2.1
- -
+ - 2.2
+
+ # This is to work around the version of bundler installed in Travis and
+ # https://github.com/bundler/bundler/issues/3558
+ before_install:
+ - gem update bundler
data/README.md
CHANGED
@@ -194,6 +194,17 @@ library.
  You can set the name that Einhorn and your workers show in PS. Just
  pass `-c <name>`.

+ ### Re exec
+
+ You can use the `--reexec-as` option to replace the `einhorn` command with a command or script of your own. This might be useful for those with a Capistrano like deploy process that has changing symlinks. To ensure that you are following the symlinks you could use a bash script like this.
+
+ #!/bin/bash
+
+ cd <symlinked directory>
+ exec /usr/local/bin/einhorn "$@"
+
+ Then you could set `--reexec-as=` to the name of your bash script and it will run in place of the plain einhorn command.
+
  ### Options

  -b, --bind ADDR Bind an address and add the corresponding FD via the environment
@@ -217,11 +228,18 @@ pass `-c <name>`.
  Unix nice level at which to run the einhorn processes. If not running as root, make sure to ulimit -e as appopriate.
  --with-state-fd STATE [Internal option] With file descriptor containing state
  --upgrade-check [Internal option] Check if Einhorn can exec itself and exit with status 0 before loading code
+ -t, --signal-timeout=T If children do not react to signals after T seconds, escalate to SIGKILL
  --version Show version


  ## Contributing

+ ### Development Status
+
+ Einhorn is still in active operation at Stripe, but we are not maintaining
+ Einhorn actively. PRs are very welcome, and we will review and merge,
+ but we are unlikely to triage and fix reported issues without code.
+
  Contributions are definitely welcome. To contribute, just follow the
  usual workflow:

@@ -249,10 +267,28 @@ EventMachine-LE to support file-descriptor passing. Check out

  ## Compatibility

- Einhorn
+ Einhorn runs in Ruby 2.0, 2.1, and 2.2
+
+ The following libraries ease integration with Einhorn with languages other than
+ Ruby:
+
+ - **[go-einhorn](https://github.com/stripe/go-einhorn)**: Stripe's own library
+ for *talking* to an einhorn master (doesn't wrap socket code).
+ - **[goji](https://github.com/zenazn/goji/)**: Go (golang) server framework. The
+ [`bind`](https://godoc.org/github.com/zenazn/goji/bind) and
+ [`graceful`](https://godoc.org/github.com/zenazn/goji/graceful)
+ packages provide helpers and HTTP/TCP connection wrappers for Einhorn
+ integration.
+ - **[github.com/CHH/einhorn](https://github.com/CHH/einhorn)**: PHP library
+ - **[thin-attach\_socket](https://github.com/ConradIrwin/thin-attach_socket)**:
+ run `thin` behind Einhorn
+ - **[baseplate](https://reddit.github.io/baseplate/cli/serve.html)**: a
+ collection of Python helpers and libraries, with support for running behind
+ Einhorn
+
+ *NB: this list should not imply any official endorsement or vetting!*

  ## About

- Einhorn is a project of [Stripe](https://stripe.com), led by [
- Brockman](https://twitter.com/thegdb). Feel free to get in touch at
+ Einhorn is a project of [Stripe](https://stripe.com), led by [Carl Jackson](https://github.com/zenazn). Feel free to get in touch at
  info@stripe.com.
data/README.md.in
CHANGED
@@ -67,10 +67,28 @@ EventMachine-LE to support file-descriptor passing. Check out

  ## Compatibility

- Einhorn
+ Einhorn runs in Ruby 2.0, 2.1, and 2.2
+
+ The following libraries ease integration with Einhorn with languages other than
+ Ruby:
+
+ - **[go-einhorn](https://github.com/stripe/go-einhorn)**: Stripe's own library
+ for *talking* to an einhorn master (doesn't wrap socket code).
+ - **[goji](https://github.com/zenazn/goji/)**: Go (golang) server framework. The
+ [`bind`](https://godoc.org/github.com/zenazn/goji/bind) and
+ [`graceful`](https://godoc.org/github.com/zenazn/goji/graceful)
+ packages provide helpers and HTTP/TCP connection wrappers for Einhorn
+ integration.
+ - **[github.com/CHH/einhorn](https://github.com/CHH/einhorn)**: PHP library
+ - **[thin-attach\_socket](https://github.com/ConradIrwin/thin-attach_socket)**:
+ run `thin` behind Einhorn
+ - **[baseplate](https://reddit.github.io/baseplate/cli/serve.html)**: a
+ collection of Python helpers and libraries, with support for running behind
+ Einhorn
+
+ *NB: this list should not imply any official endorsement or vetting!*

  ## About

- Einhorn is a project of [Stripe](https://stripe.com), led by [
- Brockman](https://twitter.com/thegdb). Feel free to get in touch at
+ Einhorn is a project of [Stripe](https://stripe.com), led by [Carl Jackson](https://github.com/zenazn). Feel free to get in touch at
  info@stripe.com.
data/bin/einhorn
CHANGED
@@ -266,8 +266,11 @@ if true # $0 == __FILE__
|
|
266
266
|
Einhorn::Command.quieter(false)
|
267
267
|
end
|
268
268
|
|
269
|
-
opts.on('-s', '--seconds N', 'Number of seconds to wait until respawning') do |
|
270
|
-
|
269
|
+
opts.on('-s', '--seconds N', 'Number of seconds to wait until respawning') do |s|
|
270
|
+
seconds = Float(s)
|
271
|
+
raise ArgumentError, 'seconds must be > 0' if seconds.zero?
|
272
|
+
|
273
|
+
Einhorn::State.config[:seconds] = seconds
|
271
274
|
end
|
272
275
|
|
273
276
|
opts.on('-v', '--verbose', 'Make output verbose (can be reconfigured on the fly)') do
|
@@ -310,6 +313,18 @@ if true # $0 == __FILE__
|
|
310
313
|
Einhorn::State.signal_timeout = Integer(t)
|
311
314
|
end
|
312
315
|
|
316
|
+
opts.on('--max-unacked=N', 'Maximum number of workers that can be unacked when gracefully upgrading.') do |n|
|
317
|
+
Einhorn::State.config[:max_unacked] = Integer(n)
|
318
|
+
end
|
319
|
+
|
320
|
+
opts.on('--max-upgrade-additional=N', 'Maximum number of additional workers that can be running during an upgrade.') do |n|
|
321
|
+
Einhorn::State.config[:max_upgrade_additional] = Integer(n)
|
322
|
+
end
|
323
|
+
|
324
|
+
opts.on('--gc-before-fork', 'Run the GC three times before forking to improve memory sharing for copy-on-write.') do
|
325
|
+
Einhorn::State.config[:gc_before_fork] = true
|
326
|
+
end
|
327
|
+
|
313
328
|
opts.on('--version', 'Show version') do
|
314
329
|
puts Einhorn::VERSION
|
315
330
|
exit
|
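Note on the `--seconds` hunk: the new block converts the argument with `Float` and raises when the value is zero. The sketch below isolates that pattern so it can be run on its own. The `parse_seconds` helper, the local `config` hash and its default of 1 are illustrative assumptions, not Einhorn code. As in the diff, only zero is rejected, so a negative value would still pass the check.

```ruby
require 'optparse'

# Hypothetical helper: parses only the --seconds option into a local hash.
def parse_seconds(argv)
  config = { seconds: 1 } # assumed default, for illustration only

  parser = OptionParser.new do |opts|
    opts.on('-s', '--seconds N', 'Number of seconds to wait until respawning') do |s|
      seconds = Float(s) # raises ArgumentError on non-numeric input
      raise ArgumentError, 'seconds must be > 0' if seconds.zero?

      config[:seconds] = seconds
    end
  end

  parser.parse(argv)
  config
end
```

With this sketch, `parse_seconds(['--seconds', '0.5'])` stores a fractional interval, while `parse_seconds(['--seconds', '0'])` raises.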
data/bin/einhornsh
CHANGED
@@ -21,22 +21,14 @@ module Einhorn
|
|
21
21
|
end
|
22
22
|
|
23
23
|
def send_command(hash)
|
24
|
-
|
25
|
-
|
26
|
-
|
27
|
-
|
28
|
-
|
29
|
-
|
30
|
-
|
31
|
-
puts "Invalid response type #{response.class}: #{response.inspect}"
|
32
|
-
end
|
24
|
+
@client.send_command(hash)
|
25
|
+
while response = @client.receive_message
|
26
|
+
if response.kind_of?(Hash)
|
27
|
+
yield response['message']
|
28
|
+
return unless response['wait']
|
29
|
+
else
|
30
|
+
puts "Invalid response type #{response.class}: #{response.inspect}"
|
33
31
|
end
|
34
|
-
rescue Errno::EPIPE => e
|
35
|
-
emit("einhornsh: Error communicating with Einhorn: #{e} (#{e.class})")
|
36
|
-
emit("einhornsh: Attempting to reconnect...")
|
37
|
-
reconnect
|
38
|
-
|
39
|
-
retry
|
40
32
|
end
|
41
33
|
end
|
42
34
|
|
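Note on the `send_command` hunk: the added lines send the command once, then keep reading responses, yielding each `message` and stopping at the first response that does not set `wait`. A minimal sketch of that loop follows, assuming a stand-in `FakeClient` that replays canned responses. Both `FakeClient` and the `stream_responses` name are hypothetical and exist only to make the loop runnable outside `einhornsh`.

```ruby
# Hypothetical stand-in for the real client: replays queued responses.
class FakeClient
  attr_reader :sent

  def initialize(responses)
    @responses = responses.dup
    @sent = []
  end

  def send_command(hash)
    @sent << hash
  end

  # Returns the next queued response, or nil when the queue is empty.
  def receive_message
    @responses.shift
  end
end

# Mirrors the loop added in the diff: yield each message and stop at
# the first Hash response whose 'wait' key is not truthy.
def stream_responses(client, hash)
  client.send_command(hash)
  while (response = client.receive_message)
    if response.kind_of?(Hash)
      yield response['message']
      return unless response['wait']
    else
      puts "Invalid response type #{response.class}: #{response.inspect}"
    end
  end
end
```

Responses queued after the first one without `wait` are never read, which is what lets a long-running command stream progress messages before its final reply.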
data/einhorn.gemspec
CHANGED
@@ -15,6 +15,7 @@ Gem::Specification.new do |gem|
  gem.name = 'einhorn'
  gem.require_paths = ['lib']

+ gem.add_development_dependency 'rack', '~> 1.6'
  gem.add_development_dependency 'rake'
  gem.add_development_dependency 'pry'
  gem.add_development_dependency 'minitest', '< 5.0'
data/example/pool_worker.rb
CHANGED
data/lib/einhorn.rb
CHANGED
@@ -45,6 +45,7 @@ module Einhorn
|
|
45
45
|
:orig_cmd => nil,
|
46
46
|
:bind => [],
|
47
47
|
:bind_fds => [],
|
48
|
+
:bound_ports => [],
|
48
49
|
:cmd => nil,
|
49
50
|
:script_name => nil,
|
50
51
|
:respawn => true,
|
@@ -68,14 +69,9 @@ module Einhorn
|
|
68
69
|
:reexec_commandline => nil,
|
69
70
|
:drop_environment_variables => [],
|
70
71
|
:signal_timeout => nil,
|
72
|
+
:preloaded => false
|
71
73
|
}
|
72
74
|
end
|
73
|
-
|
74
|
-
def self.dumpable_state
|
75
|
-
dump = state
|
76
|
-
dump[:reloading_for_preload_upgrade] = dump[:reloading_for_upgrade]
|
77
|
-
dump
|
78
|
-
end
|
79
75
|
end
|
80
76
|
|
81
77
|
module TransientState
|
@@ -83,7 +79,6 @@ module Einhorn
|
|
83
79
|
def self.default_state
|
84
80
|
{
|
85
81
|
:whatami => :master,
|
86
|
-
:preloaded => false,
|
87
82
|
:script_name => nil,
|
88
83
|
:argv => [],
|
89
84
|
:environ => {},
|
@@ -110,38 +105,24 @@ module Einhorn
|
|
110
105
|
updated_state = old_state.dup
|
111
106
|
|
112
107
|
# Handle changes in state format updates from previous einhorn versions
|
113
|
-
if store == Einhorn::State
|
114
|
-
#
|
115
|
-
|
116
|
-
|
117
|
-
|
118
|
-
|
119
|
-
|
120
|
-
|
121
|
-
|
122
|
-
|
123
|
-
|
124
|
-
|
125
|
-
|
126
|
-
|
127
|
-
# Depending on what is passed for --reexec-as, it's possible
|
128
|
-
# that the process received a SIGCHLD while something other
|
129
|
-
# than einhorn was the active executable. If that happened,
|
130
|
-
# einhorn might not know about a dead child, so let's check
|
131
|
-
# them all
|
132
|
-
dead = []
|
133
|
-
updated_state[:children].each do |pid, v|
|
134
|
-
begin
|
135
|
-
pid = Process.wait(pid, Process::WNOHANG)
|
136
|
-
dead << pid if pid
|
137
|
-
rescue Errno::ECHILD
|
138
|
-
dead << pid
|
139
|
-
end
|
140
|
-
end
|
141
|
-
Einhorn::Event::Timer.open(0) do
|
142
|
-
dead.each {|pid| Einhorn::Command.mourn(pid)}
|
108
|
+
if store == Einhorn::State && updated_state[:children]
|
109
|
+
# Depending on what is passed for --reexec-as, it's possible
|
110
|
+
# that the process received a SIGCHLD while something other
|
111
|
+
# than einhorn was the active executable. If that happened,
|
112
|
+
# einhorn might not know about a dead child, so let's check
|
113
|
+
# them all
|
114
|
+
dead = []
|
115
|
+
updated_state[:children].each do |pid, v|
|
116
|
+
begin
|
117
|
+
pid = Process.wait(pid, Process::WNOHANG)
|
118
|
+
dead << pid if pid
|
119
|
+
rescue Errno::ECHILD
|
120
|
+
dead << pid
|
143
121
|
end
|
144
122
|
end
|
123
|
+
Einhorn::Event::Timer.open(0) do
|
124
|
+
dead.each {|pid| Einhorn::Command.cleanup(pid)}
|
125
|
+
end
|
145
126
|
end
|
146
127
|
|
147
128
|
default = store.default_state
|
@@ -182,20 +163,23 @@ module Einhorn
|
|
182
163
|
end
|
183
164
|
|
184
165
|
Einhorn::TransientState.socket_handles << sd
|
185
|
-
sd.fileno
|
166
|
+
[sd.fileno, sd.local_address.ip_port]
|
186
167
|
end
|
187
168
|
|
188
169
|
# Implement these ourselves so it plays nicely with state persistence
|
189
170
|
def self.log_debug(msg, tag=nil)
|
190
171
|
$stderr.puts("#{log_tag} DEBUG: #{msg}\n") if Einhorn::State.verbosity <= 0
|
172
|
+
$stderr.flush
|
191
173
|
self.send_tagged_message(tag, msg) if tag
|
192
174
|
end
|
193
175
|
def self.log_info(msg, tag=nil)
|
194
176
|
$stderr.puts("#{log_tag} INFO: #{msg}\n") if Einhorn::State.verbosity <= 1
|
177
|
+
$stderr.flush
|
195
178
|
self.send_tagged_message(tag, msg) if tag
|
196
179
|
end
|
197
180
|
def self.log_error(msg, tag=nil)
|
198
181
|
$stderr.puts("#{log_tag} ERROR: #{msg}\n") if Einhorn::State.verbosity <= 2
|
182
|
+
$stderr.flush
|
199
183
|
self.send_tagged_message(tag, "ERROR: #{msg}") if tag
|
200
184
|
end
|
201
185
|
|
@@ -246,6 +230,8 @@ module Einhorn
|
|
246
230
|
set_argv(Einhorn::State.cmd, false)
|
247
231
|
|
248
232
|
begin
|
233
|
+
# Reset preloaded state to false - this allows us to monitor for failed preloads during reloads.
|
234
|
+
Einhorn::State.preloaded = false
|
249
235
|
# If it's not going to be requireable, then load it.
|
250
236
|
if !path.end_with?('.rb') && File.exists?(path)
|
251
237
|
log_info("Loading #{path} (if this hangs, make sure your code can be properly loaded as a library)", :upgrade)
|
@@ -253,13 +239,15 @@ module Einhorn
|
|
253
239
|
else
|
254
240
|
log_info("Requiring #{path} (if this hangs, make sure your code can be properly loaded as a library)", :upgrade)
|
255
241
|
require path
|
242
|
+
|
243
|
+
force_move_to_oldgen if Einhorn::State.config[:gc_before_fork]
|
256
244
|
end
|
257
245
|
rescue Exception => e
|
258
246
|
log_info("Proceeding with postload -- could not load #{path}: #{e} (#{e.class})\n #{e.backtrace.join("\n ")}", :upgrade)
|
259
247
|
else
|
260
248
|
if defined?(einhorn_main)
|
261
249
|
log_info("Successfully loaded #{path}", :upgrade)
|
262
|
-
Einhorn::
|
250
|
+
Einhorn::State.preloaded = true
|
263
251
|
else
|
264
252
|
log_info("Proceeding with postload -- loaded #{path}, but no einhorn_main method was defined", :upgrade)
|
265
253
|
end
|
@@ -267,6 +255,22 @@ module Einhorn
|
|
267
255
|
end
|
268
256
|
end
|
269
257
|
|
258
|
+
# Make the GC more copy-on-write friendly by forcibly incrementing the generation
|
259
|
+
# counter on all objects to its maximum value. Learn more at: https://github.com/ko1/nakayoshi_fork
|
260
|
+
def self.force_move_to_oldgen
|
261
|
+
log_info("Starting GC to improve copy-on-write memory sharing", :upgrade)
|
262
|
+
|
263
|
+
GC.start
|
264
|
+
3.times do
|
265
|
+
GC.start(full_mark: false)
|
266
|
+
end
|
267
|
+
|
268
|
+
GC.compact if GC.respond_to?(:compact)
|
269
|
+
|
270
|
+
log_info("Finished GC after preloading", :upgrade)
|
271
|
+
end
|
272
|
+
private_class_method :force_move_to_oldgen
|
273
|
+
|
270
274
|
def self.set_argv(cmd, set_ps_name)
|
271
275
|
# TODO: clean up this hack
|
272
276
|
idx = 0
|
@@ -324,8 +328,9 @@ module Einhorn
|
|
324
328
|
|
325
329
|
def self.socketify_env!
|
326
330
|
Einhorn::State.bind.each do |host, port, flags|
|
327
|
-
fd = bind(host, port, flags)
|
331
|
+
fd, actual_port = bind(host, port, flags)
|
328
332
|
Einhorn::State.bind_fds << fd
|
333
|
+
Einhorn::State.bound_ports << actual_port
|
329
334
|
end
|
330
335
|
end
|
331
336
|
|
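Note on the `bind` change: the method now returns the file descriptor together with the port reported by `local_address.ip_port`, which matters when the requested port is 0 and the kernel picks one. The sketch below shows the same idea with `TCPServer` from Ruby's standard library rather than the socket setup Einhorn uses; `bind_ephemeral` is a hypothetical helper name.

```ruby
require 'socket'

# Hypothetical helper: binds to an ephemeral port and reports which
# port the kernel actually assigned.
def bind_ephemeral(host)
  sd = TCPServer.new(host, 0) # port 0 asks the OS for any free port
  [sd, sd.fileno, sd.local_address.ip_port]
end
```

Recording the actual port alongside the descriptor, as `bound_ports` does in the diff, lets callers discover it without inspecting the socket again.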
@@ -339,7 +344,8 @@ module Einhorn
|
|
339
344
|
host = $2
|
340
345
|
port = $3
|
341
346
|
flags = $4.split(',').select {|flag| flag.length > 0}.map {|flag| flag.downcase}
|
342
|
-
|
347
|
+
Einhorn::State.sockets[[host, port]] ||= bind(host, port, flags)[0]
|
348
|
+
fd = Einhorn::State.sockets[[host, port]]
|
343
349
|
"#{opt}#{fd}"
|
344
350
|
else
|
345
351
|
arg
|
@@ -431,6 +437,14 @@ module Einhorn
|
|
431
437
|
Einhorn::State.reloading_for_upgrade = false
|
432
438
|
end
|
433
439
|
|
440
|
+
# If setting a signal-timeout, timeout the event loop
|
441
|
+
# in the same timeframe, ensuring processes are culled
|
442
|
+
# on a regular basis.
|
443
|
+
if Einhorn::State.signal_timeout
|
444
|
+
Einhorn::Event.default_timeout = Einhorn::Event.default_timeout.nil? ?
|
445
|
+
Einhorn::State.signal_timeout : [Einhorn::State.signal_timeout, Einhorn::Event.default_timeout].min
|
446
|
+
end
|
447
|
+
|
434
448
|
while Einhorn::State.respawn || Einhorn::State.children.size > 0
|
435
449
|
log_debug("Entering event loop")
|
436
450
|
|
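Note on the event loop timeout: when `signal_timeout` is set, the added lines make the loop wake at least that often by taking the smaller of the two timeouts, or `signal_timeout` itself if no default timeout exists. The selection rule can be sketched as a pure function; `effective_timeout` is a hypothetical name used only here.

```ruby
# Hypothetical helper restating the timeout selection from the diff.
def effective_timeout(default_timeout, signal_timeout)
  return default_timeout unless signal_timeout

  default_timeout.nil? ? signal_timeout : [signal_timeout, default_timeout].min
end
```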
data/lib/einhorn/client.rb
CHANGED
@@ -1,5 +1,4 @@
|
|
1
1
|
require 'set'
|
2
|
-
require 'uri'
|
3
2
|
require 'yaml'
|
4
3
|
|
5
4
|
module Einhorn
|
@@ -22,12 +21,12 @@ module Einhorn
|
|
22
21
|
|
23
22
|
def self.serialize_message(message)
|
24
23
|
serialized = YAML.dump(message)
|
25
|
-
escaped =
|
24
|
+
escaped = serialized.gsub(/%|\n/, '%' => '%25', "\n" => '%0A')
|
26
25
|
escaped + "\n"
|
27
26
|
end
|
28
27
|
|
29
28
|
def self.deserialize_message(line)
|
30
|
-
serialized =
|
29
|
+
serialized = line.gsub(/%(25|0A)/, '%25' => '%', '%0A' => "\n")
|
31
30
|
YAML.load(serialized)
|
32
31
|
end
|
33
32
|
end
|
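Note on the `client.rb` hunk: the new lines escape only `%` and newline characters, using `String#gsub` with a hash of replacements, so that each serialized message occupies exactly one line. The sketch below copies the two expressions from the diff into standalone methods to show the round trip; the surrounding method shells and the sample message are illustrative.

```ruby
require 'yaml'

# Escape '%' first-class alongside "\n" so the encoding is reversible.
def serialize_message(message)
  serialized = YAML.dump(message)
  escaped = serialized.gsub(/%|\n/, '%' => '%25', "\n" => '%0A')
  escaped + "\n" # the only raw newline terminates the message
end

def deserialize_message(line)
  serialized = line.gsub(/%(25|0A)/, '%25' => '%', '%0A' => "\n")
  YAML.load(serialized)
end

# Sample message with string keys and values only.
message = { 'command' => 'state', 'note' => "100%\ndone" }
line = serialize_message(message)
restored = deserialize_message(line)
```

Because a literal `%` is always written as `%25`, a payload that already contains the text `%0A` survives the round trip unchanged.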
data/lib/einhorn/command.rb
CHANGED
@@ -3,6 +3,7 @@ require 'set'
|
|
3
3
|
require 'tmpdir'
|
4
4
|
|
5
5
|
require 'einhorn/command/interface'
|
6
|
+
require 'einhorn/prctl'
|
6
7
|
|
7
8
|
module Einhorn
|
8
9
|
module Command
|
@@ -10,18 +11,16 @@ module Einhorn
|
|
10
11
|
begin
|
11
12
|
while true
|
12
13
|
Einhorn.log_debug('Going to reap a child process')
|
13
|
-
|
14
14
|
pid = Process.wait(-1, Process::WNOHANG)
|
15
15
|
return unless pid
|
16
|
-
|
16
|
+
cleanup(pid)
|
17
17
|
Einhorn::Event.break_loop
|
18
18
|
end
|
19
19
|
rescue Errno::ECHILD
|
20
20
|
end
|
21
21
|
end
|
22
22
|
|
23
|
-
|
24
|
-
def self.mourn(pid)
|
23
|
+
def self.cleanup(pid)
|
25
24
|
unless spec = Einhorn::State.children[pid]
|
26
25
|
Einhorn.log_error("Could not find any config for exited child #{pid.inspect}! This probably indicates a bug in Einhorn.")
|
27
26
|
return
|
@@ -40,11 +39,23 @@ module Einhorn
|
|
40
39
|
case type = spec[:type]
|
41
40
|
when :worker
|
42
41
|
Einhorn.log_info("===> Exited worker #{pid.inspect}#{extra}", :upgrade)
|
42
|
+
when :state_passer
|
43
|
+
Einhorn.log_debug("===> Exited state passing process #{pid.inspect}", :upgrade)
|
43
44
|
else
|
44
45
|
Einhorn.log_error("===> Exited process #{pid.inspect} has unrecgonized type #{type.inspect}: #{spec.inspect}", :upgrade)
|
45
46
|
end
|
46
47
|
end
|
47
48
|
|
49
|
+
def self.register_ping(pid, request_id)
|
50
|
+
unless spec = Einhorn::State.children[pid]
|
51
|
+
Einhorn.log_error("Could not find state for PID #{pid.inspect}; ignoring ACK.")
|
52
|
+
return
|
53
|
+
end
|
54
|
+
|
55
|
+
spec[:pinged_at] = Time.now
|
56
|
+
spec[:pinged_request_id] = request_id
|
57
|
+
end
|
58
|
+
|
48
59
|
def self.register_manual_ack(pid)
|
49
60
|
ack_mode = Einhorn::State.ack_mode
|
50
61
|
unless ack_mode[:type] == :manual
|
@@ -98,8 +109,8 @@ module Einhorn
|
|
98
109
|
|
99
110
|
def self.signal_all(signal, children=nil, record=true)
|
100
111
|
children ||= Einhorn::WorkerPool.workers
|
112
|
+
signaled = {}
|
101
113
|
|
102
|
-
signaled = []
|
103
114
|
Einhorn.log_info("Sending #{signal} to #{children.inspect}", :upgrade)
|
104
115
|
|
105
116
|
children.each do |child|
|
@@ -113,22 +124,31 @@ module Einhorn
|
|
113
124
|
Einhorn.log_error("Re-sending #{signal} to already-signaled child #{child.inspect}. It may be slow to spin down, or it may be swallowing #{signal}s.", :upgrade)
|
114
125
|
end
|
115
126
|
spec[:signaled].add(signal)
|
127
|
+
spec[:last_signaled_at] = Time.now
|
116
128
|
end
|
117
129
|
|
118
130
|
begin
|
119
131
|
Process.kill(signal, child)
|
120
132
|
rescue Errno::ESRCH
|
133
|
+
Einhorn.log_debug("Attempted to #{signal} child #{child.inspect} but the process does not exist", :upgrade)
|
121
134
|
else
|
122
|
-
signaled
|
135
|
+
signaled[child] = spec
|
123
136
|
end
|
124
137
|
end
|
125
138
|
|
126
|
-
if Einhorn::State.signal_timeout
|
139
|
+
if Einhorn::State.signal_timeout && record
|
127
140
|
Einhorn::Event::Timer.open(Einhorn::State.signal_timeout) do
|
128
141
|
children.each do |child|
|
129
|
-
|
142
|
+
spec = Einhorn::State.children[child]
|
143
|
+
next unless spec # Process is already dead and removed by cleanup
|
144
|
+
signaled_spec = signaled[child]
|
145
|
+
next unless signaled_spec # We got ESRCH when trying to signal
|
146
|
+
if spec[:spinup_time] != signaled_spec[:spinup_time]
|
147
|
+
Einhorn.log_info("Different spinup time recorded for #{child} after #{Einhorn::State.signal_timeout}s. This probably indicates a PID rollover.", :upgrade)
|
148
|
+
next
|
149
|
+
end
|
130
150
|
|
131
|
-
Einhorn.log_info("Child #{child.inspect} is still active after #{Einhorn::State.signal_timeout}. Sending SIGKILL.")
|
151
|
+
Einhorn.log_info("Child #{child.inspect} is still active after #{Einhorn::State.signal_timeout}s. Sending SIGKILL.")
|
132
152
|
begin
|
133
153
|
Process.kill('KILL', child)
|
134
154
|
rescue Errno::ESRCH
|
@@ -136,11 +156,12 @@ module Einhorn
|
|
136
156
|
spec[:signaled].add('KILL')
|
137
157
|
end
|
138
158
|
end
|
139
|
-
end
|
140
159
|
|
141
|
-
|
160
|
+
Einhorn.log_info("Successfully sent #{signal}s to #{signaled.length} processes: #{signaled.keys}")
|
161
|
+
end
|
142
162
|
end
|
143
163
|
|
164
|
+
|
144
165
|
def self.increment
|
145
166
|
Einhorn::Event.break_loop
|
146
167
|
old = Einhorn::State.config[:number]
|
@@ -211,6 +232,7 @@ module Einhorn
|
|
211
232
|
|
212
233
|
fork do
|
213
234
|
Einhorn::TransientState.whatami = :state_passer
|
235
|
+
Einhorn::State.children[Process.pid] = {type: :state_passer}
|
214
236
|
Einhorn::State.generation += 1
|
215
237
|
read.close
|
216
238
|
|
@@ -256,7 +278,8 @@ module Einhorn
|
|
256
278
|
def self.spinup(cmd=nil)
|
257
279
|
cmd ||= Einhorn::State.cmd
|
258
280
|
index = next_index
|
259
|
-
|
281
|
+
expected_ppid = Process.pid
|
282
|
+
if Einhorn::State.preloaded
|
260
283
|
pid = fork do
|
261
284
|
Einhorn::TransientState.whatami = :worker
|
262
285
|
prepare_child_process
|
@@ -268,6 +291,8 @@ module Einhorn
|
|
268
291
|
|
269
292
|
reseed_random
|
270
293
|
|
294
|
+
setup_parent_watch(expected_ppid)
|
295
|
+
|
271
296
|
prepare_child_environment(index)
|
272
297
|
einhorn_main
|
273
298
|
end
|
@@ -277,6 +302,7 @@ module Einhorn
|
|
277
302
|
prepare_child_process
|
278
303
|
|
279
304
|
Einhorn.log_info("About to exec #{cmd.inspect}")
|
305
|
+
Einhorn::Command::Interface.uninit
|
280
306
|
# Here's the only case where cloexec would help. Since we
|
281
307
|
# have to track and manually close FDs for other cases, we
|
282
308
|
# may as well just reuse close_all rather than also set
|
@@ -285,20 +311,24 @@ module Einhorn
|
|
285
311
|
# Note that Ruby 1.9's close_others option is useful here.
|
286
312
|
Einhorn::Event.close_all_for_worker
|
287
313
|
|
314
|
+
setup_parent_watch(expected_ppid)
|
315
|
+
|
288
316
|
prepare_child_environment(index)
|
289
317
|
Einhorn::Compat.exec(cmd[0], cmd[1..-1], :close_others => false)
|
290
318
|
end
|
291
319
|
end
|
292
320
|
|
293
321
|
Einhorn.log_info("===> Launched #{pid} (index: #{index})", :upgrade)
|
322
|
+
Einhorn::State.last_spinup = Time.now
|
294
323
|
Einhorn::State.children[pid] = {
|
295
324
|
:type => :worker,
|
296
325
|
:version => Einhorn::State.version,
|
297
326
|
:acked => false,
|
298
327
|
:signaled => Set.new,
|
299
|
-
:
|
328
|
+
:last_signaled_at => nil,
|
329
|
+
:index => index,
|
330
|
+
:spinup_time => Einhorn::State.last_spinup,
|
300
331
|
}
|
301
|
-
Einhorn::State.last_spinup = Time.now
|
302
332
|
|
303
333
|
# Set up whatever's needed for ACKing
|
304
334
|
ack_mode = Einhorn::State.ack_mode
|
@@ -364,9 +394,28 @@ module Einhorn
|
|
364
394
|
end
|
365
395
|
|
366
396
|
def self.prepare_child_process
|
397
|
+
Process.setpgrp
|
367
398
|
Einhorn.renice_self
|
368
399
|
end
|
369
400
|
|
401
|
+
def self.setup_parent_watch(expected_ppid)
|
402
|
+
if Einhorn::State.kill_children_on_exit then
|
403
|
+
begin
|
404
|
+
# NB: Having the USR2 signal handler set to terminate (the default) at
|
405
|
+
# this point is required. If it's set to a ruby handler, there are
|
406
|
+
# race conditions that could cause the worker to leak.
|
407
|
+
|
408
|
+
Einhorn::Prctl.set_pdeathsig("USR2")
|
409
|
+
if Process.ppid != expected_ppid then
|
410
|
+
Einhorn.log_error("Parent process died before we set pdeathsig; cowardly refusing to exec child process.")
|
411
|
+
exit(1)
|
412
|
+
end
|
413
|
+
rescue NotImplementedError
|
414
|
+
# Unsupported OS; silently continue.
|
415
|
+
end
|
416
|
+
end
|
417
|
+
end
|
418
|
+
|
370
419
|
# @param options [Hash]
|
371
420
|
#
|
372
421
|
# @option options [Boolean] :smooth (false) Whether to perform a smooth or
|
@@ -451,6 +500,41 @@ module Einhorn
|
|
451
500
|
Einhorn.log_info("Have too many workers at the current version, so killing off #{excess.length} of them.")
|
452
501
|
signal_all("USR2", excess)
|
453
502
|
end
|
503
|
+
|
504
|
+
# Ensure all signaled workers that have outlived signal_timeout get killed.
|
505
|
+
kill_expired_signaled_workers if Einhorn::State.signal_timeout
|
506
|
+
end
|
507
|
+
|
508
|
+
def self.kill_expired_signaled_workers
|
509
|
+
now = Time.now
|
510
|
+
children = Einhorn::State.children.select do |_,c|
|
511
|
+
# Only interested in USR2 signaled workers
|
512
|
+
next unless c[:signaled] && c[:signaled].length > 0
|
513
|
+
next unless c[:signaled].include?('USR2')
|
514
|
+
|
515
|
+
# Ignore processes that have received KILL since it can't be trapped.
|
516
|
+
next if c[:signaled].include?('KILL')
|
517
|
+
|
518
|
+
# Filter out those children that have not reached signal_timeout yet.
|
519
|
+
next unless c[:last_signaled_at]
|
520
|
+
expires_at = c[:last_signaled_at] + Einhorn::State.signal_timeout
|
521
|
+
next unless now >= expires_at
|
522
|
+
|
523
|
+
true
|
524
|
+
end
|
525
|
+
|
526
|
+
Einhorn.log_info("#{children.size} expired signaled workers found.") if children.size > 0
|
527
|
+
children.each do |pid, child|
|
528
|
+
Einhorn.log_info("Child #{pid.inspect} was signaled #{(child[:last_signaled_at] - now).abs.to_i}s ago. Sending SIGKILL as it is still active after #{Einhorn::State.signal_timeout}s timeout.", :upgrade)
|
529
|
+
begin
|
530
|
+
Process.kill('KILL', pid)
|
531
|
+
rescue Errno::ESRCH
|
532
|
+
Einhorn.log_debug("Attempted to SIGKILL child #{pid.inspect} but the process does not exist.")
|
533
|
+
end
|
534
|
+
|
535
|
+
child[:signaled].add('KILL')
|
536
|
+
child[:last_signaled_at] = Time.now
|
537
|
+
end
|
454
538
|
end
|
455
539
|
|
456
540
|
def self.stop_respawning
|
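Note on `kill_expired_signaled_workers`: the filter keeps children that were sent USR2, have not yet been sent KILL, and whose `last_signaled_at` is at least `signal_timeout` seconds old. A minimal sketch of that selection as a pure function follows; `expired_signaled_workers` and the sample `children` hash are assumptions for illustration, and the real method goes on to send SIGKILL and update each child's record.

```ruby
require 'set'

# Hypothetical helper: returns the subset of children that the diff's
# filter would treat as expired.
def expired_signaled_workers(children, signal_timeout, now = Time.now)
  children.select do |_pid, c|
    next false unless c[:signaled] && c[:signaled].include?('USR2')
    next false if c[:signaled].include?('KILL') # already escalated
    next false unless c[:last_signaled_at]

    now >= c[:last_signaled_at] + signal_timeout
  end
end

# Made-up state: pid 101 was signaled 100s ago, pid 102 only 10s ago.
now = Time.at(1_000)
children = {
  101 => { signaled: Set.new(['USR2']), last_signaled_at: Time.at(900) },
  102 => { signaled: Set.new(['USR2']), last_signaled_at: Time.at(990) }
}
expired = expired_signaled_workers(children, 60, now)
```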
@@ -478,10 +562,18 @@ module Einhorn
|
|
478
562
|
missing.times {spinup}
|
479
563
|
end
|
480
564
|
|
565
|
+
# Unbounded exponential backoff is not a thing: we run into problems if
|
566
|
+
# e.g., each of our hundred workers simultaneously fail to boot for the same
|
567
|
+
# ephemeral reason. Instead cap backoff by some reasonable maximum, so we
|
568
|
+
# don't wait until the heat death of the universe to spin up new capacity.
|
569
|
+
MAX_SPINUP_INTERVAL = 30.0
|
570
|
+
|
481
571
|
def self.replenish_gradually(max_unacked=nil)
|
482
572
|
return if Einhorn::TransientState.has_outstanding_spinup_timer
|
483
573
|
return unless Einhorn::WorkerPool.missing_worker_count > 0
|
484
574
|
|
575
|
+
max_unacked ||= Einhorn::State.config[:max_unacked]
|
576
|
+
|
485
577
|
# default to spinning up at most NCPU workers at once
|
486
578
|
unless max_unacked
|
487
579
|
begin
|
@@ -500,14 +592,12 @@ module Einhorn
|
|
500
592
|
# Exponentially backoff automated spinup if we're just having
|
501
593
|
# things die before ACKing
|
502
594
|
spinup_interval = Einhorn::State.config[:seconds] * (1.5 ** Einhorn::State.consecutive_deaths_before_ack)
|
595
|
+
spinup_interval = [spinup_interval, MAX_SPINUP_INTERVAL].min
|
503
596
|
seconds_ago = (Time.now - Einhorn::State.last_spinup).to_f
|
504
597
|
|
505
598
|
if seconds_ago > spinup_interval
|
506
|
-
|
507
|
-
|
508
|
-
Einhorn.log_debug("There are #{unacked} unacked new workers, and max_unacked is #{max_unacked}, so not spinning up a new process")
|
509
|
-
else
|
510
|
-
msg = "Last spinup was #{seconds_ago}s ago, and spinup_interval is #{spinup_interval}s, so spinning up a new process"
|
599
|
+
if trigger_spinup?(max_unacked)
|
600
|
+
msg = "Last spinup was #{seconds_ago}s ago, and spinup_interval is #{spinup_interval}s, so spinning up a new process."
|
511
601
|
|
512
602
|
if Einhorn::State.consecutive_deaths_before_ack > 0
|
513
603
|
Einhorn.log_info("#{msg} (there have been #{Einhorn::State.consecutive_deaths_before_ack} consecutive unacked worker deaths)", :upgrade)
|
@@ -518,7 +608,7 @@ module Einhorn
|
|
518
608
|
spinup
|
519
609
|
end
|
520
610
|
else
|
521
|
-
Einhorn.log_debug("Last spinup was #{seconds_ago}s ago, and spinup_interval is #{spinup_interval}s, so not spinning up a new process")
|
611
|
+
Einhorn.log_debug("Last spinup was #{seconds_ago}s ago, and spinup_interval is #{spinup_interval}s, so not spinning up a new process.")
|
522
612
|
end
|
523
613
|
|
524
614
|
Einhorn::TransientState.has_outstanding_spinup_timer = true
|
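Note on the capped backoff: the spinup interval still grows by a factor of 1.5 per consecutive unacked death, but the added line clamps it to `MAX_SPINUP_INTERVAL`. The arithmetic can be sketched as below; `capped_spinup_interval` is a hypothetical helper, and the constant's value is the one shown in the diff.

```ruby
MAX_SPINUP_INTERVAL = 30.0 # value taken from the diff

# Hypothetical helper: exponential backoff clamped to the maximum.
def capped_spinup_interval(base_seconds, consecutive_deaths_before_ack)
  interval = base_seconds * (1.5**consecutive_deaths_before_ack)
  [interval, MAX_SPINUP_INTERVAL].min
end
```

With a base of 1 second the uncapped interval passes 30 seconds at the ninth consecutive death (1.5 to the 9th power is about 38.4), so from that point on the wait stays at 30 seconds.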
@@ -541,5 +631,22 @@ module Einhorn
|
|
541
631
|
Einhorn.log_info(output) if log
|
542
632
|
output
|
543
633
|
end
|
634
|
+
|
635
|
+
def self.trigger_spinup?(max_unacked)
|
636
|
+
unacked = Einhorn::WorkerPool.unacked_unsignaled_modern_workers.length
|
637
|
+
if unacked >= max_unacked
|
638
|
+
Einhorn.log_info("There are #{unacked} unacked new workers, and max_unacked is #{max_unacked}, so not spinning up a new process.")
|
639
|
+
return false
|
640
|
+
elsif Einhorn::State.config[:max_upgrade_additional]
|
641
|
+
capacity_exceeded = (Einhorn::State.config[:number] + Einhorn::State.config[:max_upgrade_additional]) - Einhorn::WorkerPool.workers_with_state.length
|
642
|
+
if capacity_exceeded < 0
|
643
|
+
Einhorn.log_info("Over worker capacity by #{capacity_exceeded.abs} during upgrade, #{Einhorn::WorkerPool.modern_workers.length} new workers of #{Einhorn::WorkerPool.workers_with_state.length} total. Waiting for old workers to exit before spinning up a process.")
|
644
|
+
|
645
|
+
return false
|
646
|
+
end
|
647
|
+
end
|
648
|
+
|
649
|
+
true
|
650
|
+
end
|
544
651
|
end
|
545
652
|
end
|
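Note on `trigger_spinup?`: the new method refuses to spin up a worker in two cases. The first is when the number of unacked new workers has reached `max_unacked`. The second applies only if `max_upgrade_additional` is set, when the total worker count already exceeds the configured number plus that allowance. The sketch below restates the decision as a pure function; `spinup_allowed?` and its keyword arguments are hypothetical stand-ins for the `Einhorn::State` and `Einhorn::WorkerPool` lookups in the diff.

```ruby
# Hypothetical helper restating the two checks from the diff.
def spinup_allowed?(unacked:, max_unacked:, number:, max_upgrade_additional:, total_workers:)
  return false if unacked >= max_unacked

  if max_upgrade_additional
    headroom = (number + max_upgrade_additional) - total_workers
    return false if headroom < 0 # over capacity during the upgrade
  end

  true
end
```

Note that a headroom of exactly zero still allows a spinup, because the diff only blocks when the difference is negative.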