uhttp-workers 1.4.0__tar.gz → 1.6.0__tar.gz
- {uhttp_workers-1.4.0/uhttp_workers.egg-info → uhttp_workers-1.6.0}/PKG-INFO +73 -2
- {uhttp_workers-1.4.0 → uhttp_workers-1.6.0}/README.md +71 -0
- {uhttp_workers-1.4.0 → uhttp_workers-1.6.0}/pyproject.toml +1 -1
- {uhttp_workers-1.4.0 → uhttp_workers-1.6.0}/tests/test_dispatcher.py +294 -3
- {uhttp_workers-1.4.0 → uhttp_workers-1.6.0}/tests/test_worker.py +33 -1
- {uhttp_workers-1.4.0 → uhttp_workers-1.6.0}/tests/test_worker_pool.py +21 -0
- {uhttp_workers-1.4.0 → uhttp_workers-1.6.0}/uhttp/workers.py +109 -9
- {uhttp_workers-1.4.0 → uhttp_workers-1.6.0/uhttp_workers.egg-info}/PKG-INFO +73 -2
- uhttp_workers-1.6.0/uhttp_workers.egg-info/requires.txt +1 -0
- uhttp_workers-1.4.0/uhttp_workers.egg-info/requires.txt +0 -1
- {uhttp_workers-1.4.0 → uhttp_workers-1.6.0}/.github/workflows/publish.yml +0 -0
- {uhttp_workers-1.4.0 → uhttp_workers-1.6.0}/.github/workflows/tests.yml +0 -0
- {uhttp_workers-1.4.0 → uhttp_workers-1.6.0}/.gitignore +0 -0
- {uhttp_workers-1.4.0 → uhttp_workers-1.6.0}/examples/simple_workers.py +0 -0
- {uhttp_workers-1.4.0 → uhttp_workers-1.6.0}/examples/sse_workers.py +0 -0
- {uhttp_workers-1.4.0 → uhttp_workers-1.6.0}/examples/static/index.html +0 -0
- {uhttp_workers-1.4.0 → uhttp_workers-1.6.0}/setup.cfg +0 -0
- {uhttp_workers-1.4.0 → uhttp_workers-1.6.0}/tests/__init__.py +0 -0
- {uhttp_workers-1.4.0 → uhttp_workers-1.6.0}/tests/test_api_handler.py +0 -0
- {uhttp_workers-1.4.0 → uhttp_workers-1.6.0}/tests/test_decorators.py +0 -0
- {uhttp_workers-1.4.0 → uhttp_workers-1.6.0}/tests/test_pattern_matching.py +0 -0
- {uhttp_workers-1.4.0 → uhttp_workers-1.6.0}/tests/test_request_response.py +0 -0
- {uhttp_workers-1.4.0 → uhttp_workers-1.6.0}/uhttp_workers.egg-info/SOURCES.txt +0 -0
- {uhttp_workers-1.4.0 → uhttp_workers-1.6.0}/uhttp_workers.egg-info/dependency_links.txt +0 -0
- {uhttp_workers-1.4.0 → uhttp_workers-1.6.0}/uhttp_workers.egg-info/top_level.txt +0 -0
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
Metadata-Version: 2.4
|
|
2
2
|
Name: uhttp-workers
|
|
3
|
-
Version: 1.4.0
|
|
3
|
+
Version: 1.6.0
|
|
4
4
|
Summary: Multi-process worker dispatcher built on uhttp-server
|
|
5
5
|
Author-email: Pavel Revak <pavelrevak@gmail.com>
|
|
6
6
|
License-Expression: MIT
|
|
@@ -10,7 +10,7 @@ Classifier: Programming Language :: Python :: 3
|
|
|
10
10
|
Classifier: Operating System :: POSIX
|
|
11
11
|
Requires-Python: >=3.10
|
|
12
12
|
Description-Content-Type: text/markdown
|
|
13
|
-
Requires-Dist: uhttp-server
|
|
13
|
+
Requires-Dist: uhttp-server>=2.5.2
|
|
14
14
|
|
|
15
15
|
# uhttp-workers
|
|
16
16
|
|
|
@@ -384,6 +384,31 @@ Available streaming methods on `Request`:
|
|
|
384
384
|
|
|
385
385
|
Streaming requests are excluded from dispatcher timeout expiration. When the client disconnects, the dispatcher notifies the worker via `on_disconnect(request_id)`.
|
|
386
386
|
|
|
387
|
+
### NDJSON Streaming
|
|
388
|
+
|
|
389
|
+
Stream JSON objects line-by-line (`application/x-ndjson`) — one JSON value per line, terminated by `\n`. Useful for incremental APIs that aren't event-shaped (long lists, log tails, progress updates):
|
|
390
|
+
|
|
391
|
+
```python
|
|
392
|
+
class MyWorker(_workers.Worker):
|
|
393
|
+
@_workers.api('/devices/scan', 'GET')
|
|
394
|
+
def scan(self, request):
|
|
395
|
+
request.response_ndjson()
|
|
396
|
+
for device in self.discover_devices():
|
|
397
|
+
request.send_ndjson({'id': device.id, 'name': device.name})
|
|
398
|
+
request.response_stream_end()
|
|
399
|
+
return _workers.DEFERRED
|
|
400
|
+
```
|
|
401
|
+
|
|
402
|
+
NDJSON methods on `Request`:
|
|
403
|
+
|
|
404
|
+
| Method | Description |
|
|
405
|
+
|--------|-------------|
|
|
406
|
+
| `response_ndjson(headers, cookies)` | Start NDJSON stream (wrapper over `response_stream` with `application/x-ndjson`) |
|
|
407
|
+
| `send_ndjson(obj)` | Send one JSON-serializable value as a line |
|
|
408
|
+
| `response_stream_end()` | End stream and close connection (shared with SSE) |
|
|
409
|
+
|
|
410
|
+
Same lifecycle as SSE: excluded from timeout expiration, client disconnect triggers `on_disconnect(request_id)`.
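For readers consuming the stream added in this hunk, here is a minimal client-side sketch of the framing described above (one JSON value per line, terminated by `\n`). The helper name `iter_ndjson` and the chunked-bytes input are assumptions made for illustration; neither is part of uhttp-workers.

```python
import json


def iter_ndjson(chunks):
    """Yield decoded JSON values from an iterable of byte chunks.

    Hypothetical helper, not part of uhttp-workers: it only illustrates
    the framing, where each value ends with a newline character.
    """
    buffer = b''
    for chunk in chunks:
        buffer += chunk
        # A chunk may hold several lines or end in the middle of one.
        while b'\n' in buffer:
            line, buffer = buffer.split(b'\n', 1)
            if line.strip():
                yield json.loads(line)
    # Tolerate a final value that arrived without a trailing newline.
    if buffer.strip():
        yield json.loads(buffer)
```

Buffering until a newline arrives matters because network reads do not respect line boundaries, so a single read may end partway through a JSON value.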
|
|
411
|
+
|
|
387
412
|
### Flow Control
|
|
388
413
|
|
|
389
414
|
Workers can stop accepting new requests when busy. Requests stay in the shared pool queue for other workers to pick up:
|
|
@@ -540,6 +565,7 @@ Reason is one of:
|
|
|
540
565
|
| `PENDING_DISCONNECTED` | Client disconnected mid-stream; worker was notified via control queue (race possible). |
|
|
541
566
|
| `PENDING_STREAM_CLOSED` | Worker ended the SSE stream cleanly. |
|
|
542
567
|
| `PENDING_SHUTDOWN` | Dispatcher is shutting down; client got 503. |
|
|
568
|
+
| `PENDING_WORKER_DIED` | Worker process died/was killed while owning the request; client got 500. `on_worker_died()` runs first. |
|
|
543
569
|
|
|
544
570
|
The hook is invoked after the client-facing action (respond / disconnect / control queue put)
|
|
545
571
|
so dispatcher state is finalized when it runs. Exceptions raised by the hook are logged at
|
|
@@ -550,6 +576,51 @@ Override `on_pending_removed()` if you need exactly-once cleanup. Overriding bot
|
|
|
550
576
|
but discouraged — for the `PENDING_COMPLETED` reason, `on_response()` is called immediately
|
|
551
577
|
before `on_pending_removed()`.
|
|
552
578
|
|
|
579
|
+
## Worker Death Hook
|
|
580
|
+
|
|
581
|
+
Workers die — segfault in a C extension, OOM-kill, or the dispatcher kills them after
|
|
582
|
+
`stuck_timeout`. Override `on_worker_died()` to capture which requests they had in-flight
|
|
583
|
+
(useful for forensics when a malformed payload reproduces a crash):
|
|
584
|
+
|
|
585
|
+
```python
|
|
586
|
+
class MyDispatcher(_workers.Dispatcher):
|
|
587
|
+
def on_worker_died(
|
|
588
|
+
self, pool, worker_id, reason, exitcode, victims):
|
|
589
|
+
# `victims` is a list of (request_id, _PendingRequest) for all
|
|
590
|
+
# requests this worker had claimed but not completed.
|
|
591
|
+
# `exitcode` is None for stuck workers, otherwise the process exit
|
|
592
|
+
# code (negative = signal: -9 OOM, -11 SIGSEGV).
|
|
593
|
+
for rid, pending in victims:
|
|
594
|
+
c = pending.client
|
|
595
|
+
self._crash_queue.append({
|
|
596
|
+
'reason': reason,
|
|
597
|
+
'exitcode': exitcode,
|
|
598
|
+
'method': c.method,
|
|
599
|
+
'path': c.path,
|
|
600
|
+
'address': c.address,
|
|
601
|
+
'body': c.body, # raw bytes — replay this to reproduce
|
|
602
|
+
})
|
|
603
|
+
# Default impl responds 500 to victims and fires
|
|
604
|
+
# on_pending_removed(PENDING_WORKER_DIED). Call it after capture.
|
|
605
|
+
super().on_worker_died(
|
|
606
|
+
pool, worker_id, reason, exitcode, victims)
|
|
607
|
+
```
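The `exitcode` comment in the example above relies on the Python `multiprocessing` convention that a negative exit code `-N` means the process was terminated by signal `N`. A small sketch of turning that value into readable text for a crash record follows; `describe_exitcode` is a hypothetical helper, not something the package provides. Note that `-9` is SIGKILL, which is the signal the Linux OOM killer sends, so it suggests but does not prove an out-of-memory kill.

```python
import signal


def describe_exitcode(exitcode):
    """Return a short human-readable description of a worker exit code.

    Hypothetical helper for log or crash records. A negative code -N
    means the process was terminated by signal N.
    """
    if exitcode is None:
        # The hook documents None as the value for stuck workers.
        return 'no exit code (worker was stuck)'
    if exitcode < 0:
        try:
            name = signal.Signals(-exitcode).name
        except ValueError:
            # Signal number unknown on this platform.
            name = 'signal {}'.format(-exitcode)
        return 'killed by {}'.format(name)
    return 'exited with status {}'.format(exitcode)
```

Storing the decoded text next to the raw `exitcode` keeps crash records searchable without losing the original value.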
|
|
608
|
+
|
|
609
|
+
What's a victim: any request the worker had claimed via `MSG_HEARTBEAT`
|
|
610
|
+
(`pending.worker_id == worker_id`). Requests still in the queue (`worker_id is None`)
|
|
611
|
+
are **not** victims — other workers in the pool will pick them up after restart.
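The victim rule stated above can be sketched as a simple filter over the pending table. This is an illustration only: `PendingStub` and `select_victims` are hypothetical names standing in for the dispatcher's internal record and logic, which this diff does not show in full.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PendingStub:
    """Stand-in for the dispatcher's pending-request record (assumed shape)."""
    path: str
    worker_id: Optional[int] = None  # None: still queued, not yet claimed


def select_victims(pending, worker_id):
    """Return (request_id, record) pairs claimed by the given dead worker.

    Queued requests (worker_id is None) and requests owned by other
    workers are left alone.
    """
    return [
        (rid, rec) for rid, rec in pending.items()
        if rec.worker_id == worker_id]
```

For example, with one request claimed by worker 0, one still queued, and one claimed by worker 1, only the first is selected when worker 0 dies.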
|
|
612
|
+
|
|
613
|
+
Default behavior (if you don't override) is to log the death + each victim, respond
|
|
614
|
+
500 (or close the stream for SSE/NDJSON), and fire `on_pending_removed` for each.
|
|
615
|
+
Override only if you want to persist payloads or customize the response status/body.
|
|
616
|
+
|
|
617
|
+
**500 vs 503:** a victim of a crashed worker gets **500** (processing started, then
|
|
618
|
+
the server failed). A new request arriving while the pool has zero alive workers
|
|
619
|
+
gets **503 + `Retry-After: 1`** (rejected before processing — try again shortly).
|
|
620
|
+
A request to a pool that has exceeded `max_restarts` in `restart_window` gets **503**
|
|
621
|
+
permanently (`pool.is_degraded`). `pool.alive_count` is exposed for monitoring and
|
|
622
|
+
also appears in `pool.status()`.
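From the client side, the 500 versus 503 distinction above suggests a simple retry policy. The sketch below is one possible reading, not behavior defined by the package: `retry_delay` is a hypothetical function, and it assumes a 503 without `Retry-After` should not be retried automatically, since the text does not say whether the degraded-pool 503 carries that header.

```python
def retry_delay(status, headers):
    """Return seconds to wait before retrying, or None for no retry.

    Hypothetical client-side policy, not part of uhttp-workers.
    """
    if status != 503:
        # A 500 means processing started and then failed; replaying the
        # same payload blindly may trigger the same crash.
        return None
    value = headers.get('Retry-After')
    if value is None:
        # Assumption: no hint means the condition may not be transient.
        return None
    try:
        return max(0.0, float(value))
    except ValueError:
        # Retry-After may also be an HTTP date; not handled in this sketch.
        return None
```

A caller would sleep for the returned number of seconds and resend, ideally with a cap on the number of attempts.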
|
|
623
|
+
|
|
553
624
|
## Dispatcher Idle Hook
|
|
554
625
|
|
|
555
626
|
Override `on_idle()` on the dispatcher for periodic background tasks — called on each `select()` timeout (every `SELECT_TIMEOUT` seconds, default 1s):
|
|
@@ -370,6 +370,31 @@ Available streaming methods on `Request`:
|
|
|
370
370
|
|
|
371
371
|
Streaming requests are excluded from dispatcher timeout expiration. When the client disconnects, the dispatcher notifies the worker via `on_disconnect(request_id)`.
|
|
372
372
|
|
|
373
|
+
### NDJSON Streaming
|
|
374
|
+
|
|
375
|
+
Stream JSON objects line-by-line (`application/x-ndjson`) — one JSON value per line, terminated by `\n`. Useful for incremental APIs that aren't event-shaped (long lists, log tails, progress updates):
|
|
376
|
+
|
|
377
|
+
```python
|
|
378
|
+
class MyWorker(_workers.Worker):
|
|
379
|
+
@_workers.api('/devices/scan', 'GET')
|
|
380
|
+
def scan(self, request):
|
|
381
|
+
request.response_ndjson()
|
|
382
|
+
for device in self.discover_devices():
|
|
383
|
+
request.send_ndjson({'id': device.id, 'name': device.name})
|
|
384
|
+
request.response_stream_end()
|
|
385
|
+
return _workers.DEFERRED
|
|
386
|
+
```
|
|
387
|
+
|
|
388
|
+
NDJSON methods on `Request`:
|
|
389
|
+
|
|
390
|
+
| Method | Description |
|
|
391
|
+
|--------|-------------|
|
|
392
|
+
| `response_ndjson(headers, cookies)` | Start NDJSON stream (wrapper over `response_stream` with `application/x-ndjson`) |
|
|
393
|
+
| `send_ndjson(obj)` | Send one JSON-serializable value as a line |
|
|
394
|
+
| `response_stream_end()` | End stream and close connection (shared with SSE) |
|
|
395
|
+
|
|
396
|
+
Same lifecycle as SSE: excluded from timeout expiration, client disconnect triggers `on_disconnect(request_id)`.
|
|
397
|
+
|
|
373
398
|
### Flow Control
|
|
374
399
|
|
|
375
400
|
Workers can stop accepting new requests when busy. Requests stay in the shared pool queue for other workers to pick up:
|
|
@@ -526,6 +551,7 @@ Reason is one of:
|
|
|
526
551
|
| `PENDING_DISCONNECTED` | Client disconnected mid-stream; worker was notified via control queue (race possible). |
|
|
527
552
|
| `PENDING_STREAM_CLOSED` | Worker ended the SSE stream cleanly. |
|
|
528
553
|
| `PENDING_SHUTDOWN` | Dispatcher is shutting down; client got 503. |
|
|
554
|
+
| `PENDING_WORKER_DIED` | Worker process died/was killed while owning the request; client got 500. `on_worker_died()` runs first. |
|
|
529
555
|
|
|
530
556
|
The hook is invoked after the client-facing action (respond / disconnect / control queue put)
|
|
531
557
|
so dispatcher state is finalized when it runs. Exceptions raised by the hook are logged at
|
|
@@ -536,6 +562,51 @@ Override `on_pending_removed()` if you need exactly-once cleanup. Overriding bot
|
|
|
536
562
|
but discouraged — for the `PENDING_COMPLETED` reason, `on_response()` is called immediately
|
|
537
563
|
before `on_pending_removed()`.
|
|
538
564
|
|
|
565
|
+
## Worker Death Hook
|
|
566
|
+
|
|
567
|
+
Workers die — segfault in a C extension, OOM-kill, or the dispatcher kills them after
|
|
568
|
+
`stuck_timeout`. Override `on_worker_died()` to capture which requests they had in-flight
|
|
569
|
+
(useful for forensics when a malformed payload reproduces a crash):
|
|
570
|
+
|
|
571
|
+
```python
|
|
572
|
+
class MyDispatcher(_workers.Dispatcher):
|
|
573
|
+
def on_worker_died(
|
|
574
|
+
self, pool, worker_id, reason, exitcode, victims):
|
|
575
|
+
# `victims` is a list of (request_id, _PendingRequest) for all
|
|
576
|
+
# requests this worker had claimed but not completed.
|
|
577
|
+
# `exitcode` is None for stuck workers, otherwise the process exit
|
|
578
|
+
# code (negative = signal: -9 OOM, -11 SIGSEGV).
|
|
579
|
+
for rid, pending in victims:
|
|
580
|
+
c = pending.client
|
|
581
|
+
self._crash_queue.append({
|
|
582
|
+
'reason': reason,
|
|
583
|
+
'exitcode': exitcode,
|
|
584
|
+
'method': c.method,
|
|
585
|
+
'path': c.path,
|
|
586
|
+
'address': c.address,
|
|
587
|
+
'body': c.body, # raw bytes — replay this to reproduce
|
|
588
|
+
})
|
|
589
|
+
# Default impl responds 500 to victims and fires
|
|
590
|
+
# on_pending_removed(PENDING_WORKER_DIED). Call it after capture.
|
|
591
|
+
super().on_worker_died(
|
|
592
|
+
pool, worker_id, reason, exitcode, victims)
|
|
593
|
+
```
|
|
594
|
+
|
|
595
|
+
What's a victim: any request the worker had claimed via `MSG_HEARTBEAT`
|
|
596
|
+
(`pending.worker_id == worker_id`). Requests still in the queue (`worker_id is None`)
|
|
597
|
+
are **not** victims — other workers in the pool will pick them up after restart.
|
|
598
|
+
|
|
599
|
+
Default behavior (if you don't override) is to log the death + each victim, respond
|
|
600
|
+
500 (or close the stream for SSE/NDJSON), and fire `on_pending_removed` for each.
|
|
601
|
+
Override only if you want to persist payloads or customize the response status/body.
|
|
602
|
+
|
|
603
|
+
**500 vs 503:** a victim of a crashed worker gets **500** (processing started, then
|
|
604
|
+
the server failed). A new request arriving while the pool has zero alive workers
|
|
605
|
+
gets **503 + `Retry-After: 1`** (rejected before processing — try again shortly).
|
|
606
|
+
A request to a pool that has exceeded `max_restarts` in `restart_window` gets **503**
|
|
607
|
+
permanently (`pool.is_degraded`). `pool.alive_count` is exposed for monitoring and
|
|
608
|
+
also appears in `pool.status()`.
|
|
609
|
+
|
|
539
610
|
## Dispatcher Idle Hook
|
|
540
611
|
|
|
541
612
|
Override `on_idle()` on the dispatcher for periodic background tasks — called on each `select()` timeout (every `SELECT_TIMEOUT` seconds, default 1s):
|
|
@@ -10,10 +10,10 @@ from uhttp.workers import (
|
|
|
10
10
|
Dispatcher, Worker, WorkerPool, Request, Response,
|
|
11
11
|
api, sync, RejectRequest,
|
|
12
12
|
MSG_RESPONSE, MSG_HEARTBEAT,
|
|
13
|
-
MSG_SSE_OPEN, MSG_SSE_EVENT, MSG_SSE_CLOSE,
|
|
13
|
+
MSG_SSE_OPEN, MSG_SSE_EVENT, MSG_SSE_CLOSE, MSG_NDJSON,
|
|
14
14
|
CTL_DISCONNECT,
|
|
15
15
|
PENDING_COMPLETED, PENDING_TIMEOUT, PENDING_DISCONNECTED,
|
|
16
|
-
PENDING_STREAM_CLOSED, PENDING_SHUTDOWN,
|
|
16
|
+
PENDING_STREAM_CLOSED, PENDING_SHUTDOWN, PENDING_WORKER_DIED,
|
|
17
17
|
LOG_ERROR,
|
|
18
18
|
_PendingRequest,
|
|
19
19
|
)
|
|
@@ -29,13 +29,16 @@ class MockClient:
|
|
|
29
29
|
"""Mock HttpConnection for testing dispatcher logic."""
|
|
30
30
|
|
|
31
31
|
def __init__(self, method='GET', path='/', query=None, data=None,
|
|
32
|
-
headers=None, content_type=None):
|
|
32
|
+
headers=None, content_type=None, body=None,
|
|
33
|
+
address='127.0.0.1'):
|
|
33
34
|
self.method = method
|
|
34
35
|
self.path = path
|
|
35
36
|
self.query = query
|
|
36
37
|
self.data = data
|
|
37
38
|
self.headers = headers or {}
|
|
38
39
|
self.content_type = content_type
|
|
40
|
+
self.body = body
|
|
41
|
+
self.address = address
|
|
39
42
|
self.responded = False
|
|
40
43
|
self.response_data = None
|
|
41
44
|
self.response_status = None
|
|
@@ -78,6 +81,12 @@ class MockClient:
|
|
|
78
81
|
self._chunks.append(data)
|
|
79
82
|
return getattr(self, '_connected', True)
|
|
80
83
|
|
|
84
|
+
def send_ndjson(self, obj):
|
|
85
|
+
if not hasattr(self, '_ndjson'):
|
|
86
|
+
self._ndjson = []
|
|
87
|
+
self._ndjson.append(obj)
|
|
88
|
+
return getattr(self, '_connected', True)
|
|
89
|
+
|
|
81
90
|
def response_stream_end(self):
|
|
82
91
|
self.stream_ended = True
|
|
83
92
|
|
|
@@ -182,6 +191,8 @@ class TestDispatcherDoCheck(unittest.TestCase):
|
|
|
182
191
|
|
|
183
192
|
def test_pass_check(self):
|
|
184
193
|
pool = WorkerPool(DummyWorker, routes=['/api/**'])
|
|
194
|
+
# fake a live worker so alive_count > 0 without starting processes
|
|
195
|
+
pool.workers = [type('W', (), {'is_alive': lambda self: True})()]
|
|
185
196
|
d = Dispatcher.__new__(Dispatcher)
|
|
186
197
|
d._sync_routes = []
|
|
187
198
|
d._static_routes = {}
|
|
@@ -311,6 +322,48 @@ class TestDispatcherPoolRouting(unittest.TestCase):
|
|
|
311
322
|
self.assertTrue(client.responded)
|
|
312
323
|
self.assertEqual(client.response_status, 503)
|
|
313
324
|
|
|
325
|
+
def test_dispatch_no_alive_workers_returns_503(self):
|
|
326
|
+
"""Transient: pool has workers but none currently alive."""
|
|
327
|
+
# pool has dead workers (not degraded yet)
|
|
328
|
+
self.pool_default.workers = [
|
|
329
|
+
type('W', (), {'is_alive': lambda self: False})()]
|
|
330
|
+
d = Dispatcher.__new__(Dispatcher)
|
|
331
|
+
d._sync_routes = []
|
|
332
|
+
d._static_routes = {}
|
|
333
|
+
d._pools = [self.pool_default]
|
|
334
|
+
d._pending = {}
|
|
335
|
+
d._max_pending = 1000
|
|
336
|
+
d._next_request_id = 0
|
|
337
|
+
|
|
338
|
+
client = MockClient('GET', '/test')
|
|
339
|
+
d._dispatch_to_pool(client)
|
|
340
|
+
self.assertTrue(client.responded)
|
|
341
|
+
self.assertEqual(client.response_status, 503)
|
|
342
|
+
self.assertEqual(
|
|
343
|
+
client.response_data['error'], 'No workers available')
|
|
344
|
+
self.assertEqual(
|
|
345
|
+
client.response_headers.get('Retry-After'), '1')
|
|
346
|
+
# request was NOT enqueued
|
|
347
|
+
self.assertEqual(d._pending, {})
|
|
348
|
+
|
|
349
|
+
def test_dispatch_empty_workers_returns_503(self):
|
|
350
|
+
"""Pool was never started (empty workers list)."""
|
|
351
|
+
# self.pool_default.workers is [] from __init__
|
|
352
|
+
d = Dispatcher.__new__(Dispatcher)
|
|
353
|
+
d._sync_routes = []
|
|
354
|
+
d._static_routes = {}
|
|
355
|
+
d._pools = [self.pool_default]
|
|
356
|
+
d._pending = {}
|
|
357
|
+
d._max_pending = 1000
|
|
358
|
+
d._next_request_id = 0
|
|
359
|
+
|
|
360
|
+
client = MockClient('GET', '/test')
|
|
361
|
+
d._dispatch_to_pool(client)
|
|
362
|
+
self.assertTrue(client.responded)
|
|
363
|
+
self.assertEqual(client.response_status, 503)
|
|
364
|
+
self.assertEqual(
|
|
365
|
+
client.response_data['error'], 'No workers available')
|
|
366
|
+
|
|
314
367
|
|
|
315
368
|
class TestDispatcherProcessResponse(unittest.TestCase):
|
|
316
369
|
|
|
@@ -547,6 +600,42 @@ class TestDispatcherSSE(unittest.TestCase):
|
|
|
547
600
|
# non-streaming should be expired
|
|
548
601
|
self.assertNotIn(1, d._pending)
|
|
549
602
|
|
|
603
|
+
def test_ndjson_send(self):
|
|
604
|
+
d, pool = self._make_dispatcher()
|
|
605
|
+
client = MockClient('GET', '/api/stream')
|
|
606
|
+
pending = _PendingRequest(client, pool)
|
|
607
|
+
pending.streaming = True
|
|
608
|
+
d._pending[1] = pending
|
|
609
|
+
d._process_response(
|
|
610
|
+
(MSG_NDJSON, 1, {'devices': [1, 2, 3]}))
|
|
611
|
+
self.assertEqual(len(client._ndjson), 1)
|
|
612
|
+
self.assertEqual(client._ndjson[0], {'devices': [1, 2, 3]})
|
|
613
|
+
# still pending — stream open
|
|
614
|
+
self.assertIn(1, d._pending)
|
|
615
|
+
|
|
616
|
+
def test_ndjson_client_disconnect(self):
|
|
617
|
+
d, pool = self._make_dispatcher()
|
|
618
|
+
pool.start(d._response_queue)
|
|
619
|
+
client = MockClient('GET', '/api/stream')
|
|
620
|
+
client._connected = False
|
|
621
|
+
pending = _PendingRequest(client, pool)
|
|
622
|
+
pending.streaming = True
|
|
623
|
+
pending.worker_id = 0
|
|
624
|
+
d._pending[1] = pending
|
|
625
|
+
d._process_response((MSG_NDJSON, 1, {'x': 1}))
|
|
626
|
+
# removed from pending
|
|
627
|
+
self.assertNotIn(1, d._pending)
|
|
628
|
+
# CTL_DISCONNECT sent to worker's control queue
|
|
629
|
+
msg = pool._control_queues[0].get(timeout=1)
|
|
630
|
+
self.assertEqual(msg, (CTL_DISCONNECT, 1))
|
|
631
|
+
pool.shutdown(timeout=2)
|
|
632
|
+
|
|
633
|
+
def test_ndjson_ignored_after_close(self):
|
|
634
|
+
d, pool = self._make_dispatcher()
|
|
635
|
+
# no pending request with id 99
|
|
636
|
+
d._process_response((MSG_NDJSON, 99, {'x': 1}))
|
|
637
|
+
# should not raise, just ignore
|
|
638
|
+
|
|
550
639
|
def test_sse_event_ignored_after_close(self):
|
|
551
640
|
d, pool = self._make_dispatcher()
|
|
552
641
|
# no pending request with id 99
|
|
@@ -727,5 +816,207 @@ class TestDispatcherPendingRemoved(unittest.TestCase):
|
|
|
727
816
|
self.assertTrue(client.responded)
|
|
728
817
|
|
|
729
818
|
|
|
819
|
+
class TestDispatcherWorkerDied(unittest.TestCase):
|
|
820
|
+
"""Tests for on_worker_died hook and victim handling."""
|
|
821
|
+
|
|
822
|
+
def _make_dispatcher(self, dispatcher_cls=Dispatcher):
|
|
823
|
+
# queue_warning=0 disables the queue-size check (which would
|
|
824
|
+
# otherwise touch pool.pending_count → request_queue.qsize()).
|
|
825
|
+
pool = WorkerPool(
|
|
826
|
+
DummyWorker, routes=['/api/**'], queue_warning=0)
|
|
827
|
+
# Mock check_workers so we control what it returns without
|
|
828
|
+
# actually starting processes.
|
|
829
|
+
pool._fake_restarted = []
|
|
830
|
+
pool.check_workers = lambda: pool._fake_restarted
|
|
831
|
+
d = dispatcher_cls.__new__(dispatcher_cls)
|
|
832
|
+
d._sync_routes = []
|
|
833
|
+
d._static_routes = {}
|
|
834
|
+
d._pools = [pool]
|
|
835
|
+
d._pending = {}
|
|
836
|
+
d._max_pending = 1000
|
|
837
|
+
d._next_request_id = 0
|
|
838
|
+
d._response_queue = mp.Queue()
|
|
839
|
+
d._log_is_tty = False
|
|
840
|
+
d.log_calls = []
|
|
841
|
+
d.on_log = lambda name, level, msg: d.log_calls.append(
|
|
842
|
+
(name, level, msg))
|
|
843
|
+
d.recorded_removed = []
|
|
844
|
+
return d, pool
|
|
845
|
+
|
|
846
|
+
def test_single_victim_gets_500(self):
|
|
847
|
+
|
|
848
|
+
class RecordingDispatcher(Dispatcher):
|
|
849
|
+
def on_pending_removed(self, request_id, pending, reason):
|
|
850
|
+
self.recorded_removed.append((request_id, reason))
|
|
851
|
+
|
|
852
|
+
d, pool = self._make_dispatcher(RecordingDispatcher)
|
|
853
|
+
client = MockClient(
|
|
854
|
+
'POST', '/api/scan', body=b'\x00\x01bad', address='10.0.0.7')
|
|
855
|
+
pending = _PendingRequest(client, pool)
|
|
856
|
+
pending.worker_id = 0
|
|
857
|
+
d._pending[42] = pending
|
|
858
|
+
pool._fake_restarted = [(0, 'died exit=-11', -11)]
|
|
859
|
+
d._check_all_workers()
|
|
860
|
+
# client got 500
|
|
861
|
+
self.assertTrue(client.responded)
|
|
862
|
+
self.assertEqual(client.response_status, 500)
|
|
863
|
+
self.assertEqual(client.response_data['error'], 'Worker crashed')
|
|
864
|
+
self.assertIn('exit=-11', client.response_data['reason'])
|
|
865
|
+
# removed from pending + hook fired
|
|
866
|
+
self.assertNotIn(42, d._pending)
|
|
867
|
+
self.assertEqual(
|
|
868
|
+
d.recorded_removed, [(42, PENDING_WORKER_DIED)])
|
|
869
|
+
|
|
870
|
+
def test_multiple_victims_all_handled(self):
|
|
871
|
+
d, pool = self._make_dispatcher()
|
|
872
|
+
c1 = MockClient('GET', '/api/a', address='1.1.1.1')
|
|
873
|
+
c2 = MockClient('GET', '/api/b', address='2.2.2.2')
|
|
874
|
+
c3 = MockClient('GET', '/api/c', address='3.3.3.3')
|
|
875
|
+
for rid, c in [(1, c1), (2, c2), (3, c3)]:
|
|
876
|
+
p = _PendingRequest(c, pool)
|
|
877
|
+
p.worker_id = 0
|
|
878
|
+
d._pending[rid] = p
|
|
879
|
+
pool._fake_restarted = [(0, 'stuck', None)]
|
|
880
|
+
d._check_all_workers()
|
|
881
|
+
for c in (c1, c2, c3):
|
|
882
|
+
self.assertTrue(c.responded)
|
|
883
|
+
self.assertEqual(c.response_status, 500)
|
|
884
|
+
self.assertEqual(d._pending, {})
|
|
885
|
+
|
|
886
|
+
def test_streaming_victim_gets_stream_end(self):
|
|
887
|
+
d, pool = self._make_dispatcher()
|
|
888
|
+
client = MockClient('GET', '/api/events')
|
|
889
|
+
pending = _PendingRequest(client, pool)
|
|
890
|
+
pending.worker_id = 0
|
|
891
|
+
pending.streaming = True
|
|
892
|
+
d._pending[1] = pending
|
|
893
|
+
pool._fake_restarted = [(0, 'died exit=-9', -9)]
|
|
894
|
+
d._check_all_workers()
|
|
895
|
+
# stream ended, NOT respond()
|
|
896
|
+
self.assertTrue(getattr(client, 'stream_ended', False))
|
|
897
|
+
self.assertFalse(client.responded)
|
|
898
|
+
self.assertNotIn(1, d._pending)
|
|
899
|
+
|
|
900
|
+
def test_queued_request_not_a_victim(self):
|
|
901
|
+
"""Request with worker_id=None is still in queue — not a victim."""
|
|
902
|
+
d, pool = self._make_dispatcher()
|
|
903
|
+
# request belonging to dying worker
|
|
904
|
+
in_flight = MockClient('GET', '/api/active')
|
|
905
|
+
p1 = _PendingRequest(in_flight, pool)
|
|
906
|
+
p1.worker_id = 0
|
|
907
|
+
d._pending[1] = p1
|
|
908
|
+
# request still in queue, no worker claimed it
|
|
909
|
+
queued = MockClient('GET', '/api/queued')
|
|
910
|
+
p2 = _PendingRequest(queued, pool)
|
|
911
|
+
# p2.worker_id stays None
|
|
912
|
+
d._pending[2] = p2
|
|
913
|
+
pool._fake_restarted = [(0, 'died exit=-11', -11)]
|
|
914
|
+
d._check_all_workers()
|
|
915
|
+
# in-flight responded
|
|
916
|
+
self.assertTrue(in_flight.responded)
|
|
917
|
+
self.assertNotIn(1, d._pending)
|
|
918
|
+
# queued untouched
|
|
919
|
+
self.assertFalse(queued.responded)
|
|
920
|
+
self.assertIn(2, d._pending)
|
|
921
|
+
|
|
922
|
+
def test_other_worker_not_affected(self):
|
|
923
|
+
"""Only victims of THIS worker are handled; other workers stay."""
|
|
924
|
+
d, pool = self._make_dispatcher()
|
|
925
|
+
c1 = MockClient('GET', '/api/a')
|
|
926
|
+
c2 = MockClient('GET', '/api/b')
|
|
927
|
+
p1 = _PendingRequest(c1, pool)
|
|
928
|
+
p1.worker_id = 0
|
|
929
|
+
p2 = _PendingRequest(c2, pool)
|
|
930
|
+
p2.worker_id = 1
|
|
931
|
+
d._pending[1] = p1
|
|
932
|
+
d._pending[2] = p2
|
|
933
|
+
pool._fake_restarted = [(0, 'died exit=-11', -11)]
|
|
934
|
+
d._check_all_workers()
|
|
935
|
+
self.assertTrue(c1.responded)
|
|
936
|
+
self.assertNotIn(1, d._pending)
|
|
937
|
+
self.assertFalse(c2.responded)
|
|
938
|
+
self.assertIn(2, d._pending)
|
|
939
|
+
|
|
940
|
+
def test_late_response_after_victim_cleanup_dropped(self):
|
|
941
|
+
"""MSG_RESPONSE from dead worker arriving after victim removal is dropped."""
|
|
942
|
+
d, pool = self._make_dispatcher()
|
|
943
|
+
client = MockClient('GET', '/api/test')
|
|
944
|
+
pending = _PendingRequest(client, pool)
|
|
945
|
+
pending.worker_id = 0
|
|
946
|
+
d._pending[1] = pending
|
|
947
|
+
pool._fake_restarted = [(0, 'died exit=-11', -11)]
|
|
948
|
+
d._check_all_workers()
|
|
949
|
+
# request already gone; client already got 500
|
|
950
|
+
self.assertEqual(client.response_status, 500)
|
|
951
|
+
# late response from before-death — must not break or double-respond
|
|
952
|
+
client.response_status = None
|
|
953
|
+
late = Response(request_id=1, data={'ok': True}, status=200)
|
|
954
|
+
d._process_response((MSG_RESPONSE, 1, late))
|
|
955
|
+
# silently dropped
|
|
956
|
+
self.assertIsNone(client.response_status)
|
|
957
|
+
|
|
958
|
+
def test_no_victims_just_logs(self):
|
|
959
|
+
"""Worker died while idle — restarted but no pending requests."""
|
|
960
|
+
d, pool = self._make_dispatcher()
|
|
961
|
+
pool._fake_restarted = [(0, 'died exit=0', 0)]
|
|
962
|
+
d._check_all_workers()
|
|
963
|
+
# no crash, no pending changes
|
|
964
|
+
self.assertEqual(d._pending, {})
|
|
965
|
+
# should have logged
|
|
966
|
+
error_logs = [
|
|
967
|
+
msg for _, level, msg in d.log_calls if level == LOG_ERROR]
|
|
968
|
+
self.assertEqual(len(error_logs), 1)
|
|
969
|
+
self.assertIn('victims=0', error_logs[0])
|
|
970
|
+
|
|
971
|
+
def test_override_can_persist_payload(self):
|
|
972
|
+
"""User override can capture victim payload before super() responds."""
|
|
973
|
+
captured = []
|
|
974
|
+
|
|
975
|
+
class ForensicDispatcher(Dispatcher):
|
|
976
|
+
def on_worker_died(
|
|
977
|
+
self, pool, worker_id, reason, exitcode, victims):
|
|
978
|
+
for rid, pending in victims:
|
|
979
|
+
captured.append({
|
|
980
|
+
'rid': rid,
|
|
981
|
+
'address': pending.client.address,
|
|
982
|
+
'body': pending.client.body,
|
|
983
|
+
'reason': reason,
|
|
984
|
+
'exitcode': exitcode})
|
|
985
|
+
super().on_worker_died(
|
|
986
|
+
pool, worker_id, reason, exitcode, victims)
|
|
987
|
+
|
|
988
|
+
d, pool = self._make_dispatcher(ForensicDispatcher)
|
|
989
|
+
client = MockClient(
|
|
990
|
+
'POST', '/api/process',
|
|
991
|
+
body=b'\xff\xfecorrupted', address='9.9.9.9')
|
|
992
|
+
pending = _PendingRequest(client, pool)
|
|
993
|
+
pending.worker_id = 0
|
|
994
|
+
d._pending[7] = pending
|
|
995
|
+
pool._fake_restarted = [(0, 'died exit=-11', -11)]
|
|
996
|
+
d._check_all_workers()
|
|
997
|
+
self.assertEqual(len(captured), 1)
|
|
998
|
+
self.assertEqual(captured[0]['address'], '9.9.9.9')
|
|
999
|
+
self.assertEqual(captured[0]['body'], b'\xff\xfecorrupted')
|
|
1000
|
+
self.assertEqual(captured[0]['exitcode'], -11)
|
|
1001
|
+
# super() still ran
|
|
1002
|
+
self.assertEqual(client.response_status, 500)
|
|
1003
|
+
|
|
1004
|
+
def test_hook_exception_does_not_crash_dispatcher(self):
|
|
1005
|
+
|
|
1006
|
+
class BrokenDispatcher(Dispatcher):
|
|
1007
|
+
def on_worker_died(self, *args, **kwargs):
|
|
1008
|
+
raise RuntimeError('boom')
|
|
1009
|
+
|
|
1010
|
+
d, pool = self._make_dispatcher(BrokenDispatcher)
|
|
1011
|
+
pool._fake_restarted = [(0, 'died exit=-11', -11)]
|
|
1012
|
+
# must not propagate
|
|
1013
|
+
d._check_all_workers()
|
|
1014
|
+
error_logs = [
|
|
1015
|
+
msg for _, level, msg in d.log_calls if level == LOG_ERROR]
|
|
1016
|
+
# one error log about the hook failure
|
|
1017
|
+
self.assertTrue(any('on_worker_died' in m for m in error_logs))
|
|
1018
|
+
self.assertTrue(any('boom' in m for m in error_logs))
|
|
1019
|
+
|
|
1020
|
+
|
|
730
1021
|
if __name__ == '__main__':
|
|
731
1022
|
unittest.main()
|
|
@@ -8,7 +8,7 @@ from uhttp.workers import (
|
|
|
8
8
|
Worker, Request, Response, api, RejectRequest, DEFERRED,
|
|
9
9
|
Logger, LOG_DEBUG, LOG_INFO, LOG_WARNING, LOG_ERROR,
|
|
10
10
|
MSG_RESPONSE, MSG_HEARTBEAT,
|
|
11
|
-
MSG_SSE_OPEN, MSG_SSE_EVENT, MSG_SSE_CLOSE,
|
|
11
|
+
MSG_SSE_OPEN, MSG_SSE_EVENT, MSG_SSE_CLOSE, MSG_NDJSON,
|
|
12
12
|
CTL_DISCONNECT)
|
|
13
13
|
|
|
14
14
|
|
|
@@ -520,6 +520,38 @@ class TestSSERequest(unittest.TestCase):
|
|
|
520
520
|
self.assertIsNone(msg[4])
|
|
521
521
|
self.assertIsNone(msg[5])
|
|
522
522
|
|
|
523
|
+
def test_response_ndjson(self):
|
|
524
|
+
self.req.response_ndjson(
|
|
525
|
+
headers={'X-Custom': '1'},
|
|
526
|
+
cookies={'sid': 'abc'})
|
|
527
|
+
msg = self.queue.get(timeout=1)
|
|
528
|
+
self.assertEqual(msg[0], MSG_SSE_OPEN)
|
|
529
|
+
self.assertEqual(msg[1], 10)
|
|
530
|
+
self.assertEqual(msg[2], 'application/x-ndjson')
|
|
531
|
+
self.assertEqual(msg[3], {'X-Custom': '1'})
|
|
532
|
+
self.assertEqual(msg[4], {'sid': 'abc'})
|
|
533
|
+
|
|
534
|
+
def test_response_ndjson_defaults(self):
|
|
535
|
+
self.req.response_ndjson()
|
|
536
|
+
msg = self.queue.get(timeout=1)
|
|
537
|
+
self.assertEqual(msg[0], MSG_SSE_OPEN)
|
|
538
|
+
self.assertEqual(msg[2], 'application/x-ndjson')
|
|
539
|
+
self.assertIsNone(msg[3])
|
|
540
|
+
self.assertIsNone(msg[4])
|
|
541
|
+
|
|
542
|
+
def test_send_ndjson(self):
|
|
543
|
+
self.req.send_ndjson({'devices': [1, 2, 3]})
|
|
544
|
+
msg = self.queue.get(timeout=1)
|
|
545
|
+
self.assertEqual(msg[0], MSG_NDJSON)
|
|
546
|
+
self.assertEqual(msg[1], 10)
|
|
547
|
+
self.assertEqual(msg[2], {'devices': [1, 2, 3]})
|
|
548
|
+
|
|
549
|
+
def test_send_ndjson_keepalive(self):
|
|
550
|
+
self.req.send_ndjson({})
|
|
551
|
+
msg = self.queue.get(timeout=1)
|
|
552
|
+
self.assertEqual(msg[0], MSG_NDJSON)
|
|
553
|
+
self.assertEqual(msg[2], {})
|
|
554
|
+
|
|
523
555
|
def test_response_stream_end(self):
|
|
524
556
|
self.req.response_stream_end()
|
|
525
557
|
msg = self.queue.get(timeout=1)
|
|
@@ -120,6 +120,27 @@ class TestWorkerPoolStatus(unittest.TestCase):
         pool = WorkerPool(DummyWorker)
         self.assertFalse(pool.is_degraded)
 
+    def test_alive_count_empty(self):
+        pool = WorkerPool(DummyWorker, num_workers=2)
+        # not started yet
+        self.assertEqual(pool.alive_count, 0)
+
+    def test_alive_count_running(self):
+        pool = WorkerPool(DummyWorker, num_workers=2)
+        response_queue = mp.Queue()
+        pool.start(response_queue)
+        time.sleep(0.2)
+        self.assertEqual(pool.alive_count, 2)
+        pool.shutdown(timeout=3)
+
+    def test_alive_count_in_status(self):
+        pool = WorkerPool(DummyWorker, num_workers=2)
+        response_queue = mp.Queue()
+        pool.start(response_queue)
+        time.sleep(0.2)
+        self.assertEqual(pool.status()['alive_count'], 2)
+        pool.shutdown(timeout=3)
+
 
 class TestWorkerPoolCheckWorkers(unittest.TestCase):
 
@@ -23,6 +23,7 @@ MSG_LOG = 'LOG'
 MSG_SSE_OPEN = 'SSE_OPEN'
 MSG_SSE_EVENT = 'SSE_EVENT'
 MSG_SSE_CLOSE = 'SSE_CLOSE'
+MSG_NDJSON = 'NDJSON'
 
 # Worker control messages
 CTL_STOP = 'STOP'
@@ -35,6 +36,7 @@ PENDING_TIMEOUT = 'TIMEOUT'
 PENDING_DISCONNECTED = 'DISCONNECTED'
 PENDING_STREAM_CLOSED = 'STREAM_CLOSED'
 PENDING_SHUTDOWN = 'SHUTDOWN'
+PENDING_WORKER_DIED = 'WORKER_DIED'
 
 # Sentinel for deferred response
 DEFERRED = object()
@@ -282,6 +284,25 @@ class Request:
             (MSG_SSE_EVENT, self.request_id,
              data, event, event_id, retry))
 
+    def response_ndjson(self, headers=None, cookies=None):
+        """Start NDJSON streaming response (application/x-ndjson).
+
+        Thin wrapper over response_stream(). Use with DEFERRED — call from
+        handler, then send_ndjson() later. Call response_stream_end() to finish.
+        """
+        self._response_queue.put(
+            (MSG_SSE_OPEN, self.request_id,
+             'application/x-ndjson', headers, cookies))
+
+    def send_ndjson(self, obj):
+        """Send one JSON-serializable object as an NDJSON line.
+
+        Args:
+            obj: any JSON-serializable value (dict/list/str/int/float/bool/None)
+        """
+        self._response_queue.put(
+            (MSG_NDJSON, self.request_id, obj))
+
     def response_stream_end(self):
         """End streaming response and close connection."""
         self._response_queue.put(
@@ -859,7 +880,10 @@ class WorkerPool:
         """Check worker health, restart dead or stuck workers.
 
         Returns:
-            List of (worker_id, reason) tuples for restarted
+            List of (worker_id, reason, exitcode) tuples for restarted
+            workers. exitcode is None for stuck workers (dispatcher killed
+            them), otherwise the process exit code (negative = signal:
+            -9 OOM, -11 SIGSEGV, -15 SIGTERM, etc.).
         """
         restarted = []
         now = _time.time()
@@ -869,8 +893,10 @@ class WorkerPool:
             if now - t < self.restart_window]
         for i, worker in enumerate(self.workers):
             reason = None
+            exitcode = None
             if not worker.is_alive():
-
+                exitcode = worker.exitcode
+                reason = f"died exit={exitcode}"
             elif now - self._last_seen.get(i, 0) > self.stuck_timeout:
                 reason = "stuck"
                 worker.kill()
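The `exitcode` captured in the hunk above follows the `multiprocessing.Process.exitcode` convention: a negative value `-N` means the process was terminated by signal `N`. A minimal sketch of turning it into a readable label is shown below. The helper name `describe_exitcode` is hypothetical and not part of uhttp-workers. Note that `-9` is `SIGKILL`, which is what the Linux OOM killer sends, so the "OOM" label in the docstring is an interpretation rather than a distinct signal.

```python
import signal


def describe_exitcode(exitcode):
    """Return a human-readable label for a worker exit code.

    Hypothetical helper for log messages; not part of uhttp-workers.
    """
    if exitcode is None:
        # check_workers() reports None when the dispatcher killed a stuck worker.
        return 'stuck (killed by dispatcher)'
    if exitcode < 0:
        try:
            # Negative exit codes encode the terminating signal number.
            return f'killed by {signal.Signals(-exitcode).name}'
        except ValueError:
            # Signal number unknown on this platform.
            return f'killed by signal {-exitcode}'
    return f'exited with status {exitcode}'


if __name__ == '__main__':
    for code in (None, -15, -11, 0, 1):
        print(code, '->', describe_exitcode(code))
```

A label like this could be added to the log line in an `on_worker_died()` override alongside the raw `reason` string.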
@@ -884,7 +910,7 @@ class WorkerPool:
                 if len(self._restart_times) >= self.max_restarts:
                     self._degraded = True
                 self._start_worker(i)
-                restarted.append((i, reason))
+                restarted.append((i, reason, exitcode))
         return restarted
 
     def matches(self, path):
@@ -939,6 +965,11 @@ class WorkerPool:
     def is_degraded(self):
         return self._degraded
 
+    @property
+    def alive_count(self):
+        """Number of worker processes currently alive."""
+        return sum(1 for w in self.workers if w.is_alive())
+
     @property
     def pending_count(self):
         try:
@@ -956,6 +987,7 @@ class WorkerPool:
         return {
             'name': self.name,
             'degraded': self._degraded,
+            'alive_count': self.alive_count,
             'queue_size': self.pending_count,
             'workers': [
                 {
@@ -1107,6 +1139,9 @@ class Dispatcher:
                                    notified via control queue (race possible).
             PENDING_STREAM_CLOSED - worker ended the SSE stream cleanly.
             PENDING_SHUTDOWN - dispatcher is shutting down; client got 503.
+            PENDING_WORKER_DIED - worker process died/was killed while owning
+                                  this request; client got 500. on_worker_died()
+                                  runs first.
 
         Args:
             request_id: The request id being removed.
@@ -1181,6 +1216,11 @@ class Dispatcher:
             client.respond(
                 {'error': 'Service unavailable'}, status=503)
             return
+        if pool.alive_count == 0:
+            client.respond(
+                {'error': 'No workers available'}, status=503,
+                headers={'Retry-After': '1'})
+            return
         if len(self._pending) >= self._max_pending:
             client.respond(
                 {'error': 'Too many requests'}, status=503)
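The three rejection branches visible in this hunk can be summarized as a small pure function. This is only a sketch under stated assumptions: the function name `admission_response` and its parameters are hypothetical, and the condition guarding the first branch is not shown in the diff. It is assumed here to be the pool's degraded flag, based on the README text later in this diff.

```python
def admission_response(degraded, alive_count, pending, max_pending):
    """Return (status, body, headers) for a rejected request, or None to accept.

    Hypothetical model of the checks in the hunk above; the real dispatcher
    calls client.respond() directly instead of returning a tuple.
    """
    if degraded:
        # Assumed condition: pool exceeded max_restarts within restart_window.
        return 503, {'error': 'Service unavailable'}, {}
    if alive_count == 0:
        # New in 1.6.0: no live worker right now, so hint the client to retry.
        return 503, {'error': 'No workers available'}, {'Retry-After': '1'}
    if pending >= max_pending:
        return 503, {'error': 'Too many requests'}, {}
    return None


if __name__ == '__main__':
    print(admission_response(False, 0, 3, 100))
    print(admission_response(False, 2, 3, 100))
```

Keeping the zero-workers check ahead of the pending-limit check means a client sees the `Retry-After` hint even when the pending table is also full.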
@@ -1253,6 +1293,13 @@ class Dispatcher:
                     event_id=event_id, retry=retry)
                 if not ok:
                     self._stream_disconnected(request_id, pending)
+        elif msg_type == MSG_NDJSON:
+            _, request_id, obj = msg
+            pending = self._pending.get(request_id)
+            if pending is not None:
+                ok = pending.client.send_ndjson(obj)
+                if not ok:
+                    self._stream_disconnected(request_id, pending)
         elif msg_type == MSG_SSE_CLOSE:
             _, request_id = msg
             pending = self._pending.pop(request_id, None)
@@ -1326,8 +1373,18 @@ class Dispatcher:
         """Check health of all worker pools and queue sizes."""
         for pool in self._pools:
             restarted = pool.check_workers()
-            for worker_id, reason in restarted:
-
+            for worker_id, reason, exitcode in restarted:
+                victims = [
+                    (rid, p) for rid, p in self._pending.items()
+                    if p.pool is pool and p.worker_id == worker_id]
+                try:
+                    self.on_worker_died(
+                        pool, worker_id, reason, exitcode, victims)
+                except Exception:
+                    self.on_log(
+                        pool.name, LOG_ERROR,
+                        f"on_worker_died() failed:\n"
+                        f"{_traceback.format_exc()}")
             if pool.queue_warning:
                 qsize = pool.pending_count
                 if qsize >= pool.queue_warning:
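The victim selection in this hunk is a filter over the pending table: a request counts as a victim only if it belongs to the same pool object and was claimed by the restarted worker. The sketch below reproduces that filter on stand-in data. `PendingStub` and `select_victims` are hypothetical names used for illustration; the real pending record carries more fields (such as `client` and `streaming`).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PendingStub:
    """Hypothetical stand-in for the dispatcher's pending-request record."""
    pool: object
    worker_id: Optional[int] = None  # None while the request is still queued


def select_victims(pending, pool, worker_id):
    """Return (request_id, record) pairs owned by the given worker of the pool."""
    return [
        (rid, p) for rid, p in pending.items()
        if p.pool is pool and p.worker_id == worker_id]


if __name__ == '__main__':
    pool_a, pool_b = object(), object()
    pending = {
        1: PendingStub(pool_a, 0),     # claimed by worker 0 of pool A: victim
        2: PendingStub(pool_a, None),  # still queued: not a victim
        3: PendingStub(pool_a, 1),     # claimed by another worker
        4: PendingStub(pool_b, 0),     # same worker index, different pool
    }
    print([rid for rid, _ in select_victims(pending, pool_a, 0)])
```

Because the list is built before `on_worker_died()` runs, the hook can delete entries from the pending table while iterating over `victims` without mutating the dictionary mid-iteration.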
@@ -1363,14 +1420,57 @@ class Dispatcher:
         print(f"{prefix}{level_name:8s} {name:20s} {message}",
             file=_sys.stderr)
 
-    def
-        """Called when a worker
+    def on_worker_died(self, pool, worker_id, reason, exitcode, victims):
+        """Called when a worker process died or was killed by the dispatcher.
 
-        Default
+        Default behavior:
+          1. Log restart reason + each victim (request id, client address,
+             method, path, body size).
+          2. Respond 500 to every victim's client (or response_stream_end()
+             for streams), remove them from _pending, and fire
+             on_pending_removed(PENDING_WORKER_DIED) for each.
+
+        Override to capture victim payloads (e.g., persist to disk for
+        post-mortem) BEFORE calling super(). pending.client gives access
+        to method, path, headers, body, address.
+
+        Args:
+            pool: WorkerPool the worker belonged to.
+            worker_id: Index of the restarted worker.
+            reason: 'stuck' or 'died exit=N' (string from check_workers).
+            exitcode: Process exit code (int) or None for stuck workers.
+                Negative values are signals: -9 OOM, -11 SIGSEGV, etc.
+            victims: List of (request_id, _PendingRequest) tuples — requests
+                this worker had claimed (via MSG_HEARTBEAT) but never
+                completed. May be empty if worker died while idle.
         """
         self.on_log(
             f'{pool.name}[{worker_id}]', LOG_ERROR,
-            f"worker restarted: {reason}"
+            f"worker restarted: {reason}, "
+            f"victims={len(victims)}")
+        for request_id, pending in victims:
+            c = pending.client
+            body_len = len(c.body) if c.body is not None else 0
+            self.on_log(
+                pool.name, LOG_ERROR,
+                f" victim rid={request_id} from={c.address} "
+                f"{c.method} {c.path} body={body_len}B")
+            del self._pending[request_id]
+            if pending.streaming:
+                try:
+                    pending.client.response_stream_end()
+                except Exception:
+                    pass
+            else:
+                try:
+                    pending.client.respond(
+                        {'error': 'Worker crashed',
+                         'reason': reason},
+                        status=500)
+                except Exception:
+                    pass
+            self._notify_pending_removed(
+                request_id, pending, PENDING_WORKER_DIED)
 
     def _sigterm(self, _signo, _stack_frame):
         self._running = False
@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: uhttp-workers
-Version: 1.
+Version: 1.6.0
 Summary: Multi-process worker dispatcher built on uhttp-server
 Author-email: Pavel Revak <pavelrevak@gmail.com>
 License-Expression: MIT
@@ -10,7 +10,7 @@ Classifier: Programming Language :: Python :: 3
 Classifier: Operating System :: POSIX
 Requires-Python: >=3.10
 Description-Content-Type: text/markdown
-Requires-Dist: uhttp-server
+Requires-Dist: uhttp-server>=2.5.2
 
 # uhttp-workers
 
@@ -384,6 +384,31 @@ Available streaming methods on `Request`:
 
 Streaming requests are excluded from dispatcher timeout expiration. When the client disconnects, the dispatcher notifies the worker via `on_disconnect(request_id)`.
 
+### NDJSON Streaming
+
+Stream JSON objects line-by-line (`application/x-ndjson`) — one JSON value per line, terminated by `\n`. Useful for incremental APIs that aren't event-shaped (long lists, log tails, progress updates):
+
+```python
+class MyWorker(_workers.Worker):
+    @_workers.api('/devices/scan', 'GET')
+    def scan(self, request):
+        request.response_ndjson()
+        for device in self.discover_devices():
+            request.send_ndjson({'id': device.id, 'name': device.name})
+        request.response_stream_end()
+        return _workers.DEFERRED
+```
+
+NDJSON methods on `Request`:
+
+| Method | Description |
+|--------|-------------|
+| `response_ndjson(headers, cookies)` | Start NDJSON stream (wrapper over `response_stream` with `application/x-ndjson`) |
+| `send_ndjson(obj)` | Send one JSON-serializable value as a line |
+| `response_stream_end()` | End stream and close connection (shared with SSE) |
+
+Same lifecycle as SSE: excluded from timeout expiration, client disconnect triggers `on_disconnect(request_id)`.
+
 ### Flow Control
 
 Workers can stop accepting new requests when busy. Requests stay in the shared pool queue for other workers to pick up:
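For readers unfamiliar with the wire format described in this hunk, the sketch below shows NDJSON framing with only the standard library: each value is serialized as compact JSON followed by `\n`, and a consumer splits on newlines and parses each line independently. The function names are hypothetical and this is not how uhttp-server implements `send_ndjson()`; it only illustrates the format a client should expect to receive.

```python
import json


def encode_ndjson_line(obj):
    """Serialize one JSON value as a single NDJSON line (bytes)."""
    # json.dumps escapes control characters, so the output never contains
    # a raw newline and each value stays on exactly one line.
    return (json.dumps(obj, separators=(',', ':')) + '\n').encode('utf-8')


def decode_ndjson(data):
    """Parse a bytes buffer of NDJSON into a list of Python values."""
    return [
        json.loads(line)
        for line in data.decode('utf-8').split('\n')
        if line.strip()]  # skip the empty piece after the final newline


if __name__ == '__main__':
    items = [{'id': 1, 'name': 'sensor-a'}, {'id': 2, 'name': 'sensor-b'}, {}]
    stream = b''.join(encode_ndjson_line(item) for item in items)
    print(stream)
    print(decode_ndjson(stream))
```

The empty object `{}` mirrors the keepalive case exercised in `test_send_ndjson_keepalive`: it is a valid line that a consumer can simply ignore.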
@@ -540,6 +565,7 @@ Reason is one of:
 | `PENDING_DISCONNECTED` | Client disconnected mid-stream; worker was notified via control queue (race possible). |
 | `PENDING_STREAM_CLOSED` | Worker ended the SSE stream cleanly. |
 | `PENDING_SHUTDOWN` | Dispatcher is shutting down; client got 503. |
+| `PENDING_WORKER_DIED` | Worker process died/was killed while owning the request; client got 500. `on_worker_died()` runs first. |
 
 The hook is invoked after the client-facing action (respond / disconnect / control queue put)
 so dispatcher state is finalized when it runs. Exceptions raised by the hook are logged at
@@ -550,6 +576,51 @@ Override `on_pending_removed()` if you need exactly-once cleanup. Overriding bot
 but discouraged — for the `PENDING_COMPLETED` reason, `on_response()` is called immediately
 before `on_pending_removed()`.
 
+## Worker Death Hook
+
+Workers die — segfault in a C extension, OOM-kill, or the dispatcher kills them after
+`stuck_timeout`. Override `on_worker_died()` to capture which requests they had in-flight
+(useful for forensics when a malformed payload reproduces a crash):
+
+```python
+class MyDispatcher(_workers.Dispatcher):
+    def on_worker_died(
+            self, pool, worker_id, reason, exitcode, victims):
+        # `victims` is a list of (request_id, _PendingRequest) for all
+        # requests this worker had claimed but not completed.
+        # `exitcode` is None for stuck workers, otherwise the process exit
+        # code (negative = signal: -9 OOM, -11 SIGSEGV).
+        for rid, pending in victims:
+            c = pending.client
+            self._crash_queue.append({
+                'reason': reason,
+                'exitcode': exitcode,
+                'method': c.method,
+                'path': c.path,
+                'address': c.address,
+                'body': c.body, # raw bytes — replay this to reproduce
+            })
+        # Default impl responds 500 to victims and fires
+        # on_pending_removed(PENDING_WORKER_DIED). Call it after capture.
+        super().on_worker_died(
+            pool, worker_id, reason, exitcode, victims)
+```
+
+What's a victim: any request the worker had claimed via `MSG_HEARTBEAT`
+(`pending.worker_id == worker_id`). Requests still in the queue (`worker_id is None`)
+are **not** victims — other workers in the pool will pick them up after restart.
+
+Default behavior (if you don't override) is to log the death + each victim, respond
+500 (or close the stream for SSE/NDJSON), and fire `on_pending_removed` for each.
+Override only if you want to persist payloads or customize the response status/body.
+
+**500 vs 503:** a victim of a crashed worker gets **500** (processing started, then
+the server failed). A new request arriving while the pool has zero alive workers
+gets **503 + `Retry-After: 1`** (rejected before processing — try again shortly).
+A request to a pool that has exceeded `max_restarts` in `restart_window` gets **503**
+permanently (`pool.is_degraded`). `pool.alive_count` is exposed for monitoring and
+also appears in `pool.status()`.
+
 ## Dispatcher Idle Hook
 
 Override `on_idle()` on the dispatcher for periodic background tasks — called on each `select()` timeout (every `SELECT_TIMEOUT` seconds, default 1s):
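The README example in this hunk stores the raw request body as `bytes`, which `json.dumps` cannot serialize directly. If crash records are to be persisted for post-mortem, one option is to base64-encode the body first. The sketch below assumes records shaped like the dictionaries appended to `_crash_queue` in the example; the function names are hypothetical and not part of uhttp-workers.

```python
import base64
import json


def crash_record_to_json(record):
    """Serialize one crash record to a JSON string, base64-encoding the body."""
    safe = dict(record)  # copy so the caller's record is left untouched
    body = safe.pop('body', None)
    safe['body_b64'] = (
        base64.b64encode(body).decode('ascii') if body is not None else None)
    return json.dumps(safe)


def crash_record_from_json(text):
    """Inverse of crash_record_to_json(); restores the raw body bytes."""
    record = json.loads(text)
    encoded = record.pop('body_b64', None)
    record['body'] = base64.b64decode(encoded) if encoded is not None else None
    return record


if __name__ == '__main__':
    record = {
        'reason': 'died exit=-11',
        'exitcode': -11,
        'method': 'POST',
        'path': '/devices/scan',
        'address': ('127.0.0.1', 50000),
        'body': b'\x00\xffpayload',
    }
    restored = crash_record_from_json(crash_record_to_json(record))
    print(restored['body'] == record['body'])
```

Note that JSON has no tuple type, so an `address` tuple comes back as a list after the round trip.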
@@ -0,0 +1 @@
+uhttp-server>=2.5.2
@@ -1 +0,0 @@
-uhttp-server
File without changes
File without changes
File without changes
File without changes
File without changes
File without changes
File without changes
File without changes
File without changes
File without changes
File without changes
File without changes
File without changes
File without changes
File without changes