@push.rocks/smartproxy 19.6.2 → 19.6.7

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (52)
  1. package/dist_ts/proxies/smart-proxy/connection-manager.d.ts +4 -7
  2. package/dist_ts/proxies/smart-proxy/connection-manager.js +22 -22
  3. package/dist_ts/proxies/smart-proxy/http-proxy-bridge.d.ts +4 -3
  4. package/dist_ts/proxies/smart-proxy/http-proxy-bridge.js +9 -9
  5. package/dist_ts/proxies/smart-proxy/metrics-collector.d.ts +68 -56
  6. package/dist_ts/proxies/smart-proxy/metrics-collector.js +226 -176
  7. package/dist_ts/proxies/smart-proxy/models/interfaces.d.ts +5 -0
  8. package/dist_ts/proxies/smart-proxy/models/metrics-types.d.ts +94 -48
  9. package/dist_ts/proxies/smart-proxy/nftables-manager.d.ts +4 -4
  10. package/dist_ts/proxies/smart-proxy/nftables-manager.js +6 -6
  11. package/dist_ts/proxies/smart-proxy/port-manager.d.ts +4 -7
  12. package/dist_ts/proxies/smart-proxy/port-manager.js +6 -9
  13. package/dist_ts/proxies/smart-proxy/route-connection-handler.d.ts +4 -15
  14. package/dist_ts/proxies/smart-proxy/route-connection-handler.js +128 -128
  15. package/dist_ts/proxies/smart-proxy/security-manager.d.ts +3 -3
  16. package/dist_ts/proxies/smart-proxy/security-manager.js +9 -9
  17. package/dist_ts/proxies/smart-proxy/smart-proxy.d.ts +20 -13
  18. package/dist_ts/proxies/smart-proxy/smart-proxy.js +16 -13
  19. package/dist_ts/proxies/smart-proxy/throughput-tracker.d.ts +36 -0
  20. package/dist_ts/proxies/smart-proxy/throughput-tracker.js +117 -0
  21. package/dist_ts/proxies/smart-proxy/timeout-manager.d.ts +4 -3
  22. package/dist_ts/proxies/smart-proxy/timeout-manager.js +16 -16
  23. package/dist_ts/proxies/smart-proxy/tls-manager.d.ts +3 -3
  24. package/dist_ts/proxies/smart-proxy/tls-manager.js +12 -12
  25. package/package.json +8 -17
  26. package/readme.hints.md +0 -897
  27. package/readme.md +960 -54
  28. package/readme.plan.md +301 -562
  29. package/ts/proxies/smart-proxy/connection-manager.ts +23 -21
  30. package/ts/proxies/smart-proxy/http-proxy-bridge.ts +9 -8
  31. package/ts/proxies/smart-proxy/metrics-collector.ts +277 -189
  32. package/ts/proxies/smart-proxy/models/interfaces.ts +7 -0
  33. package/ts/proxies/smart-proxy/models/metrics-types.ts +93 -41
  34. package/ts/proxies/smart-proxy/nftables-manager.ts +5 -5
  35. package/ts/proxies/smart-proxy/port-manager.ts +6 -14
  36. package/ts/proxies/smart-proxy/route-connection-handler.ts +136 -136
  37. package/ts/proxies/smart-proxy/security-manager.ts +8 -8
  38. package/ts/proxies/smart-proxy/smart-proxy.ts +26 -35
  39. package/ts/proxies/smart-proxy/throughput-tracker.ts +144 -0
  40. package/ts/proxies/smart-proxy/timeout-manager.ts +16 -15
  41. package/ts/proxies/smart-proxy/tls-manager.ts +11 -11
  42. package/readme.connections.md +0 -724
  43. package/readme.delete.md +0 -187
  44. package/readme.memory-leaks-fixed.md +0 -45
  45. package/readme.metrics.md +0 -591
  46. package/readme.monitoring.md +0 -202
  47. package/readme.proxy-chain-summary.md +0 -112
  48. package/readme.proxy-protocol-example.md +0 -462
  49. package/readme.proxy-protocol.md +0 -415
  50. package/readme.routing.md +0 -341
  51. package/readme.websocket-keepalive-config.md +0 -140
  52. package/readme.websocket-keepalive-fix.md +0 -63
@@ -1,724 +0,0 @@
# Connection Management in SmartProxy

This document describes connection handling, cleanup mechanisms, and known issues in SmartProxy, with a particular focus on proxy chain configurations.

## Connection Accumulation Investigation (January 2025)

### Problem Statement
Connections may accumulate on the outer proxy in proxy chain configurations, despite the fixes implemented so far.

### Historical Context
- **v19.5.12-v19.5.15**: Major connection cleanup improvements
- **v19.5.19+**: PROXY protocol support with WrappedSocket implementation
- **v19.5.20**: Fixed race condition in immediate routing cleanup

### Current Architecture

#### Connection Flow in Proxy Chains
```
Client → Outer Proxy (8001) → Inner Proxy (8002) → Backend (httpbin.org:443)
```

1. **Outer Proxy**:
   - Accepts the client connection
   - Sends a PROXY protocol header to the inner proxy
   - Tracks the connection in ConnectionManager
   - Routes immediately for non-TLS ports

2. **Inner Proxy**:
   - Parses the PROXY protocol header to get the real client IP
   - Establishes a connection to the backend
   - Tracks its own connections separately

### Potential Causes of Connection Accumulation

#### 1. Race Condition in Immediate Routing
When a connection is immediately routed (non-TLS ports), there is a timing window:

```typescript
// route-connection-handler.ts, line ~231
this.routeConnection(socket, record, '', undefined);
// Connection is routed before all setup is complete
```

**Issue**: If the client disconnects during backend connection setup, cleanup may not trigger properly.

#### 2. Outgoing Socket Assignment Timing
Despite the fix in v19.5.20:

```typescript
// Line 1362 in setupDirectConnection
record.outgoing = targetSocket;
```

there is still a window between socket creation and the `connect` event in which cleanup might miss the outgoing socket.

#### 3. Batch Cleanup Delays
ConnectionManager uses queued cleanup:
- Batch size: 100 connections
- Batch interval: 100ms
- Under rapid connection/disconnection, the queue may lag
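The lag described above can be illustrated with a minimal sketch of a batched cleanup queue. All names here are hypothetical; SmartProxy's real ConnectionManager is more involved:

```typescript
// Illustrative sketch of a batched cleanup queue (batch size 100, one batch
// per 100ms timer tick). Hypothetical names, not SmartProxy's actual internals.
class BatchCleanupQueue {
  private queue = new Set<string>();
  public processed: string[] = [];
  constructor(private batchSize = 100) {}

  enqueue(connectionId: string): void {
    this.queue.add(connectionId);
  }

  // One timer tick: drain at most `batchSize` entries, leave the rest queued.
  processBatch(): number {
    const batch = Array.from(this.queue).slice(0, this.batchSize);
    for (const id of batch) {
      this.queue.delete(id); // remove only what this batch processes
      this.processed.push(id);
    }
    return this.queue.size; // entries still waiting for the next tick
  }
}

// Under rapid churn, 250 queued connections need three ticks (~300ms) to
// drain, so the tracked-connection count can briefly exceed live sockets.
const q = new BatchCleanupQueue();
for (let i = 0; i < 250; i++) q.enqueue(`conn-${i}`);
q.processBatch(); // 150 still queued
q.processBatch(); // 50 still queued
q.processBatch(); // queue empty
```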

#### 4. Different Cleanup Paths
Multiple cleanup triggers exist:
- Socket 'close' event
- Socket 'error' event
- Inactivity timeout
- Connection timeout
- Manual cleanup

Not all of these paths may handle proxy chain scenarios properly.

#### 5. Keep-Alive Connection Handling
Keep-alive connections receive special treatment:
- Extended inactivity timeout (6x normal)
- Warning before closure
- May accumulate if the backend is unresponsive
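The "6x normal" extension can be captured as a tiny helper. The function name and multiplier parameter are illustrative, not SmartProxy API:

```typescript
// Hypothetical helper mirroring the extended keep-alive inactivity timeout
// described above (6x the normal timeout for keep-alive connections).
function effectiveInactivityTimeout(
  baseMs: number,
  isKeepAlive: boolean,
  keepAliveMultiplier = 6
): number {
  return isKeepAlive ? baseMs * keepAliveMultiplier : baseMs;
}
```

With a 30s base timeout, a keep-alive connection would only be considered inactive after 3 minutes, which is why an unresponsive backend can hold such connections open far longer.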
74
-
75
- ### Observed Symptoms
76
-
77
- 1. **Outer proxy connection count grows over time**
78
- 2. **Inner proxy maintains zero or low connection count**
79
- 3. **Connections show as closed in logs but remain in tracking**
80
- 4. **Memory usage gradually increases**
81
-
82
- ### Debug Strategies
83
-
84
- #### 1. Enhanced Logging
85
- Add connection state logging at key points:
86
- ```typescript
87
- // When outgoing socket is created
88
- logger.log('debug', `Outgoing socket created for ${connectionId}`, {
89
- hasOutgoing: !!record.outgoing,
90
- outgoingState: record.outgoing?.readyState
91
- });
92
- ```
93
-
94
- #### 2. Connection State Inspection
95
- Periodically log detailed connection state:
96
- ```typescript
97
- for (const [id, record] of connectionManager.getConnections()) {
98
- console.log({
99
- id,
100
- age: Date.now() - record.incomingStartTime,
101
- incomingDestroyed: record.incoming.destroyed,
102
- outgoingDestroyed: record.outgoing?.destroyed,
103
- hasCleanupTimer: !!record.cleanupTimer
104
- });
105
- }
106
- ```
107
-
108
- #### 3. Cleanup Verification
109
- Track cleanup completion:
110
- ```typescript
111
- // In cleanupConnection
112
- logger.log('debug', `Cleanup completed for ${record.id}`, {
113
- recordsRemaining: this.connectionRecords.size
114
- });
115
- ```
116
-
117
- ### Recommendations
118
-
119
- 1. **Immediate Cleanup for Proxy Chains**
120
- - Skip batch queue for proxy chain connections
121
- - Use synchronous cleanup when PROXY protocol is detected
122
-
123
- 2. **Socket State Validation**
124
- - Check both `destroyed` and `readyState` before cleanup decisions
125
- - Handle 'opening' state sockets explicitly
126
-
127
- 3. **Timeout Adjustments**
128
- - Shorter timeouts for proxy chain connections
129
- - More aggressive cleanup for connections without data transfer
130
-
131
- 4. **Connection Limits**
132
- - Per-route connection limits
133
- - Backpressure when approaching limits
134
-
135
- 5. **Monitoring**
136
- - Export connection metrics
137
- - Alert on connection count thresholds
138
- - Track connection age distribution
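The per-route limit with backpressure from recommendation 4 could be sketched like this. `RouteConnectionLimiter` and the soft-limit ratio are hypothetical, not an existing SmartProxy API:

```typescript
// Hypothetical per-route connection limiter with simple backpressure.
// Above 80% of the per-route maximum, new connections are flagged for
// backpressure; at the maximum they are rejected outright.
class RouteConnectionLimiter {
  private counts = new Map<string, number>();
  constructor(private maxPerRoute: number, private softLimitRatio = 0.8) {}

  // Returns the admission decision for a new connection on a route.
  tryAcquire(routeId: string): 'ok' | 'backpressure' | 'reject' {
    const current = this.counts.get(routeId) ?? 0;
    if (current >= this.maxPerRoute) return 'reject';
    this.counts.set(routeId, current + 1);
    return current + 1 >= this.maxPerRoute * this.softLimitRatio
      ? 'backpressure'
      : 'ok';
  }

  // Call when a tracked connection is cleaned up.
  release(routeId: string): void {
    const current = this.counts.get(routeId) ?? 0;
    if (current > 0) this.counts.set(routeId, current - 1);
  }

  count(routeId: string): number {
    return this.counts.get(routeId) ?? 0;
  }
}
```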
139
-
140
- ### Test Scenarios to Reproduce
141
-
142
- 1. **Rapid Connect/Disconnect**
143
- ```bash
144
- # Create many short-lived connections
145
- for i in {1..1000}; do
146
- (echo -n | nc localhost 8001) &
147
- done
148
- ```
149
-
150
- 2. **Slow Backend**
151
- - Configure inner proxy to connect to unresponsive backend
152
- - Monitor outer proxy connection count
153
-
154
- 3. **Mixed Traffic**
155
- - Combine TLS and non-TLS connections
156
- - Add keep-alive connections
157
- - Observe accumulation patterns
158
-
159
- ### Future Improvements
160
-
161
- 1. **Connection Pool Isolation**
162
- - Separate pools for proxy chain vs direct connections
163
- - Different cleanup strategies per pool
164
-
165
- 2. **Circuit Breaker**
166
- - Detect accumulation and trigger aggressive cleanup
167
- - Temporary refuse new connections when near limit
168
-
169
- 3. **Connection State Machine**
170
- - Explicit states: CONNECTING, ESTABLISHED, CLOSING, CLOSED
171
- - State transition validation
172
- - Timeout per state
173
-
174
- 4. **Metrics Collection**
175
- - Connection lifecycle events
176
- - Cleanup success/failure rates
177
- - Time spent in each state
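The state machine proposed in improvement 3 could look like the following sketch. States come from the list above; the class and transition table are illustrative, not implemented in SmartProxy:

```typescript
// Sketch of an explicit connection state machine with transition validation.
type ConnState = 'CONNECTING' | 'ESTABLISHED' | 'CLOSING' | 'CLOSED';

// Which states each state may legally move to.
const allowedTransitions: Record<ConnState, ConnState[]> = {
  CONNECTING: ['ESTABLISHED', 'CLOSING', 'CLOSED'],
  ESTABLISHED: ['CLOSING', 'CLOSED'],
  CLOSING: ['CLOSED'],
  CLOSED: [],
};

class ConnectionStateMachine {
  constructor(public state: ConnState = 'CONNECTING') {}

  // Returns false instead of moving on an invalid transition,
  // e.g. CLOSED -> ESTABLISHED.
  transition(next: ConnState): boolean {
    if (!allowedTransitions[this.state].includes(next)) {
      return false;
    }
    this.state = next;
    return true;
  }
}
```

Per-state timeouts would then hang off each transition, so a connection stuck in CONNECTING could be failed independently of the inactivity timeout.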
178
-
179
- ### Root Cause Identified (January 2025)
180
-
181
- **The primary issue is on the inner proxy when backends are unreachable:**
182
-
183
- When the backend is unreachable (e.g., non-routable IP like 10.255.255.1):
184
- 1. The outgoing socket gets stuck in "opening" state indefinitely
185
- 2. The `createSocketWithErrorHandler` in socket-utils.ts doesn't implement connection timeout
186
- 3. `socket.setTimeout()` only handles inactivity AFTER connection, not during connect phase
187
- 4. Connections accumulate because they never transition to error state
188
- 5. Socket timeout warnings fire but connections are preserved as keep-alive
189
-
190
- **Code Issue:**
191
- ```typescript
192
- // socket-utils.ts line 275
193
- if (timeout) {
194
- socket.setTimeout(timeout); // This only handles inactivity, not connection!
195
- }
196
- ```
197
-
198
- **Required Fix:**
199
-
200
- 1. Add `connectionTimeout` to ISmartProxyOptions interface:
201
- ```typescript
202
- // In interfaces.ts
203
- connectionTimeout?: number; // Timeout for establishing connection (ms), default: 30000 (30s)
204
- ```
205
-
206
- 2. Update `createSocketWithErrorHandler` in socket-utils.ts:
207
- ```typescript
208
- export function createSocketWithErrorHandler(options: SafeSocketOptions): plugins.net.Socket {
209
- const { port, host, onError, onConnect, timeout } = options;
210
-
211
- const socket = new plugins.net.Socket();
212
- let connected = false;
213
- let connectionTimeout: NodeJS.Timeout | null = null;
214
-
215
- socket.on('error', (error) => {
216
- if (connectionTimeout) {
217
- clearTimeout(connectionTimeout);
218
- connectionTimeout = null;
219
- }
220
- if (onError) onError(error);
221
- });
222
-
223
- socket.on('connect', () => {
224
- connected = true;
225
- if (connectionTimeout) {
226
- clearTimeout(connectionTimeout);
227
- connectionTimeout = null;
228
- }
229
- if (timeout) socket.setTimeout(timeout); // Set inactivity timeout
230
- if (onConnect) onConnect();
231
- });
232
-
233
- // Implement connection establishment timeout
234
- if (timeout) {
235
- connectionTimeout = setTimeout(() => {
236
- if (!connected && !socket.destroyed) {
237
- const error = new Error(`Connection timeout after ${timeout}ms to ${host}:${port}`);
238
- (error as any).code = 'ETIMEDOUT';
239
- socket.destroy();
240
- if (onError) onError(error);
241
- }
242
- }, timeout);
243
- }
244
-
245
- socket.connect(port, host);
246
- return socket;
247
- }
248
- ```
249
-
250
- 3. Pass connectionTimeout in route-connection-handler.ts:
251
- ```typescript
252
- const targetSocket = createSocketWithErrorHandler({
253
- port: finalTargetPort,
254
- host: finalTargetHost,
255
- timeout: this.settings.connectionTimeout || 30000, // Connection timeout
256
- onError: (error) => { /* existing */ },
257
- onConnect: async () => { /* existing */ }
258
- });
259
- ```
260
-
261
- ### Investigation Results (January 2025)
262
-
263
- Based on extensive testing with debug scripts:
264
-
265
- 1. **Normal Operation**: In controlled tests, connections are properly cleaned up:
266
- - Immediate routing cleanup handler properly destroys outgoing connections
267
- - Both outer and inner proxies maintain 0 connections after clients disconnect
268
- - Keep-alive connections are tracked and cleaned up correctly
269
-
270
- 2. **Potential Edge Cases Not Covered by Tests**:
271
- - **HTTP/2 Connections**: May have different lifecycle than HTTP/1.1
272
- - **WebSocket Connections**: Long-lived upgrade connections might persist
273
- - **Partial TLS Handshakes**: Connections that start TLS but don't complete
274
- - **PROXY Protocol Parse Failures**: Malformed headers from untrusted sources
275
- - **Connection Pool Reuse**: HttpProxy component may maintain its own pools
276
-
277
- 3. **Timing-Sensitive Scenarios**:
278
- - Client disconnects exactly when `record.outgoing` is being assigned
279
- - Backend connects but immediately RSTs
280
- - Proxy chain where middle proxy restarts
281
- - Multiple rapid reconnects with same source IP/port
282
-
283
- 4. **Configuration-Specific Issues**:
284
- - Mixed `sendProxyProtocol` settings in chain
285
- - Different `keepAlive` settings between proxies
286
- - Mismatched timeout values
287
- - Routes with `forwardingEngine: 'nftables'`
288
-
289
- ### Additional Debug Points
290
-
291
- Add these debug logs to identify the specific scenario:
292
-
293
- ```typescript
294
- // In route-connection-handler.ts setupDirectConnection
295
- logger.log('debug', `Setting outgoing socket for ${connectionId}`, {
296
- timestamp: Date.now(),
297
- hasOutgoing: !!record.outgoing,
298
- socketState: targetSocket.readyState
299
- });
300
-
301
- // In connection-manager.ts cleanupConnection
302
- logger.log('debug', `Cleanup attempt for ${record.id}`, {
303
- alreadyClosed: record.connectionClosed,
304
- hasIncoming: !!record.incoming,
305
- hasOutgoing: !!record.outgoing,
306
- incomingDestroyed: record.incoming?.destroyed,
307
- outgoingDestroyed: record.outgoing?.destroyed
308
- });
309
- ```
310
-
311
- ### Workarounds
312
-
313
- Until root cause is identified:
314
-
315
- 1. **Periodic Force Cleanup**:
316
- ```typescript
317
- setInterval(() => {
318
- const connections = connectionManager.getConnections();
319
- for (const [id, record] of connections) {
320
- if (record.incoming?.destroyed && !record.connectionClosed) {
321
- connectionManager.cleanupConnection(record, 'force_cleanup');
322
- }
323
- }
324
- }, 60000); // Every minute
325
- ```
326
-
327
- 2. **Connection Age Limit**:
328
- ```typescript
329
- // Add max connection age check
330
- const maxAge = 3600000; // 1 hour
331
- if (Date.now() - record.incomingStartTime > maxAge) {
332
- connectionManager.cleanupConnection(record, 'max_age');
333
- }
334
- ```
335
-
336
- 3. **Aggressive Timeout Settings**:
337
- ```typescript
338
- {
339
- socketTimeout: 60000, // 1 minute
340
- inactivityTimeout: 300000, // 5 minutes
341
- connectionCleanupInterval: 30000 // 30 seconds
342
- }
343
- ```
344
-
345
- ### Related Files
346
- - `/ts/proxies/smart-proxy/route-connection-handler.ts` - Main connection handling
347
- - `/ts/proxies/smart-proxy/connection-manager.ts` - Connection tracking and cleanup
348
- - `/ts/core/utils/socket-utils.ts` - Socket cleanup utilities
349
- - `/test/test.proxy-chain-cleanup.node.ts` - Test for connection cleanup
350
- - `/test/test.proxy-chaining-accumulation.node.ts` - Test for accumulation prevention
351
- - `/.nogit/debug/connection-accumulation-debug.ts` - Debug script for connection states
352
- - `/.nogit/debug/connection-accumulation-keepalive.ts` - Keep-alive specific tests
353
- - `/.nogit/debug/connection-accumulation-http.ts` - HTTP traffic through proxy chains
354
-
355
- ### Summary
356
-
357
- **Issue Identified**: Connection accumulation occurs on the **inner proxy** (not outer) when backends are unreachable.
358
-
359
- **Root Cause**: The `createSocketWithErrorHandler` function in socket-utils.ts doesn't implement connection establishment timeout. It only sets `socket.setTimeout()` which handles inactivity AFTER connection is established, not during the connect phase.
360
-
361
- **Impact**: When connecting to unreachable IPs (e.g., 10.255.255.1), outgoing sockets remain in "opening" state indefinitely, causing connections to accumulate.
362
-
363
- **Fix Required**:
364
- 1. Add `connectionTimeout` setting to ISmartProxyOptions
365
- 2. Implement proper connection timeout in `createSocketWithErrorHandler`
366
- 3. Pass the timeout value from route-connection-handler
367
-
368
- **Workaround Until Fixed**: Configure shorter socket timeouts and use the periodic force cleanup suggested above.
369
-
370
- The connection cleanup mechanisms have been significantly improved in v19.5.20:
371
- 1. Race condition fixed by setting `record.outgoing` before connecting
372
- 2. Immediate routing cleanup handler always destroys outgoing connections
373
- 3. Tests confirm no accumulation in standard scenarios with reachable backends
374
-
375
- However, the missing connection establishment timeout causes accumulation when backends are unreachable or very slow to connect.
376
-
377
- ### Outer Proxy Sudden Accumulation After Hours
378
-
379
- **User Report**: "The counter goes up suddenly after some hours on the outer proxy"
380
-
381
- **Investigation Findings**:
382
-
383
- 1. **Cleanup Queue Mechanism**:
384
- - Connections are cleaned up in batches of 100 via a queue
385
- - If the cleanup timer gets stuck or cleared without restart, connections accumulate
386
- - The timer is set with `setTimeout` and could be affected by event loop blocking
387
-
388
- 2. **Potential Causes for Sudden Spikes**:
389
-
390
- a) **Cleanup Timer Failure**:
391
- ```typescript
392
- // In ConnectionManager, if this timer gets cleared but not restarted:
393
- this.cleanupTimer = this.setTimeout(() => {
394
- this.processCleanupQueue();
395
- }, 100);
396
- ```
397
-
398
- b) **Memory Pressure**:
399
- - After hours of operation, memory fragmentation or pressure could cause delays
400
- - Garbage collection pauses might interfere with timer execution
401
-
402
- c) **Event Listener Accumulation**:
403
- - Socket event listeners might accumulate over time
404
- - Server 'connection' event handlers are particularly important
405
-
406
- d) **Keep-Alive Connection Cascades**:
407
- - When many keep-alive connections timeout simultaneously
408
- - Outer proxy has different timeout than inner proxy
409
- - Mass disconnection events can overwhelm cleanup queue
410
-
411
- e) **HttpProxy Component Issues**:
412
- - If using `useHttpProxy`, the HttpProxy bridge might maintain connection pools
413
- - These pools might not be properly cleaned after hours
414
-
415
- 3. **Why "Sudden" After Hours**:
416
- - Not a gradual leak but triggered by specific conditions
417
- - Likely related to periodic events or thresholds:
418
- - Inactivity check runs every 30 seconds
419
- - Keep-alive connections have extended timeouts (6x normal)
420
- - Parity check has 30-minute timeout for half-closed connections
421
-
422
- 4. **Reproduction Scenarios**:
423
- - Mass client disconnection/reconnection (network blip)
424
- - Keep-alive timeout cascade when inner proxy times out first
425
- - Cleanup timer getting stuck during high load
426
- - Memory pressure causing event loop delays
427
-
428
- ### Additional Monitoring Recommendations
429
-
430
- 1. **Add Cleanup Queue Monitoring**:
431
- ```typescript
432
- setInterval(() => {
433
- const cm = proxy.connectionManager;
434
- if (cm.cleanupQueue.size > 100 && !cm.cleanupTimer) {
435
- logger.error('Cleanup queue stuck!', {
436
- queueSize: cm.cleanupQueue.size,
437
- hasTimer: !!cm.cleanupTimer
438
- });
439
- }
440
- }, 60000);
441
- ```
442
-
443
- 2. **Track Timer Health**:
444
- - Monitor if cleanup timer is running
445
- - Check for event loop blocking
446
- - Log when batch processing takes too long
447
-
448
- 3. **Memory Monitoring**:
449
- - Track heap usage over time
450
- - Monitor for memory leaks in long-running processes
451
- - Force periodic garbage collection if needed
452
-
453
- ### Immediate Mitigations
454
-
455
- 1. **Restart Cleanup Timer**:
456
- ```typescript
457
- // Emergency cleanup timer restart
458
- if (!cm.cleanupTimer && cm.cleanupQueue.size > 0) {
459
- cm.cleanupTimer = setTimeout(() => {
460
- cm.processCleanupQueue();
461
- }, 100);
462
- }
463
- ```
464
-
465
- 2. **Force Periodic Cleanup**:
466
- ```typescript
467
- setInterval(() => {
468
- const cm = connectionManager;
469
- if (cm.getConnectionCount() > threshold) {
470
- cm.performOptimizedInactivityCheck();
471
- // Force process cleanup queue
472
- cm.processCleanupQueue();
473
- }
474
- }, 300000); // Every 5 minutes
475
- ```
476
-
477
- 3. **Connection Age Limits**:
478
- - Set maximum connection lifetime
479
- - Force close connections older than threshold
480
- - More aggressive cleanup for proxy chains
481
-

## ✅ FIXED: Zombie Connection Detection (January 2025)

### Root Cause Identified
"Zombie connections" occur when sockets are destroyed without triggering their close/error event handlers. This leaves connections tracked with both sockets destroyed but `connectionClosed=false`. It is particularly problematic in proxy chains, where the inner proxy might close connections in ways that don't trigger the proper events on the outer proxy.

### Fix Implemented
Added zombie detection to the periodic inactivity check in ConnectionManager:

```typescript
// In performOptimizedInactivityCheck()
// Check ALL connections for zombie state
for (const [connectionId, record] of this.connectionRecords) {
  if (!record.connectionClosed) {
    const incomingDestroyed = record.incoming?.destroyed || false;
    const outgoingDestroyed = record.outgoing?.destroyed || false;

    // Check for zombie connections: both sockets destroyed but not cleaned up
    if (incomingDestroyed && outgoingDestroyed) {
      logger.log('warn', `Zombie connection detected: ${connectionId} - both sockets destroyed but not cleaned up`, {
        connectionId,
        remoteIP: record.remoteIP,
        age: plugins.prettyMs(now - record.incomingStartTime),
        component: 'connection-manager'
      });

      // Clean up immediately
      this.cleanupConnection(record, 'zombie_cleanup');
      continue;
    }

    // Check for half-zombie: one socket destroyed
    if (incomingDestroyed || outgoingDestroyed) {
      const age = now - record.incomingStartTime;
      // Give it a 30-second grace period for normal cleanup
      if (age > 30000) {
        logger.log('warn', `Half-zombie connection detected: ${connectionId} - ${incomingDestroyed ? 'incoming' : 'outgoing'} destroyed`, {
          connectionId,
          remoteIP: record.remoteIP,
          age: plugins.prettyMs(age),
          incomingDestroyed,
          outgoingDestroyed,
          component: 'connection-manager'
        });

        // Clean up
        this.cleanupConnection(record, 'half_zombie_cleanup');
      }
    }
  }
}
```

### How It Works
1. **Full Zombie Detection**: Detects when both the incoming and outgoing sockets are destroyed but the connection hasn't been cleaned up
2. **Half-Zombie Detection**: Detects when only one socket is destroyed, with a 30-second grace period for normal cleanup to occur
3. **Automatic Cleanup**: Immediately cleans up zombie connections when detected
4. **Runs Periodically**: Integrated into the existing inactivity check that runs every 30 seconds
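The classification logic above can be isolated as a pure function, which also makes it easy to unit-test. The record shape here is deliberately simplified; real ConnectionManager records carry more fields:

```typescript
// Pure classification of the zombie states described above.
interface SocketLike { destroyed: boolean; }
interface ConnRecord {
  connectionClosed: boolean;
  incoming?: SocketLike;
  outgoing?: SocketLike;
  incomingStartTime: number; // ms epoch
}

type ZombieState = 'alive' | 'zombie' | 'half-zombie' | 'closed';

function classifyZombie(record: ConnRecord, now: number, gracePeriodMs = 30_000): ZombieState {
  if (record.connectionClosed) return 'closed';
  const incomingDestroyed = record.incoming?.destroyed ?? false;
  const outgoingDestroyed = record.outgoing?.destroyed ?? false;
  // Full zombie: both sockets destroyed but record never cleaned up.
  if (incomingDestroyed && outgoingDestroyed) return 'zombie';
  // Half-zombie: one socket destroyed, past the grace period for normal cleanup.
  if ((incomingDestroyed || outgoingDestroyed) &&
      now - record.incomingStartTime > gracePeriodMs) {
    return 'half-zombie';
  }
  return 'alive';
}
```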
539
-
540
- ### Why This Fixes the Outer Proxy Accumulation
541
- - When inner proxy closes connections abruptly (e.g., due to backend failure), the outer proxy's outgoing socket might be destroyed without firing close/error events
542
- - These become zombie connections that previously accumulated indefinitely
543
- - Now they are detected and cleaned up within 30 seconds
544
-
545
- ### Test Results
546
- Debug scripts confirmed:
547
- - Zombie connections can be created when sockets are destroyed directly without events
548
- - The zombie detection successfully identifies and cleans up these connections
549
- - Both full zombies (both sockets destroyed) and half-zombies (one socket destroyed) are handled
550
-
551
- This fix addresses the specific issue where "connections that are closed on the inner proxy, always also close on the outer proxy" as requested by the user.
552
-
553
- ## 🔍 Production Diagnostics (January 2025)
554
-
555
- Since the zombie detection fix didn't fully resolve the issue, use the ProductionConnectionMonitor to diagnose the actual problem:
556
-
557
- ### How to Use the Production Monitor
558
-
559
- 1. **Add to your proxy startup script**:
560
- ```typescript
561
- import ProductionConnectionMonitor from './production-connection-monitor.js';
562
-
563
- // After proxy.start()
564
- const monitor = new ProductionConnectionMonitor(proxy);
565
- monitor.start(5000); // Check every 5 seconds
566
-
567
- // Monitor will automatically capture diagnostics when:
568
- // - Connections exceed threshold (default: 50)
569
- // - Sudden spike occurs (default: +20 connections)
570
- ```
571
-
572
- 2. **Diagnostics are saved to**: `.nogit/connection-diagnostics/`
573
-
574
- 3. **Force capture anytime**: `monitor.forceCaptureNow()`
575
-
576
- ### What the Monitor Captures
577
-
578
- For each connection:
579
- - Socket states (destroyed, readable, writable, readyState)
580
- - Connection flags (closed, keepAlive, TLS status)
581
- - Data transfer statistics
582
- - Time since last activity
583
- - Cleanup queue status
584
- - Event listener counts
585
- - Termination reasons
586
-
587
- ### Pattern Analysis
588
-
589
- The monitor automatically identifies:
590
- - **Zombie connections**: Both sockets destroyed but not cleaned up
591
- - **Half-zombies**: One socket destroyed
592
- - **Stuck connecting**: Outgoing socket stuck in connecting state
593
- - **No outgoing**: Missing outgoing socket
594
- - **Keep-alive stuck**: Keep-alive connections with no recent activity
595
- - **Old connections**: Connections older than 1 hour
596
- - **No data transfer**: Connections with no bytes transferred
597
- - **Listener leaks**: Excessive event listeners
598
-
599
- ### Common Accumulation Patterns
600
-
601
- 1. **Connecting State Stuck**
602
- - Outgoing socket shows `connecting: true` indefinitely
603
- - Usually means connection timeout not working
604
- - Check if backend is reachable
605
-
606
- 2. **Missing Outgoing Socket**
607
- - Connection has no outgoing socket but isn't closed
608
- - May indicate immediate routing issues
609
- - Check error logs during connection setup
610
-
611
- 3. **Event Listener Accumulation**
612
- - High listener counts (>20) on sockets
613
- - Indicates cleanup not removing all listeners
614
- - Can cause memory leaks
615
-
616
- 4. **Keep-Alive Zombies**
617
- - Keep-alive connections not timing out
618
- - Check keepAlive timeout settings
619
- - May need more aggressive cleanup
620
-
621
- ### Next Steps
622
-
623
- 1. **Run the monitor in production** during accumulation
624
- 2. **Share the diagnostic files** from `.nogit/connection-diagnostics/`
625
- 3. **Look for patterns** in the captured snapshots
626
- 4. **Check specific connection IDs** that accumulate
627
-
628
- The diagnostic files will show exactly what state connections are in when accumulation occurs, allowing targeted fixes for the specific issue.
629
-
630
- ## ✅ FIXED: Stuck Connection Detection (January 2025)
631
-
632
- ### Additional Root Cause Found
633
- Connections to hanging backends (that accept but never respond) were not being cleaned up because:
634
- - Both sockets remain alive (not destroyed)
635
- - Keep-alive prevents normal timeout
636
- - No data is sent back to the client despite receiving data
637
- - These don't qualify as "zombies" since sockets aren't destroyed
638
-
639
- ### Fix Implemented
640
- Added stuck connection detection to the periodic inactivity check:
641
-
642
- ```typescript
643
- // Check for stuck connections: no data sent back to client
644
- if (!record.connectionClosed && record.outgoing && record.bytesReceived > 0 && record.bytesSent === 0) {
645
- const age = now - record.incomingStartTime;
646
- // If connection is older than 60 seconds and no data sent back, likely stuck
647
- if (age > 60000) {
648
- logger.log('warn', `Stuck connection detected: ${connectionId} - received ${record.bytesReceived} bytes but sent 0 bytes`, {
649
- connectionId,
650
- remoteIP: record.remoteIP,
651
- age: plugins.prettyMs(age),
652
- bytesReceived: record.bytesReceived,
653
- targetHost: record.targetHost,
654
- targetPort: record.targetPort,
655
- component: 'connection-manager'
656
- });
657
-
658
- // Clean up
659
- this.cleanupConnection(record, 'stuck_no_response');
660
- }
661
- }
662
- ```
663
-
664
- ### What This Fixes
665
- - Connections to backends that accept but never respond
666
- - Proxy chains where inner proxy connects to unresponsive services
667
- - Scenarios where keep-alive prevents normal timeout mechanisms
668
- - Connections that receive client data but never send anything back
669
-
670
- ### Detection Criteria
671
- - Connection has received bytes from client (`bytesReceived > 0`)
672
- - No bytes sent back to client (`bytesSent === 0`)
673
- - Connection is older than 60 seconds
674
- - Both sockets are still alive (not destroyed)
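The detection criteria above reduce to a pure predicate. The record shape is simplified for illustration and uses the field names from the snippets in this document, not a published API:

```typescript
// The stuck-connection criteria as a testable predicate.
interface StuckCheckRecord {
  connectionClosed: boolean;
  hasOutgoing: boolean;      // stands in for `record.outgoing` being present
  bytesReceived: number;
  bytesSent: number;
  incomingStartTime: number; // ms epoch
}

function isStuckConnection(r: StuckCheckRecord, now: number, maxSilentMs = 60_000): boolean {
  return !r.connectionClosed
    && r.hasOutgoing
    && r.bytesReceived > 0   // client sent something...
    && r.bytesSent === 0     // ...but nothing ever came back
    && now - r.incomingStartTime > maxSilentMs;
}
```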

This complements the zombie detection by handling cases where the sockets remain technically alive but the connection is effectively dead.

## 🚨 CRITICAL FIX: Cleanup Queue Bug (January 2025)

### Critical Bug Found
The cleanup queue had a severe bug that caused connection accumulation whenever more than 100 connections needed cleanup:

```typescript
// BUG: This cleared the ENTIRE queue after processing only the first batch!
const toCleanup = Array.from(this.cleanupQueue).slice(0, this.cleanupBatchSize);
this.cleanupQueue.clear(); // ❌ This discarded all connections beyond the first 100!
```

### Fix Implemented

```typescript
// Now only the connections being processed are removed
const toCleanup = Array.from(this.cleanupQueue).slice(0, this.cleanupBatchSize);
for (const connectionId of toCleanup) {
  this.cleanupQueue.delete(connectionId); // ✅ Only remove what we process
  const record = this.connectionRecords.get(connectionId);
  if (record) {
    this.cleanupConnection(record, record.incomingTerminationReason || 'normal');
  }
}
```

### Impact
- **Before**: If 150 connections needed cleanup, only the first 100 were processed and the remaining 50 accumulated forever
- **After**: All connections are properly cleaned up in batches
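The before/after behavior can be simulated side by side with 150 queued connection IDs and a batch size of 100. The two drain functions are illustrative stand-ins for the buggy and fixed logic, not SmartProxy's actual code:

```typescript
// Buggy variant: processes one batch, then clears the WHOLE queue,
// silently dropping everything beyond the first batch.
function drainBuggy(queue: Set<string>, batchSize = 100): string[] {
  const processed = Array.from(queue).slice(0, batchSize);
  queue.clear(); // the bug
  return processed;
}

// Fixed variant: removes only what each batch processes, then continues.
// (The real implementation spaces batches ~100ms apart; this loop drains
// them back-to-back for the sake of the simulation.)
function drainFixed(queue: Set<string>, batchSize = 100): string[] {
  const processed: string[] = [];
  while (queue.size > 0) {
    const batch = Array.from(queue).slice(0, batchSize);
    for (const id of batch) {
      queue.delete(id); // only remove what this batch processes
      processed.push(id);
    }
  }
  return processed;
}

const ids = Array.from({ length: 150 }, (_, i) => `conn-${i}`);
const buggyProcessed = drainBuggy(new Set(ids)).length; // 100: 50 dropped forever
const fixedProcessed = drainFixed(new Set(ids)).length; // 150: nothing lost
```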

### Additional Improvements

1. **Faster Inactivity Checks**: Reduced from 30s to 10s intervals
   - Zombies and stuck connections are detected 3x faster
   - Reduces the window for accumulation

2. **Duplicate Prevention**: Added a check in queueCleanup to skip already-closed connections
   - Prevents unnecessary work
   - Ensures connections are only cleaned up once

### Summary of All Fixes

1. **Connection Timeout** (documented above) - Prevents accumulation when backends are unreachable
2. **Zombie Detection** - Cleans up connections with destroyed sockets
3. **Stuck Connection Detection** - Cleans up connections to hanging backends
4. **Cleanup Queue Bug** - Ensures ALL connections get cleaned up, not just the first 100
5. **Faster Detection** - Reduced the check interval from 30s to 10s

Combined, these fixes should prevent connection accumulation in all known scenarios.