@push.rocks/smartproxy 19.5.23 → 19.5.25

package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@push.rocks/smartproxy",
3
- "version": "19.5.23",
3
+ "version": "19.5.25",
4
4
  "private": false,
5
5
  "description": "A powerful proxy package with unified route-based configuration for high traffic management. Features include SSL/TLS support, flexible routing patterns, WebSocket handling, advanced security options, and automatic ACME certificate management.",
6
6
  "main": "dist_ts/index.js",
@@ -372,4 +372,305 @@ The connection cleanup mechanisms have been significantly improved in v19.5.20:
372
372
  2. Immediate routing cleanup handler always destroys outgoing connections
373
373
  3. Tests confirm no accumulation in standard scenarios with reachable backends
374
374
 
375
- However, the missing connection establishment timeout causes accumulation when backends are unreachable or very slow to connect.
375
+ However, the missing connection establishment timeout causes accumulation when backends are unreachable or very slow to connect.
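+
+ A minimal sketch of the kind of guard that is missing (not the package's actual implementation), using plain `net` APIs; `targetHost`/`targetPort` are placeholders for values taken from the connection record:
+
+ ```typescript
+ import * as net from 'net';
+
+ const CONNECT_TIMEOUT_MS = 10_000; // example value
+ const targetHost = 'backend.example.com'; // placeholder
+ const targetPort = 8080; // placeholder
+
+ const outgoing = net.connect({ host: targetHost, port: targetPort });
+
+ // Destroy the socket if it never finishes connecting, so the connection
+ // record can be cleaned up instead of accumulating.
+ const connectTimer = setTimeout(() => {
+   if (outgoing.connecting) {
+     outgoing.destroy(new Error('connection establishment timeout'));
+   }
+ }, CONNECT_TIMEOUT_MS);
+
+ outgoing.once('connect', () => clearTimeout(connectTimer));
+ outgoing.once('error', () => clearTimeout(connectTimer));
+ ```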
376
+
377
+ ### Outer Proxy Sudden Accumulation After Hours
378
+
379
+ **User Report**: "The counter goes up suddenly after some hours on the outer proxy"
380
+
381
+ **Investigation Findings**:
382
+
383
+ 1. **Cleanup Queue Mechanism**:
384
+ - Connections are cleaned up in batches of 100 via a queue
385
+ - If the cleanup timer gets stuck, or is cleared without being restarted, connections accumulate
386
+ - The timer is set with `setTimeout` and could be affected by event loop blocking
387
+
388
+ 2. **Potential Causes for Sudden Spikes**:
389
+
390
+ a) **Cleanup Timer Failure**:
391
+ ```typescript
392
+ // In ConnectionManager, if this timer gets cleared but not restarted:
393
+ this.cleanupTimer = this.setTimeout(() => {
394
+ this.processCleanupQueue();
395
+ }, 100);
396
+ ```
397
+
398
+ b) **Memory Pressure**:
399
+ - After hours of operation, memory fragmentation or pressure could cause delays
400
+ - Garbage collection pauses might interfere with timer execution
401
+
402
+ c) **Event Listener Accumulation**:
403
+ - Socket event listeners might accumulate over time
404
+ - Server 'connection' event handlers are particularly important
405
+
406
+ d) **Keep-Alive Connection Cascades**:
407
+ - When many keep-alive connections timeout simultaneously
408
+ - Outer proxy has different timeout than inner proxy
409
+ - Mass disconnection events can overwhelm cleanup queue
410
+
411
+ e) **HttpProxy Component Issues**:
412
+ - If using `useHttpProxy`, the HttpProxy bridge might maintain connection pools
413
+ - These pools might not be properly cleaned after hours
414
+
415
+ 3. **Why "Sudden" After Hours**:
416
+ - Not a gradual leak but triggered by specific conditions
417
+ - Likely related to periodic events or thresholds:
418
+ - Inactivity check runs every 30 seconds
419
+ - Keep-alive connections have extended timeouts (6x normal)
420
+ - Parity check has 30-minute timeout for half-closed connections
421
+
422
+ 4. **Reproduction Scenarios**:
423
+ - Mass client disconnection/reconnection (network blip)
424
+ - Keep-alive timeout cascade when inner proxy times out first
425
+ - Cleanup timer getting stuck during high load
426
+ - Memory pressure causing event loop delays
427
+
428
+ ### Additional Monitoring Recommendations
429
+
430
+ 1. **Add Cleanup Queue Monitoring**:
431
+ ```typescript
432
+ setInterval(() => {
433
+ const cm = proxy.connectionManager;
434
+ if (cm.cleanupQueue.size > 100 && !cm.cleanupTimer) {
435
+ logger.error('Cleanup queue stuck!', {
436
+ queueSize: cm.cleanupQueue.size,
437
+ hasTimer: !!cm.cleanupTimer
438
+ });
439
+ }
440
+ }, 60000);
441
+ ```
442
+
443
+ 2. **Track Timer Health**:
444
+ - Monitor if cleanup timer is running
445
+ - Check for event loop blocking
446
+ - Log when batch processing takes too long
447
+
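+ A simple event-loop lag probe (a generic sketch, not part of smartproxy) can confirm whether timer delays are plausible:
+
+ ```typescript
+ // Measure how late a 1-second interval fires; large drift means timers
+ // (including the cleanup timer) are being delayed by a blocked event loop.
+ let lastTick = Date.now();
+ setInterval(() => {
+   const now = Date.now();
+   const lagMs = now - lastTick - 1000;
+   lastTick = now;
+   if (lagMs > 100) {
+     console.warn(`Event loop lag detected: ${lagMs}ms`);
+   }
+ }, 1000);
+ ```
+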
448
+ 3. **Memory Monitoring**:
449
+ - Track heap usage over time
450
+ - Monitor for memory leaks in long-running processes
451
+ - Force periodic garbage collection if needed
452
+
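+ A basic heap tracker along these lines (generic Node.js, not smartproxy-specific; forced GC only works when the process runs with `--expose-gc`):
+
+ ```typescript
+ setInterval(() => {
+   const mem = process.memoryUsage();
+   console.log('memory', {
+     heapUsedMB: Math.round(mem.heapUsed / 1024 / 1024),
+     rssMB: Math.round(mem.rss / 1024 / 1024),
+   });
+   // Optional: force a GC cycle under --expose-gc if heap usage looks excessive.
+   const gc = (global as any).gc;
+   if (gc && mem.heapUsed > 1_500_000_000) {
+     gc();
+   }
+ }, 60_000);
+ ```
+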
453
+ ### Immediate Mitigations
454
+
455
+ 1. **Restart Cleanup Timer**:
456
+ ```typescript
457
+ // Emergency cleanup timer restart
458
+ if (!cm.cleanupTimer && cm.cleanupQueue.size > 0) {
459
+ cm.cleanupTimer = setTimeout(() => {
460
+ cm.processCleanupQueue();
461
+ }, 100);
462
+ }
463
+ ```
464
+
465
+ 2. **Force Periodic Cleanup**:
466
+ ```typescript
467
+ setInterval(() => {
468
+ const cm = connectionManager;
469
+ if (cm.getConnectionCount() > threshold) {
470
+ cm.performOptimizedInactivityCheck();
471
+ // Force process cleanup queue
472
+ cm.processCleanupQueue();
473
+ }
474
+ }, 300000); // Every 5 minutes
475
+ ```
476
+
477
+ 3. **Connection Age Limits**:
478
+ - Set maximum connection lifetime
479
+ - Force close connections older than threshold
480
+ - More aggressive cleanup for proxy chains
481
+
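+ A rough sketch of such an age limit, reusing the record fields shown elsewhere in these notes (`cm` is the ConnectionManager; the `'max_lifetime_exceeded'` reason string is hypothetical):
+
+ ```typescript
+ const MAX_CONNECTION_AGE_MS = 4 * 60 * 60 * 1000; // example: 4 hours
+
+ setInterval(() => {
+   const now = Date.now();
+   for (const [, record] of cm.connectionRecords) {
+     if (!record.connectionClosed && now - record.incomingStartTime > MAX_CONNECTION_AGE_MS) {
+       // Force-close connections that exceed the maximum lifetime.
+       cm.cleanupConnection(record, 'max_lifetime_exceeded');
+     }
+   }
+ }, 60_000);
+ ```
+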
482
+ ## ✅ FIXED: Zombie Connection Detection (January 2025)
483
+
484
+ ### Root Cause Identified
485
+ "Zombie connections" occur when sockets are destroyed without triggering their close/error event handlers. This causes connections to remain tracked with both sockets destroyed but `connectionClosed=false`. This is particularly problematic in proxy chains where the inner proxy might close connections in ways that don't trigger proper events on the outer proxy.
486
+
487
+ ### Fix Implemented
488
+ Added zombie detection to the periodic inactivity check in ConnectionManager:
489
+
490
+ ```typescript
491
+ // In performOptimizedInactivityCheck()
492
+ // Check ALL connections for zombie state
493
+ for (const [connectionId, record] of this.connectionRecords) {
494
+ if (!record.connectionClosed) {
495
+ const incomingDestroyed = record.incoming?.destroyed || false;
496
+ const outgoingDestroyed = record.outgoing?.destroyed || false;
497
+
498
+ // Check for zombie connections: both sockets destroyed but not cleaned up
499
+ if (incomingDestroyed && outgoingDestroyed) {
500
+ logger.log('warn', `Zombie connection detected: ${connectionId} - both sockets destroyed but not cleaned up`, {
501
+ connectionId,
502
+ remoteIP: record.remoteIP,
503
+ age: plugins.prettyMs(now - record.incomingStartTime),
504
+ component: 'connection-manager'
505
+ });
506
+
507
+ // Clean up immediately
508
+ this.cleanupConnection(record, 'zombie_cleanup');
509
+ continue;
510
+ }
511
+
512
+ // Check for half-zombie: one socket destroyed
513
+ if (incomingDestroyed || outgoingDestroyed) {
514
+ const age = now - record.incomingStartTime;
515
+ // Give it 30 seconds grace period for normal cleanup
516
+ if (age > 30000) {
517
+ logger.log('warn', `Half-zombie connection detected: ${connectionId} - ${incomingDestroyed ? 'incoming' : 'outgoing'} destroyed`, {
518
+ connectionId,
519
+ remoteIP: record.remoteIP,
520
+ age: plugins.prettyMs(age),
521
+ incomingDestroyed,
522
+ outgoingDestroyed,
523
+ component: 'connection-manager'
524
+ });
525
+
526
+ // Clean up
527
+ this.cleanupConnection(record, 'half_zombie_cleanup');
528
+ }
529
+ }
530
+ }
531
+ }
532
+ ```
533
+
534
+ ### How It Works
535
+ 1. **Full Zombie Detection**: Detects when both incoming and outgoing sockets are destroyed but the connection hasn't been cleaned up
536
+ 2. **Half-Zombie Detection**: Detects when only one socket is destroyed, with a 30-second grace period for normal cleanup to occur
537
+ 3. **Automatic Cleanup**: Immediately cleans up zombie connections when detected
538
+ 4. **Runs Periodically**: Integrated into the existing inactivity check that runs every 30 seconds
539
+
540
+ ### Why This Fixes the Outer Proxy Accumulation
541
+ - When inner proxy closes connections abruptly (e.g., due to backend failure), the outer proxy's outgoing socket might be destroyed without firing close/error events
542
+ - These become zombie connections that previously accumulated indefinitely
543
+ - Now they are detected and cleaned up within 30 seconds
544
+
545
+ ### Test Results
546
+ Debug scripts confirmed:
547
+ - Zombie connections can be created when sockets are destroyed directly without events
548
+ - The zombie detection successfully identifies and cleans up these connections
549
+ - Both full zombies (both sockets destroyed) and half-zombies (one socket destroyed) are handled
550
+
551
+ This fix addresses the user's request that "connections that are closed on the inner proxy, always also close on the outer proxy".
552
+
553
+ ## 🔍 Production Diagnostics (January 2025)
554
+
555
+ Since the zombie detection fix didn't fully resolve the issue, use the ProductionConnectionMonitor to diagnose the actual problem:
556
+
557
+ ### How to Use the Production Monitor
558
+
559
+ 1. **Add to your proxy startup script**:
560
+ ```typescript
561
+ import ProductionConnectionMonitor from './production-connection-monitor.js';
562
+
563
+ // After proxy.start()
564
+ const monitor = new ProductionConnectionMonitor(proxy);
565
+ monitor.start(5000); // Check every 5 seconds
566
+
567
+ // Monitor will automatically capture diagnostics when:
568
+ // - Connections exceed threshold (default: 50)
569
+ // - Sudden spike occurs (default: +20 connections)
570
+ ```
571
+
572
+ 2. **Diagnostics are saved to**: `.nogit/connection-diagnostics/`
573
+
574
+ 3. **Force capture anytime**: `monitor.forceCaptureNow()`
575
+
576
+ ### What the Monitor Captures
577
+
578
+ For each connection:
579
+ - Socket states (destroyed, readable, writable, readyState)
580
+ - Connection flags (closed, keepAlive, TLS status)
581
+ - Data transfer statistics
582
+ - Time since last activity
583
+ - Cleanup queue status
584
+ - Event listener counts
585
+ - Termination reasons
586
+
587
+ ### Pattern Analysis
588
+
589
+ The monitor automatically identifies:
590
+ - **Zombie connections**: Both sockets destroyed but not cleaned up
591
+ - **Half-zombies**: One socket destroyed
592
+ - **Stuck connecting**: Outgoing socket stuck in connecting state
593
+ - **No outgoing**: Missing outgoing socket
594
+ - **Keep-alive stuck**: Keep-alive connections with no recent activity
595
+ - **Old connections**: Connections older than 1 hour
596
+ - **No data transfer**: Connections with no bytes transferred
597
+ - **Listener leaks**: Excessive event listeners
598
+
599
+ ### Common Accumulation Patterns
600
+
601
+ 1. **Connecting State Stuck**
602
+ - Outgoing socket shows `connecting: true` indefinitely
603
+ - Usually means connection timeout not working
604
+ - Check if backend is reachable
605
+
606
+ 2. **Missing Outgoing Socket**
607
+ - Connection has no outgoing socket but isn't closed
608
+ - May indicate immediate routing issues
609
+ - Check error logs during connection setup
610
+
611
+ 3. **Event Listener Accumulation**
612
+ - High listener counts (>20) on sockets
613
+ - Indicates cleanup not removing all listeners
614
+ - Can cause memory leaks
615
+
616
+ 4. **Keep-Alive Zombies**
617
+ - Keep-alive connections not timing out
618
+ - Check keepAlive timeout settings
619
+ - May need more aggressive cleanup
620
+
621
+ ### Next Steps
622
+
623
+ 1. **Run the monitor in production** during accumulation
624
+ 2. **Share the diagnostic files** from `.nogit/connection-diagnostics/`
625
+ 3. **Look for patterns** in the captured snapshots
626
+ 4. **Check specific connection IDs** that accumulate
627
+
628
+ The diagnostic files will show exactly what state connections are in when accumulation occurs, allowing targeted fixes for the specific issue.
629
+
630
+ ## ✅ FIXED: Stuck Connection Detection (January 2025)
631
+
632
+ ### Additional Root Cause Found
633
+ Connections to hanging backends (that accept but never respond) were not being cleaned up because:
634
+ - Both sockets remain alive (not destroyed)
635
+ - Keep-alive prevents normal timeout
636
+ - No data is sent back to the client despite receiving data
637
+ - These don't qualify as "zombies" since sockets aren't destroyed
638
+
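+ For reference, a backend with this behavior is easy to simulate (a test sketch, not part of the fix):
+
+ ```typescript
+ import * as net from 'net';
+
+ // A "hanging" backend: accepts TCP connections and reads data,
+ // but never writes a response and never closes the socket.
+ const hangingBackend = net.createServer((socket) => {
+   socket.on('data', () => {
+     // Intentionally ignore all incoming data.
+   });
+ });
+
+ hangingBackend.listen(9999, () => {
+   console.log('Hanging backend listening on port 9999');
+ });
+ ```
+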
639
+ ### Fix Implemented
640
+ Added stuck connection detection to the periodic inactivity check:
641
+
642
+ ```typescript
643
+ // Check for stuck connections: no data sent back to client
644
+ if (!record.connectionClosed && record.outgoing && record.bytesReceived > 0 && record.bytesSent === 0) {
645
+ const age = now - record.incomingStartTime;
646
+ // If connection is older than 60 seconds and no data sent back, likely stuck
647
+ if (age > 60000) {
648
+ logger.log('warn', `Stuck connection detected: ${connectionId} - received ${record.bytesReceived} bytes but sent 0 bytes`, {
649
+ connectionId,
650
+ remoteIP: record.remoteIP,
651
+ age: plugins.prettyMs(age),
652
+ bytesReceived: record.bytesReceived,
653
+ targetHost: record.targetHost,
654
+ targetPort: record.targetPort,
655
+ component: 'connection-manager'
656
+ });
657
+
658
+ // Clean up
659
+ this.cleanupConnection(record, 'stuck_no_response');
660
+ }
661
+ }
662
+ ```
663
+
664
+ ### What This Fixes
665
+ - Connections to backends that accept but never respond
666
+ - Proxy chains where inner proxy connects to unresponsive services
667
+ - Scenarios where keep-alive prevents normal timeout mechanisms
668
+ - Connections that receive client data but never send anything back
669
+
670
+ ### Detection Criteria
671
+ - Connection has received bytes from client (`bytesReceived > 0`)
672
+ - No bytes sent back to client (`bytesSent === 0`)
673
+ - Connection is older than 60 seconds
674
+ - Both sockets are still alive (not destroyed)
675
+
676
+ This complements the zombie detection by handling cases where sockets remain technically alive but the connection is effectively dead.
package/readme.hints.md CHANGED
@@ -856,4 +856,42 @@ The WrappedSocket class has been implemented as the foundation for PROXY protoco
856
856
  For detailed information about proxy protocol implementation and proxy chaining:
857
857
  - **[Proxy Protocol Guide](./readme.proxy-protocol.md)** - Complete implementation details and configuration
858
858
  - **[Proxy Protocol Examples](./readme.proxy-protocol-example.md)** - Code examples and conceptual implementation
859
- - **[Proxy Chain Summary](./readme.proxy-chain-summary.md)** - Quick reference for proxy chaining setup
859
+ - **[Proxy Chain Summary](./readme.proxy-chain-summary.md)** - Quick reference for proxy chaining setup
860
+
861
+ ## Connection Cleanup Edge Cases Investigation (v19.5.20+)
862
+
863
+ ### Issue Discovered
864
+ "Zombie connections" can occur when both sockets are destroyed but the connection record hasn't been cleaned up. This happens when sockets are destroyed without triggering their close/error event handlers.
865
+
866
+ ### Root Cause
867
+ 1. **Event Handler Bypass**: In edge cases (network failures, proxy chain failures, forced socket destruction), sockets can be destroyed without their event handlers being called
868
+ 2. **Cleanup Queue Delay**: The `initiateCleanupOnce` method adds connections to a cleanup queue (batch of 100 every 100ms), which may not process fast enough (see the sketch after this list)
869
+ 3. **Inactivity Check Limitation**: The periodic inactivity check only examines `lastActivity` timestamps, not actual socket states
870
+
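+ The failure mode in points 1 and 2 can be illustrated with a simplified version of the batching pattern (names and details are illustrative, not the actual ConnectionManager code):
+
+ ```typescript
+ class BatchedCleanupQueue<T> {
+   private queue = new Set<T>();
+   private timer: NodeJS.Timeout | null = null;
+
+   constructor(private cleanup: (item: T) => void, private batchSize = 100) {}
+
+   enqueue(item: T): void {
+     this.queue.add(item);
+     // If the timer is ever cleared elsewhere without being rescheduled,
+     // queued items are never processed, which is the suspected accumulation path.
+     if (!this.timer) {
+       this.timer = setTimeout(() => this.processBatch(), 100);
+     }
+   }
+
+   private processBatch(): void {
+     this.timer = null;
+     let processed = 0;
+     for (const item of this.queue) {
+       if (processed >= this.batchSize) break;
+       this.queue.delete(item);
+       this.cleanup(item);
+       processed++;
+     }
+     // Reschedule if anything is left over.
+     if (this.queue.size > 0) {
+       this.timer = setTimeout(() => this.processBatch(), 100);
+     }
+   }
+ }
+ ```
+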
871
+ ### Test Results
872
+ Debug script (`connection-manager-direct-test.ts`) revealed:
873
+ - **Normal cleanup works**: When socket events fire normally, cleanup is reliable
874
+ - **Zombies ARE created**: Direct socket destruction creates zombies (destroyed sockets, connectionClosed=false)
875
+ - **Manual cleanup works**: Calling `initiateCleanupOnce` on a zombie does clean it up
876
+ - **Inactivity check misses zombies**: The check doesn't detect connections with destroyed sockets
877
+
878
+ ### Potential Solutions
879
+ 1. **Periodic Zombie Detection**: Add zombie detection to the inactivity check:
880
+ ```typescript
881
+ // In performOptimizedInactivityCheck
882
+ if (record.incoming?.destroyed && record.outgoing?.destroyed && !record.connectionClosed) {
883
+ this.cleanupConnection(record, 'zombie_detected');
884
+ }
885
+ ```
886
+
887
+ 2. **Socket State Monitoring**: Check socket states during connection operations
888
+ 3. **Defensive Socket Handling**: Always attach cleanup handlers before any operation that might destroy sockets (see the sketch after this list)
889
+ 4. **Immediate Cleanup Option**: For critical paths, use `cleanupConnection` instead of `initiateCleanupOnce`
890
+
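+ A sketch of the defensive pattern from point 3 (generic Node.js, not the actual smartproxy code):
+
+ ```typescript
+ import * as net from 'net';
+
+ function connectWithCleanup(host: string, port: number, onDone: (err?: Error) => void): net.Socket {
+   const socket = new net.Socket();
+   let finished = false;
+   const finish = (err?: Error) => {
+     if (finished) return;
+     finished = true;
+     onDone(err);
+   };
+
+   // Attach cleanup handlers first ...
+   socket.once('error', (err) => finish(err));
+   socket.once('close', () => finish());
+
+   // ... then perform the operation that might destroy the socket.
+   socket.connect(port, host);
+   return socket;
+ }
+ ```
+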
891
+ ### Impact
892
+ - Memory leaks in edge cases (network failures, proxy chain issues)
893
+ - Connection count inaccuracy
894
+ - Potential resource exhaustion over time
895
+
896
+ ### Test Files
897
+ - `.nogit/debug/connection-manager-direct-test.ts` - Direct ConnectionManager testing showing zombie creation
@@ -0,0 +1,202 @@
1
+ # Production Connection Monitoring
2
+
3
+ This document explains how to use the ProductionConnectionMonitor to diagnose connection accumulation issues in real-time.
4
+
5
+ ## Quick Start
6
+
7
+ ```typescript
8
+ import ProductionConnectionMonitor from './.nogit/debug/production-connection-monitor.js';
9
+
10
+ // After starting your proxy
11
+ const monitor = new ProductionConnectionMonitor(proxy);
12
+ monitor.start(5000); // Check every 5 seconds
13
+
14
+ // The monitor will automatically capture diagnostics when:
15
+ // - Connections exceed 50 (default threshold)
16
+ // - Sudden spike of 20+ connections occurs
17
+ // - You manually call monitor.forceCaptureNow()
18
+ ```
19
+
20
+ ## What Gets Captured
21
+
22
+ When accumulation is detected, the monitor saves a JSON file with:
23
+
24
+ ### Connection Details
25
+ - Socket states (destroyed, readable, writable, readyState)
26
+ - Connection age and activity timestamps
27
+ - Data transfer statistics (bytes sent/received)
28
+ - Target host and port information
29
+ - Keep-alive status
30
+ - Event listener counts
31
+
32
+ ### System State
33
+ - Memory usage
34
+ - Event loop lag
35
+ - Connection count trends
36
+ - Termination statistics
37
+
38
+ ## Reading Diagnostic Files
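+ The per-connection entries look roughly like the following TypeScript shape (an approximation inferred from the JSON examples below, not a formal schema):
+
+ ```typescript
+ interface SocketStateSnapshot {
+   destroyed: boolean;
+   readable: boolean;
+   writable: boolean;
+   readyState: string;
+ }
+
+ interface ConnectionSnapshot {
+   age: number;                     // ms since the incoming socket was accepted
+   targetHost?: string;
+   targetPort?: number;
+   bytesReceived: number;
+   bytesSent: number;
+   timeSinceLastActivity: number;   // ms
+   hasReceivedInitialData: boolean;
+   hasKeepAlive: boolean;
+   connectionClosed: boolean;
+   incomingState: SocketStateSnapshot;
+   outgoingState?: SocketStateSnapshot & { exists?: boolean };
+   incomingListeners?: Record<string, number>;  // event name -> listener count
+ }
+ ```
+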
39
+
40
+ Files are saved to `.nogit/connection-diagnostics/` with names like:
41
+ ```
42
+ accumulation_2025-06-07T20-20-43-733Z_force_capture.json
43
+ ```
44
+
45
+ ### Key Fields to Check
46
+
47
+ 1. **Socket States**
48
+ ```json
49
+ "incomingState": {
50
+ "destroyed": false,
51
+ "readable": true,
52
+ "writable": true,
53
+ "readyState": "open"
54
+ }
55
+ ```
56
+ - Both destroyed = zombie connection
57
+ - One destroyed = half-zombie
58
+ - Both alive but old = potential stuck connection
59
+
60
+ 2. **Data Transfer**
61
+ ```json
62
+ "bytesReceived": 36,
63
+ "bytesSent": 0,
64
+ "timeSinceLastActivity": 60000
65
+ ```
66
+ - No bytes sent back = stuck connection
67
+ - High bytes but old = slow backend
68
+ - No activity = idle connection
69
+
70
+ 3. **Connection Flags**
71
+ ```json
72
+ "hasReceivedInitialData": false,
73
+ "hasKeepAlive": true,
74
+ "connectionClosed": false
75
+ ```
76
+ - hasReceivedInitialData=false on non-TLS = immediate routing
77
+ - hasKeepAlive=true = extended timeout applies
78
+ - connectionClosed=false = still tracked
79
+
80
+ ## Common Patterns
81
+
82
+ ### 1. Hanging Backend Pattern
83
+ ```json
84
+ {
85
+ "bytesReceived": 36,
86
+ "bytesSent": 0,
87
+ "age": 120000,
88
+ "targetHost": "backend.example.com",
89
+ "incomingState": { "destroyed": false },
90
+ "outgoingState": { "destroyed": false }
91
+ }
92
+ ```
93
+ **Fix**: The stuck connection detection (60s timeout) should clean these up.
94
+
95
+ ### 2. Zombie Connection Pattern
96
+ ```json
97
+ {
98
+ "incomingState": { "destroyed": true },
99
+ "outgoingState": { "destroyed": true },
100
+ "connectionClosed": false
101
+ }
102
+ ```
103
+ **Fix**: The zombie detection should clean these up within 30s.
104
+
105
+ ### 3. Event Listener Leak Pattern
106
+ ```json
107
+ {
108
+ "incomingListeners": {
109
+ "data": 15,
110
+ "error": 20,
111
+ "close": 18
112
+ }
113
+ }
114
+ ```
115
+ **Issue**: Event listeners accumulating, potential memory leak.
116
+
117
+ ### 4. No Outgoing Socket Pattern
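+ Listener counts can be spot-checked with the standard EventEmitter API (a generic sketch; the threshold is illustrative):
+
+ ```typescript
+ import type { EventEmitter } from 'events';
+
+ function logListenerCounts(label: string, socket: EventEmitter): void {
+   for (const event of ['data', 'error', 'close', 'end']) {
+     const count = socket.listenerCount(event);
+     if (count > 10) {
+       console.warn(`${label}: ${count} '${event}' listeners attached`);
+     }
+   }
+ }
+ ```
+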
118
+ ```json
119
+ {
120
+ "outgoingState": { "exists": false },
121
+ "connectionClosed": false,
122
+ "age": 5000
123
+ }
124
+ ```
125
+ **Issue**: Connection setup failed but cleanup didn't trigger.
126
+
127
+ ## Forcing Diagnostic Capture
128
+
129
+ To capture current state immediately:
130
+ ```typescript
131
+ monitor.forceCaptureNow();
132
+ ```
133
+
134
+ This is useful when you notice accumulation starting.
135
+
136
+ ## Automated Analysis
137
+
138
+ The monitor automatically analyzes patterns and logs:
139
+ - Zombie/half-zombie counts
140
+ - Stuck connection counts
141
+ - Old connection counts
142
+ - Memory usage
143
+ - Recommendations
144
+
145
+ ## Integration Example
146
+
147
+ ```typescript
148
+ // In your proxy startup script
149
+ import { SmartProxy } from '@push.rocks/smartproxy';
150
+ import ProductionConnectionMonitor from './production-connection-monitor.js';
151
+
152
+ async function startProxyWithMonitoring() {
153
+ const proxy = new SmartProxy({
154
+ // your config
155
+ });
156
+
157
+ await proxy.start();
158
+
159
+ // Start monitoring
160
+ const monitor = new ProductionConnectionMonitor(proxy);
161
+ monitor.start(5000);
162
+
163
+ // Optional: Capture on specific events
164
+ process.on('SIGUSR1', () => {
165
+ console.log('Manual diagnostic capture triggered');
166
+ monitor.forceCaptureNow();
167
+ });
168
+
169
+ // Graceful shutdown
170
+ process.on('SIGTERM', async () => {
171
+ monitor.stop();
172
+ await proxy.stop();
173
+ process.exit(0);
174
+ });
175
+ }
176
+ ```
177
+
178
+ ## Troubleshooting
179
+
180
+ ### Monitor Not Detecting Accumulation
181
+ - Check threshold settings (default: 50 connections)
182
+ - Reduce check interval for faster detection
183
+ - Use forceCaptureNow() to capture current state
184
+
185
+ ### Too Many False Positives
186
+ - Increase accumulation threshold
187
+ - Increase spike threshold
188
+ - Adjust check interval
189
+
190
+ ### Missing Diagnostic Data
191
+ - Ensure output directory exists and is writable
192
+ - Check disk space
193
+ - Verify process has write permissions
194
+
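+ Creating the directory up front avoids silent write failures (assumes the default `.nogit/connection-diagnostics/` path used in this guide):
+
+ ```typescript
+ import * as fs from 'fs';
+ import * as path from 'path';
+
+ const diagnosticsDir = path.join(process.cwd(), '.nogit', 'connection-diagnostics');
+ fs.mkdirSync(diagnosticsDir, { recursive: true });  // create if missing
+ fs.accessSync(diagnosticsDir, fs.constants.W_OK);   // throws if not writable
+ ```
+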
195
+ ## Next Steps
196
+
197
+ 1. Deploy the monitor to production
198
+ 2. Wait for accumulation to occur
199
+ 3. Share diagnostic files for analysis
200
+ 4. Apply targeted fixes based on patterns found
201
+
202
+ The diagnostic data will reveal the exact state of connections when accumulation occurs, enabling precise fixes for your specific scenario.
@@ -456,6 +456,74 @@ export class ConnectionManager extends LifecycleComponent {
456
456
  }
457
457
  }
458
458
 
459
+ // Also check ALL connections for zombie state (destroyed sockets but not cleaned up)
460
+ // This is critical for proxy chains where sockets can be destroyed without events
461
+ for (const [connectionId, record] of this.connectionRecords) {
462
+ if (!record.connectionClosed) {
463
+ const incomingDestroyed = record.incoming?.destroyed || false;
464
+ const outgoingDestroyed = record.outgoing?.destroyed || false;
465
+
466
+ // Check for zombie connections: both sockets destroyed but connection not cleaned up
467
+ if (incomingDestroyed && outgoingDestroyed) {
468
+ logger.log('warn', `Zombie connection detected: ${connectionId} - both sockets destroyed but not cleaned up`, {
469
+ connectionId,
470
+ remoteIP: record.remoteIP,
471
+ age: plugins.prettyMs(now - record.incomingStartTime),
472
+ component: 'connection-manager'
473
+ });
474
+
475
+ // Clean up immediately
476
+ this.cleanupConnection(record, 'zombie_cleanup');
477
+ continue;
478
+ }
479
+
480
+ // Check for half-zombie: one socket destroyed
481
+ if (incomingDestroyed || outgoingDestroyed) {
482
+ const age = now - record.incomingStartTime;
483
+ // Give it 30 seconds grace period for normal cleanup
484
+ if (age > 30000) {
485
+ logger.log('warn', `Half-zombie connection detected: ${connectionId} - ${incomingDestroyed ? 'incoming' : 'outgoing'} destroyed`, {
486
+ connectionId,
487
+ remoteIP: record.remoteIP,
488
+ age: plugins.prettyMs(age),
489
+ incomingDestroyed,
490
+ outgoingDestroyed,
491
+ component: 'connection-manager'
492
+ });
493
+
494
+ // Clean up
495
+ this.cleanupConnection(record, 'half_zombie_cleanup');
496
+ }
497
+ }
498
+
499
+ // Check for stuck connections: no data sent back to client
500
+ if (!record.connectionClosed && record.outgoing && record.bytesReceived > 0 && record.bytesSent === 0) {
501
+ const age = now - record.incomingStartTime;
502
+ // If connection is older than 60 seconds and no data sent back, likely stuck
503
+ if (age > 60000) {
504
+ logger.log('warn', `Stuck connection detected: ${connectionId} - received ${record.bytesReceived} bytes but sent 0 bytes`, {
505
+ connectionId,
506
+ remoteIP: record.remoteIP,
507
+ age: plugins.prettyMs(age),
508
+ bytesReceived: record.bytesReceived,
509
+ targetHost: record.targetHost,
510
+ targetPort: record.targetPort,
511
+ component: 'connection-manager'
512
+ });
513
+
514
+ // Set termination reason and increment stats
515
+ if (record.incomingTerminationReason == null) {
516
+ record.incomingTerminationReason = 'stuck_no_response';
517
+ this.incrementTerminationStat('incoming', 'stuck_no_response');
518
+ }
519
+
520
+ // Clean up
521
+ this.cleanupConnection(record, 'stuck_no_response');
522
+ }
523
+ }
524
+ }
525
+ }
526
+
459
527
  // Process only connections that need checking
460
528
  for (const connectionId of connectionsToCheck) {
461
529
  const record = this.connectionRecords.get(connectionId);