@push.rocks/smartproxy 19.5.24 → 19.5.25

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "@push.rocks/smartproxy",
- "version": "19.5.24",
+ "version": "19.5.25",
  "private": false,
  "description": "A powerful proxy package with unified route-based configuration for high traffic management. Features include SSL/TLS support, flexible routing patterns, WebSocket handling, advanced security options, and automatic ACME certificate management.",
  "main": "dist_ts/index.js",
@@ -548,4 +548,129 @@ Debug scripts confirmed:
  - The zombie detection successfully identifies and cleans up these connections
  - Both full zombies (both sockets destroyed) and half-zombies (one socket destroyed) are handled

- This fix addresses the specific issue where "connections that are closed on the inner proxy, always also close on the outer proxy" as requested by the user.
+ This fix addresses the specific issue where "connections that are closed on the inner proxy, always also close on the outer proxy" as requested by the user.
+
+ ## 🔍 Production Diagnostics (January 2025)
+
+ Since the zombie detection fix didn't fully resolve the issue, use the ProductionConnectionMonitor to diagnose the actual problem:
+
+ ### How to Use the Production Monitor
+
+ 1. **Add to your proxy startup script**:
+    ```typescript
+    import ProductionConnectionMonitor from './production-connection-monitor.js';
+
+    // After proxy.start()
+    const monitor = new ProductionConnectionMonitor(proxy);
+    monitor.start(5000); // Check every 5 seconds
+
+    // Monitor will automatically capture diagnostics when:
+    // - Connections exceed threshold (default: 50)
+    // - Sudden spike occurs (default: +20 connections)
+    ```
+
+ 2. **Diagnostics are saved to**: `.nogit/connection-diagnostics/`
+
+ 3. **Force capture anytime**: `monitor.forceCaptureNow()`
+
+ ### What the Monitor Captures
+
+ For each connection:
+ - Socket states (destroyed, readable, writable, readyState)
+ - Connection flags (closed, keepAlive, TLS status)
+ - Data transfer statistics
+ - Time since last activity
+ - Cleanup queue status
+ - Event listener counts
+ - Termination reasons
+
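+ The fields above map onto entries in the diagnostic JSON. As a rough orientation, a single captured connection looks something like the following; the interface is an illustrative sketch assembled from the samples in this document, not the monitor's exact output type:
+
+ ```typescript
+ // Illustrative shape of one captured connection entry (field names follow the
+ // JSON samples shown in this documentation; the real output may differ).
+ interface ConnectionSnapshotEntry {
+   connectionId: string;
+   remoteIP: string;
+   age: number;                   // ms since the incoming connection was accepted
+   timeSinceLastActivity: number; // ms since data last flowed in either direction
+   bytesReceived: number;
+   bytesSent: number;
+   targetHost?: string;
+   targetPort?: number;
+   hasKeepAlive: boolean;
+   hasReceivedInitialData: boolean;
+   connectionClosed: boolean;
+   incomingState: { destroyed: boolean; readable: boolean; writable: boolean; readyState: string };
+   outgoingState?: { exists: boolean; destroyed: boolean; connecting: boolean };
+   incomingListeners?: Record<string, number>; // event name -> listener count
+   terminationReason?: string;
+ }
+ ```
+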
+ ### Pattern Analysis
+
+ The monitor automatically identifies:
+ - **Zombie connections**: Both sockets destroyed but not cleaned up
+ - **Half-zombies**: One socket destroyed
+ - **Stuck connecting**: Outgoing socket stuck in connecting state
+ - **No outgoing**: Missing outgoing socket
+ - **Keep-alive stuck**: Keep-alive connections with no recent activity
+ - **Old connections**: Connections older than 1 hour
+ - **No data transfer**: Connections with no bytes transferred
+ - **Listener leaks**: Excessive event listeners
+
+ ### Common Accumulation Patterns
+
+ 1. **Connecting State Stuck**
+    - Outgoing socket shows `connecting: true` indefinitely
+    - Usually means connection timeout not working
+    - Check if backend is reachable
+
+ 2. **Missing Outgoing Socket**
+    - Connection has no outgoing socket but isn't closed
+    - May indicate immediate routing issues
+    - Check error logs during connection setup
+
+ 3. **Event Listener Accumulation**
+    - High listener counts (>20) on sockets
+    - Indicates cleanup not removing all listeners
+    - Can cause memory leaks
+
+ 4. **Keep-Alive Zombies**
+    - Keep-alive connections not timing out
+    - Check keepAlive timeout settings
+    - May need more aggressive cleanup
+
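+ When verifying one of these patterns by hand (in a debugger, a REPL, or a small probe script), the relevant signals can be read directly off the Node.js sockets held by a connection record. A minimal sketch using standard `net.Socket` properties follows; the helper name and thresholds are only examples:
+
+ ```typescript
+ import type * as net from 'net';
+
+ // Illustrative helper: summarize the signals behind the accumulation patterns
+ // above for one tracked connection. `incoming`/`outgoing` are assumed to be the
+ // client-side and backend-side net.Socket instances of a connection record.
+ function describeSockets(incoming: net.Socket, outgoing?: net.Socket) {
+   return {
+     // Pattern 1: outgoing socket stuck in the connecting phase
+     outgoingStuckConnecting: outgoing?.connecting ?? false,
+     // Pattern 2: no outgoing socket at all
+     missingOutgoing: !outgoing,
+     // Pattern 3: event listener accumulation (>20 is the heuristic used above)
+     incomingListenerCounts: {
+       data: incoming.listenerCount('data'),
+       error: incoming.listenerCount('error'),
+       close: incoming.listenerCount('close'),
+     },
+     // Zombie / half-zombie signals
+     incomingDestroyed: incoming.destroyed,
+     outgoingDestroyed: outgoing?.destroyed ?? false,
+   };
+ }
+ ```
+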
+ ### Next Steps
+
+ 1. **Run the monitor in production** during accumulation
+ 2. **Share the diagnostic files** from `.nogit/connection-diagnostics/`
+ 3. **Look for patterns** in the captured snapshots
+ 4. **Check specific connection IDs** that accumulate
+
+ The diagnostic files will show exactly what state connections are in when accumulation occurs, allowing targeted fixes for the specific issue.
+
+ ## ✅ FIXED: Stuck Connection Detection (January 2025)
+
+ ### Additional Root Cause Found
+ Connections to hanging backends (that accept but never respond) were not being cleaned up because:
+ - Both sockets remain alive (not destroyed)
+ - Keep-alive prevents normal timeout
+ - No data is sent back to the client despite receiving data
+ - These don't qualify as "zombies" since sockets aren't destroyed
+
+ ### Fix Implemented
+ Added stuck connection detection to the periodic inactivity check:
+
+ ```typescript
+ // Check for stuck connections: no data sent back to client
+ if (!record.connectionClosed && record.outgoing && record.bytesReceived > 0 && record.bytesSent === 0) {
+   const age = now - record.incomingStartTime;
+   // If connection is older than 60 seconds and no data sent back, likely stuck
+   if (age > 60000) {
+     logger.log('warn', `Stuck connection detected: ${connectionId} - received ${record.bytesReceived} bytes but sent 0 bytes`, {
+       connectionId,
+       remoteIP: record.remoteIP,
+       age: plugins.prettyMs(age),
+       bytesReceived: record.bytesReceived,
+       targetHost: record.targetHost,
+       targetPort: record.targetPort,
+       component: 'connection-manager'
+     });
+
+     // Clean up
+     this.cleanupConnection(record, 'stuck_no_response');
+   }
+ }
+ ```
+
+ ### What This Fixes
+ - Connections to backends that accept but never respond
+ - Proxy chains where inner proxy connects to unresponsive services
+ - Scenarios where keep-alive prevents normal timeout mechanisms
+ - Connections that receive client data but never send anything back
+
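+ For reference, this failure mode is easy to reproduce locally. The sketch below is an illustrative test helper, not part of the package, and the port is arbitrary; it starts a TCP backend that accepts connections and reads data but never replies, producing exactly the `bytesReceived > 0 && bytesSent === 0` state the 60-second check terminates:
+
+ ```typescript
+ import * as net from 'net';
+
+ // Hanging backend for testing: accepts connections, consumes incoming data,
+ // and intentionally never writes a response or closes the socket.
+ const hangingBackend = net.createServer((socket) => {
+   socket.on('data', () => {
+     // Swallow the request and keep the connection open.
+   });
+ });
+
+ hangingBackend.listen(4000, () => {
+   console.log('Hanging backend listening on port 4000');
+ });
+ ```
+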
+ ### Detection Criteria
+ - Connection has received bytes from client (`bytesReceived > 0`)
+ - No bytes sent back to client (`bytesSent === 0`)
+ - Connection is older than 60 seconds
+ - Both sockets are still alive (not destroyed)
+
+ This complements the zombie detection by handling cases where sockets remain technically alive but the connection is effectively dead.
@@ -0,0 +1,202 @@
+ # Production Connection Monitoring
+
+ This document explains how to use the ProductionConnectionMonitor to diagnose connection accumulation issues in real-time.
+
+ ## Quick Start
+
+ ```typescript
+ import ProductionConnectionMonitor from './.nogit/debug/production-connection-monitor.js';
+
+ // After starting your proxy
+ const monitor = new ProductionConnectionMonitor(proxy);
+ monitor.start(5000); // Check every 5 seconds
+
+ // The monitor will automatically capture diagnostics when:
+ // - Connections exceed 50 (default threshold)
+ // - Sudden spike of 20+ connections occurs
+ // - You manually call monitor.forceCaptureNow()
+ ```
+
+ ## What Gets Captured
+
+ When accumulation is detected, the monitor saves a JSON file with:
+
+ ### Connection Details
+ - Socket states (destroyed, readable, writable, readyState)
+ - Connection age and activity timestamps
+ - Data transfer statistics (bytes sent/received)
+ - Target host and port information
+ - Keep-alive status
+ - Event listener counts
+
+ ### System State
+ - Memory usage
+ - Event loop lag
+ - Connection count trends
+ - Termination statistics
+
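+ The system-state figures come from standard Node.js introspection. As a rough sketch of how such numbers can be sampled (not necessarily how the monitor implements it):
+
+ ```typescript
+ // Illustrative sampling of memory usage and event loop lag in plain Node.js.
+ const mem = process.memoryUsage();
+ console.log('heapUsed (MB):', (mem.heapUsed / 1024 / 1024).toFixed(1));
+
+ // Event loop lag: schedule a timer and measure how late it actually fires.
+ const scheduledAt = Date.now();
+ setTimeout(() => {
+   const lagMs = Date.now() - scheduledAt - 100;
+   console.log('event loop lag (ms):', Math.max(0, lagMs));
+ }, 100);
+ ```
+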
+ ## Reading Diagnostic Files
+
+ Files are saved to `.nogit/connection-diagnostics/` with names like:
+ ```
+ accumulation_2025-06-07T20-20-43-733Z_force_capture.json
+ ```
+
+ ### Key Fields to Check
+
+ 1. **Socket States**
+    ```json
+    "incomingState": {
+      "destroyed": false,
+      "readable": true,
+      "writable": true,
+      "readyState": "open"
+    }
+    ```
+    - Both destroyed = zombie connection
+    - One destroyed = half-zombie
+    - Both alive but old = potential stuck connection
+
+ 2. **Data Transfer**
+    ```json
+    "bytesReceived": 36,
+    "bytesSent": 0,
+    "timeSinceLastActivity": 60000
+    ```
+    - No bytes sent back = stuck connection
+    - High bytes but old = slow backend
+    - No activity = idle connection
+
+ 3. **Connection Flags**
+    ```json
+    "hasReceivedInitialData": false,
+    "hasKeepAlive": true,
+    "connectionClosed": false
+    ```
+    - hasReceivedInitialData=false on non-TLS = immediate routing
+    - hasKeepAlive=true = extended timeout applies
+    - connectionClosed=false = still tracked
+
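+ To scan a captured file programmatically instead of by eye, a small script along these lines can flag the states described above. It assumes the snapshot exposes an array of connection entries shaped like the fragments in this section; adjust the extraction to the actual file layout:
+
+ ```typescript
+ import { readFileSync } from 'fs';
+
+ // Illustrative classifier for diagnostic snapshots; the top-level layout of the
+ // file is an assumption, so adapt the `connections` extraction as needed.
+ const raw = JSON.parse(readFileSync(process.argv[2], 'utf8'));
+ const connections: any[] = Array.isArray(raw) ? raw : raw.connections ?? [];
+
+ for (const c of connections) {
+   const inDead = c.incomingState?.destroyed === true;
+   const outDead = c.outgoingState?.destroyed === true;
+
+   if (inDead && outDead && !c.connectionClosed) {
+     console.log(c.connectionId, 'zombie: both sockets destroyed but still tracked');
+   } else if (inDead !== outDead) {
+     console.log(c.connectionId, 'half-zombie: one socket destroyed');
+   } else if (c.bytesReceived > 0 && c.bytesSent === 0 && c.age > 60_000) {
+     console.log(c.connectionId, 'stuck: received data but nothing sent back');
+   }
+ }
+ ```
+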
+ ## Common Patterns
+
+ ### 1. Hanging Backend Pattern
+ ```json
+ {
+   "bytesReceived": 36,
+   "bytesSent": 0,
+   "age": 120000,
+   "targetHost": "backend.example.com",
+   "incomingState": { "destroyed": false },
+   "outgoingState": { "destroyed": false }
+ }
+ ```
+ **Fix**: The stuck connection detection (60s timeout) should clean these up.
+
+ ### 2. Zombie Connection Pattern
+ ```json
+ {
+   "incomingState": { "destroyed": true },
+   "outgoingState": { "destroyed": true },
+   "connectionClosed": false
+ }
+ ```
+ **Fix**: The zombie detection should clean these up within 30s.
+
+ ### 3. Event Listener Leak Pattern
+ ```json
+ {
+   "incomingListeners": {
+     "data": 15,
+     "error": 20,
+     "close": 18
+   }
+ }
+ ```
+ **Issue**: Event listeners accumulating, potential memory leak.
+
+ ### 4. No Outgoing Socket Pattern
+ ```json
+ {
+   "outgoingState": { "exists": false },
+   "connectionClosed": false,
+   "age": 5000
+ }
+ ```
+ **Issue**: Connection setup failed but cleanup didn't trigger.
+
+ ## Forcing Diagnostic Capture
+
+ To capture current state immediately:
+ ```typescript
+ monitor.forceCaptureNow();
+ ```
+
+ This is useful when you notice accumulation starting.
+
+ ## Automated Analysis
+
+ The monitor automatically analyzes patterns and logs:
+ - Zombie/half-zombie counts
+ - Stuck connection counts
+ - Old connection counts
+ - Memory usage
+ - Recommendations
+
+ ## Integration Example
+
+ ```typescript
+ // In your proxy startup script
+ import { SmartProxy } from '@push.rocks/smartproxy';
+ import ProductionConnectionMonitor from './production-connection-monitor.js';
+
+ async function startProxyWithMonitoring() {
+   const proxy = new SmartProxy({
+     // your config
+   });
+
+   await proxy.start();
+
+   // Start monitoring
+   const monitor = new ProductionConnectionMonitor(proxy);
+   monitor.start(5000);
+
+   // Optional: Capture on specific events
+   process.on('SIGUSR1', () => {
+     console.log('Manual diagnostic capture triggered');
+     monitor.forceCaptureNow();
+   });
+
+   // Graceful shutdown
+   process.on('SIGTERM', async () => {
+     monitor.stop();
+     await proxy.stop();
+     process.exit(0);
+   });
+ }
+ ```
+
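+ With the `SIGUSR1` handler above in place, a capture can be triggered from the shell while the proxy is running, e.g. `kill -USR1 <proxy pid>`.
+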
+ ## Troubleshooting
+
+ ### Monitor Not Detecting Accumulation
+ - Check threshold settings (default: 50 connections)
+ - Reduce check interval for faster detection
+ - Use forceCaptureNow() to capture current state
+
+ ### Too Many False Positives
+ - Increase accumulation threshold
+ - Increase spike threshold
+ - Adjust check interval
+
+ ### Missing Diagnostic Data
+ - Ensure output directory exists and is writable
+ - Check disk space
+ - Verify process has write permissions
+
+ ## Next Steps
+
+ 1. Deploy the monitor to production
+ 2. Wait for accumulation to occur
+ 3. Share diagnostic files for analysis
+ 4. Apply targeted fixes based on patterns found
+
+ The diagnostic data will reveal the exact state of connections when accumulation occurs, enabling precise fixes for your specific scenario.
@@ -495,6 +495,32 @@ export class ConnectionManager extends LifecycleComponent {
  this.cleanupConnection(record, 'half_zombie_cleanup');
  }
  }
+
+ // Check for stuck connections: no data sent back to client
+ if (!record.connectionClosed && record.outgoing && record.bytesReceived > 0 && record.bytesSent === 0) {
+   const age = now - record.incomingStartTime;
+   // If connection is older than 60 seconds and no data sent back, likely stuck
+   if (age > 60000) {
+     logger.log('warn', `Stuck connection detected: ${connectionId} - received ${record.bytesReceived} bytes but sent 0 bytes`, {
+       connectionId,
+       remoteIP: record.remoteIP,
+       age: plugins.prettyMs(age),
+       bytesReceived: record.bytesReceived,
+       targetHost: record.targetHost,
+       targetPort: record.targetPort,
+       component: 'connection-manager'
+     });
+
+     // Set termination reason and increment stats
+     if (record.incomingTerminationReason == null) {
+       record.incomingTerminationReason = 'stuck_no_response';
+       this.incrementTerminationStat('incoming', 'stuck_no_response');
+     }
+
+     // Clean up
+     this.cleanupConnection(record, 'stuck_no_response');
+   }
+ }
  }
  }
