@push.rocks/smartproxy 19.5.24 → 19.5.25

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "@push.rocks/smartproxy",
- "version": "19.5.24",
+ "version": "19.5.25",
  "private": false,
  "description": "A powerful proxy package with unified route-based configuration for high traffic management. Features include SSL/TLS support, flexible routing patterns, WebSocket handling, advanced security options, and automatic ACME certificate management.",
  "main": "dist_ts/index.js",
@@ -548,4 +548,129 @@ Debug scripts confirmed:
  - The zombie detection successfully identifies and cleans up these connections
  - Both full zombies (both sockets destroyed) and half-zombies (one socket destroyed) are handled

- This fix addresses the specific issue where "connections that are closed on the inner proxy, always also close on the outer proxy" as requested by the user.
+ This fix addresses the specific issue where "connections that are closed on the inner proxy, always also close on the outer proxy" as requested by the user.
+
+ ## 🔍 Production Diagnostics (January 2025)
+
+ Since the zombie detection fix didn't fully resolve the issue, use the ProductionConnectionMonitor to diagnose the actual problem:
+
+ ### How to Use the Production Monitor
+
+ 1. **Add to your proxy startup script**:
+    ```typescript
+    import ProductionConnectionMonitor from './production-connection-monitor.js';
+
+    // After proxy.start()
+    const monitor = new ProductionConnectionMonitor(proxy);
+    monitor.start(5000); // Check every 5 seconds
+
+    // Monitor will automatically capture diagnostics when:
+    // - Connections exceed threshold (default: 50)
+    // - Sudden spike occurs (default: +20 connections)
+    ```
+
+ 2. **Diagnostics are saved to**: `.nogit/connection-diagnostics/`
+
+ 3. **Force capture anytime**: `monitor.forceCaptureNow()`
+
+ ### What the Monitor Captures
+
+ For each connection:
+ - Socket states (destroyed, readable, writable, readyState)
+ - Connection flags (closed, keepAlive, TLS status)
+ - Data transfer statistics
+ - Time since last activity
+ - Cleanup queue status
+ - Event listener counts
+ - Termination reasons
+
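+ The fields above map onto entries in the diagnostic JSON. As a rough orientation, a single captured connection looks something like the following; the interface is an illustrative sketch assembled from the samples in this document, not the monitor's exact output type:
+
+ ```typescript
+ // Illustrative shape of one captured connection entry (field names follow the
+ // JSON samples shown in this documentation; the real output may differ).
+ interface ConnectionSnapshotEntry {
+   connectionId: string;
+   remoteIP: string;
+   age: number;                   // ms since the incoming connection was accepted
+   timeSinceLastActivity: number; // ms since data last flowed in either direction
+   bytesReceived: number;
+   bytesSent: number;
+   targetHost?: string;
+   targetPort?: number;
+   hasKeepAlive: boolean;
+   hasReceivedInitialData: boolean;
+   connectionClosed: boolean;
+   incomingState: { destroyed: boolean; readable: boolean; writable: boolean; readyState: string };
+   outgoingState?: { exists: boolean; destroyed: boolean; connecting: boolean };
+   incomingListeners?: Record<string, number>; // event name -> listener count
+   terminationReason?: string;
+ }
+ ```
+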
+ ### Pattern Analysis
+
+ The monitor automatically identifies:
+ - **Zombie connections**: Both sockets destroyed but not cleaned up
+ - **Half-zombies**: One socket destroyed
+ - **Stuck connecting**: Outgoing socket stuck in connecting state
+ - **No outgoing**: Missing outgoing socket
+ - **Keep-alive stuck**: Keep-alive connections with no recent activity
+ - **Old connections**: Connections older than 1 hour
+ - **No data transfer**: Connections with no bytes transferred
+ - **Listener leaks**: Excessive event listeners
+
+ ### Common Accumulation Patterns
+
+ 1. **Connecting State Stuck**
+    - Outgoing socket shows `connecting: true` indefinitely
+    - Usually means connection timeout not working
+    - Check if backend is reachable
+
+ 2. **Missing Outgoing Socket**
+    - Connection has no outgoing socket but isn't closed
+    - May indicate immediate routing issues
+    - Check error logs during connection setup
+
+ 3. **Event Listener Accumulation**
+    - High listener counts (>20) on sockets
+    - Indicates cleanup not removing all listeners
+    - Can cause memory leaks
+
+ 4. **Keep-Alive Zombies**
+    - Keep-alive connections not timing out
+    - Check keepAlive timeout settings
+    - May need more aggressive cleanup
+
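+ When verifying one of these patterns by hand (in a debugger, a REPL, or a small probe script), the relevant signals can be read directly off the Node.js sockets held by a connection record. A minimal sketch using standard `net.Socket` properties follows; the helper name and thresholds are only examples:
+
+ ```typescript
+ import type * as net from 'net';
+
+ // Illustrative helper: summarize the signals behind the accumulation patterns
+ // above for one tracked connection. `incoming`/`outgoing` are assumed to be the
+ // client-side and backend-side net.Socket instances of a connection record.
+ function describeSockets(incoming: net.Socket, outgoing?: net.Socket) {
+   return {
+     // Pattern 1: outgoing socket stuck in the connecting phase
+     outgoingStuckConnecting: outgoing?.connecting ?? false,
+     // Pattern 2: no outgoing socket at all
+     missingOutgoing: !outgoing,
+     // Pattern 3: event listener accumulation (>20 is the heuristic used above)
+     incomingListenerCounts: {
+       data: incoming.listenerCount('data'),
+       error: incoming.listenerCount('error'),
+       close: incoming.listenerCount('close'),
+     },
+     // Zombie / half-zombie signals
+     incomingDestroyed: incoming.destroyed,
+     outgoingDestroyed: outgoing?.destroyed ?? false,
+   };
+ }
+ ```
+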
+ ### Next Steps
+
+ 1. **Run the monitor in production** during accumulation
+ 2. **Share the diagnostic files** from `.nogit/connection-diagnostics/`
+ 3. **Look for patterns** in the captured snapshots
+ 4. **Check specific connection IDs** that accumulate
+
+ The diagnostic files will show exactly what state connections are in when accumulation occurs, allowing targeted fixes for the specific issue.
+
+ ## ✅ FIXED: Stuck Connection Detection (January 2025)
+
+ ### Additional Root Cause Found
+ Connections to hanging backends (that accept but never respond) were not being cleaned up because:
+ - Both sockets remain alive (not destroyed)
+ - Keep-alive prevents normal timeout
+ - No data is sent back to the client despite receiving data
+ - These don't qualify as "zombies" since sockets aren't destroyed
+
+ ### Fix Implemented
+ Added stuck connection detection to the periodic inactivity check:
+
+ ```typescript
+ // Check for stuck connections: no data sent back to client
+ if (!record.connectionClosed && record.outgoing && record.bytesReceived > 0 && record.bytesSent === 0) {
+   const age = now - record.incomingStartTime;
+   // If connection is older than 60 seconds and no data sent back, likely stuck
+   if (age > 60000) {
+     logger.log('warn', `Stuck connection detected: ${connectionId} - received ${record.bytesReceived} bytes but sent 0 bytes`, {
+       connectionId,
+       remoteIP: record.remoteIP,
+       age: plugins.prettyMs(age),
+       bytesReceived: record.bytesReceived,
+       targetHost: record.targetHost,
+       targetPort: record.targetPort,
+       component: 'connection-manager'
+     });
+
+     // Clean up
+     this.cleanupConnection(record, 'stuck_no_response');
+   }
+ }
+ ```
+
+ ### What This Fixes
+ - Connections to backends that accept but never respond
+ - Proxy chains where inner proxy connects to unresponsive services
+ - Scenarios where keep-alive prevents normal timeout mechanisms
+ - Connections that receive client data but never send anything back
+
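+ For reference, this failure mode is easy to reproduce locally. The sketch below is an illustrative test helper, not part of the package, and the port is arbitrary; it starts a TCP backend that accepts connections and reads data but never replies, producing exactly the `bytesReceived > 0 && bytesSent === 0` state the 60-second check terminates:
+
+ ```typescript
+ import * as net from 'net';
+
+ // Hanging backend for testing: accepts connections, consumes incoming data,
+ // and intentionally never writes a response or closes the socket.
+ const hangingBackend = net.createServer((socket) => {
+   socket.on('data', () => {
+     // Swallow the request and keep the connection open.
+   });
+ });
+
+ hangingBackend.listen(4000, () => {
+   console.log('Hanging backend listening on port 4000');
+ });
+ ```
+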
+ ### Detection Criteria
+ - Connection has received bytes from client (`bytesReceived > 0`)
+ - No bytes sent back to client (`bytesSent === 0`)
+ - Connection is older than 60 seconds
+ - Both sockets are still alive (not destroyed)
+
+ This complements the zombie detection by handling cases where sockets remain technically alive but the connection is effectively dead.
@@ -0,0 +1,202 @@
+ # Production Connection Monitoring
+
+ This document explains how to use the ProductionConnectionMonitor to diagnose connection accumulation issues in real-time.
+
+ ## Quick Start
+
+ ```typescript
+ import ProductionConnectionMonitor from './.nogit/debug/production-connection-monitor.js';
+
+ // After starting your proxy
+ const monitor = new ProductionConnectionMonitor(proxy);
+ monitor.start(5000); // Check every 5 seconds
+
+ // The monitor will automatically capture diagnostics when:
+ // - Connections exceed 50 (default threshold)
+ // - Sudden spike of 20+ connections occurs
+ // - You manually call monitor.forceCaptureNow()
+ ```
+
+ ## What Gets Captured
+
+ When accumulation is detected, the monitor saves a JSON file with:
+
+ ### Connection Details
+ - Socket states (destroyed, readable, writable, readyState)
+ - Connection age and activity timestamps
+ - Data transfer statistics (bytes sent/received)
+ - Target host and port information
+ - Keep-alive status
+ - Event listener counts
+
+ ### System State
+ - Memory usage
+ - Event loop lag
+ - Connection count trends
+ - Termination statistics
+
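+ The system-state figures come from standard Node.js introspection. As a rough sketch of how such numbers can be sampled (not necessarily how the monitor implements it):
+
+ ```typescript
+ // Illustrative sampling of memory usage and event loop lag in plain Node.js.
+ const mem = process.memoryUsage();
+ console.log('heapUsed (MB):', (mem.heapUsed / 1024 / 1024).toFixed(1));
+
+ // Event loop lag: schedule a timer and measure how late it actually fires.
+ const scheduledAt = Date.now();
+ setTimeout(() => {
+   const lagMs = Date.now() - scheduledAt - 100;
+   console.log('event loop lag (ms):', Math.max(0, lagMs));
+ }, 100);
+ ```
+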
+ ## Reading Diagnostic Files
+
+ Files are saved to `.nogit/connection-diagnostics/` with names like:
+ ```
+ accumulation_2025-06-07T20-20-43-733Z_force_capture.json
+ ```
+
+ ### Key Fields to Check
+
+ 1. **Socket States**
+    ```json
+    "incomingState": {
+      "destroyed": false,
+      "readable": true,
+      "writable": true,
+      "readyState": "open"
+    }
+    ```
+    - Both destroyed = zombie connection
+    - One destroyed = half-zombie
+    - Both alive but old = potential stuck connection
+
+ 2. **Data Transfer**
+    ```json
+    "bytesReceived": 36,
+    "bytesSent": 0,
+    "timeSinceLastActivity": 60000
+    ```
+    - No bytes sent back = stuck connection
+    - High bytes but old = slow backend
+    - No activity = idle connection
+
+ 3. **Connection Flags**
+    ```json
+    "hasReceivedInitialData": false,
+    "hasKeepAlive": true,
+    "connectionClosed": false
+    ```
+    - hasReceivedInitialData=false on non-TLS = immediate routing
+    - hasKeepAlive=true = extended timeout applies
+    - connectionClosed=false = still tracked
+
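+ To scan a captured file programmatically instead of by eye, a small script along these lines can flag the states described above. It assumes the snapshot exposes an array of connection entries shaped like the fragments in this section; adjust the extraction to the actual file layout:
+
+ ```typescript
+ import { readFileSync } from 'fs';
+
+ // Illustrative classifier for diagnostic snapshots; the top-level layout of the
+ // file is an assumption, so adapt the `connections` extraction as needed.
+ const raw = JSON.parse(readFileSync(process.argv[2], 'utf8'));
+ const connections: any[] = Array.isArray(raw) ? raw : raw.connections ?? [];
+
+ for (const c of connections) {
+   const inDead = c.incomingState?.destroyed === true;
+   const outDead = c.outgoingState?.destroyed === true;
+
+   if (inDead && outDead && !c.connectionClosed) {
+     console.log(c.connectionId, 'zombie: both sockets destroyed but still tracked');
+   } else if (inDead !== outDead) {
+     console.log(c.connectionId, 'half-zombie: one socket destroyed');
+   } else if (c.bytesReceived > 0 && c.bytesSent === 0 && c.age > 60_000) {
+     console.log(c.connectionId, 'stuck: received data but nothing sent back');
+   }
+ }
+ ```
+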
+ ## Common Patterns
+
+ ### 1. Hanging Backend Pattern
+ ```json
+ {
+   "bytesReceived": 36,
+   "bytesSent": 0,
+   "age": 120000,
+   "targetHost": "backend.example.com",
+   "incomingState": { "destroyed": false },
+   "outgoingState": { "destroyed": false }
+ }
+ ```
+ **Fix**: The stuck connection detection (60s timeout) should clean these up.
+
+ ### 2. Zombie Connection Pattern
+ ```json
+ {
+   "incomingState": { "destroyed": true },
+   "outgoingState": { "destroyed": true },
+   "connectionClosed": false
+ }
+ ```
+ **Fix**: The zombie detection should clean these up within 30s.
+
+ ### 3. Event Listener Leak Pattern
+ ```json
+ {
+   "incomingListeners": {
+     "data": 15,
+     "error": 20,
+     "close": 18
+   }
+ }
+ ```
+ **Issue**: Event listeners accumulating, potential memory leak.
+
+ ### 4. No Outgoing Socket Pattern
+ ```json
+ {
+   "outgoingState": { "exists": false },
+   "connectionClosed": false,
+   "age": 5000
+ }
+ ```
+ **Issue**: Connection setup failed but cleanup didn't trigger.
+
+ ## Forcing Diagnostic Capture
+
+ To capture current state immediately:
+ ```typescript
+ monitor.forceCaptureNow();
+ ```
+
+ This is useful when you notice accumulation starting.
+
+ ## Automated Analysis
+
+ The monitor automatically analyzes patterns and logs:
+ - Zombie/half-zombie counts
+ - Stuck connection counts
+ - Old connection counts
+ - Memory usage
+ - Recommendations
+
+ ## Integration Example
+
+ ```typescript
+ // In your proxy startup script
+ import { SmartProxy } from '@push.rocks/smartproxy';
+ import ProductionConnectionMonitor from './production-connection-monitor.js';
+
+ async function startProxyWithMonitoring() {
+   const proxy = new SmartProxy({
+     // your config
+   });
+
+   await proxy.start();
+
+   // Start monitoring
+   const monitor = new ProductionConnectionMonitor(proxy);
+   monitor.start(5000);
+
+   // Optional: Capture on specific events
+   process.on('SIGUSR1', () => {
+     console.log('Manual diagnostic capture triggered');
+     monitor.forceCaptureNow();
+   });
+
+   // Graceful shutdown
+   process.on('SIGTERM', async () => {
+     monitor.stop();
+     await proxy.stop();
+     process.exit(0);
+   });
+ }
+ ```
+
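+ With the `SIGUSR1` handler above in place, a capture can be triggered from the shell while the proxy is running, e.g. `kill -USR1 <proxy pid>`.
+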
+ ## Troubleshooting
+
+ ### Monitor Not Detecting Accumulation
+ - Check threshold settings (default: 50 connections)
+ - Reduce check interval for faster detection
+ - Use forceCaptureNow() to capture current state
+
+ ### Too Many False Positives
+ - Increase accumulation threshold
+ - Increase spike threshold
+ - Adjust check interval
+
+ ### Missing Diagnostic Data
+ - Ensure output directory exists and is writable
+ - Check disk space
+ - Verify process has write permissions
+
+ ## Next Steps
+
+ 1. Deploy the monitor to production
+ 2. Wait for accumulation to occur
+ 3. Share diagnostic files for analysis
+ 4. Apply targeted fixes based on patterns found
+
+ The diagnostic data will reveal the exact state of connections when accumulation occurs, enabling precise fixes for your specific scenario.
@@ -495,6 +495,32 @@ export class ConnectionManager extends LifecycleComponent {
  this.cleanupConnection(record, 'half_zombie_cleanup');
  }
  }
+
+ // Check for stuck connections: no data sent back to client
+ if (!record.connectionClosed && record.outgoing && record.bytesReceived > 0 && record.bytesSent === 0) {
+   const age = now - record.incomingStartTime;
+   // If connection is older than 60 seconds and no data sent back, likely stuck
+   if (age > 60000) {
+     logger.log('warn', `Stuck connection detected: ${connectionId} - received ${record.bytesReceived} bytes but sent 0 bytes`, {
+       connectionId,
+       remoteIP: record.remoteIP,
+       age: plugins.prettyMs(age),
+       bytesReceived: record.bytesReceived,
+       targetHost: record.targetHost,
+       targetPort: record.targetPort,
+       component: 'connection-manager'
+     });
+
+     // Set termination reason and increment stats
+     if (record.incomingTerminationReason == null) {
+       record.incomingTerminationReason = 'stuck_no_response';
+       this.incrementTerminationStat('incoming', 'stuck_no_response');
+     }
+
+     // Clean up
+     this.cleanupConnection(record, 'stuck_no_response');
+   }
+ }
  }
  }
