@push.rocks/smartproxy 19.5.23 → 19.5.24

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "@push.rocks/smartproxy",
-   "version": "19.5.23",
+   "version": "19.5.24",
    "private": false,
    "description": "A powerful proxy package with unified route-based configuration for high traffic management. Features include SSL/TLS support, flexible routing patterns, WebSocket handling, advanced security options, and automatic ACME certificate management.",
    "main": "dist_ts/index.js",
@@ -372,4 +372,180 @@ The connection cleanup mechanisms have been significantly improved in v19.5.20:
  2. Immediate routing cleanup handler always destroys outgoing connections
  3. Tests confirm no accumulation in standard scenarios with reachable backends

- However, the missing connection establishment timeout causes accumulation when backends are unreachable or very slow to connect.
+ However, the missing connection establishment timeout causes accumulation when backends are unreachable or very slow to connect.
+
+ ### Outer Proxy Sudden Accumulation After Hours
+
+ **User Report**: "The counter goes up suddenly after some hours on the outer proxy"
+
+ **Investigation Findings**:
+
+ 1. **Cleanup Queue Mechanism** (a minimal sketch follows):
+    - Connections are cleaned up in batches of 100 via a queue
+    - If the cleanup timer gets stuck or is cleared without being restarted, connections accumulate
+    - The timer is set with `setTimeout` and can be delayed by event loop blocking
+
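+ To make the failure mode concrete, here is a minimal sketch of the batched-queue pattern described above. The `CleanupQueue` class is an illustrative assumption, not the actual ConnectionManager internals:
+
+ ```typescript
+ // Hypothetical sketch of a batched cleanup queue (names are assumptions).
+ class CleanupQueue<T> {
+   private queue = new Set<T>();
+   private timer: NodeJS.Timeout | null = null;
+
+   constructor(private cleanup: (item: T) => void) {}
+
+   add(item: T): void {
+     this.queue.add(item);
+     // Failure mode: if this timer is cleared elsewhere and never
+     // rescheduled, queued items are never drained and "accumulate".
+     if (!this.timer) {
+       this.timer = setTimeout(() => this.process(), 100);
+     }
+   }
+
+   private process(): void {
+     this.timer = null;
+     // Drain at most 100 items per tick, mirroring the batch size above.
+     const batch = [...this.queue].slice(0, 100);
+     for (const item of batch) {
+       this.queue.delete(item);
+       this.cleanup(item);
+     }
+     // Reschedule only if work remains; this reschedule step is exactly
+     // where a stuck timer would break the chain.
+     if (this.queue.size > 0) {
+       this.timer = setTimeout(() => this.process(), 100);
+     }
+   }
+ }
+ ```
+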
+ 2. **Potential Causes for Sudden Spikes**:
+
+    a) **Cleanup Timer Failure**:
+
+    ```typescript
+    // In ConnectionManager, if this timer gets cleared but not restarted,
+    // the cleanup queue is never drained again:
+    this.cleanupTimer = this.setTimeout(() => {
+      this.processCleanupQueue();
+    }, 100);
+    ```
+
+    b) **Memory Pressure**:
+    - After hours of operation, memory fragmentation or pressure can cause delays
+    - Garbage collection pauses might interfere with timer execution
+
+    c) **Event Listener Accumulation**:
+    - Socket event listeners might accumulate over time
+    - Server 'connection' event handlers are particularly important
+
+    d) **Keep-Alive Connection Cascades**:
+    - Many keep-alive connections can time out simultaneously
+    - The outer proxy has a different timeout than the inner proxy
+    - Mass disconnection events can overwhelm the cleanup queue
+
+    e) **HttpProxy Component Issues**:
+    - If `useHttpProxy` is enabled, the HttpProxy bridge might maintain connection pools
+    - These pools might not be properly cleaned up after hours
+
+ 3. **Why "Sudden" After Hours**:
+    - This is not a gradual leak but is triggered by specific conditions
+    - It is likely related to periodic events or thresholds:
+      - The inactivity check runs every 30 seconds
+      - Keep-alive connections have extended timeouts (6x normal)
+      - The parity check has a 30-minute timeout for half-closed connections
+
+ 4. **Reproduction Scenarios** (a sketch of the first follows the list):
+    - Mass client disconnection/reconnection (network blip)
+    - Keep-alive timeout cascade when the inner proxy times out first
+    - The cleanup timer getting stuck during high load
+    - Memory pressure causing event loop delays
+
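+ To reproduce the mass-disconnection scenario in isolation, a script along these lines can generate a disconnection burst. This is a hedged sketch: the port, host, and client count are arbitrary assumptions, and only Node's `net` module is used:
+
+ ```typescript
+ import * as net from 'net';
+
+ // Simulate a mass client disconnection ("network blip"): open many
+ // keep-alive connections to the proxy, then destroy them all at once so
+ // the cleanup queue receives a burst of work in a single tick.
+ const PROXY_PORT = 8080; // assumption: outer proxy listens here
+ const CLIENTS = 500;
+
+ const sockets: net.Socket[] = [];
+ for (let i = 0; i < CLIENTS; i++) {
+   const s = net.connect(PROXY_PORT, '127.0.0.1');
+   s.setKeepAlive(true, 1000);
+   s.on('error', () => {}); // ignore errors caused by the blip itself
+   sockets.push(s);
+ }
+
+ // Once the connections are established, kill them simultaneously.
+ setTimeout(() => {
+   for (const s of sockets) s.destroy();
+   console.log(`destroyed ${sockets.length} sockets at once`);
+ }, 5000);
+ ```
+
+ While this runs, watch the proxy's connection counter: if it does not return to baseline within a few cleanup batches, the queue or its timer is the suspect.
+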
+ ### Additional Monitoring Recommendations
+
+ 1. **Add Cleanup Queue Monitoring**:
+
+    ```typescript
+    setInterval(() => {
+      const cm = proxy.connectionManager;
+      if (cm.cleanupQueue.size > 100 && !cm.cleanupTimer) {
+        logger.error('Cleanup queue stuck!', {
+          queueSize: cm.cleanupQueue.size,
+          hasTimer: !!cm.cleanupTimer
+        });
+      }
+    }, 60000);
+    ```
+
+ 2. **Track Timer Health**:
+    - Monitor whether the cleanup timer is running
+    - Check for event loop blocking (see the lag-monitor sketch below)
+    - Log when batch processing takes too long
+
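+ For the event-loop check, a small self-measuring interval is usually enough. This is a generic Node.js sketch, not smartproxy API:
+
+ ```typescript
+ // Event-loop lag monitor: if the gap between the scheduled and the actual
+ // wake-up grows, all timers (including the cleanup timer) are being delayed.
+ const INTERVAL_MS = 1000;
+ let last = Date.now();
+
+ setInterval(() => {
+   const now = Date.now();
+   const lag = now - last - INTERVAL_MS;
+   if (lag > 100) {
+     console.warn(`event loop lag: ${lag}ms - cleanup timers may be delayed`);
+   }
+   last = now;
+ }, INTERVAL_MS);
+ ```
+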
+ 3. **Memory Monitoring**:
+    - Track heap usage over time
+    - Monitor for memory leaks in long-running processes
+    - Force periodic garbage collection if needed (a sketch follows)
+
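+ A heap-tracking sketch using only `process.memoryUsage()`; the thresholds are assumptions to tune per deployment:
+
+ ```typescript
+ // Log heap usage once a minute; a steadily climbing heapUsed alongside a
+ // climbing connection count points at leaked connection records.
+ setInterval(() => {
+   const m = process.memoryUsage();
+   const mb = (n: number) => Math.round(n / 1024 / 1024) + 'MB';
+   console.log(`rss=${mb(m.rss)} heapUsed=${mb(m.heapUsed)} heapTotal=${mb(m.heapTotal)}`);
+
+   // Optional: force GC when the process is started with `node --expose-gc`.
+   const gc = (globalThis as any).gc;
+   if (gc && m.heapUsed > 1.5 * 1024 * 1024 * 1024) {
+     gc();
+   }
+ }, 60000);
+ ```
+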
+ ### Immediate Mitigations
+
+ 1. **Restart the Cleanup Timer**:
+
+    ```typescript
+    // Emergency cleanup timer restart
+    if (!cm.cleanupTimer && cm.cleanupQueue.size > 0) {
+      cm.cleanupTimer = setTimeout(() => {
+        cm.processCleanupQueue();
+      }, 100);
+    }
+    ```
+
+ 2. **Force Periodic Cleanup**:
+
+    ```typescript
+    setInterval(() => {
+      const cm = connectionManager;
+      if (cm.getConnectionCount() > threshold) {
+        cm.performOptimizedInactivityCheck();
+        // Force processing of the cleanup queue
+        cm.processCleanupQueue();
+      }
+    }, 300000); // Every 5 minutes
+    ```
+
+ 3. **Connection Age Limits** (sketched below):
+    - Set a maximum connection lifetime
+    - Force-close connections older than the threshold
+    - Apply more aggressive cleanup in proxy chains
+
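+ A sketch of such an age limit, reusing the record fields shown elsewhere in this document (`connectionRecords`, `incomingStartTime`, `cleanupConnection`). Treat it as pseudocode against the real ConnectionManager rather than a supported API:
+
+ ```typescript
+ // Force-close connections older than a hard lifetime cap (1 hour is an
+ // assumed value; proxy chains may want something more aggressive).
+ const MAX_CONNECTION_AGE_MS = 60 * 60 * 1000;
+
+ setInterval(() => {
+   const now = Date.now();
+   for (const [connectionId, record] of cm.connectionRecords) {
+     const age = now - record.incomingStartTime;
+     if (!record.connectionClosed && age > MAX_CONNECTION_AGE_MS) {
+       console.warn(`force-closing ${connectionId} after ${age}ms`);
+       cm.cleanupConnection(record, 'max_age_exceeded');
+     }
+   }
+ }, 60000);
+ ```
+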
+ ## ✅ FIXED: Zombie Connection Detection (January 2025)
+
+ ### Root Cause Identified
+ "Zombie connections" occur when sockets are destroyed without triggering their close/error event handlers. This leaves connections tracked with both sockets destroyed but `connectionClosed=false`. It is particularly problematic in proxy chains, where the inner proxy might close connections in ways that do not trigger the proper events on the outer proxy.
+
+ ### Fix Implemented
+ Added zombie detection to the periodic inactivity check in ConnectionManager:
+
+ ```typescript
+ // In performOptimizedInactivityCheck()
+ // Check ALL connections for zombie state
+ for (const [connectionId, record] of this.connectionRecords) {
+   if (!record.connectionClosed) {
+     const incomingDestroyed = record.incoming?.destroyed || false;
+     const outgoingDestroyed = record.outgoing?.destroyed || false;
+
+     // Check for zombie connections: both sockets destroyed but not cleaned up
+     if (incomingDestroyed && outgoingDestroyed) {
+       logger.log('warn', `Zombie connection detected: ${connectionId} - both sockets destroyed but not cleaned up`, {
+         connectionId,
+         remoteIP: record.remoteIP,
+         age: plugins.prettyMs(now - record.incomingStartTime),
+         component: 'connection-manager'
+       });
+
+       // Clean up immediately
+       this.cleanupConnection(record, 'zombie_cleanup');
+       continue;
+     }
+
+     // Check for half-zombie: one socket destroyed
+     if (incomingDestroyed || outgoingDestroyed) {
+       const age = now - record.incomingStartTime;
+       // Give it a 30-second grace period for normal cleanup
+       if (age > 30000) {
+         logger.log('warn', `Half-zombie connection detected: ${connectionId} - ${incomingDestroyed ? 'incoming' : 'outgoing'} destroyed`, {
+           connectionId,
+           remoteIP: record.remoteIP,
+           age: plugins.prettyMs(age),
+           incomingDestroyed,
+           outgoingDestroyed,
+           component: 'connection-manager'
+         });
+
+         // Clean up
+         this.cleanupConnection(record, 'half_zombie_cleanup');
+       }
+     }
+   }
+ }
+ ```
+
+ ### How It Works
+ 1. **Full Zombie Detection**: Detects when both the incoming and outgoing sockets are destroyed but the connection has not been cleaned up
+ 2. **Half-Zombie Detection**: Detects when only one socket is destroyed, with a 30-second grace period for normal cleanup to occur
+ 3. **Automatic Cleanup**: Immediately cleans up zombie connections when detected
+ 4. **Runs Periodically**: Integrated into the existing inactivity check that runs every 30 seconds
+
+ ### Why This Fixes the Outer Proxy Accumulation
+ - When the inner proxy closes connections abruptly (e.g., due to backend failure), the outer proxy's outgoing socket might be destroyed without firing close/error events
+ - These become zombie connections that previously accumulated indefinitely
+ - Now they are detected and cleaned up within 30 seconds
+
+ ### Test Results
+ Debug scripts confirmed (a simplified reproduction follows the list):
+ - Zombie connections can be created when sockets are destroyed directly without their events firing
+ - The zombie detection successfully identifies and cleans up these connections
+ - Both full zombies (both sockets destroyed) and half-zombies (one socket destroyed) are handled
+
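+ The gist of those debug scripts reduces to a few lines. This simplified, hypothetical reproduction (the `ConnRecord` shape is a stand-in, not the real connection record) shows how a destroy without handlers leaves a tracked record behind:
+
+ ```typescript
+ import * as net from 'net';
+
+ // Tiny stand-in for a tracked connection record.
+ interface ConnRecord { socket: net.Socket; connectionClosed: boolean; }
+ const records = new Map<string, ConnRecord>();
+
+ const server = net.createServer((socket) => {
+   const record: ConnRecord = { socket, connectionClosed: false };
+   records.set('conn-1', record);
+   socket.on('close', () => { record.connectionClosed = true; }); // normal path
+
+   // Simulate the edge case: handlers are stripped before destruction,
+   // so 'close' never marks the record as closed.
+   socket.removeAllListeners('close');
+   socket.destroy();
+
+   setTimeout(() => {
+     // destroyed=true with connectionClosed=false: a zombie by this document's definition
+     console.log('zombie?', socket.destroyed && !record.connectionClosed);
+     server.close();
+   }, 100);
+ });
+
+ server.listen(0, () => {
+   const addr = server.address() as net.AddressInfo;
+   net.connect(addr.port).on('error', () => {});
+ });
+ ```
+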
+ This fix addresses the specific issue where "connections that are closed on the inner proxy, always also close on the outer proxy", as requested by the user.
package/readme.hints.md CHANGED
@@ -856,4 +856,42 @@ The WrappedSocket class has been implemented as the foundation for PROXY protoco
  For detailed information about proxy protocol implementation and proxy chaining:
  - **[Proxy Protocol Guide](./readme.proxy-protocol.md)** - Complete implementation details and configuration
  - **[Proxy Protocol Examples](./readme.proxy-protocol-example.md)** - Code examples and conceptual implementation
- - **[Proxy Chain Summary](./readme.proxy-chain-summary.md)** - Quick reference for proxy chaining setup
+ - **[Proxy Chain Summary](./readme.proxy-chain-summary.md)** - Quick reference for proxy chaining setup
+
+ ## Connection Cleanup Edge Cases Investigation (v19.5.20+)
+
+ ### Issue Discovered
+ "Zombie connections" can occur when both sockets are destroyed but the connection record has not been cleaned up. This happens when sockets are destroyed without triggering their close/error event handlers.
+
+ ### Root Cause
+ 1. **Event Handler Bypass**: In edge cases (network failures, proxy chain failures, forced socket destruction), sockets can be destroyed without their event handlers being called
+ 2. **Cleanup Queue Delay**: The `initiateCleanupOnce` method adds connections to a cleanup queue (batches of 100, processed every 100ms), which may not process fast enough
+ 3. **Inactivity Check Limitation**: The periodic inactivity check only examines `lastActivity` timestamps, not actual socket states
+
+ ### Test Results
+ The debug script (`connection-manager-direct-test.ts`) revealed:
+ - **Normal cleanup works**: When socket events fire normally, cleanup is reliable
+ - **Zombies ARE created**: Direct socket destruction creates zombies (destroyed sockets, `connectionClosed=false`)
+ - **Manual cleanup works**: Calling `initiateCleanupOnce` on a zombie does clean it up
+ - **Inactivity check misses zombies**: The check does not detect connections with destroyed sockets
+
+ ### Potential Solutions
+ 1. **Periodic Zombie Detection**: Add zombie detection to the inactivity check:
+
+    ```typescript
+    // In performOptimizedInactivityCheck
+    if (record.incoming?.destroyed && record.outgoing?.destroyed && !record.connectionClosed) {
+      this.cleanupConnection(record, 'zombie_detected');
+    }
+    ```
+
+ 2. **Socket State Monitoring**: Check socket states during connection operations
+ 3. **Defensive Socket Handling**: Always attach cleanup handlers before any operation that might destroy sockets (see the sketch after this list)
+ 4. **Immediate Cleanup Option**: For critical paths, use `cleanupConnection` instead of `initiateCleanupOnce`
+
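+ Solution 3 in code form: a sketch (not smartproxy API) of attaching cleanup handlers synchronously at socket creation, before `connect()` or anything else can destroy the socket:
+
+ ```typescript
+ import * as net from 'net';
+
+ // Handlers are attached before connect(), so cleanup cannot be bypassed
+ // by a destruction that happens during or right after connection setup.
+ function createTrackedSocket(
+   port: number,
+   host: string,
+   onGone: (reason: string) => void,
+ ): net.Socket {
+   const socket = new net.Socket();
+   let gone = false;
+   const markGone = (reason: string) => {
+     if (!gone) { gone = true; onGone(reason); }
+   };
+   socket.once('error', (err) => markGone(`error: ${err.message}`));
+   socket.once('close', () => markGone('close'));
+   socket.connect(port, host); // cleanup handlers are already in place here
+   return socket;
+ }
+ ```
+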
+ ### Impact
+ - Memory leaks in edge cases (network failures, proxy chain issues)
+ - Connection count inaccuracy
+ - Potential resource exhaustion over time
+
+ ### Test Files
+ - `.nogit/debug/connection-manager-direct-test.ts` - Direct ConnectionManager testing showing zombie creation
@@ -456,6 +456,48 @@ export class ConnectionManager extends LifecycleComponent {
    }
  }

+  // Also check ALL connections for zombie state (destroyed sockets but not cleaned up).
+  // This is critical for proxy chains where sockets can be destroyed without events firing.
+  for (const [connectionId, record] of this.connectionRecords) {
+    if (!record.connectionClosed) {
+      const incomingDestroyed = record.incoming?.destroyed || false;
+      const outgoingDestroyed = record.outgoing?.destroyed || false;
+
+      // Zombie connection: both sockets destroyed but the record was never cleaned up
+      if (incomingDestroyed && outgoingDestroyed) {
+        logger.log('warn', `Zombie connection detected: ${connectionId} - both sockets destroyed but not cleaned up`, {
+          connectionId,
+          remoteIP: record.remoteIP,
+          age: plugins.prettyMs(now - record.incomingStartTime),
+          component: 'connection-manager'
+        });
+
+        // Clean up immediately
+        this.cleanupConnection(record, 'zombie_cleanup');
+        continue;
+      }
+
+      // Half-zombie: only one socket destroyed
+      if (incomingDestroyed || outgoingDestroyed) {
+        const age = now - record.incomingStartTime;
+        // Give it a 30-second grace period for normal cleanup
+        if (age > 30000) {
+          logger.log('warn', `Half-zombie connection detected: ${connectionId} - ${incomingDestroyed ? 'incoming' : 'outgoing'} destroyed`, {
+            connectionId,
+            remoteIP: record.remoteIP,
+            age: plugins.prettyMs(age),
+            incomingDestroyed,
+            outgoingDestroyed,
+            component: 'connection-manager'
+          });
+
+          // Clean up
+          this.cleanupConnection(record, 'half_zombie_cleanup');
+        }
+      }
+    }
+  }
+
   // Process only connections that need checking
   for (const connectionId of connectionsToCheck) {
     const record = this.connectionRecords.get(connectionId);