pgserve 1.1.9 → 1.2.0

package/SECURITY.md ADDED
@@ -0,0 +1,109 @@
1
+ # Security Policy
2
+
3
+ `pgserve` is maintained by [Automagik](https://automagik.dev). We take the security of this package seriously and appreciate responsible disclosure from the community.
4
+
5
+ ---
6
+
7
+ ## Reporting a Vulnerability
8
+
9
+ **Please do not open public issues for security reports.**
10
+
11
+ Send private reports to one of the following channels:
12
+
13
+ | Channel | Address | Best for |
14
+ |---------|---------|----------|
15
+ | Security email | `privacidade@namastex.ai` | Anything security-related, including coordinated disclosure |
16
+ | DPO (privacy + security officer) | `dpo@khal.ai` | Privacy, LGPD, data protection concerns |
17
+ | Private GitHub advisory | [Report via GitHub](https://github.com/namastexlabs/pgserve/security/advisories/new) | Preferred for CVE assignment and coordinated release |
18
+
19
+ **PGP** available on request.
20
+
21
+ ### Response SLA
22
+
23
+ - Acknowledgement: **within 2 business hours** (UTC-3).
24
+ - Initial triage and severity assessment: **within 24 hours**.
25
+ - Fix or mitigation plan: **within 7 days** for critical/high severity.
26
+ - Public disclosure: coordinated with reporter, typically within 30 days of fix.
27
+
28
+ We will credit reporters publicly (with their permission) in the released advisory.
29
+
30
+ ---
31
+
32
+ ## Supported Versions
33
+
34
+ | Version line | Status |
35
+ |--------------|--------|
36
+ | `1.1.10` and later clean releases | ✅ Supported — current |
37
+ | `1.1.11` – `1.1.14` | ❌ **COMPROMISED — do not use** |
38
+ | `1.1.0` – `1.1.9` | ⚠️ Legacy — security patches only |
39
+ | `1.0.x` and earlier | ❌ End of life |
40
+
41
+ Always install from the current stable line. Pin explicit versions in your `package.json` and avoid `latest` for supply-chain sensitive packages.
42
+
43
+ ---
44
+
45
+ ## Past Incidents
46
+
47
+ ### 2026-04 — CanisterWorm supply-chain compromise
48
+
49
+ Between 2026-04-21 (~22:14 UTC) and 2026-04-22 (~14:00 UTC), versions `1.1.11`, `1.1.12`, `1.1.13`, and `1.1.14` were published to npm by a threat actor after a developer GitHub OAuth token was exfiltrated. The malicious versions contained a `TeamPCP` payload in `scripts/check-env.js` that executed via `postinstall` to harvest local credentials.
50
+
51
+ - **Exposure window:** ~16 hours
52
+ - **Detection-to-containment:** under 20 hours
53
+ - **Current status:** malicious versions `npm unpublish`-ed and no longer installable
54
+
55
+ **If you installed versions `1.1.11` – `1.1.14` between April 21–22, 2026, assume your machine is compromised.** Follow the remediation guide linked below.
56
+
57
+ **Resources:**
58
+ - 📖 [Full incident response manual](https://github.com/namastexlabs/genie-dpo/blob/main/knowledge/canisterworm-incident-response.md)
59
+ - 🌐 [Public advisory (English)](https://automagik.dev/security)
60
+ - 🌐 [Aviso público (Português)](https://automagik.dev/seguranca)
61
+ - 🛡️ [GitHub Security Advisories](https://github.com/namastexlabs/pgserve/security/advisories) for this repository
62
+
63
+ A full public post-mortem will be published within 30 days of containment.
64
+
65
+ ---
66
+
67
+ ## Acknowledgments
68
+
69
+ We thank the researchers and organizations that identified and tracked this incident:
70
+
71
+ - [**Socket Research Team**](https://socket.dev/blog/namastex-npm-packages-compromised-canisterworm) — primary discovery and continued tracking at [socket.dev/supply-chain-attacks/canistersprawl](https://socket.dev/supply-chain-attacks/canistersprawl).
72
+ - **Endor Labs**, **Kodem Security**, **BleepingComputer**, **The Register**, **CSO Online**, **GBHackers**, **Cybersecurity News** — for coverage, analysis, and technical breakdowns that helped defenders respond quickly.
73
+
74
+ We also thank the Automagik team that ran the end-to-end response during the incident window, and the broader open-source community whose scrutiny, tools, and unfiltered feedback keep this ecosystem healthy. We will keep earning it.
75
+
76
+ ---
77
+
78
+ ## Our Commitments
79
+
80
+ Effective 2026-04-23, all `pgserve` releases are governed by:
81
+
82
+ - **Provenance attestation** — every release is published with `npm publish --provenance` and is verifiable via Sigstore.
83
+ - **OIDC trusted publishing** — migrating to GitHub Actions OIDC publish, eliminating long-lived npm tokens. (in progress)
84
+ - **Mandatory 2FA** on every maintainer account with publish rights.
85
+ - **Environment protection** — production publishes require manual approval from a second maintainer.
86
+ - **Quarterly token audit** — scope and permission review.
87
+ - **External pentest** — scheduled ahead of the original roadmap.
88
+
89
+ ---
90
+
91
+ ## Hardening Recommendations for Consumers
92
+
93
+ - Pin explicit versions, not `latest`: `"pgserve": "1.1.10"`.
94
+ - Use `npm ci` in CI. It enforces lockfile-based installs by default.
95
+ - Evaluate `--ignore-scripts` per-package for untrusted dependencies. The current `pgserve` release does not require any lifecycle script to function.
96
+ - Verify package provenance: `npm view pgserve --json | jq '.dist.attestations'`.
97
+ - Monitor advisories: subscribe to GitHub security alerts for this repository.
98
+
99
+ ---
100
+
101
+ ## Contact
102
+
103
+ - **Security & incidents:** `privacidade@namastex.ai`
104
+ - **Data Protection Officer (DPO):** Cezar Vasconcelos — `dpo@khal.ai`
105
+ - **Security disclosure page:** [automagik.dev/security](https://automagik.dev/security)
106
+
107
+ Namastex Labs Serviços em Tecnologia Ltda · CNPJ 46.156.854/0001-62
108
+
109
+ *Last updated: 2026-04-23 · v1.0*
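The Supported Versions table and the pinning advice in SECURITY.md can be checked mechanically. The following is a minimal sketch, not part of pgserve: `auditPgserveSpec` is a hypothetical helper that takes the version string a project declares for `pgserve` and flags it when it is not an exact pin or when it names one of the compromised releases (`1.1.11` through `1.1.14`). It deliberately treats anything other than a bare `x.y.z` (ranges, tags, prereleases) as unpinned.

```javascript
// Hypothetical helper, not part of pgserve: flags a `pgserve` dependency
// spec that is unpinned or that names one of the compromised releases
// listed in SECURITY.md (1.1.11 through 1.1.14).
const COMPROMISED = new Set(['1.1.11', '1.1.12', '1.1.13', '1.1.14']);

// An exact pin is a bare x.y.z version with no range operator or dist-tag.
const EXACT_VERSION = /^\d+\.\d+\.\d+$/;

function auditPgserveSpec(spec) {
  if (!EXACT_VERSION.test(spec)) {
    return { ok: false, reason: 'not pinned to an exact version' };
  }
  if (COMPROMISED.has(spec)) {
    return { ok: false, reason: 'compromised release' };
  }
  return { ok: true, reason: 'pinned' };
}

console.log(auditPgserveSpec('1.1.10'));
console.log(auditPgserveSpec('^1.1.10'));
console.log(auditPgserveSpec('1.1.12'));
```

A check like this only inspects the declared spec; the lockfile still decides what is actually installed, which is why the policy pairs pinning with `npm ci`.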
package/bin/pgserve-wrapper.cjs CHANGED
@@ -53,8 +53,113 @@ if (!bunPath) {
53
53
  process.exit(1);
54
54
  }
55
55
 
56
+ // Pre-flight health check: verify bun can actually execute.
57
+ //
58
+ // When pgserve is installed via `bun install` (as a global or transitive dep),
59
+ // the nested `bun` npm package's postinstall can be skipped, leaving
60
+ // `@oven/bun-<platform>/bin/bun` empty. The bun stub at `node_modules/bun/bin/bun`
61
+ // then exits instantly with:
62
+ // Error: Bun's postinstall script was not run.
63
+ //
64
+ // pglite-server.js's TCP readiness poll can't distinguish this from a slow
65
+ // startup, so users see a confusing 30s timeout. Detect the specific error
66
+ // here, attempt the documented self-heal once (`node install.js`), and retry.
67
+ // If self-heal also fails, surface the real error instead of hanging later.
68
+ ensureBunHealthy(bunPath);
69
+
56
70
  const scriptPath = path.join(__dirname, 'pglite-server.js');
57
71
 
72
+ /**
73
+ * Verify the selected bun binary can execute. If it fails with the known
74
+ * "postinstall script was not run" signature, attempt a one-shot repair via
75
+ * the bun npm package's install.js. Throws (with a useful message) rather
76
+ * than letting pglite-server.js hang on the TCP readiness poll for 30s.
77
+ */
78
+ function ensureBunHealthy(bunExe) {
79
+ const probe = probeBun(bunExe);
80
+ if (probe.ok) return;
81
+
82
+ // Only attempt self-heal for the specific postinstall-not-run failure.
83
+ // Any other failure (corrupt binary, unsupported glibc, etc.) is surfaced
84
+ // as-is rather than silently papered over.
85
+ if (!isPostinstallMissingError(probe.output)) {
86
+ console.error('Error: bun runtime at', bunExe, 'failed to execute:');
87
+ console.error(probe.output || '(no output)');
88
+ process.exit(1);
89
+ }
90
+
91
+ const installJs = findBunInstallJs(bunExe);
92
+ if (!installJs) {
93
+ console.error('Error: bun runtime at', bunExe, 'is missing its platform binary,');
94
+ console.error('and the recovery script (node_modules/bun/install.js) could not be located.');
95
+ console.error('');
96
+ console.error('Try reinstalling pgserve, or run the fix manually:');
97
+ console.error(' cd <node_modules>/bun && node install.js');
98
+ process.exit(1);
99
+ }
100
+
101
+ console.error('[pgserve] bun runtime missing platform binary; attempting self-heal...');
102
+ try {
103
+ execSync(`node ${JSON.stringify(installJs)}`, { stdio: 'inherit' });
104
+ } catch {
105
+ // fall through to second probe
106
+ }
107
+
108
+ const second = probeBun(bunExe);
109
+ if (second.ok) {
110
+ console.error('[pgserve] bun runtime recovered.');
111
+ return;
112
+ }
113
+
114
+ console.error('Error: bun runtime still broken after self-heal attempt.');
115
+ console.error(second.output || '(no output)');
116
+ console.error('');
117
+ console.error('Manual fix:');
118
+ console.error(` cd ${path.dirname(installJs)} && node install.js`);
119
+ console.error('');
120
+ console.error('Upstream bug: https://github.com/namastexlabs/pgserve/issues/22');
121
+ process.exit(1);
122
+ }
123
+
124
+ function probeBun(bunExe) {
125
+ try {
126
+ const out = execSync(`${JSON.stringify(bunExe)} --version`, {
127
+ stdio: ['ignore', 'pipe', 'pipe'],
128
+ timeout: 10000,
129
+ encoding: 'utf8'
130
+ });
131
+ return { ok: true, output: out };
132
+ } catch (err) {
133
+ const output = [err.stderr, err.stdout, err.message]
134
+ .filter(Boolean).map(String).join('\n');
135
+ return { ok: false, output };
136
+ }
137
+ }
138
+
139
+ function isPostinstallMissingError(output) {
140
+ return typeof output === 'string' &&
141
+ /Bun's postinstall script was not run/i.test(output);
142
+ }
143
+
144
+ function findBunInstallJs(bunExe) {
145
+ // Walk up from the bun binary toward a `bun` package dir containing install.js.
146
+ // Matches the wrapper's own location list - bun is always nested under a
147
+ // `bun` package directory (or its `bin/` subdir).
148
+ let cursor = path.dirname(path.resolve(bunExe));
149
+ for (let i = 0; i < 6; i++) {
150
+ const candidate = path.join(cursor, 'install.js');
151
+ if (fs.existsSync(candidate) && fs.existsSync(path.join(cursor, 'package.json'))) {
152
+ return candidate;
153
+ }
154
+ const nested = path.join(cursor, 'bun', 'install.js');
155
+ if (fs.existsSync(nested)) return nested;
156
+ const parent = path.dirname(cursor);
157
+ if (parent === cursor) break;
158
+ cursor = parent;
159
+ }
160
+ return null;
161
+ }
162
+
58
163
  // Platform-specific spawning strategy:
59
164
  // - Windows: Use pipes for explicit handle control (prevents EBUSY errors)
60
165
  // - Unix: Use inherit for simplicity (works fine)
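The decision logic in `ensureBunHealthy()` (probe, classify the failure, repair once, probe again) can be sketched separately from the process spawning. This is an illustrative sketch under stated assumptions, not the wrapper's code: `healOnce`, `probe`, and `repair` are hypothetical names, with `probe` standing in for `probeBun()` and `repair` for the `node install.js` call.

```javascript
// Sketch of the probe -> classify -> repair -> re-probe flow with the
// side effects injected, so the decision logic can be exercised without
// a real bun binary. `probe` and `repair` are hypothetical stand-ins.
const POSTINSTALL_ERROR = /Bun's postinstall script was not run/i;

function healOnce({ probe, repair }) {
  const first = probe();
  if (first.ok) return { status: 'healthy' };

  // Only the known postinstall failure is repaired; anything else is
  // reported unchanged.
  if (!POSTINSTALL_ERROR.test(first.output)) {
    return { status: 'failed', output: first.output };
  }

  try {
    repair();
  } catch {
    // A failed repair is not fatal by itself; the second probe decides.
  }

  const second = probe();
  return second.ok
    ? { status: 'recovered' }
    : { status: 'failed', output: second.output };
}

// Simulated broken install that a single repair fixes.
let installed = false;
const result = healOnce({
  probe: () =>
    installed
      ? { ok: true, output: '1.3.12' }
      : { ok: false, output: "Error: Bun's postinstall script was not run." },
  repair: () => { installed = true; },
});
console.log(result.status); // prints "recovered"
```

Keeping the repair to a single attempt means a persistently broken install fails fast with the second probe's output instead of looping.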
package/knip.json CHANGED
@@ -3,7 +3,7 @@
3
3
  "entry": ["src/index.js", "bin/pglite-server.js", "bin/pgserve-wrapper.cjs"],
4
4
  "project": ["src/**/*.js", "bin/**/*.js", "bin/**/*.cjs"],
5
5
  "ignore": ["tests/**", "helpers/**", "scripts/**"],
6
- "ignoreBinaries": ["scripts/test-npx.sh", "make"],
6
+ "ignoreBinaries": ["scripts/test-npx.sh", "scripts/test-bun-self-heal.sh", "make"],
7
7
  "ignoreDependencies": ["bun"],
8
8
  "ignoreExportsUsedInFile": true
9
9
  }
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "pgserve",
3
- "version": "1.1.9",
3
+ "version": "1.2.0",
4
4
  "description": "Embedded PostgreSQL server with true concurrent connections - zero config, auto-provision databases",
5
5
  "main": "src/index.js",
6
6
  "type": "module",
@@ -19,7 +19,8 @@
19
19
  "lint:fix": "eslint src/ bin/ --fix",
20
20
  "deadcode": "knip",
21
21
  "test:npx": "scripts/test-npx.sh",
22
- "prepublishOnly": "npm run lint && npm run deadcode && npm run test:npx",
22
+ "test:bun-self-heal": "scripts/test-bun-self-heal.sh",
23
+ "prepublishOnly": "npm run lint && npm run deadcode && npm run test:npx && npm run test:bun-self-heal",
23
24
  "prepare": "husky"
24
25
  },
25
26
  "keywords": [
package/scripts/test-bun-self-heal.sh ADDED
@@ -0,0 +1,163 @@
1
+ #!/bin/bash
2
+ # Regression test for https://github.com/namastexlabs/pgserve/issues/22
3
+ #
4
+ # When pgserve is installed via `bun install`, the nested `bun` npm package's
5
+ # postinstall can be skipped, leaving @oven/bun-<platform>/bin/bun empty.
6
+ # The bun stub then refuses to run with "Bun's postinstall script was not run".
7
+ # pgserve-wrapper.cjs must detect this and self-heal via `node install.js`.
8
+ #
9
+ # This test stages a synthetic broken install tree, runs the wrapper, and
10
+ # asserts that it recovers and spawns pglite-server.
11
+
12
+ set -e
13
+
14
+ REPO_ROOT="$(cd "$(dirname "$0")/.." && pwd)"
15
+ WRAPPER="$REPO_ROOT/bin/pgserve-wrapper.cjs"
16
+
17
+ if [ ! -f "$WRAPPER" ]; then
18
+ echo "✗ wrapper not found: $WRAPPER"
19
+ exit 1
20
+ fi
21
+
22
+ # Use a real bun binary as the "recovered" payload so the healthy-path
23
+ # assertion is meaningful. Falls back to any bun on PATH.
24
+ REAL_BUN="${BUN_BIN:-$(command -v bun || true)}"
25
+ if [ -z "$REAL_BUN" ] || [ ! -x "$REAL_BUN" ]; then
26
+ echo "✗ bun runtime not found on PATH (set BUN_BIN to override)"
27
+ exit 1
28
+ fi
29
+
30
+ FIXTURE=$(mktemp -d)
31
+ trap 'rm -rf "$FIXTURE"' EXIT
32
+
33
+ mkdir -p "$FIXTURE/node_modules/bun/bin"
34
+ mkdir -p "$FIXTURE/node_modules/@oven/bun-linux-x64/bin" # empty, simulating the bug
35
+ mkdir -p "$FIXTURE/node_modules/.bin"
36
+ mkdir -p "$FIXTURE/node_modules/pgserve/bin"
37
+
38
+ cp "$WRAPPER" "$FIXTURE/node_modules/pgserve/bin/pgserve-wrapper.cjs"
39
+
40
+ # Stub pglite-server so we can detect a successful spawn without needing
41
+ # postgres binaries in the fixture.
42
+ cat > "$FIXTURE/node_modules/pgserve/bin/pglite-server.js" <<'EOF'
43
+ console.log("pglite-server-spawned");
44
+ process.exit(0);
45
+ EOF
46
+
47
+ # Fake bun install.js: copies the real bun into the expected @oven location,
48
+ # mirroring what the real postinstall does.
49
+ cat > "$FIXTURE/node_modules/bun/install.js" <<EOF
50
+ const fs = require('fs');
51
+ const path = require('path');
52
+ const dst = path.resolve(__dirname, '..', '@oven', 'bun-linux-x64', 'bin', 'bun');
53
+ fs.mkdirSync(path.dirname(dst), { recursive: true });
54
+ fs.copyFileSync('$REAL_BUN', dst);
55
+ fs.chmodSync(dst, 0o755);
56
+ console.log('[test] install.js populated', dst);
57
+ EOF
58
+ echo '{"name":"bun","version":"1.3.12"}' > "$FIXTURE/node_modules/bun/package.json"
59
+
60
+ # Broken bun stub: prints the postinstall error unless the @oven binary exists.
61
+ cat > "$FIXTURE/node_modules/bun/bin/bun" <<'EOF'
62
+ #!/bin/sh
63
+ SELF=$(readlink -f "$0")
64
+ TARGET="$(dirname "$SELF")/../../@oven/bun-linux-x64/bin/bun"
65
+ if [ ! -x "$TARGET" ]; then
66
+ echo "Error: Bun's postinstall script was not run." >&2
67
+ echo "" >&2
68
+ echo "To fix this, run the postinstall script manually:" >&2
69
+ echo " cd node_modules/bun && node install.js" >&2
70
+ exit 1
71
+ fi
72
+ exec "$TARGET" "$@"
73
+ EOF
74
+ chmod +x "$FIXTURE/node_modules/bun/bin/bun"
75
+
76
+ ln -s ../bun/bin/bun "$FIXTURE/node_modules/.bin/bun"
77
+
78
+ echo "=== Testing self-heal on broken install ==="
79
+ EXIT=0 # under set -e, a failing $(...) would abort before $? could be read
80
+ OUTPUT=$(node "$FIXTURE/node_modules/pgserve/bin/pgserve-wrapper.cjs" 2>&1) || EXIT=$?
81
+
82
+ if [ $EXIT -ne 0 ]; then
83
+ echo "✗ wrapper exited non-zero: $EXIT"
84
+ echo "$OUTPUT"
85
+ exit 1
86
+ fi
87
+
88
+ if ! echo "$OUTPUT" | grep -q "attempting self-heal"; then
89
+ echo "✗ wrapper did not attempt self-heal"
90
+ echo "$OUTPUT"
91
+ exit 1
92
+ fi
93
+
94
+ if ! echo "$OUTPUT" | grep -q "bun runtime recovered"; then
95
+ echo "✗ wrapper did not report recovery"
96
+ echo "$OUTPUT"
97
+ exit 1
98
+ fi
99
+
100
+ if ! echo "$OUTPUT" | grep -q "pglite-server-spawned"; then
101
+ echo "✗ pglite-server was not spawned after self-heal"
102
+ echo "$OUTPUT"
103
+ exit 1
104
+ fi
105
+
106
+ echo "✓ self-heal path: wrapper detected, repaired, and spawned pglite-server"
107
+
108
+ echo ""
109
+ echo "=== Testing healthy path is unaffected ==="
110
+ EXIT=0 # under set -e, a failing $(...) would abort before $? could be read
111
+ OUTPUT=$(node "$FIXTURE/node_modules/pgserve/bin/pgserve-wrapper.cjs" 2>&1) || EXIT=$?
112
+
113
+ if [ $EXIT -ne 0 ]; then
114
+ echo "✗ wrapper exited non-zero on healthy path: $EXIT"
115
+ echo "$OUTPUT"
116
+ exit 1
117
+ fi
118
+
119
+ if echo "$OUTPUT" | grep -q "self-heal\|recovered"; then
120
+ echo "✗ wrapper logged self-heal messages on a healthy install"
121
+ echo "$OUTPUT"
122
+ exit 1
123
+ fi
124
+
125
+ if ! echo "$OUTPUT" | grep -q "pglite-server-spawned"; then
126
+ echo "✗ pglite-server was not spawned on healthy path"
127
+ echo "$OUTPUT"
128
+ exit 1
129
+ fi
130
+
131
+ echo "✓ healthy path: wrapper was silent and spawned pglite-server directly"
132
+
133
+ echo ""
134
+ echo "=== Testing non-postinstall errors surface raw ==="
135
+ # Replace stub with one that emits an unrelated error.
136
+ cat > "$FIXTURE/node_modules/bun/bin/bun" <<'EOF'
137
+ #!/bin/sh
138
+ echo "Error: GLIBC_2.99 not found (libc mismatch)" >&2
139
+ exit 127
140
+ EOF
141
+ chmod +x "$FIXTURE/node_modules/bun/bin/bun"
142
+
143
+ # Clear the @oven healed binary so the stub is what runs.
144
+ rm -f "$FIXTURE/node_modules/@oven/bun-linux-x64/bin/bun"
145
+
146
+ OUTPUT=$(node "$FIXTURE/node_modules/pgserve/bin/pgserve-wrapper.cjs" 2>&1 || true)
147
+
148
+ if echo "$OUTPUT" | grep -q "self-heal"; then
149
+ echo "✗ wrapper tried self-heal for a non-postinstall error"
150
+ echo "$OUTPUT"
151
+ exit 1
152
+ fi
153
+
154
+ if ! echo "$OUTPUT" | grep -q "GLIBC_2.99"; then
155
+ echo "✗ wrapper did not surface the real error message"
156
+ echo "$OUTPUT"
157
+ exit 1
158
+ fi
159
+
160
+ echo "✓ unrelated-error path: wrapper surfaced the raw error without self-heal"
161
+
162
+ echo ""
163
+ echo "=== bun self-heal test PASSED ==="
package/src/postgres.js CHANGED
@@ -444,8 +444,20 @@ export class PostgresManager {
444
444
 
445
445
  /**
446
446
  * Start the embedded PostgreSQL instance
447
+ *
448
+ * Re-entry guard: if a previous start() left `this.process` set, log a
449
+ * warning and return the existing instance rather than leaking another socketDir/databaseDir.
450
+ * Callers must call stop() first if they want to restart.
447
451
  */
448
452
  async start() {
453
+ if (this.process) {
454
+ this.logger?.warn(
455
+ { pid: this.process.pid, socketDir: this.socketDir },
456
+ 'PostgresManager.start() called while already started — returning existing instance'
457
+ );
458
+ return this;
459
+ }
460
+
449
461
  // Get binary paths (may extract bundled binaries on first run)
450
462
  this.binaries = await getBinaryPaths();
451
463
 
@@ -773,12 +785,33 @@ export class PostgresManager {
773
785
  readStream(this.process.stdout);
774
786
 
775
787
  // Handle process exit
788
+ //
789
+ // When the postgres subprocess exits (normal stop OR crash), we must
790
+ // null `this.process` AND `this.socketDir`/`this.databaseDir` so that
791
+ // subsequent `getSocketPath()` calls do not return a path to a directory
792
+ // that no longer exists. This is the issue #24 root cause: the router
793
+ // was receiving stale socketPaths pointing to cleaned-up tmp dirs.
794
+ //
795
+ // NOTE: we do NOT null socketDir here if `stop()` is in flight, because
796
+ // stop() already handles cleanup+null. We only need to self-heal when
797
+ // the exit is unexpected (external kill, crash, OOM).
776
798
  this.process.exited.then((code) => {
777
799
  processExited = true;
778
800
  if (!started) {
779
801
  reject(new Error(`PostgreSQL exited with code ${code} before starting: ${startupOutput}`));
780
802
  }
781
803
  this.process = null;
804
+ // On unexpected exit (not via stop()), reset cached paths so that
805
+ // getSocketPath() returns null and callers can fall back to TCP
806
+ // or force a fresh start().
807
+ if (!this._stopping) {
808
+ this.socketDir = null;
809
+ this.databaseDir = null;
810
+ this.logger?.warn(
811
+ { code },
812
+ 'PostgreSQL subprocess exited unexpectedly — socketDir/databaseDir reset'
813
+ );
814
+ }
782
815
  });
783
816
 
784
817
  // Method 1: TCP connection polling (preferred, works on Linux/macOS)
@@ -1294,8 +1327,19 @@ export class PostgresManager {
1294
1327
 
1295
1328
  /**
1296
1329
  * Stop the PostgreSQL instance
1330
+ *
1331
+ * Cleanup order matters: we null `this.socketDir`/`this.databaseDir` AFTER
1332
+ * the rmSync so any concurrent `getSocketPath()` call either sees the old
1333
+ * path (while it still exists) or null (after cleanup) — never a path
1334
+ * pointing to a deleted directory.
1335
+ *
1336
+ * The `_stopping` flag tells the process.exited handler to NOT redundantly
1337
+ * null the paths (avoids a race where start() called immediately after
1338
+ * stop() sees nulls that stop() was about to set anyway).
1297
1339
  */
1298
1340
  async stop() {
1341
+ this._stopping = true;
1342
+
1299
1343
  // Close admin pool first (Bun.sql)
1300
1344
  if (this.adminPool) {
1301
1345
  await this.adminPool.close();
@@ -1340,6 +1384,16 @@ export class PostgresManager {
1340
1384
  }
1341
1385
  }
1342
1386
  }
1387
+
1388
+ // Reset socketDir unconditionally after cleanup (and databaseDir for
1389
+ // non-persistent instances) so getSocketPath() returns null for anyone still holding a reference to this instance.
1390
+ // This is the core fix for issue #24.
1391
+ this.socketDir = null;
1392
+ if (!this.persistent) {
1393
+ this.databaseDir = null;
1394
+ }
1395
+ this.createdDatabases.clear();
1396
+ this._stopping = false;
1343
1397
  }
1344
1398
 
1345
1399
  /**
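The lifecycle rules introduced in the postgres.js hunks (a `start()` re-entry guard, path reset on unexpected exit, and a `_stopping` flag that defers the reset to `stop()`) can be illustrated with a much smaller model. This is a sketch under stated assumptions: `TinyManager` and `fakeSpawn` are hypothetical, the socket path is a placeholder, and no subprocess or directory is actually created.

```javascript
// Minimal model of the lifecycle rules in the postgres.js hunks.
// `TinyManager` and `fakeSpawn` are hypothetical; the real PostgresManager
// spawns a postgres subprocess and does far more work.
function fakeSpawn() {
  let resolveExit;
  const exited = new Promise((resolve) => { resolveExit = resolve; });
  return { pid: 4242, exited, kill: (code = 0) => resolveExit(code) };
}

class TinyManager {
  constructor() {
    this.process = null;
    this.socketDir = null;
    this._stopping = false;
  }

  async start() {
    // Re-entry guard: a second start() returns the running instance.
    if (this.process) return this;

    this.socketDir = '/tmp/example-socket-dir'; // placeholder path
    this.process = fakeSpawn();
    this.process.exited.then(() => {
      this.process = null;
      // Unexpected exit: drop the cached path so callers see null.
      if (!this._stopping) this.socketDir = null;
    });
    return this;
  }

  getSocketPath() {
    return this.socketDir;
  }

  async stop() {
    this._stopping = true;
    const proc = this.process;
    if (proc) {
      proc.kill();
      await proc.exited;
    }
    // Directory removal would happen here; the path is cleared afterwards.
    this.socketDir = null;
    this._stopping = false;
  }
}
```

In this model a crash (calling `kill()` on the fake process without going through `stop()`) leaves `getSocketPath()` returning `null`, which is the signal the router uses to fall back to TCP or request a fresh `start()`.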
package/src/router.js CHANGED
@@ -14,6 +14,7 @@
14
14
  * PERFORMANCE: Uses Bun.listen() and Bun.connect() for 2-3x throughput improvement
15
15
  */
16
16
 
17
+ import fs from 'fs';
17
18
  import { PostgresManager } from './postgres.js';
18
19
  import { SyncManager } from './sync.js';
19
20
  import { RestoreManager } from './restore.js';
@@ -28,6 +29,14 @@ const SSL_REQUEST_CODE = 80877103;
28
29
  const GSSAPI_REQUEST_CODE = 80877104;
29
30
  const CANCEL_REQUEST_CODE = 80877102;
30
31
 
32
+ // Maximum size for the pre-handshake startup buffer. A legitimate PG
33
+ // startup message is at most a few hundred bytes; anything approaching
34
+ // 1 MiB is a runaway client or an attempted buffer-growth DoS. Bound
35
+ // this to stop the proxy from accumulating gigabytes of orphaned data
36
+ // when a client sends garbage and the handshake never completes.
37
+ // (Issue #18 root cause #2 — unbounded growth at state.buffer.)
38
+ const MAX_STARTUP_BUFFER_SIZE = 1024 * 1024; // 1 MiB
39
+
31
40
  /**
32
41
  * Attempt to write a pending buffer to a target socket.
33
42
  * Returns remaining unwritten bytes, or null if fully flushed.
@@ -231,6 +240,12 @@ export class MultiTenantRouter extends EventEmitter {
231
240
  pgSocket: null,
232
241
  dbName: null,
233
242
  handshakeComplete: false,
243
+ // startupInProgress serializes processStartupMessage() against async
244
+ // reentrancy — without it, every data event fired while the previous
245
+ // processStartupMessage() is still awaiting createDatabase() would
246
+ // launch another async task on the same state, racing to overwrite
247
+ // state.pgSocket and leaking the losers (issue #18 root cause #1).
248
+ startupInProgress: false,
234
249
  pendingToPg: null,
235
250
  pendingToClient: null
236
251
  });
@@ -249,9 +264,13 @@ export class MultiTenantRouter extends EventEmitter {
249
264
 
250
265
  // If handshake complete, forward to PostgreSQL
251
266
  if (state.handshakeComplete && state.pgSocket) {
252
- // If there's already pending data, append to it
267
+ // If there's already pending data, append and re-pause.
268
+ // (Re-pause is defensive: client should already be paused from the
269
+ // earlier partial-write, but kernel-buffered data can still arrive
270
+ // before the pause takes effect — issue #18 root cause #3.)
253
271
  if (state.pendingToPg) {
254
272
  state.pendingToPg = Buffer.concat([state.pendingToPg, data]);
273
+ socket.pause();
255
274
  return;
256
275
  }
257
276
  const written = state.pgSocket.write(data);
@@ -263,7 +282,20 @@ export class MultiTenantRouter extends EventEmitter {
263
282
  return;
264
283
  }
265
284
 
266
- // Buffer data for startup message parsing
285
+ // Buffer data for startup message parsing.
286
+ // Bound the pre-handshake buffer so a client that never completes its
287
+ // startup (or sends garbage) cannot grow state.buffer without limit —
288
+ // the 74 GiB VmSize in the production deadlock report traces to this
289
+ // path (issue #18 root cause #2).
290
+ const incomingSize = state.buffer ? state.buffer.length + data.byteLength : data.byteLength;
291
+ if (incomingSize > MAX_STARTUP_BUFFER_SIZE) {
292
+ this.logger.warn(
293
+ { incomingSize, limit: MAX_STARTUP_BUFFER_SIZE },
294
+ 'Pre-handshake buffer exceeded limit — closing connection'
295
+ );
296
+ socket.end();
297
+ return;
298
+ }
267
299
  if (state.buffer) {
268
300
  state.buffer = Buffer.concat([state.buffer, data]);
269
301
  } else {
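The size check added in this hunk can be read as a bounded append. The sketch below assumes a hypothetical helper `appendBounded` (the router performs the check inline and calls `socket.end()` itself); it returns `null` when the combined size would exceed the limit, leaving the caller to close the connection.

```javascript
// Sketch of a bounded pre-handshake buffer. `appendBounded` is a
// hypothetical helper; the router performs the same check inline.
const MAX_STARTUP_BUFFER_SIZE = 1024 * 1024; // 1 MiB, as in router.js

// Returns the new buffer, or null when appending would exceed the limit
// (the caller is then expected to close the connection).
function appendBounded(existing, chunk, limit = MAX_STARTUP_BUFFER_SIZE) {
  const incomingSize = (existing ? existing.length : 0) + chunk.byteLength;
  if (incomingSize > limit) return null;
  return existing ? Buffer.concat([existing, chunk]) : Buffer.from(chunk);
}

const first = appendBounded(null, Buffer.from('abcd'), 8);
const second = appendBounded(first, Buffer.from('efgh'), 8);
const third = appendBounded(second, Buffer.from('i'), 8);
console.log(first.length, second.length, third); // prints: 4 8 null
```

The check runs before `Buffer.concat`, so an oversized chunk is rejected without first allocating the combined buffer.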
@@ -275,9 +307,17 @@ export class MultiTenantRouter extends EventEmitter {
275
307
  }
276
308
 
277
309
  /**
278
- * Process PostgreSQL startup message and establish proxy connection
310
+ * Process PostgreSQL startup message and establish proxy connection.
311
+ *
312
+ * Guarded against async reentrancy: multiple data events arriving while
313
+ * the first processStartupMessage() is still awaiting createDatabase()
314
+ * or Bun.connect() must not launch concurrent tasks on the same state —
315
+ * they would race to assign state.pgSocket, leaking the losing sockets
316
+ * and double-writing the startup message (issue #18 root cause #1).
279
317
  */
280
318
  async processStartupMessage(socket, state) {
319
+ if (state.startupInProgress) return;
320
+
281
321
  const buffer = state.buffer;
282
322
  if (!buffer || buffer.length < 8) return; // Need at least length + protocol
283
323
 
@@ -315,6 +355,11 @@ export class MultiTenantRouter extends EventEmitter {
315
355
  const dbName = extractDatabaseName(startupMessage);
316
356
  state.dbName = dbName;
317
357
 
358
+ // Claim the reentrancy guard BEFORE the first await so subsequent data
359
+ // events (buffered into state.buffer by handleSocketData) cannot launch
360
+ // a second async task on the same state.
361
+ state.startupInProgress = true;
362
+
318
363
  try {
319
364
  // Auto-provision database if needed
320
365
  if (this.autoProvision) {
@@ -328,9 +373,13 @@ export class MultiTenantRouter extends EventEmitter {
328
373
  // Shared handler for pgSocket (used by both unix and TCP paths)
329
374
  const pgHandler = {
330
375
  data(_pgSocket, pgData) {
331
- // Forward PostgreSQL response to client with backpressure
376
+ // Forward PostgreSQL response to client with backpressure.
377
+ // Re-pause defensively when pendingToClient already exists —
378
+ // kernel-buffered PG data can arrive before the earlier pause()
379
+ // takes effect (issue #18 root cause #3).
332
380
  if (state.pendingToClient) {
333
381
  state.pendingToClient = Buffer.concat([state.pendingToClient, pgData]);
382
+ _pgSocket.pause();
334
383
  return;
335
384
  }
336
385
  const written = socket.write(pgData);
@@ -362,9 +411,18 @@ export class MultiTenantRouter extends EventEmitter {
362
411
  }
363
412
  };
364
413
 
365
- if (socketPath) {
414
+ // Safety net for issue #24: if socketPath points to a directory that was
415
+ // cleaned up (e.g. pgManager was stopped+started, or the PG subprocess
416
+ // exited unexpectedly and socketDir was reset to null but a stale cached
417
+ // path is still hanging around), fall back to TCP instead of Bun.connect
418
+ // hanging on a missing unix socket.
419
+ const useUnix = socketPath && fs.existsSync(socketPath);
420
+ if (useUnix) {
366
421
  state.pgSocket = await Bun.connect({ unix: socketPath, socket: pgHandler });
367
422
  } else {
423
+ if (socketPath) {
424
+ this.logger.warn({ socketPath, dbName }, 'Unix socket path stale — falling back to TCP');
425
+ }
368
426
  state.pgSocket = await Bun.connect({ hostname: '127.0.0.1', port: this.pgPort, socket: pgHandler });
369
427
  }
370
428
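The stale-path fallback in this hunk amounts to choosing connection options based on whether the unix socket path still exists. A minimal sketch, assuming a hypothetical helper `chooseTransport` that only builds the options object (the router passes equivalent options to `Bun.connect()` together with its socket handler):

```javascript
const fs = require('node:fs');

// Sketch of the stale-path fallback. `chooseTransport` is a hypothetical
// helper that returns the options a connect call would receive.
function chooseTransport(socketPath, pgPort) {
  // Use the unix socket only when the path still exists on disk.
  if (socketPath && fs.existsSync(socketPath)) {
    return { kind: 'unix', options: { unix: socketPath } };
  }
  return {
    kind: 'tcp',
    stale: Boolean(socketPath), // a path was cached but is no longer there
    options: { hostname: '127.0.0.1', port: pgPort },
  };
}

console.log(chooseTransport('/nonexistent/pgserve/.s.PGSQL.5432', 5432).kind);
```

An existence check of this kind narrows the window but cannot close it: the path can still disappear between the check and the connect call, so the surrounding error handling remains necessary.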
 
@@ -373,6 +431,13 @@ export class MultiTenantRouter extends EventEmitter {
373
431
  this.logger.error({ dbName, err: error }, 'Connection error');
374
432
  socket.end();
375
433
  this.emit('connection-error', { error, dbName });
434
+ } finally {
435
+ // Release the reentrancy guard whether handshake succeeded or not.
436
+ // If it succeeded, handshakeComplete is now true and further data
437
+ // events will bypass processStartupMessage anyway (handleSocketData
438
+ // takes the handshakeComplete path). If it failed, socket.end()
439
+ // has been called and the connection is tearing down.
440
+ state.startupInProgress = false;
376
441
  }
377
442
  }
378
443
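The `startupInProgress` guard follows a common pattern for async handlers that can be invoked again while an earlier invocation is suspended at an `await`. The sketch below is illustrative only: `handleStartup` and `slowStep` are hypothetical stand-ins for `processStartupMessage()` and its awaited `createDatabase()` / `Bun.connect()` calls.

```javascript
// Sketch of a per-connection reentrancy guard around an async handler.
// `handleStartup` and `slowStep` are hypothetical stand-ins.
async function handleStartup(state, slowStep) {
  if (state.startupInProgress) return 'skipped';

  // Claim the guard before the first await; a call that arrives while the
  // await below is pending takes the early return above.
  state.startupInProgress = true;
  try {
    await slowStep();
    state.handshakeComplete = true;
    return 'completed';
  } finally {
    // Released on success and on failure alike.
    state.startupInProgress = false;
  }
}

async function demo() {
  const state = { startupInProgress: false, handshakeComplete: false };
  let runs = 0;
  const slowStep = () =>
    new Promise((resolve) => setTimeout(() => { runs += 1; resolve(); }, 10));

  // Two "data events" fire back to back; only the first does the work.
  const results = await Promise.all([
    handleStartup(state, slowStep),
    handleStartup(state, slowStep),
  ]);
  return { results, runs };
}

demo().then(({ results, runs }) => console.log(results, runs));
```

Because JavaScript runs each synchronous segment to completion, setting the flag before the first `await` is enough; no lock is needed. The second call observes the flag and returns, so `slowStep` runs once.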