@rljson/server 0.0.14 → 0.0.15

@@ -86,7 +86,9 @@ The `Node` class sits above `Server` and `Client`, bridging `@rljson/network` to
  3. **Manages transport**: Uses injectable factories (`CreateHubTransport`/`CreateClientTransport`) to create the transport layer, keeping the Node class transport-agnostic.
  4. **Agent lifecycle**: An optional `createAgent` factory in `NodeDeps` is called on every `ready` event. The returned `AgentHandle.stop()` is called before the next role transition. This enables application-level wiring (e.g. FsAgent) without circular dependencies.
  5. **Serialized transitions**: Role transitions are queued — a new `role-changed` event waits for the previous transition to complete before starting. This prevents race conditions between teardown and setup.
- 6. **Error resilience**: Errors in user-provided code (agent factories, transport factories) are caught and logged. The node continues functioning a failed transport degrades connectivity but doesn't crash, a failed agent leaves the node's core intact.
+ 6. **Hub-changed reconnect** (v0.0.14): Subscribes to `NetworkManager`'s `hub-changed` event in addition to `role-changed`. When the hub changes but the node's role stays `client`, the node tears down its connection and reconnects to the new hub. This prevents split-brain scenarios where clients remain attached to a stale hub.
+ 7. **Clean socket teardown** (v0.0.14): `_tearDownCurrentRole()` explicitly calls `disconnect()` on client sockets before clearing the reference. This prevents orphaned Socket.IO connections from auto-reconnecting to the old hub.
+ 8. **Error resilience**: Errors in user-provided code (agent factories, transport factories) are caught and logged. The node continues functioning — a failed transport degrades connectivity but doesn't crash, a failed agent leaves the node's core intact.
 
  ```text
  ┌─────────────────────────────────────────┐
package/README.blog.md CHANGED
@@ -17,3 +17,11 @@ Add posts as Markdown entries in this file (newest last). Keep each post small a
  - Why it matters
  - Links: PRs, docs, demos
  ```
+
+ ## 2026-03-20 — v0.0.14: Split-brain fix and hub-changed reconnect
+
+ - Node class now listens to `hub-changed` events from NetworkManager — clients reconnect when hub changes but role stays `client`
+ - `_tearDownCurrentRole()` explicitly disconnects sockets before clearing references — prevents orphaned connections
+ - Validated on 4-node Windows test lab: E2E Reports 18 & 19 both score **38/41 passed, 0 failures**
+ - Previous Report 17 showed split-brain (two simultaneous hubs, 23/41 passed) — now fully resolved
+ - PR: https://github.com/rljson/server/pull/14
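The hub-changed reconnect described in this post can be illustrated with a small model. This is only a sketch under stated assumptions, not the package's implementation: `SketchNode`, its `log` array, and the `{ currentHub }` payload shape are hypothetical names chosen for illustration, and a plain Node.js `EventEmitter` stands in for `NetworkManager`.

```javascript
const { EventEmitter } = require('node:events');

// Hypothetical stand-in for the Node class; only the listener wiring matters.
class SketchNode {
  constructor(networkManager) {
    this.role = 'client';
    this.connectedHub = null;
    this.log = [];

    // A role-changed handler that skips when the role is unchanged.
    networkManager.on('role-changed', ({ current }) => {
      if (current === this.role) return; // same role: nothing happens
      this.tearDown();
      this.role = current;
    });

    // The additional listener: follow the hub while the role stays `client`.
    // The `{ currentHub }` payload is an assumed shape, not the real one.
    networkManager.on('hub-changed', ({ currentHub }) => {
      if (this.role !== 'client') return;
      if (currentHub === this.connectedHub) return;
      this.tearDown();
      this.connect(currentHub);
    });
  }

  tearDown() {
    if (this.connectedHub === null) return;
    this.log.push(`disconnect:${this.connectedHub}`);
    this.connectedHub = null;
  }

  connect(hubId) {
    this.connectedHub = hubId;
    this.log.push(`connect:${hubId}`);
  }
}

const networkManager = new EventEmitter();
const node = new SketchNode(networkManager);
networkManager.emit('hub-changed', { currentHub: 'node-a' });
// Role stays `client`, so this event alone would not trigger a reconnect.
networkManager.emit('role-changed', { previous: 'client', current: 'client' });
networkManager.emit('hub-changed', { currentHub: 'node-b' });
console.log(node.log);
```

In this model the second `hub-changed` event is what moves the client from `node-a` to `node-b`; the `role-changed` event in between is ignored because the role did not change.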
package/README.trouble.md CHANGED
@@ -10,9 +10,42 @@ found in the LICENSE file in the root of this package.
 
  ## Table of contents <!-- omit in toc -->
 
+ - [Split-Brain: Clients not reconnecting on hub change (fixed in v0.0.14)](#split-brain-clients-not-reconnecting-on-hub-change-fixed-in-v0014)
  - [Vscode Windows: Debugging is not working](#vscode-windows-debugging-is-not-working)
  - [Test Isolation: Socket.IO event listener accumulation](#test-isolation-socketio-event-listener-accumulation)
 
+ ## Split-Brain: Clients not reconnecting on hub change (fixed in v0.0.14)
+
+ Date: 2026-03-20
+
+ **Problem:**
+
+ In a 4-node deployment, two nodes simultaneously acted as hub (split-brain). Clients stayed connected to the old hub while a new hub was elected. File sync stopped working because the hub had no real clients.
+
+ **Symptoms:**
+
+ - E2E Report 17: 23/41 passed, 18 failed
+ - Two nodes reporting `role=hub` simultaneously
+ - Files written by one hub never appearing on clients
+ - File counts diverging between nodes (hub accumulating files, clients stuck)
+
+ **Root Cause:**
+
+ Two bugs in the `Node` class:
+
+ 1. **Missing `hub-changed` listener**: Node only subscribed to `role-changed` from NetworkManager. When the hub changed but the node's role stayed `client`, the `role-changed` handler skipped (same role). Clients never reconnected to the new hub.
+
+ 2. **No socket disconnect on teardown**: `_tearDownCurrentRole()` set `_clientSocket = undefined` without calling `disconnect()`. The orphaned Socket.IO connection kept auto-reconnecting to the old hub (especially with the `socket.connect()` reconnect fix from v0.0.13).
+
+ **Solution (v0.0.14):**
+
+ 1. Added `_onHubChanged` listener that tears down and reconnects when hub changes while role stays `client`
+ 2. Added explicit `socket.disconnect()` call in `_tearDownCurrentRole()` before clearing the reference
+
+ **Validation:**
+
+ - E2E Reports 18 & 19: **38/41 passed, 0 failures** on 4-node test lab
+
  ## Vscode Windows: Debugging is not working
 
  Date: 2025-03-08
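The second root cause in the troubleshooting entry above (clearing `_clientSocket` without calling `disconnect()`) can be shown with a toy model. Everything here is a hypothetical stand-in: `FakeSocket` only mimics a client that stays active until `disconnect()` is called, and `ClientHolder` is not the package's `Node` class.

```javascript
// Hypothetical socket: `active` models a connection that keeps
// auto-reconnecting to its hub until disconnect() is called.
class FakeSocket {
  constructor(hub) {
    this.hub = hub;
    this.active = true;
  }
  disconnect() {
    this.active = false;
  }
}

// Hypothetical holder contrasting the two teardown variants.
class ClientHolder {
  constructor(socket) {
    this._clientSocket = socket;
  }

  // Behaviour described as the bug: the reference is dropped,
  // but the socket itself is never told to stop.
  tearDownWithoutDisconnect() {
    this._clientSocket = undefined;
  }

  // Behaviour described as the fix: disconnect first, then clear.
  tearDownWithDisconnect() {
    if (this._clientSocket) this._clientSocket.disconnect();
    this._clientSocket = undefined;
  }
}

const orphan = new FakeSocket('old-hub');
new ClientHolder(orphan).tearDownWithoutDisconnect();

const closed = new FakeSocket('old-hub');
new ClientHolder(closed).tearDownWithDisconnect();

console.log(orphan.active, closed.active);
```

The holder no longer references either socket, yet only the second one has actually stopped; the first is the orphaned connection the entry describes.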
package/dist/server.js CHANGED
@@ -1569,6 +1569,18 @@ class Node {
          await this._becomeClient();
          break;
      }
+     if (!this._running) return;
+     const networkRole = this._networkManager.getTopology().myRole;
+     if (networkRole !== this._role && networkRole !== "unassigned") {
+       this._logger.info(
+         "Node",
+         `Reconciling stale role: node=${this._role} → network=${networkRole}`
+       );
+       await this._performTransition({
+         previous: this._role,
+         current: networkRole
+       });
+     }
    }
    async _becomeHub() {
      await this._ioMem.init();
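The added lines in this hunk compare the node's role with the role reported by the network once a transition has run, and start another transition if the two differ. The hunk does not show which method contains them, so the following is only a simplified, synchronous model of that check. `transitionAndReconcile`, the plain `node` object, and `getNetworkRole` are hypothetical names; the real code is asynchronous and reads the role from `this._networkManager.getTopology().myRole`.

```javascript
// Hypothetical model of "transition, then reconcile against the network".
function transitionAndReconcile(node, target, getNetworkRole) {
  // Stand-in for the actual role switch (becoming hub or client).
  node.role = target;
  node.history.push(target);

  // Mirrors `if (!this._running) return;` in the hunk.
  if (!node.running) return;

  // Mirrors the stale-role check: `unassigned` is never reconciled to.
  const networkRole = getNetworkRole();
  if (networkRole !== node.role && networkRole !== 'unassigned') {
    transitionAndReconcile(node, networkRole, getNetworkRole);
  }
}

// The network moved on to `hub` while the node was becoming `client`.
const stale = { role: 'unassigned', running: true, history: [] };
transitionAndReconcile(stale, 'client', () => 'hub');

// An `unassigned` network role leaves the node where it is.
const settled = { role: 'unassigned', running: true, history: [] };
transitionAndReconcile(settled, 'client', () => 'unassigned');

// A stopped node skips reconciliation entirely.
const stopped = { role: 'unassigned', running: false, history: [] };
transitionAndReconcile(stopped, 'client', () => 'hub');

console.log(stale.history, settled.history, stopped.history);
```

In this model the first node ends up as `hub` after two steps, while the other two stay `client` after a single step.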
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "@rljson/server",
- "version": "0.0.14",
+ "version": "0.0.15",
  "description": "Rljson server description",
  "homepage": "https://github.com/rljson/server",
  "bugs": "https://github.com/rljson/server/issues",