rollbridge 0.1.5 → 0.1.7
- package/README.md +125 -4
- package/TODO.md +45 -43
- package/docs/cli.md +166 -6
- package/docs/config.md +172 -2
- package/docs/logging.md +77 -0
- package/docs/releasing.md +53 -0
- package/docs/tensorbuzz-runbook.md +129 -0
- package/docs/velocious.md +49 -11
- package/docs/workers.md +115 -0
- package/package.json +1 -1
- package/src/cli.js +327 -1
- package/src/config.js +268 -6
- package/src/daemon.js +216 -13
- package/src/doctor.js +177 -0
- package/src/event-log.js +47 -0
- package/src/managed-process.js +225 -16
- package/src/predeploy-cleanup.js +340 -0
- package/src/process-memory.js +110 -0
- package/src/recover.js +134 -0
- package/src/release-group.js +71 -21
- package/src/state-store.js +103 -0
- package/src/system-ids.js +71 -0
- package/src/template.js +32 -0
- package/test/completion.test.js +64 -0
- package/test/config-validation.test.js +268 -0
- package/test/doctor.test.js +205 -3
- package/test/event-log.test.js +46 -0
- package/test/fixtures/memory-hog.js +19 -0
- package/test/managed-process.test.js +290 -0
- package/test/predeploy-cleanup.test.js +131 -0
- package/test/process-memory.test.js +40 -0
- package/test/recover.test.js +162 -0
- package/test/release-group.test.js +22 -0
- package/test/rollbridge.test.js +523 -6
- package/test/state-store.test.js +69 -0
- package/test/system-ids.test.js +24 -0
package/README.md
CHANGED
@@ -84,7 +84,10 @@ more or fewer lines for chatty or quiet processes.
 Set `control.mode` to an octal permission string (for example `"660"`) to
 chmod the control socket after it binds. This restricts which users can send
 control commands — useful when several deploy users share a group. When unset,
-the socket keeps the default permissions from the daemon's umask.
+the socket keeps the default permissions from the daemon's umask. Pair it with
+`control.owner` and `control.group` (a numeric id or a user/group name) to
+`chown` the socket to a shared deploy group; names resolve via
+`/etc/passwd`/`/etc/group`, and the daemon must run as a user allowed to chown it.

 Set the proxied process's `health.startDelayMs` (default `0`) to wait that long
 after the process starts before the first health probe — like a readiness
@@ -103,6 +106,51 @@ restart. With no `restart` block, a crashed process keeps restarting after
 restart: {maxRestarts: 5, windowMs: 60000, backoffFactor: 2, maxDelayMs: 30000}
 ```

+Set a process's `memory` policy to supervise its resident memory (RSS) and
+gracefully restart it when it grows too large. `memory.limitBytes` is the RSS
+limit (measured across the whole process group, not just the wrapper);
+`memory.warnBytes` logs a warning before the limit; `memory.checkIntervalMs`
+(default `5000`) sets how often RSS is sampled. A memory restart is reported in
+`status` and recorded in `events` (a `process started` with `reason: "memory"`).
+See [`docs/config.md`](docs/config.md#processesmemory).
+
+```js
+memory: {limitBytes: 536870912, warnBytes: 402653184, checkIntervalMs: 5000}
+```
+
+Set a process's `stopSignal` (default `"SIGTERM"`) to the signal it quiets on, so
+a worker finishes its in-flight work before exiting. Rollbridge sends `stopSignal`
+to gracefully stop the process and `SIGKILL`s it only if it hasn't exited within
+`gracefulStopMs`. For example, a job worker that drains on `SIGINT`:
+
+```js
+{id: "worker", policy: "companion", command: "…", stopSignal: "SIGINT", gracefulStopMs: 60000}
+```
+
+Set `replicas` on a port-less `companion` to run a pool of identical workers.
+Each instance runs as `<id>#<index>` (`worker#0`, `worker#1`, …) — visible in
+`status` and targetable by `rollbridge restart` (base id for all, `worker#0` for
+one) — and gets `{{replicaIndex}}`/`{{replicaCount}}` and
+`ROLLBRIDGE_REPLICA_INDEX`/`_COUNT` so each instance can pick a distinct shard or
+queue. See [`docs/config.md`](docs/config.md#processesreplicas).
+
+```js
+{id: "worker", policy: "companion", command: "npx velocious background-jobs-worker", replicas: 4}
+```
+
+For workers that quiesce or drain via a command, set a `lifecycle` block —
+Rollbridge runs `quietCommand`, then drains (`drainCommand`/`drainTimeoutMs`),
+then `stopCommand`/`stopSignal`, then `SIGKILL` after `gracefulStopMs` when
+gracefully stopping the process. Each hook is bounded so it can't wedge a stop.
+
+Set `nonBlockingDrain: true` on a worker companion to start its graceful stop the
+moment its release is retired — in parallel with the proxied connection drain,
+not after it — so new workers handle new work while the old workers finish theirs.
+
+See [`docs/workers.md`](docs/workers.md) for the full safe background-job worker
+deployment pattern — companion policy, `replicas`, and finishing in-flight jobs
+on deploy with `stopSignal`/`lifecycle` + `gracefulStopMs`.
+
 Set `releaseRetention` to bound how many stopped (drained) releases the daemon
 keeps in memory and reports in `status`. `keep` (default `10`) retains the most
 recent stopped releases; `maxAgeMs` (default `0`, disabled) also prunes stopped
@@ -114,6 +162,32 @@ owns cleaning up on-disk release directories.
 releaseRetention: {keep: 5, maxAgeMs: 86400000}
 ```

+Set `statePath` to have the daemon persist its state to a file (active/draining
+releases, process pids, counters, recent events). On the next startup it reads
+any leftover file and reports managed processes still alive from a daemon that
+didn't shut down cleanly — advisory orphan detection. After a crash, run
+`rollbridge recover` to list those leftovers and `rollbridge recover --force` to
+stop them before restarting the daemon. A clean `shutdown` removes the file. See
+[`docs/config.md`](docs/config.md#statepath).
+
+```js
+statePath: "/var/lib/rollbridge/ticket-server.state.json"
+```
+
+During the first migration from an old supervisor, set `legacyTakeover` and run
+`rollbridge predeploy-cleanup --release-path <path>` before `rollbridge deploy`.
+Rollbridge will only stop configured legacy processes when no reusable active
+Rollbridge release is running.
+
+```js
+legacyTakeover: {
+screens: ["ticket-server"],
+processes: [
+{name: "legacy web", includes: ["/home/dev/ticket-server/", "velocious server", "--port 8082"]}
+]
+}
+```
+
 A function export receives no arguments and lets you build the config at load
 time:

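As a rough illustration of how the options added in the two README hunks above might sit together, here is a minimal sketch of one worker entry. The `mib` helper, the variable names, and the idea that these fields can all be combined on a single process are assumptions made for illustration; only the field names and example values (`memory`, `stopSignal`, `gracefulStopMs`, `replicas`, `statePath`) are taken from the README text.

```javascript
// Hypothetical helper (not part of Rollbridge): convert MiB to bytes so the
// memory limits read more easily than raw byte counts.
const mib = (n) => n * 1024 * 1024;

// Sketch of a port-less companion worker using fields named in the README diff.
const workerProcess = {
  id: "worker",
  policy: "companion",
  command: "npx velocious background-jobs-worker",
  replicas: 4,
  stopSignal: "SIGINT",
  gracefulStopMs: 60000,
  memory: {limitBytes: mib(512), warnBytes: mib(384), checkIntervalMs: 5000}
};

// Assumed overall shape: a top-level `statePath` next to a `processes` array.
const configSketch = {
  statePath: "/var/lib/rollbridge/ticket-server.state.json",
  processes: [workerProcess]
};

// 512 MiB and 384 MiB match the byte values in the README's `memory` example.
console.log(configSketch.processes[0].memory);
```

Writing the limits through a small helper keeps `warnBytes` visibly below `limitBytes`, which is the relationship the README describes (warn first, restart at the limit).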
@@ -143,7 +217,10 @@ Referencing a placeholder with no value (including an unset `{{env.<NAME>}}`)
 fails the process start with a clear error, so typos surface immediately.

 Production-ready examples live in `examples/`, including
-`examples/tensorbuzz.com.js` for the current TensorBuzz backend deployment
+`examples/tensorbuzz.com.js` for the current TensorBuzz backend deployment; see
+[`docs/tensorbuzz-runbook.md`](docs/tensorbuzz-runbook.md) for the matching
+production runbook (ports, deploy ordering, rollback constraints, and day-to-day
+operations).

 See [`docs/velocious.md`](docs/velocious.md) for a Velocious deployment guide —
 how Beacon, background-jobs-main, background-jobs-worker, and the web process map
@@ -348,8 +425,12 @@ rollbridge status --config rollbridge.js

 `status` reports each managed process's `state`, `pid`, recent `logs`, last
 `exitCode`/`exitSignal`, and — per process — its automatic-restart count
-(`restarts`), last start time (`startedAt`),
-
+(`restarts`), last start time (`startedAt`), current `uptimeMs` while running,
+and why it last started (`lastStartReason`: `deploy`, `crash`, `manual`, or
+`memory`). The same reason appears on each `process started` entry in
+`rollbridge events`. For memory-supervised processes it also reports current
+`rssBytes`, `memoryRestarts`, `lastMemoryRestartAt`, and `children` (the sampled
+process tree — each group member's `pid`, `command`, and `rssBytes`).

 Print the recent captured stdout/stderr per process (a one-shot snapshot of the
 retained `outputLines`, not a live stream):
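To make the new `lastStartReason` field concrete, the sketch below tallies processes by why they last started. The flat array input and the sample entries are assumptions; the real status JSON groups processes per release, service, and singleton, and its exact nesting is not shown in this diff. Only the four reason values come from the README text.

```javascript
// Tally processes by `lastStartReason`. The four keys are the values the
// README lists; anything else is ignored rather than guessed at.
function countStartReasons(processes) {
  const counts = {deploy: 0, crash: 0, manual: 0, memory: 0};
  for (const proc of processes) {
    const reason = proc.lastStartReason;
    if (Object.prototype.hasOwnProperty.call(counts, reason)) {
      counts[reason] += 1;
    }
  }
  return counts;
}

// Hypothetical sample data shaped after the fields named in the README.
const sampleProcesses = [
  {id: "web", lastStartReason: "deploy"},
  {id: "worker#0", lastStartReason: "memory", memoryRestarts: 2},
  {id: "worker#1", lastStartReason: "crash"}
];

console.log(countStartReasons(sampleProcesses));
```

A tally like this could feed an alert when `memory` or `crash` restarts start to dominate, though where the process list lives inside the status output would need checking against the real JSON.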
@@ -359,12 +440,31 @@ rollbridge logs --config rollbridge.js
 rollbridge logs --config rollbridge.js --process web
 ```

+Print the daemon's recent structured event history — deploys, traffic switches,
+release stops, process crashes/restarts, and failed commands (the most recent
+1000 events, in memory):
+
+```bash
+rollbridge events --config rollbridge.js
+rollbridge events --config rollbridge.js --limit 20
+```
+
 Stop the active release:

 ```bash
 rollbridge stop --config rollbridge.js
 ```

+Roll back to a previous release — re-starts it, health-checks it, and switches
+traffic back (defaults to the most recently retired release; a failed rollback
+leaves the current release active). Rollback manages processes only, not
+database migrations:
+
+```bash
+rollbridge rollback --config rollbridge.js # the previous release
+rollbridge rollback --config rollbridge.js --release-id v3
+```
+
 Restart non-proxied processes in place — all of them, one by id, or a policy
 group (the proxied process is never restarted; use `deploy` for that):

@@ -380,6 +480,20 @@ Shut down the daemon and managed processes:
 rollbridge shutdown --config rollbridge.js
 ```

+Prepare a first Rollbridge deploy by recovering Rollbridge-managed orphans and
+stopping configured legacy processes:
+
+```bash
+rollbridge predeploy-cleanup --config rollbridge.js --release-path /srv/app/current
+```
+
+Enable shell completion (bash or zsh) for command names and option flags:
+
+```bash
+source <(rollbridge completion bash) # add to ~/.bashrc
+source <(rollbridge completion zsh) # add to ~/.zshrc
+```
+
 ## Nginx

 Nginx should proxy to Rollbridge, not directly to Velocious:
@@ -432,6 +546,10 @@ The daemon is long-lived and survives deploys. **Deploy with
 release paths are passed per deploy. Use `command -v rollbridge` to find the
 absolute CLI path for `ExecStart`.

+See [`docs/logging.md`](docs/logging.md) for where the daemon's JSON logs go
+(stdout / journald / the `--daemon-log-path` file) and how to rotate them — the
+daemon holds its log file open, so logrotate needs `copytruncate`.
+
 ## Deployment Notes

 Run migrations before `rollbridge deploy`, and keep migrations backwards-compatible while old and new web releases overlap. For stable local brokers such as Velocious Beacon or `background-jobs-main`, use `service` when the process should survive deploys and restart from the latest successful release if it crashes.
@@ -450,6 +568,9 @@ The release script owns the package version bump, lockfile update, default-branc
 commit, push, and npm publish. Do not run `npm version` manually before running
 it.

+See [`docs/releasing.md`](docs/releasing.md) for the maintainer release checklist
+— the pre-flight checks before `npm run release:patch` and what to verify after.
+
 ## License

 Rollbridge is released under the [MIT License](LICENSE).
package/TODO.md
CHANGED
@@ -19,57 +19,58 @@ This roadmap tracks planned Rollbridge features and documentation. Rollbridge sh

 ## Major Features

-- [
-- [
-- [
-- [
-- [
-- [
-- [
+- [x] Memory supervision.
+- [x] Add per-process memory config with an RSS limit, check interval, warning threshold, and restart policy.
+- [x] Measure the managed process tree, not only the shell wrapper PID. (Sums RSS across the process group via `/proc`.)
+- [x] Report memory stats and last memory-triggered restart in `status`.
+- [x] Restart memory-heavy workers gracefully when possible, with a forced stop timeout.
+- [x] Add tests with a fixture process that allocates memory above the configured limit.
+- [x] Worker auto-restart and restart policy controls.
 - [x] Add config for max restarts, restart window, exponential backoff, and disabled restart behavior (per-process `restart` policy).
-- [
+- [x] Distinguish crash restarts, deploy replacements, manual restarts, and memory restarts in status/events. (Per-process `lastStartReason` + a `reason` on the `process started` event; the `memory` reason is wired and fires once memory supervision restarts a process.)
 - [x] Add a `restart` CLI command for a single process, a policy group, or all non-proxied workers.
-- [
-- [
-- [
-- [
-- [
-- [
-- [
-- [
-- [
-- [
-- [
-- [
-- [
-- [
-- [
-- [
-- [
-- [
-- [
-- [
-- [
-- [
-- [
+- [x] Keep restart behavior safe for job workers by using lifecycle hooks before termination. (Manual restart, memory restart, and deploy-drain stops all run the `lifecycle` hooks via `stop()`.)
+- [x] Graceful job-worker lifecycle.
+- [x] Add generic lifecycle hooks such as `quietCommand`, `drainCommand`, `drainTimeoutMs`, and `stopCommand` (per-process `lifecycle`).
+- [x] Support signal-only lifecycle steps for workers that can quiet on a Unix signal. (Per-process `stopSignal`; sent before the `SIGKILL`-after-`gracefulStopMs` fallback.)
+- [x] Add a non-blocking drain mode so new workers can start while old workers finish running jobs (per-process `nonBlockingDrain`; drains the worker in parallel with the connection drain).
+- [x] Document a Velocious background-jobs-worker recipe once the lifecycle contract is implemented (`docs/velocious.md` → Worker recipe).
+- [x] Replicas and stable worker indexes. (Supported on port-less `companion` processes; `proxied`/`singleton`/ported processes stay single.)
+- [x] Allow one process config to start multiple replicas (`replicas`, companion-only for now).
+- [x] Expose `ROLLBRIDGE_REPLICA_INDEX`, replica count, and per-replica template context (`{{replicaIndex}}`/`{{replicaCount}}`).
+- [x] Restart or stop one replica without affecting the rest (`rollbridge restart --process worker#0`).
+- [x] Preserve readable status output for replica groups (each instance shown as `<id>#<index>`).
+- [x] Persistent daemon state and recovery.
+- [x] Persist active release, draining releases, process metadata, counters, and recent events (opt-in `statePath`; atomic snapshot on change + periodic).
+- [x] Reconnect status to still-running child processes after daemon restart where possible. (Feasible subset: `status` now includes an `orphans` array — still-alive managed processes from the prior daemon's persisted state, re-checked each call. Full re-management/stdout-exit re-attach stays infeasible; the daemon reports them and `rollbridge recover` stops them.)
+- [x] Detect and report orphaned Rollbridge-managed processes. (On startup, reports persisted process pids that are still alive; advisory, see `statePath`.)
+- [x] Add a recovery mode for safe startup after daemon crash or machine reboot. (`rollbridge recover` lists orphaned processes from the persisted state and, with `--force`, stops them and clears the state; refuses while a daemon is running.)
+- [x] Rollback support.
+- [x] Keep enough release metadata to switch traffic back to a previous healthy release.
+- [x] Add a `rollback` CLI command that health-checks the target before switching.
+- [x] Define how rollback interacts with singleton workers and draining releases. (Reuses the deploy flow: replaces singletons and drains the current release.)
+- [x] Document migration constraints for rollback.
+- [x] Observability and diagnostics.
+- [x] Add structured event history for deploys, switches, stops, crashes, memory restarts, and failed commands. (In-memory `EventLog` tapping the daemon logger; memory-restart events populate once memory supervision logs them.)
 - [x] Add restart counters and uptime to status (exit reasons already reported via `exitCode`/`exitSignal`/`state`).
-- [
+- [x] Add memory stats and child-process-tree details to status (with memory supervision). (`rssBytes`/`memoryRestarts`/`lastMemoryRestartAt` plus `children`: the sampled process tree with each member's pid, command, and RSS.)
 - [x] Add a `logs` CLI command (recent per-process output from status).
-- [
-- [
+- [x] Add an `events` CLI command (after structured event history lands).
+- [x] Add optional file logging with rotation guidance (`docs/logging.md`; daemon log file via `--daemon-log-path`, logrotate `copytruncate`).
 - [x] Add machine-readable JSON output for all CLI commands (data commands print JSON; `validate`/`doctor`/`logs` take `--json`).
-- [
+- [x] Config validation and doctoring.
 - [x] Add `validate` to parse config and report all config errors without starting the daemon.
 - [x] Add `doctor` to check config validity, control socket reachability, proxy port availability, and control-socket directory writability.
-- [
+- [x] Extend `doctor` with state-path checks: state-path directory writability and orphaned-process reporting from a prior state file.
+- [x] Extend `doctor` with process-command and release-path checks once those are resolvable (they need per-release rendered templates, which only exist at deploy time). (`rollbridge doctor --release-path <path>` renders each process's command/cwd/env against that release and checks the release directory, template resolvability, and rendered working directories; uses representative ports and replica index 0.)
 - [x] Validate duplicate process IDs, missing ports on proxied processes, invalid ranges, and the single-proxied-process policy rule.
-- [
+- [x] Validate unsupported lifecycle-hook combinations once worker lifecycle hooks land. (`lifecycle.drainCommand` requires a positive `drainTimeoutMs`; `nonBlockingDrain` is companion-only; a `lifecycle.stopCommand` may not be combined with a custom `stopSignal`, since the command runs instead of the signal.)
 - [x] Include example fixes in validation output.

 ## Minor Features

 - [x] Add a control-socket permission option (`control.mode`) for shared deploy users.
-- [
+- [x] Add control-socket owner/group options for shared deploy users (`control.owner`/`control.group`, numeric id or name resolved via `/etc/passwd`/`/etc/group`).
 - [x] Make stale control socket diagnostics clearer when another daemon is still alive.
 - [x] Add old-release cleanup policies by age, count, and stopped state (`releaseRetention`).
 - [x] Add port allocation diagnostics when a range is exhausted.
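The replica items in the roadmap hunk above say each instance is given its index and the replica count. As a sketch of how a worker might use them to pick a distinct queue, the helper below reads both values from an environment object. `ROLLBRIDGE_REPLICA_INDEX` is named in the diff; `ROLLBRIDGE_REPLICA_COUNT` is inferred from the README's `ROLLBRIDGE_REPLICA_INDEX`/`_COUNT` shorthand, and the modulo sharding rule and function names are hypothetical.

```javascript
// Read the replica index and count this worker instance was started with.
// Defaults (index 0 of 1) are an assumption for running outside Rollbridge.
function replicaFromEnv(env) {
  const index = Number.parseInt(env.ROLLBRIDGE_REPLICA_INDEX ?? "0", 10);
  const count = Number.parseInt(env.ROLLBRIDGE_REPLICA_COUNT ?? "1", 10);
  return {index, count};
}

// Hypothetical sharding rule: replica i owns queue n when n % count === i.
function ownsQueue(queueNumber, replica) {
  return queueNumber % replica.count === replica.index;
}

// Example: the instance shown in status as `worker#1` in a pool of four.
const replica = replicaFromEnv({
  ROLLBRIDGE_REPLICA_INDEX: "1",
  ROLLBRIDGE_REPLICA_COUNT: "4"
});
console.log(replica, ownsQueue(5, replica), ownsQueue(6, replica));
```

In a real worker the argument would be `process.env`; passing a plain object here keeps the sketch independent of how the process was launched.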
@@ -77,7 +78,7 @@ This roadmap tracks planned Rollbridge features and documentation. Rollbridge sh
 - [x] Add process output retention config instead of a fixed recent-log count.
 - [x] Add environment variable interpolation from the daemon environment.
 - [x] Add `--config` default lookup resolving to `rollbridge.js` when no path is given.
-- [
+- [x] Add shell completion generation for common shells (`rollbridge completion bash|zsh`).
 - [x] Add npm package metadata such as repository, license, bugs, and homepage.
 - [x] Add systemd service examples for the Rollbridge daemon.
 - [x] Add tests for malformed control socket JSON and unknown control commands.
@@ -89,12 +90,13 @@ This roadmap tracks planned Rollbridge features and documentation. Rollbridge sh
 - [x] Write a full config reference covering every field, default, and template variable (`docs/config.md`).
 - [x] Write a CLI reference for `daemon`, `ensure-daemon`, `deploy`, `status`, `stop`, `shutdown`, and future commands (`docs/cli.md`).
 - [x] Expand process policy docs with deployment examples for `proxied`, `companion`, `singleton`, and `service`.
-- [
-- [
+- [x] Document memory checks and auto-restart behavior after the feature lands (`docs/config.md` → `processes[].memory`).
+- [x] Document safe background-job deployment patterns (`docs/workers.md`: companion + `replicas` + `stopSignal` + `gracefulStopMs`, old/new worker overlap).
+- [x] Document worker lifecycle hooks (`docs/config.md` → `processes[].lifecycle`, `docs/workers.md`).
 - [x] Add a Velocious deployment guide with Beacon, background-jobs-main, background-jobs-worker, and web process examples (`docs/velocious.md`).
 - [x] Add an Nginx guide with WebSocket headers, timeouts, and common failure modes (`docs/nginx.md`).
 - [x] Add deploy-tool recipes that call Rollbridge CLI commands directly (`docs/deploy-recipes.md`).
 - [x] Add a Capistrano recipe showing shell commands only; do not add a Capistrano plugin or Rollbridge-specific Capistrano tasks (`docs/deploy-recipes.md`).
-- [
+- [x] Add a TensorBuzz-specific runbook for current production ports, external services, deploy ordering, and rollback constraints (`docs/tensorbuzz-runbook.md`).
 - [x] Add troubleshooting docs for health-check failures, port conflicts, stale sockets, crash loops, and stuck draining releases (`docs/troubleshooting.md`).
-- [
+- [x] Add a release checklist for maintainers using `npm run release:patch` (`docs/releasing.md`).
package/docs/cli.md
CHANGED
@@ -46,7 +46,8 @@ already accepting commands, waits until it responds, then prints the daemon
 status JSON. Idempotent — safe to call before every deploy.

 - `--daemon-log-path <path>` — file the detached daemon's stdout/stderr is
-appended to. Default: `/tmp/rollbridge-<application>.log`.
+appended to. Default: `/tmp/rollbridge-<application>.log`. See
+[`logging.md`](logging.md) for the log format and rotation guidance.
 - `--daemon-pid-path <path>` — file the detached daemon's PID is written to.
 Default: `/tmp/rollbridge-<application>.pid`.
 - `--daemon-start-timeout-ms <ms>` — how long to wait for the daemon to accept
@@ -79,6 +80,35 @@ active and the command errors.
 - `--ensure-daemon` — start the daemon first if it isn't running (honors the
 same `--daemon-*` options as `ensure-daemon`).

+## `rollback`
+
+```
+rollbridge rollback [--config <path>] [--release-id <id>]
+```
+
+Rolls back to a previously-active release by re-running the deploy flow on its
+retained metadata: it re-starts that release, health-checks the proxied process,
+switches traffic, replaces singletons, and drains the current release — exactly
+like a deploy. With no `--release-id`, it targets the **most recently retired**
+release (the one active just before the current). Prints the same
+`{"activeReleaseId", "previousReleaseId"}` result as `deploy`.
+
+Because rollback reuses the deploy flow, a failed rollback (the target won't
+start or health-check) leaves the current release active and errors — it never
+takes the site down. Singletons are replaced (old stopped, then the target's
+started) and the current release is drained, just like any deploy.
+
+Errors when there is no previous release, the `--release-id` is not a retained
+release, or the target is already active. Only releases Rollbridge still retains
+(see [`releaseRetention`](config.md#releaseretention)) can be rolled back to.
+
+**Migration constraints.** Rollback only manages processes — it does **not**
+revert database migrations or other external state. The target release's on-disk
+directory must still exist, and its code must be compatible with the current
+schema. Keep migrations backwards-compatible (the same rule that lets old and
+new releases overlap during a deploy) so rolling code back to a retained release
+stays safe.
+
 ## `status`

 ```
@@ -87,8 +117,19 @@ rollbridge status [--config <path>]

 Prints the daemon status JSON: the active release id, the proxy address, and —
 per release, service, and singleton process — its `state`, `pid`, automatic
-`restarts`, `startedAt`, `uptimeMs`, last `exitCode`/`exitSignal`,
-`logs`.
+`restarts`, `startedAt`, `uptimeMs`, last `exitCode`/`exitSignal`,
+`lastStartReason` (`deploy`, `crash`, `manual`, or `memory`), and recent `logs`.
+Memory-supervised processes also report `rssBytes`, `memoryRestarts`,
+`lastMemoryRestartAt`, and `children` (the process tree: each group member's
+`pid`, `command`, and `rssBytes`).
+
+When [`statePath`](config.md#statepath) is configured, status also includes an
+`orphans` array: managed processes from a **previous** daemon that are still
+alive (`id`, `pid`, `releaseId`) — for example after the daemon restarted but its
+detached children kept running. It is empty in the normal case. Liveness is
+re-checked on each call, so the list clears itself as you stop the leftovers (see
+[`recover`](#recover)). These are reported only — the new daemon can't re-adopt
+them.

 ## `stop`

@@ -123,6 +164,49 @@ managed process (unknown, or a companion with no active release) is also an
 error. Restarting a `service` bounces a shared broker (for example Velocious
 Beacon), which briefly disrupts every process that depends on it.

+## `predeploy-cleanup`
+
+```
+rollbridge predeploy-cleanup [--config <path>] [--release-path <path>]
+```
+
+Prepares a host for the first Rollbridge deploy. If a Rollbridge daemon already
+has an active release, the command exits without stopping anything. Otherwise it
+recovers Rollbridge-managed orphans from `statePath` and stops the legacy
+processes configured in [`legacyTakeover`](config.md#legacytakeover), then exits
+before `rollbridge deploy` starts the new daemon/proxy.
+
+When `--release-path` is provided, the command also restarts the existing daemon
+if the active release uses a different Rollbridge package version than the
+pending release. It also restarts the daemon when the active daemon's proxy host,
+port, or upstream host differs from the pending config.
+
+Use it immediately before `rollbridge deploy --ensure-daemon` when migrating an
+app from `screen`, `process_bot`, or another old supervisor to Rollbridge.
+
+## `recover`
+
+```
+rollbridge recover [--config <path>] [--force]
+```
+
+Cleans up orphaned managed processes left by a **crashed** daemon. It reads the
+persisted state ([`statePath`](config.md#statepath)) and finds managed processes
+whose pids are still alive. Without `--force` it only **lists** them (a dry run);
+with `--force` it stops each one's process group (`SIGTERM`, then `SIGKILL` after
+`proxy.forceStopTimeoutMs`) and clears the stale state file.
+
+Run it **before** restarting the daemon after a crash. It refuses to run while a
+daemon (or another process) holds the control socket — those pids belong to a
+live daemon, not a crash. A recycled pid can be a false positive, so review the
+dry-run list before using `--force`.
+
+If `--force` cannot stop some orphan (for example one now owned by another user,
+so it can't be signaled), that process is reported as still running, the state
+file is **kept** so you can investigate and re-run `recover`, and the command
+exits non-zero. Requires `statePath`; also exits non-zero when it is unset or a
+daemon is running.
+
 ## `shutdown`

 ```
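The `recover` text above depends on telling which persisted pids are still alive. One common way to do that in Node.js is to send signal `0`, which runs the existence and permission checks without delivering a signal. The sketch below shows that technique only; it is not confirmed to be how Rollbridge checks liveness, and the `findOrphans` helper and the entry shape (`id`, `pid`) are hypothetical.

```javascript
// Report whether a pid currently exists, using the signal-0 probe.
function isAlive(pid) {
  try {
    process.kill(pid, 0); // no signal is delivered; only the checks run
    return true;
  } catch (error) {
    // EPERM: the process exists but belongs to another user.
    // ESRCH (or anything else): treat as not running.
    return error.code === "EPERM";
  }
}

// Hypothetical persisted entries; a real state file may be shaped differently.
function findOrphans(entries) {
  return entries.filter((entry) => isAlive(entry.pid));
}

// The current process is certainly alive, so it shows up in the result.
console.log(findOrphans([{id: "self", pid: process.pid}]));
```

A probe like this proves only that some process owns the pid, not that it is the process the daemon started, which is why the documentation warns that a recycled pid can be a false positive.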
@@ -146,15 +230,53 @@ issue with an example fix. Exits `1` when issues are found. With `--json`, print
|
|
|
146
230
|
## `doctor`
|
|
147
231
|
|
|
148
232
|
```
|
|
149
|
-
rollbridge doctor [--config <path>]
|
|
233
|
+
rollbridge doctor [--config <path>]
|
|
234
|
+
[--release-path <path>]
|
|
235
|
+
[--release-id <id>]
|
|
236
|
+
[--revision <sha>]
|
|
237
|
+
[--json]
|
|
150
238
|
```
|
|
151
239
|
|
|
152
240
|
Validates the config, then probes the environment: whether a daemon already
|
|
153
241
|
holds the control socket, whether the control socket's directory is writable,
|
|
154
|
-
and whether the proxy port can be bound.
|
|
155
|
-
|
|
242
|
+
and whether the proxy port can be bound. When [`statePath`](config.md#statepath)
|
|
243
|
+
is configured, it also checks that the state file's directory is writable and
|
|
244
|
+
reports any **orphaned processes** — managed processes still alive in a prior
|
|
245
|
+
state file, left by a daemon that didn't shut down cleanly (advisory; a recycled
|
|
246
|
+
pid can be a false positive, so verify before stopping). Exits `1` when any check
|
|
247
|
+
fails (so a green `doctor` means a fresh daemon can start). With `--json`, prints
|
|
156
248
|
`{"checks": [{"name", "ok", "detail"}], "ok"}`.
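
As a rough sketch of consuming that JSON shape from a deploy script, the helper below picks out the failing checks. The function name `failedChecks` and the sample `name`/`detail` strings are made up for illustration; only the `checks`/`name`/`ok`/`detail` keys come from the shape documented above.

```javascript
// Returns the checks that did not pass, given parsed `doctor --json` output.
function failedChecks(report) {
  return report.checks.filter((check) => !check.ok);
}

// Illustrative sample only; real check names and details may differ.
const sampleReport = {
  ok: false,
  checks: [
    { name: "config", ok: true, detail: "valid" },
    { name: "release path", ok: false, detail: "directory not found" },
  ],
};

for (const check of failedChecks(sampleReport)) {
  console.error(`${check.name}: ${check.detail}`);
}
```

In a pipeline the exit code is usually enough; parsing the JSON is mainly useful for surfacing which check failed in CI output.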
|
|
157
249
|
|
|
250
|
+
### Pre-flighting a release with `--release-path`
|
|
251
|
+
|
|
252
|
+
Process commands, working directories, and env values are
|
|
253
|
+
[templates](config.md#template-variables) (`{{releasePath}}`, `{{port}}`, …) that
|
|
254
|
+
are only rendered at deploy time, against a specific release. Pass
|
|
255
|
+
`--release-path <path>` to a **prepared release directory** to add deploy-time
|
|
256
|
+
checks against it:
|
|
257
|
+
|
|
258
|
+
- **release path** — the release directory exists.
|
|
259
|
+
- **process templates** — every process's `command`, `cwd`, and `env` templates
|
|
260
|
+
resolve (no `{{…}}` references an undefined variable). Ports are rendered with
|
|
261
|
+
the low end of each process's configured range.
|
|
262
|
+
- **process working directories** — each process's rendered `cwd` (defaulting to
|
|
263
|
+
the release path) exists.
|
|
264
|
+
|
|
265
|
+
`--release-id` and `--revision` set `{{releaseId}}`/`{{revision}}` for rendering
|
|
266
|
+
(defaulting the way `deploy` does: `--release-id` falls back to `--revision` or
|
|
267
|
+
the release path's basename, and `--revision` falls back to `--release-id`). Run
|
|
268
|
+
it as part of a deploy pipeline, after preparing the release and before
|
|
269
|
+
`rollbridge deploy`, to catch a template typo or a missing directory before
|
|
270
|
+
traffic is involved:
|
|
271
|
+
|
|
272
|
+
```bash
|
|
273
|
+
rollbridge doctor --config /etc/rollbridge/app.js --release-path /srv/app/releases/20260524
|
|
274
|
+
```
|
|
275
|
+
|
|
276
|
+
These checks render replica index `0` and use representative ports, so they
|
|
277
|
+
catch template and path problems but not values that only exist once the daemon
|
|
278
|
+
allocates real ports and spawns processes.
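
To make the "process templates" check concrete, here is a minimal sketch of detecting `{{…}}` references to undefined variables. It is an illustration under assumptions, not the logic in rollbridge's template module: the function name is hypothetical, and tolerating whitespace inside the braces is a guess about the syntax.

```javascript
// Hypothetical helper: lists template variables that have no value.
function undefinedTemplateVariables(template, variables) {
  const missing = new Set();
  // Assumption: variables look like {{name}}, optionally padded with spaces.
  for (const match of template.matchAll(/\{\{\s*([A-Za-z_]\w*)\s*\}\}/g)) {
    const name = match[1];
    if (!Object.prototype.hasOwnProperty.call(variables, name)) {
      missing.add(name);
    }
  }
  return [...missing];
}

// A typo ({{prot}} instead of {{port}}) is reported as undefined.
const missing = undefinedTemplateVariables(
  "{{releasePath}}/bin/start --port {{prot}}",
  { releasePath: "/srv/app/releases/20260524", port: 3000 }
);
console.log(missing);
```

Catching a typo like this before `rollbridge deploy` is the kind of failure the pre-flight is meant to surface.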
|
|
279
|
+
|
|
158
280
|
## `logs`
|
|
159
281
|
|
|
160
282
|
```
|
|
@@ -166,6 +288,44 @@ snapshot of each process's `outputLines`, not a live stream. `--process <id>`
|
|
|
166
288
|
limits output to one process. With `--json`, prints
|
|
167
289
|
`[{"id", "source", "logs": [{"at", "line", "stream"}]}]`.
|
|
168
290
|
|
|
291
|
+
## `events`
|
|
292
|
+
|
|
293
|
+
```
|
|
294
|
+
rollbridge events [--config <path>] [--limit <count>] [--json]
|
|
295
|
+
```
|
|
296
|
+
|
|
297
|
+
Prints the daemon's recent structured event history — deploys (`deploy
|
|
298
|
+
starting`, `traffic switched`, `deploy failed`), release stops (`release
|
|
299
|
+
stopped`, `release drained`), process lifecycle (`process started` — with a
|
|
300
|
+
`reason` of `deploy`, `crash`, `manual`, or `memory` — `process exited`,
|
|
301
|
+
`memory limit exceeded`, `restart limit reached`, `process restart requested`),
|
|
302
|
+
and failed control commands (`command failed`). Each event has a timestamp, a
|
|
303
|
+
message, and a structured data payload. The daemon keeps the most recent 1000 events in
|
|
304
|
+
memory (cleared on restart). `--limit <count>` shows only the most recent
|
|
305
|
+
`count`. With `--json`, prints `[{"at", "message", "data"}]`.
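
A small sketch of filtering the `--json` output by message follows. The helper name, the ISO-style `at` values, and the contents of `data` are assumptions for illustration, as is the oldest-first ordering of the array; only the `at`/`message`/`data` keys and the message strings come from the description above.

```javascript
// Returns the most recent `limit` events whose message matches exactly.
// Assumption: the array is ordered oldest first.
function eventsWithMessage(events, message, limit = Infinity) {
  const matches = events.filter((event) => event.message === message);
  return matches.slice(Math.max(matches.length - limit, 0));
}

// Illustrative sample only; real payloads may differ.
const sampleEvents = [
  { at: "2026-05-24T10:00:00.000Z", message: "deploy starting", data: {} },
  { at: "2026-05-24T10:00:05.000Z", message: "deploy failed", data: {} },
  { at: "2026-05-24T10:05:00.000Z", message: "deploy starting", data: {} },
  { at: "2026-05-24T10:05:04.000Z", message: "traffic switched", data: {} },
];

for (const event of eventsWithMessage(sampleEvents, "deploy failed")) {
  console.log(`${event.at} ${event.message}`);
}
```

Because the history lives only in memory, a script that needs durable records would have to poll and store events itself.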
|
|
306
|
+
|
|
307
|
+
## `completion`
|
|
308
|
+
|
|
309
|
+
```
|
|
310
|
+
rollbridge completion <bash|zsh>
|
|
311
|
+
```
|
|
312
|
+
|
|
313
|
+
Prints a shell completion script to stdout, generated by introspecting the
|
|
314
|
+
command set (so it never drifts from the real commands and options). It
|
|
315
|
+
completes command names and each command's option flags, and falls back to file
|
|
316
|
+
completion after an option that takes a value (bash). Enable it for the current
|
|
317
|
+
session, or add the line to your shell startup file:
|
|
318
|
+
|
|
319
|
+
```bash
|
|
320
|
+
# bash (~/.bashrc)
|
|
321
|
+
source <(rollbridge completion bash)
|
|
322
|
+
|
|
323
|
+
# zsh (~/.zshrc)
|
|
324
|
+
source <(rollbridge completion zsh)
|
|
325
|
+
```
|
|
326
|
+
|
|
327
|
+
An unsupported shell exits `1` with the list of supported shells.
|
|
328
|
+
|
|
169
329
|
## Exit codes
|
|
170
330
|
|
|
171
331
|
- `0` — success.
|