rollbridge 0.1.4 → 0.1.6

@@ -0,0 +1,238 @@
1
+ # Velocious deployment guide
2
+
3
+ A Velocious backend typically runs four kinds of process: **Beacon** (the
4
+ message broker other processes connect to), **background-jobs-main** (the job
5
+ coordinator), **background-jobs-worker** (runs the jobs), and the **web/API**
6
+ server. This guide maps each to a Rollbridge process policy, shows a complete
7
+ `rollbridge.js`, and explains startup ordering and what happens on a deploy.
8
+
9
+ A production version of this config lives at
10
+ [`examples/tensorbuzz.com.js`](../examples/tensorbuzz.com.js).
11
+
12
+ ## Process mapping
13
+
14
+ | Velocious process | Policy | Why |
15
+ | --- | --- | --- |
16
+ | `beacon` | `service` | A shared broker the other processes connect to. It should survive deploys and keep a **stable port**, so workers and the web process always reach the same Beacon. |
17
+ | `background-jobs-main` | `service` (or `singleton`) | The job coordinator. Run it as a `service` when it should outlive releases on a stable port; run it as a `singleton` when it must run the latest release's code after every deploy (see [Choosing the jobs-main policy](#choosing-the-jobs-main-policy)). |
18
+ | `background-jobs-worker` | `companion` | Release-scoped: one set of workers per active release, started before the web process and running that release's code. |
19
+ | `web` | `proxied` | Receives external HTTP/WebSocket traffic, is health-checked before traffic switches, and is drained on the next deploy. Exactly one process is `proxied`. |
20
+
21
+ See [README → Process Policies](../README.md#process-policies) for the full
22
+ semantics of each policy and [`docs/config.md`](config.md) for every field.
23
+
24
+ ## Example `rollbridge.js`
25
+
26
+ ```js
27
+ // rollbridge.js
28
+ export default {
29
+ application: "tensorbuzz",
30
+ control: {path: "/tmp/rollbridge-tensorbuzz.sock"},
31
+
32
+ proxy: {
33
+ host: "127.0.0.1",
34
+ port: 4500, // the stable port Nginx points at
35
+ healthPath: "/ping",
36
+ healthTimeoutMs: 30000,
37
+ drainTimeoutMs: 60000,
38
+ forceStopTimeoutMs: 10000
39
+ },
40
+
41
+ processes: [
42
+ // Shared broker — one daemon-wide instance on a stable port.
43
+ {
44
+ id: "beacon",
45
+ policy: "service",
46
+ cwd: "{{releasePath}}/backend",
47
+ env: {NODE_ENV: "production", VELOCIOUS_BEACON_PORT: "{{port}}"},
48
+ command: "npx velocious beacon",
49
+ port: 7330
50
+ },
51
+
52
+ // Job coordinator: waits for Beacon, then listens on a stable port that the workers and web process use.
53
+ {
54
+ id: "background-jobs-main",
55
+ policy: "service",
56
+ cwd: "{{releasePath}}/backend",
57
+ env: {
58
+ NODE_ENV: "production",
59
+ VELOCIOUS_BEACON_PORT: "{{ports.beacon}}",
60
+ VELOCIOUS_BACKGROUND_JOBS_PORT: "{{port}}"
61
+ },
62
+ command: "wait-for-it 127.0.0.1:{{ports.beacon}} --strict -- npx velocious background-jobs-main",
63
+ port: 7331
64
+ },
65
+
66
+ // Workers — one set per release; raise gracefulStopMs to let in-flight
67
+ // jobs finish during a deploy.
68
+ {
69
+ id: "background-jobs-worker",
70
+ policy: "companion",
71
+ cwd: "{{releasePath}}/backend",
72
+ env: {
73
+ NODE_ENV: "production",
74
+ VELOCIOUS_BEACON_PORT: "{{ports.beacon}}",
75
+ VELOCIOUS_BACKGROUND_JOBS_PORT: "{{ports.background-jobs-main}}"
76
+ },
77
+ command: "wait-for-it 127.0.0.1:{{ports.beacon}} --strict -- wait-for-it 127.0.0.1:{{ports.background-jobs-main}} --strict -- npx velocious background-jobs-worker",
78
+ gracefulStopMs: 60000
79
+ },
80
+
81
+ // Web/API — the one proxied process.
82
+ {
83
+ id: "web",
84
+ policy: "proxied",
85
+ cwd: "{{releasePath}}/backend",
86
+ env: {
87
+ NODE_ENV: "production",
88
+ VELOCIOUS_BEACON_PORT: "{{ports.beacon}}",
89
+ VELOCIOUS_BACKGROUND_JOBS_PORT: "{{ports.background-jobs-main}}"
90
+ },
91
+ command: "wait-for-it 127.0.0.1:{{ports.beacon}} --strict -- wait-for-it 127.0.0.1:{{ports.background-jobs-main}} --strict -- npx velocious server --host 127.0.0.1 --port {{port}}",
92
+ port: {from: 14500, to: 14599},
93
+ health: {path: "/ping", timeoutMs: 30000, intervalMs: 500}
94
+ }
95
+ ]
96
+ }
97
+ ```
98
+
99
+ ## Wiring processes together
100
+
101
+ Beacon and `background-jobs-main` get **fixed** ports (`7330`, `7331`) because
102
+ they are `service`s — a stable port lets every release's workers and web process
103
+ find them. The proxied `web` process gets a **range** (`{from: 14500, to:
104
+ 14599}`); Rollbridge allocates a free port per release so the old and new web
105
+ releases can run side by side during the drain.
106
+
107
+ Cross-reference ports with `{{ports.<id>}}` and pass them to Velocious through
108
+ `env`. Rollbridge also injects `ROLLBRIDGE_<ID>_PORT` for every process (e.g.
109
+ `ROLLBRIDGE_BACKGROUND_JOBS_MAIN_PORT`), so you can read ports from the
110
+ environment instead of templating if you prefer — see
111
+ [`docs/config.md`](config.md#injected-environment-variables).
112
+
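As a minimal sketch of the environment-variable route, the helpers below derive the variable name for a process id and read its port. The naming rule (uppercase the id, turn hyphens into underscores) is an assumption inferred from the single `ROLLBRIDGE_BACKGROUND_JOBS_MAIN_PORT` example above, and `portEnvName` and `readPort` are hypothetical names, not part of Rollbridge:

```js
// Hypothetical helpers. The naming rule is inferred from the
// ROLLBRIDGE_BACKGROUND_JOBS_MAIN_PORT example, not from a documented algorithm.
function portEnvName(processId) {
  return `ROLLBRIDGE_${processId.toUpperCase().replace(/-/g, "_")}_PORT`
}

// Read a process's port from an environment object (defaults to process.env).
function readPort(processId, env = process.env) {
  const name = portEnvName(processId)
  const port = Number.parseInt(env[name] ?? "", 10)

  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`${name} is not set to a valid port`)
  }

  return port
}
```

Failing loudly on a missing variable keeps a misconfigured process from silently connecting to the wrong port.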
113
+ ### Startup ordering
114
+
115
+ Only the `proxied` process is health-checked, so dependent processes must wait
116
+ for their dependencies themselves. Two mechanisms combine:
117
+
118
+ 1. **Policy ordering.** On each deploy Rollbridge starts `service`s first, then
119
+ the release's `companion`s, then the `proxied` process (see
120
+ [README → Deploy ordering](../README.md#deploy-ordering)).
121
+ 2. **Readiness gating.** `wait-for-it 127.0.0.1:{{ports.beacon}} --strict -- …`
122
+ blocks the command until Beacon's port accepts connections, so
123
+ `background-jobs-main`, the worker, and `web` don't start talking to Beacon
124
+ before it is listening. `wait-for-it` is a small standalone script (install it
125
+ on the host); any equivalent port-wait works.
126
+
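If you would rather not install `wait-for-it`, the readiness gate can be sketched in Node.js. This is an illustration of "any equivalent port-wait", not the `wait-for-it` script itself; `parseTarget`, `waitForPort`, and the default timeouts are hypothetical choices:

```js
// A minimal port-wait, assuming Node.js is available on the host.
// parseTarget and waitForPort are hypothetical names for this sketch.
function parseTarget(target) {
  const separator = target.lastIndexOf(":")
  const host = target.slice(0, separator)
  const port = Number.parseInt(target.slice(separator + 1), 10)

  if (separator < 1 || !Number.isInteger(port)) {
    throw new Error(`Expected host:port, got "${target}"`)
  }

  return {host, port}
}

// Resolve once host:port accepts a TCP connection; reject after timeoutMs.
async function waitForPort({host, port}, {timeoutMs = 15000, retryMs = 250} = {}) {
  const net = await import("node:net")
  const deadline = Date.now() + timeoutMs

  while (Date.now() < deadline) {
    const connected = await new Promise((resolve) => {
      const socket = net.connect({host, port})

      socket.once("connect", () => { socket.destroy(); resolve(true) })
      socket.once("error", () => { socket.destroy(); resolve(false) })
      // Treat a connection attempt that hangs as a failed attempt.
      socket.setTimeout(retryMs, () => { socket.destroy(); resolve(false) })
    })

    if (connected) return
    await new Promise((resolve) => setTimeout(resolve, retryMs))
  }

  throw new Error(`Timed out waiting for ${host}:${port}`)
}
```

A small wrapper script could call `waitForPort(parseTarget("127.0.0.1:7330"))` and only then start the real command, exiting non-zero on a timeout so the process does not start against a broker that is not listening.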
127
+ ## Deploying
128
+
129
+ Drive deploys through the Rollbridge CLI — Rollbridge ships no deploy-tool
130
+ plugins (see [`docs/deploy-recipes.md`](deploy-recipes.md) for shell/CI/Capistrano
131
+ recipes). The minimal step after a release directory is prepared:
132
+
133
+ ```bash
134
+ release_path=/srv/tensorbuzz/releases/20260523120000 # prepared by your pipeline
135
+
136
+ # Run backwards-compatible migrations BEFORE switching traffic: the old and new
137
+ # web releases overlap during the drain.
138
+ (cd "$release_path/backend" && npx velocious db:migrate)
139
+
140
+ rollbridge deploy \
141
+ --ensure-daemon \
142
+ --config /etc/rollbridge/rollbridge.js \
143
+ --release-path "$release_path" \
144
+ --revision "$(git -C "$release_path/backend" rev-parse HEAD)"
145
+ ```
146
+
147
+ `rollbridge deploy` starts the new release's worker and web process,
148
+ health-checks `web` at `/ping` on its `{{port}}`, switches traffic, then drains and
149
+ stops the previous release. It exits non-zero (leaving the previous release
150
+ active) if the new release fails to start or health-check, so a failed deploy
151
+ never promotes a broken release.
152
+
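In a shell script that does not run with `set -e`, a failed migration would not stop the `rollbridge deploy` that follows it. The sketch below shows one way to make that ordering explicit from Node.js. The CLI flags are the ones used in the command above; the function names and the wrapper itself are hypothetical and not something Rollbridge ships:

```js
// Hypothetical deploy helper. Flags mirror the documented `rollbridge deploy`
// call; everything else here is illustrative.
function buildDeployArgs({configPath, releasePath, revision}) {
  return [
    "deploy",
    "--ensure-daemon",
    "--config", configPath,
    "--release-path", releasePath,
    "--revision", revision
  ]
}

// Run a command and throw on a non-zero exit so later steps never run.
async function run(command, args, options = {}) {
  const {spawnSync} = await import("node:child_process")
  const result = spawnSync(command, args, {stdio: "inherit", ...options})

  if (result.error) throw result.error
  if (result.status !== 0) {
    throw new Error(`${command} ${args.join(" ")} exited with ${result.status}`)
  }
}

// Migrate first, then deploy; a failed migration aborts before traffic moves.
async function deployRelease({configPath, releasePath, revision}) {
  await run("npx", ["velocious", "db:migrate"], {cwd: `${releasePath}/backend`})
  await run("rollbridge", buildDeployArgs({configPath, releasePath, revision}))
}
```

Calling `deployRelease({...})` from a CI step and letting a rejection fail the step gives the same guarantee as `rollbridge deploy` itself: nothing is promoted unless every earlier step succeeded.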
153
+ ## Background jobs across a deploy
154
+
155
+ The worker is a `companion`, so each release runs its own workers:
156
+
157
+ - On deploy, the **new** release's workers start (running the new code) before
158
+ traffic switches; the **old** release's workers are stopped when that release
159
+ is drained and retired — the worker's `stopSignal`, then `SIGKILL` after
160
+ `gracefulStopMs`.
161
+ - Set `stopSignal` to the signal your worker drains on and `gracefulStopMs` to at
162
+ least your longest in-flight job, so a job gets time to finish before the
163
+ forced kill. Set `replicas` to run a pool of workers.
164
+
165
+ See [`docs/workers.md`](workers.md) for the full safe background-job deployment
166
+ pattern (companion + `replicas` + `stopSignal`/`lifecycle` hooks +
167
+ `gracefulStopMs`), the old/new worker overlap, and `nonBlockingDrain` to start the
168
+ old workers' drain immediately when a release is retired.
169
+
170
+ ### Worker recipe
171
+
172
+ A complete `background-jobs-worker` entry that runs a pool and finishes in-flight
173
+ jobs across a deploy:
174
+
175
+ ```js
176
+ {
177
+ id: "background-jobs-worker",
178
+ policy: "companion",
179
+ cwd: "{{releasePath}}/backend",
180
+ env: {
181
+ NODE_ENV: "production",
182
+ VELOCIOUS_ENV: "production",
183
+ VELOCIOUS_BEACON_PORT: "{{ports.beacon}}",
184
+ VELOCIOUS_BACKGROUND_JOBS_PORT: "{{ports.background-jobs-main}}"
185
+ },
186
+ command: "wait-for-it 127.0.0.1:{{ports.beacon}} --strict -- wait-for-it 127.0.0.1:{{ports.background-jobs-main}} --strict -- npx velocious background-jobs-worker",
187
+ replicas: 4,
188
+ gracefulStopMs: 60000
189
+ }
190
+ ```
191
+
192
+ - `replicas: 4` runs four worker instances (`background-jobs-worker#0` … `#3`),
193
+ each of which receives `ROLLBRIDGE_REPLICA_INDEX`/`ROLLBRIDGE_REPLICA_COUNT` in case you shard work.
194
+ - On deploy the new release's workers start before traffic switches; the old
195
+ release's workers receive `SIGTERM` (the default `stopSignal`) when the old
196
+ release is retired, then `SIGKILL` after `gracefulStopMs` — so size
197
+ `gracefulStopMs` to your longest job. Both releases' workers briefly consume the
198
+ shared queue, so keep job code backwards-compatible and jobs idempotent.
199
+
200
+ If your worker quiesces on a command or a non-default signal, add a `lifecycle`
201
+ block — Rollbridge runs `quietCommand`, drains for up to `drainTimeoutMs`, then
202
+ stops. For example, send a quiet signal to the worker's process group before the
203
+ drain:
204
+
205
+ ```js
206
+ lifecycle: {quietCommand: "kill -TSTP -$ROLLBRIDGE_PID", drainTimeoutMs: 60000}
207
+ ```
208
+
209
+ ### Choosing the jobs-main policy
210
+
211
+ `background-jobs-main` is duplicate-unsafe (you never want two coordinators), so
212
+ it is either a `service` or a `singleton` — never a `companion`:
213
+
214
+ - **`service`** — keeps running across deploys on its stable port. Workers from
215
+ every release talk to the same coordinator, so there's no coordination gap on
216
+ deploy. The trade-off: a `service` keeps running the **release it was started
217
+ from** and only adopts the latest release's template if it crashes and
218
+ restarts (or the daemon restarts). If `background-jobs-main` itself needs the
219
+ newest code immediately after every deploy, this is the wrong policy.
220
+ - **`singleton`** — Rollbridge stops the old instance and then starts the new
221
+ one on each deploy, so it always runs the latest release's code and two copies
222
+ never overlap. The trade-off: a brief coordination gap while it restarts.
223
+
224
+ Beacon is a broker rather than code that changes per release, so `service` is
225
+ almost always right for it.
226
+
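For comparison with the `service` entry in the example config, a `singleton` variant of the coordinator might look like the sketch below. Only `policy` changes; whether a `singleton` keeps a fixed `port` in the same way is an assumption here, so check [`docs/config.md`](config.md) before relying on it:

```js
// Assumed singleton variant of the job coordinator entry. Only `policy`
// differs from the service entry; keeping the fixed `port` is an assumption.
const backgroundJobsMain = {
  id: "background-jobs-main",
  policy: "singleton",
  cwd: "{{releasePath}}/backend",
  env: {
    NODE_ENV: "production",
    VELOCIOUS_BEACON_PORT: "{{ports.beacon}}",
    VELOCIOUS_BACKGROUND_JOBS_PORT: "{{port}}"
  },
  command: "wait-for-it 127.0.0.1:{{ports.beacon}} --strict -- npx velocious background-jobs-main",
  port: 7331
}
```

The workers and `web` already gate on this port with `wait-for-it`, so they wait for the coordinator to listen again before starting.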
227
+ ## Verifying
228
+
229
+ After a deploy, `rollbridge status` should show `beacon` and
230
+ `background-jobs-main` as long-lived `service`s with unchanged ports across
231
+ deploys, one `background-jobs-worker` for the active release, and the `web`
232
+ process as `proxied`, with its connection counts. Use
233
+ [`rollbridge logs --process <id>`](cli.md) to read recent output from any
234
+ process, and [`docs/troubleshooting.md`](troubleshooting.md) for health-check,
235
+ port, and draining problems.
236
+
237
+ For the front end, point Nginx at the stable `proxy.port` (here `4500`), never at
238
+ a release's web port — see [`docs/nginx.md`](nginx.md).
@@ -0,0 +1,115 @@
1
+ # Background-job worker deployment
2
+
3
+ This guide covers deploying background-job workers (or any non-HTTP worker pool)
4
+ with Rollbridge so that in-flight jobs finish across a deploy. It uses features
5
+ that exist today, including the command-based lifecycle hooks covered near the
6
+ end.
7
+
8
+ ## Run workers as a `companion`
9
+
10
+ Give each worker the `companion` policy. Companions are **release-scoped**: every
11
+ release starts its own workers running that release's code, and a release's
12
+ workers are stopped only when that release is retired (drained) after a newer
13
+ release takes over. They start **before** the `proxied` web process, so they're
14
+ ready before traffic switches.
15
+
16
+ ```js
17
+ {
18
+ id: "worker",
19
+ policy: "companion",
20
+ cwd: "{{releasePath}}",
21
+ command: "npx velocious background-jobs-worker"
22
+ }
23
+ ```
24
+
25
+ ## Scale the pool with `replicas`
26
+
27
+ Set `replicas` to run several identical workers (supported only on a companion with no `port`).
28
+ Each instance runs as `worker#0`, `worker#1`, … and gets
29
+ `ROLLBRIDGE_REPLICA_INDEX` / `ROLLBRIDGE_REPLICA_COUNT` (and `{{replicaIndex}}` /
30
+ `{{replicaCount}}`), so an instance can claim a distinct shard, queue, or lock:
31
+
32
+ ```js
33
+ {id: "worker", policy: "companion", command: "npx velocious background-jobs-worker", replicas: 4}
34
+ ```
35
+
36
+ Restart the pool with `rollbridge restart --process worker` (all replicas) or a
37
+ single instance with `rollbridge restart --process worker#0`.
38
+
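One way an instance might use the replica variables to claim a distinct shard is sketched below. `readReplica`, `shardFor`, and `ownsJob` are hypothetical helpers, and the string hash is a simple illustrative choice rather than anything Rollbridge provides:

```js
// Sketch: map each job id to exactly one replica of the pool.
// All function names here are hypothetical.
function readReplica(env = process.env) {
  const index = Number.parseInt(env.ROLLBRIDGE_REPLICA_INDEX ?? "0", 10)
  const count = Number.parseInt(env.ROLLBRIDGE_REPLICA_COUNT ?? "1", 10)
  const valid = Number.isInteger(index) && Number.isInteger(count) &&
    count >= 1 && index >= 0 && index < count

  if (!valid) {
    throw new Error("Invalid ROLLBRIDGE_REPLICA_INDEX / ROLLBRIDGE_REPLICA_COUNT")
  }

  return {index, count}
}

// Hash the job id to a shard number in [0, count).
function shardFor(jobId, count) {
  let hash = 0

  for (const char of String(jobId)) {
    hash = (hash * 31 + char.codePointAt(0)) >>> 0 // keep it an unsigned 32-bit value
  }

  return hash % count
}

// True when this replica is the one that should process the job.
function ownsJob(jobId, {index, count}) {
  return shardFor(jobId, count) === index
}
```

Note that during a deploy the old and new releases each run a full pool, so a shard is briefly owned by one instance per release; sharding does not replace idempotent jobs.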
39
+ ## Finish in-flight jobs on stop (`stopSignal` + `gracefulStopMs`)
40
+
41
+ When Rollbridge stops a worker — during a deploy's drain, a `rollbridge restart`,
42
+ or shutdown — it sends the worker's **`stopSignal`** (default `SIGTERM`), waits up
43
+ to **`gracefulStopMs`**, then `SIGKILL`s it if it hasn't exited. That window is
44
+ the worker's chance to finish its current job and exit cleanly.
45
+
46
+ - Set `stopSignal` to the signal your worker quiets/drains on. Many job runners
47
+ finish the current job and exit on `SIGTERM` (the default); some use `SIGINT`
48
+ or `SIGQUIT`. Use the one your worker treats as "drain and exit".
49
+ - Set `gracefulStopMs` to at least your longest job's duration, so a job in
50
+ progress is not cut off by the `SIGKILL` fallback.
51
+
52
+ ```js
53
+ {
54
+ id: "worker",
55
+ policy: "companion",
56
+ command: "npx velocious background-jobs-worker",
57
+ replicas: 4,
58
+ stopSignal: "SIGTERM",
59
+ gracefulStopMs: 60000
60
+ }
61
+ ```
62
+
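On the worker side, "drain and exit" can be sketched as a loop that stops taking new jobs once the stop signal arrives. `fetchJob` and `runJob` are hypothetical stand-ins for your queue client, not Velocious APIs, and the sketch assumes the worker runs on Node.js:

```js
// Sketch of a worker loop that drains on its stop signal.
// fetchJob and runJob are hypothetical; supply your own queue client.
function createWorker({fetchJob, runJob, idleMs = 1000}) {
  let stopping = false

  return {
    // Called from the signal handler: finish the current job, take no new ones.
    requestStop() { stopping = true },
    isStopping() { return stopping },

    async run() {
      while (!stopping) {
        const job = await fetchJob()

        if (job) {
          await runJob(job) // Has up to gracefulStopMs to finish once a stop is requested.
        } else {
          await new Promise((resolve) => setTimeout(resolve, idleMs))
        }
      }
    }
  }
}

// Wire the stop signal (SIGTERM here, matching the default stopSignal) and
// exit cleanly once the loop has drained.
function startWorker(worker, signal = "SIGTERM") {
  process.once(signal, () => worker.requestStop())
  return worker.run().then(() => process.exit(0))
}
```

Because the loop checks the flag only between jobs, a job that is already running is never abandoned by the worker itself; only the `SIGKILL` fallback can cut it off.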
63
+ ## What happens across a deploy
64
+
65
+ 1. The new release's workers start (running the **new** code) before traffic
66
+ switches to the new web process.
67
+ 2. Both old and new workers run while the previous release drains, so **both
68
+ code versions consume the shared queue at once.** Keep job code
69
+ backwards-compatible across a deploy — the same rule as database migrations.
70
+ 3. When the previous release is retired (its HTTP/WebSocket connections close or
71
+ `proxy.drainTimeoutMs` elapses), its workers are stopped: `stopSignal`, then
72
+ `SIGKILL` after `gracefulStopMs`.
73
+
74
+ Because old workers are retired on the release's **connection** drain (not on
75
+ their own job queue draining), a job still running when the release is retired
76
+ gets only the `gracefulStopMs` window to finish. Keep jobs **idempotent and
77
+ safe to retry** so a job interrupted at the `SIGKILL` fallback can run again.
78
+
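A minimal sketch of "idempotent and safe to retry" follows. The in-memory `Map` stands in for a durable store such as a database table, and `createIdempotentRunner` and `runOnce` are hypothetical names:

```js
// Sketch of an idempotent job wrapper. The Map is a stand-in for durable
// storage; all names are hypothetical.
function createIdempotentRunner(completed = new Map()) {
  return function runOnce(idempotencyKey, effect) {
    if (completed.has(idempotencyKey)) {
      return completed.get(idempotencyKey) // Already done: a retry is a no-op.
    }

    const result = effect()

    // Record completion only after the effect succeeds, so an interrupted
    // attempt is retried rather than skipped.
    completed.set(idempotencyKey, result)
    return result
  }
}
```

A job killed between the effect and the completion record will run its effect again on retry, so the effect itself should tolerate a repeat, or be committed in the same transaction as the record.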
79
+ ## Command-based lifecycle hooks
80
+
81
+ For workers that quiesce or drain via a command rather than a single signal, set
82
+ a `lifecycle` block. When Rollbridge gracefully stops the worker it runs
83
+ `quietCommand` (stop accepting new work), then drains (`drainCommand`, or waits up
84
+ to `drainTimeoutMs` for the worker to exit), then `stopCommand` or `stopSignal`,
85
+ then `SIGKILL` after `gracefulStopMs`. Each hook gets `ROLLBRIDGE_PID` and is
86
+ bounded by a timeout, so a slow hook can't wedge a deploy.
87
+
88
+ ```js
89
+ {
90
+ id: "worker",
91
+ policy: "companion",
92
+ command: "npx velocious background-jobs-worker",
93
+ replicas: 4,
94
+ lifecycle: {quietCommand: "kill -TSTP -$ROLLBRIDGE_PID", drainTimeoutMs: 60000}
95
+ }
96
+ ```
97
+
98
+ See [`docs/config.md`](config.md#processeslifecycle) for the hook reference.
99
+
100
+ ## Non-blocking drain
101
+
102
+ By default a retired release's workers are stopped only after the proxied
103
+ process's connections have drained. Set `nonBlockingDrain: true` on a worker
104
+ companion whose work is independent of the web process (a job worker on a shared
105
+ queue) to start its graceful stop **immediately** when the release is retired —
106
+ in parallel with the connection drain. The new release's workers handle new work
107
+ while the old workers finish their in-flight jobs:
108
+
109
+ ```js
110
+ {id: "worker", policy: "companion", command: "…", nonBlockingDrain: true, gracefulStopMs: 60000}
111
+ ```
112
+
113
+ See [`docs/config.md`](config.md) for `stopSignal`, `replicas`, and
114
+ `gracefulStopMs`, and [`docs/velocious.md`](velocious.md) for a full Velocious
115
+ deployment (Beacon, jobs-main, workers, web) example.
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "rollbridge",
3
- "version": "0.1.4",
3
+ "version": "0.1.6",
4
4
  "description": "Zero-downtime process supervisor and local traffic switcher for deploy-managed apps.",
5
5
  "keywords": [
6
6
  "deploy",
@@ -28,7 +28,7 @@
28
28
  "scripts": {
29
29
  "all-checks": "npm run typecheck && npm run lint && npm test",
30
30
  "lint": "eslint",
31
- "release:patch": "node scripts/release-patch.js",
31
+ "release:patch": "release-patch",
32
32
  "test": "node --test test/*.test.js",
33
33
  "typecheck": "tsc --noEmit"
34
34
  },
@@ -46,6 +46,7 @@
46
46
  "eslint": "^10.4.0",
47
47
  "eslint-plugin-jsdoc": "^62.9.0",
48
48
  "globals": "^17.6.0",
49
+ "release-patch": "^1.0.0",
49
50
  "typescript": "^6.0.3"
50
51
  }
51
52
  }