rollbridge 0.1.5 → 0.1.7

@@ -0,0 +1,115 @@
+ # Background-job worker deployment
+
+ This guide covers deploying background-job workers (or any non-HTTP worker pool)
+ with Rollbridge so that in-flight jobs finish across a deploy, using the
+ `companion` policy, `replicas`, graceful stop settings, command-based lifecycle
+ hooks, and non-blocking drain.
+
+ ## Run workers as a `companion`
+
+ Give each worker the `companion` policy. Companions are **release-scoped**: every
+ release starts its own workers running that release's code, and a release's
+ workers are stopped only when that release is retired (drained) after a newer
+ release takes over. They start **before** the `proxied` web process, so they're
+ ready before traffic switches.
+
+ ```js
+ {
+   id: "worker",
+   policy: "companion",
+   cwd: "{{releasePath}}",
+   command: "npx velocious background-jobs-worker"
+ }
+ ```
+
+ ## Scale the pool with `replicas`
+
+ Set `replicas` to run several identical workers. Replicas are supported only on
+ a companion that has no port. Each instance runs as `worker#0`, `worker#1`, …,
+ and receives the environment variables `ROLLBRIDGE_REPLICA_INDEX` and
+ `ROLLBRIDGE_REPLICA_COUNT` (also available as the `{{replicaIndex}}` and
+ `{{replicaCount}}` template variables), so each instance can claim a distinct
+ shard, queue, or lock:
+
+ ```js
+ {id: "worker", policy: "companion", command: "npx velocious background-jobs-worker", replicas: 4}
+ ```
+
+ Restart the pool with `rollbridge restart --process worker` (all replicas) or a
+ single instance with `rollbridge restart --process worker#0`.
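A minimal sketch of the worker side of `replicas`, assuming jobs have string ids and that your queue lets a worker skip jobs it does not own. Only the two `ROLLBRIDGE_REPLICA_*` variable names come from the text above; `hashJobId`, `shardFor`, and `ownsJob` are hypothetical helpers, not Rollbridge or Velocious APIs.

```javascript
// Hypothetical worker-side helpers; Rollbridge only supplies the env vars.
// The defaults let the file run outside Rollbridge as a single replica.
const replicaIndex = Number(process.env.ROLLBRIDGE_REPLICA_INDEX ?? 0)
const replicaCount = Number(process.env.ROLLBRIDGE_REPLICA_COUNT ?? 1)

// 32-bit FNV-1a hash: deterministic across processes, so every replica
// computes the same shard for a given job id.
function hashJobId(jobId) {
  let hash = 0x811c9dc5
  for (let i = 0; i < jobId.length; i++) {
    hash ^= jobId.charCodeAt(i)
    hash = Math.imul(hash, 0x01000193) >>> 0
  }
  return hash
}

// Shard number in the range [0, count).
function shardFor(jobId, count) {
  return hashJobId(jobId) % count
}

// True when the replica at `index` (out of `count`) should process this job.
function ownsJob(jobId, index = replicaIndex, count = replicaCount) {
  return shardFor(jobId, count) === index
}
```

Because the shard depends on the replica count, changing `replicas` reassigns most ids to different instances. If your queue supports named queues or row locks, claiming work that way may be simpler than hashing.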
+
+ ## Finish in-flight jobs on stop (`stopSignal` + `gracefulStopMs`)
+
+ When Rollbridge stops a worker (during a deploy's drain, a `rollbridge restart`,
+ or shutdown), it sends the worker's **`stopSignal`** (default `SIGTERM`), waits up
+ to **`gracefulStopMs`**, then sends `SIGKILL` if the worker hasn't exited. That window is
+ the worker's chance to finish its current job and exit cleanly.
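To make the order concrete, here is a minimal Node.js sketch of that sequence from the supervisor's side: send the stop signal, wait for the child's `exit` event, and fall back to `SIGKILL` when a timer fires. It assumes a plain `child_process` child on a POSIX system and is an illustration, not Rollbridge's implementation; `stopGracefully` is a hypothetical name.

```javascript
// Illustrative stop sequence: stopSignal, wait up to gracefulStopMs, then SIGKILL.
// Resolves with {forced: true} when the SIGKILL fallback was needed.
function stopGracefully(child, {stopSignal = "SIGTERM", gracefulStopMs = 10000} = {}) {
  return new Promise((resolve) => {
    // Nothing to stop if the child has already exited.
    if (child.exitCode !== null || child.signalCode !== null) {
      resolve({forced: false})
      return
    }

    let forced = false
    const timer = setTimeout(() => {
      forced = true
      child.kill("SIGKILL")
    }, gracefulStopMs)

    // Listen before signalling; "exit" is emitted asynchronously, so it cannot be missed.
    child.once("exit", () => {
      clearTimeout(timer)
      resolve({forced})
    })

    child.kill(stopSignal)
  })
}
```

A child that exits on its stop signal resolves with `forced: false`; one that ignores the signal is killed once `gracefulStopMs` elapses.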
+
+ - Set `stopSignal` to the signal on which your worker stops taking new jobs and
+   drains. Many job runners finish the current job and exit on `SIGTERM` (the
+   default); some use `SIGINT` or `SIGQUIT`. Use the one your worker treats as
+   "drain and exit".
+ - Set `gracefulStopMs` to at least your longest job's duration, so a job in
+   progress is not cut off by the `SIGKILL` fallback.
+
+ ```js
+ {
+   id: "worker",
+   policy: "companion",
+   command: "npx velocious background-jobs-worker",
+   replicas: 4,
+   stopSignal: "SIGTERM",
+   gracefulStopMs: 60000
+ }
+ ```
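On the worker side, "drain and exit" usually means: stop claiming jobs when the signal arrives, let the current job finish, then exit. The sketch below assumes a pull-style queue; `createWorker`, `fetchJob`, and `runJob` are hypothetical stand-ins for your job runner's API, not Velocious calls.

```javascript
// Hypothetical pull-based job loop showing "drain and exit" semantics.
function createWorker({fetchJob, runJob, idleDelayMs = 1000}) {
  let stopping = false

  async function start() {
    while (!stopping) {
      const job = await fetchJob()

      if (!job) {
        // Queue empty: wait briefly before polling again.
        await new Promise((resolve) => setTimeout(resolve, idleDelayMs))
        continue
      }

      // The stop flag is only checked between jobs, so a job that has been
      // claimed always runs to completion.
      await runJob(job)
    }
  }

  return {
    start,
    // Call from the stop-signal handler: no new jobs are claimed afterwards.
    stop() {
      stopping = true
    }
  }
}
```

In a real worker you would call `worker.stop()` from `process.once("SIGTERM", ...)` (or whichever `stopSignal` you configured) and call `process.exit(0)` once `start()` resolves. Since the flag is checked only between jobs, `gracefulStopMs` has to cover your longest job.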
+
+ ## What happens across a deploy
+
+ 1. The new release's workers start (running the **new** code) before traffic
+    switches to the new web process.
+ 2. Both old and new workers run while the previous release drains, so **both
+    code versions consume the shared queue at once.** Keep job code
+    backward-compatible across a deploy, the same rule that applies to database
+    migrations.
+ 3. When the previous release is retired (its HTTP/WebSocket connections close or
+    `proxy.drainTimeoutMs` elapses), its workers are stopped: `stopSignal` first,
+    then `SIGKILL` after `gracefulStopMs`.
+
+ Because old workers are retired on the release's **connection** drain (not on
+ their own job queue draining), a job still running when the release is retired
+ gets only the `gracefulStopMs` window to finish. Keep jobs **idempotent and
+ safe to retry** so a job interrupted at the `SIGKILL` fallback can run again.
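One way to make a job safe to retry is to record completed jobs and skip them on a second delivery. The sketch assumes each job has a stable `id`; the in-memory `Set` stands in for a durable store such as a database table with a unique key, and `runOnce` is a hypothetical helper.

```javascript
// Stand-in for durable storage of completed job ids (use a database in practice).
const completedJobIds = new Set()

// Runs `perform` unless this job id already completed. The id is recorded only
// after `perform` succeeds, so a job killed mid-run is not marked done and runs
// again on retry. Delivery is therefore at-least-once, and `perform` itself
// should tolerate being repeated (for example by using upserts).
async function runOnce(job, perform) {
  if (completedJobIds.has(job.id)) return "skipped"

  await perform(job)
  completedJobIds.add(job.id)
  return "ran"
}
```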
+
+ ## Command-based lifecycle hooks
+
+ For workers that quiesce or drain via a command rather than a single signal, set
+ a `lifecycle` block. When Rollbridge gracefully stops the worker, it:
+
+ 1. Runs `quietCommand`, which should make the worker stop accepting new work.
+ 2. Drains: runs `drainCommand`, or waits up to `drainTimeoutMs` for the worker
+    to exit.
+ 3. Runs `stopCommand`, or sends `stopSignal` when no `stopCommand` is set.
+ 4. Sends `SIGKILL` if the worker is still running after `gracefulStopMs`.
+
+ Each hook receives the worker's pid in `ROLLBRIDGE_PID` and is bounded by a
+ timeout, so a hung hook cannot stall a deploy.
+
+ ```js
+ {
+   id: "worker",
+   policy: "companion",
+   command: "npx velocious background-jobs-worker",
+   replicas: 4,
+   lifecycle: {quietCommand: "kill -TSTP -$ROLLBRIDGE_PID", drainTimeoutMs: 60000}
+ }
+ ```
+
+ See [`docs/config.md`](config.md#processeslifecycle) for the hook reference.
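The sketch below shows one way to bound a hook with a timeout using Node's `child_process.exec`, which runs a shell command, can pass extra environment variables, and sends `killSignal` (default `SIGTERM`) once `timeout` milliseconds pass. `runHook` is a hypothetical name, not Rollbridge's code. In the example config, the leading `-` in `kill -TSTP -$ROLLBRIDGE_PID` targets the process *group* whose id is `$ROLLBRIDGE_PID`. That reaches the worker's children too, but it assumes the worker leads its own process group.

```javascript
const {exec} = require("node:child_process")

// Hypothetical hook runner: runs `command` in a shell with ROLLBRIDGE_PID set,
// and gives up after `timeoutMs` so a hung hook cannot block the stop sequence.
function runHook(command, {pid, timeoutMs = 10000}) {
  return new Promise((resolve) => {
    const options = {
      env: {...process.env, ROLLBRIDGE_PID: String(pid)},
      timeout: timeoutMs // exec sends killSignal (SIGTERM by default) when exceeded
    }

    exec(command, options, (error, stdout) => {
      if (!error) {
        resolve({ok: true, stdout})
      } else if (error.killed) {
        // exec killed the hook because it ran past timeoutMs.
        resolve({ok: false, reason: "timeout"})
      } else if (error.signal) {
        resolve({ok: false, reason: `signal ${error.signal}`})
      } else {
        resolve({ok: false, reason: `exit ${error.code}`})
      }
    })
  })
}
```

The helper always resolves rather than rejects, so a caller can move on to the next stop step whatever the hook's outcome. One caveat: `exec` signals only the shell it started, so a hook that spawns its own long-running children may leave them behind after a timeout.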
+
+ ## Non-blocking drain
+
+ By default, an outgoing release's workers are stopped only after the proxied
+ process's connections have drained. Set `nonBlockingDrain: true` on a worker
+ companion whose work is independent of the web process (such as a job worker on a
+ shared queue) to start its graceful stop **immediately** when the release starts
+ draining, in parallel with the connection drain. The new release's workers handle
+ new work while the old workers finish their in-flight jobs:
+
+ ```js
+ {id: "worker", policy: "companion", command: "…", nonBlockingDrain: true, gracefulStopMs: 60000}
+ ```
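The only difference is *when* each companion's graceful stop begins. This conceptual sketch assumes two hypothetical helpers, neither of them a Rollbridge API. `drainConnections` waits for the proxied process's connections to close or for `proxy.drainTimeoutMs` to elapse. `stopCompanion` runs the stop sequence for one companion.

```javascript
// Conceptual ordering of a release retirement; the helper functions are hypothetical.
async function retireRelease({companions, drainConnections, stopCompanion}) {
  // nonBlockingDrain companions start their graceful stop immediately...
  const early = companions
    .filter((companion) => companion.nonBlockingDrain)
    .map((companion) => stopCompanion(companion))

  // ...while connections to the old proxied process drain.
  await drainConnections()

  // Default companions are stopped only once the drain has finished.
  const late = companions
    .filter((companion) => !companion.nonBlockingDrain)
    .map((companion) => stopCompanion(companion))

  await Promise.all([...early, ...late])
}
```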
+
+ See [`docs/config.md`](config.md) for `stopSignal`, `replicas`, and
+ `gracefulStopMs`, and [`docs/velocious.md`](velocious.md) for a full Velocious
+ deployment example (Beacon, jobs-main, workers, web).
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "rollbridge",
-   "version": "0.1.5",
+   "version": "0.1.7",
    "description": "Zero-downtime process supervisor and local traffic switcher for deploy-managed apps.",
    "keywords": [
      "deploy",
package/src/cli.js CHANGED
@@ -7,7 +7,9 @@ import {spawn} from "node:child_process"
  import {Command} from "commander"
  import RollbridgeDaemon from "./daemon.js"
  import {loadConfig, parseConfigFile, resolveConfigPath, validateConfig} from "./config.js"
- import {runEnvironmentChecks} from "./doctor.js"
+ import {runEnvironmentChecks, runReleaseChecks} from "./doctor.js"
+ import {predeployCleanup} from "./predeploy-cleanup.js"
+ import {recoverOrphans} from "./recover.js"
  import {sendControlCommand} from "./control-client.js"
 
  const DEFAULT_DAEMON_START_TIMEOUT_MS = 10000
@@ -82,6 +84,25 @@ export async function runCli(argv) {
        console.log(JSON.stringify(response, null, 2))
      })
 
+   program
+     .command("rollback")
+     .description("Roll back to a previous release: re-start it, health-check it, and switch traffic.")
+     .option("-c, --config <path>", "Config file path (defaults to rollbridge.js)")
+     .option("--release-id <id>", "Release id to roll back to (defaults to the most recently retired release)")
+     .action(async (options) => {
+       const configPath = await resolveConfigPath(options.config)
+       const config = await loadConfig(configPath)
+       const response = await sendControlCommand({
+         command: {
+           command: "rollback",
+           releaseId: options.releaseId
+         },
+         path: config.control.path
+       })
+
+       console.log(JSON.stringify(response, null, 2))
+     })
+
    program
      .command("ensure-daemon")
      .description("Start the daemon if the control socket is not already accepting commands.")
@@ -228,6 +249,9 @@ export async function runCli(argv) {
      .command("doctor")
      .description("Check the environment before starting the daemon: config, control socket, and proxy port.")
      .option("-c, --config <path>", "Config file path (defaults to rollbridge.js)")
+     .option("--release-path <path>", "Also pre-flight a prepared release: render its templates and check the release and working directories")
+     .option("--release-id <id>", "Release id used when rendering templates (defaults to --revision or the release path basename)")
+     .option("--revision <sha>", "Revision used when rendering templates (defaults to --release-id)")
      .option("--json", "Output machine-readable JSON")
      .action(async (options) => {
        let configPath
@@ -252,6 +276,10 @@ export async function runCli(argv) {
        } else {
          checks.push({detail: `valid: ${config.processes.length} ${config.processes.length === 1 ? "process" : "processes"}, proxy on ${config.proxy.host}:${config.proxy.port}`, name: "config", ok: true})
          checks.push(...await runEnvironmentChecks(config))
+
+         if (options.releasePath) {
+           checks.push(...await runReleaseChecks(config, {releaseId: options.releaseId, releasePath: options.releasePath, revision: options.revision}))
+         }
        }
 
        const failed = checks.filter((check) => !check.ok).length
@@ -295,9 +323,307 @@ export async function runCli(argv) {
        console.log(formatLogSources(sources, options.process))
      })
 
+   program
+     .command("events")
+     .description("Print recent structured daemon events (deploys, switches, stops, crashes, restarts, failures).")
+     .option("-c, --config <path>", "Config file path (defaults to rollbridge.js)")
+     .option("--limit <count>", "Show only the most recent <count> events")
+     .option("--json", "Output machine-readable JSON")
+     .action(async (options) => {
+       let limit
+
+       if (options.limit !== undefined) {
+         limit = Number(options.limit)
+
+         if (!Number.isInteger(limit) || limit < 1) {
+           console.error("--limit must be a positive integer.")
+           process.exitCode = 1
+           return
+         }
+       }
+
+       const configPath = await resolveConfigPath(options.config)
+       const config = await loadConfig(configPath)
+       const response = await sendControlCommand({
+         command: {command: "events", limit},
+         path: config.control.path
+       })
+       const events = /** @type {import("./event-log.js").DaemonEvent[]} */ (response.events ?? [])
+
+       if (options.json) {
+         console.log(JSON.stringify(events, null, 2))
+         return
+       }
+
+       console.log(formatEvents(events))
+     })
+
+   program
+     .command("predeploy-cleanup")
+     .description("Prepare a host for deploy: recover Rollbridge orphans and stop configured legacy processes when no release is active.")
+     .option("-c, --config <path>", "Config file path (defaults to rollbridge.js)")
+     .option("--release-path <path>", "Pending release path; restarts the daemon if this release changes Rollbridge itself")
+     .action(async (options) => {
+       const configPath = await resolveConfigPath(options.config)
+       const config = await loadConfig(configPath)
+       const result = await predeployCleanup({config, releasePath: options.releasePath})
+
+       console.log(formatPredeployCleanupResult(result))
+     })
+
+   program
+     .command("recover")
+     .description("Stop orphaned processes left by a crashed daemon (reads statePath; lists them unless --force).")
+     .option("-c, --config <path>", "Config file path (defaults to rollbridge.js)")
+     .option("--force", "Stop the orphaned processes; without it, recover only lists them")
+     .action(async (options) => {
+       const configPath = await resolveConfigPath(options.config)
+       const config = await loadConfig(configPath)
+       const result = await recoverOrphans({config, force: Boolean(options.force)})
+
+       if ("error" in result) {
+         console.error(result.error)
+         process.exitCode = 1
+         return
+       }
+
+       if (result.remaining.length > 0) {
+         console.error(formatRecoverResult(result))
+         process.exitCode = 1
+         return
+       }
+
+       console.log(formatRecoverResult(result))
+     })
+
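`recover` separates orphans it stopped from ones that are "still running", and its hints about permissions and recycled pids follow from how pid checks work on POSIX systems. The sketch below is a hypothetical helper, not code from `recover.js`. It relies on `process.kill(pid, 0)`, which delivers no signal but still reports `ESRCH` when no process has that pid and `EPERM` when one exists that you may not signal.

```javascript
// Hypothetical liveness check; signal 0 only performs the existence and
// permission checks, it never interrupts the target process.
function pidStatus(pid) {
  try {
    process.kill(pid, 0)
    return "running"
  } catch (error) {
    if (error.code === "ESRCH") return "gone"
    // The pid exists but belongs to another user, which is the situation
    // behind recover's "check permissions/ownership" hint.
    if (error.code === "EPERM") return "running-not-owned"
    throw error
  }
}
```

A `"running"` result only means *some* process holds that pid. It cannot tell whether that process is the original orphan or an unrelated one that reused the number, which is why `recover` lists candidates unless `--force` is given.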
+   program
+     .command("completion")
+     .description("Print a shell completion script. Enable with: source <(rollbridge completion <shell>)")
+     .argument("<shell>", "Shell to generate completion for (bash or zsh)")
+     .action((shell) => {
+       if (shell !== "bash" && shell !== "zsh") {
+         console.error(`Unsupported shell "${shell}". Supported shells: bash, zsh.`)
+         process.exitCode = 1
+         return
+       }
+
+       console.log(generateCompletionScript(program, shell))
+     })
+
    await program.parseAsync(argv)
  }
 
+ /**
+  * @param {import("./predeploy-cleanup.js").PredeployCleanupResult} result - Cleanup result.
+  * @returns {string} Human-readable summary.
+  */
+ function formatPredeployCleanupResult(result) {
+   if (result.action === "daemon-active") {
+     return "Rollbridge daemon already has an active release; no legacy cleanup needed."
+   }
+
+   const lines = []
+
+   if (result.action === "daemon-stopped") {
+     lines.push("Stopped existing Rollbridge daemon before deploy.")
+   } else {
+     lines.push("No active Rollbridge daemon found.")
+   }
+
+   lines.push(`Recovered ${result.recoveredOrphans} Rollbridge orphaned process${result.recoveredOrphans === 1 ? "" : "es"}.`)
+   lines.push(`Stopped ${result.legacyProcesses.length} legacy process${result.legacyProcesses.length === 1 ? "" : "es"}.`)
+
+   return lines.join("\n")
+ }
+
+ /**
+  * @typedef {import("./recover.js").OrphanProcess} OrphanProcess
+  */
+
+ /**
+  * Formats the result of a recover run (the report case, not the error case).
+  * @param {{cleared: boolean, forced: boolean, orphans: OrphanProcess[], remaining: OrphanProcess[]}} result - Recover report.
+  * @returns {string} Human-readable summary.
+  */
+ export function formatRecoverResult(result) {
+   if (result.orphans.length === 0) {
+     return result.forced ? "No orphaned processes found; cleared the state file." : "No orphaned processes found."
+   }
+
+   if (!result.forced) {
+     return [
+       `Found ${orphanCountLabel(result.orphans.length)} (run with --force to stop):`,
+       ...listOrphans(result.orphans),
+       "Review the list first — a recycled pid could be an unrelated process."
+     ].join("\n")
+   }
+
+   const remainingPids = new Set(result.remaining.map((orphan) => orphan.pid))
+   const stopped = result.orphans.filter((orphan) => !remainingPids.has(orphan.pid))
+   const lines = []
+
+   if (stopped.length > 0) lines.push(`Stopped ${orphanCountLabel(stopped.length)}:`, ...listOrphans(stopped))
+
+   if (result.remaining.length > 0) {
+     lines.push(
+       `Could not stop ${orphanCountLabel(result.remaining.length)} — still running (check permissions/ownership):`,
+       ...listOrphans(result.remaining),
+       "Left the state file in place so you can investigate and re-run recover."
+     )
+   }
+
+   return lines.join("\n")
+ }
+
+ /**
+  * @param {number} count - Number of orphaned processes.
+  * @returns {string} A pluralized label such as "1 orphaned process" or "3 orphaned processes".
+  */
+ function orphanCountLabel(count) {
+   return `${count} orphaned process${count === 1 ? "" : "es"}`
+ }
+
+ /**
+  * @param {OrphanProcess[]} orphans - Orphans to render.
+  * @returns {string[]} One indented line per orphan.
+  */
+ function listOrphans(orphans) {
+   return orphans.map((orphan) => `  ${orphan.id} (pid ${orphan.pid}${orphan.releaseId ? `, release ${orphan.releaseId}` : ""})`)
+ }
+
+ /**
+  * @typedef {{name: string, options: string[], valueOptions: string[]}} CompletionCommand
+  */
+
+ /**
+  * Builds a shell completion script by introspecting the CLI's commands and options,
+  * so completions never drift from the actual command surface.
+  * @param {import("commander").Command} program - Configured CLI program.
+  * @param {"bash" | "zsh"} shell - Target shell.
+  * @returns {string} A sourceable completion script.
+  */
+ export function generateCompletionScript(program, shell) {
+   const commands = describeCommands(program)
+
+   return shell === "zsh" ? zshCompletionScript(commands) : bashCompletionScript(commands)
+ }
+
+ /**
+  * @param {import("commander").Command} program - Configured CLI program.
+  * @returns {CompletionCommand[]} Each command's name, long option flags, and value-taking option flags.
+  */
+ function describeCommands(program) {
+   return program.commands.map((command) => {
+     /** @type {string[]} */
+     const options = []
+     /** @type {string[]} */
+     const valueOptions = []
+
+     for (const option of command.options) {
+       if (!option.long) continue
+
+       options.push(option.long)
+       if (option.required || option.optional) valueOptions.push(option.long)
+     }
+
+     return {name: command.name(), options, valueOptions}
+   })
+ }
+
+ /**
+  * @param {CompletionCommand[]} commands - Command descriptors.
+  * @returns {string} A bash completion script.
+  */
+ function bashCompletionScript(commands) {
+   const names = commands.map((command) => command.name).join(" ")
+   const branches = commands
+     .map((command) => `    ${command.name})\n      opts="${command.options.join(" ")}"\n      values="${command.valueOptions.join(" ")}"\n      ;;`)
+     .join("\n")
+
+   return `# rollbridge bash completion
+ # Enable with: source <(rollbridge completion bash)
+ _rollbridge() {
+   local cur prev cmd opts values i
+   cur="\${COMP_WORDS[COMP_CWORD]}"
+   prev="\${COMP_WORDS[COMP_CWORD-1]}"
+
+   cmd=""
+   for ((i = 1; i < COMP_CWORD; i++)); do
+     case "\${COMP_WORDS[i]}" in
+       -*) ;;
+       *) cmd="\${COMP_WORDS[i]}"; break ;;
+     esac
+   done
+
+   if [[ -z "$cmd" ]]; then
+     COMPREPLY=( $(compgen -W "${names}" -- "$cur") )
+     return
+   fi
+
+   opts=""
+   values=""
+   case "$cmd" in
+ ${branches}
+   esac
+
+   if [[ -n "$values" && " $values " == *" $prev "* ]]; then
+     COMPREPLY=( $(compgen -f -- "$cur") )
+     return
+   fi
+
+   COMPREPLY=( $(compgen -W "$opts" -- "$cur") )
+ }
+ complete -F _rollbridge rollbridge
+ `
+ }
+
+ /**
+  * @param {CompletionCommand[]} commands - Command descriptors.
+  * @returns {string} A zsh completion script.
+  */
+ function zshCompletionScript(commands) {
+   const names = commands.map((command) => command.name).join(" ")
+   const branches = commands
+     .map((command) => `    ${command.name}) compadd -- ${command.options.join(" ")} ;;`)
+     .join("\n")
+
+   return `#compdef rollbridge
+ # rollbridge zsh completion
+ # Enable with: source <(rollbridge completion zsh)
+ _rollbridge() {
+   local -a commands
+   commands=(${names})
+
+   if (( CURRENT == 2 )); then
+     compadd -- $commands
+     return
+   fi
+
+   case "\${words[2]}" in
+ ${branches}
+   esac
+ }
+ compdef _rollbridge rollbridge
+ `
+ }
+
+ /**
+  * Formats structured daemon events as human-readable lines.
+  * @param {import("./event-log.js").DaemonEvent[]} events - Recent events, oldest first.
+  * @returns {string} One line per event, or a placeholder when empty.
+  */
+ export function formatEvents(events) {
+   if (events.length === 0) return "No events recorded yet."
+
+   return events
+     .map((event) => {
+       const data = Object.keys(event.data).length > 0 ? ` ${JSON.stringify(event.data)}` : ""
+
+       return `${event.at} ${event.message}${data}`
+     })
+     .join("\n")
+ }
+
  /**
   * @typedef {{id: string, logs: import("./managed-process.js").ManagedProcessLog[], source: string}} LogSource
   */