alvin-bot 4.9.2 → 4.9.4

package/CHANGELOG.md CHANGED
@@ -2,6 +2,85 @@
 
  All notable changes to Alvin Bot are documented here.
 
+ ## [4.9.4] - 2026-04-13
+
+ ### 🔌 Web UI fully decoupled from main bot: port conflicts no longer crash anything
+
+ Colleague feedback (WhatsApp voice note, 2026-04-13):
+ > *"The gateway binds to port 3100 like OpenClaw. When the bot restarts,
+ > the port is often still held → catastrophic crash. I ended up
+ > decoupling the gateway process completely, because the actual bot
+ > runs independently of the gateway; it can still answer Telegram
+ > even if the web endpoint isn't reachable yet. It's weird that the
+ > main routine crashes when the port is busy. It should just run in
+ > the background, watch for the port to become free, and connect
+ > then. Zero impact on the main routine."*
+
+ He was right. My v4.9.0 `stopWebServer()` fix was *prevention* only: it stopped the bot itself from holding 3100 across restarts. It didn't cover the *resilience* side. A foreign process holding 3100 (another dev server, an OpenClaw-style orphan, a TIME_WAIT race after SIGKILL) still crashed the boot, because `startWebServer()` was synchronous and the uncaught exception from `server.listen()` escaped to the main event loop.
+
+ **Complete rewrite of the bind loop:**
+
+ - **`src/web/bind-strategy.ts` (new): pure decision helper.** `decideNextBindAction(err, attempt, opts)` returns either `{type: "retry-port", port, attempt}` (climb the ladder) or `{type: "retry-background", delayMs, port}` (back off, retry the original port in 30 s). EADDRINUSE with attempts remaining → ladder. EADDRINUSE exhausted → background. Any other error → background. 8 unit tests cover every branch plus purity.
+
+ - **`src/web/server.ts` `startWebServer`: non-blocking, fresh-server-per-attempt.** Returns `void` synchronously, NEVER throws, NEVER blocks on bind. Each attempt creates a new `http.Server` (no state-recycling bugs) and attaches its own error handler. On failure, it cleans up and calls `decideNextBindAction` to decide the next move. If the ladder is exhausted, it schedules a 30 s background retry at the original port. The Telegram bot keeps running the whole time; the web UI just isn't reachable yet.
+
+ - **`src/web/server.ts` WebSocketServer attached POST-bind.** The `ws` library's `WebSocketServer` constructor installs its own event plumbing on the underlying `http.Server` and, crucially, causes EADDRINUSE errors to escape as uncaught exceptions when attached pre-listen. Debugging this chewed an hour on 2026-04-13. Fix: only call `new WebSocketServer({ server })` AFTER `listen()` has fired its callback. The test `"when the primary port is taken"` in `test/web-server-integration.test.ts` pins this behaviour.
+
+ - **`src/web/server.ts` error handler: `on` not `once`.** The previous version used `.once("error", handler)`, and a Node edge case where a single bind failure emits TWO error events left the second one uncaught. The handler is now attached with `on` plus a `handled` guard, so it is idempotent; a quiet post-bind logger replaces it on success.
+
+ - **`src/web/server.ts` defensive try/catch around `server.listen()`.** In the wild, Node sometimes throws synchronously for edge-case binds (already-listening, invalid backlog, kernel race). The catch funnels sync throws through the same `handleBindFailure` path as async error events.
+
+ - **`src/web/server.ts` `closeHttpServerGracefully(server)` + `stopWebServer()`.** The old `stopWebServer(server)` took an explicit server arg; it's been split into a low-level helper (`closeHttpServerGracefully(server)`, exported for tests) and a stateful top-level function (`stopWebServer()`, no args, cleans up `currentServer` + `wsServerRef` + `bindRetryTimer`). Safe to call before start, safe to call twice, cancels pending background retries.
+
+ - **`src/index.ts` call sites adjusted.** `const webServer = startWebServer()` → `startWebServer()`. `stopWebServer(webServer)` → `stopWebServer()`. The comment above the call explains the decoupling so nobody accidentally re-couples it in a future "clean up" refactor.
+
+ **Testing: 186 → 201 (+15 new).**
+
+ - `test/web-server-resilience.test.ts`: 8 unit tests for `decideNextBindAction`
+ - `test/web-server-integration.test.ts`: 7 real-server integration tests. `startWebServer` returns void, binds, stops, is idempotent, survives a primary-port conflict by climbing the ladder, and closes servers with hanging sockets.
+ - **Live-verified on the maintainer's machine**: `launchctl unload` + dual-stack Node hog on port 3100 + `launchctl load` → bot booted cleanly → out.log contained `[web] port 3100 busy (EADDRINUSE) - trying 3101` → `🌐 Web UI: http://localhost:3101 (Port 3100 was busy, using 3101 instead)` → Telegram responsive throughout. Exactly what the colleague described.
+
+ **Non-goals / intentionally unchanged:**
+ - Timeouts stay unlimited (v4.8.8 behaviour preserved).
+ - The primary port is still `WEB_PORT || 3100`; no config schema change.
+ - When the bot binds on a non-primary port (e.g. 3101), the README permalink still points at 3100. Users hitting a ladder-climbed bot should check the startup log; this is rare and temporary.
+
+ ## [4.9.3] - 2026-04-11
+
+ ### 🛠 Two UX bugs found in production after v4.9.2, now closed
+
+ Ali triggered `/cron run Daily Job Alert` after the v4.9.2 deploy and saw 13 minutes of chat silence followed by nothing. Forensics on the live bot revealed two distinct problems on top of an already-successful run:
+
+ **1. `subagent-delivery` has been silently dropping every banner for days.** Err.log: `GrammyError: Call to 'sendMessage' failed! (400: Bad Request: can't parse entities: Can't find end of the entity starting at byte offset 2636)`. The daily-job-alert sub-agent produces markdown-dense output (`|` tables, `**bold**`, `\|` escapes, mixed asterisks). Telegram's Markdown parser refuses it, `api.sendMessage(..., { parse_mode: "Markdown" })` throws, and the bare try/catch in `deliverSubAgentResult` logs + bails. **Result: the user has never seen a sub-agent-delivery banner, even when the underlying run succeeded perfectly and emailed the HTML report correctly.**
+
+ Fix in `src/services/subagent-delivery.ts`: new `sendWithMarkdownFallback()` helper that detects the "can't parse entities" pattern and retries the SAME text without `parse_mode`. All three code paths (file-upload case, single-message case, chunked case) now flow through the helper. 3 new tests drive the happy path, non-parse errors, and the chunked path.
+
+ **2. `/cron run` had zero proof-of-life for 13 minutes.** The handler used to `await runJobNow(...)` synchronously and reply only when finished. Telegram's typing indicator expires after 5s. Users saw: command sent → typing indicator blip → nothing → nothing → (much later, if at all) result. For cron jobs that take 10-15 min (daily job alert, Perseus health, Polyseus P&L), this is indistinguishable from a dead bot.
+
+ Fix, the new handler flow:
+
+ ```
+ bot: 🚀 Started *Daily Job Alert* - working…           ← instant ack
+ bot: 🔄 Running *Daily Job Alert* · 1m 0s elapsed…     ← edit every 60s
+ bot: 🔄 Running *Daily Job Alert* · 2m 0s elapsed…     ← edit
+ ...
+ bot: ✅ Done - *Daily Job Alert* · 13m 17s             ← final edit
+ bot: ✅ *Daily Job Alert* completed · 13m · 2.6M/28k   ← subagent-delivery
+      [full report body, Markdown-safe with plain-text fallback]
+ ```
+
+ The ticker uses a single `editMessageText` call per minute on the same message: zero notification spam, clean visual progress. Errors from every edit are filtered through `isHarmlessTelegramError` so the inevitable "message is not modified" races stay silent. The ack itself falls back to plain text if the first `reply` hits a parse error, and the final edit falls back to a fresh plain message if the edit fails.
+
+ New module `src/handlers/cron-progress.ts` with pure helpers: `formatElapsed`, `escapeMarkdown`, `buildTickerText`, `buildDoneText`. 8 tests cover the formatting rules and markdown-safety escapes so future cron jobs with weird names (`weird_job*name`) can't break the ticker.
+
+ **186 tests total** (+11 new). All green. Timeouts remain unlimited.
+
+ **What you see after this upgrade:**
+ - Instant "🚀 Started" ack on `/cron run`
+ - Live elapsed-time ticker every minute
+ - Final "✅ Done" when the sub-agent finishes
+ - A separate banner+body message with the full report, **this time actually delivered**, even when the body contains broken Markdown
+
  ## [4.9.2] - 2026-04-11
 
  ### Post-review polish: three edge cases from the strict audit
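
The `on`-plus-`handled`-guard bullet in the 4.9.4 entry can be illustrated in isolation. The following is a minimal sketch, not the code from `src/web/server.ts` (which this diff does not include): `attachBindErrorHandler` is a hypothetical name, and a plain `EventEmitter` stands in for `http.Server`.

```javascript
const { EventEmitter } = require("node:events");

// Hypothetical helper: registers a persistent "error" listener that reports
// only the first error of a bind attempt and ignores any duplicates.
function attachBindErrorHandler(server, onFailure) {
    let handled = false;
    const handler = (err) => {
        if (handled)
            return; // a second error event for the same attempt is a no-op
        handled = true;
        onFailure(err);
    };
    server.on("error", handler); // `on`, not `once`: the listener stays attached
    return () => server.off("error", handler);
}

// A bare EventEmitter stands in for http.Server in this sketch.
const fakeServer = new EventEmitter();
const failures = [];
attachBindErrorHandler(fakeServer, (err) => failures.push(err.code));

const busy = Object.assign(new Error("listen EADDRINUSE"), { code: "EADDRINUSE" });
fakeServer.emit("error", busy);
fakeServer.emit("error", busy); // with `once`, this second emit would throw
console.log(failures); // [ 'EADDRINUSE' ]
```

With `once`, the listener is removed after the first event, so a second `"error"` emit finds no listener and Node's `EventEmitter` throws the error instead.
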
package/README.md CHANGED
@@ -114,7 +114,18 @@ That's it. The setup wizard validates everything:
 
  **Requires:** Node.js 18+ ([nodejs.org](https://nodejs.org)) · Telegram bot token ([@BotFather](https://t.me/BotFather)) · Your Telegram user ID ([@userinfobot](https://t.me/userinfobot))
 
- Free AI providers available - no credit card needed.
+ Free AI providers available - no credit card needed. **Privacy-first?** Pick the 🔒 **Offline - Gemma 4 E4B** option in setup for a fully local LLM via Ollama (macOS/Linux: automated install; Windows: manual).
+
+ ### 📘 First-time setup walkthroughs
+
+ Step-by-step guides with screenshots and screen-by-screen instructions:
+
+ | Platform | PDF (printable) |
+ |---|---|
+ | 🍎 **macOS** (with `launchd` background service) | [Download PDF](https://github.com/alvbln/Alvin-Bot/releases/latest/download/Alvin-Bot-macOS-Setup-Guide.pdf) |
+ | 🪟 **Windows** (with Task Scheduler / Startup folder) | [Download PDF](https://github.com/alvbln/Alvin-Bot/releases/latest/download/Alvin-Bot-Windows-Setup-Guide.pdf) |
+
+ Both guides cover: Node.js install · Telegram bot creation · first-time `setup` · foreground test · background service · offline Gemma 4 mode · troubleshooting. ~15 min end-to-end for a first-time user.
 
  ### macOS: use `launchd` instead of pm2 (recommended)
 
@@ -16,6 +16,9 @@ import { getMCPStatus, getMCPTools, callMCPTool } from "../services/mcp.js";
  import { listCustomTools, executeCustomTool } from "../services/custom-tools.js";
  import { screenshotUrl, extractText, generatePdf, hasPlaywright } from "../services/browser.js";
  import { listJobs, createJob, deleteJob, toggleJob, runJobNow, formatNextRun, humanReadableSchedule } from "../services/cron.js";
+ import { resolveJobByNameOrId } from "../services/cron-resolver.js";
+ import { buildTickerText, buildDoneText, escapeMarkdown } from "./cron-progress.js";
+ import { isHarmlessTelegramError } from "../util/telegram-error-filter.js";
  import { storePassword, revokePassword, getSudoStatus, verifyPassword } from "../services/sudo.js";
  import { config } from "../config.js";
  import { BOT_VERSION } from "../version.js";
@@ -1442,11 +1445,25 @@ export function registerCommands(bot) {
              return;
          }
          // /cron run <name-or-id>
+         //
+         // UX contract:
+         //   1. Instantly post a "🚀 Started …" message so the user knows
+         //      the command was received.
+         //   2. Every 60s edit that message with the elapsed-time ticker
+         //      so the chat shows proof-of-life during 10+ min sub-agent
+         //      runs (the Daily Job Alert takes ~13 min in production).
+         //   3. When runJobNow returns, edit the same message into a
+         //      final "✅ Done" / "❌ error" / "⏳ already running" state.
+         //   4. The heavy lifting (banner + full body + chunking) stays in
+         //      subagent-delivery.ts, which now has a Markdown→plain-text
+         //      fallback so it actually reaches the user.
          if (arg.startsWith("run ")) {
              const nameOrId = arg.slice(4).trim();
-             await ctx.api.sendChatAction(ctx.chat.id, "typing");
-             const outcome = await runJobNow(nameOrId);
-             if (outcome.status === "not-found") {
+             // Resolve up-front so we can show the real job name in the
+             // "Started" ack, and so we handle the not-found case BEFORE
+             // spending a Telegram round-trip on a pointless placeholder.
+             const resolved = resolveJobByNameOrId(listJobs(), nameOrId);
+             if (!resolved) {
                  const jobs = listJobs();
                  const hint = jobs.length > 0
                      ? `\n\nAvailable:\n${jobs.slice(0, 10).map(j => `• ${j.name}`).join("\n")}`
@@ -1454,15 +1471,80 @@ export function registerCommands(bot) {
                  await ctx.reply(`❌ No job matches <code>${nameOrId}</code>.${hint}`, { parse_mode: "HTML" });
                  return;
              }
-             if (outcome.status === "already-running") {
-                 await ctx.reply(`⏳ Job "${outcome.job.name}" is already running \u2014 not starting a duplicate. ` +
-                     `Wait for the current run to finish, or /subagents cancel to abort it.`);
-                 return;
+             const jobName = resolved.name;
+             const startedAt = Date.now();
+             // Post initial ack; we'll edit THIS message for the ticker and
+             // the final state.
+             let ackMessageId = null;
+             try {
+                 const ack = await ctx.reply(`🚀 Started *${escapeMarkdown(jobName)}* \u2014 working…`, { parse_mode: "Markdown" });
+                 ackMessageId = ack.message_id;
+             }
+             catch (err) {
+                 // If even the initial ack fails, fall back to plain text so
+                 // the user still knows we received the command.
+                 try {
+                     const ack = await ctx.reply(`🚀 Started ${jobName} \u2014 working…`);
+                     ackMessageId = ack.message_id;
+                 }
+                 catch { /* give up on the ack; the run still fires below */ }
+             }
+             const chatId = ctx.chat.id;
+             // Progress ticker: edit the ack message with elapsed time every
+             // 60s. Errors from editMessageText (including the harmless
+             // "message is not modified") are swallowed via the central filter.
+             const ticker = setInterval(async () => {
+                 if (ackMessageId === null)
+                     return;
+                 const elapsed = Math.floor((Date.now() - startedAt) / 1000);
+                 try {
+                     await ctx.api.editMessageText(chatId, ackMessageId, buildTickerText(jobName, elapsed), { parse_mode: "Markdown" });
+                 }
+                 catch (err) {
+                     if (!isHarmlessTelegramError(err)) {
+                         console.warn(`[cron:run] ticker edit failed:`, err);
+                     }
+                 }
+             }, 60_000);
+             let outcome;
+             try {
+                 outcome = await runJobNow(nameOrId);
+             }
+             finally {
+                 clearInterval(ticker);
+             }
+             // Final state: edit the ack message one last time.
+             const elapsed = Math.floor((Date.now() - startedAt) / 1000);
+             const finalText = (() => {
+                 if (outcome.status === "not-found") {
+                     // Shouldn't happen (we already resolved successfully above),
+                     // but handle it for completeness.
+                     return `❌ ${escapeMarkdown(jobName)} \u2014 not found (race?)`;
+                 }
+                 if (outcome.status === "already-running") {
+                     return buildDoneText(outcome.job.name, elapsed, { ok: true, skipped: true });
+                 }
+                 return buildDoneText(outcome.job.name, elapsed, {
+                     ok: !outcome.error,
+                     error: outcome.error,
+                 });
+             })();
+             if (ackMessageId !== null) {
+                 try {
+                     await ctx.api.editMessageText(chatId, ackMessageId, finalText, { parse_mode: "Markdown" });
+                 }
+                 catch (err) {
+                     if (!isHarmlessTelegramError(err)) {
+                         // Last-ditch fallback: post as a new plain message so the
+                         // user sees the result even if the edit failed.
+                         await ctx.reply(finalText).catch(() => { });
+                     }
+                 }
+             }
+             else {
+                 // We never got an ack message id, so just post fresh
+                 await ctx.reply(finalText, { parse_mode: "Markdown" }).catch(() => ctx.reply(finalText));
              }
-             const output = outcome.output
-                 ? `\`\`\`\n${outcome.output.slice(0, 2000)}\n\`\`\``
-                 : "(no output)";
-             await ctx.reply(`🔧 Job "${outcome.job.name}" executed:\n${output}${outcome.error ? `\n\n❌ ${outcome.error}` : ""}`, { parse_mode: "Markdown" });
              return;
          }
          await ctx.reply("Unknown cron command. Use /cron for help.");
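
The ticker-plus-`finally` pattern in the hunk above can be factored into a reusable helper. This is a hedged sketch rather than anything shipped in the package: `withProgressTicker` and its parameters are hypothetical names, and the demo uses a 25 ms interval in place of the handler's 60 s so it finishes quickly.

```javascript
// Hypothetical helper (not part of the package): run `task` while invoking
// `onTick(elapsedSeconds)` every `intervalMs` until the task settles.
async function withProgressTicker(task, onTick, intervalMs = 60_000) {
    const startedAt = Date.now();
    const ticker = setInterval(() => {
        const elapsed = Math.floor((Date.now() - startedAt) / 1000);
        // A failing tick must never break the run, so errors are logged and dropped.
        Promise.resolve()
            .then(() => onTick(elapsed))
            .catch((err) => console.warn("tick failed:", err));
    }, intervalMs);
    try {
        return await task();
    }
    finally {
        clearInterval(ticker); // runs on success and on failure
    }
}

// Usage: a 120 ms fake job with a 25 ms tick interval.
const ticks = [];
const demo = withProgressTicker(
    () => new Promise((resolve) => setTimeout(() => resolve("done"), 120)),
    (elapsed) => ticks.push(elapsed),
    25,
).then((result) => {
    console.log(result, "after", ticks.length, "ticks");
    return result;
});
```

Clearing the interval in `finally` is the important part: if the job rejects, the ticker still stops, so a failed run cannot leave a timer editing the message forever.
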
@@ -0,0 +1,52 @@
+ /**
+  * Pure helpers for the /cron run progress ticker.
+  *
+  * Separated from commands.ts so the formatting and safety rules can be
+  * unit-tested without standing up the entire grammy Context. The command
+  * handler wires these into a setInterval that edits a single Telegram
+  * message once per tick, giving the user visible proof-of-life during
+  * long-running (10+ min) cron jobs.
+  *
+  * See test/cron-progress-ticker.test.ts for the contract.
+  */
+ /** Human-readable elapsed time; adapts unit to magnitude. */
+ export function formatElapsed(seconds) {
+     if (seconds < 60)
+         return `${seconds}s`;
+     const minutes = Math.floor(seconds / 60);
+     const remSec = seconds % 60;
+     if (minutes < 60)
+         return `${minutes}m ${remSec}s`;
+     const hours = Math.floor(minutes / 60);
+     const remMin = minutes % 60;
+     return `${hours}h ${remMin}m`;
+ }
+ /**
+  * Escape Markdown-breaking characters in untrusted display strings so
+  * an edit-message call can safely use `parse_mode: Markdown` without
+  * triggering "can't parse entities", the exact bug that killed every
+  * daily-job-alert banner for days.
+  *
+  * We use Telegram Markdown (v1) escape rules: only `*`, `_`, `[`, `` ` ``.
+  * The rest flow through unchanged.
+  */
+ export function escapeMarkdown(text) {
+     return text.replace(/([*_[\]`])/g, "\\$1");
+ }
+ /** Intermediate ticker text: "🔄 Running *name* · 2m 5s elapsed…" */
+ export function buildTickerText(jobName, elapsedSeconds) {
+     const safe = escapeMarkdown(jobName);
+     return `🔄 Running *${safe}* · ${formatElapsed(elapsedSeconds)} elapsed…`;
+ }
+ /** Final ticker state: "✅ Done - *name* · 13m 17s" (or ❌ / ⏳). */
+ export function buildDoneText(jobName, elapsedSeconds, outcome) {
+     const safe = escapeMarkdown(jobName);
+     if (outcome.skipped) {
+         return `⏳ *${safe}* is already running \u2014 not starting a duplicate`;
+     }
+     if (!outcome.ok) {
+         const errLine = outcome.error ? `\n\n${outcome.error.slice(0, 500)}` : "";
+         return `❌ *${safe}* \u2014 ${formatElapsed(elapsedSeconds)}${errLine}`;
+     }
+     return `✅ Done \u2014 *${safe}* · ${formatElapsed(elapsedSeconds)}`;
+ }
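
To make the formatting rules concrete, here is a small usage sketch. The two functions are reproduced from the new module above (minus `export`) so the snippet runs standalone; the sample inputs are illustrative choices, with `weird_job*name` taken from the changelog.

```javascript
// Reproduced from the new module above so this snippet is self-contained.
function formatElapsed(seconds) {
    if (seconds < 60)
        return `${seconds}s`;
    const minutes = Math.floor(seconds / 60);
    const remSec = seconds % 60;
    if (minutes < 60)
        return `${minutes}m ${remSec}s`;
    const hours = Math.floor(minutes / 60);
    const remMin = minutes % 60;
    return `${hours}h ${remMin}m`;
}

function escapeMarkdown(text) {
    return text.replace(/([*_[\]`])/g, "\\$1");
}

// The unit adapts to the magnitude: seconds, then minutes, then hours.
console.log(formatElapsed(45));   // 45s
console.log(formatElapsed(60));   // 1m 0s
console.log(formatElapsed(797));  // 13m 17s
console.log(formatElapsed(3725)); // 1h 2m

// Each Markdown-sensitive character gets a backslash prefix.
console.log(escapeMarkdown("weird_job*name")); // weird\_job\*name
```

Note that seconds are dropped once the elapsed time reaches an hour, so `3725` renders as `1h 2m` rather than `1h 2m 5s`.
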
package/dist/index.js CHANGED
@@ -267,7 +267,7 @@ const shutdown = async () => {
      }
      // Release :3100 so the next launchd boot doesn't hit EADDRINUSE.
      // Must happen before exit; see src/web/server.ts stopWebServer() comment.
-     await stopWebServer(webServer).catch((err) => console.warn("[shutdown] stopWebServer failed:", err));
+     await stopWebServer().catch((err) => console.warn("[shutdown] stopWebServer failed:", err));
      await unloadPlugins().catch(() => { });
      await disconnectMCP().catch(() => { });
      // Tear down any bot-managed local runners (Ollama, LM Studio, …) so VRAM
@@ -404,8 +404,13 @@ async function startOptionalPlatforms() {
      }
  }
  startOptionalPlatforms().catch(err => console.error("Platform startup error:", err));
- // Start Web UI (ALWAYS, regardless of Telegram/AI config)
- const webServer = startWebServer();
+ // Start Web UI (ALWAYS, regardless of Telegram/AI config).
+ // startWebServer is now non-blocking and will never throw: if port 3100
+ // is busy (foreign process, TIME_WAIT, another bot instance), it climbs
+ // the port ladder up to 3119 and then enters a background retry loop
+ // at 3100 every 30s. The Telegram bot runs independently; the Web UI is a
+ // feature, not core. See src/web/bind-strategy.ts for the retry rules.
+ startWebServer();
  // Start Cron Scheduler: route notifications through delivery queue for reliability
  setNotifyCallback(async (target, text) => {
      if (target.platform === "web") {
@@ -10,6 +10,35 @@
   * module with a fake bot via __setBotApiForTest.
   */
  import { getVisibility } from "./subagents.js";
+ /**
+  * Telegram's Markdown parser rejects unbalanced or unexpected entities
+  * (stray `*`, `_`, un-escaped `|` in tables, etc.). Sub-agent outputs
+  * mix all of these. When we hit one of these errors, retry the same
+  * content as plain text so the user still sees the result instead of
+  * a silent drop.
+  */
+ function isTelegramParseError(err) {
+     if (!err || typeof err !== "object")
+         return false;
+     const e = err;
+     const haystack = `${e.message ?? ""} ${e.description ?? ""}`;
+     return /can't parse entities|can't find end of the entity/i.test(haystack);
+ }
+ /**
+  * Send a Markdown message with an automatic plain-text retry on parse
+  * errors. Any other error propagates to the caller's outer catch.
+  */
+ async function sendWithMarkdownFallback(api, chatId, text) {
+     try {
+         await api.sendMessage(chatId, text, { parse_mode: "Markdown" });
+     }
+     catch (err) {
+         if (!isTelegramParseError(err))
+             throw err;
+         console.warn(`[subagent-delivery] Markdown parse failed, retrying as plain text`);
+         await api.sendMessage(chatId, text);
+     }
+ }
  const MAX_TG_CHUNK = 3800; // below Telegram's 4096 limit with headroom
  const FILE_UPLOAD_THRESHOLD = 20_000; // switch to .md file upload above this
  let injectedApi = null;
@@ -243,7 +272,7 @@ export async function deliverSubAgentResult(info, result, opts = {}) {
      try {
          // Case 1: very long output → file upload with a short banner
          if (body.length > FILE_UPLOAD_THRESHOLD) {
-             await api.sendMessage(info.parentChatId, banner, { parse_mode: "Markdown" });
+             await sendWithMarkdownFallback(api, info.parentChatId, banner);
              try {
                  const { InputFile } = await import("grammy");
                  const buf = Buffer.from(body, "utf-8");
@@ -257,12 +286,14 @@ export async function deliverSubAgentResult(info, result, opts = {}) {
          }
          // Case 2: fits in a single message → banner + body joined
          if (body.length + banner.length + 2 <= MAX_TG_CHUNK) {
-             await api.sendMessage(info.parentChatId, `${banner}\n\n${body}`, { parse_mode: "Markdown" });
+             await sendWithMarkdownFallback(api, info.parentChatId, `${banner}\n\n${body}`);
              return;
          }
          // Case 3: medium output → banner as its own message, body chunked
-         await api.sendMessage(info.parentChatId, banner, { parse_mode: "Markdown" });
+         await sendWithMarkdownFallback(api, info.parentChatId, banner);
          for (let i = 0; i < body.length; i += MAX_TG_CHUNK) {
+             // Body chunks are always sent as plain text; markdown across
+             // arbitrary chunk boundaries would be inconsistent anyway.
              await api.sendMessage(info.parentChatId, body.slice(i, i + MAX_TG_CHUNK));
          }
      }
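
The fallback can be exercised without Telegram by injecting a fake API object. In the sketch below the two helper functions are reproduced from the first hunk of this file so the snippet runs standalone. `makeFakeApi` is a hypothetical stand-in invented for illustration: it rejects Markdown sends containing an odd number of `*`, which only loosely imitates Telegram's real parser.

```javascript
// Reproduced from the hunk above so this snippet is self-contained.
function isTelegramParseError(err) {
    if (!err || typeof err !== "object")
        return false;
    const haystack = `${err.message ?? ""} ${err.description ?? ""}`;
    return /can't parse entities|can't find end of the entity/i.test(haystack);
}

async function sendWithMarkdownFallback(api, chatId, text) {
    try {
        await api.sendMessage(chatId, text, { parse_mode: "Markdown" });
    }
    catch (err) {
        if (!isTelegramParseError(err))
            throw err;
        await api.sendMessage(chatId, text); // same text, no parse_mode
    }
}

// Hypothetical fake: records delivered messages and fails Markdown sends
// that contain an unbalanced `*`.
function makeFakeApi() {
    const sent = [];
    return {
        sent,
        async sendMessage(chatId, text, options = {}) {
            const stars = (text.match(/\*/g) ?? []).length;
            if (options.parse_mode === "Markdown" && stars % 2 === 1) {
                const err = new Error("Call to 'sendMessage' failed!");
                err.description = "Bad Request: can't parse entities";
                throw err;
            }
            sent.push({ chatId, text, parseMode: options.parse_mode ?? null });
        },
    };
}

const api = makeFakeApi();
const done = (async () => {
    await sendWithMarkdownFallback(api, 42, "*balanced* banner");
    await sendWithMarkdownFallback(api, 42, "broken *banner");
    console.log(api.sent.map((m) => m.parseMode)); // [ 'Markdown', null ]
})();
```

The second message arrives as plain text instead of being dropped, which is the behaviour the changelog describes for banners with broken Markdown.
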
@@ -0,0 +1,42 @@
+ /**
+  * Pure decision helper for the web-server bind loop.
+  *
+  * Decouples the "what should happen next" logic from the side-effect
+  * spaghetti of real http.Server binding so it can be unit-tested in
+  * isolation. See test/web-server-resilience.test.ts for the contract.
+  *
+  * Why this exists: the v4.8.x and earlier implementations crashed the
+  * entire bot when port 3100 was held by a foreign process. A colleague
+  * running an OpenClaw fork hit the same bug years ago and ended up
+  * decoupling the web server completely; the main bot should never be
+  * gated on a web-UI bind. This helper encodes the decision logic so
+  * the new startWebServer() can just act on the returned action.
+  */
+ /**
+  * Decide what the bind loop should do next after a failed listen().
+  *
+  * Rule of thumb:
+  *   - EADDRINUSE AND attempts remaining → climb the port ladder.
+  *   - EADDRINUSE AND ladder exhausted → background retry at original port.
+  *   - any other error (EACCES, listen-called-twice, etc.) → background retry.
+  *
+  * PURE: no timers, no I/O, no mutation of inputs. Safe to call from tests.
+  */
+ export function decideNextBindAction(err, attempt, opts) {
+     const code = err?.code;
+     if (code === "EADDRINUSE" && attempt < opts.maxPortTries - 1) {
+         return {
+             type: "retry-port",
+             port: opts.originalPort + attempt + 1,
+             attempt: attempt + 1,
+         };
+     }
+     // EADDRINUSE with no attempts left, OR any non-EADDRINUSE error:
+     // don't walk the port ladder further, just back off and retry the
+     // original port in the background.
+     return {
+         type: "retry-background",
+         delayMs: opts.backgroundRetryMs,
+         port: opts.originalPort,
+     };
+ }
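
The ladder can be traced by feeding the helper a run of simulated failures. The function is reproduced from the new file above (minus `export`) so the snippet runs standalone. The option values are assumptions: `maxPortTries: 20` is inferred from the "up to 3119" comment in `index.js`, and `backgroundRetryMs: 30_000` from the 30 s retry in the changelog; the real defaults live in `src/web/server.ts`, which this diff does not show.

```javascript
// Reproduced from the new file above so this snippet is self-contained.
function decideNextBindAction(err, attempt, opts) {
    const code = err?.code;
    if (code === "EADDRINUSE" && attempt < opts.maxPortTries - 1) {
        return {
            type: "retry-port",
            port: opts.originalPort + attempt + 1,
            attempt: attempt + 1,
        };
    }
    return {
        type: "retry-background",
        delayMs: opts.backgroundRetryMs,
        port: opts.originalPort,
    };
}

// Assumed option values (see the lead-in); not confirmed by the diff.
const opts = { originalPort: 3100, maxPortTries: 20, backgroundRetryMs: 30_000 };
const busy = Object.assign(new Error("listen EADDRINUSE"), { code: "EADDRINUSE" });

// Simulate every bind failing with EADDRINUSE and record each decision.
const trace = [];
let attempt = 0;
for (;;) {
    const action = decideNextBindAction(busy, attempt, opts);
    trace.push(action);
    if (action.type !== "retry-port")
        break;
    attempt = action.attempt;
}

console.log(trace.length);   // 20
console.log(trace[0]);       // { type: 'retry-port', port: 3101, attempt: 1 }
console.log(trace[18].port); // 3119
console.log(trace[19]);      // { type: 'retry-background', delayMs: 30000, port: 3100 }

// Any non-EADDRINUSE error skips the ladder entirely.
console.log(decideNextBindAction({ code: "EACCES" }, 0, opts).type); // retry-background
```

Because the helper has no timers or I/O, a loop like this is all a unit test needs; the real bind loop only has to act on each returned action.
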