alvin-bot 4.9.2 → 4.9.4
- package/CHANGELOG.md +79 -0
- package/README.md +12 -1
- package/dist/handlers/commands.js +93 -11
- package/dist/handlers/cron-progress.js +52 -0
- package/dist/index.js +8 -3
- package/dist/services/subagent-delivery.js +34 -3
- package/dist/web/bind-strategy.js +42 -0
- package/dist/web/server.js +231 -101
- package/package.json +1 -1
- package/test/cron-progress-ticker.test.ts +76 -0
- package/test/stress-scenarios.test.ts +1 -1
- package/test/subagent-delivery-markdown-fallback.test.ts +147 -0
- package/test/web-server-integration.test.ts +189 -0
- package/test/web-server-resilience.test.ts +118 -0
- package/test/web-server-shutdown.test.ts +7 -1
package/CHANGELOG.md
CHANGED
|
@@ -2,6 +2,85 @@
|
|
|
2
2
|
|
|
3
3
|
All notable changes to Alvin Bot are documented here.
|
|
4
4
|
|
|
5
|
+
## [4.9.4] - 2026-04-13
|
|
6
|
+
|
|
7
|
+
### Web UI fully decoupled from main bot: port conflicts no longer crash anything
|
|
8
|
+
|
|
9
|
+
Colleague feedback (WhatsApp voice note, 2026-04-13):
|
|
10
|
+
> *"The gateway binds to port 3100 like OpenClaw. When the bot restarts,
|
|
11
|
+
> the port is often still held: catastrophic crash. I ended up
|
|
12
|
+
> decoupling the gateway process completely, because the actual bot
|
|
13
|
+
> runs independently of the gateway; it can still answer Telegram
|
|
14
|
+
> even if the web endpoint isn't reachable yet. It's weird that the
|
|
15
|
+
> main routine crashes when the port is busy. It should just run in
|
|
16
|
+
> the background, watch for the port to become free, and connect
|
|
17
|
+
> then. Zero impact on the main routine."*
|
|
18
|
+
|
|
19
|
+
He was right. My v4.9.0 `stopWebServer()` fix was *prevention*: it stopped the bot itself from holding 3100 across restarts. But it didn't cover the *resilience* side: a foreign process holding 3100 (another dev server, an OpenClaw-style orphan, a TIME_WAIT race after SIGKILL) still crashed the boot, because `startWebServer()` was synchronous and the uncaught exception from `server.listen()` escaped to the main event loop.
|
|
20
|
+
|
|
21
|
+
**Complete rewrite of the bind loop:**
|
|
22
|
+
|
|
23
|
+
- **`src/web/bind-strategy.ts` (new): pure decision helper.** `decideNextBindAction(err, attempt, opts)` returns either `{type: "retry-port", port, attempt}` (climb the ladder) or `{type: "retry-background", delayMs, port}` (back off, retry the original port in 30 s). EADDRINUSE with attempts remaining → ladder. EADDRINUSE exhausted → background. Any other error → background. 8 unit tests cover every branch plus purity.
|
|
24
|
+
|
|
25
|
+
- **`src/web/server.ts` `startWebServer`: non-blocking, fresh-server-per-attempt.** Returns `void` synchronously, NEVER throws, NEVER blocks on bind. Each attempt creates a new `http.Server` (no state-recycling bugs) and attaches its own error handler. On failure, it cleans up and calls `decideNextBindAction` to decide the next move. If the ladder is exhausted, it schedules a 30 s background retry at the original port. The Telegram bot keeps running the whole time; the web UI just isn't reachable yet.
|
|
26
|
+
|
|
27
|
+
- **`src/web/server.ts` WebSocketServer attached POST-bind.** The `ws` library's `WebSocketServer` constructor installs its own event plumbing on the underlying `http.Server` and, crucially, causes EADDRINUSE errors to escape as uncaught exceptions when attached pre-listen. Debugging this chewed an hour on 2026-04-13. Fix: only `new WebSocketServer({ server })` AFTER `listen()` has fired its callback. The integration test `test/web-server-integration.test.ts "when the primary port is taken"` pins this behaviour.
|
|
28
|
+
|
|
29
|
+
- **`src/web/server.ts` error handler: `on` not `once`.** The previous version used `.once("error", handler)`, and a Node edge case where a single bind failure emits TWO error events left the second one uncaught. The handler is now attached with `on` and a `handled` guard, so it is idempotent, and a quiet post-bind logger replaces it on success.
|
|
30
|
+
|
|
31
|
+
- **`src/web/server.ts` defensive try/catch around `server.listen()`.** In the wild, Node sometimes throws synchronously for edge-case binds (already-listening, invalid backlog, kernel race). The catch funnels sync throws through the same `handleBindFailure` path as async error events.
|
|
32
|
+
|
|
33
|
+
- **`src/web/server.ts` `closeHttpServerGracefully(server)` + `stopWebServer()`.** The old `stopWebServer(server)` took an explicit server arg; it's been split into a low-level helper (`closeHttpServerGracefully(server)`, exported for tests) and a stateful top-level (`stopWebServer()`, no args, cleans up `currentServer` + `wsServerRef` + `bindRetryTimer`). Safe to call before start, safe to call twice, cancels pending background retries.
|
|
34
|
+
|
|
35
|
+
- **`src/index.ts` call sites adjusted.** `const webServer = startWebServer()` → `startWebServer()`. `stopWebServer(webServer)` → `stopWebServer()`. The comment above the call explains the decoupling so nobody accidentally re-couples it in a future "clean up" refactor.
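The `on`-versus-`once` point in the list above can be reproduced with a plain `EventEmitter`. This is only a minimal sketch, not the code in `src/web/server.ts`: `attachGuardedErrorHandler` is a hypothetical helper name, and the two manual `emit` calls merely simulate the double-error edge case the changelog describes.

```javascript
const { EventEmitter } = require("node:events");

// Hypothetical helper: the listener stays registered (so later "error"
// events are never unhandled) but only the first one triggers the callback.
function attachGuardedErrorHandler(emitter, onFirstError) {
  let handled = false;
  emitter.on("error", (err) => {
    if (handled) return; // duplicate error for the same failure: ignore
    handled = true;
    onFirstError(err);
  });
}

function bindError(message) {
  return Object.assign(new Error(message), { code: "EADDRINUSE" });
}

// With `once`, the listener is removed after the first event. A second
// "error" emit then has no listener, and EventEmitter throws the error.
const withOnce = new EventEmitter();
const seenByOnce = [];
withOnce.once("error", (err) => seenByOnce.push(err.code));
withOnce.emit("error", bindError("first"));
let secondEmitThrew = false;
try {
  withOnce.emit("error", bindError("second"));
} catch {
  secondEmitThrew = true;
}

// With `on` plus a guard, both events are absorbed and the callback runs once.
const withGuard = new EventEmitter();
const seenByGuard = [];
attachGuardedErrorHandler(withGuard, (err) => seenByGuard.push(err.code));
withGuard.emit("error", bindError("first"));
withGuard.emit("error", bindError("second"));

console.log({ secondEmitThrew, seenByOnce, seenByGuard });
```

The guard keeps the failure path idempotent, which matters when the callback schedules the next bind attempt: it should be scheduled once per failed attempt, not once per error event.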
|
|
36
|
+
|
|
37
|
+
**Testing: 186 → 201 (+15 new).**
|
|
38
|
+
|
|
39
|
+
- `test/web-server-resilience.test.ts`: 8 unit tests for `decideNextBindAction`
|
|
40
|
+
- `test/web-server-integration.test.ts`: 7 real-server integration tests: startWebServer returns void, binds, stops, is idempotent, survives primary-port conflict by climbing the ladder, closes servers with hanging sockets.
|
|
41
|
+
- **Live-verified on the maintainer's machine**: `launchctl unload` + dual-stack Node hog on port 3100 + `launchctl load` → bot booted cleanly → out.log contained `[web] port 3100 busy (EADDRINUSE) - trying 3101` → `Web UI: http://localhost:3101 (Port 3100 was busy, using 3101 instead)` → Telegram responsive throughout. Exactly what the colleague described.
|
|
42
|
+
|
|
43
|
+
**Non-goals / intentionally unchanged:**
|
|
44
|
+
- Timeouts stay unlimited (v4.8.8 behaviour preserved).
|
|
45
|
+
- The primary port is still `WEB_PORT || 3100`; no config schema change.
|
|
46
|
+
- When the bot binds on a non-primary port (e.g. 3101), the README permalink still points at 3100. Users hitting a ladder-climbed bot should check the startup log; this is rare and temporary.
|
|
47
|
+
|
|
48
|
+
## [4.9.3] - 2026-04-11
|
|
49
|
+
|
|
50
|
+
### Two UX bugs found in production after v4.9.2, now closed
|
|
51
|
+
|
|
52
|
+
Ali triggered `/cron run Daily Job Alert` after the v4.9.2 deploy and saw 13 minutes of chat silence followed by nothing. Forensics on the live bot revealed two distinct problems on top of an already-successful run:
|
|
53
|
+
|
|
54
|
+
**1. `subagent-delivery` has been silently dropping every banner for days.** Err.log: `GrammyError: Call to 'sendMessage' failed! (400: Bad Request: can't parse entities: Can't find end of the entity starting at byte offset 2636)`. The daily-job-alert sub-agent produces markdown-dense output (`|` tables, `**bold**`, `\|` escapes, mixed asterisks). Telegram's Markdown parser refuses it, `api.sendMessage(..., parse_mode: "Markdown")` throws, and the bare try/catch in `deliverSubAgentResult` logs + bails. **Result: the user has never seen a sub-agent-delivery banner, even when the underlying run succeeded perfectly and emailed the HTML report correctly.**
|
|
55
|
+
|
|
56
|
+
Fix in `src/services/subagent-delivery.ts`: new `sendWithMarkdownFallback()` helper that detects the "can't parse entities" pattern and retries the SAME text without `parse_mode`. All three code paths (file-upload case, single-message case, chunked case) now flow through the helper. 3 new tests drive the happy path, non-parse errors, and the chunked path.
|
|
57
|
+
|
|
58
|
+
**2. `/cron run` had zero proof-of-life for 13 minutes.** The handler used to `await runJobNow(...)` and reply only when it finished. Telegram's typing indicator expires after 5s. Users saw: command sent → typing indicator blip → nothing → nothing → (much later, if at all) result. For cron jobs that take 10-15 min (daily job alert, Perseus health, Polyseus P&L), this is indistinguishable from a dead bot.
|
|
59
|
+
|
|
60
|
+
Fix: a new handler flow:
|
|
61
|
+
|
|
62
|
+
```
|
|
63
|
+
bot: Started *Daily Job Alert* - working…                  ← instant ack
|
|
64
|
+
bot: Running *Daily Job Alert* · 1m 0s elapsed…            ← edit every 60s
|
|
65
|
+
bot: Running *Daily Job Alert* · 2m 0s elapsed…            ← edit
|
|
66
|
+
...
|
|
67
|
+
bot: ✅ Done - *Daily Job Alert* · 13m 17s                  ← final edit
|
|
68
|
+
bot: ✅ *Daily Job Alert* completed · 13m · 2.6M/28k        ← subagent-delivery
|
|
69
|
+
[full report body, Markdown-safe with plain-text fallback]
|
|
70
|
+
```
|
|
71
|
+
|
|
72
|
+
The ticker uses a single `editMessageText` call per minute on the same message: zero notification spam, clean visual progress. Every edit is wrapped with `isHarmlessTelegramError` so the inevitable "message is not modified" races stay silent. The ack itself falls back to plain text if the first `reply` hits a parse error, and the final edit falls back to a fresh plain message if the edit fails.
|
|
73
|
+
|
|
74
|
+
New module: `src/handlers/cron-progress.ts` with pure helpers: `formatElapsed`, `escapeMarkdown`, `buildTickerText`, `buildDoneText`. 8 tests cover the formatting rules and markdown-safety escapes so future cron jobs with weird names (`weird_job*name`) can't break the ticker.
|
|
75
|
+
|
|
76
|
+
**186 tests total** (+11 new). All green. Timeouts remain unlimited.
|
|
77
|
+
|
|
78
|
+
**What you see after this upgrade:**
|
|
79
|
+
- Instant "Started" ack on `/cron run`
|
|
80
|
+
- Live elapsed-time ticker every minute
|
|
81
|
+
- Final "✅ Done" when the sub-agent finishes
|
|
82
|
+
- A separate banner+body message with the full report, **this time actually delivered**, even when the body contains broken Markdown
|
|
83
|
+
|
|
5
84
|
## [4.9.2] - 2026-04-11
|
|
6
85
|
|
|
7
86
|
### Post-review polish: three edge cases from the strict audit
|
package/README.md
CHANGED
|
@@ -114,7 +114,18 @@ That's it. The setup wizard validates everything:
|
|
|
114
114
|
|
|
115
115
|
**Requires:** Node.js 18+ ([nodejs.org](https://nodejs.org)) · Telegram bot token ([@BotFather](https://t.me/BotFather)) · Your Telegram user ID ([@userinfobot](https://t.me/userinfobot))
|
|
116
116
|
|
|
117
|
-
Free AI providers available, no credit card needed.
|
|
117
|
+
Free AI providers available, no credit card needed. **Privacy-first?** Pick the **Offline - Gemma 4 E4B** option in setup for a fully local LLM via Ollama (macOS/Linux: automated install; Windows: manual).
|
|
118
|
+
|
|
119
|
+
### First-time setup walkthroughs
|
|
120
|
+
|
|
121
|
+
Step-by-step guides with screenshots and screen-for-screen instructions:
|
|
122
|
+
|
|
123
|
+
| Platform | PDF (printable) |
|
|
124
|
+
|---|---|
|
|
125
|
+
| **macOS** (with `launchd` background service) | [Download PDF](https://github.com/alvbln/Alvin-Bot/releases/latest/download/Alvin-Bot-macOS-Setup-Guide.pdf) |
|
|
126
|
+
| **Windows** (with Task Scheduler / Startup folder) | [Download PDF](https://github.com/alvbln/Alvin-Bot/releases/latest/download/Alvin-Bot-Windows-Setup-Guide.pdf) |
|
|
127
|
+
|
|
128
|
+
Both guides cover: Node.js install · Telegram bot creation · first-time `setup` · foreground test · background service · offline Gemma 4 mode · troubleshooting. ~15 min end-to-end for a first-time user.
|
|
118
129
|
|
|
119
130
|
### macOS: use `launchd` instead of pm2 (recommended)
|
|
120
131
|
|
|
package/dist/handlers/commands.js
CHANGED
@@ -16,6 +16,9 @@ import { getMCPStatus, getMCPTools, callMCPTool } from "../services/mcp.js";
|
|
|
16
16
|
import { listCustomTools, executeCustomTool } from "../services/custom-tools.js";
|
|
17
17
|
import { screenshotUrl, extractText, generatePdf, hasPlaywright } from "../services/browser.js";
|
|
18
18
|
import { listJobs, createJob, deleteJob, toggleJob, runJobNow, formatNextRun, humanReadableSchedule } from "../services/cron.js";
|
|
19
|
+
import { resolveJobByNameOrId } from "../services/cron-resolver.js";
|
|
20
|
+
import { buildTickerText, buildDoneText, escapeMarkdown } from "./cron-progress.js";
|
|
21
|
+
import { isHarmlessTelegramError } from "../util/telegram-error-filter.js";
|
|
19
22
|
import { storePassword, revokePassword, getSudoStatus, verifyPassword } from "../services/sudo.js";
|
|
20
23
|
import { config } from "../config.js";
|
|
21
24
|
import { BOT_VERSION } from "../version.js";
|
|
@@ -1442,11 +1445,25 @@ export function registerCommands(bot) {
|
|
|
1442
1445
|
return;
|
|
1443
1446
|
}
|
|
1444
1447
|
// /cron run <name-or-id>
|
|
1448
|
+
//
|
|
1449
|
+
// UX contract:
|
|
1450
|
+
// 1. Instantly post a "Started …" message so the user knows
|
|
1451
|
+
// the command was received.
|
|
1452
|
+
// 2. Every 60s edit that message with the elapsed-time ticker
|
|
1453
|
+
// so the chat shows proof-of-life during 10+ min sub-agent
|
|
1454
|
+
// runs (the Daily Job Alert takes ~13 min in production).
|
|
1455
|
+
// 3. When runJobNow returns, edit the same message into a
|
|
1456
|
+
// final "✅ Done" / "❌ error" / "⏳ already running" state.
|
|
1457
|
+
// 4. The heavy lifting (banner + full body + chunking) stays in
|
|
1458
|
+
// subagent-delivery.ts, which now has a Markdown→plain-text
|
|
1459
|
+
// fallback so it actually reaches the user.
|
|
1445
1460
|
if (arg.startsWith("run ")) {
|
|
1446
1461
|
const nameOrId = arg.slice(4).trim();
|
|
1447
|
-
|
|
1448
|
-
|
|
1449
|
-
|
|
1462
|
+
// Resolve up-front so we can show the real job name in the
|
|
1463
|
+
// "Started" ack, and so we handle the not-found case BEFORE
|
|
1464
|
+
// spending a Telegram round-trip on a pointless placeholder.
|
|
1465
|
+
const resolved = resolveJobByNameOrId(listJobs(), nameOrId);
|
|
1466
|
+
if (!resolved) {
|
|
1450
1467
|
const jobs = listJobs();
|
|
1451
1468
|
const hint = jobs.length > 0
|
|
1452
1469
|
? `\n\nAvailable:\n${jobs.slice(0, 10).map(j => `β’ ${j.name}`).join("\n")}`
|
|
@@ -1454,15 +1471,80 @@ export function registerCommands(bot) {
|
|
|
1454
1471
|
await ctx.reply(`β No job matches <code>${nameOrId}</code>.${hint}`, { parse_mode: "HTML" });
|
|
1455
1472
|
return;
|
|
1456
1473
|
}
|
|
1457
|
-
|
|
1458
|
-
|
|
1459
|
-
|
|
1460
|
-
|
|
1474
|
+
const jobName = resolved.name;
|
|
1475
|
+
const startedAt = Date.now();
|
|
1476
|
+
// Post initial ack; we'll edit THIS message for the ticker and
|
|
1477
|
+
// the final state.
|
|
1478
|
+
let ackMessageId = null;
|
|
1479
|
+
try {
|
|
1480
|
+
const ack = await ctx.reply(`π Started *${escapeMarkdown(jobName)}* β workingβ¦`, { parse_mode: "Markdown" });
|
|
1481
|
+
ackMessageId = ack.message_id;
|
|
1482
|
+
}
|
|
1483
|
+
catch (err) {
|
|
1484
|
+
// If even the initial ack fails, fall back to plain text so
|
|
1485
|
+
// the user still knows we received the command.
|
|
1486
|
+
try {
|
|
1487
|
+
const ack = await ctx.reply(`π Started ${jobName} β workingβ¦`);
|
|
1488
|
+
ackMessageId = ack.message_id;
|
|
1489
|
+
}
|
|
1490
|
+
catch { /* give up on the ack; the run still fires below */ }
|
|
1491
|
+
}
|
|
1492
|
+
const chatId = ctx.chat.id;
|
|
1493
|
+
// Progress ticker: edit the ack message with elapsed time every
|
|
1494
|
+
// 60s. Errors from editMessageText (including the harmless
|
|
1495
|
+
// "message is not modified") are swallowed via the central filter.
|
|
1496
|
+
const ticker = setInterval(async () => {
|
|
1497
|
+
if (ackMessageId === null)
|
|
1498
|
+
return;
|
|
1499
|
+
const elapsed = Math.floor((Date.now() - startedAt) / 1000);
|
|
1500
|
+
try {
|
|
1501
|
+
await ctx.api.editMessageText(chatId, ackMessageId, buildTickerText(jobName, elapsed), { parse_mode: "Markdown" });
|
|
1502
|
+
}
|
|
1503
|
+
catch (err) {
|
|
1504
|
+
if (!isHarmlessTelegramError(err)) {
|
|
1505
|
+
console.warn(`[cron:run] ticker edit failed:`, err);
|
|
1506
|
+
}
|
|
1507
|
+
}
|
|
1508
|
+
}, 60_000);
|
|
1509
|
+
let outcome;
|
|
1510
|
+
try {
|
|
1511
|
+
outcome = await runJobNow(nameOrId);
|
|
1512
|
+
}
|
|
1513
|
+
finally {
|
|
1514
|
+
clearInterval(ticker);
|
|
1515
|
+
}
|
|
1516
|
+
// Final state: edit the ack message one last time.
|
|
1517
|
+
const elapsed = Math.floor((Date.now() - startedAt) / 1000);
|
|
1518
|
+
const finalText = (() => {
|
|
1519
|
+
if (outcome.status === "not-found") {
|
|
1520
|
+
// Shouldn't happen (we already resolved successfully above),
|
|
1521
|
+
// but handle it for completeness.
|
|
1522
|
+
return `β ${escapeMarkdown(jobName)} β not found (race?)`;
|
|
1523
|
+
}
|
|
1524
|
+
if (outcome.status === "already-running") {
|
|
1525
|
+
return buildDoneText(outcome.job.name, elapsed, { ok: true, skipped: true });
|
|
1526
|
+
}
|
|
1527
|
+
return buildDoneText(outcome.job.name, elapsed, {
|
|
1528
|
+
ok: !outcome.error,
|
|
1529
|
+
error: outcome.error,
|
|
1530
|
+
});
|
|
1531
|
+
})();
|
|
1532
|
+
if (ackMessageId !== null) {
|
|
1533
|
+
try {
|
|
1534
|
+
await ctx.api.editMessageText(chatId, ackMessageId, finalText, { parse_mode: "Markdown" });
|
|
1535
|
+
}
|
|
1536
|
+
catch (err) {
|
|
1537
|
+
if (!isHarmlessTelegramError(err)) {
|
|
1538
|
+
// Last-ditch fallback: post as a new plain message so the
|
|
1539
|
+
// user sees the result even if the edit failed.
|
|
1540
|
+
await ctx.reply(finalText).catch(() => { });
|
|
1541
|
+
}
|
|
1542
|
+
}
|
|
1543
|
+
}
|
|
1544
|
+
else {
|
|
1545
|
+
// We never got an ack message id, so just post a fresh message.
|
|
1546
|
+
await ctx.reply(finalText, { parse_mode: "Markdown" }).catch(() => ctx.reply(finalText));
|
|
1461
1547
|
}
|
|
1462
|
-
const output = outcome.output
|
|
1463
|
-
? `\`\`\`\n${outcome.output.slice(0, 2000)}\n\`\`\``
|
|
1464
|
-
: "(no output)";
|
|
1465
|
-
await ctx.reply(`π§ Job "${outcome.job.name}" executed:\n${output}${outcome.error ? `\n\nβ ${outcome.error}` : ""}`, { parse_mode: "Markdown" });
|
|
1466
1548
|
return;
|
|
1467
1549
|
}
|
|
1468
1550
|
await ctx.reply("Unknown cron command. Use /cron for help.");
|
|
package/dist/handlers/cron-progress.js
@@ -0,0 +1,52 @@
|
|
|
1
|
+
/**
|
|
2
|
+
* Pure helpers for the /cron run progress ticker.
|
|
3
|
+
*
|
|
4
|
+
* Separated from commands.ts so the formatting and safety rules can be
|
|
5
|
+
* unit-tested without standing up the entire grammy Context. The command
|
|
6
|
+
* handler wires these into a setInterval that edits a single Telegram
|
|
7
|
+
* message once per tick, giving the user visible proof-of-life during
|
|
8
|
+
* long-running (10+ min) cron jobs.
|
|
9
|
+
*
|
|
10
|
+
* See test/cron-progress-ticker.test.ts for the contract.
|
|
11
|
+
*/
|
|
12
|
+
/** Human-readable elapsed time; adapts the unit to the magnitude. */
|
|
13
|
+
export function formatElapsed(seconds) {
|
|
14
|
+
if (seconds < 60)
|
|
15
|
+
return `${seconds}s`;
|
|
16
|
+
const minutes = Math.floor(seconds / 60);
|
|
17
|
+
const remSec = seconds % 60;
|
|
18
|
+
if (minutes < 60)
|
|
19
|
+
return `${minutes}m ${remSec}s`;
|
|
20
|
+
const hours = Math.floor(minutes / 60);
|
|
21
|
+
const remMin = minutes % 60;
|
|
22
|
+
return `${hours}h ${remMin}m`;
|
|
23
|
+
}
|
|
24
|
+
/**
|
|
25
|
+
* Escape Markdown-breaking characters in untrusted display strings so
|
|
26
|
+
* an edit-message call can safely use `parse_mode: Markdown` without
|
|
27
|
+
* triggering "can't parse entities", the exact bug that killed every
|
|
28
|
+
* daily-job-alert banner for days.
|
|
29
|
+
*
|
|
30
|
+
* We follow Telegram Markdown (v1) escape rules (`*`, `_`, `[`, `` ` ``) and also escape `]`.
|
|
31
|
+
* The rest flow through unchanged.
|
|
32
|
+
*/
|
|
33
|
+
export function escapeMarkdown(text) {
|
|
34
|
+
return text.replace(/([*_[\]`])/g, "\\$1");
|
|
35
|
+
}
|
|
36
|
+
/** Intermediate ticker text: "Running *name* · 2m 5s elapsed…" */
|
|
37
|
+
export function buildTickerText(jobName, elapsedSeconds) {
|
|
38
|
+
const safe = escapeMarkdown(jobName);
|
|
39
|
+
return `π Running *${safe}* Β· ${formatElapsed(elapsedSeconds)} elapsedβ¦`;
|
|
40
|
+
}
|
|
41
|
+
/** Final ticker state: "β
Done β *name* Β· 13m 17s" (or β / β³). */
|
|
42
|
+
export function buildDoneText(jobName, elapsedSeconds, outcome) {
|
|
43
|
+
const safe = escapeMarkdown(jobName);
|
|
44
|
+
if (outcome.skipped) {
|
|
45
|
+
return `β³ *${safe}* is already running β not starting a duplicate`;
|
|
46
|
+
}
|
|
47
|
+
if (!outcome.ok) {
|
|
48
|
+
const errLine = outcome.error ? `\n\n${outcome.error.slice(0, 500)}` : "";
|
|
49
|
+
return `β *${safe}* β ${formatElapsed(elapsedSeconds)}${errLine}`;
|
|
50
|
+
}
|
|
51
|
+
return `β
Done β *${safe}* Β· ${formatElapsed(elapsedSeconds)}`;
|
|
52
|
+
}
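A short usage sketch for the two formatting helpers in this file. The function bodies are copied from the hunk above with `export` removed so the snippet runs standalone; the sample inputs (including the `weird_job*name` job name mentioned in the changelog) are illustrative only.

```javascript
// Copied from cron-progress.js (without `export`) so this runs on its own.
function formatElapsed(seconds) {
  if (seconds < 60) return `${seconds}s`;
  const minutes = Math.floor(seconds / 60);
  const remSec = seconds % 60;
  if (minutes < 60) return `${minutes}m ${remSec}s`;
  const hours = Math.floor(minutes / 60);
  const remMin = minutes % 60;
  return `${hours}h ${remMin}m`;
}

function escapeMarkdown(text) {
  return text.replace(/([*_[\]`])/g, "\\$1");
}

// Unit adapts to magnitude; seconds are dropped once hours appear.
console.log(formatElapsed(45));   // 45s
console.log(formatElapsed(797));  // 13m 17s
console.log(formatElapsed(3725)); // 1h 2m

// Each Markdown-sensitive character gets a leading backslash.
console.log(escapeMarkdown("weird_job*name")); // weird\_job\*name
console.log(escapeMarkdown("plain name"));     // plain name
```

Note that the regex also escapes `]`, so a job name such as `a[b]` comes out as `a\[b\]`.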
|
package/dist/index.js
CHANGED
|
@@ -267,7 +267,7 @@ const shutdown = async () => {
|
|
|
267
267
|
}
|
|
268
268
|
// Release :3100 so the next launchd boot doesn't hit EADDRINUSE.
|
|
269
269
|
// Must happen before exit; see src/web/server.ts stopWebServer() comment.
|
|
270
|
-
await stopWebServer(
|
|
270
|
+
await stopWebServer().catch((err) => console.warn("[shutdown] stopWebServer failed:", err));
|
|
271
271
|
await unloadPlugins().catch(() => { });
|
|
272
272
|
await disconnectMCP().catch(() => { });
|
|
273
273
|
// Tear down any bot-managed local runners (Ollama, LM Studio, …) so VRAM
|
|
@@ -404,8 +404,13 @@ async function startOptionalPlatforms() {
|
|
|
404
404
|
}
|
|
405
405
|
}
|
|
406
406
|
startOptionalPlatforms().catch(err => console.error("Platform startup error:", err));
|
|
407
|
-
// Start Web UI (ALWAYS β regardless of Telegram/AI config)
|
|
408
|
-
|
|
407
|
+
// Start Web UI (ALWAYS, regardless of Telegram/AI config).
|
|
408
|
+
// startWebServer is now non-blocking and will never throw: if port 3100
|
|
409
|
+
// is busy (foreign process, TIME_WAIT, another bot instance), it climbs
|
|
410
|
+
// the port ladder up to 3119 and then enters a background retry loop
|
|
411
|
+
// at 3100 every 30s. The Telegram bot runs independently: the Web UI is a
|
|
412
|
+
// feature, not core. See src/web/bind-strategy.ts for the retry rules.
|
|
413
|
+
startWebServer();
|
|
409
414
|
// Start Cron Scheduler: route notifications through delivery queue for reliability
|
|
410
415
|
setNotifyCallback(async (target, text) => {
|
|
411
416
|
if (target.platform === "web") {
|
|
package/dist/services/subagent-delivery.js
CHANGED
@@ -10,6 +10,35 @@
|
|
|
10
10
|
* module with a fake bot via __setBotApiForTest.
|
|
11
11
|
*/
|
|
12
12
|
import { getVisibility } from "./subagents.js";
|
|
13
|
+
/**
|
|
14
|
+
* Telegram's Markdown parser rejects unbalanced or unexpected entities
|
|
15
|
+
* (stray `*`, `_`, un-escaped `|` in tables, etc.). Sub-agent outputs
|
|
16
|
+
* mix all of these. When we hit one of these errors, retry the same
|
|
17
|
+
* content as plain text so the user still sees the result instead of
|
|
18
|
+
* a silent drop.
|
|
19
|
+
*/
|
|
20
|
+
function isTelegramParseError(err) {
|
|
21
|
+
if (!err || typeof err !== "object")
|
|
22
|
+
return false;
|
|
23
|
+
const e = err;
|
|
24
|
+
const haystack = `${e.message ?? ""} ${e.description ?? ""}`;
|
|
25
|
+
return /can't parse entities|can't find end of the entity/i.test(haystack);
|
|
26
|
+
}
|
|
27
|
+
/**
|
|
28
|
+
* Send a Markdown message with an automatic plain-text retry on parse
|
|
29
|
+
* errors. Any other error propagates to the caller's outer catch.
|
|
30
|
+
*/
|
|
31
|
+
async function sendWithMarkdownFallback(api, chatId, text) {
|
|
32
|
+
try {
|
|
33
|
+
await api.sendMessage(chatId, text, { parse_mode: "Markdown" });
|
|
34
|
+
}
|
|
35
|
+
catch (err) {
|
|
36
|
+
if (!isTelegramParseError(err))
|
|
37
|
+
throw err;
|
|
38
|
+
console.warn(`[subagent-delivery] Markdown parse failed, retrying as plain text`);
|
|
39
|
+
await api.sendMessage(chatId, text);
|
|
40
|
+
}
|
|
41
|
+
}
|
|
13
42
|
const MAX_TG_CHUNK = 3800; // below Telegram's 4096 limit with headroom
|
|
14
43
|
const FILE_UPLOAD_THRESHOLD = 20_000; // switch to .md file upload above this
|
|
15
44
|
let injectedApi = null;
|
|
@@ -243,7 +272,7 @@ export async function deliverSubAgentResult(info, result, opts = {}) {
|
|
|
243
272
|
try {
|
|
244
273
|
// Case 1: very long output → file upload with a short banner
|
|
245
274
|
if (body.length > FILE_UPLOAD_THRESHOLD) {
|
|
246
|
-
await api
|
|
275
|
+
await sendWithMarkdownFallback(api, info.parentChatId, banner);
|
|
247
276
|
try {
|
|
248
277
|
const { InputFile } = await import("grammy");
|
|
249
278
|
const buf = Buffer.from(body, "utf-8");
|
|
@@ -257,12 +286,14 @@ export async function deliverSubAgentResult(info, result, opts = {}) {
|
|
|
257
286
|
}
|
|
258
287
|
// Case 2: fits in a single message → banner + body joined
|
|
259
288
|
if (body.length + banner.length + 2 <= MAX_TG_CHUNK) {
|
|
260
|
-
await api
|
|
289
|
+
await sendWithMarkdownFallback(api, info.parentChatId, `${banner}\n\n${body}`);
|
|
261
290
|
return;
|
|
262
291
|
}
|
|
263
292
|
// Case 3: medium output → banner as its own message, body chunked
|
|
264
|
-
await api
|
|
293
|
+
await sendWithMarkdownFallback(api, info.parentChatId, banner);
|
|
265
294
|
for (let i = 0; i < body.length; i += MAX_TG_CHUNK) {
|
|
295
|
+
// Body chunks are always sent as plain text; Markdown across
|
|
296
|
+
// arbitrary chunk boundaries would be inconsistent anyway.
|
|
266
297
|
await api.sendMessage(info.parentChatId, body.slice(i, i + MAX_TG_CHUNK));
|
|
267
298
|
}
|
|
268
299
|
}
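To see the Markdown fallback in isolation, the two helpers added in this file can be driven with a stand-in API object. This is a sketch under stated assumptions: `fakeApi` and its recorded `calls` array are hypothetical test scaffolding (the package's own tests inject a fake bot differently), and the error text is the one quoted in the changelog.

```javascript
// Copied from subagent-delivery.js so the snippet runs standalone.
function isTelegramParseError(err) {
  if (!err || typeof err !== "object") return false;
  const haystack = `${err.message ?? ""} ${err.description ?? ""}`;
  return /can't parse entities|can't find end of the entity/i.test(haystack);
}

async function sendWithMarkdownFallback(api, chatId, text) {
  try {
    await api.sendMessage(chatId, text, { parse_mode: "Markdown" });
  } catch (err) {
    if (!isTelegramParseError(err)) throw err;
    console.warn("[subagent-delivery] Markdown parse failed, retrying as plain text");
    await api.sendMessage(chatId, text);
  }
}

// Hypothetical fake: rejects every Markdown send the way Telegram did in
// the incident, and accepts plain-text sends. It is synchronous, so both
// attempts are recorded before the returned promise settles.
const calls = [];
const fakeApi = {
  sendMessage(chatId, text, opts) {
    calls.push({ chatId, parseMode: opts?.parse_mode ?? null });
    if (opts?.parse_mode === "Markdown") {
      throw new Error(
        "Call to 'sendMessage' failed! (400: Bad Request: can't parse entities: " +
          "Can't find end of the entity starting at byte offset 2636)"
      );
    }
  },
};

sendWithMarkdownFallback(fakeApi, 42, "| a | **b* |")
  .then(() => console.log(calls))
  .catch(console.error);
```

Errors that do not match the parse-error pattern (network failures, a blocked chat) are rethrown, so the caller's outer `catch` still sees them.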
|
|
package/dist/web/bind-strategy.js
@@ -0,0 +1,42 @@
|
|
|
1
|
+
/**
|
|
2
|
+
* Pure decision helper for the web-server bind loop.
|
|
3
|
+
*
|
|
4
|
+
* Decouples the "what should happen next" logic from the side-effect
|
|
5
|
+
* spaghetti of real http.Server binding so it can be unit-tested in
|
|
6
|
+
* isolation. See test/web-server-resilience.test.ts for the contract.
|
|
7
|
+
*
|
|
8
|
+
* Why this exists: the v4.8.x and earlier implementations crashed the
|
|
9
|
+
* entire bot when port 3100 was held by a foreign process. A colleague
|
|
10
|
+
* running an OpenClaw fork hit the same bug years ago and ended up
|
|
11
|
+
* decoupling the web server completely: the main bot should never be
|
|
12
|
+
* gated on a web-UI bind. This helper encodes the decision logic so
|
|
13
|
+
* the new startWebServer() can just act on the returned action.
|
|
14
|
+
*/
|
|
15
|
+
/**
|
|
16
|
+
* Decide what the bind loop should do next after a failed listen().
|
|
17
|
+
*
|
|
18
|
+
* Rule of thumb:
|
|
19
|
+
* - EADDRINUSE AND attempts remaining → climb the port ladder.
|
|
20
|
+
* - EADDRINUSE AND ladder exhausted → background retry at original port.
|
|
21
|
+
* - any other error (EACCES, listen-called-twice, etc.) → background retry.
|
|
22
|
+
*
|
|
23
|
+
* PURE: no timers, no I/O, no mutation of inputs. Safe to call from tests.
|
|
24
|
+
*/
|
|
25
|
+
export function decideNextBindAction(err, attempt, opts) {
|
|
26
|
+
const code = err?.code;
|
|
27
|
+
if (code === "EADDRINUSE" && attempt < opts.maxPortTries - 1) {
|
|
28
|
+
return {
|
|
29
|
+
type: "retry-port",
|
|
30
|
+
port: opts.originalPort + attempt + 1,
|
|
31
|
+
attempt: attempt + 1,
|
|
32
|
+
};
|
|
33
|
+
}
|
|
34
|
+
// EADDRINUSE with no attempts left, OR any non-EADDRINUSE error:
|
|
35
|
+
// don't walk the port ladder further, just back off and retry the
|
|
36
|
+
// original port in the background.
|
|
37
|
+
return {
|
|
38
|
+
type: "retry-background",
|
|
39
|
+
delayMs: opts.backgroundRetryMs,
|
|
40
|
+
port: opts.originalPort,
|
|
41
|
+
};
|
|
42
|
+
}
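The decision helper can be exercised without any sockets. The sketch below copies `decideNextBindAction` (minus `export`) and walks the ladder as if every port were busy. The option values are assumptions: `maxPortTries: 20` is inferred from the "up to 3119" comment in `index.js` and `backgroundRetryMs: 30_000` from the 30 s retry mentioned in the changelog; the real defaults live in `src/web/server.ts`, which is not shown here.

```javascript
// Copied from bind-strategy.js so this runs on its own.
function decideNextBindAction(err, attempt, opts) {
  const code = err?.code;
  if (code === "EADDRINUSE" && attempt < opts.maxPortTries - 1) {
    return { type: "retry-port", port: opts.originalPort + attempt + 1, attempt: attempt + 1 };
  }
  return { type: "retry-background", delayMs: opts.backgroundRetryMs, port: opts.originalPort };
}

// Assumed option values (see the lead-in above).
const opts = { originalPort: 3100, maxPortTries: 20, backgroundRetryMs: 30_000 };
const busy = { code: "EADDRINUSE" };

// Simulate every port being taken and record which ports would be tried.
const triedPorts = [opts.originalPort];
let attempt = 0;
let action = decideNextBindAction(busy, attempt, opts);
while (action.type === "retry-port") {
  triedPorts.push(action.port);
  attempt = action.attempt;
  action = decideNextBindAction(busy, attempt, opts);
}

console.log(triedPorts.length, triedPorts[triedPorts.length - 1]); // 20 3119
console.log(action); // background retry at port 3100 after 30000 ms

// A non-EADDRINUSE error skips the ladder entirely.
const eacces = decideNextBindAction({ code: "EACCES" }, 0, opts);
console.log(eacces.type); // retry-background
```

Because the helper has no timers or I/O, the caller owns all side effects: it creates the server, schedules the timer, and only asks this function what to do next.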
|