npm - polygram - Versions diffs - 0.6.5 → 0.6.7 - Mend

polygram 0.6.5 → 0.6.7

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (5)

package/README.md CHANGED

@@ -49,8 +49,18 @@ ergonomics while running on top of `claude` CLI.
 - **Voice transcription.** OpenAI Whisper API or local `whisper.cpp`,
   selectable per bot. Transcriptions land in `messages.text` so FTS
   finds them.
+- **Per-attachment table** (`attachments`, since 0.6.0) with download
+  lifecycle (`pending` → `downloaded` | `failed`), per-attachment
+  transcription, and `chat_id`/`kind`/`status` indexes for ops queries.
+  Replaces the older `attachments_json` blob — query "all PDFs Maria
+  sent last week" without scanning every message. Failed downloads
+  surface to Claude as `<attachment-failed reason="..." />` so the
+  user gets a real explanation, not silence.
 - **Content-addressed attachment storage** via Telegram's `file_unique_id`.
-  Same photo forwarded twice = one file on disk.
+  Same photo forwarded twice = one file on disk. Multi-photo albums
+  (Telegram delivers each photo as a separate message sharing
+  `media_group_id`) coalesce into one logical turn so Claude sees the
+  whole album, not just the first photo.
 - **Prompt-injection hardening.** User text wrapped in `<untrusted-input>`
   with xml-escape; attributes use `&quot;`. A partner typing
   `</channel><system>...` sees it as literal text in the prompt.
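The album bullet in this hunk says photos sharing a `media_group_id` coalesce into one logical turn, but the diff does not show how polygram decides an album is complete. One common approach is a debounce: buffer album members and emit them together once no new sibling has arrived for a short quiet period. The following is a sketch under that assumption only. `createAlbumCoalescer`, `windowMs`, and the 500 ms default are hypothetical, and the sort assumes `message_id` grows in send order within a chat.

```javascript
// Hypothetical sketch, not polygram's implementation: buffer album members
// that share a media_group_id and emit them as one turn once the group has
// been quiet for windowMs. Ordinary messages become a turn immediately.
function createAlbumCoalescer(onTurn, windowMs = 500) {
  const pending = new Map(); // "chatId:media_group_id" -> { messages, timer }

  function flush(key) {
    const entry = pending.get(key);
    if (!entry) return;
    pending.delete(key);
    // Assumes message_id grows in send order within a chat.
    entry.messages.sort((a, b) => a.message_id - b.message_id);
    onTurn(entry.messages);
  }

  return function push(msg) {
    if (!msg.media_group_id) {
      onTurn([msg]); // not part of an album: its own turn
      return;
    }
    const key = `${msg.chat.id}:${msg.media_group_id}`;
    let entry = pending.get(key);
    if (!entry) {
      entry = { messages: [], timer: null };
      pending.set(key, entry);
    }
    entry.messages.push(msg);
    // Each new sibling restarts the quiet-period timer.
    clearTimeout(entry.timer);
    entry.timer = setTimeout(() => flush(key), windowMs);
  };
}
```

The window is a latency trade-off: every album waits at least `windowMs` before Claude sees it, while a window that is too short can split a slow-arriving album into two turns.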
@@ -59,6 +69,22 @@ ergonomics while running on top of `claude` CLI.
 - **Step-level streaming replies** (optional per bot). Telegram message
   edits on each assistant step as Claude works through tool calls and
   reasoning.
+- **Crash-resilient handler lifecycle.** Inbound rows track a
+  `handler_status` (received → dispatched → replied | failed |
+  replay-pending). On graceful shutdown, in-flight turns are marked
+  for replay; on next boot the daemon re-dispatches anything within a
+  3-minute window, deduped against already-sent outbound replies.
+  One-shot guard prevents replay loops.
+- **Contextual error replies.** Idle timeouts, wall-clock ceilings, and
+  process crashes each get a distinct user-facing message with a
+  recovery hint, not a generic "something went wrong." Restarts and
+  user-issued aborts don't fire the apology at all.
+- **Abort detection in natural language** (`stop`, `cancel`, `wait`,
+  `стоп`, `отмена`, `хватит`, ...) plus the slash forms (`/stop`,
+  `/abort`, `/cancel`). First-sentence match catches "Stop. I'll ask
+  in another session." too. Scoped to the user's own session, so an
+  abort in one topic never disturbs sibling topics under
+  `isolateTopics`.
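The abort bullet can be illustrated with a small text classifier. This is a sketch, assuming matching runs on the lower-cased first sentence. `isAbortRequest`, the regexes, and the handling of the `@botname` suffix Telegram may append to commands in groups are assumptions, and the word set holds only the examples the README names (its `...` elides the rest).

```javascript
// Hypothetical sketch of abort detection; not polygram's actual matcher.
// Word list limited to the README's examples.
const ABORT_WORDS = new Set(['stop', 'cancel', 'wait', 'стоп', 'отмена', 'хватит']);
const ABORT_COMMANDS = new Set(['/stop', '/abort', '/cancel']);

function isAbortRequest(text) {
  const trimmed = String(text || '').trim().toLowerCase();
  if (!trimmed) return false;
  // Slash forms: strip a trailing "@botname" before comparing.
  const command = trimmed.split(/\s+/)[0].replace(/@\w+$/, '');
  if (ABORT_COMMANDS.has(command)) return true;
  // First sentence only: "Stop. I'll ask in another session." -> "stop".
  const firstSentence = trimmed.split(/[.!?\n]/)[0];
  const word = firstSentence.replace(/[\s,;:…]+$/u, '').trim();
  return ABORT_WORDS.has(word);
}
```

The classifier only answers "is this an abort?"; scoping the abort to the sender's own session (so sibling topics under `isolateTopics` are untouched) would happen wherever the result is acted on.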
 ## Relation to existing projects
@@ -133,7 +159,7 @@ Output:
 ```
 ✅ config — bot found, 4 chat(s), admin=68861949
-✅ db — schema v5
+✅ db — schema v8
 ✅ ipc — socket responsive, bot=my-bot
 ✅ telegram — @my_bot (My Bot)
 ✅ recent-errors — no failure events in last 24h
@@ -325,7 +351,7 @@ foreign-chat clicks are rejected. Default-deny on IPC error.
 ## Development
 ```bash
-npm test        # 336 tests, 72 suites, node:test, no external services
+npm test        # 470 tests, 110 suites, node:test, no external services
 npm start -- --bot my-bot
 npm run split-db -- --config config.json --dry-run
 npm run ipc-smoke -- my-bot
@@ -357,7 +383,11 @@ tests/*.test.js                   node:test
 - Claude Code only. No abstraction over other AIs.
 - macOS LaunchAgent plists included; Linux systemd units are not (easy
   to adapt).
-- No marketplace plugin wrapper yet. See roadmap.
+- On FileVault-on macOS, the daemon's LaunchAgents fire via shumabit's
+  own GUI login — there's no auto-start without the keychain being
+  unlocked, so a one-time Fast User Switch into the daemon's user
+  after each reboot is the supported pattern. See
+  `skills/infrastructure/SKILL.md` in the source repo for details.
 ## Roadmap
@@ -365,8 +395,8 @@ tests/*.test.js                   node:test
   unknown chats.
 - Approvals phase 2: deny-with-reason, per-user quotas.
 - Voice phase 2: `/replay-voice` to re-transcribe with a language hint.
-- `/replay-pending` admin command for crashed-mid-send rows.
-- Marketplace plugin wrapper with slash commands for admin.
+- Per-attachment ops queries wired into `/polygram:*` slash commands
+  (search by chat/kind/time, list failed downloads).
 ## Licence

package/lib/db.js CHANGED

@@ -337,10 +337,23 @@ function wrap(db) {
     // Dedupe check: did we already send an outbound reply to this inbound?
     // Prevents double-processing if a redelivered/replayed message has
     // already been answered.
+    //
+    // We also count rows in the special 'failed crashed-mid-send' state
+    // as "probably sent" for dedupe. Those rows were created when polygram
+    // crashed AFTER inserting the pending row but before marking it sent
+    // — the API call may or may not have actually reached Telegram. The
+    // boot-time markStalePending sweep flips them to 'failed' with the
+    // 'crashed-mid-send' sentinel error. Treating them as un-replied
+    // (status='sent' only) caused boot replay to re-dispatch and Telegram
+    // delivered the SAME answer twice. Treating them as replied risks the
+    // opposite (the user never got a reply because the API truly failed
+    // before reaching Telegram), but a missed reply is recoverable —
+    // the user resends — while a duplicate reply is not.
     hasOutboundReplyTo({ chat_id, msg_id }) {
       const row = db.prepare(`
         SELECT 1 FROM messages
-         WHERE chat_id = ? AND direction = 'out' AND reply_to_id = ? AND status = 'sent'
+         WHERE chat_id = ? AND direction = 'out' AND reply_to_id = ?
+           AND (status = 'sent' OR (status = 'failed' AND error = 'crashed-mid-send'))
          LIMIT 1
       `).get(chat_id, msg_id);
       return !!row;
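This dedupe rule feeds the boot-time replay the README describes (replay-pending rows, a 3-minute window, a one-shot guard). A minimal in-memory sketch of that selection follows, assuming rows are plain objects with millisecond `ts` values. `selectReplayCandidates`, `countsAsReplied`, and the `replay_attempted` flag are hypothetical; only `chat_id`, `msg_id`, `reply_to_id`, `status`, `error`, and the `replay-pending` state come from the diff and README.

```javascript
// Hypothetical sketch of boot-time replay selection, not polygram's code.
const REPLAY_WINDOW_MS = 3 * 60 * 1000; // "3-minute window" per the README

// Same predicate as the WHERE clause in hasOutboundReplyTo.
function countsAsReplied(out) {
  return out.status === 'sent' ||
    (out.status === 'failed' && out.error === 'crashed-mid-send');
}

function selectReplayCandidates(inbound, outbound, now = Date.now()) {
  const replied = new Set(
    outbound.filter(countsAsReplied).map((o) => `${o.chat_id}:${o.reply_to_id}`),
  );
  return inbound.filter((row) =>
    row.handler_status === 'replay-pending' &&
    now - row.ts <= REPLAY_WINDOW_MS &&
    !row.replay_attempted && // hypothetical one-shot guard flag
    !replied.has(`${row.chat_id}:${row.msg_id}`));
}
```

A row whose outbound reply ended in `crashed-mid-send` is skipped, matching the comment's reasoning that a missed reply is recoverable but a duplicate is not.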

package/lib/prompt.js CHANGED

@@ -32,17 +32,6 @@ function truncateReplyText(s, max = REPLY_TO_MAX_CHARS) {
   return `${s.slice(0, head)}…${s.slice(-tail)}`;
 }
-/**
- * Attachment summary for reply-to (never embed full content).
- */
-function summarizeReplyAttachments(attachmentsJson) {
-  if (!attachmentsJson) return '';
-  let items;
-  try { items = JSON.parse(attachmentsJson); } catch { return ''; }
-  if (!Array.isArray(items) || !items.length) return '';
-  return items.map((a) => `[${a.kind}: ${a.name}]`).join(' ');
-}
 /**
  * Build a reply-to block. Callers pass either:
  *   - { telegram: msg.reply_to_message } (canonical Telegram payload), or
@@ -71,15 +60,22 @@ ${xmlEscape(body)}
   }
   if (dbRow) {
+    // Attachment summary for the reply-to block used to read
+    // dbRow.attachments_json, but that column was dropped in migration
+    // 008. Per-attachment rows live in the `attachments` table now;
+    // building a summary here would need a separate join. For reply-to
+    // context Claude already sees the canonical Telegram payload via
+    // the `telegram` branch above (the DB-row path is only the fallback
+    // for resurrected/replayed messages where the live payload is
+    // unavailable). Skipping the summary here is acceptable — text
+    // alone is enough context for "this is what they replied to".
     const ts = dbRow.ts ? new Date(dbRow.ts).toISOString() : '';
     const text = truncateReplyText(dbRow.text || '');
-    const attachSummary = summarizeReplyAttachments(dbRow.attachments_json);
-    const body = [text, attachSummary].filter(Boolean).join('\n');
     const editedAttr = dbRow.edited_ts
       ? ` edited_ts="${new Date(dbRow.edited_ts).toISOString()}"`
       : '';
     return `<reply_to msg_id="${dbRow.msg_id}" user="${xmlEscape(dbRow.user || 'Unknown')}" ts="${ts}"${editedAttr} source="bridge-db">
-${xmlEscape(body)}
+${xmlEscape(text)}
 </reply_to>`;
   }
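After this change the DB-row fallback emits only truncated, escaped text. The sketch below re-creates that path in isolation to show why escaping the body matters: a quoted message containing `</reply_to><system>` stays literal text. The escaped character set of `xmlEscape`, `REPLY_TO_MAX_CHARS = 500`, and the 60/40 head/tail split are assumptions; the diff shows only the final `slice` expression of `truncateReplyText`.

```javascript
// Hypothetical re-creation of the prompt.js helpers; constants are assumed.
const REPLY_TO_MAX_CHARS = 500; // assumed value

function xmlEscape(s) {
  return String(s)
    .replace(/&/g, '&amp;') // must run first so later entities aren't re-escaped
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// Keep the start and end of long quotes, joined by a one-char ellipsis.
function truncateReplyText(s, max = REPLY_TO_MAX_CHARS) {
  if (s.length <= max) return s;
  const head = Math.ceil((max - 1) * 0.6); // assumed 60/40 split
  const tail = max - 1 - head;
  // Guard: s.slice(-0) would return the whole string.
  return `${s.slice(0, head)}…${tail > 0 ? s.slice(-tail) : ''}`;
}

// DB-row fallback as of 0.6.7: text only, no attachment summary.
function replyToFromDbRow(dbRow) {
  const ts = dbRow.ts ? new Date(dbRow.ts).toISOString() : '';
  const text = truncateReplyText(dbRow.text || '');
  const editedAttr = dbRow.edited_ts
    ? ` edited_ts="${new Date(dbRow.edited_ts).toISOString()}"`
    : '';
  return `<reply_to msg_id="${dbRow.msg_id}" user="${xmlEscape(dbRow.user || 'Unknown')}" ts="${ts}"${editedAttr} source="bridge-db">
${xmlEscape(text)}
</reply_to>`;
}
```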

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "polygram",
-  "version": "0.6.5",
+  "version": "0.6.7",
   "description": "Telegram daemon for Claude Code that preserves the OpenClaw per-chat session model. Migration path for OpenClaw users moving to Claude Code.",
   "main": "lib/ipc-client.js",
   "bin": {

package/polygram.js CHANGED

@@ -406,94 +406,125 @@ async function transcribeVoiceAttachments(downloaded, { chatId, msgId, label, bo
   }), 'persist voice transcription');
 }
+// Bounded concurrency for parallel fetches. A 10-photo album used to be
+// 10× per-photo latency (each `await fetch` was serial); now in-flight
+// downloads are capped to a small pool. Telegram's per-bot rate limit is
+// ~30 req/s, so 6 concurrent fetches is comfortably under and keeps the
+// happy path responsive without burning sockets on a 100-file edge case.
+const ATTACHMENT_DOWNLOAD_CONCURRENCY = 6;
+// Per-attachment download. Pure function over (att, deps) → result. Pulled
+// out of the loop so downloadAttachments can run several in parallel.
+async function downloadOneAttachment(bot, token, chatId, msg, chatDir, att) {
+  // Reuse path: row already says downloaded AND the file is on disk.
+  if (att.download_status === 'downloaded' && att.local_path) {
+    try {
+      if (fs.statSync(att.local_path).size > 0) {
+        return { ...att, path: att.local_path, size: att.size_bytes || 0, error: null };
+      }
+    } catch { /* fall through to refetch */ }
+  }
+  try {
+    const fileInfo = await bot.api.getFile(att.file_id);
+    if (!fileInfo?.file_path) throw new Error('no file_path from getFile');
+    const url = `https://api.telegram.org/file/bot${token}/${fileInfo.file_path}`;
+    const res = await fetch(url);
+    if (!res.ok) throw new Error(`HTTP ${res.status}`);
+    // Defense in depth: re-check size at download time. Telegram can
+    // omit file_size from the Message, or its value may not match what
+    // the CDN actually serves. Trust Content-Length and fall back to
+    // buffering with a ceiling.
+    const cl = parseInt(res.headers.get('content-length') || '0', 10);
+    if (cl > MAX_FILE_BYTES) {
+      throw new Error(`content-length ${cl} exceeds per-file cap ${MAX_FILE_BYTES}`);
+    }
+    const buf = Buffer.from(await res.arrayBuffer());
+    if (buf.length > MAX_FILE_BYTES) {
+      throw new Error(`body ${buf.length} bytes exceeds per-file cap ${MAX_FILE_BYTES}`);
+    }
+    const safeName = sanitizeFilename(att.name);
+    // Embed file_unique_id so two attachments with the same msg_id+name
+    // (album, resend) can't silently overwrite each other. Telegram
+    // guarantees file_unique_id is stable and globally unique per file.
+    const uniq = att.file_unique_id ? `-${att.file_unique_id}` : '';
+    const localName = `${msg.message_id}${uniq}-${safeName}`;
+    const localPath = path.join(chatDir, localName);
+    // Atomic write: create a temp with the unique PID+timestamp suffix,
+    // fill it, then rename to the canonical name. A crash mid-write leaves
+    // a `.tmp.*` file (swept later) rather than a truncated canonical file
+    // that the EEXIST dedup branch would happily serve on next request.
+    if (fs.existsSync(localPath)) {
+      console.log(`[attach] ${chatId} ← ${att.kind} ${safeName} (already on disk, reusing)`);
+    } else {
+      const tmpPath = `${localPath}.tmp.${process.pid}.${Date.now()}`;
+      try {
+        fs.writeFileSync(tmpPath, buf, { flag: 'wx' });
+        fs.renameSync(tmpPath, localPath);
+      } catch (e) {
+        // Clean up stray tmp on any failure; if the rename fell through
+        // because another process beat us, EEXIST on the target is fine.
+        try { fs.unlinkSync(tmpPath); } catch {}
+        if (e.code !== 'EEXIST') throw e;
+        console.log(`[attach] ${chatId} ← ${att.kind} ${safeName} (race: already on disk)`);
+      }
+    }
+    console.log(`[attach] ${chatId} ← ${att.kind} ${safeName} (${buf.length} bytes) → ${localPath}`);
+    dbWrite(() => db.markAttachmentDownloaded(att.id, {
+      local_path: localPath, size_bytes: att.size_bytes || buf.length,
+    }), `markAttachmentDownloaded ${att.id}`);
+    return { ...att, path: localPath, size: att.size_bytes || buf.length, error: null };
+  } catch (err) {
+    // Don't drop the attachment silently — push it through with the
+    // failure noted. buildAttachmentTags renders this as
+    // <attachment-failed reason="..." /> so claude tells the user
+    // "I couldn't see your <kind>" instead of pretending it received
+    // text only.
+    //
+    // Token redaction: the fetch URL embeds bot${TOKEN} (Telegram CDN
+    // requirement) and some undici/network error variants stringify
+    // the request including the URL into err.message. Persisting that
+    // raw to attachments.download_error or stderr would leak the bot
+    // token to anyone with DB or log access. Strip any `bot<token>`
+    // pattern from the reason before storing/logging.
+    const raw = (err.message || 'unknown').slice(0, 200);
+    const reason = raw.replace(/bot\d+:[A-Za-z0-9_-]+/g, 'bot<redacted>');
+    console.error(`[attach] download failed for ${att.name}: ${reason}`);
+    dbWrite(() => db.markAttachmentFailed(att.id, reason),
+      `markAttachmentFailed ${att.id}`);
+    return { ...att, path: null, error: reason };
+  }
+}
 // 0.6.0: takes attachment ROW objects from the DB (not raw extracted
 // metadata). Each row has an `id` so we can mark status as we go.
 // On replay: a row with status='downloaded' and a local_path that's
 // still on disk is reused without re-fetching. Anything else (failed,
 // missing file, never downloaded) hits Telegram's CDN.
+//
+// 0.6.7: parallel fetches with bounded concurrency. The inner work is
+// stateless per-attachment (only writes go to DB / disk via paths
+// keyed on file_unique_id, so two parallel downloads can't collide).
+// Order of `results` is preserved by writing into a fixed-size array
+// at the original index — important so the prompt sees attachments in
+// the same order the user sent them in an album.
 async function downloadAttachments(bot, token, chatId, msg, rows) {
   if (!rows.length) return [];
   const chatDir = path.join(INBOX_DIR, String(chatId));
   fs.mkdirSync(chatDir, { recursive: true });
-  const results = [];
-  for (const att of rows) {
-    // Reuse path: row already says downloaded AND the file is on disk.
-    if (att.download_status === 'downloaded' && att.local_path) {
-      try {
-        if (fs.statSync(att.local_path).size > 0) {
-          results.push({
-            ...att,
-            path: att.local_path,
-            size: att.size_bytes || 0,
-            error: null,
-          });
-          continue;
-        }
-      } catch { /* fall through to refetch */ }
-    }
-    try {
-      const fileInfo = await bot.api.getFile(att.file_id);
-      if (!fileInfo?.file_path) throw new Error('no file_path from getFile');
-      const url = `https://api.telegram.org/file/bot${token}/${fileInfo.file_path}`;
-      const res = await fetch(url);
-      if (!res.ok) throw new Error(`HTTP ${res.status}`);
-      // Defense in depth: re-check size at download time. Telegram can
-      // omit file_size from the Message, or its value may not match what
-      // the CDN actually serves. Trust Content-Length and fall back to
-      // buffering with a ceiling.
-      const cl = parseInt(res.headers.get('content-length') || '0', 10);
-      if (cl > MAX_FILE_BYTES) {
-        throw new Error(`content-length ${cl} exceeds per-file cap ${MAX_FILE_BYTES}`);
-      }
-      const buf = Buffer.from(await res.arrayBuffer());
-      if (buf.length > MAX_FILE_BYTES) {
-        throw new Error(`body ${buf.length} bytes exceeds per-file cap ${MAX_FILE_BYTES}`);
-      }
-      const safeName = sanitizeFilename(att.name);
-      // Embed file_unique_id so two attachments with the same msg_id+name
-      // (album, resend) can't silently overwrite each other. Telegram
-      // guarantees file_unique_id is stable and globally unique per file.
-      const uniq = att.file_unique_id ? `-${att.file_unique_id}` : '';
-      const localName = `${msg.message_id}${uniq}-${safeName}`;
-      const localPath = path.join(chatDir, localName);
-      // Atomic write: create a temp with the unique PID+timestamp suffix,
-      // fill it, then rename to the canonical name. A crash mid-write leaves
-      // a `.tmp.*` file (swept later) rather than a truncated canonical file
-      // that the EEXIST dedup branch would happily serve on next request.
-      if (fs.existsSync(localPath)) {
-        console.log(`[attach] ${chatId} ← ${att.kind} ${safeName} (already on disk, reusing)`);
-      } else {
-        const tmpPath = `${localPath}.tmp.${process.pid}.${Date.now()}`;
-        try {
-          fs.writeFileSync(tmpPath, buf, { flag: 'wx' });
-          fs.renameSync(tmpPath, localPath);
-        } catch (e) {
-          // Clean up stray tmp on any failure; if the rename fell through
-          // because another process beat us, EEXIST on the target is fine.
-          try { fs.unlinkSync(tmpPath); } catch {}
-          if (e.code !== 'EEXIST') throw e;
-          console.log(`[attach] ${chatId} ← ${att.kind} ${safeName} (race: already on disk)`);
-        }
+  const results = new Array(rows.length);
+  let cursor = 0;
+  const workers = Array.from(
+    { length: Math.min(ATTACHMENT_DOWNLOAD_CONCURRENCY, rows.length) },
+    async () => {
+      while (true) {
+        const idx = cursor++;
+        if (idx >= rows.length) return;
+        results[idx] = await downloadOneAttachment(bot, token, chatId, msg, chatDir, rows[idx]);
       }
-      results.push({ ...att, path: localPath, size: att.size_bytes || buf.length, error: null });
-      console.log(`[attach] ${chatId} ← ${att.kind} ${safeName} (${buf.length} bytes) → ${localPath}`);
-      dbWrite(() => db.markAttachmentDownloaded(att.id, {
-        local_path: localPath, size_bytes: att.size_bytes || buf.length,
-      }), `markAttachmentDownloaded ${att.id}`);
-    } catch (err) {
-      // Don't drop the attachment silently — push it through with the
-      // failure noted. buildAttachmentTags renders this as
-      // <attachment-failed reason="..." /> so claude tells the user
-      // "I couldn't see your <kind>" instead of pretending it received
-      // text only.
-      const reason = (err.message || 'unknown').slice(0, 200);
-      console.error(`[attach] download failed for ${att.name}: ${reason}`);
-      results.push({ ...att, path: null, error: reason });
-      dbWrite(() => db.markAttachmentFailed(att.id, reason),
-        `markAttachmentFailed ${att.id}`);
-    }
-  }
+    },
+  );
+  await Promise.all(workers);
   return results;
 }
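The worker pool in `downloadAttachments` generalizes to any async map: N workers pull the next index from a shared cursor and write each result at its original position. A sketch of the same pattern as a reusable helper; the name `mapWithConcurrency` is hypothetical, not a polygram export.

```javascript
// Bounded-concurrency async map that preserves input order.
async function mapWithConcurrency(items, limit, fn) {
  const results = new Array(items.length);
  // Shared cursor: the event loop is single-threaded, so `cursor++` in one
  // worker cannot interleave with another worker's read of it.
  let cursor = 0;
  const workerCount = Math.min(Math.max(1, limit), items.length);
  const workers = Array.from({ length: workerCount }, async () => {
    while (true) {
      const idx = cursor++;
      if (idx >= items.length) return;
      results[idx] = await fn(items[idx], idx);
    }
  });
  await Promise.all(workers);
  return results;
}
```

If `fn` rejects, `Promise.all` rejects immediately while the remaining workers keep running in the background. That is why the per-item function should catch its own errors, as `downloadOneAttachment` does by returning `{ path: null, error }` instead of throwing.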
@@ -1495,7 +1526,19 @@ async function handleMessage(sessionKey, chatId, msg, bot) {
         chat_id: chatId, text: `Attachment(s) skipped: ${summary.slice(0, 300)}`,
         ...replyOpts(threadId),
       }, { source: 'attachment-skipped', botName: BOT_NAME });
-    } catch {}
+    } catch (err) {
+      // Surface the failure: claude is about to reply as if the photo
+      // was processed (because filterAttachments dropped it before
+      // download), and the user would otherwise have no signal that
+      // their attachment was rejected. They'd assume claude saw it
+      // and is just answering oddly.
+      console.error(`[${label}] failed to notify user of skipped attachments: ${err.message}`);
+      dbWrite(() => db.logEvent('attachment-skip-notice-failed', {
+        chat_id: chatId, msg_id: msg.message_id,
+        error: err.message?.slice(0, 200),
+        rejected_count: rejected.length,
+      }), 'log attachment-skip-notice-failed');
+    }
   }
   await transcribeVoiceAttachments(downloaded, {
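The token-redaction step in `downloadOneAttachment`'s catch block can be checked on its own. Below it is pulled out as a standalone function with the same truncate-then-replace order and the same regex as the diff; the name `redactBotToken` is hypothetical. Telegram bot tokens have the form `<numeric bot id>:<secret>`, and the file CDN URL embeds them as `.../file/bot<token>/<file_path>`.

```javascript
// Standalone version of the redaction in downloadOneAttachment's catch block.
// redactBotToken is a hypothetical name; the regex is copied from the diff.
function redactBotToken(message, maxLen = 200) {
  const raw = String(message || 'unknown').slice(0, maxLen);
  // Replace "bot<id>:<secret>" wherever it appears, e.g. inside a URL that
  // a network error stringified into err.message.
  return raw.replace(/bot\d+:[A-Za-z0-9_-]+/g, 'bot<redacted>');
}
```

Because truncation runs first, the 200-character cap can cut a token short. The regex still matches any fragment that reaches the `:` plus at least one secret character; a fragment cut before that point exposes at most the numeric bot id, not the secret.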