memex-mvp 0.8.1 → 0.9.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (8)

package/HELP.md +18 -3
package/README.md +1 -1
package/README.ru.md +2 -1
package/lib/parse-telegram-html.js +358 -0
package/package.json +2 -2
package/server.js +109 -4
package/skills/install-memex/SKILL.md +24 -7
package/skills/install-memex/examples.md +13 -13

package/HELP.md CHANGED

@@ -390,9 +390,24 @@ npx memex-sync scan
 ### Telegram chats don't show up
-1. In Telegram **Desktop** (not mobile!): chat → menu → **Export chat history** → **Format: JSON**
-2. Drop `result.json` into `~/.memex/inbox/`
-3. Memex picks it up automatically within ~1.5 s
+In Telegram **Desktop** (not mobile!): chat → menu (⋮ or ☰) → **Export chat history**.
+**memex supports both export formats (v0.9+):**
+- **"Machine-readable JSON"** ← recommended. A single `result.json` file that ingests cleanly.
+- **"HTML"** ← also works (since v0.9). You get a `ChatExport_<chat>_<date>/` directory; drop the **whole** directory into the inbox and memex parses it automatically.
+Steps:
+1. Pick a format (JSON is the shorter path; HTML if you have already exported).
+2. Save it to **`~/.memex/inbox/`**: for JSON that is `result.json`, for HTML the whole `ChatExport_…` directory.
+3. Memex picks it up automatically within ~1.5 seconds. After ingest, the file/directory is moved to `~/.memex/data/conversations/telegram/` (or `…/telegram-html/`).
+If the ingest did not happen, check `~/.memex/data/memex.log`. For HTML exports the parser logs an actionable error when something is wrong (Telegram changed the format, the directory is damaged, etc.).
+**What does NOT work:**
+- ❌ Mobile export (Telegram mobile has no structured export)
+- ❌ Chat screenshots
+- ❌ Media-only chats (messages without text are parsed as `[photo]`/`[voice]` placeholders, with no content)
 ### I want to connect a new Obsidian vault

package/README.md CHANGED

@@ -172,7 +172,7 @@ Perplexity threads need to be made **Public** in the Share dialog first — meme
 | Cursor IDE chats      | Auto: reads Cursor's local SQLite session store                |
 | Continue / Zed        | Auto: filesystem watchers per platform                         |
 | Obsidian notes        | Auto: per-vault markdown watcher                               |
-| Telegram exports      | Manual: drop `result.json` (Telegram Desktop) into `~/.memex/inbox/` |
+| Telegram exports      | Manual: drop `result.json` (Telegram Desktop → Export → JSON) into `~/.memex/inbox/`. v0.9+: also accepts the **HTML** export directory (`ChatExport_*/`) — drop the whole folder, memex parses it. |
 | Telegram (live)       | Run [`memex-bot`](bot/README.md) — captures messages you send/forward to your private bot |
 | **Web pages, AI chat shares, pasted text** | From any MCP agent: *"save https://... to memex"*. Agent fetches; memex stores verbatim. Cloudflare-protected pages (Perplexity, npm.com, Twitter, Medium, …) handled via the agent's r.jina.ai fallback. See [HELP.md §8](HELP.md) |

package/README.ru.md CHANGED

@@ -224,7 +224,8 @@ which node  # → path to the node binary (for example /Users
 | **Claude Cowork** | `cowork-*.jsonl` (via filename prefix), including subagents | ✅ works |
 | **Cursor IDE** (Composer + Chat) | SQLite `state.vscdb` in `~/Library/Application Support/Cursor/` | ✅ works (polled every 5 min) |
 | **Obsidian** vault notes | `.md` files + YAML frontmatter | ✅ works (FSEvents, hash-based dedupe) |
-| **Telegram** | `result.json` from Desktop export | ✅ works |
+| **Telegram (JSON export)** | `result.json` from Desktop export | ✅ works |
+| **Telegram (HTML export)** | `ChatExport_*` directory from Desktop (v0.9+) | ✅ works; drop the whole folder into the inbox |
 | **Telegram (live)** | the `memex-bot` bot captures your messages / forwards | ✅ works |
 | **Web pages, AI shares, pastes** | `memex_store_document`: the agent fetches, memex stores verbatim (v0.6+) | ✅ works |
 | Claude.ai web export | planned for v0.7 | — |

package/lib/parse-telegram-html.js ADDED

@@ -0,0 +1,358 @@
+/**
+ * Telegram Desktop HTML export → Telegram-JSON-shape converter.
+ *
+ * Telegram Desktop offers two export formats:
+ *   - "Machine-readable JSON"  — what memex's importTelegram expects
+ *   - "Human-readable HTML"    — what many users pick by default
+ *
+ * Users frequently export as HTML by accident (often the default in the
+ * Telegram UI), then memex's inbox watcher silently ignores the dropped
+ * directory. This module makes HTML work: parse → emit the same shape
+ * importTelegram already understands.
+ *
+ * Telegram's HTML export is reasonably stable:
+ *
+ *   ChatExport_<chat-title>_<date>/
+ *     ├── messages.html     (or messages.htm — chunked: messages2, messages3, …)
+ *     ├── photos/
+ *     ├── files/
+ *     ├── stickers/
+ *     └── voice_messages/
+ *
+ * Each messages*.html has structure:
+ *
+ *   <div class="message default clearfix" id="message12345">
+ *     <div class="body">
+ *       <div class="from_name"> ↳ Sender Name </div>      (may be absent on "joined" messages)
+ *       <div class="text"> message text </div>
+ *       <div class="pull_right date details" title="2024-01-01 14:23:45 UTC+03:00">14:23</div>
+ *     </div>
+ *   </div>
+ *
+ *   Joined message = same sender as previous, has class "joined", no from_name.
+ *   Service message = class "service" (joined chat, name change, …) — we skip these.
+ *   Forwarded = "forwarded body" wrapping the message body.
+ *   Reply = "reply_to details" sibling.
+ *
+ * We use regex-based parsing (no DOM dependency) because Telegram's class
+ * names are stable and we control which fields we care about. If Telegram
+ * radically changes the schema, the parser fails loudly (returns 0 messages
+ * plus a clear log entry) rather than silently corrupting data.
+ */
+import { readFileSync, existsSync, readdirSync, statSync } from 'node:fs';
+import { join, basename, dirname } from 'node:path';
+/**
+ * Detect if a given path is a Telegram HTML export.
+ * Accepts both a directory (most common — ChatExport_xxx/) and a bare
+ * messages.html file (rare — user dropped just the one file).
+ *
+ * Returns { type: 'dir' | 'file' | null, htmlFiles: string[] }
+ *   null type means "not a Telegram HTML export"
+ */
+export function detectTelegramHtml(path) {
+  if (!existsSync(path)) return { type: null, htmlFiles: [] };
+  const stats = statSync(path);
+  // Directory case: look for messages*.html inside
+  if (stats.isDirectory()) {
+    let entries = [];
+    try { entries = readdirSync(path); } catch (_) { return { type: null, htmlFiles: [] }; }
+    const htmlFiles = entries
+      .filter((f) => /^messages\d*\.html?$/i.test(f))
+      .map((f) => join(path, f));
+    if (htmlFiles.length === 0) return { type: null, htmlFiles: [] };
+    // Verify the first one contains Telegram-shaped markers
+    const head = safeReadHead(htmlFiles[0]);
+    if (!looksLikeTelegram(head)) return { type: null, htmlFiles: [] };
+    // Sort chunks: messages.html < messages2.html < messages3.html …
+    htmlFiles.sort(numericChunkSort);
+    return { type: 'dir', htmlFiles };
+  }
+  // Single file case: must be messages*.html
+  if (stats.isFile() && /\.html?$/i.test(path) && /messages\d*\.html?$/i.test(basename(path))) {
+    const head = safeReadHead(path);
+    if (!looksLikeTelegram(head)) return { type: null, htmlFiles: [] };
+    return { type: 'file', htmlFiles: [path] };
+  }
+  return { type: null, htmlFiles: [] };
+}
+function safeReadHead(file, bytes = 8192) {
+  try {
+    return readFileSync(file, 'utf-8').slice(0, bytes);
+  } catch (_) {
+    return '';
+  }
+}
+function looksLikeTelegram(head) {
+  // Reliable markers in Telegram Desktop HTML exports
+  return /class="page_wrap"/.test(head) ||
+         /class="page_body chat_page"/.test(head) ||
+         (/class="from_name"/.test(head) && /class="text"/.test(head));
+}
+function numericChunkSort(a, b) {
+  const numA = parseInt((basename(a).match(/messages(\d*)\.html?/i) || [, '0'])[1] || '0', 10);
+  const numB = parseInt((basename(b).match(/messages(\d*)\.html?/i) || [, '0'])[1] || '0', 10);
+  return numA - numB;
+}
+/**
+ * Strip HTML tags and decode common entities → plain text.
+ * Conservative: preserves newlines from <br> and </div>, paragraph breaks from </p>.
+ */
+function htmlToText(html) {
+  if (!html) return '';
+  let out = String(html);
+  // Convert breaks to newlines BEFORE stripping tags
+  out = out.replace(/<br\s*\/?>/gi, '\n');
+  out = out.replace(/<\/p>/gi, '\n\n');
+  out = out.replace(/<\/div>/gi, '\n');
+  // Drop all remaining tags
+  out = out.replace(/<[^>]+>/g, '');
+  // Decode common entities
+  out = out
+    .replace(/&nbsp;/g, ' ')
+    .replace(/&amp;/g, '&')
+    .replace(/&lt;/g, '<')
+    .replace(/&gt;/g, '>')
+    .replace(/&quot;/g, '"')
+    .replace(/&apos;/g, "'")
+    .replace(/&#39;/g, "'")
+    .replace(/&#x([0-9a-f]+);/gi, (_, hex) => String.fromCodePoint(parseInt(hex, 16)))
+    .replace(/&#(\d+);/g, (_, dec) => String.fromCodePoint(parseInt(dec, 10)));
+  // Collapse runs of 3+ newlines into a single blank line, then trim
+  out = out.replace(/\n{3,}/g, '\n\n').trim();
+  return out;
+}
+/**
+ * Parse a Telegram date title into a Unix timestamp.
+ * Format: "2024-01-01 14:23:45 UTC+03:00" (or "UTC-04:00", etc.)
+ * Returns { tsUnix, isoString } or null if unparseable.
+ */
+function parseTelegramDate(title) {
+  if (!title) return null;
+  const m = title.match(/^(\d{4})-(\d{2})-(\d{2})\s+(\d{2}):(\d{2}):(\d{2})(?:\s+UTC([+-])(\d{2}):(\d{2}))?$/);
+  if (!m) return null;
+  const [, y, mo, d, h, mi, s, sign, oh, om] = m;
+  // Construct an ISO 8601 string with the explicit offset (or UTC if absent)
+  const offset = sign ? `${sign}${oh}:${om}` : 'Z';
+  const iso = `${y}-${mo}-${d}T${h}:${mi}:${s}${offset}`;
+  const date = new Date(iso);
+  if (isNaN(date.getTime())) return null;
+  return {
+    tsUnix: Math.floor(date.getTime() / 1000),
+    isoString: iso.replace(/[+-]\d{2}:\d{2}$/, '').replace('Z', ''),
+  };
+}
+/**
+ * Parse a single message div (raw HTML segment).
+ * Returns null for service messages (we skip those), for blocks without a
+ * message id, and for messages with neither text nor a recognised media marker.
+ */
+function parseMessageDiv(messageHtml, lastSender) {
+  // Skip service messages outright
+  if (/class="message service\b/.test(messageHtml)) return null;
+  // Extract message id from outer div: id="message12345"
+  const idMatch = messageHtml.match(/id="message(\d+)"/);
+  const msgId = idMatch ? idMatch[1] : null;
+  if (!msgId) return null;
+  const isJoined = /class="message [^"]*joined/.test(messageHtml);
+  // Forwarded marker
+  const isForwarded = /class="forwarded body"/.test(messageHtml);
+  let forwardedFrom = null;
+  if (isForwarded) {
+    const fwdM = messageHtml.match(/class="forwarded[^"]*"[\s\S]*?<div class="from_name"[^>]*>\s*([\s\S]*?)\s*<\/div>/);
+    if (fwdM) {
+      forwardedFrom = htmlToText(fwdM[1]).replace(/^Forwarded from:?\s*/i, '').trim();
+    }
+  }
+  // Sender (from_name) — absent on joined messages
+  let fromName = null;
+  const fromM = messageHtml.match(/<div class="from_name"[^>]*>\s*([\s\S]*?)\s*<\/div>/);
+  if (fromM && !isForwarded) {
+    fromName = htmlToText(fromM[1]).trim();
+  }
+  // If joined, inherit lastSender; otherwise use parsed or fallback
+  if (!fromName && isJoined && lastSender) fromName = lastSender;
+  if (!fromName) fromName = 'Unknown';
+  // Date — title attribute on `.date.details`
+  let date = null;
+  const dateM = messageHtml.match(/class="[^"]*\bdate details[^"]*"\s+title="([^"]+)"/);
+  if (dateM) date = parseTelegramDate(dateM[1]);
+  // Main text — last `<div class="text">…</div>` inside body (forwards may have one earlier)
+  let text = '';
+  const textMatches = [...messageHtml.matchAll(/<div class="text"[^>]*>([\s\S]*?)<\/div>(?=\s*(?:<div class="(?!text)|<\/div>|<a class="|$))/g)];
+  if (textMatches.length > 0) {
+    // Use last one (the actual message body, after any quoted/forwarded preamble)
+    text = htmlToText(textMatches[textMatches.length - 1][1]);
+  }
+  // Reply marker: include it as a prefix so it stays searchable and isn't lost
+  const replyM = messageHtml.match(/class="reply_to details"[^>]*>([\s\S]*?)<\/div>/);
+  if (replyM) {
+    const replyTxt = htmlToText(replyM[1]).replace(/^In reply to\s+/i, '').trim();
+    if (replyTxt) text = `↩ Reply: ${replyTxt}\n\n${text}`;
+  }
+  // Photo / media — if no text, note the media presence so the row isn't lost.
+  // Use word-boundary regexes since class attrs like "photo_wrap clearfix pull_left"
+  // wouldn't match a strict `class="photo_wrap"` pattern.
+  if (!text) {
+    if (/class="[^"]*\bphoto_wrap\b/.test(messageHtml)) text = '[photo]';
+    else if (/class="[^"]*\bmedia_voice_message\b/.test(messageHtml)) text = '[voice message]';
+    else if (/class="[^"]*\bmedia_video_file\b/.test(messageHtml)) text = '[video]';
+    else if (/class="[^"]*\bmedia_audio_file\b/.test(messageHtml)) text = '[audio]';
+    else if (/class="[^"]*\bmedia_file\b/.test(messageHtml)) text = '[file]';
+    else if (/class="[^"]*\bsticker\b/.test(messageHtml)) text = '[sticker]';
+    else return null;  // Truly empty — skip
+  }
+  // Build the message object in the shape importTelegram expects
+  // (date and date_unixtime are required by the importer)
+  const isoDate = date ? date.isoString : null;
+  const tsUnix = date ? date.tsUnix : 0;
+  return {
+    id: parseInt(msgId, 10),
+    type: 'message',
+    date: isoDate || '1970-01-01T00:00:00',
+    date_unixtime: tsUnix > 0 ? String(tsUnix) : '0',
+    from: fromName,
+    from_id: fromName ? `user_html_${slugify(fromName)}` : 'unknown',
+    text: text,
+    ...(forwardedFrom ? { forwarded_from: forwardedFrom } : {}),
+  };
+}
+function slugify(s) {
+  return String(s).toLowerCase().replace(/[^a-z0-9]+/g, '_').replace(/^_+|_+$/g, '').slice(0, 40) || 'anon';
+}
+/**
+ * Extract chat title from messages.html (or first chunk).
+ * Falls back to the name of the parent ChatExport_* directory, then "Telegram chat".
+ */
+function extractChatTitle(htmlContent, fallbackPath) {
+  // Try the <title>...</title>
+  const titleM = htmlContent.match(/<title>\s*([^<]+?)\s*<\/title>/i);
+  if (titleM) {
+    let t = titleM[1].trim();
+    // Telegram titles often look like "Alice — Chat Export"
+    t = t.replace(/\s*[—-]\s*(Chat Export|Telegram).*$/i, '').trim();
+    if (t && t !== 'Telegram') return t;
+  }
+  // Try the page_header text
+  const headerM = htmlContent.match(/<div class="text bold"[^>]*>\s*([\s\S]*?)\s*<\/div>/);
+  if (headerM) {
+    const t = htmlToText(headerM[1]).trim();
+    if (t) return t;
+  }
+  // Fallback: name of the parent ChatExport_xxx folder
+  if (fallbackPath) {
+    const parent = basename(dirname(fallbackPath));
+    if (parent && parent.startsWith('ChatExport')) {
+      return parent.replace(/^ChatExport_?/, '').replace(/_/g, ' ').trim() || 'Telegram chat';
+    }
+  }
+  return 'Telegram chat';
+}
+/**
+ * Main entrypoint. Parse a Telegram HTML export path → return an object
+ * shaped like a Telegram JSON export, ready for importTelegram().
+ *
+ * Returns null if path isn't a valid Telegram HTML export.
+ *
+ * Object shape:
+ *   {
+ *     personal_information: { user_id: "" },
+ *     chats: {
+ *       list: [{
+ *         id: <stable hash of chat title + first message timestamp>,
+ *         name: <chat title>,
+ *         type: "personal_chat",
+ *         messages: [{ id, type, date, date_unixtime, from, from_id, text, … }, …]
+ *       }]
+ *     }
+ *   }
+ */
+export function parseTelegramHtmlExport(path, opts = {}) {
+  const detection = detectTelegramHtml(path);
+  if (!detection.type) return null;
+  if (detection.htmlFiles.length === 0) return null;
+  let allMessages = [];
+  let chatTitle = null;
+  let lastSender = null;
+  for (const htmlPath of detection.htmlFiles) {
+    let content;
+    try { content = readFileSync(htmlPath, 'utf-8'); }
+    catch (_) { continue; }
+    if (!chatTitle) chatTitle = extractChatTitle(content, htmlPath);
+    // Split into per-message blocks. The reliable boundary is the
+    // opening `<div class="message ` of the next message.
+    // Use a tolerant regex that handles the message default / joined variants.
+    const messageBlocks = [...content.matchAll(/<div class="message [^"]*"[\s\S]*?(?=<div class="message [^"]*"|<div class="page_footer"|<\/body>)/g)];
+    for (const blockMatch of messageBlocks) {
+      const msg = parseMessageDiv(blockMatch[0], lastSender);
+      if (msg) {
+        allMessages.push(msg);
+        // Track sender for "joined" continuation messages
+        if (msg.from && msg.from !== 'Unknown') lastSender = msg.from;
+      }
+    }
+  }
+  if (allMessages.length === 0) return null;
+  // Stable chat id: hash of title + first message ts (good enough for dedup)
+  // We use a simple numeric hash so the synthetic chat_id is stable across re-imports.
+  const chatId = stableChatId(chatTitle || 'Telegram chat', allMessages[0]?.date_unixtime || '0');
+  return {
+    personal_information: { user_id: '' },
+    chats: {
+      list: [
+        {
+          id: chatId,
+          name: chatTitle || 'Telegram chat',
+          type: 'personal_chat',
+          messages: allMessages,
+        },
+      ],
+    },
+    _source: {
+      format: 'telegram-html',
+      original_path: path,
+      chunks: detection.htmlFiles.length,
+      messages_total: allMessages.length,
+    },
+  };
+}
+function stableChatId(title, firstTs) {
+  let hash = 0;
+  const key = title + ':' + firstTs;
+  for (let i = 0; i < key.length; i++) {
+    hash = ((hash << 5) - hash) + key.charCodeAt(i);
+    hash |= 0;
+  }
+  return Math.abs(hash);
+}
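
The date handling in `parseTelegramDate` above can be illustrated in isolation. The following is a minimal sketch, not code from the package: `telegramTitleToUnix` is a hypothetical name, and it assumes the `title` attribute always has the `YYYY-MM-DD HH:MM:SS UTC±HH:MM` layout described in the module comment, with a missing offset treated as UTC.

```javascript
// Sketch only: `telegramTitleToUnix` is a hypothetical helper, not part of memex.
// It mirrors the approach of parseTelegramDate: regex match, rebuild an ISO 8601
// string with the explicit offset, then let the Date parser do the arithmetic.
function telegramTitleToUnix(title) {
  const pattern =
    /^(\d{4})-(\d{2})-(\d{2})\s+(\d{2}):(\d{2}):(\d{2})(?:\s+UTC([+-])(\d{2}):(\d{2}))?$/;
  const m = pattern.exec(String(title ?? ''));
  if (!m) return null;

  const [, y, mo, d, h, mi, s, sign, oh, om] = m;
  // Keep the explicit offset when present; otherwise treat the time as UTC.
  const offset = sign ? `${sign}${oh}:${om}` : 'Z';
  const ms = Date.parse(`${y}-${mo}-${d}T${h}:${mi}:${s}${offset}`);
  return Number.isNaN(ms) ? null : Math.floor(ms / 1000);
}

// 14:23:45 at UTC+03:00 is the same instant as 11:23:45 UTC.
const withOffset = telegramTitleToUnix('2024-01-01 14:23:45 UTC+03:00');
const expected = Date.UTC(2024, 0, 1, 11, 23, 45) / 1000;
console.log(withOffset === expected); // true

// Anything that does not match the expected layout yields null.
console.log(telegramTitleToUnix('01.01.2024 14:23')); // null
```

Rebuilding an ISO string and delegating to the built-in parser avoids hand-written offset arithmetic; the trade-off is that any title Telegram formats differently is rejected outright instead of being guessed at.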

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "memex-mvp",
-  "version": "0.8.1",
+  "version": "0.9.0",
   "description": "Local-first MCP server for cross-agent AI memory. One SQLite + FTS5 corpus across Claude Code, Cowork, Cursor, Continue, Zed, Obsidian, and Telegram — passively captured, verbatim, searchable from any MCP-compatible client.",
   "type": "module",
   "main": "server.js",
@@ -26,7 +26,7 @@
     "sync": "node ingest.js",
     "ingest": "node ingest.js",
     "bot": "node bot/index.js",
-    "test": "node test/parser.test.js && node test/bot-inbox.test.js && node test/search-sort.test.js && node test/store-document.test.js && node test/cli.test.js && node test/hook.test.js",
+    "test": "node test/parser.test.js && node test/bot-inbox.test.js && node test/search-sort.test.js && node test/store-document.test.js && node test/cli.test.js && node test/hook.test.js && node test/telegram-html.test.js",
     "prepublishOnly": "npm test"
   },
   "engines": {

package/server.js CHANGED

@@ -49,6 +49,10 @@ import {
 } from './lib/store-doc/canonicalize.js';
 import { detectIssues, isBlocked } from './lib/store-doc/detect.js';
 import { extractTitle } from './lib/store-doc/extract-title.js';
+import {
+  detectTelegramHtml,
+  parseTelegramHtmlExport,
+} from './lib/parse-telegram-html.js';
 import { createHash } from 'node:crypto';
 import { runCli, CLI_SUBCOMMAND_NAMES } from './lib/cli/index.js';
@@ -325,9 +329,17 @@ const insertImport = db.prepare(`
 // -------------------- Importers --------------------
-/** Telegram Desktop JSON export (single chat or all_chats). */
-function importTelegram(filePath) {
-  const raw = JSON.parse(readFileSync(filePath, 'utf-8'));
+/**
+ * Telegram Desktop export importer. Accepts:
+ *   - filePath (string) — path to result.json
+ *   - rawObject (object) — already-parsed export, e.g. from parseTelegramHtmlExport
+ *
+ * Returns total imported message count.
+ */
+function importTelegram(filePathOrRaw) {
+  const raw = typeof filePathOrRaw === 'string'
+    ? JSON.parse(readFileSync(filePathOrRaw, 'utf-8'))
+    : filePathOrRaw;
   // Telegram Desktop produces either a single chat object or { chats: { list: [...] } }
   const chats = Array.isArray(raw.chats?.list)
@@ -670,9 +682,83 @@ function resolvePendingParents() {
 }
 /** Auto-detect format and import */
+/**
+ * Try to import a path as a Telegram HTML export (directory or single file).
+ * Returns imported message count, or 0 if the path is not an HTML export
+ * or if parsing/import failed.
+ *
+ * Side effects on success:
+ *   - Inserts an `imports` row tagged "telegram-html"
+ *   - Moves the source directory/file to ~/.memex/data/conversations/telegram-html/
+ *
+ * If it LOOKS like a Telegram HTML export but parsing failed, prints an
+ * actionable error pointing the user at the Desktop export menu — instead
+ * of silently ignoring. This was Tester 5's friction point.
+ */
+function importTelegramHtmlIfMatches(path) {
+  const detection = detectTelegramHtml(path);
+  if (!detection.type) return 0;
+  let parsed;
+  try {
+    parsed = parseTelegramHtmlExport(path);
+  } catch (err) {
+    log('telegram-html parse error:', basename(path), err.message);
+    parsed = null;
+  }
+  if (!parsed || parsed.chats.list[0].messages.length === 0) {
+    // Looked like Telegram HTML (had markers) but extraction yielded nothing.
+    // Print actionable error rather than silent ignore.
+    log('');
+    log('⚠ Detected Telegram HTML export at ' + basename(path) + ' but extracted 0 messages.');
+    log('  This usually means Telegram changed the HTML format, or the export is partial.');
+    log('  EASIEST FIX — re-export as JSON:');
+    log('    1. Open Telegram Desktop');
+    log('    2. Click the chat → ⋮ menu → "Export chat history"');
+    log('    3. Format: change "HTML" to "Machine-readable JSON"');
+    log('    4. Drop the new result.json into ~/.memex/inbox/');
+    log('');
+    log('  HTML export will be left in place — feel free to delete it once JSON works.');
+    return 0;
+  }
+  let imported = 0;
+  try {
+    imported = importTelegram(parsed);
+  } catch (err) {
+    log('telegram-html import error:', err.message);
+    return 0;
+  }
+  if (imported > 0) {
+    insertImport.run(
+      basename(path),
+      'telegram-html',
+      Math.floor(Date.now() / 1000),
+      imported
+    );
+    // Archive: move the whole directory (or file) so the watcher doesn't re-process
+    const targetDir = join(ARCHIVE, 'telegram-html');
+    mkdirSync(targetDir, { recursive: true });
+    const target = join(targetDir, basename(path));
+    try {
+      renameSync(path, target);
+    } catch (_) {}
+    log(`imported ${imported} messages from ${basename(path)} (telegram-html, ${detection.htmlFiles.length} chunk(s))`);
+  }
+  return imported;
+}
 function importFile(filePath) {
   if (!existsSync(filePath)) return 0;
   const stats = statSync(filePath);
+  // Telegram HTML export — can be either a directory (ChatExport_xxx/)
+  // or a bare messages.html file. We accept both. Detected via marker
+  // patterns inside the HTML, not file extension alone.
+  if (stats.isDirectory()) {
+    return importTelegramHtmlIfMatches(filePath);
+  }
   if (!stats.isFile()) return 0;
   const lower = filePath.toLowerCase();
@@ -692,6 +778,10 @@ function importFile(filePath) {
         imported = importTelegram(filePath);
         source = 'telegram';
       }
+    } else if (/\.html?$/i.test(lower)) {
+      // Single-file HTML drop (rare — usually a directory)
+      imported = importTelegramHtmlIfMatches(filePath);
+      if (imported > 0) source = 'telegram';
     } else if (lower.endsWith('.jsonl')) {
       // Filename prefix tells us which product the session came from.
       // cowork-   → Claude Cowork (incl. its subagents)
@@ -739,15 +829,30 @@ function importFile(filePath) {
 // dropping a partial file by hand — the watcher must not race the writer and
 // move the unfinished tmp into archive, which used to spam ENOENT into the
 // daemon's rename and corrupt the import accounting.
+// Watch INBOX top-level. Files: chokidar 'add' event. Directories:
+// chokidar 'addDir' event (v0.9+ inbox can also receive Telegram HTML
+// export DIRECTORIES like ChatExport_xxx/, not just JSON/JSONL files).
+//
+// `depth: 0` means we only get top-level entries — we DON'T want every
+// .html chunk inside ChatExport_xxx to fire 'add' separately. The
+// directory drop itself is what we react to; the HTML parser walks
+// inside.
 chokidar
   .watch(INBOX, {
     ignoreInitial: false,
     ignored: /\.tmp$/,
     awaitWriteFinish: { stabilityThreshold: 800 },
+    depth: 0,
   })
   .on('add', (filePath) => {
-    log('inbox detected:', basename(filePath));
+    log('inbox detected (file):', basename(filePath));
     importFile(filePath);
+  })
+  .on('addDir', (dirPath) => {
+    // Skip the inbox itself
+    if (dirPath === INBOX) return;
+    log('inbox detected (dir):', basename(dirPath));
+    importFile(dirPath);
   });
 // -------------------- MCP Server --------------------
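
The change to `importTelegram` in this hunk lets one importer accept either a path to `result.json` or an already-parsed object (such as the output of `parseTelegramHtmlExport`). A rough sketch of that normalisation step follows. The names `loadExport` and `readText` are hypothetical, the reader is injected only so the sketch runs without touching the filesystem, and the single-chat fallback (`[raw]`) is an assumption, since the diff cuts off before that branch.

```javascript
// Sketch only: `loadExport` and `readText` are hypothetical names.
// In server.js the string case reads the file with readFileSync instead.
function loadExport(filePathOrRaw, readText) {
  // A string is treated as a path to result.json; anything else is assumed
  // to be an export object that has already been parsed.
  const raw = typeof filePathOrRaw === 'string'
    ? JSON.parse(readText(filePathOrRaw))
    : filePathOrRaw;

  // Telegram Desktop produces either { chats: { list: [...] } } or a single
  // chat object. The single-chat fallback below is an assumption.
  return Array.isArray(raw.chats?.list) ? raw.chats.list : [raw];
}

// In-memory stand-in for the filesystem, for illustration only.
const fakeFiles = new Map([
  ['result.json', JSON.stringify({ name: 'Alice', messages: [] })],
]);
const readText = (path) => fakeFiles.get(path);

// JSON export: passed as a path.
console.log(loadExport('result.json', readText)[0].name); // Alice

// HTML export: passed as an already-parsed object.
const fromHtml = { chats: { list: [{ name: 'Bob', messages: [] }] } };
console.log(loadExport(fromHtml, readText)[0].name); // Bob
```

Accepting both shapes keeps the HTML path from needing its own insert logic: the parser only has to emit the JSON export shape, and everything downstream stays shared.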

package/skills/install-memex/SKILL.md CHANGED

@@ -58,14 +58,19 @@ Do these in order. Show each command before running it. Stop and ask if anything
 npm install -g memex-mvp
 ```
-If you get `EACCES` (macOS system Node), tell the user to choose:
+If you get `EACCES` (macOS system Node), **recommend Option B (permanent prefix fix) by default** and proceed unless the user explicitly chooses sudo.
-**Option A** — one-shot sudo:
-```sh
-sudo npm install -g memex-mvp
-```
+Say to the user:
+> "Hit EACCES — your Node is installed in a system directory that npm can't write to without admin rights. Two fixes:
+>
+> **A)** Quick: `sudo npm install -g memex-mvp` — one-time, requires your password
+> **B)** Permanent fix: I'll move npm's install location to your home directory. After this, no `npm install -g` will ever ask for sudo again, for any package. Five commands, one-time.
+>
+> I recommend **B** — it's a permanent improvement that benefits all your future Node tools, not just memex. Proceed with B?"
+If the user agrees (or says anything like "ok", "yes", "default", "go", "B") — proceed with B without further questions:
-**Option B** — fix prefix permanently (better long-term):
 ```sh
 mkdir -p ~/.npm-global
 npm config set prefix ~/.npm-global
@@ -74,7 +79,19 @@ source ~/.zshrc
 npm install -g memex-mvp
 ```
-**Ask which the user prefers — don't guess.**
+If the user **explicitly** picks A (or says "sudo", "fast", "quick"), use sudo:
+```sh
+sudo npm install -g memex-mvp
+```
+After the install completes, verify with `which memex` (should print an absolute path) and `memex --version` (should print the version).
+**Important if Option B was used + user already had memex from a prior sudo install:** the old sudo-installed copy in `/usr/local/lib/node_modules/memex-mvp/` is now orphaned (PATH prefers the new home install). Mention this once:
+> "By the way, you have an older sudo-installed memex copy in `/usr/local/lib/`. It's harmless but takes ~60 MB. Clean it up when convenient: `sudo npm uninstall -g memex-mvp` — not urgent."
+Don't run that uninstall yourself — let the user decide when.
 If `node` or `npm` are missing, STOP and tell the user to install Node.js (recommend nvm.sh or `brew install node`). Don't try to install Node yourself.

package/skills/install-memex/examples.md CHANGED

@@ -73,7 +73,7 @@ Sample interactions when the user invokes this skill. These show typical happy p
 ---
-## Example 2 · EACCES recovery (macOS system Node)
+## Example 2 · EACCES recovery (macOS system Node) — recommend B by default
 > **User:** install memex
 >
@@ -84,14 +84,18 @@ Sample interactions when the user invokes this skill. These show typical happy p
 > npm error path: '/usr/local/lib/node_modules/memex-mvp'
 > ```
 >
-> Hit a permissions error — this is common on macOS when Node is installed system-wide. Two ways to fix:
+> Hit EACCES — your Node is installed in a system directory that npm can't write to without admin rights. Two fixes:
 >
-> **Option A** (quick, requires sudo password):
-> ```sh
-> sudo npm install -g memex-mvp
-> ```
+> **A)** Quick: `sudo npm install -g memex-mvp` — one-time, requires your password
+>
+> **B)** Permanent fix: I'll move npm's install location to your home directory. After this, no `npm install -g` will ever ask for sudo again, for any package. Five commands, one-time.
+>
+> I recommend **B** — it's a permanent improvement that benefits all your future Node tools, not just memex. **Proceed with B?**
+>
+> **User:** ok
+>
+> **Agent:** Running the 5 prefix-fix commands:
 >
-> **Option B** (one-time setup, then future `npm install -g` works without sudo):
 > ```sh
 > mkdir -p ~/.npm-global
 > npm config set prefix ~/.npm-global
@@ -100,11 +104,7 @@ Sample interactions when the user invokes this skill. These show typical happy p
 > npm install -g memex-mvp
 > ```
 >
-> Option B is the standard npm recommendation — solves this for every future global install. Option A is faster if you don't want to touch your shell config. Which do you prefer?
->
-> **User:** B
->
-> **Agent:** [runs the 5 commands in sequence, shows output for each] ✓ Done. Verifying:
+> ✓ Done. Verifying:
 >
 > ```
 > which memex → /Users/macbook/.npm-global/bin/memex
@@ -242,7 +242,7 @@ Sample interactions when the user invokes this skill. These show typical happy p
 - **Always discover first**, then ask for confirmation before any write action
 - **Show every command** before running it; show diffs for config edits
-- **Offer choices** at decision points (EACCES, daemon yes/no) rather than guessing
+- **Recommend the better default** with a clear reason (e.g. Option B for EACCES) instead of forcing the user to weigh both options blind. Fall back to alternatives only if explicitly chosen.
 - **Detect and adapt** to existing installs, missing prerequisites, partial setups
 - **Refuse gracefully** when running in a context where the install can't actually work (web-only agents)