memex-mvp 0.8.1 → 0.9.0

package/HELP.md CHANGED
@@ -390,9 +390,24 @@ npx memex-sync scan
390
390
 
391
391
  ### Telegram chats don't show up
392
392
 
393
- 1. In Telegram **Desktop** (not mobile!): chat → menu → **Export chat history** → **Format: JSON**
394
- 2. Drop `result.json` into `~/.memex/inbox/`
395
- 3. Memex picks it up automatically within ~1.5 s
393
+ In Telegram **Desktop** (not mobile!): chat → menu (⋮ or ☰) → **Export chat history**.
394
+
395
+ **memex supports both export formats (v0.9+):**
396
+
397
+ - **"Machine-readable JSON"** ← recommended. A single `result.json` file that ingests cleanly.
398
+ - **"HTML"** ← also works (since v0.9). You get a `ChatExport_<chat>_<date>/` directory; drop it into the inbox **as a whole** and memex parses it automatically.
399
+
400
+ Steps:
401
+ 1. Choose a format (JSON is the shorter path; HTML if you have already exported that way).
402
+ 2. Save it to **`~/.memex/inbox/`**: for JSON that's `result.json`, for HTML the whole `ChatExport_…` directory.
403
+ 3. Memex picks it up automatically within ~1.5 seconds. After ingest, the file or directory is moved to `~/.memex/data/conversations/telegram/` (or `…/telegram-html/`).
404
+
405
+ If nothing was ingested, check `~/.memex/data/memex.log`. When the parser recognizes a Telegram HTML export but extracts no messages (for example, Telegram changed the format or the export is incomplete), it logs an actionable error.
406
+
407
+ **What does NOT work:**
408
+ - ❌ Mobile export (Telegram mobile doesn't produce a structured export)
409
+ - ❌ Chat screenshots
410
+ - ❌ Media-only chats (messages without text are parsed as placeholders such as `[photo]` or `[voice message]`, with no content)
396
411
 
397
412
  ### I want to connect a new Obsidian vault
398
413
 
package/README.md CHANGED
@@ -172,7 +172,7 @@ Perplexity threads need to be made **Public** in the Share dialog first — meme
172
172
  | Cursor IDE chats | Auto: reads Cursor's local SQLite session store |
173
173
  | Continue / Zed | Auto: filesystem watchers per platform |
174
174
  | Obsidian notes | Auto: per-vault markdown watcher |
175
- | Telegram exports | Manual: drop `result.json` (Telegram Desktop) into `~/.memex/inbox/` |
175
+ | Telegram exports | Manual: drop `result.json` (Telegram Desktop → Export → JSON) into `~/.memex/inbox/`. v0.9+: also accepts the **HTML** export directory (`ChatExport_*/`) — drop the whole folder, memex parses it. |
176
176
  | Telegram (live) | Run [`memex-bot`](bot/README.md) — captures messages you send/forward to your private bot |
177
177
  | **Web pages, AI chat shares, pasted text** | From any MCP agent: *"save https://... to memex"*. Agent fetches; memex stores verbatim. Cloudflare-protected pages (Perplexity, npm.com, Twitter, Medium, …) handled via the agent's r.jina.ai fallback. See [HELP.md §8](HELP.md) |
178
178
 
package/README.ru.md CHANGED
@@ -224,7 +224,8 @@ which node # → path to the node binary (e.g. /Users
224
224
  | **Claude Cowork** | `cowork-*.jsonl` (via filename prefix), including subagents | ✅ works |
225
225
  | **Cursor IDE** (Composer + Chat) | SQLite `state.vscdb` in `~/Library/Application Support/Cursor/` | ✅ works (polled every 5 min) |
226
226
  | **Obsidian** vault notes | `.md` files + YAML frontmatter | ✅ works (FSEvents, hash-based dedupe) |
227
- | **Telegram** | `result.json` from Desktop export | ✅ works |
227
+ | **Telegram (JSON export)** | `result.json` from Desktop export | ✅ works |
228
+ | **Telegram (HTML export)** | `ChatExport_*` directory from Desktop (v0.9+) | ✅ works: drop the whole folder into the inbox |
228
229
  | **Telegram (live)** | `memex-bot` bot captures your messages / forwards | ✅ works |
229
230
  | **Web pages, AI shares, pastes** | `memex_store_document`: the agent fetches, memex stores verbatim (v0.6+) | ✅ works |
230
231
  | Claude.ai web export | planned for v0.7 | — |
@@ -0,0 +1,358 @@
1
+ /**
2
+ * Telegram Desktop HTML export → Telegram-JSON-shape converter.
3
+ *
4
+ * Telegram Desktop offers two export formats:
5
+ * - "Machine-readable JSON" — what memex's importTelegram expects
6
+ * - "Human-readable HTML" — what many users pick by default
7
+ *
8
+ * Users frequently export as HTML by accident (often the default in the
9
+ * Telegram UI), then memex's inbox watcher silently ignores the dropped
10
+ * directory. This module makes HTML work: parse → emit the same shape
11
+ * importTelegram already understands.
12
+ *
13
+ * Telegram's HTML export is reasonably stable:
14
+ *
15
+ * ChatExport_<chat-title>_<date>/
16
+ * ├── messages.html (or messages.htm — chunked: messages2, messages3, …)
17
+ * ├── photos/
18
+ * ├── files/
19
+ * ├── stickers/
20
+ * └── voice_messages/
21
+ *
22
+ * Each messages*.html has structure:
23
+ *
24
+ * <div class="message default clearfix" id="message12345">
25
+ * <div class="body">
26
+ * <div class="from_name"> ↳ Sender Name </div> (may be absent on "joined" messages)
27
+ * <div class="text"> message text </div>
28
+ * <div class="pull_right date details" title="2024-01-01 14:23:45 UTC+03:00">14:23</div>
29
+ * </div>
30
+ * </div>
31
+ *
32
+ * Joined message = same sender as previous, has class "joined", no from_name.
33
+ * Service message = class "service" (joined chat, name change, …) — we skip these.
34
+ * Forwarded = "forwarded body" wrapping the message body.
35
+ * Reply = "reply_to details" sibling.
36
+ *
37
+ * We use regex-based parsing (no DOM dependency) because Telegram's class
38
+ * names are stable and we control which fields we care about. If Telegram
39
+ * radically changes the schema, parser breaks loudly (returns 0 messages
40
+ * + clear log) rather than silently corrupting.
41
+ */
42
+
43
+ import { readFileSync, existsSync, readdirSync, statSync } from 'node:fs';
44
+ import { join, basename, dirname } from 'node:path';
45
+
46
+ /**
47
+ * Detect if a given path is a Telegram HTML export.
48
+ * Accepts both a directory (most common — ChatExport_xxx/) and a bare
49
+ * messages.html file (rare — user dropped just the one file).
50
+ *
51
+ * Returns { type: 'dir' | 'file' | null, htmlFiles: string[] }
52
+ * null type means "not a Telegram HTML export"
53
+ */
54
+ export function detectTelegramHtml(path) {
55
+ if (!existsSync(path)) return { type: null, htmlFiles: [] };
56
+ const stats = statSync(path);
57
+
58
+ // Directory case: look for messages*.html inside
59
+ if (stats.isDirectory()) {
60
+ let entries = [];
61
+ try { entries = readdirSync(path); } catch (_) { return { type: null, htmlFiles: [] }; }
62
+ const htmlFiles = entries
63
+ .filter((f) => /^messages\d*\.html?$/i.test(f))
64
+ .map((f) => join(path, f));
65
+ if (htmlFiles.length === 0) return { type: null, htmlFiles: [] };
66
+ // Verify the first one contains Telegram-shaped markers
67
+ const head = safeReadHead(htmlFiles[0]);
68
+ if (!looksLikeTelegram(head)) return { type: null, htmlFiles: [] };
69
+ // Sort chunks: messages.html < messages2.html < messages3.html …
70
+ htmlFiles.sort(numericChunkSort);
71
+ return { type: 'dir', htmlFiles };
72
+ }
73
+
74
+ // Single file case: must be messages*.html
75
+ if (stats.isFile() && /^messages\d*\.html?$/i.test(basename(path))) {
76
+ const head = safeReadHead(path);
77
+ if (!looksLikeTelegram(head)) return { type: null, htmlFiles: [] };
78
+ return { type: 'file', htmlFiles: [path] };
79
+ }
80
+
81
+ return { type: null, htmlFiles: [] };
82
+ }
83
+
84
+ function safeReadHead(file, bytes = 8192) {
85
+ try {
86
+ return readFileSync(file, 'utf-8').slice(0, bytes);
87
+ } catch (_) {
88
+ return '';
89
+ }
90
+ }
91
+
92
+ function looksLikeTelegram(head) {
93
+ // Reliable markers in Telegram Desktop HTML exports
94
+ return /class="page_wrap"/.test(head) ||
95
+ /class="page_body chat_page"/.test(head) ||
96
+ (/class="from_name"/.test(head) && /class="text"/.test(head));
97
+ }
98
+
99
+ function numericChunkSort(a, b) {
100
+ const numA = parseInt((basename(a).match(/messages(\d*)\.html?/i) || [, '0'])[1] || '0', 10);
101
+ const numB = parseInt((basename(b).match(/messages(\d*)\.html?/i) || [, '0'])[1] || '0', 10);
102
+ return numA - numB;
103
+ }
104
+
105
+ /**
106
+ * Strip HTML tags and decode common entities → plain text.
107
+ * Conservative: preserves newlines from <br>, paragraph breaks from </div>.
108
+ */
109
+ function htmlToText(html) {
110
+ if (!html) return '';
111
+ let out = String(html);
112
+ // Convert breaks to newlines BEFORE stripping tags
113
+ out = out.replace(/<br\s*\/?>/gi, '\n');
114
+ out = out.replace(/<\/p>/gi, '\n\n');
115
+ out = out.replace(/<\/div>/gi, '\n');
116
+ // Drop all remaining tags
117
+ out = out.replace(/<[^>]+>/g, '');
118
+ // Decode common entities; &amp; goes last so "&amp;lt;" becomes "&lt;", not "<"
119
+ out = out
120
+ .replace(/&nbsp;/g, ' ')
121
+ .replace(/&lt;/g, '<')
122
+ .replace(/&gt;/g, '>')
123
+ .replace(/&quot;/g, '"')
124
+ .replace(/&apos;/g, "'")
125
+ .replace(/&#39;/g, "'")
126
+ .replace(/&#x([0-9a-f]+);/gi, (_, hex) => String.fromCodePoint(parseInt(hex, 16)))
127
+ .replace(/&#(\d+);/g, (_, dec) => String.fromCodePoint(parseInt(dec, 10)))
128
+ .replace(/&amp;/g, '&');
129
+ // Collapse 3+ blank lines, trim
130
+ out = out.replace(/\n{3,}/g, '\n\n').trim();
131
+ return out;
132
+ }
133
+
134
+ /**
135
+ * Parse a Telegram date title into Unix timestamp.
136
+ * Formats: "2024-01-01 14:23:45 UTC+03:00" and, as an assumed variant, "01.01.2024 14:23:45 UTC+03:00".
137
+ * The UTC offset is optional. Returns { tsUnix, isoString } or null if unparseable.
138
+ */
139
+ function parseTelegramDate(title) {
140
+ if (!title) return null;
141
+ const m = title.match(/^(?:(\d{4})-(\d{2})-(\d{2})|(\d{2})\.(\d{2})\.(\d{4}))\s+(\d{2}):(\d{2}):(\d{2})(?:\s+UTC([+-])(\d{2}):(\d{2}))?$/);
142
+ if (!m) return null;
143
+ const [, y1, mo1, d1, d2, mo2, y2, h, mi, s, sign, oh, om] = m;
144
+ const [y, mo, d] = y1 ? [y1, mo1, d1] : [y2, mo2, d2]; // normalize DD.MM.YYYY to Y-M-D
145
+ const offset = sign ? `${sign}${oh}:${om}` : 'Z'; // explicit offset, or UTC if absent
146
+ const iso = `${y}-${mo}-${d}T${h}:${mi}:${s}${offset}`;
147
+ const date = new Date(iso);
148
+ if (isNaN(date.getTime())) return null;
149
+ return {
150
+ tsUnix: Math.floor(date.getTime() / 1000),
151
+ isoString: iso.replace(/[+-]\d{2}:\d{2}$/, '').replace('Z', ''),
152
+ };
153
+ }
154
+
155
+ /**
156
+ * Parse a single message div (raw HTML segment).
157
+ * Returns null for service messages (skipped) and for messages with neither text nor recognizable media.
158
+ */
159
+ function parseMessageDiv(messageHtml, lastSender) {
160
+ // Skip service messages outright
161
+ if (/class="message service\b/.test(messageHtml)) return null;
162
+
163
+ // Extract message id from outer div: id="message12345"
164
+ const idMatch = messageHtml.match(/id="message(\d+)"/);
165
+ const msgId = idMatch ? idMatch[1] : null;
166
+ if (!msgId) return null;
167
+
168
+ const isJoined = /class="message [^"]*joined/.test(messageHtml);
169
+
170
+ // Forwarded marker
171
+ const isForwarded = /class="forwarded body"/.test(messageHtml);
172
+ let forwardedFrom = null;
173
+ if (isForwarded) {
174
+ const fwdM = messageHtml.match(/class="forwarded[^"]*"[\s\S]*?<div class="from_name"[^>]*>\s*([\s\S]*?)\s*<\/div>/);
175
+ if (fwdM) {
176
+ forwardedFrom = htmlToText(fwdM[1]).replace(/^Forwarded from:?\s*/i, '').trim();
177
+ }
178
+ }
179
+
180
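+ // Sender (from_name): absent on joined messages. On forwards, search only before the
181
+ // forwarded body, whose own from_name is the original author rather than the sender.
182
+ let fromName = null;
183
+ const senderHtml = isForwarded ? messageHtml.split('class="forwarded body"')[0] : messageHtml;
184
+ const fromM = senderHtml.match(/<div class="from_name"[^>]*>\s*([\s\S]*?)\s*<\/div>/);
185
+ if (fromM) fromName = htmlToText(fromM[1]).trim();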
186
+ // If joined, inherit lastSender; otherwise use parsed or fallback
187
+ if (!fromName && isJoined && lastSender) fromName = lastSender;
188
+ if (!fromName) fromName = 'Unknown';
189
+
190
+ // Date — title attribute on `.date.details`
191
+ let date = null;
192
+ const dateM = messageHtml.match(/class="[^"]*\bdate details[^"]*"\s+title="([^"]+)"/);
193
+ if (dateM) date = parseTelegramDate(dateM[1]);
194
+
195
+ // Main text — last `<div class="text">…</div>` inside body (forwards may have one earlier)
196
+ let text = '';
197
+ const textMatches = [...messageHtml.matchAll(/<div class="text"[^>]*>([\s\S]*?)<\/div>(?=\s*(?:<div class="(?!text)|<\/div>|<a class="|$))/g)];
198
+ if (textMatches.length > 0) {
199
+ // Use last one (the actual message body, after any quoted/forwarded preamble)
200
+ text = htmlToText(textMatches[textMatches.length - 1][1]);
201
+ }
202
+
203
+ // Reply marker — include as prefix so it's searchable but not lost
204
+ const replyM = messageHtml.match(/class="reply_to details"[^>]*>([\s\S]*?)<\/div>/);
205
+ if (replyM) {
206
+ const replyTxt = htmlToText(replyM[1]).replace(/^In reply to\s+/i, '').trim();
207
+ if (replyTxt) text = `↩ Reply: ${replyTxt}\n\n${text}`;
208
+ }
209
+
210
+ // Photo / media — if no text, note the media presence so the row isn't lost.
211
+ // Use word-boundary regexes since class attrs like "photo_wrap clearfix pull_left"
212
+ // wouldn't match a strict `class="photo_wrap"` pattern.
213
+ if (!text) {
214
+ if (/class="[^"]*\bphoto_wrap\b/.test(messageHtml)) text = '[photo]';
215
+ else if (/class="[^"]*\bmedia_voice_message\b/.test(messageHtml)) text = '[voice message]';
216
+ else if (/class="[^"]*\bmedia_video_file\b/.test(messageHtml)) text = '[video]';
217
+ else if (/class="[^"]*\bmedia_audio_file\b/.test(messageHtml)) text = '[audio]';
218
+ else if (/class="[^"]*\bmedia_file\b/.test(messageHtml)) text = '[file]';
219
+ else if (/class="[^"]*\bsticker\b/.test(messageHtml)) text = '[sticker]';
220
+ else return null; // Truly empty — skip
221
+ }
222
+
223
+ // Build the message object in the shape importTelegram expects
224
+ // (date and date_unixtime are required by the importer)
225
+ const isoDate = date ? date.isoString : null;
226
+ const tsUnix = date ? date.tsUnix : 0;
227
+
228
+ return {
229
+ id: parseInt(msgId, 10),
230
+ type: 'message',
231
+ date: isoDate || '1970-01-01T00:00:00',
232
+ date_unixtime: tsUnix > 0 ? String(tsUnix) : '0',
233
+ from: fromName,
234
+ from_id: fromName ? `user_html_${slugify(fromName)}` : 'unknown',
235
+ text: text,
236
+ ...(forwardedFrom ? { forwarded_from: forwardedFrom } : {}),
237
+ };
238
+ }
239
+
240
+ function slugify(s) {
241
+ return String(s).toLowerCase().replace(/[^\p{L}\p{N}]+/gu, '_').replace(/^_+|_+$/g, '').slice(0, 40) || 'anon'; // keep non-Latin (e.g. Cyrillic) names distinct
242
+ }
243
+
244
+ /**
245
+ * Extract chat title from messages.html (or first chunk).
246
+ * Falls back to directory name basename, then "Telegram chat".
247
+ */
248
+ function extractChatTitle(htmlContent, fallbackPath) {
249
+ // Try the <title>...</title>
250
+ const titleM = htmlContent.match(/<title>\s*([^<]+?)\s*<\/title>/i);
251
+ if (titleM) {
252
+ let t = titleM[1].trim();
253
+ // Telegram titles often look like "Alice — Chat Export"
254
+ t = t.replace(/\s*[—-]\s*(Chat Export|Telegram).*$/i, '').trim();
255
+ if (t && !/^(Telegram|Exported Data)$/i.test(t)) return t; // skip generic page titles (assumed: "Exported Data")
256
+ }
257
+ // Try the page_header text
258
+ const headerM = htmlContent.match(/<div class="text bold"[^>]*>\s*([\s\S]*?)\s*<\/div>/);
259
+ if (headerM) {
260
+ const t = htmlToText(headerM[1]).trim();
261
+ if (t) return t;
262
+ }
263
+ // Fallback: dirname of the parent ChatExport_xxx folder
264
+ if (fallbackPath) {
265
+ const parent = basename(dirname(fallbackPath));
266
+ if (parent && parent.startsWith('ChatExport')) {
267
+ return parent.replace(/^ChatExport_?/, '').replace(/_/g, ' ').trim() || 'Telegram chat';
268
+ }
269
+ }
270
+ return 'Telegram chat';
271
+ }
272
+
273
+ /**
274
+ * Main entrypoint. Parse a Telegram HTML export path → return an object
275
+ * shaped like a Telegram JSON export, ready for importTelegram().
276
+ *
277
+ * Returns null if path isn't a valid Telegram HTML export.
278
+ *
279
+ * Object shape:
280
+ * {
281
+ * personal_information: { user_id: "" },
282
+ * chats: {
283
+ * list: [{
284
+ * id: <stable hash of chat title>,
285
+ * name: <chat title>,
286
+ * type: "personal_chat",
287
+ * messages: [{ id, type, date, date_unixtime, from, from_id, text, … }, …]
288
+ * }]
289
+ * }
290
+ * }
291
+ */
292
+ export function parseTelegramHtmlExport(path, opts = {}) {
293
+ const detection = detectTelegramHtml(path);
294
+ if (!detection.type) return null;
295
+ if (detection.htmlFiles.length === 0) return null;
296
+
297
+ let allMessages = [];
298
+ let chatTitle = null;
299
+ let lastSender = null;
300
+
301
+ for (const htmlPath of detection.htmlFiles) {
302
+ let content;
303
+ try { content = readFileSync(htmlPath, 'utf-8'); }
304
+ catch (_) { continue; }
305
+
306
+ if (!chatTitle) chatTitle = extractChatTitle(content, htmlPath);
307
+
308
+ // Split into per-message blocks. The reliable boundary is the
309
+ // opening `<div class="message ` of the next message.
310
+ // Use a tolerant regex that handles the message default / joined variants.
311
+ const messageBlocks = [...content.matchAll(/<div class="message [^"]*"[\s\S]*?(?=<div class="message [^"]*"|<div class="page_footer"|<\/body>)/g)];
312
+
313
+ for (const blockMatch of messageBlocks) {
314
+ const msg = parseMessageDiv(blockMatch[0], lastSender);
315
+ if (msg) {
316
+ allMessages.push(msg);
317
+ // Track sender for "joined" continuation messages
318
+ if (msg.from && msg.from !== 'Unknown') lastSender = msg.from;
319
+ }
320
+ }
321
+ }
322
+
323
+ if (allMessages.length === 0) return null;
324
+
325
+ // Stable chat id: hash of title + first message ts (good enough for dedup)
326
+ // We use a simple numeric hash so the synthetic chat_id is stable across re-imports.
327
+ const chatId = stableChatId(chatTitle || 'Telegram chat', allMessages[0]?.date_unixtime || '0');
328
+
329
+ return {
330
+ personal_information: { user_id: '' },
331
+ chats: {
332
+ list: [
333
+ {
334
+ id: chatId,
335
+ name: chatTitle || 'Telegram chat',
336
+ type: 'personal_chat',
337
+ messages: allMessages,
338
+ },
339
+ ],
340
+ },
341
+ _source: {
342
+ format: 'telegram-html',
343
+ original_path: path,
344
+ chunks: detection.htmlFiles.length,
345
+ messages_total: allMessages.length,
346
+ },
347
+ };
348
+ }
349
+
350
+ function stableChatId(title, firstTs) {
351
+ let hash = 0;
352
+ const key = title + ':' + firstTs;
353
+ for (let i = 0; i < key.length; i++) {
354
+ hash = ((hash << 5) - hash) + key.charCodeAt(i);
355
+ hash |= 0;
356
+ }
357
+ return Math.abs(hash);
358
+ }
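To make the date handling concrete, here is a standalone sketch of the title-to-timestamp step. It accepts the `YYYY-MM-DD` layout documented in the converter plus a `DD.MM.YYYY` variant; that second layout and the name `parseExportDateTitle` are assumptions for illustration, not part of memex.

```javascript
// Hypothetical standalone version of the converter's date handling.
// Accepts "2024-01-01 14:23:45 UTC+03:00" plus the assumed variant
// "01.01.2024 14:23:45 UTC+03:00"; the UTC offset part is optional.
function parseExportDateTitle(title) {
  const m = String(title).trim().match(
    /^(?:(\d{4})-(\d{2})-(\d{2})|(\d{2})\.(\d{2})\.(\d{4}))\s+(\d{2}):(\d{2}):(\d{2})(?:\s+UTC([+-])(\d{2}):(\d{2}))?$/
  );
  if (!m) return null;
  const [, y1, mo1, d1, d2, mo2, y2, h, mi, s, sign, oh, om] = m;
  const [y, mo, d] = y1 ? [y1, mo1, d1] : [y2, mo2, d2];
  const wallClock = `${y}-${mo}-${d}T${h}:${mi}:${s}`;
  // The explicit offset lets Date resolve the real instant; no offset means UTC.
  const date = new Date(wallClock + (sign ? `${sign}${oh}:${om}` : 'Z'));
  if (Number.isNaN(date.getTime())) return null;
  // isoString is the offset-free wall-clock time, as in the converter.
  return { tsUnix: Math.floor(date.getTime() / 1000), isoString: wallClock };
}

// Both layouts describe the same instant, 2024-01-01T11:23:45Z.
console.log(parseExportDateTitle('2024-01-01 14:23:45 UTC+03:00').tsUnix); // 1704108225
console.log(parseExportDateTitle('01.01.2024 14:23:45 UTC+03:00').tsUnix); // 1704108225
```

Keeping `isoString` offset-free mirrors the converter, which strips the offset after computing `tsUnix`.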
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "memex-mvp",
3
- "version": "0.8.1",
3
+ "version": "0.9.0",
4
4
  "description": "Local-first MCP server for cross-agent AI memory. One SQLite + FTS5 corpus across Claude Code, Cowork, Cursor, Continue, Zed, Obsidian, and Telegram — passively captured, verbatim, searchable from any MCP-compatible client.",
5
5
  "type": "module",
6
6
  "main": "server.js",
@@ -26,7 +26,7 @@
26
26
  "sync": "node ingest.js",
27
27
  "ingest": "node ingest.js",
28
28
  "bot": "node bot/index.js",
29
- "test": "node test/parser.test.js && node test/bot-inbox.test.js && node test/search-sort.test.js && node test/store-document.test.js && node test/cli.test.js && node test/hook.test.js",
29
+ "test": "node test/parser.test.js && node test/bot-inbox.test.js && node test/search-sort.test.js && node test/store-document.test.js && node test/cli.test.js && node test/hook.test.js && node test/telegram-html.test.js",
30
30
  "prepublishOnly": "npm test"
31
31
  },
32
32
  "engines": {
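The new `test/telegram-html.test.js` is not part of this diff. One property worth pinning down in a test like it is the entity-decoding order in `htmlToText`: `&amp;` has to be decoded after the other entities. Otherwise, text that literally contains an escaped entity is decoded twice. A self-contained sketch follows; `decodeEntities` is a hypothetical helper covering only a subset of the entities the converter handles.

```javascript
// Hypothetical subset of htmlToText's entity table (numeric entities omitted).
const ENTITIES = [
  [/&nbsp;/g, ' '],
  [/&lt;/g, '<'],
  [/&gt;/g, '>'],
  [/&quot;/g, '"'],
  [/&#39;|&apos;/g, "'"],
];

// ampFirst = true reproduces the problematic order; false decodes &amp; last.
function decodeEntities(s, ampFirst = false) {
  let out = s;
  if (ampFirst) out = out.replace(/&amp;/g, '&');
  for (const [re, ch] of ENTITIES) out = out.replace(re, ch);
  if (!ampFirst) out = out.replace(/&amp;/g, '&');
  return out;
}

// A message that literally contains "&lt;b&gt;" is escaped as "&amp;lt;b&amp;gt;".
console.log(decodeEntities('&amp;lt;b&amp;gt;', true)); // <b>        (decoded twice)
console.log(decodeEntities('&amp;lt;b&amp;gt;'));       // &lt;b&gt;  (the original text)
```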
package/server.js CHANGED
@@ -49,6 +49,10 @@ import {
49
49
  } from './lib/store-doc/canonicalize.js';
50
50
  import { detectIssues, isBlocked } from './lib/store-doc/detect.js';
51
51
  import { extractTitle } from './lib/store-doc/extract-title.js';
52
+ import {
53
+ detectTelegramHtml,
54
+ parseTelegramHtmlExport,
55
+ } from './lib/parse-telegram-html.js';
52
56
  import { createHash } from 'node:crypto';
53
57
  import { runCli, CLI_SUBCOMMAND_NAMES } from './lib/cli/index.js';
54
58
 
@@ -325,9 +329,17 @@ const insertImport = db.prepare(`
325
329
 
326
330
  // -------------------- Importers --------------------
327
331
 
328
- /** Telegram Desktop JSON export (single chat or all_chats). */
329
- function importTelegram(filePath) {
330
- const raw = JSON.parse(readFileSync(filePath, 'utf-8'));
332
+ /**
333
+ * Telegram Desktop export importer. Accepts:
334
+ * - filePath (string) — path to result.json
335
+ * - rawObject (object) — already-parsed export, e.g. from parseTelegramHtmlExport
336
+ *
337
+ * Returns total imported message count.
338
+ */
339
+ function importTelegram(filePathOrRaw) {
340
+ const raw = typeof filePathOrRaw === 'string'
341
+ ? JSON.parse(readFileSync(filePathOrRaw, 'utf-8'))
342
+ : filePathOrRaw;
331
343
 
332
344
  // Telegram Desktop produces either a single chat object or { chats: { list: [...] } }
333
345
  const chats = Array.isArray(raw.chats?.list)
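`importTelegram` now takes either a path or an already-parsed object, then normalizes the shapes Telegram Desktop produces. Below is a minimal sketch of that normalization. `listChats` is a hypothetical name, and it takes JSON text instead of a file path so the example stays self-contained. Treating an object with a top-level `messages` array as a single-chat export is an assumption, because the rest of the expression lies outside this hunk.

```javascript
// Hypothetical sketch of the normalization step: accept an already-parsed
// export object or raw JSON text, then return the list of chats.
function listChats(exportOrJson) {
  const raw = typeof exportOrJson === 'string' ? JSON.parse(exportOrJson) : exportOrJson;
  // All-chats export, and the HTML converter's output: { chats: { list: [...] } }
  if (Array.isArray(raw?.chats?.list)) return raw.chats.list;
  // Assumed single-chat shape: a chat object with its own messages array.
  if (Array.isArray(raw?.messages)) return [raw];
  return [];
}

// Trimmed version of the object parseTelegramHtmlExport documents.
const fromHtml = {
  personal_information: { user_id: '' },
  chats: {
    list: [{
      id: 123,
      name: 'Alice',
      type: 'personal_chat',
      messages: [{ id: 1, type: 'message', date: '2024-01-01T14:23:45',
        date_unixtime: '1704108225', from: 'Alice', from_id: 'user_html_alice', text: 'hi' }],
    }],
  },
};
const singleChatJson = JSON.stringify({ name: 'Bob', type: 'personal_chat', messages: [] });

console.log(listChats(fromHtml).length);        // 1
console.log(listChats(singleChatJson)[0].name); // Bob
```

Because the HTML converter emits the `chats.list` shape, both export formats go through a single import path.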
@@ -670,9 +682,83 @@ function resolvePendingParents() {
670
682
  }
671
683
 
672
684
  /** Auto-detect format and import */
685
+ /**
686
+ * Try to import a path as a Telegram HTML export (directory or single file).
687
+ * Returns imported message count, or 0 if not an HTML export.
688
+ *
689
+ * Side effects on success:
690
+ * - Inserts an `imports` row tagged "telegram-html"
691
+ * - Moves the source directory/file to ~/.memex/data/conversations/telegram-html/
692
+ *
693
+ * If it LOOKS like a Telegram HTML export but parsing failed, prints an
694
+ * actionable error pointing the user at the Desktop export menu — instead
695
+ * of silently ignoring. This was Tester 5's friction point.
696
+ */
697
+ function importTelegramHtmlIfMatches(path) {
698
+ const detection = detectTelegramHtml(path);
699
+ if (!detection.type) return 0;
700
+
701
+ let parsed;
702
+ try {
703
+ parsed = parseTelegramHtmlExport(path);
704
+ } catch (err) {
705
+ log('telegram-html parse error:', basename(path), err.message);
706
+ parsed = null;
707
+ }
708
+
709
+ if (!parsed || parsed.chats.list[0].messages.length === 0) {
710
+ // Looked like Telegram HTML (had markers) but extraction yielded nothing.
711
+ // Print actionable error rather than silent ignore.
712
+ log('');
713
+ log('⚠ Detected Telegram HTML export at ' + basename(path) + ' but extracted 0 messages.');
714
+ log(' This usually means Telegram changed the HTML format, or the export is partial.');
715
+ log(' EASIEST FIX — re-export as JSON:');
716
+ log(' 1. Open Telegram Desktop');
717
+ log(' 2. Click the chat → ⋮ menu → "Export chat history"');
718
+ log(' 3. Format: change "HTML" to "Machine-readable JSON"');
719
+ log(' 4. Drop the new result.json into ~/.memex/inbox/');
720
+ log('');
721
+ log(' HTML export will be left in place — feel free to delete it once JSON works.');
722
+ return 0;
723
+ }
724
+
725
+ let imported = 0;
726
+ try {
727
+ imported = importTelegram(parsed);
728
+ } catch (err) {
729
+ log('telegram-html import error:', err.message);
730
+ return 0;
731
+ }
732
+
733
+ if (imported > 0) {
734
+ insertImport.run(
735
+ basename(path),
736
+ 'telegram-html',
737
+ Math.floor(Date.now() / 1000),
738
+ imported
739
+ );
740
+ // Archive: move the whole directory (or file) so the watcher doesn't re-process
741
+ const targetDir = join(ARCHIVE, 'telegram-html');
742
+ mkdirSync(targetDir, { recursive: true });
743
+ const target = join(targetDir, basename(path));
744
+ try {
745
+ renameSync(path, target);
746
+ } catch (err) { log('telegram-html archive move failed:', err.message); }
747
+ log(`imported ${imported} messages from ${basename(path)} (telegram-html, ${detection.htmlFiles.length} chunk(s))`);
748
+ }
749
+ return imported;
750
+ }
751
+
673
752
  function importFile(filePath) {
674
753
  if (!existsSync(filePath)) return 0;
675
754
  const stats = statSync(filePath);
755
+
756
+ // Telegram HTML export — can be either a directory (ChatExport_xxx/)
757
+ // or a bare messages.html file. We accept both. Detected via marker
758
+ // patterns inside the HTML, not file extension alone.
759
+ if (stats.isDirectory()) {
760
+ return importTelegramHtmlIfMatches(filePath);
761
+ }
676
762
  if (!stats.isFile()) return 0;
677
763
 
678
764
  const lower = filePath.toLowerCase();
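When `importFile` hands a directory to `importTelegramHtmlIfMatches`, `detectTelegramHtml` collects the `messages*.html` chunks and orders them with `numericChunkSort`. The sketch below uses hypothetical names (`chunkNumber`, `sortChunks`) that mirror the same rule to show why a numeric key matters: a plain string sort puts `messages10.html` before `messages2.html`, which would interleave the chat out of order.

```javascript
// Sketch of numeric chunk ordering for Telegram's split export files.
// Mirrors the idea of numericChunkSort: "messages.html" counts as chunk 0.
function chunkNumber(fileName) {
  const m = /^messages(\d*)\.html?$/i.exec(fileName);
  return m && m[1] ? parseInt(m[1], 10) : 0;
}

function sortChunks(fileNames) {
  return [...fileNames].sort((a, b) => chunkNumber(a) - chunkNumber(b));
}

const found = ['messages10.html', 'messages2.html', 'messages.html', 'messages3.html'];
console.log([...found].sort()); // messages.html, messages10.html, messages2.html, messages3.html
console.log(sortChunks(found)); // messages.html, messages2.html, messages3.html, messages10.html
```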
@@ -692,6 +778,10 @@ function importFile(filePath) {
692
778
  imported = importTelegram(filePath);
693
779
  source = 'telegram';
694
780
  }
781
+ } else if (/\.html?$/i.test(lower)) {
782
+ // Single-file HTML drop (rare — usually a directory)
783
+ imported = importTelegramHtmlIfMatches(filePath);
784
+ if (imported > 0) source = 'telegram';
695
785
  } else if (lower.endsWith('.jsonl')) {
696
786
  // Filename prefix tells us which product the session came from.
697
787
  // cowork- → Claude Cowork (incl. its subagents)
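Taken together, the hunks above give `importFile` this dispatch order: directories first, then `.json`, the new `.html` branch, and `.jsonl`. The sketch below is a simplified, hypothetical mirror of that order (`routeInboxEntry` is not a memex function). The `.json` condition is an assumption, since that branch's test falls outside these hunks. The real code also requires `stats.isFile()` for non-directories and inspects file contents before committing to a format.

```javascript
// Hypothetical, simplified mirror of importFile's dispatch order in v0.9.
// It returns a label instead of importing anything.
function routeInboxEntry(name, isDirectory = false) {
  if (isDirectory) return 'telegram-html'; // ChatExport folder, if Telegram markers match
  const lower = name.toLowerCase();
  if (lower.endsWith('.json')) return 'telegram-json'; // assumed condition for result.json
  if (/\.html?$/.test(lower)) return 'telegram-html';   // bare messages*.html drop
  if (lower.endsWith('.jsonl')) {
    // The filename prefix picks the product, e.g. cowork- means Claude Cowork.
    return lower.startsWith('cowork-') ? 'cowork-session' : 'jsonl-session';
  }
  return 'ignored';
}

console.log(routeInboxEntry('ChatExport_2024-01-01', true)); // telegram-html
console.log(routeInboxEntry('cowork-abc.jsonl'));            // cowork-session
```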
@@ -739,15 +829,30 @@ function importFile(filePath) {
739
829
  // dropping a partial file by hand — the watcher must not race the writer and
740
830
  // move the unfinished tmp into archive, which used to spam ENOENT into the
741
831
  // daemon's rename and corrupt the import accounting.
832
+ // Watch INBOX top-level. Files: chokidar 'add' event. Directories:
833
+ // chokidar 'addDir' event (v0.9+ inbox can also receive Telegram HTML
834
+ // export DIRECTORIES like ChatExport_xxx/, not just JSON/JSONL files).
835
+ //
836
+ // `depth: 0` means we only get top-level entries — we DON'T want every
837
+ // .html chunk inside ChatExport_xxx to fire 'add' separately. The
838
+ // directory drop itself is what we react to; the HTML parser walks
839
+ // inside.
742
840
  chokidar
743
841
  .watch(INBOX, {
744
842
  ignoreInitial: false,
745
843
  ignored: /\.tmp$/,
746
844
  awaitWriteFinish: { stabilityThreshold: 800 },
845
+ depth: 0,
747
846
  })
748
847
  .on('add', (filePath) => {
749
- log('inbox detected:', basename(filePath));
848
+ log('inbox detected (file):', basename(filePath));
750
849
  importFile(filePath);
850
+ })
851
+ .on('addDir', (dirPath) => {
852
+ // Skip the inbox itself
853
+ if (dirPath === INBOX) return;
854
+ log('inbox detected (dir):', basename(dirPath));
855
+ importFile(dirPath);
751
856
  });
752
857
 
753
858
  // -------------------- MCP Server --------------------
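As we read chokidar's documentation, `awaitWriteFinish` targets file `add`/`change` events. A directory drop can therefore fire `addDir` while the `ChatExport` files are still being copied, and `detectTelegramHtml` may find no `messages*.html` yet. One possible mitigation, sketched here under that assumption: sample the directory's total size on an interval and import only after the size has stayed unchanged for a few consecutive samples. `makeSettleDetector` is hypothetical and not part of memex 0.9.0.

```javascript
// Hypothetical settle detector: returns true once the observed total size
// has repeated `stableChecks` times in a row (and is non-zero).
function makeSettleDetector(stableChecks = 3) {
  let last = -1;
  let stableCount = 0;
  return function observe(totalBytes) {
    if (totalBytes === last && totalBytes > 0) stableCount += 1;
    else stableCount = 0;
    last = totalBytes;
    return stableCount >= stableChecks;
  };
}

// Sizes sampled periodically while a ChatExport folder is being copied.
const observe = makeSettleDetector(2);
const samples = [0, 4096, 81920, 81920, 81920];
console.log(samples.map((bytes) => observe(bytes))); // [ false, false, false, false, true ]
```

In the watcher, the `addDir` handler could sum file sizes under `dirPath` on a `setInterval`, feed the total to `observe`, and call `importFile(dirPath)` once it returns true.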
@@ -58,14 +58,19 @@ Do these in order. Show each command before running it. Stop and ask if anything
58
58
  npm install -g memex-mvp
59
59
  ```
60
60
 
61
- If you get `EACCES` (macOS system Node), tell the user to choose:
61
+ If you get `EACCES` (macOS system Node), **recommend Option B (permanent prefix fix) by default** and ask for a quick confirmation; use sudo only if the user explicitly chooses it.
62
62
 
63
- **Option A** one-shot sudo:
64
- ```sh
65
- sudo npm install -g memex-mvp
66
- ```
63
+ Say to the user:
64
+
65
+ > "Hit EACCES — your Node is installed in a system directory that npm can't write to without admin rights. Two fixes:
66
+ >
67
+ > **A)** Quick: `sudo npm install -g memex-mvp` — one-time, requires your password
68
+ > **B)** Permanent fix: I'll move npm's install location to your home directory. After this, no `npm install -g` will ever ask for sudo again, for any package. Five commands, one-time.
69
+ >
70
+ > I recommend **B** — it's a permanent improvement that benefits all your future Node tools, not just memex. Proceed with B?"
71
+
72
+ If the user agrees (or says anything like "ok", "yes", "default", "go", "B") — proceed with B without further questions:
67
73
 
68
- **Option B** — fix prefix permanently (better long-term):
69
74
  ```sh
70
75
  mkdir -p ~/.npm-global
71
76
  npm config set prefix ~/.npm-global
@@ -74,7 +79,19 @@ source ~/.zshrc
74
79
  npm install -g memex-mvp
75
80
  ```
76
81
 
77
- **Ask which the user prefers; don't guess.**
82
+ If the user **explicitly** picks A (or says "sudo", "fast", "quick"), use sudo:
83
+
84
+ ```sh
85
+ sudo npm install -g memex-mvp
86
+ ```
87
+
88
+ After the install completes, verify with `which memex` (should print an absolute path) and `memex --version` (should print the version).
89
+
90
+ **Important if Option B was used and the user already had memex from a prior sudo install:** the old sudo-installed copy in `/usr/local/lib/node_modules/memex-mvp/` is now orphaned (PATH prefers the new home install). Mention this once:
91
+
92
+ > "By the way, you have an older sudo-installed memex copy in `/usr/local/lib/`. It's harmless but takes ~60 MB. Clean it up when convenient: `sudo npm uninstall -g memex-mvp` — not urgent."
93
+
94
+ Don't run that uninstall yourself — let the user decide when.
78
95
 
79
96
  If `node` or `npm` are missing, STOP and tell the user to install Node.js (recommend nvm.sh or `brew install node`). Don't try to install Node yourself.
80
97
 
@@ -73,7 +73,7 @@ Sample interactions when the user invokes this skill. These show typical happy p
73
73
 
74
74
  ---
75
75
 
76
- ## Example 2 · EACCES recovery (macOS system Node)
76
+ ## Example 2 · EACCES recovery (macOS system Node) — recommend B by default
77
77
 
78
78
  > **User:** install memex
79
79
  >
@@ -84,14 +84,18 @@ Sample interactions when the user invokes this skill. These show typical happy p
84
84
  > npm error path: '/usr/local/lib/node_modules/memex-mvp'
85
85
  > ```
86
86
  >
87
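- > Hit a permissions error; this is common on macOS when Node is installed system-wide. Two ways to fix:
+ > Hit EACCES: your Node is installed in a system directory that npm can't write to without admin rights. Two fixes: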
+ > Hit EACCES your Node is installed in a system directory that npm can't write to without admin rights. Two fixes:
88
88
  >
89
- > **Option A** (quick, requires sudo password):
90
- > ```sh
91
- > sudo npm install -g memex-mvp
92
- > ```
89
+ > **A)** Quick: `sudo npm install -g memex-mvp` — one-time, requires your password
90
+ >
91
+ > **B)** Permanent fix: I'll move npm's install location to your home directory. After this, no `npm install -g` will ever ask for sudo again, for any package. Five commands, one-time.
92
+ >
93
+ > I recommend **B** — it's a permanent improvement that benefits all your future Node tools, not just memex. **Proceed with B?**
94
+ >
95
+ > **User:** ok
96
+ >
97
+ > **Agent:** Running the 5 prefix-fix commands:
93
98
  >
94
- > **Option B** (one-time setup, then future `npm install -g` works without sudo):
95
99
  > ```sh
96
100
  > mkdir -p ~/.npm-global
97
101
  > npm config set prefix ~/.npm-global
@@ -100,11 +104,7 @@ Sample interactions when the user invokes this skill. These show typical happy p
100
104
  > npm install -g memex-mvp
101
105
  > ```
102
106
  >
103
- > Option B is the standard npm recommendation — solves this for every future global install. Option A is faster if you don't want to touch your shell config. Which do you prefer?
104
- >
105
- > **User:** B
106
- >
107
- > **Agent:** [runs the 5 commands in sequence, shows output for each] ✓ Done. Verifying:
107
+ > Done. Verifying:
108
108
  >
109
109
  > ```
110
110
  > which memex → /Users/macbook/.npm-global/bin/memex
@@ -242,7 +242,7 @@ Sample interactions when the user invokes this skill. These show typical happy p
242
242
 
243
243
  - **Always discover first**, then ask for confirmation before any write action
244
244
  - **Show every command** before running it; show diffs for config edits
245
- - **Offer choices** at decision points (EACCES, daemon yes/no) rather than guessing
245
+ - **Recommend the better default** with a clear reason (e.g. Option B for EACCES) instead of forcing the user to weigh both options blind. Fall back to alternatives only if explicitly chosen.
246
246
  - **Detect and adapt** to existing installs, missing prerequisites, partial setups
247
247
  - **Refuse gracefully** when running in a context where the install can't actually work (web-only agents)
248
248