npm - memex-mvp - Versions diffs - 0.6.0 → 0.7.0 - Mend

memex-mvp 0.6.0 → 0.7.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (10)

package/HELP.md +45 -0
package/README.md +41 -0
package/README.ru.md +46 -3
package/lib/cli/index.js +513 -0
package/lib/store-doc/extract-title.js +59 -13
package/package.json +2 -2
package/server.js +57 -0
package/skills/install-memex/README.md +3 -3
package/skills/install-memex/SKILL.md +13 -1
package/skills/install-memex/examples.md +59 -0

package/HELP.md CHANGED

@@ -293,6 +293,51 @@ Memex sorts by **relevance** by default
 ---
+## 💻 Terminal CLI (v0.7+): when MCP doesn't work
+If the MCP integration didn't attach to your agent (or you're in an agent without MCP support but with shell access), memex has a **terminal mode** in the same binary. One package, two modes.
+```bash
+memex search "Postgres migration"                 # FTS5 search
+memex search "Q2 deck" --chat "Memex Bot"         # filter by chat title
+memex search "auth" --source claude-code --limit 5 --sort date_desc
+memex recent --limit 5                             # most recent messages
+memex recent --source telegram
+memex list                                         # all conversations
+memex list --source web                            # saved URLs only
+memex get web-1582ab51a7b7                         # full content of a conversation
+memex overview                                     # snapshot of the corpus
+memex projects                                     # distinct project_paths
+memex help                                         # this guide in the terminal
+memex --help                                       # command reference
+memex --version
+```
+**All query commands support `--json`** for pipes and scripts:
+```bash
+memex search "TODO" --json | jq '.results[].snippet'
+memex list --source telegram --json | jq -r '.conversations[].title'
+memex get web-1582ab51a7b7 --json > backup.json
+```
+**The DB is opened read-only**, so it is safe to run while the daemon writer is running.
+**When to use the CLI instead of MCP:**
+- The MCP integration in your agent didn't connect → `memex overview` confirms that memex itself is healthy and the problem is in the client's MCP config
+- An agent without MCP support (OpenCode + Kimi, any CLI-only agents) but with shell access
+- Shell scripts / automation
+- Debugging: "can I see my history directly?"
+**`memex` (with no arguments)** is the MCP stdio server. This is the default behavior for Claude Code / Cursor / Cline via their MCP configs. CLI commands are activated only when a recognized subcommand is present.
+---
 ## If something doesn't work
 ### Search returns nothing
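
As a rough illustration of the `--json` mode added in the HELP.md hunk above, the sketch below does in JavaScript what the `jq '.results[].snippet'` pipe does. The `{ query, count, results }` shape is the one `cmdSearch` in `lib/cli/index.js` prints in this diff; the helper name `extractSnippets` and the sample payload are invented for the example and are not part of memex.

```javascript
// Hypothetical helper (not part of memex): mirrors `jq '.results[].snippet'`.
function extractSnippets(jsonText) {
  const payload = JSON.parse(jsonText);
  // Be defensive: treat a missing `results` array as "no hits".
  const results = Array.isArray(payload.results) ? payload.results : [];
  return results.map((r) => r.snippet);
}

// Sample payload invented for this example; real output comes from
// `memex search "TODO" --json`. The << >> markers are the snippet()
// delimiters used by the CLI's search query.
const sample = JSON.stringify({
  query: 'TODO',
  count: 2,
  results: [
    { conversation_id: 'web-1', snippet: 'fix <<TODO>> in parser' },
    { conversation_id: 'web-2', snippet: '<<TODO>>: add tests' },
  ],
});

console.log(extractSnippets(sample));
```

In a real script the JSON text would come from the CLI's stdout rather than from an inline sample.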

package/README.md CHANGED

@@ -100,6 +100,45 @@ For a fully-automated install across all detected MCP clients, see [the AI-drive
 ---
+## Terminal CLI (v0.7+) — query memex without MCP
+The same `memex` binary that runs as an MCP server also has a terminal mode for direct queries. Useful when MCP isn't wired up, when you want to pipe results into shell scripts, or when debugging MCP-config issues:
+```sh
+memex search "Postgres migration"          # full-text search
+memex search "Q2 deck" --chat "Memex Bot"  # scope to one conversation by title
+memex recent --limit 5                      # last 5 messages across all sources
+memex list --source web                     # all saved URLs
+memex get web-1582ab51a7b7                  # full content of one conversation
+memex overview                              # snapshot of corpus
+memex projects                              # distinct project_paths captured
+memex help                                  # full user guide (HELP.md)
+memex --help                                # command reference
+```
+Every query supports `--json` for machine-readable output: `memex search foo --json | jq '.results[].snippet'`. The DB is opened **read-only** — safe to run while `memex-sync` daemon is writing.
+When called **without arguments** (`memex`), the binary still runs as an MCP stdio server (the way Claude Code / Cursor / Cline launch it). CLI mode and MCP mode are the same package — no extra install.
+---
+## Save URLs into memex (v0.6+)
+Once memex is installed, any MCP-aware agent can also save **web pages, AI chat shares, and pasted text** into your memex memory — searchable from any other AI chat later. In Claude Code, Cursor, Cline, …:
+```
+Save https://www.perplexity.ai/share/<id> to memex
+Add this article to my memex: https://example.com/long-post
+```
+The agent fetches the page via its own WebFetch (auto-falling back to `r.jina.ai` for Cloudflare-protected sites — memex teaches the trick) and calls `memex_store_document`. Memex stores the content verbatim as a `web` source conversation, indistinguishable from AI chats at search time.
+Perplexity threads need to be made **Public** in the Share dialog first — memex detects private threads and tells the user how to fix it. Full guide: [HELP.md §8](HELP.md).
+**Memex stays 100% local** — the agent fetches, memex only stores. Zero outbound calls from memex itself.
+---
 ## What it captures
 | Source                | How it gets in                                                 |
@@ -111,6 +150,7 @@ For a fully-automated install across all detected MCP clients, see [the AI-drive
 | Obsidian notes        | Auto: per-vault markdown watcher                               |
 | Telegram exports      | Manual: drop `result.json` (Telegram Desktop) into `~/.memex/inbox/` |
 | Telegram (live)       | Run [`memex-bot`](bot/README.md) — captures messages you send/forward to your private bot |
+| **Web pages, AI chat shares, pasted text** | From any MCP agent: *"save https://... to memex"*. Agent fetches; memex stores verbatim. Cloudflare-protected pages (Perplexity, npm.com, Twitter, Medium, …) handled via the agent's r.jina.ai fallback. See [HELP.md §8](HELP.md) |
 All sources land in the same FTS5 corpus, searchable by one `memex_search` call.
@@ -128,6 +168,7 @@ All sources land in the same FTS5 corpus, searchable by one `memex_search` call.
 | `memex_list_projects`         | Distinct project paths captured (for the `project` filter)               |
 | `memex_archive_conversation`  | Hide a chat from default listings (data preserved)                       |
 | `memex_export_markdown`       | Export one conversation as Markdown (for Obsidian round-trip)            |
+| `memex_store_document`        | Save a web page, AI chat share, or pasted text. Agent fetches; memex stores verbatim. Teaches the Jina r.jina.ai trick for Cloudflare-blocked pages |
 | `memex_list_sources`          | Per-source enabled/disabled + counts                                     |
 | `memex_status`                | Daemon health: PID, last capture, watched files                          |
 | `memex_sources_status`        | Which sources are captured + the exact CLI to opt out                    |
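
The README hunk above says the same binary runs as an MCP stdio server when called without arguments and as a one-shot CLI otherwise. A minimal sketch of that mode selection follows. The subcommand list is copied from `CLI_SUBCOMMAND_NAMES` in `lib/cli/index.js`; the function `pickMode` is hypothetical, and the real wiring lives in `server.js`, which is not shown here. Treating an unrecognized first argument as MCP mode is an assumption based on the HELP.md wording that CLI commands activate only for a recognized subcommand.

```javascript
// Copied from CLI_SUBCOMMAND_NAMES in lib/cli/index.js (this diff).
const CLI_SUBCOMMAND_NAMES = [
  'search', 'recent', 'list', 'get', 'overview',
  'projects', 'help', '-h', '--help', '-v', '--version',
];

// Hypothetical helper: decide which mode a given argv would select.
// `argv` is process.argv.slice(2), i.e. only the user-supplied arguments.
function pickMode(argv) {
  const sub = argv[0];
  if (sub !== undefined && CLI_SUBCOMMAND_NAMES.includes(sub)) {
    return { mode: 'cli', sub, args: argv.slice(1) };
  }
  // No argument, or (assumption) an unrecognized one: MCP stdio server.
  return { mode: 'mcp' };
}

console.log(pickMode([]).mode);
console.log(pickMode(['search', 'auth', '--limit', '5']));
```

Keeping the decision in a small pure function like this makes it easy to check that existing MCP client configs, which launch `memex` with no arguments, are unaffected by the new CLI.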

package/README.ru.md CHANGED

@@ -121,6 +121,47 @@ curl -fsSL https://raw.githubusercontent.com/parallelclaw/memex-mvp/main/skills/
 …or `/install-memex`. The agent runs `npm install` itself, writes the MCP config, starts the daemon, and verifies that everything works (~2 minutes).
+### Saving URLs into memex (v0.6+)
+After installation, any MCP agent (Claude Code, Cursor, Cline, Continue, Zed) can save **web pages, AI chat shares, and pasted text** directly into memex memory:
+```
+Save https://www.perplexity.ai/share/<id> to memex
+Add this article to memex: https://example.com/article
+```
+The agent fetches the page itself via its own WebFetch; for Cloudflare-protected sites (Perplexity, npm.com, Twitter, Medium) it automatically falls back to the `r.jina.ai` proxy (memex teaches the agent this trick through the tool description). The agent then calls `memex_store_document`, which stores the content verbatim as a conversation with `source: "web"`.
+**Memex stays 100% local**: the agent does the fetch, memex only stores. No outbound network calls from memex.
+Full guide and edge cases (private Perplexity, paywall, login walls): [HELP.md §8](HELP.md).
+### Terminal CLI (v0.7+): query memex without MCP
+The same `memex` binary that runs as an MCP server has a **terminal mode** for direct queries. Useful when MCP isn't configured, when you want to pipe results into shell scripts, or when debugging the MCP config:
+```bash
+memex search "Postgres migration"           # full-text search
+memex search "Q2 deck" --chat "Memex Bot"   # narrow to a specific chat by title
+memex recent --limit 5                       # last 5 messages across all sources
+memex list --source web                      # all saved URLs
+memex get web-1582ab51a7b7                   # full content of one conversation
+memex overview                               # snapshot of the corpus
+memex projects                               # distinct project_paths
+memex help                                   # full guide (HELP.md)
+memex --help                                 # command reference
+```
+Every query subcommand has `--json` for machine-readable output: `memex search foo --json | jq '.results[].snippet'`. The DB is opened **read-only**, so it is safe to run while the daemon is writing.
+When run **without arguments** (`memex`), the binary still works as an MCP stdio server (the way Claude Code / Cursor / Cline invoke it from their configs). CLI mode and MCP mode are the same package, with no extra install.
+**Use the CLI when:**
+- The MCP integration didn't attach to your agent → `memex overview` confirms that memex itself is healthy
+- The agent has no MCP support but has shell access
+- You want to pipe results: `memex search foo --json | jq ...`
+- You want to dump a full transcript to stdout for context
 ### Connecting to Claude Code
 First, get **two absolute paths** in the terminal:
@@ -162,9 +203,11 @@ which node  # → path to the node binary (e.g. /Users
 | **Cursor IDE** (Composer + Chat) | SQLite `state.vscdb` in `~/Library/Application Support/Cursor/` | ✅ works (polled every 5 min) |
 | **Obsidian** vault notes | `.md` files + YAML frontmatter | ✅ works (FSEvents, hash-based dedupe) |
 | **Telegram** | `result.json` from Desktop export | ✅ works |
-| Claude.ai web export | planned for v0.3 | — |
-| ChatGPT export | planned for v0.3 | — |
-| Apple Notes | planned for v0.3 | — |
+| **Telegram (live)** | the `memex-bot` bot captures your messages / forwards | ✅ works |
+| **Web pages, AI shares, pastes** | `memex_store_document`: the agent fetches, memex stores verbatim (v0.6+) | ✅ works |
+| Claude.ai web export | planned for v0.7 | — |
+| ChatGPT export | planned for v0.7 | — |
+| Apple Notes | planned for v0.7 | — |
 ### Filename convention for inbox files

package/lib/cli/index.js ADDED

@@ -0,0 +1,513 @@
+/**
+ * memex CLI — terminal-mode subcommands for the `memex` binary.
+ *
+ * When the user invokes the `memex` bin with a recognized subcommand
+ * (search / recent / list / get / overview / projects / help / --help
+ * / --version), we run a one-shot query and exit. When called WITHOUT
+ * any argument, server.js falls through to MCP-stdio mode (the
+ * primary mode used by Claude Code, Cursor, Cline, Continue, Zed).
+ *
+ * The CLI opens memex.db in read-only mode and uses WAL-friendly
+ * queries — safe to run while memex-sync daemon is writing.
+ *
+ * Why duplicate SQL from server.js?  The MCP handlers in server.js
+ * are tightly coupled with the JSON-RPC response shape (jsonResult /
+ * textResult, half-life-boost params, group_by_conversation, …).
+ * Replicating the simple queries here keeps the CLI self-contained
+ * and avoids a risky refactor of the production MCP path. The CLI
+ * intentionally exposes the MOST USEFUL subset — not every MCP tool
+ * has a CLI peer.
+ *
+ * Output format:
+ *   default → human-friendly markdown with light ANSI colors (TTY only)
+ *   --json  → structured JSON for shell pipelines / agents
+ */
+import Database from 'better-sqlite3';
+import { join } from 'node:path';
+import { homedir } from 'node:os';
+import { existsSync, readFileSync } from 'node:fs';
+import { fileURLToPath } from 'node:url';
+// ---------- Subcommand registry ----------
+export const CLI_SUBCOMMAND_NAMES = [
+  'search', 'recent', 'list', 'get', 'overview',
+  'projects', 'help', '-h', '--help', '-v', '--version',
+];
+// ---------- Path helpers ----------
+const HOME = homedir();
+const MEMEX_DIR = process.env.MEMEX_DIR || join(HOME, '.memex');
+const DB_PATH = join(MEMEX_DIR, 'data', 'memex.db');
+// HELP.md lives at the package root, two levels up from lib/cli/
+const PACKAGE_ROOT = fileURLToPath(new URL('../../', import.meta.url));
+const HELP_MD_PATH = join(PACKAGE_ROOT, 'HELP.md');
+// ---------- ANSI helpers ----------
+const TTY = process.stdout.isTTY;
+const c = TTY
+  ? {
+      dim:   (s) => `\x1b[2m${s}\x1b[0m`,
+      bold:  (s) => `\x1b[1m${s}\x1b[0m`,
+      cyan:  (s) => `\x1b[36m${s}\x1b[0m`,
+      green: (s) => `\x1b[32m${s}\x1b[0m`,
+      yellow:(s) => `\x1b[33m${s}\x1b[0m`,
+    }
+  : {
+      dim: (s) => s, bold: (s) => s, cyan: (s) => s,
+      green: (s) => s, yellow: (s) => s,
+    };
+// ---------- argv parser (minimal, no deps) ----------
+function parseArgs(argv) {
+  const opts = {};
+  const positionals = [];
+  for (let i = 0; i < argv.length; i++) {
+    const a = argv[i];
+    if (a === '--json') opts.json = true;
+    else if (a === '--limit') opts.limit = parseInt(argv[++i], 10);
+    else if (a === '--source') opts.source = argv[++i];
+    else if (a === '--chat') opts.chat = argv[++i];
+    else if (a === '--project') opts.project = argv[++i];
+    else if (a === '--sort') opts.sort = argv[++i];
+    else if (a === '--include-archived') opts.includeArchived = true;
+    else if (a === '--help' || a === '-h') opts.help = true;
+    else if (a.startsWith('--')) { /* ignore unknown flag for forward-compat */ }
+    else positionals.push(a);
+  }
+  return { opts, positionals };
+}
+function openDb() {
+  if (!existsSync(DB_PATH)) {
+    console.error(`memex.db not found at ${DB_PATH}`);
+    console.error(`Run 'memex-sync install' to set up the daemon and create the DB.`);
+    process.exit(1);
+  }
+  // Read-only handle: WAL allows this to coexist with the writing daemon.
+  return new Database(DB_PATH, { readonly: true, fileMustExist: true });
+}
+function fmtDate(ts) {
+  if (!ts || ts === 0) return '?';
+  return new Date(ts * 1000).toISOString().slice(0, 10);
+}
+function fmtDateTime(ts) {
+  if (!ts || ts === 0) return '?';
+  return new Date(ts * 1000).toISOString().slice(0, 16).replace('T', ' ');
+}
+// FTS5 expects sanitized tokens — strip what would be operators
+function sanitizeFtsQuery(q) {
+  return String(q || '')
+    .trim()
+    .replace(/[^\p{L}\p{N}_\-\s"]/gu, ' ')
+    .replace(/\s+/g, ' ')
+    .trim();
+}
+// =============================================================
+// SEARCH
+// =============================================================
+async function cmdSearch(args) {
+  const { opts, positionals } = parseArgs(args);
+  const query = positionals.join(' ').trim();
+  if (!query || opts.help) {
+    console.error('Usage: memex search "<query>" [--source X] [--chat X] [--project X] [--sort SORT] [--limit N] [--json]');
+    console.error('  --sort: relevance (default) | date_asc | date_desc');
+    process.exit(query ? 0 : 2);
+  }
+  const limit = Math.min(50, Math.max(1, opts.limit || 10));
+  const sanitized = sanitizeFtsQuery(query);
+  if (!sanitized) {
+    console.error('Query became empty after sanitization — try simpler keywords.');
+    process.exit(2);
+  }
+  const filters = ['messages_fts MATCH ?'];
+  const params = [sanitized];
+  if (opts.source) {
+    filters.push('m.source = ?');
+    params.push(opts.source);
+  }
+  if (!opts.includeArchived) {
+    filters.push('(c.archived_at IS NULL OR c.archived_at = 0)');
+  }
+  if (opts.project) {
+    filters.push('c.project_path LIKE ?');
+    params.push(`%${opts.project}%`);
+  }
+  if (opts.chat) {
+    filters.push('LOWER(c.title) LIKE LOWER(?)');
+    params.push(`%${opts.chat}%`);
+  }
+  let orderBy;
+  if (opts.sort === 'date_asc') {
+    orderBy = 'CASE WHEN m.ts IS NULL OR m.ts = 0 THEN 1 ELSE 0 END, m.ts ASC';
+  } else if (opts.sort === 'date_desc') {
+    orderBy = 'CASE WHEN m.ts IS NULL OR m.ts = 0 THEN 1 ELSE 0 END, m.ts DESC';
+  } else {
+    // Same BM25 × recency formula as memex_search, with half_life = 30 days
+    orderBy = `bm25(messages_fts) * exp(-(CAST(strftime('%s','now') AS REAL) - COALESCE(NULLIF(m.ts, 0), CAST(strftime('%s','now') AS REAL))) / 86400.0 / 30.0)`;
+  }
+  const sql = `
+    SELECT m.source, m.conversation_id, m.role, m.sender, m.ts,
+           snippet(messages_fts, 0, '<<', '>>', ' … ', 18) AS snippet,
+           c.title AS conversation_title
+      FROM messages_fts
+      JOIN messages m ON m.id = messages_fts.rowid
+ LEFT JOIN conversations c ON c.conversation_id = m.conversation_id
+     WHERE ${filters.join(' AND ')}
+  ORDER BY ${orderBy}
+     LIMIT ?
+  `;
+  const db = openDb();
+  const rows = db.prepare(sql).all(...params, limit);
+  db.close();
+  if (opts.json) {
+    console.log(JSON.stringify({ query, count: rows.length, results: rows }, null, 2));
+    return;
+  }
+  if (rows.length === 0) {
+    console.log(`No results for ${c.bold('"' + query + '"')}`);
+    return;
+  }
+  console.log(`${c.bold(rows.length)} result(s) for ${c.bold('"' + query + '"')}\n`);
+  for (const r of rows) {
+    console.log(`${c.cyan(r.conversation_title || r.conversation_id)} ${c.dim('· ' + r.source + ' · ' + fmtDate(r.ts))}`);
+    console.log(`  ${r.snippet.replace(/<<(.+?)>>/g, (_, m) => c.yellow(m))}`);
+    console.log(`  ${c.dim('conversation_id: ' + r.conversation_id)}`);
+    console.log('');
+  }
+}
+// =============================================================
+// RECENT
+// =============================================================
+async function cmdRecent(args) {
+  const { opts } = parseArgs(args);
+  if (opts.help) {
+    console.error('Usage: memex recent [--limit N] [--source X] [--json]');
+    process.exit(0);
+  }
+  const limit = Math.min(100, Math.max(1, opts.limit || 20));
+  const filters = [];
+  const params = [];
+  if (opts.source) {
+    filters.push('m.source = ?');
+    params.push(opts.source);
+  }
+  if (!opts.includeArchived) {
+    filters.push('(c.archived_at IS NULL OR c.archived_at = 0)');
+  }
+  const where = filters.length ? `WHERE ${filters.join(' AND ')}` : '';
+  const sql = `
+    SELECT m.source, m.conversation_id, m.role, m.sender, m.ts,
+           substr(m.text, 1, 240) AS preview,
+           c.title AS conversation_title
+      FROM messages m
+ LEFT JOIN conversations c ON c.conversation_id = m.conversation_id
+     ${where}
+  ORDER BY m.ts DESC
+     LIMIT ?
+  `;
+  const db = openDb();
+  const rows = db.prepare(sql).all(...params, limit);
+  db.close();
+  if (opts.json) {
+    console.log(JSON.stringify({ count: rows.length, results: rows }, null, 2));
+    return;
+  }
+  console.log(`${c.bold(rows.length)} recent message(s)\n`);
+  for (const r of rows) {
+    console.log(`${c.cyan(r.conversation_title || r.conversation_id)} ${c.dim('· ' + r.source + ' · ' + fmtDateTime(r.ts))}`);
+    console.log(`  ${c.dim(r.role + ':')} ${r.preview.replace(/\s+/g, ' ').trim()}`);
+    console.log('');
+  }
+}
+// =============================================================
+// LIST conversations
+// =============================================================
+async function cmdList(args) {
+  const { opts } = parseArgs(args);
+  if (opts.help) {
+    console.error('Usage: memex list [--source X] [--limit N] [--json]');
+    process.exit(0);
+  }
+  const limit = Math.min(200, Math.max(1, opts.limit || 20));
+  const filters = [];
+  const params = [];
+  if (opts.source) {
+    filters.push('source = ?');
+    params.push(opts.source);
+  }
+  if (!opts.includeArchived) {
+    filters.push('(archived_at IS NULL OR archived_at = 0)');
+  }
+  filters.push("(parent_conversation_id IS NULL)"); // skip subagents by default
+  const where = filters.length ? `WHERE ${filters.join(' AND ')}` : '';
+  const sql = `
+    SELECT conversation_id, source, title, first_ts, last_ts, message_count
+      FROM conversations
+     ${where}
+  ORDER BY last_ts DESC
+     LIMIT ?
+  `;
+  const db = openDb();
+  const rows = db.prepare(sql).all(...params, limit);
+  db.close();
+  if (opts.json) {
+    console.log(JSON.stringify({ count: rows.length, conversations: rows }, null, 2));
+    return;
+  }
+  console.log(`${c.bold(rows.length)} conversation(s)\n`);
+  for (const r of rows) {
+    console.log(`${c.cyan(r.title || r.conversation_id)}`);
+    console.log(`  ${c.dim(r.source + ' · ' + r.message_count + ' msgs · ' + fmtDate(r.first_ts) + ' → ' + fmtDate(r.last_ts))}`);
+    console.log(`  ${c.dim(r.conversation_id)}`);
+    console.log('');
+  }
+}
+// =============================================================
+// GET full conversation
+// =============================================================
+async function cmdGet(args) {
+  const { opts, positionals } = parseArgs(args);
+  const convId = positionals[0];
+  if (!convId || opts.help) {
+    console.error('Usage: memex get <conversation_id> [--limit N] [--json]');
+    console.error('Find conversation_ids via `memex list` or `memex search`.');
+    process.exit(convId ? 0 : 2);
+  }
+  const limit = Math.min(2000, Math.max(1, opts.limit || 200));
+  const db = openDb();
+  const conv = db
+    .prepare(`SELECT * FROM conversations WHERE conversation_id = ?`)
+    .get(convId);
+  if (!conv) {
+    db.close();
+    console.error(`No conversation found for id: ${convId}`);
+    process.exit(1);
+  }
+  const msgs = db
+    .prepare(`
+      SELECT role, sender, text, ts
+        FROM messages
+       WHERE conversation_id = ?
+    ORDER BY ts ASC, id ASC
+       LIMIT ?
+    `)
+    .all(convId, limit);
+  db.close();
+  if (opts.json) {
+    console.log(JSON.stringify({ conversation: conv, messages: msgs }, null, 2));
+    return;
+  }
+  console.log(`# ${conv.title || conv.conversation_id}`);
+  console.log(`${c.dim(conv.source + ' · ' + msgs.length + ' message(s) · ' + fmtDate(conv.first_ts) + ' → ' + fmtDate(conv.last_ts))}`);
+  console.log('');
+  for (const m of msgs) {
+    console.log(`${c.cyan(m.role + ' (' + m.sender + ')')} ${c.dim(fmtDateTime(m.ts))}`);
+    console.log(m.text);
+    console.log('');
+  }
+}
+// =============================================================
+// OVERVIEW
+// =============================================================
+async function cmdOverview(args) {
+  const { opts } = parseArgs(args);
+  const db = openDb();
+  const sources = db.prepare(`
+    SELECT source, COUNT(*) AS msgs, COUNT(DISTINCT conversation_id) AS chats,
+           MIN(ts) AS first_ts, MAX(ts) AS last_ts
+      FROM messages
+  GROUP BY source
+  ORDER BY msgs DESC
+  `).all();
+  const totalMsgs = db.prepare(`SELECT COUNT(*) AS c FROM messages`).get().c;
+  const totalConvs = db.prepare(`SELECT COUNT(*) AS c FROM conversations`).get().c;
+  const recentConvs = db.prepare(`
+    SELECT conversation_id, source, title, last_ts
+      FROM conversations
+     WHERE archived_at IS NULL OR archived_at = 0
+  ORDER BY last_ts DESC
+     LIMIT 10
+  `).all();
+  db.close();
+  if (opts.json) {
+    console.log(JSON.stringify({
+      total_messages: totalMsgs,
+      total_conversations: totalConvs,
+      sources,
+      recent_conversations: recentConvs,
+    }, null, 2));
+    return;
+  }
+  console.log(c.bold('memex corpus snapshot') + '\n');
+  console.log(`Total: ${c.green(totalMsgs + ' messages')} in ${c.green(totalConvs + ' conversations')}\n`);
+  console.log(c.bold('By source:'));
+  for (const s of sources) {
+    console.log(`  ${s.source.padEnd(18)} ${String(s.msgs).padStart(7)} msgs · ${String(s.chats).padStart(5)} chats · ${fmtDate(s.first_ts)} → ${fmtDate(s.last_ts)}`);
+  }
+  console.log('');
+  console.log(c.bold('10 most recent conversations:'));
+  for (const r of recentConvs) {
+    console.log(`  ${c.dim(fmtDate(r.last_ts))}  ${c.cyan((r.title || r.conversation_id).slice(0, 60))}  ${c.dim('(' + r.source + ')')}`);
+  }
+}
+// =============================================================
+// PROJECTS
+// =============================================================
+async function cmdProjects(args) {
+  const { opts } = parseArgs(args);
+  const limit = Math.min(500, Math.max(1, opts.limit || 50));
+  const db = openDb();
+  const rows = db.prepare(`
+    SELECT project_path AS path, COUNT(*) AS chats
+      FROM conversations
+     WHERE project_path IS NOT NULL AND project_path != ''
+  GROUP BY project_path
+  ORDER BY chats DESC, project_path ASC
+     LIMIT ?
+  `).all(limit);
+  db.close();
+  if (opts.json) {
+    console.log(JSON.stringify({ count: rows.length, projects: rows }, null, 2));
+    return;
+  }
+  if (rows.length === 0) {
+    console.log('No projects captured yet. Run `memex-sync backfill-projects` to populate project paths on older conversations.');
+    return;
+  }
+  console.log(`${c.bold(rows.length)} project(s):\n`);
+  for (const r of rows) {
+    console.log(`  ${String(r.chats).padStart(4)} chats  ${c.cyan(r.path)}`);
+  }
+}
+// =============================================================
+// HELP — print HELP.md content
+// =============================================================
+async function cmdHelp() {
+  if (!existsSync(HELP_MD_PATH)) {
+    console.error(`HELP.md not found at ${HELP_MD_PATH}`);
+    console.error(`See https://github.com/parallelclaw/memex-mvp/blob/main/HELP.md`);
+    process.exit(1);
+  }
+  process.stdout.write(readFileSync(HELP_MD_PATH, 'utf-8'));
+}
+// =============================================================
+// USAGE — `memex --help`
+// =============================================================
+async function cmdUsage() {
+  console.log(`memex — local-first MCP memory server for AI agents
+USAGE
+  memex                          run as MCP stdio server (called by Claude Code,
+                                 Cursor, Cline, Continue, Zed via MCP config)
+  memex <command> [args]         run a one-shot terminal query and exit
+COMMANDS
+  search "<query>"               full-text search across all sources
+    --source <name>              filter by source (telegram, claude-code, …)
+    --chat "<title>"             filter by conversation title (substring)
+    --project <path>             filter by project_path (substring)
+    --sort <mode>                relevance | date_asc | date_desc
+    --limit N                    max results (default 10, max 50)
+    --json                       output JSON instead of markdown
+  recent                         most recent messages across all sources
+    --limit N                    default 20, max 100
+    --source <name>              filter by source
+    --json
+  list                           list conversations by recency
+    --source <name>              filter by source
+    --limit N                    default 20, max 200
+    --json
+  get <conversation_id>          full transcript of one conversation
+    --limit N                    max messages (default 200, max 2000)
+    --json
+  overview                       corpus snapshot — sources, counts, recent chats
+    --json
+  projects                       list distinct project_paths captured
+    --limit N                    default 50, max 500
+    --json
+  help                           print the user guide (HELP.md)
+  --help, -h                     this command reference
+  --version, -v                  print package version
+EXAMPLES
+  memex search "Postgres migration"
+  memex search "Q2 deck" --chat "Memex Bot"
+  memex search "auth" --source claude-code --sort date_desc --limit 5
+  memex list --source web --json | jq '.conversations[].title'
+  memex get web-1582ab51a7b7
+DAEMON COMMANDS (separate binary)
+  memex-sync install             register the macOS LaunchAgent for auto-capture
+  memex-sync status              daemon health + watched files
+  memex-sync scan                one-time backfill of existing AI sessions
+  memex-sync --help              full daemon CLI reference
+For the full user guide:  memex help
+On the web:                https://memex.parallelclaw.ai
+`);
+}
+// =============================================================
+// VERSION
+// =============================================================
+async function cmdVersion() {
+  try {
+    const pkgPath = join(PACKAGE_ROOT, 'package.json');
+    const pkg = JSON.parse(readFileSync(pkgPath, 'utf-8'));
+    console.log(`memex-mvp ${pkg.version}`);
+  } catch (_) {
+    console.log('memex-mvp (version unknown)');
+  }
+}
+// =============================================================
+// DISPATCH
+// =============================================================
+export async function runCli(sub, args) {
+  switch (sub) {
+    case 'search':     return cmdSearch(args);
+    case 'recent':     return cmdRecent(args);
+    case 'list':       return cmdList(args);
+    case 'get':        return cmdGet(args);
+    case 'overview':   return cmdOverview(args);
+    case 'projects':   return cmdProjects(args);
+    case 'help':       return cmdHelp();
+    case '--help':
+    case '-h':         return cmdUsage();
+    case '--version':
+    case '-v':         return cmdVersion();
+    default:
+      console.error(`Unknown subcommand: ${sub}`);
+      console.error(`Run 'memex --help' for usage.`);
+      process.exit(2);
+  }
+}
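
The default `relevance` ordering in `cmdSearch` multiplies `bm25(messages_fts)` by `exp(-age_days / 30)`. The sketch below reproduces only that recency factor in plain JavaScript so its effect can be inspected; `recencyWeight` and the fixed `now` value are illustrative names, not part of the package. Two observations, offered as a reading of the expression rather than as the author's intent: SQLite's `bm25()` returns more negative values for better matches, so scaling by a factor in (0, 1] pushes older hits toward zero and therefore later in the ascending `ORDER BY`; and with a divisor of 30 the factor reaches 1/e at 30 days, which puts the point where it halves at about 30 · ln 2 ≈ 20.8 days, although the code comment calls 30 days the half-life.

```javascript
const SECONDS_PER_DAY = 86400;
const DECAY_DAYS = 30; // the constant used in the ORDER BY expression

// Recency factor for a message with timestamp `ts` (Unix seconds).
// Mirrors COALESCE(NULLIF(m.ts, 0), now): a missing or zero timestamp
// is treated as "now", which gives a factor of 1.
function recencyWeight(ts, now) {
  const effectiveTs = ts ? ts : now;
  const ageDays = (now - effectiveTs) / SECONDS_PER_DAY;
  return Math.exp(-ageDays / DECAY_DAYS);
}

const now = 1700000000; // arbitrary fixed "now" for the example
for (const days of [0, 30, 90]) {
  const w = recencyWeight(now - days * SECONDS_PER_DAY, now);
  console.log(`${days} days old -> weight ${w.toFixed(3)}`);
}
// 0 days old -> weight 1.000
// 30 days old -> weight 0.368
// 90 days old -> weight 0.050
```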

package/lib/store-doc/extract-title.js CHANGED

@@ -2,13 +2,18 @@
  * Extract a title from fetched page content.
  *
  * Strategy (first hit wins):
- *   1. Markdown H1 — `# Title text`  (Jina Reader's output starts with this)
- *   2. HTML <title> — `<title>Page Title</title>`
- *   3. HTML <h1>  — `<h1>Page Title</h1>`
- *   4. First non-empty line if short enough to look like a title
- *   5. URL slug fallback — last meaningful path segment, decoded
- *   6. Domain fallback — just the domain name
- *   7. "Untitled document"
+ *   0. Strip Jina Reader prefix block if present (Jina prepends
+ *      `Title: …\nURL Source: …\nPublished Time: …\nMarkdown Content:\n`
+ *      to its output; the literal "Title:" line is often useless boilerplate
+ *      like "Title: Perplexity" rather than the actual thread title)
+ *   1. Markdown H1 — `# Title text`
+ *   2. Markdown H2 — `## Title text`  (Perplexity threads start with H2)
+ *   3. HTML <title> — `<title>Page Title</title>`
+ *   4. HTML <h1>  — `<h1>Page Title</h1>`
+ *   5. First non-empty line if short enough to look like a title
+ *   6. URL slug fallback — last meaningful path segment, decoded
+ *   7. Domain fallback — just the domain name
+ *   8. "Untitled document"
  *
  * Returns a trimmed string up to MAX_LEN characters. Always returns a
  * non-empty string (worst case "Untitled document").
@@ -23,13 +28,52 @@ function trimTitle(s) {
   return t;
 }
+/**
+ * Jina AI Reader (r.jina.ai/<url>) wraps every page in a metadata
+ * prefix:
+ *
+ *   Title: <browser tab title>
+ *
+ *   URL Source: <original URL>
+ *
+ *   Published Time: <date>
+ *
+ *   Markdown Content:
+ *   <actual page markdown follows here>
+ *
+ * The "Title:" line is frequently a generic app shell ("Perplexity",
+ * "Twitter / X", "GitHub") rather than the actual document title — so
+ * we strip the whole prefix and run title extraction against the real
+ * markdown body. The actual H1/H2 inside is what we want.
+ *
+ * Detection is keyed on "URL Source: http" near the top — that line
+ * is unique to Jina's output format. If it's not present, content is
+ * returned unchanged (non-Jina source).
+ */
+function stripJinaPrefix(content) {
+  // Quick gate: look for URL Source line in the first ~500 chars
+  if (!/^URL Source:\s*https?:\/\//m.test(content.slice(0, 500))) {
+    return content;
+  }
+  // Find the "Markdown Content:" delimiter and slice everything after it
+  const m = content.match(/^Markdown Content:\s*\n/m);
+  if (!m) return content;
+  return content.slice(m.index + m[0].length);
+}
 function fromMarkdownH1(content) {
-  // Markdown H1: line starts with single # then space, then text.
-  // Use \r? for cross-platform line endings. Stop at end-of-line.
+  // Single # at start of line, then space(s), then text.
   const m = content.match(/^[ \t]*#[ \t]+([^\r\n]+?)[ \t]*$/m);
   return m ? trimTitle(m[1]) : '';
 }
+function fromMarkdownH2(content) {
+  // ## at start of line — used as fallback when H1 absent
+  // (Perplexity, Jina-fetched Twitter threads, many blog "subtopic" layouts).
+  const m = content.match(/^[ \t]*##[ \t]+([^\r\n]+?)[ \t]*$/m);
+  return m ? trimTitle(m[1]) : '';
+}
 function fromHtmlTitle(content) {
   const m = content.match(/<title[^>]*>([^<]+)<\/title>/i);
   return m ? trimTitle(decodeEntities(m[1])) : '';
@@ -104,12 +148,14 @@ function decodeEntities(s) {
  */
 export function extractTitle(content, url) {
   const safe = typeof content === 'string' ? content : '';
+  const body = stripJinaPrefix(safe);
   return (
-    fromMarkdownH1(safe) ||
-    fromHtmlTitle(safe) ||
-    fromHtmlH1(safe) ||
-    fromFirstLine(safe) ||
+    fromMarkdownH1(body) ||
+    fromMarkdownH2(body) ||
+    fromHtmlTitle(body) ||
+    fromHtmlH1(body) ||
+    fromFirstLine(body) ||
     fromUrlSlug(url) ||
     'Untitled document'
   );
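
To illustrate the revised strategy, here is a minimal sketch that replays steps 0 to 2 on a made-up Jina Reader response. The regexes are copied from the diff above. The names `trimTitleSketch`, `stripJinaPrefixSketch`, `firstHeading`, `titleFromMarkdown`, and the sample URL are hypothetical and exist only for this illustration. The package's real `trimTitle` also enforces MAX_LEN, which the sketch leaves out.

```javascript
// Simplified stand-in for the package's trimTitle (assumption: the real
// helper also truncates to MAX_LEN; this sketch only trims whitespace).
function trimTitleSketch(s) {
  return s.trim();
}

// Same gate and delimiter regexes as stripJinaPrefix in the diff above.
function stripJinaPrefixSketch(content) {
  if (!/^URL Source:\s*https?:\/\//m.test(content.slice(0, 500))) return content;
  const m = content.match(/^Markdown Content:\s*\n/m);
  return m ? content.slice(m.index + m[0].length) : content;
}

// Generalizes fromMarkdownH1 / fromMarkdownH2: match exactly `level` hashes.
function firstHeading(content, level) {
  const re = new RegExp(
    '^[ \\t]*' + '#'.repeat(level) + '[ \\t]+([^\\r\\n]+?)[ \\t]*$',
    'm'
  );
  const m = content.match(re);
  return m ? trimTitleSketch(m[1]) : '';
}

// Steps 0 to 2 only; the HTML, first-line, and URL fallbacks are omitted.
function titleFromMarkdown(content) {
  const body = stripJinaPrefixSketch(content);
  return firstHeading(body, 1) || firstHeading(body, 2) || 'Untitled document';
}

// Hypothetical Jina-style payload: generic "Title:" line, real title in an H2.
const jinaSample = [
  'Title: Perplexity',
  '',
  'URL Source: https://www.perplexity.ai/search/example',
  '',
  'Markdown Content:',
  '## How do FTS5 tokenizers work?',
  '',
  'Body text.',
].join('\n');

console.log(titleFromMarkdown(jinaSample)); // prints "How do FTS5 tokenizers work?"
```

Because the prefix is stripped before any matching, the generic "Title: Perplexity" line never reaches the first-line fallback, and the H2 wins once no H1 is found.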

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "memex-mvp",
-  "version": "0.6.0",
+  "version": "0.7.0",
   "description": "Local-first MCP server for cross-agent AI memory. One SQLite + FTS5 corpus across Claude Code, Cowork, Cursor, Continue, Zed, Obsidian, and Telegram — passively captured, verbatim, searchable from any MCP-compatible client.",
   "type": "module",
   "main": "server.js",
@@ -26,7 +26,7 @@
     "sync": "node ingest.js",
     "ingest": "node ingest.js",
     "bot": "node bot/index.js",
-    "test": "node test/parser.test.js && node test/bot-inbox.test.js && node test/search-sort.test.js && node test/store-document.test.js",
+    "test": "node test/parser.test.js && node test/bot-inbox.test.js && node test/search-sort.test.js && node test/store-document.test.js && node test/cli.test.js",
     "prepublishOnly": "npm test"
   },
   "engines": {

package/server.js CHANGED

@@ -50,6 +50,31 @@ import {
 import { detectIssues, isBlocked } from './lib/store-doc/detect.js';
 import { extractTitle } from './lib/store-doc/extract-title.js';
 import { createHash } from 'node:crypto';
+import { runCli, CLI_SUBCOMMAND_NAMES } from './lib/cli/index.js';
+// -------------------- CLI subcommand dispatch --------------------
+// When invoked with a recognized subcommand (search, recent, list, get,
+// overview, projects, help, --help, --version) — run a one-shot query
+// and exit. When invoked WITHOUT any argument (the way MCP clients
+// always call this binary), fall through to MCP-stdio mode below.
+//
+// This runs BEFORE any DB/watcher side-effects so the CLI doesn't open
+// the DB in write mode unnecessarily.
+{
+  const sub = process.argv[2];
+  if (sub && CLI_SUBCOMMAND_NAMES.includes(sub)) {
+    await runCli(sub, process.argv.slice(3));
+    process.exit(0);
+  }
+  if (sub && !sub.startsWith('-')) {
+    // Unknown positional subcommand — fail fast with help, don't drift
+    // into MCP mode (which would just hang waiting for stdin).
+    console.error(`Unknown subcommand: ${sub}`);
+    console.error(`Run 'memex --help' for usage.`);
+    process.exit(2);
+  }
+  // No args (or only flags we don't recognize) → MCP mode
+}
 // -------------------- Paths --------------------
 const HOME = homedir();
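
The three-way dispatch in the hunk above (known subcommand, unknown positional, anything else) can be sketched as a pure function, which makes the branches easy to check without spawning a process. This is an illustration, not the package's code: `resolveMode` and `KNOWN` are hypothetical names, and the list contents are assumed from the comment in the hunk, since the real `CLI_SUBCOMMAND_NAMES` is defined in lib/cli/index.js and is not shown here.

```javascript
// Assumed contents of CLI_SUBCOMMAND_NAMES, taken from the hunk's comment.
const KNOWN = [
  'search', 'recent', 'list', 'get', 'overview', 'projects',
  'help', '--help', '--version',
];

// Mirrors the branch order in server.js: recognized subcommand first,
// then unknown positional argument, then fall through to MCP mode.
function resolveMode(argv, known = KNOWN) {
  const sub = argv[2];
  if (sub && known.includes(sub)) {
    return { mode: 'cli', sub, args: argv.slice(3) };
  }
  if (sub && !sub.startsWith('-')) {
    // Unknown positional: the original prints usage and exits with code 2.
    return { mode: 'error', sub, exitCode: 2 };
  }
  // No argument, or an unrecognized flag: MCP stdio mode.
  return { mode: 'mcp' };
}

console.log(resolveMode(['node', 'server.js', 'search', 'auth']).mode); // prints "cli"
console.log(resolveMode(['node', 'server.js', 'serach']).mode);         // prints "error"
console.log(resolveMode(['node', 'server.js']).mode);                   // prints "mcp"
```

Note that an unrecognized flag such as `--verbose` falls through to MCP mode rather than failing, which matches the final comment in the original block.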
@@ -959,6 +984,38 @@ Archived conversations are hidden from default list/search but stay
 fully indexed. Pass include_archived: true on search/list to include
 them. Visibility flag only — never deletes data.
+══ CLI FALLBACK — when MCP isn't available ══
+If you're running in an agent where memex MCP tools aren't wired up
+(or wired up but not responding), memex ALSO ships a terminal CLI on
+the same \`memex\` binary. Use this as a fallback before resorting to
+raw SQLite. Available subcommands:
+  memex search "<query>" [--source X] [--chat X] [--sort MODE] [--limit N] [--json]
+  memex recent           [--limit N] [--source X] [--json]
+  memex list             [--source X] [--limit N] [--json]
+  memex get <id>         [--json]
+  memex overview         [--json]
+  memex projects
+  memex help             prints the full HELP.md user guide
+  memex --help           command reference
+The --json flag on every query subcommand returns structured JSON
+for parsing. The DB is opened read-only — safe to run while the
+auto-capture daemon is writing.
+WHEN TO USE THE CLI:
+  • You suspect MCP integration is broken — \`memex overview\` confirms
+    memex itself is healthy independent of MCP wiring
+  • You're in an agent without MCP support but with shell access
+  • You want to pipe results: \`memex search foo --json | jq ...\`
+  • You want to dump a full conversation to stdout for context
+DON'T fall back to raw SQLite queries against memex.db when the CLI
+exists — the CLI handles edge cases (FTS5 syntax sanitization,
+date formatting, snippet highlighting, archive filtering) that raw
+SQL doesn't, and the schema may change between versions.
 ══ DOCUMENT INGESTION (web pages, articles, AI chat shares) ══
 memex_store_document accepts content YOU fetch and stores it verbatim.
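
For an agent or script that shells out to the CLI as described in the fallback text above, a small helper could assemble the argument list from the documented `memex search` flags. This is a sketch under assumptions: `buildSearchArgs` and its option names are hypothetical, only the flags listed in the help text (`--source`, `--chat`, `--sort`, `--limit`, `--json`) are used, and the shape of the JSON output is not shown in this diff, so nothing here parses it.

```javascript
// Hypothetical helper: builds an argv array for `memex search`.
// Passing an array (rather than one shell string) avoids quoting problems
// when the query contains spaces or quotes.
function buildSearchArgs(query, opts = {}) {
  const args = ['search', query];
  if (opts.source) args.push('--source', opts.source);
  if (opts.chat) args.push('--chat', opts.chat);
  if (opts.sort) args.push('--sort', opts.sort);
  if (Number.isInteger(opts.limit)) args.push('--limit', String(opts.limit));
  // Default to JSON output for machine consumers; pass json: false to skip.
  if (opts.json !== false) args.push('--json');
  return args;
}

const args = buildSearchArgs('auth', {
  source: 'claude-code',
  limit: 5,
  sort: 'date_desc',
});
console.log(args.join(' '));
// prints "search auth --source claude-code --sort date_desc --limit 5 --json"
```

The resulting array could then be handed to `execFile('memex', args, callback)` from Node's `node:child_process` module, assuming the `memex` binary is on the PATH.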

package/skills/install-memex/README.md CHANGED

@@ -11,15 +11,15 @@ After you drop the skill into your agent (`~/.claude/skills/` for Claude Code, o
 3. **MCP config merge** — adds a single absolute-path `command` entry into your client's `mcpServers` config. Never overwrites your other servers
 4. **`memex-sync install`** — registers the macOS LaunchAgent for live auto-capture
 5. **`memex-sync scan`** — one-time backfill of every session that already exists on disk
-6. **Restart hint + verification commands**
+6. **Restart hint + verification commands** — including the v0.7+ CLI fallback (`memex overview`, `memex search "foo"`) so you can verify memex works even if MCP didn't wire up cleanly
 End-to-end: **~2 minutes**, fully observable (agent shows each command before running).
 ## What is memex?
-Memex is a **local-first MCP server** that captures every conversation you have with an AI — across **Claude Code, Cowork (including subagent transcripts), Cursor, Cline, Continue, Zed**, plus **Obsidian notes** and **Telegram chats** — into one searchable SQLite + FTS5 corpus.
+Memex is a **local-first MCP server** that captures every conversation you have with an AI — across **Claude Code, Cowork (including subagent transcripts), Cursor, Cline, Continue, Zed**, plus **Obsidian notes**, **Telegram chats**, and **web pages / AI chat shares** (v0.6+ via `memex_store_document` — agent fetches, memex stores verbatim) — into one searchable SQLite + FTS5 corpus.
-Any MCP-compatible agent can then query that corpus through 11 standard tools (`memex_search`, `memex_recent`, `memex_overview`, …).
+Any MCP-compatible agent can then query that corpus through 12 standard tools (`memex_search`, `memex_recent`, `memex_overview`, `memex_store_document`, …).
 | Pain                                            | Memex                                |
 |-------------------------------------------------|--------------------------------------|

package/skills/install-memex/SKILL.md CHANGED

@@ -1,6 +1,6 @@
 ---
 name: install-memex
-description: Make Claude, Cursor, Cline, Continue, and Zed remember every AI conversation forever — one local SQLite corpus shared across all of them. Installs memex (local-first MCP server) in ~2 minutes — npm install, MCP config wiring, auto-capture daemon, history backfill. No cloud, no account, verbatim storage. Also indexes Obsidian notes and Telegram chats. Use when the user says "install memex", "set up memex", "add memory to my AI", "make my agent remember across sessions", or similar.
+description: Make Claude, Cursor, Cline, Continue, and Zed remember every AI conversation forever — one local SQLite corpus shared across all of them. Installs memex (local-first MCP server) in ~2 minutes — npm install, MCP config wiring, auto-capture daemon, history backfill. No cloud, no account, verbatim storage. Also indexes Obsidian notes, Telegram chats, and any URL the user wants to save (web pages, Perplexity threads, AI chat shares — memex_store_document tool, v0.6+). Use when the user says "install memex", "set up memex", "add memory to my AI", "make my agent remember across sessions", or similar.
 version: 1.0.0
 metadata:
   openclaw:
@@ -160,9 +160,21 @@ After restart, suggest they try any of:
 - "show me what memex has in memory" → triggers `memex_overview`
 - "what projects has memex captured" → triggers `memex_list_projects`
 - "search memex for [recent topic]" → triggers `memex_search`
+- "save https://en.wikipedia.org/wiki/As_We_May_Think to memex" → triggers `memex_store_document` and teaches the user that URL-saving exists (v0.6+)
 These confirm everything works end-to-end.
+**CLI fallback (v0.7+):** if the MCP integration doesn't pick up in the user's client for any reason, tell them they can verify memex from the terminal directly — same binary, no MCP needed:
+```sh
+memex overview      # confirms memex itself is healthy
+memex search "foo"  # FTS search from CLI
+memex list          # list conversations
+memex --help        # command reference
+```
+This is also useful for agents without native MCP support (OpenCode + Kimi, plain shell scripts, CI pipelines) — they can shell out to `memex` directly.
 ## Safety rules — read before starting
 - If `node` or `npm` aren't installed, stop and tell the user to install Node.js (recommend nvm.sh or `brew install node`). Don't try to install Node yourself.

package/skills/install-memex/examples.md CHANGED

@@ -179,6 +179,65 @@ Sample interactions when the user invokes this skill. These show typical happy p
 ---
+---
+## Example 6 · Right after install — testing URL save (v0.6+)
+> **User:** save https://en.wikipedia.org/wiki/As_We_May_Think to memex
+>
+> **Agent:** Fetching the page with WebFetch...
+>
+> ```
+> ✓ Got 12,847 chars from en.wikipedia.org
+> ```
+>
+> Calling `memex_store_document(content, url=https://en.wikipedia.org/wiki/As_We_May_Think, title=auto)`...
+>
+> ```
+> ✓ stored
+>   conversation_id: web-d3b07a384d1b
+>   title: "As We May Think"
+>   length: 12,847 chars
+>   warnings: []
+> ```
+>
+> Saved. Try searching it now: *"search memex for memex Bush 1945"* — the article will show up alongside any AI chats where you discussed it.
+>
+> 💡 By the way, you can save any URL this way: Perplexity threads, blog posts, ChatGPT shares, Twitter threads, GitHub discussions. For Perplexity specifically, the thread must be made Public first (Share → Public link). For Cloudflare-protected sites, memex automatically falls back to the `r.jina.ai` proxy.
+---
+---
+## Example 7 · MCP didn't connect — agent recommends CLI fallback (v0.7+)
+> **User:** ok, I installed memex. but when I ask Claude "show me what memex has", nothing happens.
+>
+> **Agent:** Could be an MCP-config issue with your Claude client. Let's verify memex itself is healthy first, separately from MCP:
+>
+> ```sh
+> memex overview
+> ```
+>
+> ```
+> memex corpus snapshot
+>
+> Total: 1,247 messages in 23 conversations
+>
+> By source:
+>   claude-code           742 msgs · 18 chats · 2026-04-06 → 2026-05-15
+>   claude-cowork         312 msgs ·  4 chats · 2026-04-06 → 2026-05-07
+>   ...
+> ```
+>
+> Memex itself is healthy — your data is there. The problem is the MCP wiring in your Claude client.
+>
+> Try `memex search "recent topic you discussed"` directly in the terminal — if that returns hits, you've got a fully-working memex; it's just not exposed to your Claude session yet.
+>
+> Then check your Claude MCP config (`~/.claude/config.json` or equivalent). The `memex` entry should use an absolute path: run `which memex` to get that path and confirm the config entry matches it. If you need to redo the wiring, ask me again and I'll re-run step 2 of the install.
+---
 ## What these examples illustrate about the skill's behavior
 - **Always discover first**, then ask for confirmation before any write action