npm - memex-mvp - Versions diffs - 0.5.4 → 0.6.1 - Mend

memex-mvp 0.5.4 → 0.6.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (11)

package/HELP.md +48 -0
package/README.md +19 -0
package/README.ru.md +20 -3
package/lib/store-doc/canonicalize.js +116 -0
package/lib/store-doc/detect.js +209 -0
package/lib/store-doc/extract-title.js +116 -0
package/package.json +2 -2
package/server.js +318 -0
package/skills/install-memex/README.md +86 -38
package/skills/install-memex/SKILL.md +11 -1
package/skills/install-memex/examples.md +218 -0

package/HELP.md CHANGED

@@ -227,12 +227,60 @@ By default, Memex sorts by **relevance**
 ---
+### 8. 🔗 Saving URLs into memex (Perplexity, articles, AI shares)
+You're reading something (a Perplexity research thread, a long article, a GitHub discussion, an AI chat share) and want it to live in your memex memory, searchable from any AI chat.
+**In any MCP agent (Claude Code, Cursor, Cline, Continue, Zed):**
+```
+Save https://perplexity.ai/share/<id> to memex
+Add this article to memex: https://example.com/great-post
+Capture this ChatGPT conversation: https://chat.openai.com/share/<id>
+```
+**What happens behind the scenes:**
+1. The agent fetches the URL itself, using its own WebFetch
+2. If the page is behind Cloudflare (Perplexity, npm.com, Twitter, Medium…), the agent automatically retries through the `r.jina.ai` proxy (free; it runs JavaScript server-side and gets past Cloudflare)
+3. The agent calls `memex_store_document(content, url, title)`
+4. Memex stores the content as a conversation with `source: "web"`, searchable via `memex_search` alongside your AI chats
+**For Perplexity threads:** the thread must be **PUBLIC**. In Perplexity: open the thread → Share → toggle "Public link" → copy the new URL → give it to the agent. The URL in your browser's address bar (`perplexity.ai/search/<id>`) is **your owner URL, not a shareable one**.
+If you forget, memex detects the "private" page when the agent tries to store it, and the agent tells you exactly what to do.
+**Login-walled or paywalled content won't fetch** (e.g. an NYT subscription, your private ChatGPT chats). For these, paste the content manually:
+```
+Save this text to memex (title: "..."): <paste the content>
+```
+**Tags on save** (stored in metadata, for filtering later):
+```
+Save https://... to memex, tag it "research" and "perplexity"
+```
+**Search:**
+```
+Find in memex what Perplexity said about X last week
+```
+`memex_search` returns matches **from both AI chats and saved URLs** in a single query, sorted by relevance or by date.
+**Memex deliberately makes NO outbound network calls.** The fetcher lives in your AI agent. If the agent uses Jina to get past Cloudflare, Jina sees the URL (but NOT the rest of your memex corpus). That is the agent's choice, not memex's.
+---
 ## Which MCP tools the agent can call
 | Tool | What it does |
 |---|---|
 | `memex_overview` | Corpus snapshot: sources, message counts, recent chats, auto-capture status |
 | `memex_search(query)` | Full-text search (FTS5) with a recency boost. Parameters: `project`, `source`, `chat`, `half_life_days`, `expand_match`, `sort` |
+| `memex_store_document(content, url?, title?, tags?, refresh?)` | Save an external document (web page, AI chat share, paste) to memex. The agent does the fetching; memex stores the content verbatim. The tool description teaches the Jina trick for Cloudflare-protected pages |
 | `memex_list_projects` | All projects with their conversation counts |
 | `memex_list_conversations` | Chats sorted by recency |
 | `memex_get_conversation(id)` | Full transcript of one chat |
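Steps 1 to 3 above happen entirely inside the agent, not in memex. The sketch below shows that fetch-then-fallback logic under a few assumptions. `fetchForMemex`, `looksLikeCloudflareChallenge` and `jinaWrap` are hypothetical helper names. `fetchFn` is any Fetch-API-compatible function, defaulting to Node 18+'s global `fetch`. The checks mirror the heuristics listed in the `memex_store_document` tool description: a 403 with `cf-mitigated: challenge`, challenge markers in the body, or a body under 500 bytes. As a simplification, the sketch applies the size check to every page, because it has no way to know whether a page "should be substantive".

```javascript
// Hypothetical agent-side helper (not part of memex): fetch a page for
// memex_store_document, retrying once through Jina AI Reader when the
// first response looks like a Cloudflare challenge.
const JINA_PREFIX = 'https://r.jina.ai/';

// Body markers listed in the memex_store_document tool description.
const CHALLENGE_MARKERS = [
  'Just a moment...',
  'Verifying you are human',
  'cf-turnstile',
  'cf_chl_opt',
];

function looksLikeCloudflareChallenge(status, cfMitigated, body) {
  if (status === 403 && cfMitigated === 'challenge') return true;
  if (CHALLENGE_MARKERS.some((m) => body.includes(m))) return true;
  // "Body < 500 bytes from a substantive page": measure UTF-8 bytes.
  return new TextEncoder().encode(body).length < 500;
}

function jinaWrap(url) {
  return JINA_PREFIX + url;
}

// fetchFn must return a Fetch-API-style Response (status, headers.get, text()).
async function fetchForMemex(url, fetchFn = globalThis.fetch) {
  const first = await fetchFn(url);
  const body = await first.text();
  const challenged = looksLikeCloudflareChallenge(
    first.status,
    first.headers.get('cf-mitigated'),
    body,
  );
  if (!challenged) return { content: body, via: 'direct' };
  // Jina cannot get past login walls or private threads; memex's own
  // detectors flag those when the content is stored.
  const retry = await fetchFn(jinaWrap(url));
  return { content: await retry.text(), via: 'jina' };
}
```

Whichever path the content took, pass the original URL (not the `r.jina.ai` one) as `url` when storing. The tool schema describes `url` as the original source URL, and memex derives the conversation id from its canonical form.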

package/README.md CHANGED

@@ -100,6 +100,23 @@ For a fully-automated install across all detected MCP clients, see [the AI-drive
 ---
+## Save URLs into memex (v0.6+)
+Once memex is installed, any MCP-aware agent can also save **web pages, AI chat shares, and pasted text** into your memex memory — searchable from any other AI chat later. In Claude Code, Cursor, Cline, …:
+```
+Save https://www.perplexity.ai/share/<id> to memex
+Add this article to my memex: https://example.com/long-post
+```
+The agent fetches the page with its own WebFetch (falling back to `r.jina.ai` for Cloudflare-protected sites, a trick memex teaches it through the tool description) and calls `memex_store_document`. Memex stores the content verbatim as a conversation with `source: "web"`, searchable by the same `memex_search` call as your AI chats.
+Perplexity threads must first be made **Public** in the Share dialog. Memex detects private threads on store and returns instructions that the agent relays to the user. Full guide: [HELP.md §8](HELP.md).
+**Memex stays 100% local** — the agent fetches, memex only stores. Zero outbound calls from memex itself.
+---
 ## What it captures
 | Source                | How it gets in                                                 |
@@ -111,6 +128,7 @@ For a fully-automated install across all detected MCP clients, see [the AI-drive
 | Obsidian notes        | Auto: per-vault markdown watcher                               |
 | Telegram exports      | Manual: drop `result.json` (Telegram Desktop) into `~/.memex/inbox/` |
 | Telegram (live)       | Run [`memex-bot`](bot/README.md) — captures messages you send/forward to your private bot |
+| **Web pages, AI chat shares, pasted text** | From any MCP agent: *"save https://... to memex"*. Agent fetches; memex stores verbatim. Cloudflare-protected pages (Perplexity, npm.com, Twitter, Medium, …) handled via the agent's r.jina.ai fallback. See [HELP.md §8](HELP.md) |
 All sources land in the same FTS5 corpus, searchable by one `memex_search` call.
@@ -128,6 +146,7 @@ All sources land in the same FTS5 corpus, searchable by one `memex_search` call.
 | `memex_list_projects`         | Distinct project paths captured (for the `project` filter)               |
 | `memex_archive_conversation`  | Hide a chat from default listings (data preserved)                       |
 | `memex_export_markdown`       | Export one conversation as Markdown (for Obsidian round-trip)            |
+| `memex_store_document`        | Save a web page, AI chat share, or pasted text. Agent fetches; memex stores verbatim. Teaches the Jina r.jina.ai trick for Cloudflare-blocked pages |
 | `memex_list_sources`          | Per-source enabled/disabled + counts                                     |
 | `memex_status`                | Daemon health: PID, last capture, watched files                          |
 | `memex_sources_status`        | Which sources are captured + the exact CLI to opt out                    |
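Under the hood, an MCP client invokes the tool with a JSON-RPC `tools/call` request. The sketch below shows that payload. It assumes the argument names from the `memex_store_document` input schema: `content` is required, and `url`, `title`, `tags` and `refresh` are optional. `buildStoreDocumentCall` is a hypothetical helper, since real MCP clients build this request for you.

```javascript
// Hypothetical helper: build the JSON-RPC request an MCP client would send
// to call memex_store_document. Only `content` is required by the schema;
// empty content is left for the server to reject with stored:false.
function buildStoreDocumentCall(id, { content, url, title, tags, refresh } = {}) {
  const args = { content };
  // Optional fields are omitted rather than sent as null/undefined.
  if (url) args.url = url;
  if (title) args.title = title;
  if (Array.isArray(tags) && tags.length) args.tags = tags;
  if (refresh === true) args.refresh = true;
  return {
    jsonrpc: '2.0',
    id,
    method: 'tools/call',
    params: { name: 'memex_store_document', arguments: args },
  };
}

const req = buildStoreDocumentCall(1, {
  content: '# Great post\n\nFull article text',
  url: 'https://example.com/long-post',
  tags: ['research'],
});
console.log(JSON.stringify(req, null, 2));
```

Leaving out `title` here means memex derives one from the content; in this example that would be the markdown H1 "Great post".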

package/README.ru.md CHANGED

@@ -121,6 +121,21 @@ curl -fsSL https://raw.githubusercontent.com/parallelclaw/memex-mvp/main/skills/
 …or `/install-memex`. The agent runs `npm install` itself, writes the MCP config, starts the daemon, and checks that everything works, in about 2 minutes.
+### Saving URLs into memex (v0.6+)
+Once memex is installed, any MCP agent (Claude Code, Cursor, Cline, Continue, Zed) can save **web pages, AI chat shares, and pasted text** straight into your memex memory:
+```
+Save https://www.perplexity.ai/share/<id> to memex
+Add this article to memex: https://example.com/article
+```
+The agent fetches the page with its own WebFetch. For Cloudflare-protected sites (Perplexity, npm.com, Twitter, Medium) it falls back to the `r.jina.ai` proxy (memex teaches the agent this trick through the tool description). The agent then calls `memex_store_document`, which stores the content verbatim as a conversation with `source: "web"`.
+**Memex stays 100% local**: the agent fetches, memex only stores. Memex itself makes no outbound network calls.
+Full guide and edge cases (private Perplexity threads, paywalls, login walls): [HELP.md §8](HELP.md).
 ### Connecting to Claude Code
 First, get **two absolute paths** in your terminal:
@@ -162,9 +177,11 @@ which node  # → path to the node binary (e.g. /Users
 | **Cursor IDE** (Composer + Chat) | SQLite `state.vscdb` in `~/Library/Application Support/Cursor/` | ✅ works (polled every 5 min) |
 | **Obsidian** vault notes | `.md` files + YAML frontmatter | ✅ works (FSEvents, hash-based dedupe) |
 | **Telegram** | `result.json` from the Desktop export | ✅ works |
-| Claude.ai web export | coming in v0.3 | n/a |
-| ChatGPT export | coming in v0.3 | n/a |
-| Apple Notes | coming in v0.3 | n/a |
+| **Telegram (live)** | the `memex-bot` bot captures your messages / forwards | ✅ works |
+| **Web pages, AI shares, pastes** | `memex_store_document`: the agent fetches, memex stores verbatim (v0.6+) | ✅ works |
+| Claude.ai web export | coming in v0.7 | n/a |
+| ChatGPT export | coming in v0.7 | n/a |
+| Apple Notes | coming in v0.7 | n/a |
 ### Filename convention for inbox files

package/lib/store-doc/canonicalize.js ADDED

@@ -0,0 +1,116 @@
+/**
+ * URL canonicalization for stable deduplication of stored web documents.
+ *
+ * Goal: two URLs that point to "the same document" should map to the same
+ * canonical form, so memex_store_document gives them the same conversation_id
+ * via sha256(canonical).
+ *
+ * What we normalize:
+ *   - Lowercase scheme + host
+ *   - Strip known tracking params (utm_*, fbclid, gclid, ref, mc_*, _ga, …)
+ *   - Drop the fragment (#anchor) — same document
+ *   - Normalize trailing slash on pathname
+ *
+ * What we DON'T normalize:
+ *   - Path case (some servers are case-sensitive)
+ *   - Non-tracking query params (?q= search, ?id= permalinks — meaningful)
+ *   - Port (rare in public URLs)
+ *
+ * If the input isn't a valid URL, we return it trimmed but otherwise
+ * unchanged. Callers should still hash the result for deduplication.
+ */
+// Well-known tracking-param families. Case-insensitive prefix match.
+const TRACKING_PREFIXES = [
+  'utm_',         // Google Analytics
+  'mc_',          // Mailchimp
+];
+const TRACKING_EXACT = new Set([
+  'fbclid',       // Facebook
+  'gclid',        // Google ads
+  'dclid',        // Google DoubleClick
+  'gbraid',       // Google
+  'wbraid',       // Google
+  'yclid',        // Yandex
+  'msclkid',      // Microsoft ads
+  'twclid',       // Twitter
+  'igshid',       // Instagram
+  'ref',          // generic referrer
+  'ref_source',
+  'ref_url',
+  'referrer',
+  'source',       // common referrer flag (NOT always tracking but very often)
+  '_ga',          // Google Analytics
+  '_gl',          // Google Analytics linker
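+  'hsctatracking', // HubSpot (lowercase: lookups use the lowercased name)
+  '_hsenc',        // HubSpot email tracking
+  '_hsmi',         // HubSpot email tracking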
+  'mkt_tok',
+  'pk_campaign',
+  'pk_source',
+  'pk_medium',
+  'pk_keyword',
+  'pk_content',
+  'vero_id',
+  'vero_conv',
+]);
+function isTrackingParam(name) {
+  const lower = name.toLowerCase();
+  if (TRACKING_EXACT.has(lower)) return true;
+  for (const prefix of TRACKING_PREFIXES) {
+    if (lower.startsWith(prefix)) return true;
+  }
+  return false;
+}
+/**
+ * @param {string} rawUrl
+ * @returns {string} canonicalized URL (or the trimmed input if unparseable)
+ */
+export function canonicalize(rawUrl) {
+  if (typeof rawUrl !== 'string' || !rawUrl.trim()) return rawUrl;
+  let u;
+  try {
+    u = new URL(rawUrl.trim());
+  } catch (_) {
+    return rawUrl.trim();
+  }
+  // Lowercase scheme + host (URL parser already does that, but be explicit)
+  u.protocol = u.protocol.toLowerCase();
+  u.hostname = u.hostname.toLowerCase();
+  // Drop the fragment
+  u.hash = '';
+  // Strip tracking params
+  const cleanParams = new URLSearchParams();
+  for (const [k, v] of u.searchParams) {
+    if (!isTrackingParam(k)) cleanParams.append(k, v);
+  }
+  u.search = cleanParams.toString();
+  // Normalize trailing slash: drop trailing slash on non-root paths,
+  // so /foo and /foo/ are treated as the same document
+  if (u.pathname.length > 1 && u.pathname.endsWith('/')) {
+    u.pathname = u.pathname.replace(/\/+$/, '');
+  }
+  return u.toString();
+}
+/**
+ * Best-effort domain extraction for metadata (e.g. "perplexity.ai").
+ * Returns null for unparseable URLs.
+ */
+export function extractDomain(rawUrl) {
+  if (typeof rawUrl !== 'string') return null;
+  try {
+    const u = new URL(rawUrl);
+    return u.hostname.toLowerCase().replace(/^www\./, '');
+  } catch (_) {
+    return null;
+  }
+}
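The following self-contained sketch shows how canonicalization turns into deduplication. `canonicalizeSketch` is a simplified stand-in for `canonicalize`: it strips only `utm_*`, `fbclid`, `gclid` and `ref`, and it throws on invalid URLs instead of passing them through. `conversationIdFor` is a hypothetical name. It follows the derivation visible in the `memex_store_document` handler: `web-` plus the first 12 hex characters of sha256 over the canonical URL. The block uses ES module syntax, like the package (`"type": "module"`).

```javascript
import { createHash } from 'node:crypto';

// Simplified stand-in for canonicalize(): only a few tracking params.
const TRACKING = new Set(['fbclid', 'gclid', 'ref']);

function canonicalizeSketch(rawUrl) {
  const u = new URL(rawUrl.trim()); // throws on invalid input, unlike canonicalize()
  u.hash = ''; // fragment never changes the document
  const kept = new URLSearchParams();
  for (const [k, v] of u.searchParams) {
    const lower = k.toLowerCase();
    if (!lower.startsWith('utm_') && !TRACKING.has(lower)) kept.append(k, v);
  }
  u.search = kept.toString(); // empty string drops the "?"
  if (u.pathname.length > 1 && u.pathname.endsWith('/')) {
    u.pathname = u.pathname.replace(/\/+$/, '');
  }
  return u.toString();
}

// "web-" + first 12 hex chars of sha256(canonical URL).
function conversationIdFor(rawUrl) {
  const canonical = canonicalizeSketch(rawUrl);
  return 'web-' + createHash('sha256').update(canonical).digest('hex').slice(0, 12);
}

const a = conversationIdFor('https://Example.com/post/?utm_source=x#comments');
const b = conversationIdFor('https://example.com/post?fbclid=abc');
console.log(a === b); // true: both canonicalize to https://example.com/post
```

A meaningful parameter such as `?id=7` survives canonicalization and therefore yields a different id. The real exact-match list also strips `source`, which the file's own comment notes is "NOT always tracking", so pages where `?source=` matters would collapse into one document.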

package/lib/store-doc/detect.js ADDED

@@ -0,0 +1,209 @@
+/**
+ * Pattern detection for memex_store_document.
+ *
+ * When the agent passes content to memex_store_document, memex sniffs it
+ * for known failure signatures (Cloudflare challenge, Perplexity-private,
+ * paywalls, …) and returns actionable warnings.
+ *
+ * Each detector returns either null or an object:
+ *   { type, blocking, message }
+ *
+ * `blocking: true` → memex returns stored:false to the agent. Use only for
+ * clear-cut failures where storing the content would pollute the corpus.
+ * `blocking: false` → memex stores the content but appends the warning so
+ * the agent can decide whether to surface it to the user.
+ *
+ * Patterns may grow over time as new failure modes appear in real use.
+ * Single-purpose regexes — order matters (more specific first).
+ */
+const CLOUDFLARE_PATTERNS = [
+  /Just a moment\.\.\./i,
+  /Verifying you are human/i,
+  /cf-turnstile/i,
+  /cf_chl_opt/i,
+  /__cf_bm/i,                      // Cloudflare's bot-management cookie name in challenge pages
+  /Checking your browser before accessing/i,
+];
+const PERPLEXITY_PRIVATE_PATTERNS = [
+  /This thread is private/i,
+  /Sign in if you are the owner of this thread/i,
+];
+const LOGIN_REQUIRED_PATTERNS = [
+  /Sign in to continue/i,
+  /Please log in to continue/i,
+  /Login required/i,
+  /<form[^>]+action=["']\/?(login|signin|sign-in|auth)/i,
+  /You must be logged in/i,
+];
+const PAYWALL_PATTERNS = [
+  /Subscribe to (read|continue)/i,
+  /Continue reading with subscription/i,
+  /\bpaywall\b/i,
+  /You've reached your free article limit/i,
+  /metered access/i,
+  /Become a (member|subscriber)/i,
+];
+/**
+ * Detect a Cloudflare bot-challenge page. Blocking.
+ */
+function detectCloudflareChallenge(content, _url) {
+  for (const re of CLOUDFLARE_PATTERNS) {
+    if (re.test(content)) {
+      return {
+        type: 'cloudflare-challenge',
+        blocking: true,
+        message:
+          'The content you passed looks like a Cloudflare bot-challenge page, not the actual content. The page is protected by Cloudflare and your direct fetch was blocked.\n\n' +
+          'Retry by prepending https://r.jina.ai/ to the URL:\n\n' +
+          '  WebFetch("https://r.jina.ai/" + originalUrl)\n\n' +
+          'Jina AI Reader runs a real browser server-side and bypasses Cloudflare. Returns clean markdown. Free for personal use, no API key needed.\n\n' +
+          'If Jina ALSO fails or returns "private/login required", the page needs authentication — Jina can\'t bypass that.',
+      };
+    }
+  }
+  return null;
+}
+/**
+ * Detect a private Perplexity thread. Blocking — Jina can't help here,
+ * the user needs to make the thread public first.
+ */
+function detectPerplexityPrivate(content, url) {
+  // Only flag if the URL points at perplexity.ai, OR the content itself
+  // mentions Perplexity (the same phrasing could appear on other sites).
+  const isPerplexityUrl =
+    typeof url === 'string' && /perplexity\.ai/i.test(url);
+  let matched = false;
+  for (const re of PERPLEXITY_PRIVATE_PATTERNS) {
+    if (re.test(content)) {
+      matched = true;
+      break;
+    }
+  }
+  if (!matched) return null;
+  if (!isPerplexityUrl && !/perplexity/i.test(content)) {
+    // Same phrasing might appear on other sites — only act if we're confident
+    return null;
+  }
+  return {
+    type: 'perplexity-private',
+    blocking: true,
+    message:
+      'This Perplexity thread is marked private — even Jina Reader can\'t access it (this is an authentication wall, not Cloudflare bot protection).\n\n' +
+      'Tell the user: "To save this Perplexity thread to memex, you need to make it public first:\n' +
+      '  1. Open the thread in Perplexity\n' +
+      '  2. Click Share (top right)\n' +
+      '  3. Toggle \'Public link\' on\n' +
+      '  4. Copy the new shareable URL Perplexity shows\n' +
+      '  5. Send me THAT URL — it\'ll work"\n\n' +
+      'The URL in the user\'s address bar (perplexity.ai/search/<id>) is the owner\'s private URL, not the shareable one.',
+  };
+}
+/**
+ * Suspiciously short content from a URL that should be substantive.
+ * Non-blocking — we store it, but warn.
+ */
+function detectSuspiciouslySmall(content, url) {
+  const trimmed = (content || '').trim();
+  // Threshold: documents shorter than 200 chars are almost certainly noise
+  // (error pages, redirects, JS-only stubs). Pasted snippets can legitimately
+  // be that short, so only flag when we have a URL (suggesting a fetch was
+  // attempted) — pastes get a free pass.
+  if (!url) return null;
+  if (trimmed.length >= 200) return null;
+  return {
+    type: 'suspiciously-small',
+    blocking: false,
+    message:
+      `The content you passed is very short (${trimmed.length} chars). ` +
+      'The page might have been blocked, redirect-failed, or be JS-rendered with no SSR. ' +
+      'Stored as-is — consider verifying with the user that this is what they expected.',
+  };
+}
+/**
+ * Login required (form / prompt). Non-blocking but worth flagging.
+ */
+function detectLoginRequired(content, _url) {
+  for (const re of LOGIN_REQUIRED_PATTERNS) {
+    if (re.test(content)) {
+      return {
+        type: 'login-required',
+        blocking: false,
+        message:
+          'The page appears to require login (sign-in prompt / login form detected). ' +
+          'The content you stored may be a login page, not the actual content the user wanted. ' +
+          'Ask the user to paste the content manually if this isn\'t what they expected.',
+      };
+    }
+  }
+  return null;
+}
+/**
+ * Paywall / subscription-gated content. Non-blocking.
+ */
+function detectPaywalled(content, _url) {
+  for (const re of PAYWALL_PATTERNS) {
+    if (re.test(content)) {
+      return {
+        type: 'paywalled',
+        blocking: false,
+        message:
+          'The page appears to be paywalled (subscription/payment prompt detected). ' +
+          'The content stored may just be the teaser. ' +
+          'If the user has full access, they can paste the complete article manually.',
+      };
+    }
+  }
+  return null;
+}
+/**
+ * Returns array of warnings sorted with blocking warnings first.
+ * If the first warning is blocking, memex should refuse the store
+ * and return that warning to the agent.
+ *
+ * Detectors run in this order (more-specific first):
+ *   1. cloudflare-challenge  (blocking)
+ *   2. perplexity-private    (blocking)
+ *   3. suspiciously-small    (non-blocking)
+ *   4. login-required        (non-blocking)
+ *   5. paywalled             (non-blocking)
+ */
+export function detectIssues(content, url) {
+  const safeContent = typeof content === 'string' ? content : '';
+  const warnings = [];
+  // Blocking first — stop on first hit so we surface the most actionable.
+  const blocking =
+    detectCloudflareChallenge(safeContent, url) ||
+    detectPerplexityPrivate(safeContent, url);
+  if (blocking) {
+    warnings.push(blocking);
+    return warnings;
+  }
+  // Non-blocking — collect all that match.
+  for (const fn of [detectSuspiciouslySmall, detectLoginRequired, detectPaywalled]) {
+    const w = fn(safeContent, url);
+    if (w) warnings.push(w);
+  }
+  return warnings;
+}
+/**
+ * Convenience: is any warning blocking?
+ */
+export function isBlocked(warnings) {
+  return Array.isArray(warnings) && warnings.some((w) => w.blocking);
+}
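The contract that `detectIssues` documents can be sketched end to end: stop at the first blocking hit, otherwise collect every non-blocking warning. The sketch also shows how a caller acts on the result. The two detectors are abridged stand-ins with shortened patterns and messages. `tryStore` and its `store` callback are hypothetical. The real handler does more before storing, such as deriving the conversation id.

```javascript
// Abridged stand-ins for two detect.js detectors (patterns and messages shortened).
function cloudflare(content) {
  return /Just a moment\.\.\.|cf-turnstile/i.test(content)
    ? { type: 'cloudflare-challenge', blocking: true, message: 'Retry via https://r.jina.ai/<url>' }
    : null;
}

function suspiciouslySmall(content, url) {
  // Pastes (no URL) get a free pass, as in detect.js.
  return url && content.trim().length < 200
    ? { type: 'suspiciously-small', blocking: false, message: 'Very short content' }
    : null;
}

// Same contract as detectIssues(): first blocking hit wins,
// otherwise every non-blocking match is collected.
function detect(content, url) {
  const blocking = cloudflare(content);
  if (blocking) return [blocking];
  return [suspiciouslySmall].map((fn) => fn(content, url)).filter(Boolean);
}

// Hypothetical caller: refuse on a blocking warning, otherwise store
// and hand the warnings back so the agent can decide what to surface.
function tryStore(content, url, store) {
  const warnings = detect(content, url);
  if (warnings.some((w) => w.blocking)) return { stored: false, warnings };
  store(content, url);
  return { stored: true, warnings };
}
```

Because blocking detection short-circuits, a Cloudflare page never also reports "suspiciously small". The agent sees only the actionable warning, which is the Jina retry.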

package/lib/store-doc/extract-title.js ADDED

@@ -0,0 +1,116 @@
+/**
+ * Extract a title from fetched page content.
+ *
+ * Strategy (first hit wins):
+ *   1. Markdown H1 — `# Title text`  (Jina Reader's output starts with this)
+ *   2. HTML <title> — `<title>Page Title</title>`
+ *   3. HTML <h1>  — `<h1>Page Title</h1>`
+ *   4. First non-empty line if short enough to look like a title
+ *   5. URL slug fallback — last meaningful path segment, decoded
+ *   6. Domain fallback — just the domain name
+ *   7. "Untitled document"
+ *
+ * Returns a trimmed string up to MAX_LEN characters. Always returns a
+ * non-empty string (worst case "Untitled document").
+ */
+const MAX_LEN = 200;
+function trimTitle(s) {
+  if (!s) return '';
+  let t = String(s).replace(/\s+/g, ' ').trim();
+  if (t.length > MAX_LEN) t = t.slice(0, MAX_LEN).trim() + '…';
+  return t;
+}
+function fromMarkdownH1(content) {
+  // Markdown H1: line starts with a single # then whitespace, then text.
+  // \r is excluded from the capture so CRLF endings don't leak into the title.
+  const m = content.match(/^[ \t]*#[ \t]+([^\r\n]+?)[ \t]*$/m);
+  return m ? trimTitle(m[1]) : '';
+}
+function fromHtmlTitle(content) {
+  const m = content.match(/<title[^>]*>([^<]+)<\/title>/i);
+  return m ? trimTitle(decodeEntities(m[1])) : '';
+}
+function fromHtmlH1(content) {
+  // Inner text only — strip nested tags like <span>...</span>
+  const m = content.match(/<h1[^>]*>([\s\S]*?)<\/h1>/i);
+  if (!m) return '';
+  const inner = m[1].replace(/<[^>]+>/g, '');
+  return trimTitle(decodeEntities(inner));
+}
+function fromFirstLine(content) {
+  // First non-empty line, but only if it looks like a heading
+  // (short-ish, no markdown junk).
+  const lines = content.split(/\r?\n/);
+  for (const raw of lines) {
+    const line = raw.trim();
+    if (!line) continue;
+    // Skip leading markdown decorators / metadata
+    if (/^[#\-=*>|`]/.test(line)) continue;
+    if (line.length > 0 && line.length <= 120) {
+      return trimTitle(line);
+    }
+    // First substantive line is too long — give up on this strategy
+    break;
+  }
+  return '';
+}
+function fromUrlSlug(rawUrl) {
+  if (!rawUrl) return '';
+  try {
+    const u = new URL(rawUrl);
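+    // Last meaningful path segment
+    const segs = u.pathname.split('/').filter(Boolean);
+    if (segs.length) {
+      let last = segs[segs.length - 1];
+      // A malformed %-escape makes decodeURIComponent throw; keep the raw
+      // segment so we don't skip the domain fallback below.
+      try {
+        last = decodeURIComponent(last);
+      } catch (_) {
+        /* keep raw segment */
+      }
+      const slug = last
+        .replace(/[-_]+/g, ' ')
+        .replace(/\.(html?|md|pdf|txt)$/i, '')
+        .trim();
+      if (slug) return trimTitle(slug);
+    }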
+    // No useful path — fall through to domain
+    return trimTitle(u.hostname.replace(/^www\./, ''));
+  } catch (_) {
+    return '';
+  }
+}
+// Minimal HTML-entity decode for &lt; &gt; &quot; &apos; &#39; &#nnn; &#xhh;
+// and &amp;. &amp; is decoded LAST so "&amp;lt;" yields "&lt;", not "<".
+function decodeEntities(s) {
+  if (!s) return s;
+  return String(s)
+    .replace(/&lt;/g, '<')
+    .replace(/&gt;/g, '>')
+    .replace(/&quot;/g, '"')
+    .replace(/&apos;/g, "'")
+    .replace(/&#39;/g, "'")
+    .replace(/&#x([0-9a-f]+);/gi, (_, hex) =>
+      String.fromCodePoint(parseInt(hex, 16))
+    )
+    .replace(/&#(\d+);/g, (_, dec) => String.fromCodePoint(parseInt(dec, 10)))
+    .replace(/&amp;/g, '&');
+}
+/**
+ * @param {string} content - fetched page content
+ * @param {string|null} url - source URL (used for slug fallback)
+ * @returns {string} a non-empty trimmed title
+ */
+export function extractTitle(content, url) {
+  const safe = typeof content === 'string' ? content : '';
+  return (
+    fromMarkdownH1(safe) ||
+    fromHtmlTitle(safe) ||
+    fromHtmlH1(safe) ||
+    fromFirstLine(safe) ||
+    fromUrlSlug(url) ||
+    'Untitled document'
+  );
+}
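One alternative to a chain of `.replace` calls like the one in `decodeEntities` is a single-pass decoder. Each entity is matched exactly once, so replacement order cannot cause double-decoding (`&amp;lt;` stays `&lt;`). Numeric references above U+10FFFF are also left as-is, because `String.fromCodePoint` would otherwise throw a `RangeError`. This is a sketch, not the package's code. `decodeEntitiesOnce` is a hypothetical name, and it covers the same small entity set as `decodeEntities`.

```javascript
// Named entities handled (case-sensitive, as in the chained version).
const NAMED = { amp: '&', lt: '<', gt: '>', quot: '"', apos: "'" };

// One regex pass: hex (&#x41;), decimal (&#65;), or one of the named entities.
function decodeEntitiesOnce(s) {
  return String(s).replace(
    /&(?:#[xX]([0-9a-fA-F]+)|#(\d+)|(amp|lt|gt|quot|apos));/g,
    (match, hex, dec, name) => {
      if (name) return NAMED[name];
      const cp = hex !== undefined ? parseInt(hex, 16) : parseInt(dec, 10);
      // Out-of-range code points would make fromCodePoint throw; keep the text.
      return cp <= 0x10ffff ? String.fromCodePoint(cp) : match;
    },
  );
}

console.log(decodeEntitiesOnce('Q&amp;A: &amp;lt;h1&amp;gt; &#x2764;'));
```

In this design the matched text is never rescanned, so the result is the same no matter which entities appear first.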

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "memex-mvp",
-  "version": "0.5.4",
+  "version": "0.6.1",
   "description": "Local-first MCP server for cross-agent AI memory. One SQLite + FTS5 corpus across Claude Code, Cowork, Cursor, Continue, Zed, Obsidian, and Telegram — passively captured, verbatim, searchable from any MCP-compatible client.",
   "type": "module",
   "main": "server.js",
@@ -26,7 +26,7 @@
     "sync": "node ingest.js",
     "ingest": "node ingest.js",
     "bot": "node bot/index.js",
-    "test": "node test/parser.test.js && node test/bot-inbox.test.js && node test/search-sort.test.js",
+    "test": "node test/parser.test.js && node test/bot-inbox.test.js && node test/search-sort.test.js && node test/store-document.test.js",
     "prepublishOnly": "npm test"
   },
   "engines": {

package/server.js CHANGED

@@ -43,6 +43,13 @@ import {
   KNOWN_SOURCES,
   CONFIG_PATH,
 } from './lib/config.js';
+import {
+  canonicalize as canonicalizeUrl,
+  extractDomain,
+} from './lib/store-doc/canonicalize.js';
+import { detectIssues, isBlocked } from './lib/store-doc/detect.js';
+import { extractTitle } from './lib/store-doc/extract-title.js';
+import { createHash } from 'node:crypto';
 // -------------------- Paths --------------------
 const HOME = homedir();
@@ -952,6 +959,72 @@ Archived conversations are hidden from default list/search but stay
 fully indexed. Pass include_archived: true on search/list to include
 them. Visibility flag only — never deletes data.
+══ DOCUMENT INGESTION (web pages, articles, AI chat shares) ══
+memex_store_document accepts content YOU fetch and stores it verbatim.
+Memex never fetches by itself — that's your job. Reasons:
+  • Memex stays 100% local (no outbound network egress)
+  • You have better tools (WebFetch, WebSearch, shell curl)
+  • You have context for error recovery (can ask user to paste)
+THE JINA TRICK (full reference):
+Modern web is mostly Cloudflare-protected. Server-side fetchers
+(including most agent WebFetch implementations) hit a JS challenge and
+return interstitial content instead of the page itself. The free
+workaround is Jina AI Reader:
+  Original:  https://example.com/whatever
+  Wrapped:   https://r.jina.ai/https://example.com/whatever
+This works for: Perplexity shared threads (must be Public!), npm.com
+package pages, X/Twitter threads, Medium articles, Substack public
+posts, most modern SaaS marketing pages.
+This DOESN'T help for: login-walled content (paywall, private
+accounts), SPA with no SSR (Jina gets empty initial HTML),
+geo-restricted content.
+DETECTION HEURISTIC — when to retry through Jina:
+After your first WebFetch, retry through Jina if you see:
+  • HTTP 403 with header \`cf-mitigated: challenge\`
+  • Body contains "Just a moment..." / "Verifying you are human"
+  • Body contains \`cf-turnstile\` or \`cf_chl_opt\`
+  • Body < 500 bytes from a page that should be substantive
+After Jina retry, if you get clean markdown — pass to memex_store_document.
+If Jina ALSO returns "This thread is private" / "Sign in" / login form —
+that's authentication, not Cloudflare. Tell the user what to do.
+PERPLEXITY-SPECIFIC:
+Perplexity shared threads need to be marked "Public" by the owner.
+The URL in the user's address bar (perplexity.ai/search/<id>) is the
+OWNER'S private URL, not the shareable one. They must:
+  1. Open the thread in Perplexity
+  2. Click Share → toggle Public link
+  3. Copy the new URL Perplexity shows
+  4. Give you THAT URL
+If memex_store_document returns warning type=perplexity-private,
+surface the message verbatim — it has the exact instructions.
+PROACTIVE OFFER (TIMING):
+When user shares a URL they're DISCUSSING or RESEARCHING:
+  "I can save this to your memex memory — you'll be able to search
+   it from any AI chat later. Want me to?"
+Don't offer for:
+  • URLs you're already analyzing in the current turn
+  • Same URL twice in one session
+  • Casual passing mentions ("like example.com/foo")
+  • URLs that are just navigation aids in conversation
+Right moments:
+  • User finished Perplexity research and is about to move on
+  • User pasted a long article they want to discuss
+  • User shared documentation they keep returning to
+  • User shared an AI chat (ChatGPT/Claude.ai/Gemini share link)
 ══ ONBOARDING THE LAZY USER ══
 The user probably hasn't read the README. Treat yourself as their
@@ -1436,6 +1509,61 @@ const TOOLS = [
       },
     },
   },
+  {
+    name: 'memex_store_document',
+    description:
+      'Save an external document (web page, AI chat share, pasted text) to memex memory. Memex stores verbatim what you pass — it does NOT fetch URLs itself, that\'s your job (memex stays 100% local, no outbound network calls).\n\n' +
+      'TYPICAL FLOW for a URL:\n' +
+      '  content = WebFetch("https://example.com/article")\n' +
+      '  memex_store_document(content, url="https://example.com/article", title="...")\n\n' +
+      '⚠ THE JINA TRICK — memorize this:\n' +
+      'Many modern pages (Perplexity, npm.com, X/Twitter, Medium, ChatGPT shares) sit behind Cloudflare bot-protection. Server-side fetches return "Just a moment..." / 403 / "cf-mitigated: challenge" instead of the real content.\n\n' +
+      'Workaround: prepend `https://r.jina.ai/` to the URL.\n' +
+      '  Before: https://www.perplexity.ai/share/abc123\n' +
+      '  After:  https://r.jina.ai/https://www.perplexity.ai/share/abc123\n\n' +
+      'Jina AI Reader bypasses Cloudflare (runs a real browser server-side) and returns clean markdown. Free, no API key needed for personal use. Both http:// and https:// after `r.jina.ai/` work.\n\n' +
+      'Heuristic — when to retry through Jina:\n' +
+      '  • HTTP 403 with cf-mitigated header\n' +
+      '  • Body contains "Just a moment..." / "Verifying you are human" / cf-turnstile\n' +
+      '  • Body < 500 bytes from a substantive page\n\n' +
+      'PERPLEXITY THREADS need to be made PUBLIC first (Share → Public link in Perplexity). Private threads return "This thread is private" even via Jina — memex will detect this on store and tell you what to say to the user.\n\n' +
+      'PRIVATE / LOGIN-WALLED content (paywalls, the user\'s private ChatGPT chats) can\'t be fetched server-side. Tell the user; don\'t try to scrape.\n\n' +
+      'PROACTIVE OFFER: When the user shares a substantive URL they\'re DISCUSSING or RESEARCHING (not just casually mentioning), offer to save it. Especially for Perplexity threads — that research is ephemeral and worth preserving.\n\n' +
+      'Returns: {conversation_id, title, length, stored, warnings[]}. If stored=false, the `warnings` array tells you exactly what went wrong and how to fix it — surface that message to the user.',
+    inputSchema: {
+      type: 'object',
+      properties: {
+        content: {
+          type: 'string',
+          description:
+            'The fetched page content as text or markdown. YOU (the agent) fetch this via WebFetch / curl / Jina. Memex stores it verbatim — no LLM processing, no summarization.',
+        },
+        url: {
+          type: 'string',
+          description:
+            'The original source URL. Used for conversation_id (sha256 of canonical form → free deduplication), domain metadata, and the slug-based title fallback. Omit for non-URL pastes — memex will assign a content-hash-based synthetic id.',
+        },
+        title: {
+          type: 'string',
+          description:
+            'Page title or document name. If omitted, memex extracts one from the content (markdown H1 → HTML <title> → HTML <h1> → short first line → URL slug → domain → "Untitled document").',
+        },
+        tags: {
+          type: 'array',
+          items: { type: 'string' },
+          description:
+            'Optional tags stored in metadata (e.g. ["research", "perplexity"]). For future tag-based filtering. Lowercased and deduped on store.',
+        },
+        refresh: {
+          type: 'boolean',
+          default: false,
+          description:
+            'If a document with the same canonical URL was already ingested, set true to replace the stored content with the content you pass now (the new message overwrites the old; memex still does not fetch anything itself). Default false = skip with an "already in memex" note and the existing conversation_id.',
+        },
+      },
+      required: ['content'],
+    },
+  },
 ];
 server.setRequestHandler(ListToolsRequestSchema, async () => ({ tools: TOOLS }));
@@ -2374,6 +2502,196 @@ server.setRequestHandler(CallToolRequestSchema, async (req) => {
       return textResult(lines.join('\n'));
     }
+    if (name === 'memex_store_document') {
+      const content = typeof args.content === 'string' ? args.content : '';
+      const rawUrl = typeof args.url === 'string' ? args.url.trim() : '';
+      const explicitTitle = typeof args.title === 'string' ? args.title.trim() : '';
+      const refresh = args.refresh === true;
+      const tags = Array.isArray(args.tags)
+        ? Array.from(
+            new Set(
+              args.tags
+                .filter((t) => typeof t === 'string')
+                .map((t) => t.trim().toLowerCase())
+                .filter(Boolean)
+            )
+          )
+        : [];
+      if (!content.trim()) {
+        return jsonResult({
+          stored: false,
+          conversation_id: null,
+          title: null,
+          length: 0,
+          source: 'web',
+          warnings: [
+            {
+              type: 'empty-content',
+              blocking: true,
+              message:
+                'Content is empty. Pass the actual page text (you fetch it; memex stores it). ' +
+                'For URLs you can\'t fetch (Cloudflare-blocked), retry through https://r.jina.ai/<original-url>.',
+            },
+          ],
+        });
+      }
+      // Sniff for known failure patterns BEFORE storing
+      const warnings = detectIssues(content, rawUrl);
+      if (isBlocked(warnings)) {
+        return jsonResult({
+          stored: false,
+          conversation_id: null,
+          title: null,
+          length: content.length,
+          source: 'web',
+          url: rawUrl || null,
+          warnings,
+        });
+      }
+      // Build conversation_id: stable hash of canonical URL, or content hash for pastes
+      let canonical = '';
+      let convId;
+      let captured_via;
+      if (rawUrl) {
+        canonical = canonicalizeUrl(rawUrl);
+        const hash = createHash('sha256')
+          .update(canonical)
+          .digest('hex')
+          .slice(0, 12);
+        convId = `web-${hash}`;
+        captured_via = 'mcp-tool';
+      } else {
+        const hash = createHash('sha256')
+          .update(content)
+          .digest('hex')
+          .slice(0, 12);
+        convId = `web-paste-${hash}`;
+        captured_via = 'user-paste';
+      }
+      // Check if already ingested
+      const existing = db
+        .prepare(
+          `SELECT conversation_id, title, message_count FROM conversations WHERE conversation_id = ?`
+        )
+        .get(convId);
+      if (existing && !refresh) {
+        return jsonResult({
+          stored: false,
+          already_ingested: true,
+          conversation_id: existing.conversation_id,
+          title: existing.title,
+          length: content.length,
+          source: 'web',
+          url: rawUrl || null,
+          warnings: [
+            ...warnings,
+            {
+              type: 'already-ingested',
+              blocking: false,
+              message:
+                `This document is already in memex (conversation_id: ${existing.conversation_id}, title: "${existing.title}"). ` +
+                'Call again with refresh=true to overwrite with the new content. ' +
+                'Existing content can be retrieved via memex_get_conversation.',
+            },
+          ],
+        });
+      }
+      // Determine title (caller override → content extraction)
+      const title = explicitTitle || extractTitle(content, rawUrl);
+      const domain = rawUrl ? extractDomain(rawUrl) : null;
+      const now = Math.floor(Date.now() / 1000);
+      // msg_id is the ingest ts as string — unique per refetch, so refresh
+      // doesn't collide with the previous version's UNIQUE constraint.
+      const msgId = String(now);
+      const metadata = {
+        url: rawUrl || null,
+        canonical_url: canonical || null,
+        title,
+        fetched_via: 'agent',
+        captured_via,
+        domain: domain || null,
+        fetched_at: now,
+        tags,
+        content_length: content.length,
+        warnings_at_store: warnings.map((w) => w.type),
+      };
+      try {
+        // If refresh and a row already exists, drop the old message first so we
+        // don't carry stale content. (UNIQUE is (source, conversation_id, msg_id);
+        // a new msg_id wouldn't collide, but we want one message per URL by
+        // convention.)
+        if (existing && refresh) {
+          db.prepare(
+            `DELETE FROM messages WHERE source = 'web' AND conversation_id = ?`
+          ).run(convId);
+        }
+        insertMessage.run(
+          'web',
+          convId,
+          msgId,
+          'document',
+          domain || 'web',
+          content,
+          now,
+          JSON.stringify(metadata),
+          now, // edited_at = ts for refresh ordering
+          null // uuid — web docs don't have source uuids
+        );
+        upsertConversation.run(
+          convId,
+          'web',
+          title,
+          now,
+          now,
+          1,
+          null, // parent_conversation_id
+          null  // project_path
+        );
+      } catch (err) {
+        log('store-document error:', err.message);
+        return jsonResult({
+          stored: false,
+          conversation_id: null,
+          title: null,
+          length: content.length,
+          source: 'web',
+          url: rawUrl || null,
+          warnings: [
+            ...warnings,
+            {
+              type: 'storage-error',
+              blocking: true,
+              message: `Couldn't write to memex DB: ${err.message}`,
+            },
+          ],
+        });
+      }
+      return jsonResult({
+        stored: true,
+        conversation_id: convId,
+        title,
+        length: content.length,
+        source: 'web',
+        url: rawUrl || null,
+        domain,
+        refreshed: !!(existing && refresh),
+        warnings,
+      });
+    }
     return textResult(`Unknown tool: ${name}`);
   } catch (err) {
     log('tool error:', name, err.message);

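The `memex_store_document` handler above builds a stable `conversation_id` and normalizes tags before it writes anything. The standalone CommonJS sketch below mirrors that logic so the deduplication behavior can be tried outside the server. `canonicalizeUrlSketch` is a hypothetical stand-in. The real `canonicalizeUrl` in `lib/store-doc/canonicalize.js` is not shown in this view and may normalize differently; this sketch relies only on WHATWG `URL` parsing and drops the `#fragment`.

```javascript
// Sketch mirroring how memex_store_document builds its conversation_id (CommonJS).
const { createHash } = require('node:crypto');

// HYPOTHETICAL stand-in for canonicalizeUrl() in lib/store-doc/canonicalize.js (not shown).
// WHATWG URL parsing lowercases scheme and host and drops default ports;
// this sketch additionally strips the #fragment. Throws on non-absolute URLs.
function canonicalizeUrlSketch(rawUrl) {
  const u = new URL(rawUrl);
  u.hash = '';
  return u.toString();
}

// First 12 hex chars of sha256(input), as in the handler.
function shortHash(input) {
  return createHash('sha256').update(input).digest('hex').slice(0, 12);
}

// URL given -> "web-<hash of canonical URL>", captured_via "mcp-tool".
// No URL    -> "web-paste-<hash of content>", captured_via "user-paste".
function conversationIdFor({ url, content }) {
  const rawUrl = typeof url === 'string' ? url.trim() : '';
  if (rawUrl) {
    return { convId: `web-${shortHash(canonicalizeUrlSketch(rawUrl))}`, captured_via: 'mcp-tool' };
  }
  return { convId: `web-paste-${shortHash(content)}`, captured_via: 'user-paste' };
}

// Tag normalization as in the handler: keep strings, trim, lowercase,
// drop empties, dedupe while preserving first-seen order.
function normalizeTags(tags) {
  if (!Array.isArray(tags)) return [];
  return Array.from(
    new Set(
      tags
        .filter((t) => typeof t === 'string')
        .map((t) => t.trim().toLowerCase())
        .filter(Boolean)
    )
  );
}

const a = conversationIdFor({ url: 'https://Example.com/post#intro', content: 'v1' });
const b = conversationIdFor({ url: 'https://example.com/post', content: 'v2' });
console.log(a.convId === b.convId); // true: same canonical URL, different content
console.log(normalizeTags([' Research', 'perplexity', 'research', 42, ''])); // [ 'research', 'perplexity' ]
```

Because the URL form hashes only the canonical URL, re-saving a page whose content changed maps to the same id and comes back as `already_ingested` unless the caller passes `refresh: true`. The paste form hashes the content itself, so an edited paste gets a new id.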
package/skills/install-memex/README.md CHANGED

@@ -1,58 +1,106 @@
-# install-memex skill
+# 📚 Install memex — cross-AI memory in 2 minutes
-An [Anthropic Skill](https://docs.claude.com/docs/en/agents/skills) that
-walks Claude Code (or any MCP-compatible agent that supports the Skills
-spec) through installing **memex** on the user's machine — npm install,
-MCP-config wiring, auto-capture daemon, and end-to-end verification.
+> One prompt sets up local-first AI memory across **Claude Code, Cowork, Cursor, Cline, Continue, and Zed** — plus Obsidian notes and Telegram chats. No cloud. No account. No data leaves your machine.
-Roughly 2 minutes wall-clock; requires Node.js and shell access.
+## What this skill does
-## Install the skill
+After you drop the skill into your agent (`~/.claude/skills/` for Claude Code, or your client's equivalent), saying **"install memex"** triggers a guided installation:
-```sh
-mkdir -p ~/.claude/skills
-cp -r /path/to/memex-mvp/skills/install-memex ~/.claude/skills/
-```
+1. **Discovery** — read-only checks for which MCP client you're using and what AI data already exists on your machine
+2. **`npm install -g memex-mvp`** — with EACCES fallbacks (one-shot sudo OR permanent prefix-fix; the agent asks you which)
+3. **MCP config merge** — adds a single absolute-path `command` entry into your client's `mcpServers` config. Never overwrites your other servers
+4. **`memex-sync install`** — registers the macOS LaunchAgent for live auto-capture
+5. **`memex-sync scan`** — one-time backfill of every session that already exists on disk
+6. **Restart hint + verification commands**
+End-to-end: **~2 minutes**, fully observable (agent shows each command before running).
+## What is memex?
+Memex is a **local-first MCP server** that captures every conversation you have with an AI — across **Claude Code, Cowork (including subagent transcripts), Cursor, Cline, Continue, Zed**, plus **Obsidian notes**, **Telegram chats**, and **web pages / AI chat shares** (v0.6+ via `memex_store_document` — agent fetches, memex stores verbatim) — into one searchable SQLite + FTS5 corpus.
+Any MCP-compatible agent can then query that corpus through 12 standard tools (`memex_search`, `memex_recent`, `memex_overview`, `memex_store_document`, …).
+| Pain                                            | Memex                                |
+|-------------------------------------------------|--------------------------------------|
+| Claude forgets every session                    | One unified history across all sessions, all clients |
+| Switching from Claude Code to Cursor = lost context | Cursor can read your Claude history (same SQLite DB) |
+| Cloud memory services hold your data hostage    | One `~/.memex/data/memex.db` file on your disk       |
+| Mem0 / Supermemory pricing per token            | 0 LLM calls on write — free at any scale             |
+| Vendor ban = lose everything                    | Your DB stays even if Claude blocks your account     |
+📦 npm: [memex-mvp](https://www.npmjs.com/package/memex-mvp) · 🏠 [memex.parallelclaw.ai](https://memex.parallelclaw.ai) · 🐛 [GitHub](https://github.com/parallelclaw/memex-mvp)
+## Why a skill (vs. just copy-pasting the install prompt)
+The install playbook also exists as a copy-paste prompt on the landing page. But once you have the skill installed, you skip a friction layer:
+| Without skill | With skill |
+|---|---|
+| Open landing page in browser | (already in your agent) |
+| Click "Copy install prompt" | — |
+| Paste a wall of text into Claude/Cursor | Just say "install memex" |
+| Hope the prompt is up-to-date | Re-fetch the skill to pick up install-flow updates |
+The skill description is written so Claude / OpenClaw auto-invoke it from any phrasing of "install memex", "set up memex", "add memory to my AI", etc. — no need to remember `/install-memex` exactly.
+## Example dialogue
+> **User:** install memex
+>
+> **Agent:** 👋 You're running inside Cursor. Let me run a quick discovery before any install actions...
+>
+> ✓ Found Claude Code data at `~/.claude/projects/` (8 sessions)
+> ✓ Found Cowork sessions (3 subagent transcripts in your projects)
+> ✗ Continue not installed
+> ✗ Zed not installed
+> ✓ Cursor data found (this is where I'll write the MCP config)
+>
+> After install, memex will auto-index conversations from: Claude Code, Cowork, Cursor. Drop Telegram exports into `~/.memex/inbox/` manually if you want those too.
+>
+> Ready to proceed with Step 1 (`npm install -g memex-mvp`)?
+>
+> **User:** yes
+>
+> **Agent:** [shows each command, runs npm install, gets `which memex` path, merges into `~/.cursor/mcp.json`, runs `memex-sync install`, runs `memex-sync scan`, prints restart hint]
+See [examples.md](examples.md) for more sample dialogues, including EACCES recovery and partial-install scenarios.
+## Requirements
+- **Node.js 20–24** (LTS recommended). Install via [nodejs.org](https://nodejs.org), `brew install node`, or [nvm](https://github.com/nvm-sh/nvm)
+- **macOS or Linux** (Windows untested for memex itself; skill works anywhere your agent has shell access)
+- **An MCP-compatible CLI agent**: Claude Code, OpenClaw, Cursor (with MCP), Cline, Continue, or Zed. Web-only agents (ChatGPT, Claude.ai) can't execute install steps
+## Install the skill
-Or via curl from this repo:
 ```sh
 mkdir -p ~/.claude/skills/install-memex
 curl -fsSL https://raw.githubusercontent.com/parallelclaw/memex-mvp/main/skills/install-memex/SKILL.md \
   -o ~/.claude/skills/install-memex/SKILL.md
 ```
-## Use it
+Or, if you've already done `npm install -g memex-mvp`, the skill ships inside the package:
-In any Skills-aware agent (Claude Code, OpenClaw, …):
-```
-/install-memex
+```sh
+cp -r "$(npm root -g)/memex-mvp/skills/install-memex" ~/.claude/skills/
 ```
-…or just say "install memex" — the skill description is written so
-Claude picks it up automatically from natural-language requests.
+Then in your agent: `install memex` (or `/install-memex` if your client supports slash-commands).
-## What it does (high level)
+## What the skill won't do
-1. **Discovery** — read-only checks for which MCP client you're using and which AI tool data already lives on this machine.
-2. **`npm install -g memex-mvp`** — with EACCES fallbacks (one-shot sudo OR permanent prefix-fix; user picks).
-3. **MCP config merge** — single absolute `"command": "<path from which memex>"` entry into your client's mcpServers config. Never overwrites your other servers.
-4. **`memex-sync install`** — registers the macOS LaunchAgent for live auto-capture.
-5. **`memex-sync scan`** — one-time backfill of every session that already exists on disk.
-6. **Verification + restart hint.**
+- ❌ Install Node.js for you — it stops and asks you to install it
+- ❌ Run `sudo` without asking — EACCES fallback always confirms with you first
+- ❌ Overwrite your existing `mcpServers` config — always merges
+- ❌ Redirect any host application's data location — memex reads from each tool's native path
+- ❌ Send anything over the network (besides `npm install` itself)
-## Why a skill (vs. just copy-pasting the install prompt)
-- **`/install-memex`** is shorter than "open the landing, click 'Copy install prompt', paste here"
-- Skills are auto-discovered by name — `description` field lets Claude invoke this from any phrasing of "install memex"
-- Future updates to the install flow auto-propagate if you re-fetch the skill, instead of relying on the user re-copying
-## Related
+## License
-- 🏠 [memex.parallelclaw.ai](https://memex.parallelclaw.ai) — landing page (also has copy-paste version of this prompt)
-- 📦 [memex-mvp on npm](https://www.npmjs.com/package/memex-mvp)
-- 📖 [Main memex repo](https://github.com/parallelclaw/memex-mvp)
+[MIT](https://github.com/parallelclaw/memex-mvp/blob/main/LICENSE) for memex itself.
+This skill bundle is published on ClawHub under [MIT-0](https://opensource.org/license/mit-0) (effectively public domain — no attribution required) per ClawHub's publishing terms.
-## License
+---
-MIT — see [LICENSE](../../LICENSE) at the repo root.
+📚 Built by [@parallelclaw](https://github.com/parallelclaw) · Star the repo if memex helps you: [github.com/parallelclaw/memex-mvp](https://github.com/parallelclaw/memex-mvp)

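Step 3 of the install flow merges a single `memex` entry into the client's `mcpServers` object without touching the other servers. The following is a minimal sketch of that merge, assuming a plain-JSON config file with a top-level `mcpServers` key like the `~/.cursor/mcp.json` example. Other clients may use a different file or key, or JSONC with comments, which `JSON.parse` rejects. `mergeMemexEntry` and `mergeIntoFile` are hypothetical names, not functions shipped by memex.

```javascript
// Sketch: add a "memex" server to an MCP client config without touching other entries.
// ASSUMPTION: plain JSON (no comments) with a top-level "mcpServers" object.
const fs = require('node:fs');
const path = require('node:path');

// Pure merge: returns a new object and leaves the input unmodified.
function mergeMemexEntry(config, memexPath) {
  if (typeof memexPath !== 'string' || !path.isAbsolute(memexPath)) {
    throw new Error('memexPath must be absolute (e.g. the output of `which memex`)');
  }
  return {
    ...config,
    mcpServers: {
      ...(config.mcpServers || {}), // keep every existing server as-is
      memex: { command: memexPath }, // add (or replace) only the memex entry
    },
  };
}

// File wrapper: a missing file is treated as an empty config.
function mergeIntoFile(configPath, memexPath) {
  const raw = fs.existsSync(configPath) ? fs.readFileSync(configPath, 'utf8') : '{}';
  const merged = mergeMemexEntry(JSON.parse(raw), memexPath);
  fs.writeFileSync(configPath, JSON.stringify(merged, null, 2) + '\n');
  return merged;
}
```

Keeping the merge pure makes it easy to print `JSON.stringify(merged, null, 2)` as a preview before writing, which fits the skill's rule of showing a diff for every config edit.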
package/skills/install-memex/SKILL.md CHANGED

@@ -1,6 +1,15 @@
 ---
 name: install-memex
-description: Install memex (local-first MCP server for cross-agent AI memory) on the user's machine. Detects which MCP client is active, registers memex in its config, sets up the auto-capture daemon, and verifies end-to-end. Use when the user says "install memex", "set up memex", "add memex to this", or similar. Takes ~2 minutes; requires Node.js and shell access.
+description: Make Claude, Cursor, Cline, Continue, and Zed remember every AI conversation forever — one local SQLite corpus shared across all of them. Installs memex (local-first MCP server) in ~2 minutes — npm install, MCP config wiring, auto-capture daemon, history backfill. No cloud, no account, verbatim storage. Also indexes Obsidian notes, Telegram chats, and any URL the user wants to save (web pages, Perplexity threads, AI chat shares — memex_store_document tool, v0.6+). Use when the user says "install memex", "set up memex", "add memory to my AI", "make my agent remember across sessions", or similar.
+version: 1.0.0
+metadata:
+  openclaw:
+    emoji: "📚"
+    homepage: https://memex.parallelclaw.ai
+    requires:
+      bins:
+        - node
+        - npm
 ---
 # Install memex
@@ -151,6 +160,7 @@ After restart, suggest they try any of:
 - "show me what memex has in memory" → triggers `memex_overview`
 - "what projects has memex captured" → triggers `memex_list_projects`
 - "search memex for [recent topic]" → triggers `memex_search`
+- "save https://en.wikipedia.org/wiki/As_We_May_Think to memex" → triggers `memex_store_document` and teaches the user that URL-saving exists (v0.6+)
 These confirm everything works end-to-end.

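When the user says "save <url> to memex", the agent fetches the page itself and then sends an MCP `tools/call` request for `memex_store_document`. The sketch below builds that JSON-RPC 2.0 request (the `tools/call` method with `name` and `arguments` params comes from the MCP specification) and maps the result fields from the handler (`stored`, `already_ingested`, `warnings[].blocking`) to a next step. The helper names are hypothetical, and it assumes the server's `jsonResult` helper, whose body is not shown here, serializes the result object as JSON text in the first content item.

```javascript
// Sketch: agent-side request/response handling for memex_store_document over MCP.
let nextRequestId = 1;

// Build a JSON-RPC 2.0 "tools/call" request; optional fields are sent only when set.
function storeDocumentRequest({ content, url, title, tags, refresh }) {
  const args = { content }; // the schema's only required field
  if (url) args.url = url;
  if (title) args.title = title;
  if (Array.isArray(tags) && tags.length) args.tags = tags;
  if (refresh === true) args.refresh = true;
  return {
    jsonrpc: '2.0',
    id: nextRequestId++,
    method: 'tools/call',
    params: { name: 'memex_store_document', arguments: args },
  };
}

// ASSUMPTION: the tool result carries JSON.stringify(resultObject) in content[0].text.
function parseStoreResult(callResult) {
  const text = callResult?.content?.[0]?.text;
  return typeof text === 'string' ? JSON.parse(text) : {};
}

// Map the result object to what the agent should do next.
function nextStep(result) {
  if (result.stored) {
    return { action: 'done', conversationId: result.conversation_id, refreshed: !!result.refreshed };
  }
  if (result.already_ingested) {
    // Non-blocking: offer to call again with refresh: true.
    return { action: 'offer-refresh', conversationId: result.conversation_id };
  }
  // Not stored: surface blocking warning messages to the user, as the tool description asks.
  const messages = (result.warnings || []).filter((w) => w.blocking).map((w) => w.message);
  return { action: 'show-user', messages };
}
```

Checking `already_ingested` before the generic failure branch matters: that response has `stored: false` but only a non-blocking warning, so it calls for a refresh offer rather than an error message.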
package/skills/install-memex/examples.md ADDED

@@ -0,0 +1,218 @@
+# Example dialogues
+Sample interactions when the user invokes this skill. These show typical happy paths and how the skill handles common edge cases.
+---
+## Example 1 · Happy path on a fresh Cursor install
+> **User:** install memex
+>
+> **Agent:** 👋 You're running inside Cursor. Let me run a quick discovery before any install actions...
+>
+> ```
+> ✓ Found Claude Code data at ~/.claude/projects/ (12 sessions)
+> ✓ Found Cowork sessions (4 subagent transcripts)
+> ✓ Cursor data found — I'll write MCP config to ~/.cursor/mcp.json
+> ✗ Continue config not found
+> ✗ Zed config not found
+> ```
+>
+> After install, memex will auto-index conversations from: **Claude Code, Cowork, Cursor**. If you also use Obsidian or want Telegram chats indexed, we can add those after the main install.
+>
+> Ready to proceed with Step 1?
+>
+> **User:** yes
+>
+> **Agent:** Running `npm install -g memex-mvp`...
+>
+> ```
+> + memex-mvp@0.5.4
+> ```
+>
+> ✓ Memex installed. Verifying: `which memex` → `/usr/local/bin/memex`. Good.
+>
+> Step 2: I'm about to merge a new entry into your `~/.cursor/mcp.json`. Here's the diff:
+>
+> ```diff
+>   {
+>     "mcpServers": {
+>       "github": { ... },
+> +     "memex": {
+> +       "command": "/usr/local/bin/memex"
+> +     }
+>     }
+>   }
+> ```
+>
+> Looks correct? (Y/n)
+>
+> **User:** yes
+>
+> **Agent:** ✓ Config saved.
+>
+> Step 3: `memex-sync install` — sets up the macOS LaunchAgent that auto-captures new Claude Code / Cowork / Cursor sessions...
+>
+> ```
+> ✓ memex-sync installed
+> ✓ daemon running (PID 47291)
+> ✓ watching 12 sessions across 3 source roots
+> ```
+>
+> Step 4: `memex-sync scan` to backfill existing history. This indexes everything already on your disk — might take 30-90 seconds...
+>
+> ```
+> ingested 847 messages from 12 conversations
+> ```
+>
+> ✓ All done. **Restart Cursor (Cmd+Q + reopen)** so it picks up the new memex tools. After restart, try:
+>
+> - "what does memex have in memory?"
+> - "search memex for [a recent topic you worked on]"
+> - "list my memex projects"
+---
+## Example 2 · EACCES recovery (macOS system Node)
+> **User:** install memex
+>
+> **Agent:** [runs discovery, gets user OK] Step 1: `npm install -g memex-mvp`...
+>
+> ```
+> npm error code EACCES
+> npm error path: '/usr/local/lib/node_modules/memex-mvp'
+> ```
+>
+> Hit a permissions error — this is common on macOS when Node is installed system-wide. Two ways to fix:
+>
+> **Option A** (quick, requires sudo password):
+> ```sh
+> sudo npm install -g memex-mvp
+> ```
+>
+> **Option B** (one-time setup, then future `npm install -g` works without sudo):
+> ```sh
+> mkdir -p ~/.npm-global
+> npm config set prefix ~/.npm-global
+> echo 'export PATH=~/.npm-global/bin:$PATH' >> ~/.zshrc
+> source ~/.zshrc
+> npm install -g memex-mvp
+> ```
+>
+> Option B is the standard npm recommendation — solves this for every future global install. Option A is faster if you don't want to touch your shell config. Which do you prefer?
+>
+> **User:** B
+>
+> **Agent:** [runs the 5 commands in sequence, shows output for each] ✓ Done. Verifying:
+>
+> ```
+> which memex → /Users/macbook/.npm-global/bin/memex
+> ```
+>
+> Now continuing to Step 2 with this path...
+---
+## Example 3 · Install without the auto-capture daemon
+> **User:** install memex but skip the daemon
+>
+> **Agent:** Sure — that means memex will be available as a tool, but new sessions won't be auto-captured. You'll need to run `memex-sync scan` manually whenever you want to index recent sessions. Got it?
+>
+> **User:** yes
+>
+> **Agent:** [runs Steps 1, 2, 4 — skipping `memex-sync install` in Step 3]
+>
+> ✓ Memex installed, MCP config wired, backfill complete (847 messages from existing sessions).
+>
+> Since you skipped the daemon, here's how to refresh memex going forward:
+> ```sh
+> memex-sync scan
+> ```
+>
+> You can install the daemon any time later: `memex-sync install`.
+---
+## Example 4 · Existing memex install detected
+> **User:** install memex
+>
+> **Agent:** Quick check: `which memex` → `/usr/local/bin/memex`. You already have memex installed.
+>
+> Running `memex-sync status`:
+> ```
+> daemon running (PID 47291), watching 12 sessions
+> last capture: 4 minutes ago
+> ```
+>
+> Memex is working. What would you like to do?
+>
+> - **Reinstall** (e.g. to update to the latest version): I can run `npm install -g memex-mvp@latest`
+> - **Add a new source** (Obsidian vault, Telegram exports, etc.)
+> - **Add memex to another MCP client** (you have it in Cursor; want it in Cline / Continue / Zed too?)
+> - **Nothing** — already good
+---
+## Example 5 · Web-only agent (no shell access)
+> **User:** install memex
+>
+> **Agent:** I see you're running me through claude.ai web — I don't have access to your machine's shell, so I can't run `npm install` or write files for you.
+>
+> You have three options to install memex:
+>
+> 1. **CLI agents** (best UX): Open Claude Code, Cursor, Cline, Continue, or Zed and ask me there. Those have shell access and will run this skill end-to-end.
+>
+> 2. **Manual install** (~5 minutes, you run the commands): Follow https://memex.parallelclaw.ai/#quickstart — 4 steps in your terminal.
+>
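+> 3. **From a terminal**:
+> ```sh
+> npm install -g memex-mvp
+> memex-sync install
+> memex-sync scan
+> ```
+> Then add `{"mcpServers": {"memex": {"command": "/absolute/path/to/memex"}}}` to your MCP client's config file, using the absolute path printed by `which memex` as the `command` value (JSON config files don't expand shell syntax like `$(which memex)`). Details: https://memex.parallelclaw.ai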
+>
+> Which path works for you?
+---
+## Example 6 · Right after install — testing URL save (v0.6+)
+> **User:** save https://en.wikipedia.org/wiki/As_We_May_Think to memex
+>
+> **Agent:** Fetching the page with WebFetch...
+>
+> ```
+> ✓ Got 12,847 chars from en.wikipedia.org
+> ```
+>
+> Calling `memex_store_document(content, url=https://en.wikipedia.org/wiki/As_We_May_Think, title=auto)`...
+>
+> ```
+> ✓ stored
+>   conversation_id: web-d3b07a384d1b
+>   title: "As We May Think"
+>   length: 12,847 chars
+>   warnings: []
+> ```
+>
+> Saved. Try searching it now: *"search memex for memex Bush 1945"* — the article will show up alongside any AI chats where you discussed it.
+>
+> 💡 By the way, you can save any URL this way: Perplexity threads, blog posts, ChatGPT shares, Twitter threads, GitHub discussions. For Perplexity specifically, the thread must be made Public first (Share → Public link). For Cloudflare-protected sites, I retry the fetch through the `r.jina.ai` proxy; memex itself only stores what I pass to it.
+---
+## What these examples illustrate about the skill's behavior
+- **Always discover first**, then ask for confirmation before any write action
+- **Show every command** before running it; show diffs for config edits
+- **Offer choices** at decision points (EACCES, daemon yes/no) rather than guessing
+- **Detect and adapt** to existing installs, missing prerequisites, partial setups
+- **Refuse gracefully** when running in a context where the install can't actually work (web-only agents)
+If you see the skill *not* doing one of these things, that's a bug — report it at https://github.com/parallelclaw/memex-mvp/issues.
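Example 6 passes `title=auto`, relying on the fallback order documented in the tool schema: markdown H1, then HTML title, then URL slug, then "Untitled document". The implementation in `lib/store-doc/extract-title.js` is not part of this view, so the following is a hypothetical sketch of that order only. The real function's regexes and slug cleanup may differ, and `extractTitleSketch` is an illustrative name.

```javascript
// HYPOTHETICAL sketch of the documented title fallback order:
// markdown H1 -> HTML <title> -> URL slug -> "Untitled document".
function extractTitleSketch(content, rawUrl) {
  // 1. First markdown H1: "# Title" at the start of a line (not "## ...").
  const h1 = /^#[ \t]+(.+?)[ \t#]*$/m.exec(content);
  if (h1) return h1[1].trim();

  // 2. HTML <title>...</title>, case-insensitive, may span lines; whitespace collapsed.
  const t = /<title[^>]*>([\s\S]*?)<\/title>/i.exec(content);
  if (t && t[1].trim()) return t[1].replace(/\s+/g, ' ').trim();

  // 3. Last non-empty URL path segment, extension dropped, "-" and "_" turned into spaces.
  if (rawUrl) {
    try {
      const segments = new URL(rawUrl).pathname.split('/').filter(Boolean);
      const last = segments[segments.length - 1];
      if (last) {
        const slug = decodeURIComponent(last)
          .replace(/\.[a-z0-9]+$/i, '')
          .replace(/[-_]+/g, ' ')
          .trim();
        if (slug) return slug;
      }
    } catch {
      // Unparseable URL or bad percent-encoding: fall through to the default.
    }
  }

  // 4. Nothing usable.
  return 'Untitled document';
}
```

For the Wikipedia URL in Example 6, a fetched page with no usable H1 or `<title>` would still get "As We May Think" from the slug `As_We_May_Think`.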