parallelclaw 1.0.0

Files changed (62)
  1. package/CHANGELOG.md +204 -0
  2. package/HELP.md +600 -0
  3. package/LICENSE +21 -0
  4. package/MULTI_MACHINE.md +152 -0
  5. package/README.md +417 -0
  6. package/README.ru.md +740 -0
  7. package/SYNC.md +844 -0
  8. package/bot/README.md +173 -0
  9. package/bot/config.js +66 -0
  10. package/bot/inbox.js +153 -0
  11. package/bot/index.js +294 -0
  12. package/bot/nexara.js +61 -0
  13. package/bot/poll.js +304 -0
  14. package/bot/search.js +155 -0
  15. package/bot/telegram.js +96 -0
  16. package/ingest.js +2712 -0
  17. package/lib/cli/index.js +1987 -0
  18. package/lib/config.js +220 -0
  19. package/lib/db-init.js +158 -0
  20. package/lib/hook/install.js +268 -0
  21. package/lib/import-telegram.js +158 -0
  22. package/lib/ingest-file.js +779 -0
  23. package/lib/notify-click-action.js +281 -0
  24. package/lib/openclaw-channel.js +643 -0
  25. package/lib/parse-cursor.js +172 -0
  26. package/lib/parse-obsidian.js +256 -0
  27. package/lib/parse-telegram-html.js +384 -0
  28. package/lib/parse.js +175 -0
  29. package/lib/render-markdown.js +0 -0
  30. package/lib/store-doc/canonicalize.js +116 -0
  31. package/lib/store-doc/detect.js +209 -0
  32. package/lib/store-doc/extract-title.js +162 -0
  33. package/lib/sync/auth.js +80 -0
  34. package/lib/sync/cert.js +144 -0
  35. package/lib/sync/cli.js +906 -0
  36. package/lib/sync/client.js +138 -0
  37. package/lib/sync/config.js +130 -0
  38. package/lib/sync/pair.js +145 -0
  39. package/lib/sync/pull.js +158 -0
  40. package/lib/sync/push.js +305 -0
  41. package/lib/sync/replicate.js +335 -0
  42. package/lib/sync/server.js +224 -0
  43. package/lib/sync/service.js +726 -0
  44. package/lib/tasks.js +215 -0
  45. package/lib/telegram-decisions.js +165 -0
  46. package/lib/telegram-discovery.js +373 -0
  47. package/lib/telegram-notify.js +272 -0
  48. package/lib/telegram-pending.js +200 -0
  49. package/lib/web/index.js +265 -0
  50. package/lib/web/routes/conversation.js +193 -0
  51. package/lib/web/routes/conversations.js +180 -0
  52. package/lib/web/routes/dashboard.js +175 -0
  53. package/lib/web/routes/pending.js +277 -0
  54. package/lib/web/routes/settings.js +226 -0
  55. package/lib/web/static/style.css +393 -0
  56. package/lib/web/templates.js +234 -0
  57. package/package.json +84 -0
  58. package/server.js +3816 -0
  59. package/skills/install-memex/README.md +109 -0
  60. package/skills/install-memex/SKILL.md +342 -0
  61. package/skills/install-memex/examples.md +294 -0
  62. package/skills/install-memex-claw/SKILL.md +423 -0
package/SYNC.md ADDED
@@ -0,0 +1,844 @@
1
+ # ParallelClaw sync — multi-device replication
2
+
3
+ > **Status:** engine experimental since v0.11.11; the **`sync-join` lazy flow is
4
+ > the v0.13 front door**. After one successful `sync-join`, no
5
+ > `MEMEX_SYNC_EXPERIMENTAL` env var is needed (the join persists
6
+ > `sync.enabled: true`). Manual/advanced commands on a machine that never
7
+ > joined still want the env var. Pin your memex version on both sides.
8
+
9
+ A pair of memex instances (laptop + VPS, or two laptops, or any N) keep their
10
+ `~/.memex/data/memex.db` files **converging** — same conversations and messages
11
+ visible from every device, no cloud relay, no shared file system.
12
+
13
+ ## Quickstart (the lazy path — 2 steps)
14
+
15
+ The canonical setup: your laptop (Claude/Cursor) + one always-on server where
16
+ your agent lives. **Step 1 — paste to the agent on the server:**
17
+
18
+ ```
19
+ Set up memex sync as a hub and give me a join token for my laptop:
20
+ 1. npm install -g parallelclaw@latest (skip if installed)
21
+ 2. memex-sync sync-server install --bind 127.0.0.1
22
+ 3. memex-sync sync-server invite --join
23
+ Send me the memex-join:... line.
24
+ ```
25
+
26
+ **Step 2 — one command on the laptop:**
27
+
28
+ ```sh
29
+ memex-sync sync-join memex-join:eyJ2...
30
+ ```
31
+
32
+ That orchestrates everything: SSH probe (prints your pubkey + instructions if
33
+ access is missing), a self-healing forward tunnel (launchd/systemd KeepAlive),
34
+ pinned-cert health check, first sync (resumable if interrupted), 15-min
35
+ auto-sync, hourly watchdog, and a **marker self-test** that proves a note
36
+ round-trips before declaring success. Everything below this section is the
37
+ operational detail and the wire-protocol spec.
38
+
39
+ > **Tip — name your nodes first (v0.14).** Each node stamps its captures with
40
+ > an `origin` label (defaults to the hostname). Set a friendly one (`mac`,
41
+ > `vps1`, …) via `origin` in `~/.memex/config.json` on each node BEFORE data
42
+ > accumulates — old rows keep whatever stamp they got. This is what powers
43
+ > `memex_search(origin: …)` and the `[@node]` tags in merged conversations.
44
+
45
+ This document is **both** the operational guide and the wire-protocol spec.
46
+ Implementers and users read different sections.
47
+
48
+ ---
49
+
50
+ ## Table of contents
51
+
52
+ 1. [Why this exists](#why-this-exists) — what problem we're solving
53
+ 2. [How it works (30s version)](#how-it-works-30s-version) — for users
54
+ 3. [Transports](#transports) — SSH, Tailscale, HTTPS pair, mDNS
55
+ 4. [Setup walkthrough](#setup-walkthrough) — manual steps behind `sync-join`
56
+ 5. [Wire protocol (spec)](#wire-protocol-spec) — for implementers
57
+ 6. [Security model](#security-model)
58
+ 7. [Trade-offs we made](#trade-offs-we-made)
59
+ 8. [Out of scope (deliberately)](#out-of-scope-deliberately)
60
+
61
+ ---
62
+
63
+ ## Why this exists
64
+
65
+ memex is a **local-first** SQLite memory: every device captures its own AI
66
+ conversations into its own `memex.db`. Without sync, the Mac doesn't see what
67
+ the VPS captured, and vice versa.
68
+
69
+ The naïve fix — point Syncthing/Dropbox/iCloud at the `.db` file — corrupts
70
+ SQLite within hours under concurrent writes (documented [downstream of
71
+ claude-mem](https://github.com/thedotmack/claude-mem/issues/1037)).
72
+
73
+ memex sync solves it by treating each device's database as **append-only
74
+ authoritative** and exchanging **deltas** over HTTP. Conflicts cannot happen
75
+ because verbatim memory is never edited — we only ever insert.
76
+
77
+ ---
78
+
79
+ ## How it works (30s version)
80
+
81
+ ```
82
+ ┌──────────────────────┐ HTTP push/pull ┌──────────────────────┐
83
+ │ Mac │ ◀──── every 15 min ────▶ │ VPS │
84
+ │ memex.db (Mac side) │ │ memex.db (VPS side) │
85
+ │ │ POST /sync/push ───▶ │ │
86
+ │ Claude Code │ GET /sync/pull ◀─── │ OpenClaw, Hermes │
87
+ │ Telegram │ │ cron jobs │
88
+ └──────────────────────┘ └──────────────────────┘
89
+ ```
90
+
91
+ 1. **VPS** runs `memex-sync sync-server install` and `memex-sync sync-server invite` — generates a self-signed TLS cert
92
+ and a bearer token, prints a one-line **pair blob**.
93
+ 2. **Mac** runs `memex-sync sync-pair memex-pair:...` — stores the blob, validates
94
+ the cert against its pinned fingerprint, can now talk to VPS.
95
+ 3. Every 15 min (configurable), Mac runs `memex-sync sync-run <alias>` — it:
96
+ - pulls rows from VPS with `id > last_seen_cursor` and INSERT-OR-IGNOREs them
97
+ - pushes rows VPS hasn't seen yet
98
+ - advances both cursors
99
+
100
+ Dedup is automatic via the existing `UNIQUE(source, conversation_id, msg_id)`
101
+ constraint — same row from two directions never double-inserts.
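The round described above can be sketched with two in-memory stores. This is a minimal illustration under stated assumptions, not the package's implementation: `makeStore`, `insertOrIgnore`, `rowsAfter`, and `syncOnce` are hypothetical names, a row's local id is taken to be its insertion position, and a `Set` stands in for the `UNIQUE(source, conversation_id, msg_id)` constraint.

```javascript
// In-memory sketch of one bidirectional sync round (all names are hypothetical).
// A row's local id is its 1-based position in `rows`.

function makeStore() {
  return { rows: [], seen: new Set() };
}

// Stand-in for UNIQUE(source, conversation_id, msg_id).
function dedupKey(row) {
  return JSON.stringify([row.source, row.conversation_id, row.msg_id]);
}

// INSERT OR IGNORE: returns true only when the row was new to this store.
function insertOrIgnore(store, row) {
  const key = dedupKey(row);
  if (store.seen.has(key)) return false;
  store.seen.add(key);
  store.rows.push(row);
  return true;
}

// Rows whose local id is greater than `cursor`, in ascending id order.
function rowsAfter(store, cursor) {
  return store.rows.slice(cursor);
}

// One round from the spoke's point of view: pull, push, advance both cursors.
function syncOnce(local, remote, cursors) {
  for (const row of rowsAfter(remote, cursors.pulled_to)) insertOrIgnore(local, row);
  cursors.pulled_to = remote.rows.length;

  for (const row of rowsAfter(local, cursors.pushed_to)) insertOrIgnore(remote, row);
  cursors.pushed_to = local.rows.length;

  return cursors;
}
```

Note that rows pulled in one round are offered back on the push leg and again on the next pull; the dedup key absorbs both echoes, which is why the same row arriving from two directions never double-inserts.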
102
+
103
+ ---
104
+
105
+ ## Transports
106
+
107
+ Sync runs over HTTP/JSON. **How the bytes reach VPS** is independent of the
108
+ wire protocol — pick one:
109
+
110
+ | Transport | Best for | User setup steps |
111
+ |---|---|---|
112
+ | **SSH tunnel** | User already SSHes into VPS | Zero (autossh installed on demand) |
113
+ | **Tailscale** | Both devices on same tailnet | Zero (auto-detected) |
114
+ | **HTTPS + pair blob** | VPS reachable only via agent/bot (no SSH) | One paste from agent chat |
115
+ | **mDNS LAN** | Two devices on same Wi-Fi, no VPS | Zero (auto-discovery) |
116
+ | **Caddy + public HTTPS** | Advanced, want public access | Domain + Caddy install |
117
+
118
+ `memex-sync sync-join` (v0.13) automates the SSH-tunnel transport end-to-end —
119
+ the canonical lazy path. The full environment-probing wizard that picks among
120
+ ALL transports is Roadmap §1.
121
+
122
+ ### SSH tunnel (default for SSH-capable users)
123
+
124
+ Mac runs `autossh -N -L 8765:localhost:8765 user@vps` as a LaunchAgent. Sync
125
+ client talks to `http://localhost:8765`, bytes flow through SSH to VPS:8765.
126
+
127
+ Pro: zero new accounts, encryption from SSH.
128
+ Con: needs a tunnel-keeper daemon (autossh handles reconnect).
129
+
130
+ ### Tailscale (if available)
131
+
132
+ Mac talks to `http://memex-vps.tail-abc.ts.net:8765` directly. WireGuard
133
+ encryption and identity built in.
134
+
135
+ Pro: works through NAT, identity per device.
136
+ Con: requires Tailscale account (free for personal, 100 devices).
137
+
138
+ ### HTTPS + pair blob (lazy-user path)
139
+
140
+ VPS exposes `https://<host>:8765` with a self-signed cert. Client pins the
141
+ cert fingerprint baked into the pair blob. Bearer token in header authenticates
142
+ the request. No DNS, no Let's Encrypt, no SSH key — one paste from agent chat.
143
+
144
+ Pro: zero user terminal access to VPS required.
145
+ Con: VPS must have a reachable public IP/hostname.
146
+
147
+ ### mDNS LAN (no-VPS scenario) — planned
148
+
149
+ Two devices on the same Wi-Fi would announce themselves as `_memex._tcp.local`
150
+ and pair via trust-on-first-use, no VPS required. **Not built yet** — until then,
151
+ two LAN machines can still pair by running the server on one and `sync-add`-ing
152
+ its LAN IP from the other.
153
+
154
+ Pro: no VPS, no cloud, no account.
155
+ Con: works only when both devices are on the same network.
156
+
157
+ ---
158
+
159
+ ## Setup walkthrough
160
+
161
+ > All commands are gated behind `MEMEX_SYNC_EXPERIMENTAL=1` in v0.11.x.
162
+ > The CLI lives under the existing `memex-sync` binary (`memex-sync sync-*`).
163
+
164
+ ### Scenario 1 — lazy path: VPS you only reach through an agent
165
+
166
+ The hub (VPS) runs the server durably; the spoke (laptop) pairs with one paste.
167
+
168
+ **On the VPS, once** (or have your agent run it):
169
+
170
+ ```sh
171
+ export MEMEX_SYNC_EXPERIMENTAL=1
172
+ memex-sync sync-server install --port 8766 --bind 0.0.0.0 # durable systemd/launchd service
173
+ ```
174
+
175
+ **Get a pairing token.** Either ask your agent in chat —
176
+
177
+ > "set up memex sync with my Mac" / "generate a pairing code for sync"
178
+
179
+ — and it calls the **`memex_sync_invite`** MCP tool (requires
180
+ `MEMEX_SYNC_EXPERIMENTAL=1` in the memex MCP server's env), or run it by hand:
181
+
182
+ ```sh
183
+ memex-sync sync-server invite --host <public-ip> # prints memex-pair:...
184
+ ```
185
+
186
+ **On the laptop, one paste:**
187
+
188
+ ```sh
189
+ export MEMEX_SYNC_EXPERIMENTAL=1
190
+ memex-sync sync-pair memex-pair:eyJ2IjoxLCJob3N0Ijoi... # decodes host+port+cert_fp+token
191
+ memex-sync sync-run vps # first sync
192
+ memex-sync sync-schedule install --every 15m # hands-off from here
193
+ ```
194
+
195
+ Done. New conversations propagate within the interval, both directions.
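The `sync-pair` step in this scenario decodes a `memex-pair:` blob. The truncated example payload (`eyJ2IjoxLCJob3N0Ijoi...`) is the base64 form of JSON beginning `{"v":1,"host":"`, so a decoder can be sketched as below. The field names `port`, `cert_fp`, and `token` are assumptions inferred from the inline comment, and `decodePairBlob` / `remoteUrl` are hypothetical helpers, not the package's API.

```javascript
// Sketch: decode a `memex-pair:` blob, assuming the payload is base64-encoded JSON.
// Only `v` and `host` are visible in the documented example; the other field
// names are assumed.

const PAIR_PREFIX = 'memex-pair:';

function decodePairBlob(blob) {
  if (typeof blob !== 'string' || !blob.startsWith(PAIR_PREFIX)) {
    throw new Error('not a memex-pair blob');
  }
  const payload = blob.slice(PAIR_PREFIX.length);
  const json = Buffer.from(payload, 'base64').toString('utf8');
  const fields = JSON.parse(json); // throws on a truncated or corrupted paste
  for (const name of ['host', 'port', 'cert_fp', 'token']) {
    if (fields[name] === undefined) throw new Error(`pair blob is missing ${name}`);
  }
  return fields;
}

// The URL a spoke would register for the remote.
function remoteUrl(fields) {
  return `https://${fields.host}:${fields.port}`;
}
```

Failing loudly on a missing field matters here because a partially pasted blob would otherwise register a remote that can never pass the cert pin.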
196
+
197
+ ### Scenario 2 — Mac + VPS over an SSH tunnel
198
+
199
+ If you have SSH to the VPS, skip the public bind. Run the server on loopback,
200
+ forward the port yourself, and pass `--host localhost` to invite:
201
+
202
+ ```sh
203
+ # VPS
204
+ memex-sync sync-server install --port 8766 --bind 127.0.0.1
205
+ memex-sync sync-server invite --host localhost # blob targets localhost
206
+
207
+ # Mac — keep this tunnel up (autossh/LaunchAgent automation is a follow-up)
208
+ ssh -N -L 8766:localhost:8766 user@vps &
209
+ memex-sync sync-pair memex-pair:... # → https://localhost:8766
210
+ memex-sync sync-run vps
211
+ ```
212
+
213
+ ### Scenario 3 — Tailscale
214
+
215
+ Both machines on one tailnet: `invite --host <vps>.tail-xxxx.ts.net`, then
216
+ `sync-pair` on the laptop. WireGuard handles encryption + NAT; the cert pin in
217
+ the blob still applies.
218
+
219
+ ### Manual fallback (no pair blob)
220
+
221
+ `sync-pair` is just sugar over `sync-add`. The explicit form:
222
+
223
+ ```sh
224
+ memex-sync sync-add vps https://<host>:8766 <bearer-hex> --cert-fp sha256:AA:BB:...
225
+ # or, over a transport you already trust (SSH tunnel / Tailscale):
226
+ memex-sync sync-add vps https://localhost:8766 <bearer-hex> --insecure
227
+ ```
228
+
229
+ ### Command reference
230
+
231
+ | Command | Side | What |
232
+ |---|---|---|
233
+ | `sync-server install / uninstall / status` | hub | durable server service |
234
+ | `sync-server start` | hub | foreground server |
235
+ | `sync-server invite [--host H] [--port N] [--ttl 30]` | hub | print a pair blob |
236
+ | `sync-pair <blob> [--alias vps]` | spoke | register a remote from a blob |
237
+ | `sync-add <alias> <url> <bearer> (--cert-fp F \| --insecure)` | spoke | register a remote explicitly |
238
+ | `sync-run <alias> \| --all` | spoke | one bidirectional sync |
239
+ | `sync-schedule install [--every 15m] / uninstall / status` | spoke | hands-off auto-sync timer |
240
+ | `sync-list / sync-remove <alias> / sync-status` | spoke | inspect / manage remotes |
241
+ | `memex_sync_invite` (MCP tool) | hub | agent emits a pair blob from a chat phrase |
242
+
243
+ > **Not yet automated (manual today, planned):** autossh tunnel management,
244
+ > Tailscale auto-detection, and mDNS LAN discovery (`_memex._tcp.local` for two
245
+ > machines on the same Wi-Fi with no VPS). The transports themselves work today
246
+ > via the manual steps above.
247
+
248
+ ---
249
+
250
+ ## Wire protocol (spec)
251
+
252
+ > Implementers: this is the source of truth. Anything that diverges from this
253
+ > section is a bug.
254
+
255
+ ### Endpoints
256
+
257
+ ```
258
+ POST /sync/push
259
+ Authorization: Bearer <token>
260
+ Content-Type: application/json
261
+ Body: {
262
+ "rows": [Row, Row, ...] // 1..1000 messages
263
+ }
264
+
265
+ Response 200: {
266
+ "accepted": N, // rows inserted (newly seen by us)
267
+ "deduplicated": M, // rows we already had (UNIQUE constraint hit)
268
+ "last_id": <int> // our local id of the highest-ranked row
269
+ // — useful for client log/debug
270
+ }
271
+ Response 401: { "error": "unauthorized" }
272
+ Response 400: { "error": "bad_request", "detail": "..." }
273
+ Response 413: { "error": "payload_too_large" } // >2MB body
274
+ ```
275
+
276
+ ```
277
+ GET /sync/pull?since=<int>&limit=<int>
278
+ Authorization: Bearer <token>
279
+
280
+ Query:
281
+ since — local id of caller's last-seen row from us; 0 for first pull
282
+ limit — max rows to return; default 500, max 1000
283
+
284
+ Response 200: {
285
+ "rows": [Row, Row, ...],
286
+ "next_cursor": <int>, // id of the last row in this batch
287
+ "has_more": bool, // true → caller should call again with
288
+ // since=next_cursor immediately
289
+ "server_now": <int> // our wall clock at response time (ms epoch)
290
+ // — informational
291
+ }
292
+ ```
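The `has_more` / `next_cursor` contract of `GET /sync/pull` implies a simple client loop. The sketch below assumes a hypothetical `fetchPage(since, limit)` callback that returns the parsed response body; it is written synchronously for brevity, whereas a real client would await an authenticated HTTPS request.

```javascript
// Sketch of a client-side pull loop. `fetchPage` is a hypothetical transport
// callback returning { rows, next_cursor, has_more }.

function pullAll(fetchPage, since, onRows, limit = 500) {
  let cursor = since;
  let total = 0;
  for (;;) {
    const page = fetchPage(cursor, limit);
    if (page.rows.length > 0) {
      onRows(page.rows);         // caller performs INSERT OR IGNORE here
      total += page.rows.length;
      cursor = page.next_cursor; // id of the last row in this batch
    }
    if (!page.has_more) break;
    // Guard against a server that reports more data but returns nothing.
    if (page.rows.length === 0) throw new Error('has_more set on an empty page');
  }
  return { cursor, total };
}
```

The cursor only advances after `onRows` returns, so a crash mid-batch re-requests the same rows, which is safe because the receiver dedups.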
293
+
294
+ ```
295
+ GET /sync/health
296
+ Authorization: Bearer <token> // optional — token gates extra detail
297
+
298
+ Response 200: {
299
+ "version": "0.11.11",
300
+ "schema_version": 12,
301
+ "row_count": <int>, // total messages in our DB
302
+ "last_id": <int> // highest message id we hold
303
+ }
304
+ ```
305
+
306
+ ### Row shape
307
+
308
+ A `Row` is exactly the JSON representation of a `messages` table row, plus
309
+ the parent `conversation` metadata necessary to materialize the row on the
310
+ other side:
311
+
312
+ ```json
313
+ {
314
+ "source": "claude-code",
315
+ "conversation_id": "claude-code-<uuid>",
316
+ "msg_id": "<source-specific-stable-id>",
317
+ "uuid": "<v4-uuid>",
318
+ "role": "user|assistant|system|tool|boundary|summary",
319
+ "sender": "me|claude-code|...",
320
+ "text": "raw verbatim content",
321
+ "ts": 1716800000, // source-original timestamp (seconds)
322
+ "edited_at": 1716800042000, // ms; null if never edited
323
+ "channel": "telegram|kimi-web|system|null",
324
+ "metadata": "{...json-string...}",
325
+ "conversation": {
326
+ "title": "...",
327
+ "first_ts": 1716700000,
328
+ "last_ts": 1716800000,
329
+ "project_path": "/Users/x/work|null",
330
+ "parent_conversation_id": "...|null"
331
+ }
332
+ }
333
+ ```
334
+
335
+ **Required fields:** `source`, `conversation_id`, `role`, `text`, `ts`.
336
+ **Stable identity for dedup:** `(source, conversation_id, msg_id)` — `msg_id`
337
+ may be null but if so the row is considered ephemeral and is NOT synced.
338
+ **Portable global identity:** `uuid` — populated by writer; if absent on a
339
+ synced row, receiver generates one on insert (so future pulls can refer to it).
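The field rules above translate into a small pre-send check. This is a sketch with illustrative helper names (`missingFields`, `isSyncable`, `partitionForSync`); it treats `null` and `undefined` as missing, which is an assumption about how absent fields are represented.

```javascript
// Sketch of a pre-send check derived from the required-field and msg_id rules.

const REQUIRED_FIELDS = ['source', 'conversation_id', 'role', 'text', 'ts'];

// Required fields that are absent or null on this row.
function missingFields(row) {
  return REQUIRED_FIELDS.filter((name) => row[name] === undefined || row[name] === null);
}

// A row is synced only if it is complete and carries a stable msg_id.
function isSyncable(row) {
  if (missingFields(row).length > 0) return false;
  return row.msg_id !== undefined && row.msg_id !== null;
}

// Split a batch into rows to send and rows that stay local-only.
function partitionForSync(rows) {
  const send = [];
  const skip = [];
  for (const row of rows) (isSyncable(row) ? send : skip).push(row);
  return { send, skip };
}
```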
340
+
341
+ ### Cursor semantics
342
+
343
+ A **cursor** is one integer: the sender's local `messages.id` of the last row
344
+ the receiver has taken from it. Cursor is **per-peer, per-direction**:
345
+
346
+ ```
347
+ client_config.json:
348
+ "remotes": {
349
+ "vps": {
350
+ "url": "http://localhost:8765",
351
+ "bearer": "...",
352
+ "pulled_to": 18472, // we've pulled VPS rows up to its id 18472
353
+ "pushed_to": 9341 // we've pushed our rows up to our id 9341
354
+ }
355
+ }
356
+ ```
357
+
358
+ Both endpoints are **strictly monotonic per peer**. Pull returns rows with
359
+ `id > since` ordered ASC by id. Push always sends rows with `id > pushed_to`
360
+ ordered ASC. Receivers never assume cursor monotonicity beyond a single peer.
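On the push side, the cursor rules amount to "select rows above `pushed_to`, send them in ascending order, then advance". The sketch below assumes `localRows` is an array standing in for a query such as `SELECT ... WHERE id > ? ORDER BY id ASC LIMIT ?`; the query and both helper names are hypothetical.

```javascript
// Sketch: choose the next push batch and advance `pushed_to` afterwards.

const MAX_PUSH_ROWS = 1000; // the push body carries 1..1000 rows

function nextPushBatch(localRows, pushedTo, limit = MAX_PUSH_ROWS) {
  return localRows
    .filter((row) => row.id > pushedTo)
    .sort((a, b) => a.id - b.id)
    .slice(0, limit);
}

// Call only after the server has acknowledged the batch.
function advanceCursor(remote, batch) {
  if (batch.length === 0) return remote;
  const lastId = batch[batch.length - 1].id;
  // Never move backwards: the cursor is monotonic per peer.
  return { ...remote, pushed_to: Math.max(remote.pushed_to, lastId) };
}
```

Advancing only after acknowledgement keeps the push at-least-once: a failed request is simply retried from the old cursor.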
361
+
362
+ ### Idempotency
363
+
364
+ Push is **at-least-once**. Two identical push requests produce identical state
365
+ on the server (UNIQUE constraint absorbs dupes). The client is free to retry
366
+ indefinitely.
367
+
368
+ Pull is **at-least-once**. The client may receive the same row twice across
369
+ retries (e.g. network failure mid-batch). It must INSERT OR IGNORE on its side.
370
+
371
+ ### Conversation upsert
372
+
373
+ `messages` and `conversations` are separate tables linked by `conversation_id`.
374
+ On every push, the receiver:
375
+
376
+ 1. UPSERTs `conversations` row from `row.conversation` (latest values win on
377
+ `title`, `last_ts`, `message_count`).
378
+ 2. INSERT OR IGNOREs the message via UNIQUE.
379
+
380
+ This way a conversation that exists only on Mac becomes a real row on VPS the
381
+ first time any of its messages arrives.
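The two receiver steps can be sketched with in-memory maps standing in for the `conversations` and `messages` tables. Assumptions: incoming conversation values simply overwrite stored ones (a simplification of "latest values win"), `last_id` is read as the highest local message id after the batch, and `createDb` / `applyPush` are hypothetical names.

```javascript
// Sketch of the receiver side of POST /sync/push.

function createDb() {
  return { conversations: new Map(), messages: new Map(), nextId: 1 };
}

function applyPush(db, rows) {
  let accepted = 0;
  let deduplicated = 0;
  for (const row of rows) {
    // 1. UPSERT the parent conversation; incoming values overwrite stored ones.
    const previous = db.conversations.get(row.conversation_id) || {};
    db.conversations.set(row.conversation_id, { ...previous, ...row.conversation });

    // 2. INSERT OR IGNORE the message on (source, conversation_id, msg_id).
    const key = JSON.stringify([row.source, row.conversation_id, row.msg_id]);
    if (db.messages.has(key)) {
      deduplicated += 1;
    } else {
      const { conversation, ...message } = row; // store the message without the nested metadata
      db.messages.set(key, { id: db.nextId++, ...message });
      accepted += 1;
    }
  }
  // One reading of `last_id`: the highest local message id after this batch.
  return { accepted, deduplicated, last_id: db.nextId - 1 };
}
```

Replaying the same batch changes nothing but the counters, which is the idempotency property the retry rules rely on.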
382
+
383
+ ### Schema-version handshake
384
+
385
+ `GET /sync/health` reports `schema_version`. Client and server must match
386
+ **major schema version**. If client < server schema version: client refuses to
387
+ sync, prints "upgrade memex on this side". If client > server: same.
388
+
389
+ Schema versions bump only when wire shape changes (column adds that affect
390
+ sync). Pure additive changes that don't ship over the wire don't bump.
391
+
392
+ Initial sync schema version: **12**.
393
+
394
+ ### Error semantics
395
+
396
+ | Code | Meaning | Client action |
397
+ |------|---------|---------------|
398
+ | 200 | OK | Continue |
399
+ | 400 | Bad request body | Log + abort; don't retry; this is a bug |
400
+ | 401 | Unauthorized | Token rotation needed; abort sync until reconfigured |
401
+ | 409 | Schema mismatch | Print upgrade instruction; abort |
402
+ | 413 | Payload too large | Reduce batch size and retry |
403
+ | 429 | Rate limited (too many concurrent pushes) | Honor Retry-After header |
404
+ | 500 | Server error | Exponential backoff, retry |
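The table maps directly onto a client-side dispatch function. In this sketch the backoff constants (1 s base, 5 min cap, 60 s default when `Retry-After` is missing) are assumptions, as is the shape of the returned object; `clientAction` is a hypothetical helper.

```javascript
// Sketch: map a sync response status to the client action from the error table.
// `attempt` counts consecutive failures; `headers` uses lowercase keys.

function clientAction(status, headers = {}, attempt = 0) {
  switch (status) {
    case 200:
      return { action: 'continue' };
    case 400:
      return { action: 'abort', reason: 'bad request (client bug), do not retry' };
    case 401:
      return { action: 'abort', reason: 'unauthorized, reconfigure the token' };
    case 409:
      return { action: 'abort', reason: 'schema mismatch, upgrade memex' };
    case 413:
      return { action: 'retry', shrinkBatch: true, delayMs: 0 };
    case 429: {
      const seconds = Number(headers['retry-after']);
      return { action: 'retry', delayMs: (Number.isFinite(seconds) ? seconds : 60) * 1000 };
    }
    default:
      if (status >= 500) {
        // Exponential backoff: 1s, 2s, 4s, ... capped at 5 minutes.
        return { action: 'retry', delayMs: Math.min(1000 * 2 ** attempt, 300000) };
      }
      return { action: 'abort', reason: `unexpected status ${status}` };
  }
}
```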
405
+
406
+ ### Rate limits
407
+
408
+ The server may rate-limit per-token at **10 push requests per minute** and
409
+ **60 pull requests per minute**. Bursting above this returns 429 with
410
+ `Retry-After: <seconds>` header.
411
+
412
+ These limits exist to bound the worst case of a misconfigured client and are
413
+ generous for normal operation.
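One way to enforce these numbers is a per-token sliding window. The spec fixes only the limits and the 429 reply, so the window algorithm, the injected clock, and the `createLimiter` name below are all assumptions.

```javascript
// Sketch of a per-token sliding-window limiter for the documented limits.

const LIMITS = { push: 10, pull: 60 }; // requests per minute, per token
const WINDOW_MS = 60000;

function createLimiter() {
  const hits = new Map(); // "token:kind" -> timestamps (ms) still inside the window

  return function check(token, kind, nowMs) {
    const key = `${token}:${kind}`;
    const recent = (hits.get(key) || []).filter((t) => nowMs - t < WINDOW_MS);
    if (recent.length >= LIMITS[kind]) {
      hits.set(key, recent);
      // Seconds until the oldest recorded hit leaves the window.
      const retryAfter = Math.ceil((recent[0] + WINDOW_MS - nowMs) / 1000);
      return { allowed: false, status: 429, retryAfter };
    }
    recent.push(nowMs);
    hits.set(key, recent);
    return { allowed: true };
  };
}
```

Passing the clock in as `nowMs` keeps the limiter deterministic under test; a server would call it with `Date.now()`.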
414
+
415
+ ---
416
+
417
+ ## Security model
418
+
419
+ ### Authentication
420
+
421
+ **Bearer tokens** — 256-bit random, generated by `memex-sync sync-server invite` on the
422
+ server side. Token is in `Authorization: Bearer <hex>` header on every request.
423
+
424
+ Tokens are stored on disk in `~/.memex/config.json` (mode 0600).
425
+
426
+ `memex sync rotate-token` invalidates the current token and prints a new pair
427
+ blob. Pre-existing connected clients break until they re-pair.
428
+
429
+ ### Transport encryption
430
+
431
+ | Transport | How encryption is achieved |
432
+ |-----------|----------------------------|
433
+ | HTTPS + pair blob | Self-signed TLS, client pins server cert fingerprint |
434
+ | SSH tunnel | SSH transport |
435
+ | Tailscale | WireGuard tunnel between nodes |
436
+ | mDNS LAN | TLS with pinned fingerprint (same as HTTPS path) |
437
+ | Caddy + public HTTPS | Let's Encrypt-issued cert |
438
+
439
+ **Self-signed certs are pinned**: client refuses to talk to the server if the
440
+ TLS cert fingerprint doesn't match what was baked into the pair blob. This is
441
+ the same mechanism Plex/Tailscale/etc. use for device-to-device trust.
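A pin check along these lines can be sketched with Node's built-in `crypto` module. The `sha256:AA:BB:...` format follows the `--cert-fp` example in this document; hashing the DER-encoded certificate is an assumption about what the fingerprint covers, and the helper names are hypothetical.

```javascript
const { createHash, timingSafeEqual } = require('node:crypto');

// Format the SHA-256 digest of a DER-encoded certificate as `sha256:AA:BB:...`.
function certFingerprint(derBytes) {
  const hex = createHash('sha256').update(derBytes).digest('hex').toUpperCase();
  return 'sha256:' + hex.match(/.{2}/g).join(':');
}

// Strip the prefix and colons so case and separators do not affect comparison.
function normalizeFingerprint(fp) {
  return fp.replace(/^sha256:/i, '').replace(/:/g, '').toLowerCase();
}

// True only when the presented certificate hashes to the pinned value.
function pinMatches(pinned, derBytes) {
  const expected = Buffer.from(normalizeFingerprint(pinned), 'hex');
  const actual = Buffer.from(normalizeFingerprint(certFingerprint(derBytes)), 'hex');
  return expected.length === actual.length && timingSafeEqual(expected, actual);
}
```

In a Node TLS client the DER bytes are available as `socket.getPeerCertificate().raw`, so the check can run right after the handshake and before any bearer token is sent.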
442
+
443
+ ### Threat model
444
+
445
+ | Threat | Mitigation |
446
+ |--------|------------|
447
+ | Attacker on network sees bearer token | TLS encryption blocks |
448
+ | Attacker MITMs and replaces TLS cert | Cert pinning rejects |
449
+ | Stolen bearer token | `memex sync rotate-token` invalidates |
450
+ | Replay attack | Idempotent endpoints — no harm; receiver dedups |
451
+ | Malicious peer pushes garbage rows | Rate limit + payload size cap; rows still need valid `source/conv_id/msg_id` shape |
452
+ | Compromised peer pulls all our data | Bearer auth is binary (token = full access); for least-privilege you'd need per-source ACLs (future work) |
453
+
454
+ ### Out of scope for security v1
455
+
456
+ - Per-conversation ACL (a peer can pull all your conversations or none)
457
+ - E2E encryption of payloads (we rely on transport encryption)
458
+ - mTLS (you can layer it on if you use Caddy)
459
+ - Signed rows (verifiable origin) — possible v2 if needed
460
+
461
+ ---
462
+
463
+ ## Trade-offs we made
464
+
465
+ | Choice | Why | Lose |
466
+ |---|---|---|
467
+ | HTTP push/pull + cursors | Replicache 2026 consensus pattern; idempotent; simple | Real-time — sync is up to 15 min stale |
468
+ | Local AUTOINCREMENT id as cursor | Per-DB monotonic, zero design overhead | Cursors not portable; each peer has its own |
469
+ | Self-signed cert + pinning | Zero DNS/CA infrastructure | Browser tooling can't poke the endpoint |
470
+ | Bearer token (not OAuth) | Days vs weeks to ship | Manual rotation |
471
+ | UNIQUE-constraint dedup | We don't edit verbatim — perfect fit | Cannot reconcile two divergent edits to the same logical row (we don't do that) |
472
+ | Skip CRDT / cr-sqlite | Maintenance risk + extension dependency | If we ever want concurrent-edit reconciliation, we'd need to revisit |
473
+ | Hub-and-spoke for 2 nodes | P2P degenerate at N=2; VPS always-on anyway | Single point of failure (mitigated: laptop keeps full local copy) |
474
+ | Schema-version handshake | Refuse to silently corrupt data on version skew | Coupling clients to specific server versions |
475
+
476
+ ---
477
+
478
+ ## Out of scope (deliberately)
479
+
480
+ - **Selective sync per conversation** — v2. v1 syncs everything.
481
+ - **Web UI for sync state** — `memex-sync sync-status` CLI is the surface.
482
+ - **Multi-VPS / N-device sync** — works (each Mac points at one VPS) but the
483
+ config UX is single-pair-only in v1.
484
+ - **Sync of archived conversations** — currently archive is a local-only flag.
485
+ TBD whether archives should sync.
486
+ - **End-to-end encryption** — transport encryption is enough for v1 given the
487
+ threat model.
488
+ - **Cloud relay** — never. Against memex's local-first principle.
489
+
490
+ ---
491
+
492
+ ## Deployed patterns (what's been proven live)
493
+
494
+ The "lazy-user mesh" got real-world stress tests over several days. These are
495
+ patterns we observed working end-to-end, in order of decreasing dependence on
496
+ network privilege.
497
+
498
+ ### A. VPS-as-hub on a public port (the assumed default)
499
+
500
+ VPS exposes the sync-server on some port (e.g., 8766 or 443 behind nginx).
501
+ Spokes dial it directly. Works when: VPS firewall + spokes' egress both allow
502
+ the port.
503
+
504
+ **Fragility we hit:** ISP/VPN/cloud-SG can silently start blocking the port
505
+ that "worked yesterday" — without a guest reboot or any user change. We saw
506
+ this on a HOSTKEY VPS where 8766 just stopped passing externally. Public
507
+ ports are subject to anyone's firewall above the OS.
508
+
509
+ ### B. SSH tunnel (spoke initiates `ssh -L`) ⭐ THE CANONICAL PATTERN — automated by `sync-join`
510
+
511
+ The spoke runs `ssh -L 8766:localhost:8766 user@vps` over the existing port 22; the sync
512
+ client talks to `localhost:8766`. The hub doesn't expose 8766 publicly; this works as long as
513
+ the Mac's VPN/proxy relays SSH, which it usually does for standard ports.
514
+
515
+ This is what `memex-sync sync-join` builds (v0.13): the always-on server is
516
+ the hub on loopback, the laptop dials out with a supervised `-L` tunnel.
517
+ Strictly better than C for the common laptop+server case — the authoritative
518
+ always-reachable node is the one that's actually always on. **Live since
519
+ 2026-06-11** on the maintainer's own Mac↔VPS pair (migrated off pattern C
520
+ via `sync-join` itself; built-in marker self-test round-tripped in 3.4s).
521
+
522
+ ### C. Mac-as-hub via reverse SSH tunnel (`ssh -R`) ⭐
523
+
524
+ The inversion that solved everything when public ports failed across the board:
525
+
526
+ ```
527
+ Mac runs sync-server on localhost:8766 (non-privileged, no root)
528
+ Mac runs: ssh -fN -R 8766:127.0.0.1:8766 user@vps
529
+
530
+ On the VPS, sshd creates a LOOPBACK listener on 127.0.0.1:8766
531
+ that forwards through the existing SSH connection back to Mac.
532
+
533
+ The VPS-side memex agent then runs:
534
+ sync-add mac https://localhost:8766 <mac-bearer> --insecure
535
+ sync-run mac
536
+ sync-schedule install --every 5m
537
+ ```
538
+
539
+ The radical property: **the only port traversed is 22, which is already open
540
+ on every VPS by definition (otherwise you couldn't have provisioned it).**
541
+ No firewall change anywhere. The sync-server is bound to loopback on both
542
+ ends — nothing public, anywhere. Cloud SG, ufw, the Mac's full-tunnel VPN
543
+ proxy — all irrelevant.
544
+
545
+ The trade-off: the Mac is a laptop. When it sleeps or the network changes,
546
+ the SSH tunnel dies. The VPS's scheduler then sees `peer unreachable` for
547
+ each tick until Mac wakes and re-establishes the tunnel. Acceptable for
548
+ "used-daily-driver" workflows; sync pauses, never loses data.
549
+
550
+ ### D. Transit-hub: chained `ssh -R` for a node Mac can't reach directly
551
+
552
+ Pattern C breaks when the Mac's VPN proxy refuses to relay SSH to a particular
553
+ destination (we saw this on Mac → Alibaba Asia: banner-exchange timeout even
554
+ though SSH worked fine to a European VPS). The fix: that node initiates its
555
+ own `ssh -R` to a third node that Mac CAN reach.
556
+
557
+ ```
558
+ Mac (sync-server localhost:8766)
559
+
560
+ │ ssh -fN -R 8766:localhost:8766 (Mac → VPS-EU)
561
+
562
+ VPS-EU (transit-hub, openclaw user, no sudo)
563
+ ├ localhost:8766 = Mac via Mac's tunnel
564
+ └ localhost:8767 = Asia VPS via its own tunnel
565
+
566
+ │ ssh -fN -R 8767:localhost:8766 (Asia VPS → VPS-EU)
567
+
568
+ Asia VPS (sync-server localhost:8766)
569
+ ```
570
+
571
+ The transit-hub runs `sync-run --all` periodically and converges everyone.
572
+ No spoke ever exposes a public port; the transit-hub only exposes port 22
573
+ (which was already open); the only required network capability anywhere is
574
+ "outbound SSH to one node the other spokes can also reach outbound."
575
+
576
+ This generalizes: any number of spokes can join the same transit-hub by
577
+ reverse-tunneling in. The transit-hub's bearer is the only thing that's
578
+ shared. Each spoke needs SSH access to the transit-hub (one pubkey paste
579
+ into `~/.ssh/authorized_keys` per spoke, no sudo).
580
+
581
+ **Real session evidence:** the 3-node mesh (Mac in San Francisco / VPN + a
582
+ HOSTKEY VPS in Milan + an Alibaba VPS in Asia) ran this exact topology after
583
+ every other public-port approach hit a firewall wall. 33k + 7k rows synced
584
+ cleanly via SSH tunnels at ~165 s/round.
585
+
586
+ **Topology update (2026-06-11):** the Mac↔VPS-EU leg has since migrated to
587
+ the canonical pattern B via `sync-join` (VPS-EU is now the hub on loopback;
588
+ the Mac dials in with a supervised `-L` tunnel). The Asia spoke still uses
589
+ its D-style reverse tunnel into VPS-EU, whose 5-min schedule keeps the
590
+ third node converged — C/D remain the right tools when a node can't be a
591
+ normal `-L` client.
592
+
593
+ ---
594
+
595
+ ## Roadmap / backlog
596
+
597
+ Surfaced while taking sync from tracer-bullet to a live 3-node mesh
598
+ (Mac + two VPSes). Ordered roughly by priority.
599
+
600
+ ### 1. Mesh-bootstrap wizard ⭐ (top priority, the consolidating product feature)
601
+
602
+ The end-state of everything else below. A prompt-driven, agent-mediated
603
+ setup that empirically discovers reachability between user's nodes, picks
604
+ the best topology automatically (deployed pattern A → B → C → D from above,
605
+ in decreasing order of "ideal" reachability), and emits ready-to-paste
606
+ prompts for each agent. The user never has to know whether their setup is
607
+ "VPS-as-hub" or "Mac-as-hub" or "transit-hub" — the wizard figures it out
608
+ and explains the choice.
609
+
610
+ **UX sketch** (interactive, via Mac CLI or a memex MCP tool):
611
+
612
+ ```
613
+ $ memex-sync mesh bootstrap
614
+
615
+ Wizard: Which agents do you have? (multi-select)
616
+ [ ] OpenClaw [ ] Hermes [ ] Kimi [ ] Custom
617
+
618
+ Wizard: For each, paste the probe prompt into the agent's chat and paste the
619
+ reply back here. (The probe is read-only — `whoami`, `nc -z github.com:443`,
620
+ `ss -ltn`, `sudo -ln`, etc.)
621
+
622
+ [Wizard parses all replies]
623
+
624
+ Wizard: Building reachability matrix…
625
+ ✓ Mac can reach OpenClaw-VPS on :22 (banner OK, ~120ms)
626
+ ✗ Mac can reach Kimi-VPS on :22 (banner timeout — your VPN proxy
627
+ relays SSH to Europe, drops it to Asia)
628
+ ✓ Kimi-VPS can reach OpenClaw-VPS on :22 (Asia → Europe, ~280ms)
629
+ ✓ All three reach github.com:443 — internet works everywhere
630
+
631
+ Wizard: Best plan: **Mac-as-hub with OpenClaw-VPS as transit**.
632
+ Why: Kimi can't be reached from Mac directly (proxy block); but
633
+ Kimi CAN reach OpenClaw-VPS, which Mac CAN reach. So OpenClaw-VPS
634
+ becomes a transit point. (Deployed pattern D from SYNC.md.)
635
+ No public ports needed anywhere.
636
+
637
+ [show topology diagram]
638
+ [confirm: y/n]
639
+
640
+ Wizard: Generating setup prompts. Paste each into the indicated chat:
641
+
642
+ — Prompt for OpenClaw: [add Mac pubkey + add Kimi pubkey + sync-add mac + sync-schedule]
643
+ — Prompt for Kimi: [ssh -R outbound to OpenClaw]
644
+ — Mac will: [ssh -R outbound to OpenClaw + start local sync-server]
645
+
646
+ Wizard: Paste OpenClaw's reply confirming pubkeys added…
647
+ Paste Kimi's reply confirming ssh -R up…
648
+
649
+ Wizard: Establishing Mac's ssh -R… [does it]
650
+ Verifying via marker propagation… [posts a marker to local sync-server,
651
+ polls each agent over their tunneled connection until the marker
652
+ appears in their DB]
653
+ ✓ marker reached OpenClaw in 8s
654
+ ✓ marker reached Kimi in 14s (via transit)
655
+ Mesh up.
656
+
657
+ Wizard: Installing durability layer…
658
+ ✓ LaunchAgent on Mac with autossh-style retry for ssh -R to OpenClaw
659
+ ✓ systemd-user on Kimi for ssh -R to OpenClaw with restart
660
+ Mesh self-heals on Mac sleep/network change.
661
+ ```
662
+
663
+ **Implementation shape:**
664
+
665
+ - `lib/sync/bootstrap.js` — a state-machine wizard.
666
+ - `memex_mesh_bootstrap` MCP tool — agent-facing entry; Mac's Claude Code
667
+ invokes it; the conversation IS the wizard.
668
+ - `scripts/probe-prompt.sh` (or generated) — the standard read-only probe
669
+ any agent runs once and replies with structured output.
670
+ - Topology decision is the empirical heart: a reachability matrix +
671
+ preference order (A > B > C > D in the deployed-patterns section).
672
+ - Marker propagation is the end-to-end test: it posts a known message via Mac's
673
+ sync-server, polls each remote until it appears, and reports per-hop latency.
674
+
675
+ This is the consolidating feature that absorbs items 2, 3, and the older
676
+ auto-hub election idea: every constituent decision (which port to use, when
677
+ to fall back to SSH, when to ssh -R, whether to ssh-R-chain) becomes a
678
+ branch in the wizard's decision tree, made from empirical data rather than
679
+ operator guesswork.
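
The transit branch of that decision tree can be sketched in a few lines. This is a hypothetical illustration, not code from `lib/sync/bootstrap.js`: the matrix shape (`reach[from][to]` booleans) and the name `pickPlan` are assumptions, and only the direct-versus-transit choice (the pattern D case from the UX sketch) is modelled. The full A > B > C > D preference order is not.

```javascript
// Hypothetical sketch: pick a route from hub to spoke using a reachability matrix.
// reach[from][to] === true means `from` can open a connection to `to`.
function pickPlan(reach, hub, spoke) {
  const canReach = (from, to) => Boolean(reach[from] && reach[from][to]);

  // Direct link: no transit needed.
  if (canReach(hub, spoke)) return { kind: "direct", via: null };

  // Transit: a third node that both the hub and the spoke can dial out to.
  for (const node of Object.keys(reach)) {
    if (node === hub || node === spoke) continue;
    if (canReach(hub, node) && canReach(spoke, node)) {
      return { kind: "transit", via: node };
    }
  }
  return { kind: "unreachable", via: null };
}

// The matrix from the UX sketch: Mac cannot reach Kimi, but both reach OpenClaw.
const reach = {
  mac: { openclaw: true, kimi: false },
  kimi: { openclaw: true, mac: false },
  openclaw: {},
};
console.log(pickPlan(reach, "mac", "kimi")); // { kind: 'transit', via: 'openclaw' }
```

A real wizard would presumably also weigh latency and whether public ports are needed; this sketch returns the first viable transit node only.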
680
+
681
+ ### 1b. Self-healing tunnels — durability that ships to every user ⭐
682
+
683
+ When the mesh runs on patterns C or D (SSH reverse tunnels), the tunnels
684
+ themselves are fragile: laptop sleep, network change, VPN toggle, or VPS reboot
685
+ kills them. **This is not theoretical — it cost real data.** On 2026-06-07 the
686
+ Mac↔VPS1 tunnel had been dead since a sleep on ~06-02; six days of OpenClaw
687
+ research (incl. the whole ECC investigation, ~125 rows) sat stranded on the VPS
688
+ and never reached the main store. The 5-min schedule kept firing into a dead
689
+ tunnel and **failed silently**. Two lessons drive this design:
690
+
691
+ 1. **Auto-heal** so breaks are rare and short.
692
+ 2. **Never fail silently** — when a break *can't* self-heal (key rotated, VPS
693
+ gone, account suspended), the user must find out in hours, not days. The
694
+ second matters more than the first.
695
+
696
+ #### Proven reference implementation (live, 2026-06-07)
697
+
698
+ Hand-built and verified on the Mac hub — this is the prototype the product
699
+ should generate:
700
+
701
+ - `~/.memex/sync-tunnel.sh` — `exec ssh -N … -o ServerAliveInterval=30
702
+ -o ServerAliveCountMax=3 -o ExitOnForwardFailure=yes -R 127.0.0.1:8766:127.0.0.1:8766
703
+ openclaw@VPS` (foreground, no `-f`; explicit IPv4 loopback to avoid the
704
+ earlier `::1`-only bind bug).
705
+ - `~/Library/LaunchAgents/com.parallelclaw.memex.synctunnel.plist` — `KeepAlive=true`
706
+ + `RunAtLoad=true` + `ThrottleInterval=15`. launchd respawns ssh whenever it
707
+ exits (sleep/wake, network change, drop).
708
+ - **Self-test passed**: killed the ssh PID → launchd respawned it in ~15s →
709
+ loopback listener + sync endpoint (cert D2:96) back automatically.
710
+
711
+ #### Architectural decision: supervise the tunnel *inside the memex daemon*
712
+
713
+ The reference is a "dumb" OS supervisor over a raw `ssh`. The product should go
714
+ one level up: **fold tunnel supervision into the long-running memex process**
715
+ (the sync-server / capture daemon the OS already keeps alive). Then the OS keeps
716
+ ONE thing alive (memex); memex keeps the tunnel alive. Benefits:
717
+
718
+ - **One supervisor tree**, not two (OS→ssh becomes OS→memex→ssh).
719
+ - memex still delegates crypto/transport to `ssh` (no SSH reimpl) but owns
720
+ **retry/backoff, error classification, and health** — so it can show status
721
+ and surface failure (impossible when ssh is an opaque sibling of launchd).
722
+ - `ssh` stays a child process; on hub nodes no *new* OS unit is needed (reuse
723
+ the existing sync-server unit). Spoke-only nodes (no server) get a dedicated
724
+ keeper unit.
725
+
726
+ #### Components
727
+
728
+ 1. **In-daemon tunnel keeper.** Tunnel spec in `config.json`
729
+ (`{peer, direction, local_port, remote_port, ssh_target, identity}`).
730
+ Supervisor loop: spawn ssh → on exit, **classify** the failure:
731
+ - *auth failure* → STOP + surface (don't loop forever on a dead key);
732
+ - *network unreachable* → exponential backoff (cap ~2 min);
733
+ - *bind conflict* (stale remote listener after a drop) → fast retry, the
734
+ listener clears within seconds;
735
+ - *clean drop* → immediate re-establish.
736
+ 2. **Zombie-tunnel detection.** TCP-up ≠ data-flowing (`nc -z` reporting success through
737
+ the Xray proxy has already bitten us). Keeper periodically curls `/sync/health` with
738
+ the bearer *through* the tunnel; if TCP is up but data is dead, recycle it.
739
+ 3. **OS-unit generation** (extend `lib/sync/service.js`, which already builds
740
+ server + schedule units). Add tunnel-keeper variants only where a spoke runs
741
+ no server: `buildTunnelLaunchAgentPlist` (KeepAlive) / `buildTunnelSystemdUnit`
742
+ (`Restart=always` + `RestartSec` + linger). Reuse the existing
743
+ `MEMEX_SYNC_EXPERIMENTAL` injection + log-path conventions.
744
+ 4. **Observability — `memex sync status`.** Per peer: tunnel state
745
+ (up/healing/down), last successful sync, last heal time, consecutive
746
+ failures, last error class. Turns "it silently broke" into a glance.
747
+ 5. **Failure surfacing (the core lesson).** When a peer's sync has been failing
748
+ past a threshold (e.g. tunnel down > 1 h), surface it loudly:
749
+ - a line in the SessionStart auto-context: *"⚠️ sync to peer X down 6 days —
750
+ N conversations not backed up"*;
751
+ - optionally an OS notification.
752
+ This is what would have caught the 2026-06-07 incident on day one.
753
+ 6. **Install UX + proof.** `memex sync durability install` (or the wizard's final
754
+ step): detect OS → generate + load unit(s) → **run the kill→respawn self-test**
755
+ → report "self-healing active (verified)". Shipping the self-test means the
756
+ user gets *proof*, not a promise.
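
The classify-then-retry policy in component 1 could look roughly like the sketch below. It is an illustration under stated assumptions, not the keeper's actual code: the function names are hypothetical, the stderr patterns are common OpenSSH messages matched heuristically, and the 2-second "fast retry" delay is an assumed value. Only the ~2 minute backoff cap comes from the component list.

```javascript
// Hypothetical sketch of the keeper's classify-then-retry policy.
// The stderr patterns are common OpenSSH messages, matched heuristically.
function classifyExit(stderr) {
  if (/Permission denied|Host key verification failed/i.test(stderr)) return "auth";
  if (/remote port forwarding failed/i.test(stderr)) return "bind";
  if (/Network is unreachable|Connection timed out|Could not resolve hostname|Connection refused/i.test(stderr)) {
    return "network";
  }
  return "drop"; // nothing recognisable: treat as a clean drop
}

const BACKOFF_CAP_MS = 120_000; // "cap ~2 min" from the component list

// Returns the delay before the next spawn, or null to STOP and surface the error.
function nextDelayMs(kind, consecutiveFailures) {
  switch (kind) {
    case "auth":
      return null; // don't loop forever on a dead key
    case "bind":
      return 2_000; // assumed value for "fast retry"
    case "network":
      return Math.min(1_000 * 2 ** consecutiveFailures, BACKOFF_CAP_MS);
    default:
      return 0; // clean drop: re-establish immediately
  }
}

console.log(nextDelayMs(classifyExit("ssh: connect to host vps port 22: Network is unreachable"), 3)); // 8000
```

Keeping classification and delay as pure functions means the supervisor loop around `child_process.spawn` stays small, and the error class can be reported directly by `memex sync status`.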
757
+
758
+ #### Edge cases the productized version must handle
759
+
760
+ - **Idempotency** — re-install must not stack duplicate tunnels/units.
761
+ - **Passphrase keys** — reference key had none; a protected key needs
762
+ agent/keychain integration (`UseKeychain`/`AddKeysToAgent` on macOS, ssh-agent
763
+ on Linux). Detect and guide.
764
+ - **Multiple peers** — a hub may dial out to N spokes; the keeper manages N specs.
765
+ - **No-VPS users** — patterns C/D need one publicly-reachable sshd (the VPS as
766
+ rendezvous). Users with two laptops and no VPS have nowhere to dial; that
767
+ segment needs a relay (bring-your-own $5 VPS, or a future managed
768
+ memex-relay — see the OSS-free / managed-tunnels-paid split noted in backlog
769
+ discussion). Don't pretend `ssh -R` covers them.
770
+
771
+ ### 2. `sync-server invite` external-reachability check
772
+
773
+ `memex_sync_invite` currently probes only the *local* port (127.0.0.1). It
774
+ happily emits a blob whose `host` is a public IP that's actually firewalled at
775
+ the cloud layer (the Alibaba case). It should additionally attempt an external
776
+ reachability hint and warn: "listening locally but the public host may be
777
+ blocked by your cloud Security Group — verify, or pair the spoke outbound to an
778
+ already-reachable hub instead."
779
+
780
+ ### 3. Transport auto-management (deferred from Phase 6)
781
+
782
+ The transports work today via manual steps; automate the setup:
783
+ - **autossh** LaunchAgent/systemd to keep an SSH tunnel up (for SSH-reachable
784
+ hubs without an open public port).
785
+ - **Tailscale** auto-detect + `tailscale up` from a prompt (needs a one-time
786
+ auth key — moves the human action to the TS console, doesn't remove it).
787
+ - **mDNS LAN** discovery (`_memex._tcp.local`) for two machines on the same
788
+ Wi-Fi with no VPS.
789
+
790
+ ### 4. Kimi Code CLI capture bridge
791
+
792
+ The standalone Kimi Code CLI writes `~/.kimi/sessions/<uuid>/context.jsonl`
793
+ (roles `_system_prompt`/`_checkpoint`/…) which the capture daemon doesn't watch
794
+ or parse. (Kimi accessed *through* OpenClaw — channel `kimi-web` — is already
795
+ captured.) An inbox-bridge (`kimi-to-memex` → `~/.memex/inbox/`) keeps the sync
796
+ engine untouched and isolates us from Moonshot format changes. Low priority —
797
+ the OpenClaw path already covers the common case.
798
+
799
+ ### 5. Push-side skip surfacing
800
+
801
+ `POST /sync/push` applies rows via the shared row-applier, which counts `skipped`,
802
+ but the HTTP response doesn't return it to the pushing client (only the pull path
803
+ surfaces skips). A server-side FTS corruption could silently drop pushed rows
804
+ without the client knowing. Mirror the pull-side retry/abort on the server, or
805
+ return `skipped` in the push response so the client can react.
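
The second option (returning `skipped` to the client) might be shaped as follows. This is a sketch only: the response fields, the `{ applied, skipped }` shape of the row-applier result, and both function names are assumptions rather than the existing `/sync/push` contract.

```javascript
// Hypothetical sketch: the server echoes the row-applier's skip count,
// and the client treats any skip as something to act on.
function buildPushResponse(applyResult) {
  // applyResult is assumed to look like { applied: number, skipped: number }.
  return {
    ok: applyResult.skipped === 0,
    applied: applyResult.applied,
    skipped: applyResult.skipped,
  };
}

function checkPushResponse(response) {
  if (response.skipped > 0) {
    return { retry: true, reason: `${response.skipped} pushed row(s) skipped by server` };
  }
  return { retry: false, reason: null };
}

const response = buildPushResponse({ applied: 120, skipped: 5 });
console.log(checkPushResponse(response)); // { retry: true, reason: '5 pushed row(s) skipped by server' }
```

Because the field is additive, an older client that ignores `skipped` would keep working unchanged.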
806
+
807
+ ### 6. Provenance: per-row `origin` (node identity) — ✅ SHIPPED in v0.14.0
808
+
809
+ Before v0.14.0, nothing in a synced mesh recorded WHICH node captured a row. Two live failures
810
+ on the maintainer's 3-node mesh (2026-06-12), both from agents querying their
811
+ own synced DB:
812
+
813
+ - An agent looked for its peer's sessions, found no `source='vps1'` (a label
814
+ it *invented* — telling: that's how users expect it to work), and concluded
815
+ sync was broken. The 12,826 peer rows were present all along — blended into
816
+ the same `source='openclaw'` its own capture uses.
817
+ - Sharper: **conversation-key collision across nodes.** Two OpenClaw instances
818
+ (different VPSes) both capture the same human's Telegram presence, keyed by
819
+ the same Telegram id → both write `openclaw-tg-<id>` and sync MERGES two
820
+ different agents' dialogues into ONE interleaved conversation. msg_id-dedup
821
+ keeps it lossless, but "what did I discuss with agent A vs agent B" is
822
+ unanswerable.
823
+
824
+ Fix shape (additive, wire-compatible):
825
+ - `origin` column on messages (e.g. short host label or stable node id),
826
+ stamped at CAPTURE time, carried verbatim on the wire like `channel`.
827
+ - `origin:` filter in `memex_search` + origin shown in `get_conversation`
828
+ headers; `memex_overview` breaks counts down per origin.
829
+ - Do NOT namespace conversation ids by node — same-chat dedup across nodes is
830
+ a feature (live capture + export of the same chat must still converge).
831
+ Provenance belongs on the row, not in the key.
832
+ - Backfill: deliberately NONE by default — post-hoc a node cannot tell its
833
+ own NULL rows from peer rows that synced in before provenance existed, and a blind
834
+ stamp would FABRICATE provenance. Instead: forward-stamping from v0.14 +
835
+ the conflict branch backfills origin when a local RE-IMPORT of the source
836
+ file re-encounters a row (origin = COALESCE(existing, incoming) — never
837
+ overwrites). History without a re-import stays NULL = "pre-v0.14 era".
838
+
839
+ As shipped: `getOrigin()` (env MEMEX_ORIGIN → config `origin` → persisted
840
+ sanitised hostname) baked into every local-capture INSERT; wire carries
841
+ `origin` verbatim both directions; `memex_search(origin:)`; multi-origin
842
+ conversations tag lines `[@origin]` in `memex_get_conversation`;
843
+ `memex_overview` shows the per-origin breakdown; OpenClaw plugin stamps via
844
+ the same resolution (reads config, never writes it).
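
The resolution order and the never-overwrite rule can be illustrated as below. These are hypothetical stand-ins, not the shipped `getOrigin()`: the sanitising rule is an assumption, and persisting the hostname is omitted.

```javascript
const os = require("os");

// Hypothetical stand-in for the resolution order described above:
// env MEMEX_ORIGIN, then config `origin`, then a sanitised hostname.
function resolveOrigin(env, config, hostname = os.hostname()) {
  // Assumed sanitising rule: lowercase, non-alphanumeric runs become "-".
  const sanitise = (s) =>
    s.toLowerCase().replace(/[^a-z0-9-]+/g, "-").replace(/^-+|-+$/g, "");
  return env.MEMEX_ORIGIN || config.origin || sanitise(hostname);
}

// COALESCE(existing, incoming): fill a NULL origin, never overwrite a set one.
function mergeOrigin(existing, incoming) {
  return existing ?? incoming ?? null;
}

console.log(resolveOrigin({}, {}, "My MacBook.local")); // my-macbook-local
console.log(mergeOrigin("vps1", "mac")); // vps1
console.log(mergeOrigin(null, "mac")); // mac
```

Rows that never meet an incoming origin keep `null`, which matches the "pre-v0.14 era" reading of NULL.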