parallelclaw 1.0.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/CHANGELOG.md +204 -0
- package/HELP.md +600 -0
- package/LICENSE +21 -0
- package/MULTI_MACHINE.md +152 -0
- package/README.md +417 -0
- package/README.ru.md +740 -0
- package/SYNC.md +844 -0
- package/bot/README.md +173 -0
- package/bot/config.js +66 -0
- package/bot/inbox.js +153 -0
- package/bot/index.js +294 -0
- package/bot/nexara.js +61 -0
- package/bot/poll.js +304 -0
- package/bot/search.js +155 -0
- package/bot/telegram.js +96 -0
- package/ingest.js +2712 -0
- package/lib/cli/index.js +1987 -0
- package/lib/config.js +220 -0
- package/lib/db-init.js +158 -0
- package/lib/hook/install.js +268 -0
- package/lib/import-telegram.js +158 -0
- package/lib/ingest-file.js +779 -0
- package/lib/notify-click-action.js +281 -0
- package/lib/openclaw-channel.js +643 -0
- package/lib/parse-cursor.js +172 -0
- package/lib/parse-obsidian.js +256 -0
- package/lib/parse-telegram-html.js +384 -0
- package/lib/parse.js +175 -0
- package/lib/render-markdown.js +0 -0
- package/lib/store-doc/canonicalize.js +116 -0
- package/lib/store-doc/detect.js +209 -0
- package/lib/store-doc/extract-title.js +162 -0
- package/lib/sync/auth.js +80 -0
- package/lib/sync/cert.js +144 -0
- package/lib/sync/cli.js +906 -0
- package/lib/sync/client.js +138 -0
- package/lib/sync/config.js +130 -0
- package/lib/sync/pair.js +145 -0
- package/lib/sync/pull.js +158 -0
- package/lib/sync/push.js +305 -0
- package/lib/sync/replicate.js +335 -0
- package/lib/sync/server.js +224 -0
- package/lib/sync/service.js +726 -0
- package/lib/tasks.js +215 -0
- package/lib/telegram-decisions.js +165 -0
- package/lib/telegram-discovery.js +373 -0
- package/lib/telegram-notify.js +272 -0
- package/lib/telegram-pending.js +200 -0
- package/lib/web/index.js +265 -0
- package/lib/web/routes/conversation.js +193 -0
- package/lib/web/routes/conversations.js +180 -0
- package/lib/web/routes/dashboard.js +175 -0
- package/lib/web/routes/pending.js +277 -0
- package/lib/web/routes/settings.js +226 -0
- package/lib/web/static/style.css +393 -0
- package/lib/web/templates.js +234 -0
- package/package.json +84 -0
- package/server.js +3816 -0
- package/skills/install-memex/README.md +109 -0
- package/skills/install-memex/SKILL.md +342 -0
- package/skills/install-memex/examples.md +294 -0
- package/skills/install-memex-claw/SKILL.md +423 -0
package/SYNC.md
ADDED
|
@@ -0,0 +1,844 @@
|
|
|
1
|
+
# ParallelClaw sync — multi-device replication
|
|
2
|
+
|
|
3
|
+
> **Status:** engine experimental since v0.11.11; the **`sync-join` lazy flow is
|
|
4
|
+
> the v0.13 front door**. After one successful `sync-join`, no
|
|
5
|
+
> `MEMEX_SYNC_EXPERIMENTAL` env var is needed (the join persists
|
|
6
|
+
> `sync.enabled: true`). Manual/advanced commands on a machine that never
|
|
7
|
+
> joined still want the env var. Pin your memex version on both sides.
|
|
8
|
+
|
|
9
|
+
A pair of memex instances (laptop + VPS, or two laptops, or any N) keep their
|
|
10
|
+
`~/.memex/data/memex.db` files **converging** — same conversations and messages
|
|
11
|
+
visible from every device, no cloud relay, no shared file system.
|
|
12
|
+
|
|
13
|
+
## Quickstart (the lazy path — 2 steps)
|
|
14
|
+
|
|
15
|
+
The canonical setup: your laptop (Claude/Cursor) + one always-on server where
|
|
16
|
+
your agent lives. **Step 1 — paste to the agent on the server:**
|
|
17
|
+
|
|
18
|
+
```
|
|
19
|
+
Set up memex sync as a hub and give me a join token for my laptop:
|
|
20
|
+
1. npm install -g parallelclaw@latest (skip if installed)
|
|
21
|
+
2. memex-sync sync-server install --bind 127.0.0.1
|
|
22
|
+
3. memex-sync sync-server invite --join
|
|
23
|
+
Send me the memex-join:... line.
|
|
24
|
+
```
|
|
25
|
+
|
|
26
|
+
**Step 2 — one command on the laptop:**
|
|
27
|
+
|
|
28
|
+
```sh
|
|
29
|
+
memex-sync sync-join memex-join:eyJ2...
|
|
30
|
+
```
|
|
31
|
+
|
|
32
|
+
That orchestrates everything: SSH probe (prints your pubkey + instructions if
|
|
33
|
+
access is missing), a self-healing forward tunnel (launchd/systemd KeepAlive),
|
|
34
|
+
pinned-cert health check, first sync (resumable if interrupted), 15-min
|
|
35
|
+
auto-sync, hourly watchdog, and a **marker self-test** that proves a note
|
|
36
|
+
round-trips before declaring success. Everything below this section is the
|
|
37
|
+
operational detail and the wire-protocol spec.
|
|
38
|
+
|
|
39
|
+
> **Tip — name your nodes first (v0.14).** Each node stamps its captures with
|
|
40
|
+
> an `origin` label (defaults to the hostname). Set a friendly one (`mac`,
|
|
41
|
+
> `vps1`, …) via `origin` in `~/.memex/config.json` on each node BEFORE data
|
|
42
|
+
> accumulates — old rows keep whatever stamp they got. This is what powers
|
|
43
|
+
> `memex_search(origin: …)` and the `[@node]` tags in merged conversations.
|
|
44
|
+
|
|
45
|
+
This document is **both** the operational guide and the wire-protocol spec.
|
|
46
|
+
Implementers and users read different sections.
|
|
47
|
+
|
|
48
|
+
---
|
|
49
|
+
|
|
50
|
+
## Table of contents
|
|
51
|
+
|
|
52
|
+
1. [Why this exists](#why-this-exists) — what problem we're solving
|
|
53
|
+
2. [How it works (30s version)](#how-it-works-30s-version) — for users
|
|
54
|
+
3. [Transports](#transports) — SSH, Tailscale, HTTPS pair, mDNS
|
|
55
|
+
4. [Setup walkthrough](#setup-walkthrough) — manual steps behind `sync-join`
|
|
56
|
+
5. [Wire protocol (spec)](#wire-protocol-spec) — for implementers
|
|
57
|
+
6. [Security model](#security-model)
|
|
58
|
+
7. [Trade-offs we made](#trade-offs-we-made)
|
|
59
|
+
8. [Out of scope (deliberately)](#out-of-scope-deliberately)
|
|
60
|
+
|
|
61
|
+
---
|
|
62
|
+
|
|
63
|
+
## Why this exists
|
|
64
|
+
|
|
65
|
+
memex is a **local-first** SQLite memory: every device captures its own AI
|
|
66
|
+
conversations into its own `memex.db`. Without sync, the Mac doesn't see what
|
|
67
|
+
the VPS captured, and vice versa.
|
|
68
|
+
|
|
69
|
+
The naïve fix — point Syncthing/Dropbox/iCloud at the `.db` file — corrupts
|
|
70
|
+
SQLite within hours under concurrent writes (documented [downstream of
|
|
71
|
+
claude-mem](https://github.com/thedotmack/claude-mem/issues/1037)).
|
|
72
|
+
|
|
73
|
+
memex sync solves it by treating each device's database as **append-only
|
|
74
|
+
authoritative** and exchanging **deltas** over HTTP. Conflicts cannot happen
|
|
75
|
+
because verbatim memory is never edited — we only ever insert.
|
|
76
|
+
|
|
77
|
+
---
|
|
78
|
+
|
|
79
|
+
## How it works (30s version)
|
|
80
|
+
|
|
81
|
+
```
|
|
82
|
+
┌──────────────────────┐ HTTP push/pull ┌──────────────────────┐
|
|
83
|
+
│ Mac │ ◀──── every 15 min ────▶ │ VPS │
|
|
84
|
+
│ memex.db (Mac side) │ │ memex.db (VPS side) │
|
|
85
|
+
│ │ POST /sync/push ───▶ │ │
|
|
86
|
+
│ Claude Code │ GET /sync/pull ◀─── │ OpenClaw, Hermes │
|
|
87
|
+
│ Telegram │ │ cron jobs │
|
|
88
|
+
└──────────────────────┘ └──────────────────────┘
|
|
89
|
+
```
|
|
90
|
+
|
|
91
|
+
1. **VPS** runs `memex sync server enable` — generates a self-signed TLS cert
|
|
92
|
+
and a bearer token, prints a one-line **pair blob**.
|
|
93
|
+
2. **Mac** runs `memex sync pair memex-pair:...` — stores the blob, validates
|
|
94
|
+
the cert against its pinned fingerprint, can now talk to VPS.
|
|
95
|
+
3. Every 15 min (configurable), Mac runs `memex sync run` — it:
|
|
96
|
+
- pulls rows from VPS with `id > last_seen_cursor` and INSERT-OR-IGNOREs them
|
|
97
|
+
- pushes rows VPS hasn't seen yet
|
|
98
|
+
- advances both cursors
|
|
99
|
+
|
|
100
|
+
Dedup is automatic via the existing `UNIQUE(source, conversation_id, msg_id)`
|
|
101
|
+
constraint — same row from two directions never double-inserts.
|
|
102
|
+
|
|
103
|
+
---
|
|
104
|
+
|
|
105
|
+
## Transports
|
|
106
|
+
|
|
107
|
+
Sync runs over HTTP/JSON. **How the bytes reach VPS** is independent of the
|
|
108
|
+
wire protocol — pick one:
|
|
109
|
+
|
|
110
|
+
| Transport | Best for | User setup steps |
|
|
111
|
+
|---|---|---|
|
|
112
|
+
| **SSH tunnel** | User already SSHes into VPS | Zero (autossh installed on demand) |
|
|
113
|
+
| **Tailscale** | Both devices on same tailnet | Zero (auto-detected) |
|
|
114
|
+
| **HTTPS + pair blob** | VPS only via agent/bot (no SSH) | One paste from agent chat |
|
|
115
|
+
| **mDNS LAN** | Two devices on same Wi-Fi, no VPS | Zero (auto-discovery) |
|
|
116
|
+
| **Caddy + public HTTPS** | Advanced, want public access | Domain + Caddy install |
|
|
117
|
+
|
|
118
|
+
`memex-sync sync-join` (v0.13) automates the SSH-tunnel transport end-to-end —
|
|
119
|
+
the canonical lazy path. The full environment-probing wizard that picks among
|
|
120
|
+
ALL transports is Roadmap §1.
|
|
121
|
+
|
|
122
|
+
### SSH tunnel (default for SSH-capable users)
|
|
123
|
+
|
|
124
|
+
Mac runs `autossh -N -L 8765:localhost:8765 user@vps` as a LaunchAgent. Sync
|
|
125
|
+
client talks to `http://localhost:8765`, bytes flow through SSH to VPS:8765.
|
|
126
|
+
|
|
127
|
+
Pro: zero new accounts, encryption from SSH.
|
|
128
|
+
Con: tunnel-keeper daemon (autossh handles reconnect).
|
|
129
|
+
|
|
130
|
+
### Tailscale (if available)
|
|
131
|
+
|
|
132
|
+
Mac talks to `http://memex-vps.tail-abc.ts.net:8765` directly. WireGuard
|
|
133
|
+
encryption and identity built in.
|
|
134
|
+
|
|
135
|
+
Pro: works through NAT, identity per device.
|
|
136
|
+
Con: requires Tailscale account (free for personal, 100 devices).
|
|
137
|
+
|
|
138
|
+
### HTTPS + pair blob (lazy-user path)
|
|
139
|
+
|
|
140
|
+
VPS exposes `https://<host>:8765` with a self-signed cert. Client pins the
|
|
141
|
+
cert fingerprint baked into the pair blob. Bearer token in header authenticates
|
|
142
|
+
the request. No DNS, no Let's Encrypt, no SSH key — one paste from agent chat.
|
|
143
|
+
|
|
144
|
+
Pro: zero user terminal access to VPS required.
|
|
145
|
+
Con: VPS must have a reachable public IP/hostname.
|
|
146
|
+
|
|
147
|
+
### mDNS LAN (no-VPS scenario) — planned
|
|
148
|
+
|
|
149
|
+
Two devices on the same Wi-Fi would announce themselves as `_memex._tcp.local`
|
|
150
|
+
and pair via trust-on-first-use, no VPS required. **Not built yet** — until then,
|
|
151
|
+
two LAN machines can still pair by running the server on one and `sync-add`-ing
|
|
152
|
+
its LAN IP from the other.
|
|
153
|
+
|
|
154
|
+
Pro: no VPS, no cloud, no account.
|
|
155
|
+
Con: only when both devices on same network.
|
|
156
|
+
|
|
157
|
+
---
|
|
158
|
+
|
|
159
|
+
## Setup walkthrough
|
|
160
|
+
|
|
161
|
+
> All commands are gated behind `MEMEX_SYNC_EXPERIMENTAL=1` in v0.11.x.
|
|
162
|
+
> The CLI lives under the existing `memex-sync` binary (`memex-sync sync-*`).
|
|
163
|
+
|
|
164
|
+
### Scenario 1 — lazy path: VPS you only reach through an agent
|
|
165
|
+
|
|
166
|
+
The hub (VPS) runs the server durably; the spoke (laptop) pairs with one paste.
|
|
167
|
+
|
|
168
|
+
**On the VPS, once** (or have your agent run it):
|
|
169
|
+
|
|
170
|
+
```sh
|
|
171
|
+
export MEMEX_SYNC_EXPERIMENTAL=1
|
|
172
|
+
memex-sync sync-server install --port 8766 --bind 0.0.0.0 # durable systemd/launchd service
|
|
173
|
+
```
|
|
174
|
+
|
|
175
|
+
**Get a pairing token.** Either ask your agent in chat —
|
|
176
|
+
|
|
177
|
+
> "set up memex sync with my Mac" / "сгенерируй паринг-код для синка"
|
|
178
|
+
|
|
179
|
+
— and it calls the **`memex_sync_invite`** MCP tool (requires
|
|
180
|
+
`MEMEX_SYNC_EXPERIMENTAL=1` in the memex MCP server's env), or run it by hand:
|
|
181
|
+
|
|
182
|
+
```sh
|
|
183
|
+
memex-sync sync-server invite --host <public-ip> # prints memex-pair:...
|
|
184
|
+
```
|
|
185
|
+
|
|
186
|
+
**On the laptop, one paste:**
|
|
187
|
+
|
|
188
|
+
```sh
|
|
189
|
+
export MEMEX_SYNC_EXPERIMENTAL=1
|
|
190
|
+
memex-sync sync-pair memex-pair:eyJ2IjoxLCJob3N0Ijoi... # decodes host+port+cert_fp+token
|
|
191
|
+
memex-sync sync-run vps # first sync
|
|
192
|
+
memex-sync sync-schedule install --every 15m # hands-off from here
|
|
193
|
+
```
|
|
194
|
+
|
|
195
|
+
Done. New conversations propagate within the interval, both directions.
|
|
196
|
+
|
|
197
|
+
### Scenario 2 — Mac + VPS over an SSH tunnel
|
|
198
|
+
|
|
199
|
+
If you have SSH to the VPS, skip the public bind. Run the server on loopback,
|
|
200
|
+
forward the port yourself, and pass `--host localhost` to invite:
|
|
201
|
+
|
|
202
|
+
```sh
|
|
203
|
+
# VPS
|
|
204
|
+
memex-sync sync-server install --port 8766 --bind 127.0.0.1
|
|
205
|
+
memex-sync sync-server invite --host localhost # blob targets localhost
|
|
206
|
+
|
|
207
|
+
# Mac — keep this tunnel up (autossh/LaunchAgent automation is a follow-up)
|
|
208
|
+
ssh -N -L 8766:localhost:8766 user@vps &
|
|
209
|
+
memex-sync sync-pair memex-pair:... # → https://localhost:8766
|
|
210
|
+
memex-sync sync-run vps
|
|
211
|
+
```
|
|
212
|
+
|
|
213
|
+
### Scenario 3 — Tailscale
|
|
214
|
+
|
|
215
|
+
Both machines on one tailnet: `invite --host <vps>.tail-xxxx.ts.net`, then
|
|
216
|
+
`sync-pair` on the laptop. WireGuard handles encryption + NAT; the cert pin in
|
|
217
|
+
the blob still applies.
|
|
218
|
+
|
|
219
|
+
### Manual fallback (no pair blob)
|
|
220
|
+
|
|
221
|
+
`sync-pair` is just sugar over `sync-add`. The explicit form:
|
|
222
|
+
|
|
223
|
+
```sh
|
|
224
|
+
memex-sync sync-add vps https://<host>:8766 <bearer-hex> --cert-fp sha256:AA:BB:...
|
|
225
|
+
# or, over a transport you already trust (SSH tunnel / Tailscale):
|
|
226
|
+
memex-sync sync-add vps https://localhost:8766 <bearer-hex> --insecure
|
|
227
|
+
```
|
|
228
|
+
|
|
229
|
+
### Command reference
|
|
230
|
+
|
|
231
|
+
| Command | Side | What |
|
|
232
|
+
|---|---|---|
|
|
233
|
+
| `sync-server install / uninstall / status` | hub | durable server service |
|
|
234
|
+
| `sync-server start` | hub | foreground server |
|
|
235
|
+
| `sync-server invite [--host H] [--port N] [--ttl 30]` | hub | print a pair blob |
|
|
236
|
+
| `sync-pair <blob> [--alias vps]` | spoke | register a remote from a blob |
|
|
237
|
+
| `sync-add <alias> <url> <bearer> (--cert-fp F \| --insecure)` | spoke | register a remote explicitly |
|
|
238
|
+
| `sync-run <alias> \| --all` | spoke | one bidirectional sync |
|
|
239
|
+
| `sync-schedule install [--every 15m] / uninstall / status` | spoke | hands-off auto-sync timer |
|
|
240
|
+
| `sync-list / sync-remove <alias> / sync-status` | spoke | inspect / manage remotes |
|
|
241
|
+
| `memex_sync_invite` (MCP tool) | hub | agent emits a pair blob from a chat phrase |
|
|
242
|
+
|
|
243
|
+
> **Not yet automated (manual today, planned):** autossh tunnel management,
|
|
244
|
+
> Tailscale auto-detection, and mDNS LAN discovery (`_memex._tcp.local` for two
|
|
245
|
+
> machines on the same Wi-Fi with no VPS). The transports themselves work today
|
|
246
|
+
> via the manual steps above.
|
|
247
|
+
|
|
248
|
+
---
|
|
249
|
+
|
|
250
|
+
## Wire protocol (spec)
|
|
251
|
+
|
|
252
|
+
> Implementers: this is the source of truth. Anything that diverges from this
|
|
253
|
+
> section is a bug.
|
|
254
|
+
|
|
255
|
+
### Endpoints
|
|
256
|
+
|
|
257
|
+
```
|
|
258
|
+
POST /sync/push
|
|
259
|
+
Authorization: Bearer <token>
|
|
260
|
+
Content-Type: application/json
|
|
261
|
+
Body: {
|
|
262
|
+
"rows": [Row, Row, ...] // 1..1000 messages
|
|
263
|
+
}
|
|
264
|
+
|
|
265
|
+
Response 200: {
|
|
266
|
+
"accepted": N, // rows inserted (newly seen by us)
|
|
267
|
+
"deduplicated": M, // rows we already had (UNIQUE constraint hit)
|
|
268
|
+
"last_id": <int> // our local id of the highest-ranked row
|
|
269
|
+
// — useful for client log/debug
|
|
270
|
+
}
|
|
271
|
+
Response 401: { "error": "unauthorized" }
|
|
272
|
+
Response 400: { "error": "bad_request", "detail": "..." }
|
|
273
|
+
Response 413: { "error": "payload_too_large" } // >2MB body
|
|
274
|
+
```
|
|
275
|
+
|
|
276
|
+
```
|
|
277
|
+
GET /sync/pull?since=<int>&limit=<int>
|
|
278
|
+
Authorization: Bearer <token>
|
|
279
|
+
|
|
280
|
+
Query:
|
|
281
|
+
since — local id of caller's last-seen row from us; 0 for first pull
|
|
282
|
+
limit — max rows to return; default 500, max 1000
|
|
283
|
+
|
|
284
|
+
Response 200: {
|
|
285
|
+
"rows": [Row, Row, ...],
|
|
286
|
+
"next_cursor": <int>, // id of the last row in this batch
|
|
287
|
+
"has_more": bool, // true → caller should call again with
|
|
288
|
+
// since=next_cursor immediately
|
|
289
|
+
"server_now": <int> // our wall clock at response time (ms epoch)
|
|
290
|
+
// — informational
|
|
291
|
+
}
|
|
292
|
+
```
|
|
293
|
+
|
|
294
|
+
```
|
|
295
|
+
GET /sync/health
|
|
296
|
+
Authorization: Bearer <token> // optional — token gates extra detail
|
|
297
|
+
|
|
298
|
+
Response 200: {
|
|
299
|
+
"version": "0.11.11",
|
|
300
|
+
"schema_version": 12,
|
|
301
|
+
"row_count": <int>, // total messages in our DB
|
|
302
|
+
"last_id": <int> // highest message id we hold
|
|
303
|
+
}
|
|
304
|
+
```
|
|
305
|
+
|
|
306
|
+
### Row shape
|
|
307
|
+
|
|
308
|
+
A `Row` is exactly the JSON representation of a `messages` table row, plus
|
|
309
|
+
the parent `conversation` metadata necessary to materialize the row on the
|
|
310
|
+
other side:
|
|
311
|
+
|
|
312
|
+
```json
|
|
313
|
+
{
|
|
314
|
+
"source": "claude-code",
|
|
315
|
+
"conversation_id": "claude-code-<uuid>",
|
|
316
|
+
"msg_id": "<source-specific-stable-id>",
|
|
317
|
+
"uuid": "<v4-uuid>",
|
|
318
|
+
"role": "user|assistant|system|tool|boundary|summary",
|
|
319
|
+
"sender": "me|claude-code|...",
|
|
320
|
+
"text": "raw verbatim content",
|
|
321
|
+
"ts": 1716800000, // source-original timestamp (seconds)
|
|
322
|
+
"edited_at": 1716800042000, // ms; null if never edited
|
|
323
|
+
"channel": "telegram|kimi-web|system|null",
|
|
324
|
+
"metadata": "{...json-string...}",
|
|
325
|
+
"conversation": {
|
|
326
|
+
"title": "...",
|
|
327
|
+
"first_ts": 1716700000,
|
|
328
|
+
"last_ts": 1716800000,
|
|
329
|
+
"project_path": "/Users/x/work|null",
|
|
330
|
+
"parent_conversation_id": "...|null"
|
|
331
|
+
}
|
|
332
|
+
}
|
|
333
|
+
```
|
|
334
|
+
|
|
335
|
+
**Required fields:** `source`, `conversation_id`, `role`, `text`, `ts`.
|
|
336
|
+
**Stable identity for dedup:** `(source, conversation_id, msg_id)` — `msg_id`
|
|
337
|
+
may be null but if so the row is considered ephemeral and is NOT synced.
|
|
338
|
+
**Portable global identity:** `uuid` — populated by writer; if absent on a
|
|
339
|
+
synced row, receiver generates one on insert (so future pulls can refer to it).
|
|
340
|
+
|
|
341
|
+
### Cursor semantics
|
|
342
|
+
|
|
343
|
+
A **cursor** is one integer: the receiver's local `messages.id` of the last
|
|
344
|
+
row it observed from this peer. Cursor is **per-peer, per-direction**:
|
|
345
|
+
|
|
346
|
+
```
|
|
347
|
+
client_config.json:
|
|
348
|
+
"remotes": {
|
|
349
|
+
"vps": {
|
|
350
|
+
"url": "http://localhost:8765",
|
|
351
|
+
"bearer": "...",
|
|
352
|
+
"pulled_to": 18472, // we've pulled VPS rows up to its id 18472
|
|
353
|
+
"pushed_to": 9341 // we've pushed our rows up to our id 9341
|
|
354
|
+
}
|
|
355
|
+
}
|
|
356
|
+
```
|
|
357
|
+
|
|
358
|
+
Both endpoints are **strictly monotonic per peer**. Pull returns rows with
|
|
359
|
+
`id > since` ordered ASC by id. Push always sends rows with `id > pushed_to`
|
|
360
|
+
ordered ASC. Receivers never assume cursor monotonicity beyond a single peer.
|
|
361
|
+
|
|
362
|
+
### Idempotency
|
|
363
|
+
|
|
364
|
+
Push is **at-least-once**. Two identical push requests produce identical state
|
|
365
|
+
on the server (UNIQUE constraint absorbs dupes). The client is free to retry
|
|
366
|
+
indefinitely.
|
|
367
|
+
|
|
368
|
+
Pull is **at-least-once**. The client may receive the same row twice across
|
|
369
|
+
retries (e.g. network failure mid-batch). It must INSERT OR IGNORE on its side.
|
|
370
|
+
|
|
371
|
+
### Conversation upsert
|
|
372
|
+
|
|
373
|
+
`messages` and `conversations` are separate tables linked by `conversation_id`.
|
|
374
|
+
On every push, the receiver:
|
|
375
|
+
|
|
376
|
+
1. UPSERTs `conversations` row from `row.conversation` (latest values win on
|
|
377
|
+
`title`, `last_ts`, `message_count`).
|
|
378
|
+
2. INSERT OR IGNOREs the message via UNIQUE.
|
|
379
|
+
|
|
380
|
+
This way a conversation that exists only on Mac becomes a real row on VPS the
|
|
381
|
+
first time any of its messages arrives.
|
|
382
|
+
|
|
383
|
+
### Schema-version handshake
|
|
384
|
+
|
|
385
|
+
`GET /sync/health` reports `schema_version`. Client and server must match
|
|
386
|
+
**major schema version**. If client < server schema version: client refuses to
|
|
387
|
+
sync, prints "upgrade memex on this side". If client > server: same.
|
|
388
|
+
|
|
389
|
+
Schema versions bump only when wire shape changes (column adds that affect
|
|
390
|
+
sync). Pure additive changes that don't ship over the wire don't bump.
|
|
391
|
+
|
|
392
|
+
Initial sync schema version: **12**.
|
|
393
|
+
|
|
394
|
+
### Error semantics
|
|
395
|
+
|
|
396
|
+
| Code | Meaning | Client action |
|
|
397
|
+
|------|---------|---------------|
|
|
398
|
+
| 200 | OK | Continue |
|
|
399
|
+
| 400 | Bad request body | Log + abort; don't retry; this is a bug |
|
|
400
|
+
| 401 | Unauthorized | Token rotation needed; abort sync until reconfigured |
|
|
401
|
+
| 409 | Schema mismatch | Print upgrade instruction; abort |
|
|
402
|
+
| 413 | Payload too large | Reduce batch size and retry |
|
|
403
|
+
| 429 | Rate limited (too many concurrent pushes) | Honor Retry-After header |
|
|
404
|
+
| 500 | Server error | Exponential backoff, retry |
|
|
405
|
+
|
|
406
|
+
### Rate limits
|
|
407
|
+
|
|
408
|
+
The server may rate-limit per-token at **10 push requests per minute** and
|
|
409
|
+
**60 pull requests per minute**. Bursting above this returns 429 with
|
|
410
|
+
`Retry-After: <seconds>` header.
|
|
411
|
+
|
|
412
|
+
These limits exist to bound the worst case of a misconfigured client and are
|
|
413
|
+
generous for normal operation.
|
|
414
|
+
|
|
415
|
+
---
|
|
416
|
+
|
|
417
|
+
## Security model
|
|
418
|
+
|
|
419
|
+
### Authentication
|
|
420
|
+
|
|
421
|
+
**Bearer tokens** — 256-bit random, generated by `memex sync invite` on the
|
|
422
|
+
server side. Token is in `Authorization: Bearer <hex>` header on every request.
|
|
423
|
+
|
|
424
|
+
Tokens are stored on disk in `~/.memex/config.json` (mode 0600).
|
|
425
|
+
|
|
426
|
+
`memex sync rotate-token` invalidates the current token and prints a new pair
|
|
427
|
+
blob. Pre-existing connected clients break until they re-pair.
|
|
428
|
+
|
|
429
|
+
### Transport encryption
|
|
430
|
+
|
|
431
|
+
| Transport | How encryption is achieved |
|
|
432
|
+
|-----------|----------------------------|
|
|
433
|
+
| HTTPS + pair blob | Self-signed TLS, client pins server cert fingerprint |
|
|
434
|
+
| SSH tunnel | SSH transport |
|
|
435
|
+
| Tailscale | WireGuard tunnel between nodes |
|
|
436
|
+
| mDNS LAN | TLS with pinned fingerprint (same as HTTPS path) |
|
|
437
|
+
| Caddy + public HTTPS | Let's Encrypt-issued cert |
|
|
438
|
+
|
|
439
|
+
**Self-signed certs are pinned**: client refuses to talk to the server if the
|
|
440
|
+
TLS cert fingerprint doesn't match what was baked into the pair blob. This is
|
|
441
|
+
the same mechanism Plex/Tailscale/etc. use for device-to-device trust.
|
|
442
|
+
|
|
443
|
+
### Threat model
|
|
444
|
+
|
|
445
|
+
| Threat | Mitigation |
|
|
446
|
+
|--------|------------|
|
|
447
|
+
| Attacker on network sees bearer token | TLS encryption blocks |
|
|
448
|
+
| Attacker MITMs and replaces TLS cert | Cert pinning rejects |
|
|
449
|
+
| Stolen bearer token | `memex sync rotate-token` invalidates |
|
|
450
|
+
| Replay attack | Idempotent endpoints — no harm; receiver dedups |
|
|
451
|
+
| Malicious peer pushes garbage rows | Rate limit + payload size cap; rows still need valid `source/conv_id/msg_id` shape |
|
|
452
|
+
| Compromised peer pulls all our data | Bearer auth is binary (token = full access); for least-privilege you'd need per-source ACLs (future work) |
|
|
453
|
+
|
|
454
|
+
### Out of scope for security v1
|
|
455
|
+
|
|
456
|
+
- Per-conversation ACL (a peer can pull all your conversations or none)
|
|
457
|
+
- E2E encryption of payloads (we rely on transport encryption)
|
|
458
|
+
- mTLS (you can layer it on if you use Caddy)
|
|
459
|
+
- Signed rows (verifiable origin) — possible v2 if needed
|
|
460
|
+
|
|
461
|
+
---
|
|
462
|
+
|
|
463
|
+
## Trade-offs we made
|
|
464
|
+
|
|
465
|
+
| Choice | Why | Lose |
|
|
466
|
+
|---|---|---|
|
|
467
|
+
| HTTP push/pull + cursors | Replicache 2026 consensus pattern; idempotent; simple | Real-time — sync is up to 15 min stale |
|
|
468
|
+
| Local AUTOINCREMENT id as cursor | Per-DB monotonic, zero design overhead | Cursors not portable; each peer has its own |
|
|
469
|
+
| Self-signed cert + pinning | Zero DNS/CA infrastructure | Browser tooling can't poke the endpoint |
|
|
470
|
+
| Bearer token (not OAuth) | Days vs weeks to ship | Manual rotation |
|
|
471
|
+
| UNIQUE-constraint dedup | We don't edit verbatim — perfect fit | Cannot reconcile two divergent edits to the same logical row (we don't do that) |
|
|
472
|
+
| Skip CRDT / cr-sqlite | Maintenance risk + extension dependency | If we ever want concurrent-edit reconciliation, we'd need to revisit |
|
|
473
|
+
| Hub-and-spoke for 2 nodes | P2P degenerate at N=2; VPS always-on anyway | Single point of failure (mitigated: laptop keeps full local copy) |
|
|
474
|
+
| Schema-version handshake | Refuse to silently corrupt data on version skew | Coupling clients to specific server versions |
|
|
475
|
+
|
|
476
|
+
---
|
|
477
|
+
|
|
478
|
+
## Out of scope (deliberately)
|
|
479
|
+
|
|
480
|
+
- **Selective sync per conversation** — v2. v1 syncs everything.
|
|
481
|
+
- **Web UI for sync state** — `memex sync status` CLI is the surface.
|
|
482
|
+
- **Multi-VPS / N-device sync** — works (each Mac points at one VPS) but the
|
|
483
|
+
config UX is single-pair-only in v1.
|
|
484
|
+
- **Sync of archived conversations** — currently archive is local-only flag.
|
|
485
|
+
TBD whether archives should sync.
|
|
486
|
+
- **End-to-end encryption** — transport encryption is enough for v1 given the
|
|
487
|
+
threat model.
|
|
488
|
+
- **Cloud relay** — never. Against memex's local-first principle.
|
|
489
|
+
|
|
490
|
+
---
|
|
491
|
+
|
|
492
|
+
## Deployed patterns (what's been proven live)
|
|
493
|
+
|
|
494
|
+
The "lazy-user mesh" got real-world stress tests over several days. These are
|
|
495
|
+
patterns we observed working end-to-end, in order of decreasing dependence on
|
|
496
|
+
network privilege.
|
|
497
|
+
|
|
498
|
+
### A. VPS-as-hub on a public port (the assumed default)
|
|
499
|
+
|
|
500
|
+
VPS exposes the sync-server on some port (e.g., 8766 or 443 behind nginx).
|
|
501
|
+
Spokes dial it directly. Works when: VPS firewall + spokes' egress both allow
|
|
502
|
+
the port.
|
|
503
|
+
|
|
504
|
+
**Fragility we hit:** ISP/VPN/cloud-SG can silently start blocking the port
|
|
505
|
+
that "worked yesterday" — without a guest reboot or any user change. We saw
|
|
506
|
+
this on a HOSTKEY VPS where 8766 just stopped passing externally. Public
|
|
507
|
+
ports are subject to anyone's firewall above the OS.
|
|
508
|
+
|
|
509
|
+
### B. SSH tunnel (spoke initiates `ssh -L`) ⭐ THE CANONICAL PATTERN — automated by `sync-join`
|
|
510
|
+
|
|
511
|
+
Spoke `ssh -L 8766:localhost:8766 user@vps` over the existing port 22; sync
|
|
512
|
+
client talks to `localhost:8766`. Hub doesn't expose 8766 publicly; the
|
|
513
|
+
spoke's Mac VPN/proxy mostly relays SSH (it usually does to standard ports).
|
|
514
|
+
|
|
515
|
+
This is what `memex-sync sync-join` builds (v0.13): the always-on server is
|
|
516
|
+
the hub on loopback, the laptop dials out with a supervised `-L` tunnel.
|
|
517
|
+
Strictly better than C for the common laptop+server case — the authoritative
|
|
518
|
+
always-reachable node is the one that's actually always on. **Live since
|
|
519
|
+
2026-06-11** on the maintainer's own Mac↔VPS pair (migrated off pattern C
|
|
520
|
+
via `sync-join` itself; built-in marker self-test round-tripped in 3.4s).
|
|
521
|
+
|
|
522
|
+
### C. Mac-as-hub via reverse SSH tunnel (`ssh -R`) ⭐
|
|
523
|
+
|
|
524
|
+
The inversion that solved everything when public ports failed across the board:
|
|
525
|
+
|
|
526
|
+
```
|
|
527
|
+
Mac runs sync-server on localhost:8766 (non-privileged, no root)
|
|
528
|
+
Mac runs: ssh -fN -R 8766:127.0.0.1:8766 user@vps
|
|
529
|
+
|
|
530
|
+
On the VPS, sshd creates a LOOPBACK listener on 127.0.0.1:8766
|
|
531
|
+
that forwards through the existing SSH connection back to Mac.
|
|
532
|
+
|
|
533
|
+
The VPS-side memex agent then runs:
|
|
534
|
+
sync-add mac https://localhost:8766 <mac-bearer> --insecure
|
|
535
|
+
sync-run mac
|
|
536
|
+
sync-schedule install --every 5m
|
|
537
|
+
```
|
|
538
|
+
|
|
539
|
+
The radical property: **the only port traversed is 22, which is already open
|
|
540
|
+
on every VPS by definition (otherwise you couldn't have provisioned it).**
|
|
541
|
+
No firewall change anywhere. The sync-server is bound to loopback on both
|
|
542
|
+
ends — nothing public, anywhere. Cloud SG, ufw, the Mac's full-tunnel VPN
|
|
543
|
+
proxy — all irrelevant.
|
|
544
|
+
|
|
545
|
+
The trade-off: the Mac is a laptop. When it sleeps or the network changes,
|
|
546
|
+
the SSH tunnel dies. The VPS's scheduler then sees `peer unreachable` for
|
|
547
|
+
each tick until Mac wakes and re-establishes the tunnel. Acceptable for
|
|
548
|
+
"used-daily-driver" workflows; sync pauses, never loses data.
|
|
549
|
+
|
|
550
|
+
### D. Transit-hub: chained `ssh -R` for a node Mac can't reach directly
|
|
551
|
+
|
|
552
|
+
Pattern C breaks when the Mac's VPN proxy refuses to relay SSH to a particular
|
|
553
|
+
destination (we saw this on Mac → Alibaba Asia: banner-exchange timeout even
|
|
554
|
+
though SSH worked fine to a European VPS). The fix: that node initiates its
|
|
555
|
+
own `ssh -R` to a third node that Mac CAN reach.
|
|
556
|
+
|
|
557
|
+
```
|
|
558
|
+
Mac (sync-server localhost:8766)
|
|
559
|
+
▲
|
|
560
|
+
│ ssh -fN -R 8766:localhost:8766 (Mac → VPS-EU)
|
|
561
|
+
│
|
|
562
|
+
VPS-EU (transit-hub, openclaw user, no sudo)
|
|
563
|
+
├ localhost:8766 = Mac via Mac's tunnel
|
|
564
|
+
└ localhost:8767 = Asia VPS via its own tunnel
|
|
565
|
+
▲
|
|
566
|
+
│ ssh -fN -R 8767:localhost:8766 (Asia VPS → VPS-EU)
|
|
567
|
+
│
|
|
568
|
+
Asia VPS (sync-server localhost:8766)
|
|
569
|
+
```
|
|
570
|
+
|
|
571
|
+
The transit-hub runs `sync-run --all` periodically and converges everyone.
|
|
572
|
+
No spoke ever exposes a public port; the transit-hub only exposes port 22
|
|
573
|
+
(which was already open); the only required network capability anywhere is
|
|
574
|
+
"outbound SSH to one node the other spokes can also reach outbound."
|
|
575
|
+
|
|
576
|
+
This generalizes: any number of spokes can join the same transit-hub by
|
|
577
|
+
reverse-tunneling in. The transit-hub's bearer is the only thing that's
|
|
578
|
+
shared. Each spoke needs SSH access to the transit-hub (one pubkey paste
|
|
579
|
+
into `~/.ssh/authorized_keys` per spoke, no sudo).
|
|
580
|
+
|
|
581
|
+
**Real session evidence:** the 3-node mesh (Mac in San Francisco / VPN + a
|
|
582
|
+
HOSTKEY VPS in Milan + an Alibaba VPS in Asia) ran this exact topology after
|
|
583
|
+
every other public-port approach hit a firewall wall. 33k + 7k rows synced
|
|
584
|
+
cleanly via SSH tunnels at ~165 s/round.
|
|
585
|
+
|
|
586
|
+
**Topology update (2026-06-11):** the Mac↔VPS-EU leg has since migrated to
|
|
587
|
+
the canonical pattern B via `sync-join` (VPS-EU is now the hub on loopback;
|
|
588
|
+
the Mac dials in with a supervised `-L` tunnel). The Asia spoke still uses
|
|
589
|
+
its D-style reverse tunnel into VPS-EU, whose 5-min schedule keeps the
|
|
590
|
+
third node converged — C/D remain the right tools when a node can't be a
|
|
591
|
+
normal `-L` client.
|
|
592
|
+
|
|
593
|
+
---
|
|
594
|
+
|
|
595
|
+
## Roadmap / backlog
|
|
596
|
+
|
|
597
|
+
Surfaced while taking sync from tracer-bullet to a live 3-node mesh
|
|
598
|
+
(Mac + two VPSes). Ordered roughly by priority.
|
|
599
|
+
|
|
600
|
+
### 1. Mesh-bootstrap wizard ⭐ (top priority, the consolidating product feature)
|
|
601
|
+
|
|
602
|
+
The end-state of everything else below. A prompt-driven, agent-mediated
|
|
603
|
+
setup that empirically discovers reachability between user's nodes, picks
|
|
604
|
+
the best topology automatically (deployed pattern A → B → C → D from above,
|
|
605
|
+
in decreasing order of "ideal" reachability), and emits ready-to-paste
|
|
606
|
+
prompts for each agent. The user never has to know whether their setup is
|
|
607
|
+
"VPS-as-hub" or "Mac-as-hub" or "transit-hub" — the wizard figures it out
|
|
608
|
+
and explains the choice.
|
|
609
|
+
|
|
610
|
+
**UX sketch** (interactive, via Mac CLI or a memex MCP tool):
|
|
611
|
+
|
|
612
|
+
```
|
|
613
|
+
$ memex-sync mesh bootstrap
|
|
614
|
+
|
|
615
|
+
Wizard: Which agents do you have? (multi-select)
|
|
616
|
+
[ ] OpenClaw [ ] Hermes [ ] Kimi [ ] Custom
|
|
617
|
+
|
|
618
|
+
Wizard: For each, paste the probe prompt into the agent's chat and paste the
|
|
619
|
+
reply back here. (The probe is read-only — `whoami`, `nc -z github.com:443`,
|
|
620
|
+
`ss -ltn`, `sudo -ln`, etc.)
|
|
621
|
+
|
|
622
|
+
[Wizard parses all replies]
|
|
623
|
+
|
|
624
|
+
Wizard: Building reachability matrix…
|
|
625
|
+
✓ Mac can reach OpenClaw-VPS on :22 (banner OK, ~120ms)
|
|
626
|
+
✗ Mac can reach Kimi-VPS on :22 (banner timeout — your VPN proxy
|
|
627
|
+
relays SSH to Europe, drops it to Asia)
|
|
628
|
+
✓ Kimi-VPS can reach OpenClaw-VPS on :22 (Asia → Europe, ~280ms)
|
|
629
|
+
✓ All three reach github.com:443 — internet works everywhere
|
|
630
|
+
|
|
631
|
+
Wizard: Best plan: **Mac-as-hub with OpenClaw-VPS as transit**.
|
|
632
|
+
Why: Kimi can't be reached from Mac directly (proxy block); but
|
|
633
|
+
Kimi CAN reach OpenClaw-VPS, which Mac CAN reach. So OpenClaw-VPS
|
|
634
|
+
becomes a transit point. (Deployed pattern D from SYNC.md.)
|
|
635
|
+
No public ports needed anywhere.
|
|
636
|
+
|
|
637
|
+
[show topology diagram]
|
|
638
|
+
[confirm: y/n]
|
|
639
|
+
|
|
640
|
+
Wizard: Generating setup prompts. Paste each into the indicated chat:
|
|
641
|
+
|
|
642
|
+
— Prompt for OpenClaw: [add Mac pubkey + add Kimi pubkey + sync-add mac + sync-schedule]
|
|
643
|
+
— Prompt for Kimi: [ssh -R outbound to OpenClaw]
|
|
644
|
+
— Mac will: [ssh -R outbound to OpenClaw + start local sync-server]
|
|
645
|
+
|
|
646
|
+
Wizard: Paste OpenClaw's reply confirming pubkeys added…
|
|
647
|
+
Paste Kimi's reply confirming ssh -R up…
|
|
648
|
+
|
|
649
|
+
Wizard: Establishing Mac's ssh -R… [does it]
|
|
650
|
+
Verifying via marker propagation… [posts a marker to local sync-server,
|
|
651
|
+
polls each agent over their tunneled connection until the marker
|
|
652
|
+
appears in their DB]
|
|
653
|
+
✓ marker reached OpenClaw in 8s
|
|
654
|
+
✓ marker reached Kimi in 14s (via transit)
|
|
655
|
+
Mesh up.
|
|
656
|
+
|
|
657
|
+
Wizard: Installing durability layer…
|
|
658
|
+
✓ LaunchAgent on Mac with autossh-style retry for ssh -R to OpenClaw
|
|
659
|
+
✓ systemd-user on Kimi for ssh -R to OpenClaw with restart
|
|
660
|
+
Mesh self-heals on Mac sleep/network change.
|
|
661
|
+
```
|
|
662
|
+
|
|
663
|
+
**Implementation shape:**
|
|
664
|
+
|
|
665
|
+
- `lib/sync/bootstrap.js` — a state-machine wizard.
|
|
666
|
+
- `memex_mesh_bootstrap` MCP tool — agent-facing entry; Mac's Claude Code
|
|
667
|
+
invokes it, the conversation IS the wizard.
|
|
668
|
+
- `scripts/probe-prompt.sh` (or generated) — the standard read-only probe
|
|
669
|
+
any agent runs once and replies with structured output.
|
|
670
|
+
- Topology decision is the empirical heart: a reachability matrix +
|
|
671
|
+
preference order (A > B > C > D in the deployed-patterns section).
|
|
672
|
+
- Marker propagation is the end-to-end test: posts a known message via Mac's
|
|
673
|
+
sync-server, polls each remote until it appears, gives a per-hop latency.
|
|
674
|
+
|
|
675
|
+
This is the consolidating feature that absorbs items 2, 3, and the older
|
|
676
|
+
auto-hub election idea: every constituent decision (which port to use, when
|
|
677
|
+
to fall back to SSH, when to ssh -R, whether to ssh-R-chain) becomes a
|
|
678
|
+
branch in the wizard's decision tree, made from empirical data rather than
|
|
679
|
+
operator guesswork.
|
|
680
|
+
|
|
681
|
+
### 1b. Self-healing tunnels — durability that ships to every user ⭐
|
|
682
|
+
|
|
683
|
+
When the mesh runs on patterns C or D (SSH reverse tunnels), the tunnels
|
|
684
|
+
themselves are fragile: laptop sleep, network change, VPN toggle, or VPS reboot
|
|
685
|
+
kills them. **This is not theoretical — it cost real data.** On 2026-06-07 the
|
|
686
|
+
Mac↔VPS1 tunnel had been dead since a sleep on ~06-02; six days of OpenClaw
|
|
687
|
+
research (incl. the whole ECC investigation, ~125 rows) sat stranded on the VPS
|
|
688
|
+
and never reached the main store. The 5-min schedule kept firing into a dead
|
|
689
|
+
tunnel and **failed silently**. Two lessons drive this design:
|
|
690
|
+
|
|
691
|
+
1. **Auto-heal** so breaks are rare and short.
|
|
692
|
+
2. **Never fail silently** — when a break *can't* self-heal (key rotated, VPS
|
|
693
|
+
gone, account suspended), the user must find out in hours, not days. The
|
|
694
|
+
second matters more than the first.
|
|
695
|
+
|
|
696
|
+
#### Proven reference implementation (live, 2026-06-07)
|
|
697
|
+
|
|
698
|
+
Hand-built and verified on the Mac hub — this is the prototype the product
|
|
699
|
+
should generate:
|
|
700
|
+
|
|
701
|
+
- `~/.memex/sync-tunnel.sh` — `exec ssh -N … -o ServerAliveInterval=30
|
|
702
|
+
-o ServerAliveCountMax=3 -o ExitOnForwardFailure=yes -R 127.0.0.1:8766:127.0.0.1:8766
|
|
703
|
+
openclaw@VPS` (foreground, no `-f`; explicit IPv4 loopback to avoid the
|
|
704
|
+
earlier `::1`-only bind bug).
|
|
705
|
+
- `~/Library/LaunchAgents/com.parallelclaw.memex.synctunnel.plist` — `KeepAlive=true`
|
|
706
|
+
+ `RunAtLoad=true` + `ThrottleInterval=15`. launchd respawns ssh whenever it
|
|
707
|
+
exits (sleep/wake, network change, drop).
|
|
708
|
+
- **Self-test passed**: killed the ssh PID → launchd respawned it in ~15s →
|
|
709
|
+
loopback listener + sync endpoint (cert D2:96) back automatically.
|
|
710
|
+
|
|
711
|
+
#### Architectural decision: supervise the tunnel *inside the memex daemon*
|
|
712
|
+
|
|
713
|
+
The reference is a "dumb" OS supervisor over a raw `ssh`. The product should go
|
|
714
|
+
one level up: **fold tunnel supervision into the long-running memex process**
|
|
715
|
+
(the sync-server / capture daemon the OS already keeps alive). Then the OS keeps
|
|
716
|
+
ONE thing alive (memex); memex keeps the tunnel alive. Benefits:
|
|
717
|
+
|
|
718
|
+
- **One supervisor tree**, not two (OS→ssh becomes OS→memex→ssh).
|
|
719
|
+
- memex still delegates crypto/transport to `ssh` (no SSH reimpl) but owns
|
|
720
|
+
**retry/backoff, error classification, and health** — so it can show status
|
|
721
|
+
and surface failure (impossible when ssh is an opaque sibling of launchd).
|
|
722
|
+
- `ssh` stays a child process; on hub nodes no *new* OS unit is needed (reuse
|
|
723
|
+
the existing sync-server unit). Spoke-only nodes (no server) get a dedicated
|
|
724
|
+
keeper unit.
|
|
725
|
+
|
|
726
|
+
#### Components
|
|
727
|
+
|
|
728
|
+
1. **In-daemon tunnel keeper.** Tunnel spec in `config.json`
|
|
729
|
+
(`{peer, direction, local_port, remote_port, ssh_target, identity}`).
|
|
730
|
+
Supervisor loop: spawn ssh → on exit, **classify** the failure:
|
|
731
|
+
- *auth failure* → STOP + surface (don't loop forever on a dead key);
|
|
732
|
+
- *network unreachable* → exponential backoff (cap ~2 min);
|
|
733
|
+
- *bind conflict* (stale remote listener after a drop) → fast retry, the
|
|
734
|
+
listener clears within seconds;
|
|
735
|
+
- *clean drop* → immediate re-establish.
|
|
736
|
+
2. **Zombie-tunnel detection.** TCP-up ≠ data-flowing (the `nc -z` lies through
|
|
737
|
+
the Xray proxy bit us already). Keeper periodically curls `/sync/health` with
|
|
738
|
+
the bearer *through* the tunnel; if TCP is up but data is dead, recycle it.
|
|
739
|
+
3. **OS-unit generation** (extend `lib/sync/service.js`, which already builds
|
|
740
|
+
server + schedule units). Add tunnel-keeper variants only where a spoke runs
|
|
741
|
+
no server: `buildTunnelLaunchAgentPlist` (KeepAlive) / `buildTunnelSystemdUnit`
|
|
742
|
+
(`Restart=always` + `RestartSec` + linger). Reuse the existing
|
|
743
|
+
`MEMEX_SYNC_EXPERIMENTAL` injection + log-path conventions.
|
|
744
|
+
4. **Observability — `memex sync status`.** Per peer: tunnel state
|
|
745
|
+
(up/healing/down), last successful sync, last heal time, consecutive
|
|
746
|
+
failures, last error class. Turns "it silently broke" into a glance.
|
|
747
|
+
5. **Failure surfacing (the core lesson).** When a peer's sync has been failing
|
|
748
|
+
past a threshold (e.g. tunnel down > 1 h), surface it loudly:
|
|
749
|
+
- a line in the SessionStart auto-context: *"⚠️ sync to peer X down 6 days —
|
|
750
|
+
N conversations not backed up"*;
|
|
751
|
+
- optionally an OS notification.
|
|
752
|
+
This is what would have caught the 2026-06-07 incident on day one.
|
|
753
|
+
6. **Install UX + proof.** `memex sync durability install` (or the wizard's final
|
|
754
|
+
step): detect OS → generate + load unit(s) → **run the kill→respawn self-test**
|
|
755
|
+
→ report "self-healing active (verified)". Shipping the self-test means the
|
|
756
|
+
user gets *proof*, not a promise.
|
|
757
|
+
|
|
758
|
+
#### Edge cases the productized version must handle
|
|
759
|
+
|
|
760
|
+
- **Idempotency** — re-install must not stack duplicate tunnels/units.
|
|
761
|
+
- **Passphrase keys** — reference key had none; a protected key needs
|
|
762
|
+
agent/keychain integration (`UseKeychain`/`AddKeysToAgent` on macOS, ssh-agent
|
|
763
|
+
on Linux). Detect and guide.
|
|
764
|
+
- **Multiple peers** — a hub may dial out to N spokes; the keeper manages N specs.
|
|
765
|
+
- **No-VPS users** — patterns C/D need one publicly-reachable sshd (the VPS as
|
|
766
|
+
rendezvous). Users with two laptops and no VPS have nowhere to dial; that
|
|
767
|
+
segment needs a relay (bring-your-own $5 VPS, or a future managed
|
|
768
|
+
memex-relay — see the OSS-free / managed-tunnels-paid split noted in backlog
|
|
769
|
+
discussion). Don't pretend SSH-R covers them.
|
|
770
|
+
|
|
771
|
+
### 2. `sync-server invite` external-reachability check
|
|
772
|
+
|
|
773
|
+
`memex_sync_invite` currently probes only the *local* port (127.0.0.1). It
|
|
774
|
+
happily emits a blob whose `host` is a public IP that's actually firewalled at
|
|
775
|
+
the cloud layer (the Alibaba case). It should additionally attempt an external
|
|
776
|
+
reachability hint and warn: "listening locally but the public host may be
|
|
777
|
+
blocked by your cloud Security Group — verify, or pair the spoke outbound to an
|
|
778
|
+
already-reachable hub instead."
|
|
779
|
+
|
|
780
|
+
### 3. Transport auto-management (deferred from Phase 6)
|
|
781
|
+
|
|
782
|
+
The transports work today via manual steps; automate the setup:
|
|
783
|
+
- **autossh** LaunchAgent/systemd to keep an SSH tunnel up (for SSH-reachable
|
|
784
|
+
hubs without an open public port).
|
|
785
|
+
- **Tailscale** auto-detect + `tailscale up` from a prompt (needs a one-time
|
|
786
|
+
auth key — moves the human action to the TS console, doesn't remove it).
|
|
787
|
+
- **mDNS LAN** discovery (`_memex._tcp.local`) for two machines on the same
|
|
788
|
+
Wi-Fi with no VPS.
|
|
789
|
+
|
|
790
|
+
### 4. Kimi Code CLI capture bridge
|
|
791
|
+
|
|
792
|
+
The standalone Kimi Code CLI writes `~/.kimi/sessions/<uuid>/context.jsonl`
|
|
793
|
+
(roles `_system_prompt`/`_checkpoint`/…) which the capture daemon doesn't watch
|
|
794
|
+
or parse. (Kimi accessed *through* OpenClaw — channel `kimi-web` — is already
|
|
795
|
+
captured.) An inbox-bridge (`kimi-to-memex` → `~/.memex/inbox/`) keeps the sync
|
|
796
|
+
engine untouched and isolates us from Moonshot format changes. Low priority —
|
|
797
|
+
the OpenClaw path already covers the common case.
|
|
798
|
+
|
|
799
|
+
### 5. Push-side skip surfacing
|
|
800
|
+
|
|
801
|
+
`POST /sync/push` applies rows via the shared row-applier, which counts `skipped`,
|
|
802
|
+
but the HTTP response doesn't return it to the pushing client (only the pull path
|
|
803
|
+
surfaces skips). A server-side FTS corruption could silently drop pushed rows
|
|
804
|
+
without the client knowing. Mirror the pull-side retry/abort on the server, or
|
|
805
|
+
return `skipped` in the push response so the client can react.
|
|
806
|
+
|
|
807
|
+
### 6. Provenance: per-row `origin` (node identity) — ✅ SHIPPED in v0.14.0
|
|
808
|
+
|
|
809
|
+
In a synced mesh, nothing records WHICH node captured a row. Two live failures
|
|
810
|
+
on the maintainer's 3-node mesh (2026-06-12), both from agents querying their
|
|
811
|
+
own synced DB:
|
|
812
|
+
|
|
813
|
+
- An agent looked for its peer's sessions, found no `source='vps1'` (a label
|
|
814
|
+
it *invented* — telling: that's how users expect it to work), and concluded
|
|
815
|
+
sync was broken. The 12,826 peer rows were present all along — blended into
|
|
816
|
+
the same `source='openclaw'` its own capture uses.
|
|
817
|
+
- Sharper: **conversation-key collision across nodes.** Two OpenClaw instances
|
|
818
|
+
(different VPSes) both capture the same human's Telegram presence, keyed by
|
|
819
|
+
the same Telegram id → both write `openclaw-tg-<id>` and sync MERGES two
|
|
820
|
+
different agents' dialogues into ONE interleaved conversation. msg_id-dedup
|
|
821
|
+
keeps it lossless, but "what did I discuss with agent A vs agent B" is
|
|
822
|
+
unanswerable.
|
|
823
|
+
|
|
824
|
+
Fix shape (additive, wire-compatible):
|
|
825
|
+
- `origin` column on messages (e.g. short host label or stable node id),
|
|
826
|
+
stamped at CAPTURE time, carried verbatim on the wire like `channel`.
|
|
827
|
+
- `origin:` filter in `memex_search` + origin shown in `get_conversation`
|
|
828
|
+
headers; `memex_overview` breaks counts down per origin.
|
|
829
|
+
- Do NOT namespace conversation ids by node — same-chat dedup across nodes is
|
|
830
|
+
a feature (live capture + export of the same chat must still converge).
|
|
831
|
+
Provenance belongs on the row, not in the key.
|
|
832
|
+
- Backfill: deliberately NONE by default — post-hoc a node cannot tell its
|
|
833
|
+
own NULL rows from peer rows that synced in pre-provenance, and a blind
|
|
834
|
+
stamp would FABRICATE provenance. Instead: forward-stamping from v0.14 +
|
|
835
|
+
the conflict branch backfills origin when a local RE-IMPORT of the source
|
|
836
|
+
file re-encounters a row (origin = COALESCE(existing, incoming) — never
|
|
837
|
+
overwrites). History without a re-import stays NULL = "pre-v0.14 era".
|
|
838
|
+
|
|
839
|
+
As shipped: `getOrigin()` (env MEMEX_ORIGIN → config `origin` → persisted
|
|
840
|
+
sanitised hostname) baked into every local-capture INSERT; wire carries
|
|
841
|
+
`origin` verbatim both directions; `memex_search(origin:)`; multi-origin
|
|
842
|
+
conversations tag lines `[@origin]` in `memex_get_conversation`;
|
|
843
|
+
`memex_overview` shows the per-origin breakdown; OpenClaw plugin stamps via
|
|
844
|
+
the same resolution (reads config, never writes it).
|