@rubytech/create-realagent 1.0.826 → 1.0.829
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/package.json +1 -1
- package/payload/platform/neo4j/schema.cypher +35 -2
- package/payload/platform/package.json +2 -2
- package/payload/platform/plugins/admin/hooks/__tests__/archive-ingest-surface-gate.test.sh +39 -54
- package/payload/platform/plugins/admin/hooks/archive-ingest-surface-gate.sh +26 -52
- package/payload/platform/plugins/admin/skills/onboarding/SKILL.md +7 -7
- package/payload/platform/plugins/docs/references/cloudflare.md +1 -1
- package/payload/platform/plugins/docs/references/plugins-guide.md +1 -1
- package/payload/platform/plugins/docs/references/troubleshooting.md +1 -0
- package/payload/platform/plugins/memory/PLUGIN.md +5 -5
- package/payload/platform/plugins/memory/mcp/dist/index.js +18 -253
- package/payload/platform/plugins/memory/mcp/dist/index.js.map +1 -1
- package/payload/platform/plugins/memory/mcp/dist/lib/__tests__/llm-classifier.test.js +51 -0
- package/payload/platform/plugins/memory/mcp/dist/lib/__tests__/llm-classifier.test.js.map +1 -1
- package/payload/platform/plugins/memory/mcp/dist/lib/__tests__/schema-validator.test.js +103 -0
- package/payload/platform/plugins/memory/mcp/dist/lib/__tests__/schema-validator.test.js.map +1 -1
- package/payload/platform/plugins/memory/mcp/dist/lib/llm-classifier.d.ts +19 -4
- package/payload/platform/plugins/memory/mcp/dist/lib/llm-classifier.d.ts.map +1 -1
- package/payload/platform/plugins/memory/mcp/dist/lib/llm-classifier.js +149 -56
- package/payload/platform/plugins/memory/mcp/dist/lib/llm-classifier.js.map +1 -1
- package/payload/platform/plugins/memory/mcp/dist/lib/schema-validator.d.ts +16 -1
- package/payload/platform/plugins/memory/mcp/dist/lib/schema-validator.d.ts.map +1 -1
- package/payload/platform/plugins/memory/mcp/dist/lib/schema-validator.js +12 -3
- package/payload/platform/plugins/memory/mcp/dist/lib/schema-validator.js.map +1 -1
- package/payload/platform/plugins/memory/mcp/dist/tools/__tests__/memory-archive-write.test.js +2 -138
- package/payload/platform/plugins/memory/mcp/dist/tools/__tests__/memory-archive-write.test.js.map +1 -1
- package/payload/platform/plugins/memory/mcp/dist/tools/__tests__/memory-ingest.test.d.ts +2 -0
- package/payload/platform/plugins/memory/mcp/dist/tools/__tests__/memory-ingest.test.d.ts.map +1 -0
- package/payload/platform/plugins/memory/mcp/dist/tools/__tests__/memory-ingest.test.js +66 -0
- package/payload/platform/plugins/memory/mcp/dist/tools/__tests__/memory-ingest.test.js.map +1 -0
- package/payload/platform/plugins/memory/mcp/dist/tools/__tests__/profile-update-personfields-open.test.d.ts +2 -0
- package/payload/platform/plugins/memory/mcp/dist/tools/__tests__/profile-update-personfields-open.test.d.ts.map +1 -0
- package/payload/platform/plugins/memory/mcp/dist/tools/__tests__/profile-update-personfields-open.test.js +148 -0
- package/payload/platform/plugins/memory/mcp/dist/tools/__tests__/profile-update-personfields-open.test.js.map +1 -0
- package/payload/platform/plugins/memory/mcp/dist/tools/memory-archive-write.d.ts +1 -64
- package/payload/platform/plugins/memory/mcp/dist/tools/memory-archive-write.d.ts.map +1 -1
- package/payload/platform/plugins/memory/mcp/dist/tools/memory-archive-write.js +6 -336
- package/payload/platform/plugins/memory/mcp/dist/tools/memory-archive-write.js.map +1 -1
- package/payload/platform/plugins/memory/mcp/dist/tools/memory-ingest.d.ts +30 -0
- package/payload/platform/plugins/memory/mcp/dist/tools/memory-ingest.d.ts.map +1 -1
- package/payload/platform/plugins/memory/mcp/dist/tools/memory-ingest.js +231 -0
- package/payload/platform/plugins/memory/mcp/dist/tools/memory-ingest.js.map +1 -1
- package/payload/platform/plugins/memory/mcp/dist/tools/profile-update.d.ts +21 -17
- package/payload/platform/plugins/memory/mcp/dist/tools/profile-update.d.ts.map +1 -1
- package/payload/platform/plugins/memory/mcp/dist/tools/profile-update.js +77 -37
- package/payload/platform/plugins/memory/mcp/dist/tools/profile-update.js.map +1 -1
- package/payload/platform/plugins/memory/references/schema-base.md +7 -2
- package/payload/platform/plugins/memory/skills/document-ingest/SKILL.md +54 -4
- package/payload/platform/plugins/whatsapp/PLUGIN.md +1 -1
- package/payload/platform/plugins/whatsapp-import/lib/dist/delta-cursor.d.ts +18 -0
- package/payload/platform/plugins/whatsapp-import/lib/dist/delta-cursor.d.ts.map +1 -0
- package/payload/platform/plugins/whatsapp-import/lib/dist/delta-cursor.js +31 -0
- package/payload/platform/plugins/whatsapp-import/lib/dist/delta-cursor.js.map +1 -0
- package/payload/platform/plugins/whatsapp-import/lib/dist/derive-keys.d.ts +27 -12
- package/payload/platform/plugins/whatsapp-import/lib/dist/derive-keys.d.ts.map +1 -1
- package/payload/platform/plugins/whatsapp-import/lib/dist/derive-keys.js +40 -20
- package/payload/platform/plugins/whatsapp-import/lib/dist/derive-keys.js.map +1 -1
- package/payload/platform/plugins/whatsapp-import/lib/dist/index.d.ts +7 -4
- package/payload/platform/plugins/whatsapp-import/lib/dist/index.d.ts.map +1 -1
- package/payload/platform/plugins/whatsapp-import/lib/dist/index.js +9 -6
- package/payload/platform/plugins/whatsapp-import/lib/dist/index.js.map +1 -1
- package/payload/platform/plugins/whatsapp-import/lib/dist/sessionize.d.ts +25 -0
- package/payload/platform/plugins/whatsapp-import/lib/dist/sessionize.d.ts.map +1 -0
- package/payload/platform/plugins/whatsapp-import/lib/dist/sessionize.js +48 -0
- package/payload/platform/plugins/whatsapp-import/lib/dist/sessionize.js.map +1 -0
- package/payload/platform/plugins/whatsapp-import/lib/dist/to-classifier-input.d.ts +3 -0
- package/payload/platform/plugins/whatsapp-import/lib/dist/to-classifier-input.d.ts.map +1 -0
- package/payload/platform/plugins/whatsapp-import/lib/dist/to-classifier-input.js +47 -0
- package/payload/platform/plugins/whatsapp-import/lib/dist/to-classifier-input.js.map +1 -0
- package/payload/platform/scripts/seed-neo4j.sh +15 -14
- package/payload/platform/templates/specialists/agents/database-operator.md +10 -17
- package/payload/server/chunk-CUSH3UXP.js +2305 -0
- package/payload/server/chunk-IWNDVGKT.js +10077 -0
- package/payload/server/chunk-KC7NUABI.js +654 -0
- package/payload/server/chunk-T2OPNP3L.js +654 -0
- package/payload/server/chunk-WUVXPZIV.js +1116 -0
- package/payload/server/client-pool-3TM3SRIA.js +32 -0
- package/payload/server/cloudflare-task-tracker-4NIODMGL.js +19 -0
- package/payload/server/cloudflare-task-tracker-CR6TL4VL.js +19 -0
- package/payload/server/maxy-edge.js +3 -3
- package/payload/server/neo4j-migrations-XTQ4WEV6.js +428 -0
- package/payload/server/public/assets/{admin-DOkUspG1.js → admin-BNwPsMhJ.js} +2 -2
- package/payload/server/public/assets/{graph-LLMJa4Ch.js → graph-N_Bw-8oT.js} +1 -1
- package/payload/server/public/assets/{page-DoaF3DB0.js → page-BKLGP-th.js} +1 -1
- package/payload/server/public/graph.html +2 -2
- package/payload/server/public/index.html +2 -2
- package/payload/server/server.js +281 -168
- package/payload/platform/plugins/whatsapp-import/PLUGIN.md +0 -46
- package/payload/platform/plugins/whatsapp-import/bin/ingest.mjs +0 -670
- package/payload/platform/plugins/whatsapp-import/bin/whatsapp-ingest.sh +0 -131
- package/payload/platform/plugins/whatsapp-import/lib/src/__tests__/filter-gate.test.ts +0 -172
- package/payload/platform/plugins/whatsapp-import/lib/src/__tests__/ingest-idempotence.test.ts +0 -141
- package/payload/platform/plugins/whatsapp-import/lib/src/__tests__/parse-export-lrm.test.ts +0 -83
- package/payload/platform/plugins/whatsapp-import/lib/src/__tests__/parse-export.test.ts +0 -678
- package/payload/platform/plugins/whatsapp-import/lib/src/derive-keys.ts +0 -59
- package/payload/platform/plugins/whatsapp-import/lib/src/filter.ts +0 -136
- package/payload/platform/plugins/whatsapp-import/lib/src/index.ts +0 -19
- package/payload/platform/plugins/whatsapp-import/lib/src/parse-export.ts +0 -471
- package/payload/platform/plugins/whatsapp-import/lib/tsconfig.json +0 -9
- package/payload/platform/plugins/whatsapp-import/lib/vitest.config.ts +0 -9
- package/payload/platform/plugins/whatsapp-import/skills/whatsapp-import/SKILL.md +0 -131
- package/payload/platform/plugins/whatsapp-import/skills/whatsapp-import/references/export-parse.md +0 -109
- package/payload/platform/plugins/whatsapp-import/skills/whatsapp-import-enrich/SKILL.md +0 -333
@@ -1,131 +0,0 @@
----
-name: whatsapp-import
-description: Phase 1 of the WhatsApp `_chat.txt` ingest contract — deterministic, LLM-free. Preview the archive (parsed counts, date range, sender histogram), confirm the owner + third-party `:Person`, ask the operator to choose a filter (`all`, `senders=<csv>`, `date-range=<isoFrom>..<isoTo>`), then write Conversation + Messages + NEXT chain via the single Bash entry `whatsapp-ingest.sh`. The writer binds participants to the owner + subject pair: any parsed senderName outside that closed set LOUD-FAILs. NO observations and NO LLM at this phase — semantic enrichment lives in the `whatsapp-import-enrich` skill (Phase 2). Triggers when the user asks to import a WhatsApp chat, ingest a `_chat.txt` file, or drops the contents of an "Export Chat" folder into chat. Distinct from the live `whatsapp` plugin (Baileys); this is import-from-export only.
----
-
-# WhatsApp Import — Phase 1 (Load)
-
-Phase 1 of the two-phase WhatsApp ingest contract. Deterministic only: parse → preview → operator-supplied filter → archive-write. NO LLM is invoked at this phase. The chunked Haiku insight pass moved to Phase 2 (`whatsapp-import-enrich` skill) so one ingest cannot blow the operator's context window with `:Observation` enumeration prose.
-
-## Owner + subject confirmation (mandatory first step)
-
-A WhatsApp DM has exactly two participants. The **owner** is the operator who exported the `_chat.txt`; the **subject** is the third party in the conversation. Both must resolve to existing graph nodes (`:AdminUser` or `:Person`) before the script runs — the writer is bound to that closed pair to prevent auto-creating phantom participants.
-
-1. List every `:AdminUser` and the senders surfaced by Step 1 preview via `mcp__graph__maxy-graph-read_neo4j_cypher`:
-   `MATCH (u:AdminUser) RETURN elementId(u) AS elementId, u.name AS name, u.userId AS userId, u.accountId AS accountId`
-2. Ask the operator: "Who exported this `_chat.txt`?" — accept either an existing `:AdminUser` elementId or, if the operator names someone not in the graph, surface that as a blocker (auto-creating an unknown owner is refused).
-3. Identify the third party from the preview's sender histogram. Look up the matching `:Person` (by name); if no match, ask the operator to confirm a `:Person` elementId or block until one exists. **Auto-creating the third-party `:Person` is forbidden** — the operator must confirm the canonical node.
-4. Echo both back verbatim and require explicit yes/no confirmation.
-5. Persist the owner's `elementId` as `--owner-element-id` and the subject's as `--subject-person-id`.
-
-## Step 1 — preview (mandatory before any write)
-
-Call `mcp__memory__whatsapp-export-preview` with the operator-supplied path:
-
-```json
-{
-  "filePath": "/abs/path/to/_chat.txt",
-  "timezone": "Europe/London"
-}
-```
-
-Returns: `{conversationSha256, archiveSourceFile, archiveBytes, parsed, mediaSkipped, systemSkipped, totalMessages, dateRange:{first,last}, senders:[{name,messageCount}, …]}`. No Cypher writes; the call is read-only and does NOT touch Neo4j.
-
-Surface to the operator as one chat message — counters and the histogram, no prose:
-
-> Preview of `<archive>`: `<parsed>` messages parsed, `<mediaSkipped>` media skipped, `<systemSkipped>` system skipped. Date range: `<first>` → `<last>`. Senders (top by count): `Joel (812), Adam (895)`. File hash: `<conversationSha256>` (`<archiveBytes>` bytes).
-
-## Step 2 — operator chooses a filter
-
-Ask exactly: "Filter to apply: `all`, `senders=<csv>`, or `date-range=<isoFrom>..<isoTo>`?" — no defaults, no menu of "or shall I just write everything". The operator picks one of the three forms verbatim:
-
-| Filter | Effect |
-|--------|--------|
-| `all` | Write every parsed row. Operator's explicit "I want the full archive" choice. |
-| `senders=Alice,Bob Carter` | Keep only rows whose senderName matches one of the comma-separated names exactly (whitespace trimmed). |
-| `date-range=2024-01-01..2024-06-30` | Keep only rows whose `dateSent` falls inside the inclusive range (date-only or full ISO 8601 endpoints both accepted). |
-
-Echo the chosen filter back; require explicit yes/no confirmation before the write.
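The three filter forms in the table above can be read as a small grammar. The sketch below is one hedged way to parse and apply them; it is not the removed `filter.ts`. The names `ParsedLine`, `Filter`, `parseFilter`, and `applyFilter` are hypothetical, and treating a date-only upper bound as the end of that day in UTC is an assumption made here to honour the "inclusive range" wording.

```typescript
// Row shape assumed from the parser output this package describes.
interface ParsedLine {
  senderName: string;
  dateSent: string; // ISO 8601
  body: string;
  sequenceIndex: number;
}

type Filter =
  | { kind: "all" }
  | { kind: "senders"; names: string[] }
  | { kind: "date-range"; fromMs: number; toMs: number };

// Hypothetical helper: parses one of the three accepted forms and throws on
// anything else, so a default is never invented.
function parseFilter(raw: string): Filter {
  if (raw === "all") return { kind: "all" };
  if (raw.startsWith("senders=")) {
    const names = raw
      .slice("senders=".length)
      .split(",")
      .map((name) => name.trim())
      .filter((name) => name.length > 0);
    if (names.length === 0) throw new Error("senders= needs at least one name");
    return { kind: "senders", names };
  }
  if (raw.startsWith("date-range=")) {
    const range = raw.slice("date-range=".length);
    const sep = range.indexOf("..");
    if (sep === -1) throw new Error("date-range= must be <isoFrom>..<isoTo>");
    const fromRaw = range.slice(0, sep);
    const toRaw = range.slice(sep + 2);
    // Assumption: a date-only upper bound covers that whole day, in UTC.
    const dateOnly = /^\d{4}-\d{2}-\d{2}$/;
    const fromMs = Date.parse(fromRaw);
    const toMs = Date.parse(dateOnly.test(toRaw) ? `${toRaw}T23:59:59.999Z` : toRaw);
    if (Number.isNaN(fromMs) || Number.isNaN(toMs)) throw new Error("invalid ISO endpoint");
    return { kind: "date-range", fromMs, toMs };
  }
  throw new Error(`unrecognised filter: ${raw}`);
}

// Applies the parsed filter to the rows before any write happens.
function applyFilter(rows: ParsedLine[], filter: Filter): ParsedLine[] {
  if (filter.kind === "all") return rows;
  if (filter.kind === "senders") {
    const names = filter.names;
    return rows.filter((row) => names.includes(row.senderName.trim()));
  }
  const { fromMs, toMs } = filter;
  return rows.filter((row) => {
    const sentMs = Date.parse(row.dateSent);
    return sentMs >= fromMs && sentMs <= toMs;
  });
}
```

Throwing on an unrecognised string mirrors the "no defaults" rule: the caller has to go back to the operator rather than fall through to `all`.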
-
-## Step 3 — archive-write
-
-Single Bash call:
-
-```bash
-bash platform/plugins/whatsapp-import/bin/whatsapp-ingest.sh <archive.zip|dir|_chat.txt> \
-  --owner-element-id <id> \
-  --subject-person-id <id> \
-  --scope <admin|public> \
-  --filter <all|senders=<csv>|date-range=<isoFrom>..<isoTo>>
-```
-
-Optional flags:
-- `--account-id <id>` — explicit account id when more than one exists under `data/accounts/` (Phase 0 has one).
-- `--timezone <iana>` — IANA zone for timestamps (default `Europe/London`).
-- `--date-format <DD/MM/YY|MM/DD/YY|DD/MM/YYYY|MM/DD/YYYY>` — override auto-detect for ambiguous locales.
-
-The script:
-- Unzips the archive if needed; locates `_chat.txt`.
-- Parses the file deterministically (year shape, sender/body grammar, timezone offset, U+200E/U+200F leading-bidi-strip).
-- Applies the operator-supplied filter to `parseResult.parsedLines` BEFORE archive-write.
-- Validates every distinct parsed senderName against the canonical-name candidates of `{owner, subject}` (NFKC-trim-lower normalisation). Any miss LOUD-FAILs with `[whatsapp-ingest] FAIL parser-miss reason="senderName=<verbatim> not in preview histogram (parser failure — re-export or report)"` and exits non-zero. **Never auto-creates a participant** — the fallback path was removed by design.
-- Writes the Conversation + Messages + edges + NEXT chronology via `memoryArchiveWrite` directly (no MCP envelope between steps).
-
-NO insight pass runs. The `--no-insight` flag of older releases is gone — Phase 1 always means parse + filter + archive-write, nothing else.
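The closed-pair check in the script steps above relies on "NFKC-trim-lower" normalisation. A minimal sketch of that comparison follows, assuming the owner and subject each come with a list of candidate display names. `normaliseName` and `assertClosedPair` are hypothetical names, and the error text is illustrative rather than the pinned FAIL line.

```typescript
// NFKC-normalise, trim, and lowercase a display name so that visually
// equivalent spellings (full-width letters, stray spaces, case) compare equal.
function normaliseName(name: string): string {
  return name.normalize("NFKC").trim().toLowerCase();
}

// Hypothetical check: throws on the first parsed sender that matches neither
// participant. Nothing is created as a fallback.
function assertClosedPair(
  parsedSenders: string[],
  ownerNames: string[],
  subjectNames: string[],
): void {
  const allowed = new Set(ownerNames.concat(subjectNames).map(normaliseName));
  for (const sender of Array.from(new Set(parsedSenders))) {
    if (!allowed.has(normaliseName(sender))) {
      throw new Error(`sender outside the owner/subject pair: ${sender}`);
    }
  }
}
```

The verbatim sender is kept in the error so the operator sees exactly what the export contained, not the normalised form.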
-
-## Phase 1 agent-return — counters only
-
-Stdout JSON shape (success — full diagnostic counters):
-
-```json
-{
-  "conversationElementId": "4:abcd…:42",
-  "conversationId": "whatsapp-export:<sha>:<accountId>",
-  "parsed": 1707,
-  "mediaSkipped": 0,
-  "systemSkipped": 0,
-  "filtered": 1707,
-  "written": 1707,
-  "messagesAlreadyExisted": 0,
-  "nextEdgesProcessed": 1706,
-  "nextEdgesCreated": 1706,
-  "participantsAlreadyExisted": 2,
-  "ms": 6800
-}
-```
-
-Surface to the admin agent as exactly one message (counters first, one sentence pointing at the Phase 2 surface):
-
-> Imported `<written>` messages from `<archive>` into conversation `<conversationElementId>` (`<conversationId>`); already existed: `<messagesAlreadyExisted>`; NEXT edges created: `<nextEdgesCreated>`. Use `mcp__memory__whatsapp-export-preview` for any future re-import preview; trigger semantic enrichment via the `whatsapp-import-enrich` skill ("enrich the `<chat-name>` chat") when ready.
-
-NO inline enumeration of mention/task/preference/relationship counts. NO multi-paragraph "ask to enrich" prose. The above shape is load-bearing — concision over completeness, because one prior ingest blew the operator's context with count enumeration.
-
-### Re-import signal
-
-A second invocation against the same archive should report `messagesAlreadyExisted > 0 AND written > 0` once the stable-messageId contract is in place. The subagent asserts both counters appear non-trivially before claiming a re-import landed cleanly.
-
-## Failure path — single FAIL line
-
-- **Exit non-zero** + one stderr line: `[whatsapp-ingest] FAIL phase=<argv|filter|parse|archive-write|import|uncaught> reason="<sanitised first 80c>" ...`. Surface this verbatim to the operator and yield. **Do not retry. Do not edit parser source.** The archive-ingest-surface-gate denies parser-source edits, JS test runners, and the legacy `whatsapp-export-parse` / `whatsapp-export-insight-write` / `memory-archive-write{archiveType:whatsapp-export}` MCP tools — none of those are escape hatches in your surface.
-
-Missing `--filter` emits the pinned line `[whatsapp-ingest] FAIL filter-required reason="bulk-archive-gate — operator must specify --filter (one of all, senders=<csv>, date-range=<isoFrom>..<isoTo>)"`. Re-invoke with the operator's chosen filter — never fabricate a default.
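The single-line failure format described above can be sketched as a small formatter. This is an illustration only: `sanitiseReason` and `failLine` are hypothetical names, the trailing `...` fields of the real line are not reproduced, and replacing double quotes inside the reason is an assumption made here so the quoted field stays easy to parse.

```typescript
type Phase = "argv" | "filter" | "parse" | "archive-write" | "import" | "uncaught";

// Drop ASCII control characters, swap double quotes for single quotes
// (assumption), and keep only the first 80 characters.
function sanitiseReason(message: string): string {
  return message
    .replace(/[\u0000-\u001f\u007f]/g, "")
    .replace(/"/g, "'")
    .slice(0, 80);
}

// Builds the one stderr line for a failed phase.
function failLine(phase: Phase, error: unknown): string {
  const message = error instanceof Error ? error.message : String(error);
  return `[whatsapp-ingest] FAIL phase=${phase} reason="${sanitiseReason(message)}"`;
}
```

A caller would write this line to stderr once and exit non-zero, for example `console.error(failLine("parse", err)); process.exit(1);`.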
-
-## Idempotency
-
-Re-running with the same `<archive>` + `--filter` is a no-op once the stable-messageId contract is in place: `written: 0`, `nextEdgesCreated: 0`, conversation scalars refreshed via `lastImportedAt` / `lastImportedBySession`. Re-exports with appended messages add only the delta and extend the NEXT chain. Without a stable messageId (line-hash + array-position keying) re-imports double the message set — the natural-key fix is what makes the contract real.
-
-## Verification (post-write)
-
-Run via `mcp__graph__maxy-graph-read_neo4j_cypher`:
-
-- `MATCH (c:Conversation:WhatsAppConversation {conversationId: $cid}) RETURN c.messageCount, c.participantCount, c.firstMessageAt, c.lastMessageAt` — agrees with the JSON summary.
-- `MATCH (m:Message)-[:PART_OF]->(c {conversationId: $cid}) RETURN count(m)` — equals `written + messagesAlreadyExisted` (post-filter).
-- `MATCH p=(:Message {conversationId: $cid})-[:NEXT*]->() WITH max(length(p)) AS chain RETURN chain` — equals `messageCount - 1`.
-- Phase 1 wrote ZERO observations: `MATCH (o:Observation)-[:OBSERVED_IN]->(:Conversation {conversationId: $cid}) RETURN count(o)` — should be 0 immediately after Phase 1. Observations land only when the operator triggers Phase 2.
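The post-write checks above reduce to three arithmetic invariants between the stdout summary and the Cypher results. A hedged sketch of that comparison follows, assuming the query results have already been fetched into plain numbers. `IngestSummary`, `GraphCounts`, and `verifyPhase1` are hypothetical names.

```typescript
// Subset of the stdout summary used by the checks.
interface IngestSummary {
  written: number;
  messagesAlreadyExisted: number;
}

// Results of the verification queries, already reduced to numbers.
interface GraphCounts {
  messageCount: number;     // count(m) over PART_OF
  longestNextChain: number; // max(length(p)) over NEXT*
  observationCount: number; // count(o) over OBSERVED_IN
}

// Returns a list of human-readable mismatches; an empty list means all checks agree.
function verifyPhase1(summary: IngestSummary, graph: GraphCounts): string[] {
  const problems: string[] = [];
  const expectedMessages = summary.written + summary.messagesAlreadyExisted;
  if (graph.messageCount !== expectedMessages) {
    problems.push(`message count ${graph.messageCount} != ${expectedMessages}`);
  }
  const expectedChain = Math.max(0, graph.messageCount - 1);
  if (graph.longestNextChain !== expectedChain) {
    problems.push(`NEXT chain ${graph.longestNextChain} != ${expectedChain}`);
  }
  if (graph.observationCount !== 0) {
    problems.push(`expected 0 observations, found ${graph.observationCount}`);
  }
  return problems;
}
```

With the example summary shown earlier in this file (1707 written, 0 already existing), a graph holding 1707 messages, a chain of 1706, and no observations yields an empty problem list.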
-
-## What this is not
-
-- **Not** the live `whatsapp` plugin. That plugin (Baileys QR pairing) holds messages in an in-memory store cleared on restart. This plugin imports historical exports into Neo4j as persistent graph nodes.
-- **Not** a media-transcription pipeline. Voice notes, photos, PDFs are skipped at parse with a counter logged.
-- **Not** the operator-level semantic enrichment pass. Auto-created participants and `:Observation` nodes are deliberately raw — Phase 2 (`whatsapp-import-enrich`) lays down the observations via `whatsapp-export-insight-pass` and walks them through operator-confirmed wiring.
-- **Not** an LLM entry. Phase 1 has no Haiku call, no OAuth call, no model surface. The single sanctioned LLM entry for WhatsApp ingest is `mcp__memory__whatsapp-export-insight-pass`, invoked by the Phase 2 skill.
package/payload/platform/plugins/whatsapp-import/skills/whatsapp-import/references/export-parse.md
DELETED
@@ -1,109 +0,0 @@
-# Reference: `_chat.txt` parsing — implementation reference
-
-> **This is no longer operator instruction.** The agent does NOT walk this grammar in its own LLM turn. Parsing runs deterministically in [`platform/plugins/whatsapp-import/lib/src/parse-export.ts`](../../../lib/src/parse-export.ts), invoked in-process by [`bin/ingest.mjs`](../../../bin/ingest.mjs) (which the operator calls via [`bin/whatsapp-ingest.sh`](../../../bin/whatsapp-ingest.sh) — the single deterministic Bash entry). The legacy MCP wrapper is blocked at the harness gate. The vitest grid in [`lib/src/__tests__/parse-export.test.ts`](../../../lib/src/__tests__/parse-export.test.ts) is the executable contract; this prose is the human-readable companion. Extend the grammar by adding a failing test first.
-
-WhatsApp's "Export Chat" produces a UTF-8 text file with a deterministic line grammar. This reference describes what the parser library does when it converts that file into the `{senderName, dateSent, body, sequenceIndex}[]` structure the SKILL.md consumes.
-
-## File-open invariants
-
-1. **UTF-8 only.** Open the file with explicit UTF-8 decoding. On encoding error, abort the import with a named LOUD-FAIL — never silently substitute or corrupt bodies. WhatsApp's modern apps emit UTF-8 reliably; an encoding error usually means the operator manually edited the file with a tool that broke it. Surface the named error so they can re-export.
-2. **No size cap from the parser.** The parser handles arbitrarily large files; the [SKILL.md](../SKILL.md)'s 100-message selective-ingest gate is the operator-facing compression layer.
-3. **Compute `archiveSourceFile = sha256(file bytes)` first.** The hash drives `conversationId` and lets re-imports of the same archive land idempotently.
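The first and third invariants above (strict UTF-8 decoding, hash before anything else) can be sketched with Node's built-in `crypto` module and the standard `TextDecoder`. The function name `openArchive` and its return shape are hypothetical; this is not the removed parser's code.

```typescript
import { createHash } from "node:crypto";

// Hash the raw bytes first, then decode strictly.
// `fatal: true` makes an invalid byte sequence throw a TypeError instead of
// being replaced with U+FFFD.
function openArchive(bytes: Uint8Array): { archiveSha256: string; text: string } {
  const archiveSha256 = createHash("sha256").update(bytes).digest("hex");
  const text = new TextDecoder("utf-8", { fatal: true }).decode(bytes);
  return { archiveSha256, text };
}
```

Hashing the bytes rather than the decoded text keeps the digest independent of any later normalisation, so the same export always maps to the same value.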
-
-## Line grammar
-
-Every message line begins with a square-bracketed timestamp prefix followed by `<Sender>: <body>`:
-
-```
-[DD/MM/YYYY, HH:MM:SS] <Sender>: <body> ← modern WhatsApp default (4-digit year)
-[DD/MM/YY, HH:MM:SS] <Sender>: <body> ← legacy exports (2-digit year)
-```
-
-- **Day/month ordering.** `DD/MM` is the WhatsApp default everywhere except US iOS, which emits `MM/DD`. The parser auto-detects from the first prefix-matching line when `dateFormat` is omitted: probe DD/MM first; if range-valid, lock DD/MM; otherwise lock MM/DD. The lock is per-file — a single export never mixes orderings (the locale is set by the device that generated the export). Manually concatenated multi-locale archives are an explicit out-of-scope: pass `dateFormat` to override.
-- **Year shape.** Both 2-digit (`\d{2}` → `2000+yy`) and 4-digit (`\d{4}` → as-is) years are accepted by the same regex (`\d{2,4}`). A single file may hold both shapes; year semantics are resolved per-line from the captured length, not per-file.
-- **Time.** `HH:MM:SS` 24-hour; older exports may emit `HH:MM` (no seconds — treat as `:00`).
-- **Sender.** Saved contact name, phone number with country code (`+44 7700 900123`), or `You` for legacy operator-sent messages.
-- **Body.** Message text, possibly multi-line.
-
-Trim trailing whitespace from each line before parsing.
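The prefix rules above (day/month probing, per-line year shape, optional seconds) can be sketched with a single regular expression. The regex and the names `PREFIX`, `detectDateOrder`, and `parsePrefix` are assumptions for illustration; the removed `parse-export.ts` may have been structured differently.

```typescript
// Bracketed prefix: two date fields in either order, a 2- to 4-digit year,
// and a time with optional seconds.
const PREFIX = /^\[(\d{1,2})\/(\d{1,2})\/(\d{2,4}), (\d{1,2}):(\d{2})(?::(\d{2}))?\] /;

type DateOrder = "DD/MM" | "MM/DD";

// Probe DD/MM first on the first prefix-matching line; lock MM/DD only when
// DD/MM is out of range.
function detectDateOrder(firstPrefixedLine: string): DateOrder | null {
  const m = PREFIX.exec(firstPrefixedLine);
  if (m === null) return null;
  const first = Number(m[1]);
  const second = Number(m[2]);
  const validAsDayMonth = first >= 1 && first <= 31 && second >= 1 && second <= 12;
  return validAsDayMonth ? "DD/MM" : "MM/DD";
}

interface PrefixParts {
  year: number;
  month: number;
  day: number;
  hour: number;
  minute: number;
  second: number;
  rest: string; // text after the prefix, i.e. "<Sender>: <body>"
}

function parsePrefix(line: string, order: DateOrder): PrefixParts | null {
  const trimmed = line.trimEnd(); // trailing whitespace is dropped before parsing
  const m = PREFIX.exec(trimmed);
  if (m === null) return null;
  const first = Number(m[1]);
  const second = Number(m[2]);
  const yearRaw = m[3] ?? "";
  return {
    // Year semantics come from the captured length on this line, not the file.
    year: yearRaw.length === 2 ? 2000 + Number(yearRaw) : Number(yearRaw),
    month: order === "DD/MM" ? second : first,
    day: order === "DD/MM" ? first : second,
    hour: Number(m[4]),
    minute: Number(m[5]),
    second: Number(m[6] ?? "0"), // missing seconds are treated as :00
    rest: trimmed.slice(m[0].length),
  };
}
```

Converting the parts into an ISO 8601 string with the operator-supplied timezone is left out here, since that step depends on a timezone library the source does not name.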
-
-## Multi-line bodies
-
-A body that wraps to multiple lines continues onto subsequent lines that do **not** match the timestamp-prefix pattern. The parser must accumulate these continuation lines into the previous message's body, joining with `\n`. End-of-message is detected by the next timestamp prefix or end-of-file.
-
-```
-[14/03/26, 10:15:23] Joel: Quick question about the deck —
-do you have the v3 PDF anywhere?
-I checked Drive and only see v2.
-[14/03/26, 10:16:01] Sarah: Sec, will dig it out
-```
-
-Yields two messages:
-- `Joel: "Quick question about the deck —\ndo you have the v3 PDF anywhere?\nI checked Drive and only see v2."`
-- `Sarah: "Sec, will dig it out"`
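The accumulation rule above can be sketched as a single pass over the lines: a line that matches the timestamp prefix starts a new message, and any other line is appended to the previous one. `TIMESTAMP_PREFIX` and `groupMessages` are hypothetical names, and dropping lines that appear before the first prefix is an assumption of this sketch.

```typescript
const TIMESTAMP_PREFIX = /^\[\d{1,2}\/\d{1,2}\/\d{2,4}, \d{1,2}:\d{2}(?::\d{2})?\] /;

// Groups raw lines into one string per message, joining continuation lines
// with "\n". The text is assumed to use "\n" line endings already.
function groupMessages(text: string): string[] {
  const messages: string[] = [];
  const lines = text.replace(/\n+$/, "").split("\n"); // ignore trailing newlines at EOF
  for (const line of lines) {
    if (TIMESTAMP_PREFIX.test(line)) {
      messages.push(line);
    } else if (messages.length > 0) {
      messages[messages.length - 1] += "\n" + line;
    }
    // Lines before the first prefix have nothing to attach to and are dropped here.
  }
  return messages;
}
```

Blank lines inside a body are kept, because an empty line does not match the prefix and so counts as a continuation.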
-
-## System messages — skip with counter
-
-WhatsApp injects system-generated lines into the export for group events, contact changes, and security messages. These lines match the timestamp prefix but are **not** sent by a person and have no first-class graph value. Skip them at parse time, increment `systemSkipped`, and do not pass to `memory-archive-write`.
-
-Patterns to recognise (English; localisation expands the list):
-
-- `<Sender> created group "<name>"`
-- `<Sender> changed the subject from "<old>" to "<new>"`
-- `<Sender> changed this group's icon`
-- `<Sender> added <other>`
-- `<Sender> removed <other>`
-- `<Sender> left`
-- `<Sender>'s security code changed.`
-- `Messages and calls are end-to-end encrypted. ...` (the conversation header)
-- `You deleted this message.`
-- `This message was deleted.`
-
-Heuristic for the parser: if the body contains no spaces between `<Sender>` and a verb-phrase token, AND the body lacks a colon-after-sender separator, treat as a system message. Conservative — when uncertain, prefer to ingest as a message; the insight pass tolerates noisy bodies better than the parser would tolerate dropped real messages.
-
-## Media attachments — skip with counter
-
-Lines whose body indicates a media-only message (no text content) get skipped at parse time, increment `mediaSkipped`. Patterns:
-
-- `<Media omitted>` (when the operator chose "Without Media" on export)
-- `IMG-<digits>-<digits>.jpg (file attached)` / `.jpeg` / `.png` / `.heic` / `.gif`
-- `VID-<digits>-<digits>.mp4 (file attached)`
-- `PTT-<digits>-<digits>.opus (file attached)` (voice notes)
-- `AUD-<digits>-<digits>.opus (file attached)` (audio)
-- `STK-<digits>-<digits>.webp (file attached)` (stickers)
-- `<filename>.pdf (file attached)`
-- `<filename>.docx (file attached)`
-- `<...> attached: <filename>` (alternative format on some platforms)
-
-Mixed messages (text + media reference in one body) are kept as messages — only pure-media-only lines are skipped. The text body is retained.
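The media-only rule above can be expressed as anchored patterns tested against the whole body, so a body that also carries text never matches. The patterns below follow the list as written (`<digits>-<digits>` file names); real exports may use other name shapes, and the `<...> attached: <filename>` variant is left out because its exact form is not given. `MEDIA_ONLY_PATTERNS` and `isMediaOnly` are hypothetical names.

```typescript
// Each pattern is anchored at both ends and has no "m" flag, so it only
// matches when the entire body is the media marker.
const MEDIA_ONLY_PATTERNS: RegExp[] = [
  /^<Media omitted>$/,
  /^IMG-\d+-\d+\.(?:jpg|jpeg|png|heic|gif) \(file attached\)$/i,
  /^VID-\d+-\d+\.mp4 \(file attached\)$/i,
  /^(?:PTT|AUD)-\d+-\d+\.opus \(file attached\)$/i,
  /^STK-\d+-\d+\.webp \(file attached\)$/i,
  /^[^\n]+\.(?:pdf|docx) \(file attached\)$/i,
];

// True when the body is nothing but a media marker.
function isMediaOnly(body: string): boolean {
  const trimmed = body.trim();
  return MEDIA_ONLY_PATTERNS.some((pattern) => pattern.test(trimmed));
}
```

A caller would increment `mediaSkipped` when this returns true and keep the message otherwise, which preserves mixed text-plus-media bodies.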
-
-## Forwarded messages
-
-A forwarded message is prefixed with the invisible Unicode character (U+200E LEFT-TO-RIGHT MARK) followed by metadata WhatsApp injects. Parse the body normally; the LRM character is preserved in `body` (the insight pass's classifier sees it as benign). Do not strip — the raw body's fidelity matters for downstream queries.
-
-## Edge cases
-
-- **Empty body** (timestamp prefix followed by sender colon but no text). Rare. Skip with `systemSkipped` increment — usually corresponds to a deleted message stub.
-- **Leading BOM** (U+FEFF at file start). Strip before parsing the first line.
-- **Mixed line endings** (`\r\n` vs `\n`). Normalise to `\n` before tokenisation.
-- **Sender containing a colon** (e.g., a contact named "Joel: Work"). The grammar splits on the FIRST `: ` (colon-space) after the timestamp prefix's closing `]`. Subsequent colons in the sender or body are preserved verbatim.
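Three of the edge cases above (leading BOM, mixed line endings, first colon-space split) are simple enough to sketch directly. `normaliseText` and `splitSenderBody` are hypothetical names, and the second function assumes its input is the text that follows the closing `]` and its space.

```typescript
// Strip a leading U+FEFF and normalise CRLF to LF before tokenising.
function normaliseText(raw: string): string {
  const withoutBom = raw.charCodeAt(0) === 0xfeff ? raw.slice(1) : raw;
  return withoutBom.replace(/\r\n/g, "\n");
}

// Split on the FIRST ": " only; later colons stay verbatim in the body.
// Returns null when there is no separator at all.
function splitSenderBody(afterPrefix: string): { senderName: string; body: string } | null {
  const separator = afterPrefix.indexOf(": ");
  if (separator === -1) return null;
  return {
    senderName: afterPrefix.slice(0, separator),
    body: afterPrefix.slice(separator + 2),
  };
}
```

One consequence of the first-match rule is that a contact saved as "Joel: Work" comes out with sender `Joel` and a body that starts with `Work: `, which is what the split in this sketch produces.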
-
-## Parser output shape
-
-The parser returns `{conversationId, archiveSourceFile, parsedLines[], counters}` where:
-
-- `parsedLines[]: Array<{senderName: string, dateSent: string (ISO 8601 with operator-supplied timezone), body: string, sequenceIndex: number}>`
-- `counters: {parsed: n, systemSkipped: n, mediaSkipped: n, parseErrors: n}`
-
-The skill consumes this directly. The `messageId` is computed by the skill (not the parser) so the `lineHash` covers the original raw line, not the post-parse normalised body.
-
-## When to LOUD-FAIL
-
-The parser throws (and `whatsapp-export-parse` returns `isError: true`) on:
-
-- Encoding error at file open (UTF-8 decode fails — the parser uses `TextDecoder` with `fatal: true`, so any invalid byte sequence aborts loudly rather than silently substituting U+FFFD).
-- Empty file or zero parsed lines after walking the file (the file isn't a `_chat.txt`). The thrown error and the `[whatsapp-import] parse-grammar-miss first-line="<sample>"` stderr line both carry a sanitised first-line sample (control chars stripped, truncated to 80 chars) so the operator can recognise the offending header shape without re-running with a debugger.
-- A timestamp prefix matches but the body parse fails (no `: ` separator after the closing `]` AND no system-pattern match) — emits `parse-error file=<...> line=<n> reason=no-sender-body-separator content="<...>"`.
-- Missing required input (`accountId`, `timezone`).
-
-Never silently drop data the parser couldn't classify. The operator chooses to skip; the parser does not choose for them.
|
|
@@ -1,333 +0,0 @@
|
|
|
1
|
-
---
|
|
2
|
-
name: whatsapp-import-enrich
|
|
3
|
-
description: Operator-driven semantic enrichment pass over an already-loaded WhatsApp Conversation. Owns the LLM half of the WhatsApp ingest pipeline — first runs `mcp__memory__whatsapp-export-insight-pass` (chunkSize=50, overlap=5, server-side confidence>=0.8 gate) to lay down `:Observation {observationStatus:'auto-extracted'}` rows, then walks `:Person {participantStatus:'auto-created'}` and the auto-extracted observations, surfaces evidence per row, and writes operator-confirmed wiring (participant promotion/merge, `:MENTIONS` / `:RELATED_TO` edges, `:Task` and `:Preference` nodes). Triggers on operator phrases like "enrich the X chat", "promote the auto-created participants from Y", "wire the observations from yesterday's import". Runs against a Conversation already imported by `whatsapp-import` Phase 1; never re-runs parse.
|
|
4
|
-
---
|
|
5
|
-
|
|
6
|
-
# WhatsApp Import — Enrich
|
|
7
|
-
|
|
8
|
-
Phase 2 of the two-phase WhatsApp ingest contract. Phase 1 (`whatsapp-import`) is the deterministic, LLM-FREE Bash entry that lands raw shape: Conversation + Messages + chronological NEXT chain + auto-created `:Person` participants. Phase 2 (this skill) owns the LLM half: it runs the chunked Haiku insight pass on demand to lay down `:Observation` nodes, then operator-driven semantic resolution disambiguates participants, wires observations to typed entities, and reattributes the operator's own messages from the auto-Person to their `:AdminUser`.
|
|
9
|
-
|
|
10
|
-
The split exists because the inline insight pass on Phase 1 (1500 msgs/chunk, no operator gate) polluted the parent's tool_result with `:Observation` enumeration prose and blew operator context. Phase 1 is now mute on insights; this skill triggers them consciously with `mcp__memory__whatsapp-export-insight-pass` when the operator asks.
|
|
11
|
-
|
|
12
|
-
## When this applies
|
|
13
|
-
|
|
14
|
-
The operator triggers this skill against a single, already-loaded `:Conversation:WhatsAppConversation`. Acceptable phrases include any reference to enriching, promoting participants from, or wiring observations against a conversation the operator can name (display name, recent timestamp, conversationId). When the conversation reference is ambiguous, list the recent WhatsApp conversations and require operator selection before any walk begins. Never run against a conversation whose `whatsapp-import` Phase 1 has not completed (`MATCH (c:WhatsAppConversation {conversationId:$cid}) WHERE c.lastImportedAt IS NULL` is a blocker — surface "Phase 1 has not completed for <cid>; run whatsapp-import first" and yield).
|
|
15
|
-
|
|
16
|
-
## Step 0 — run the chunked Haiku insight pass (Phase 2a)
|
|
17
|
-
|
|
18
|
-
Phase 1 writes ZERO `:Observation` rows. Before any walk, lay them down via `mcp__memory__whatsapp-export-insight-pass`:
|
|
19
|
-
|
|
20
|
-
```json
{ "conversationId": "whatsapp-export:<sha>:<accountId>" }
```
|
|
23
|
-
|
|
24
|
-
The tool walks the Messages of the conversation in chronological order, chunks them at **chunkSize=50** with **overlap=5** (the prior 1500 msgs/chunk implementation lost per-message attention), runs Haiku per chunk, applies a server-side `confidence>=0.8` gate, and MERGE-keys `:Observation` rows. Returns `{conversationId, chunks, chunkSize, overlap, confidenceThreshold, totals:{mentions, tasks, preferences, observedRelationships, rejectedLowConfidence, written}, ms}`.
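The windowing described above can be sketched as follows. This is a minimal illustration, not the tool's implementation: the helper name `chunkWithOverlap` and the assumption that each chunk re-reads the last `overlap` messages of the previous one (stride = chunkSize - overlap) are hypothetical, since the source does not show the windowing code.

```typescript
// Hypothetical helper: illustrates chunkSize=50 / overlap=5 windowing only.
interface Chunk<T> {
  index: number;
  start: number; // offset of the first message in the chronological list
  items: T[];
}

function chunkWithOverlap<T>(items: T[], chunkSize = 50, overlap = 5): Chunk<T>[] {
  if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
    throw new RangeError("need chunkSize > 0 and 0 <= overlap < chunkSize");
  }
  // Assumption: consecutive chunks share `overlap` messages.
  const stride = chunkSize - overlap;
  const chunks: Chunk<T>[] = [];
  for (let start = 0; start < items.length; start += stride) {
    chunks.push({ index: chunks.length, start, items: items.slice(start, start + chunkSize) });
    if (start + chunkSize >= items.length) break; // this chunk already reached the end
  }
  return chunks;
}

// 120 placeholder messages in chronological order.
const sampleMessages = Array.from({ length: 120 }, (_, i) => `msg-${i}`);
const sampleChunks = chunkWithOverlap(sampleMessages);
```

Under these assumptions a 120-message conversation yields three chunks starting at offsets 0, 45 and 90, so the last five messages of each chunk are seen again at the head of the next.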
|
|
25
|
-
|
|
26
|
-
Surface to the operator as one chat message — counters only, no enumeration:
|
|
27
|
-
|
|
28
|
-
> Insight pass complete on `<conversationId>`: `<chunks>` chunks at chunkSize=50 / overlap=5 / confidenceThreshold=0.8. Wrote `<written>` observations (`<mentions>` mentions, `<tasks>` tasks, `<preferences>` preferences, `<observedRelationships>` relationships); rejected `<rejectedLowConfidence>` low-confidence items.
|
|
29
|
-
|
|
30
|
-
Idempotent — re-running collapses identical `(conversationId, sourceMessageRef, kind, contentHash)` tuples into one row. Re-runs are safe; the operator can tune the conversation by re-importing extra rows in Phase 1, then re-running the pass here.
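A minimal sketch of why a re-run collapses into existing rows, assuming the four-field tuple is used as a single merge key. The `ObservationCandidate` shape and the `mergeKey` / `collapse` helpers are hypothetical names for illustration; the real tool performs this with a Cypher MERGE.

```typescript
// Hypothetical shape: only the four key fields named in the text are modelled.
interface ObservationCandidate {
  conversationId: string;
  sourceMessageRef: string;
  kind: "mention" | "task" | "preference" | "observed-relationship";
  contentHash: string;
}

// Serialise the tuple so that identical tuples produce identical keys.
function mergeKey(o: ObservationCandidate): string {
  return JSON.stringify([o.conversationId, o.sourceMessageRef, o.kind, o.contentHash]);
}

// Emulates MERGE semantics: the first candidate per key is kept, repeats are no-ops.
function collapse(candidates: ObservationCandidate[]): ObservationCandidate[] {
  const byKey = new Map<string, ObservationCandidate>();
  for (const candidate of candidates) {
    const key = mergeKey(candidate);
    if (!byKey.has(key)) byKey.set(key, candidate);
  }
  return Array.from(byKey.values());
}

// Placeholder values; a second pass over the same messages yields the same tuples.
const firstPass: ObservationCandidate[] = [
  { conversationId: "c1", sourceMessageRef: "m10", kind: "mention", contentHash: "h1" },
  { conversationId: "c1", sourceMessageRef: "m11", kind: "task", contentHash: "h2" },
];
const afterRerun = collapse([...firstPass, ...firstPass]);
```

The same content extracted under a different `kind` or from a different `sourceMessageRef` produces a different key and therefore a separate row.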
|
|
31
|
-
|
|
32
|
-
## Bulk preview (mandatory, before any walk)
|
|
33
|
-
|
|
34
|
-
Before walking a single row, count the work and offer a yield. Two read-only Cyphers via `mcp__graph__maxy-graph-read_neo4j_cypher`:
|
|
35
|
-
|
|
36
|
-
```cypher
MATCH (p:Person {accountId:$acct, source:'whatsapp', participantStatus:'auto-created'})
      -[:PARTICIPANT_IN]->(:Conversation {conversationId:$cid})
RETURN count(p) AS autoParticipants
```
|
|
41
|
-
|
|
42
|
-
```cypher
MATCH (o:Observation {accountId:$acct, observationStatus:'auto-extracted', insightPass:true})
      -[:OBSERVED_IN]->(:Conversation {conversationId:$cid})
RETURN count(o) AS autoObservations,
       sum(CASE o.kind WHEN 'mention' THEN 1 ELSE 0 END) AS mentions,
       sum(CASE o.kind WHEN 'task' THEN 1 ELSE 0 END) AS tasks,
       sum(CASE o.kind WHEN 'preference' THEN 1 ELSE 0 END) AS preferences,
       sum(CASE o.kind WHEN 'observed-relationship' THEN 1 ELSE 0 END) AS relationships
```
|
|
51
|
-
|
|
52
|
-
Surface one chat message: `"<N> auto-participants and <M> auto-observations to review (<a> mentions, <b> tasks, <c> preferences, <d> relationships). Proceed?"`. Yield on no. On yes, persist `$preParticipantCount` and per-kind `$preObservationCount` for post-walk verification (Cypher silent-no-op detection — see "Status-update verification").
|
|
53
|
-
|
|
54
|
-
Emit one chat line: `[whatsapp-import-enrich] start conversationId=<cid> auto-participants=<N> auto-observations=<M>`.
|
|
55
|
-
|
|
56
|
-
## Walk 1 — Auto-created participants
|
|
57
|
-
|
|
58
|
-
For each `:Person {participantStatus:'auto-created'}` `PARTICIPANT_IN` the conversation, surface evidence and ask the operator to choose an action.
|
|
59
|
-
|
|
60
|
-
Per-row evidence Cypher:
|
|
61
|
-
|
|
62
|
-
```cypher
MATCH (p:Person)-[:PARTICIPANT_IN]->(c:Conversation {conversationId:$cid})
WHERE p.participantStatus = 'auto-created' AND p.accountId = $acct AND p.source = 'whatsapp'
WITH p, c
OPTIONAL MATCH (p)-[:SENT]->(m:Message {conversationId:$cid})
WITH p, count(m) AS messageCount,
     min(m.dateSent) AS firstSeenAt, max(m.dateSent) AS lastSeenAt,
     [m IN collect(m)[..3] | substring(m.body, 0, 80)] AS bodySamples
RETURN elementId(p) AS elemId, p.name AS displayName,
       messageCount, firstSeenAt, lastSeenAt, bodySamples
```
|
|
73
|
-
|
|
74
|
-
Operator choices per row:
|
|
75
|
-
|
|
76
|
-
| Action | Effect |
|--------|--------|
| **promote-to-existing** | Operator names an existing `:Person` or `:AdminUser` (resolved via `mcp__memory__memory-search` against `displayName`). Skill writes the merge below. |
| **mint-new-Person** | Operator names a new contact identity. Skill calls `mcp__contacts__contact-create` with `givenName` / `familyName` / at least one of `email` / `telephone`, then merges the auto-Person into the new contact's `:Person` node. |
| **merge-same-person** | Two auto-Persons that are the same person under different display names (e.g. phone-then-name). Operator names the survivor; skill merges the other into it. |
| **skip** | Leave `participantStatus='auto-created'`. Re-running the skill surfaces it again. |
|
|
82
|
-
|
|
83
|
-
### Merge Cypher (load-bearing — read carefully)
|
|
84
|
-
|
|
85
|
-
Every promote-to-existing, mint-new, and merge-same-person uses `apoc.refactor.mergeNodes` with **non-default** property-merge mode:
|
|
86
|
-
|
|
87
|
-
```cypher
MATCH (survivor) WHERE elementId(survivor) = $survivorId
MATCH (duplicate:Person) WHERE elementId(duplicate) = $autoPersonId
CALL apoc.refactor.mergeNodes([survivor, duplicate], {properties:'discard', mergeRels:true})
YIELD node
SET node.participantStatus = 'operator-confirmed',
    node.mergedFromAutoPerson = $autoPersonId,
    node.mergedAt = datetime(),
    node.mergedFromAgent = 'whatsapp-import-enrich',
    node.mergedFromSession = $sessionId
RETURN elementId(node) AS survivorId, count(node) AS affected
```
|
|
99
|
-
|
|
100
|
-
`{properties:'discard'}` keeps the survivor's value when both nodes have the same property — the survivor (first array element) wins on conflict. The duplicate's properties that the survivor does NOT have are still copied onto the survivor; this is desirable here (auto-Person's `firstSeenAt`/`lastSeenAt` enrich the `:AdminUser` if absent). **Do not switch to `'overwrite'`** — that mode hands the conflict to the duplicate, silently replacing `:AdminUser.name` (e.g. "Joel Smalley") with the auto-Person's WhatsApp display name (e.g. "Joel S."), which is identity corruption with no error. `'combine'` would coerce conflicting scalars into arrays (`name=["Joel Smalley","Joel S."]`), which breaks downstream callers that expect string scalars. **`discard` is the only safe mode here.**
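The three modes can be contrasted with a small emulation. This is only a sketch of the conflict rules as the paragraph above describes them, not APOC's implementation; `mergeProps` and the `Props` type are hypothetical, and the `userId` and `firstSeenAt` values are placeholders.

```typescript
type Scalar = string | number | boolean;
type Props = Record<string, Scalar | Scalar[]>;
type MergeMode = "discard" | "overwrite" | "combine";

// Emulates the described behaviour: the survivor is the first array element.
function mergeProps(survivor: Props, duplicate: Props, mode: MergeMode): Props {
  const out: Props = { ...survivor };
  for (const key of Object.keys(duplicate)) {
    const dupValue = duplicate[key];
    if (!(key in out)) {
      out[key] = dupValue; // properties only the duplicate has are copied in every mode
      continue;
    }
    if (mode === "overwrite") {
      out[key] = dupValue; // the duplicate wins the conflict
    } else if (mode === "combine" && out[key] !== dupValue) {
      out[key] = ([] as Scalar[]).concat(out[key], dupValue); // scalars become an array
    }
    // "discard": keep the survivor's value untouched
  }
  return out;
}

const adminUser: Props = { name: "Joel Smalley", userId: "u-1" };
const autoPerson: Props = { name: "Joel S.", firstSeenAt: "2024-01-01" };

const discarded = mergeProps(adminUser, autoPerson, "discard");
const overwritten = mergeProps(adminUser, autoPerson, "overwrite");
const combined = mergeProps(adminUser, autoPerson, "combine");
```

In this sketch only `discarded` keeps `name` as the string "Joel Smalley" while still gaining `firstSeenAt`; `overwritten.name` becomes "Joel S." and `combined.name` becomes a two-element array.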
|
|
101
|
-
|
|
102
|
-
`mergeRels:true` reparents every `:SENT` / `:PARTICIPANT_IN` / `:MENTIONS` edge from the duplicate onto the survivor in the same transaction.
|
|
103
|
-
|
|
104
|
-
### Owner reconciliation — first row of Walk 1
|
|
105
|
-
|
|
106
|
-
Phase 1 takes `--owner-element-id` as argv but does not stamp it on the Conversation node — the owner pointer is implicit (the `:SENT` edges from `:AdminUser` → `:Message` are the only structural link). The skill therefore re-asks the operator who owns this conversation, the same way Phase 1's anchor-confirmation flow does. List candidate `:AdminUser` rows:
|
|
107
|
-
|
|
108
|
-
```cypher
MATCH (u:AdminUser)
OPTIONAL MATCH (u)-[:SENT]->(m:Message {conversationId:$cid})
WITH u, count(m) AS senderMessageCount
RETURN elementId(u) AS elementId, u.name AS name, u.userId AS userId,
       senderMessageCount
ORDER BY senderMessageCount DESC, name
```
|
|
116
|
-
|
|
117
|
-
Surface: `"Who exported this conversation? Pick from: <:AdminUser rows with senderMessageCount>"`. The `senderMessageCount` is a hint — an :AdminUser already SENT-edged to messages in this conversation is the most likely owner — but the operator confirms verbatim. Echo the chosen owner back (`:AdminUser <name> (<elementId>) — confirm yes/no`) before any write.
|
|
118
|
-
|
|
119
|
-
On confirm, find auto-Persons whose display name might match the owner. Surface ALL candidates — string equality alone is not safe (owner display names drift across re-exports):
|
|
120
|
-
|
|
121
|
-
```cypher
MATCH (auto:Person {participantStatus:'auto-created', accountId:$acct, source:'whatsapp'})
      -[:PARTICIPANT_IN]->(:Conversation {conversationId:$cid})
// Count only messages sent in this conversation, matching the other evidence queries.
RETURN elementId(auto) AS elemId, auto.name AS displayName,
       size([(auto)-[:SENT]->(:Message {conversationId:$cid}) | 1]) AS sentMessageCount
ORDER BY sentMessageCount DESC
```
|
|
128
|
-
|
|
129
|
-
Operator picks zero or more auto-Persons to merge into the owner. For each picked auto-Person, run the merge Cypher above with `survivorId = ownerElementId` and `autoPersonId = picked-auto-elemId`. `mergeRels:true` reparents SENT and PARTICIPANT_IN onto the `:AdminUser`; the auto-Person is consumed.
|
|
130
|
-
|
|
131
|
-
### After each row — emit one log line
|
|
132
|
-
|
|
133
|
-
`[whatsapp-import-enrich] participant action=<promoted-existing|minted-new|merged-with-id|reattributed-to-owner|skipped> name=<displayName> elementId=<survivorId-or-autoId>`.
|
|
134
|
-
|
|
135
|
-
## Walk 2 — Auto-extracted observations
|
|
136
|
-
|
|
137
|
-
For each `:Observation {observationStatus:'auto-extracted', insightPass:true}` `OBSERVED_IN` the conversation, dispatch by `kind`. Field mapping is non-obvious (see ingest.mjs:459-505) — get this exactly right:
|
|
138
|
-
|
|
139
|
-
| kind | entity name | evidence | from | to |
|------|-------------|----------|------|-----|
| `mention` | `o.summary` | `o.snippet` (≤80 chars verbatim) | — | — |
| `task` | task body in `o.summary` | `o.snippet` | — | — |
| `preference` | `o.summary` (preference statement); `o.subject` (whose preference) | — | — | — |
| `observed-relationship` | `o.summary` (verb) | — | `o.from` | `o.to` |
|
|
146
|
-
|
|
147
|
-
`o.summary` is the load-bearing field for mention disambiguation — `o.subject` is `null` on mentions (verified ingest.mjs:462).
|
|
148
|
-
|
|
149
|
-
### kind = 'mention'
|
|
150
|
-
|
|
151
|
-
Run `mcp__memory__memory-search` against `o.summary` (the mention text — e.g. "Sarah", "Sarah Chen at Acme"). Three branches:
|
|
152
|
-
|
|
153
|
-
1. **Single high-confidence match AND `o.summary` contains whitespace OR matches a unique disambiguator (email, phone, role context).** Wire the edge and mark wired.
2. **Single match BUT `o.summary` is single-token (e.g. "Sarah") with no disambiguator.** Surface: `"Mention: 'Sarah' — found one :Person <fullName> (<elemId>). Confirm wire to this person?"` (mirrors the Gate 2 pattern in [whatsapp-export-insight-write.ts](../../../memory/mcp/src/tools/whatsapp-export-insight-write.ts) — operator IS the disambiguator). On yes wire; on no mark `observationStatus='rejected'`.
3. **Multiple matches OR zero matches.** Surface candidates (or the absence) and let the operator pick or reject.
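The three branches above can be read as a small decision function. This is one possible reading, with assumptions: branch 1 is interpreted as "single match AND (multi-token OR disambiguator)", `matchCount` is taken to mean the number of high-confidence search hits, and the names `decideMention` and `MentionDecision` are hypothetical.

```typescript
type MentionDecision = "wire" | "confirm-with-operator" | "operator-picks-or-rejects";

// Sketch only: maps search results for `o.summary` onto the three branches.
function decideMention(
  summary: string,
  matchCount: number,        // assumed: count of high-confidence search hits
  hasDisambiguator: boolean, // assumed: email, phone or role context was matched
): MentionDecision {
  if (matchCount !== 1) {
    return "operator-picks-or-rejects"; // branch 3: multiple or zero matches
  }
  const multiToken = /\s/.test(summary.trim());
  return multiToken || hasDisambiguator
    ? "wire"                   // branch 1
    : "confirm-with-operator"; // branch 2: single token, no disambiguator
}

const fullName = decideMention("Sarah Chen at Acme", 1, false);
const bareName = decideMention("Sarah", 1, false);
const ambiguous = decideMention("Sarah", 2, false);
```

With these inputs the full name routes to the wire branch, the bare first name requires operator confirmation, and two candidates hand the choice to the operator.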
|
|
156
|
-
|
|
157
|
-
### Wire Cypher — `:MENTIONS` edge with messageId recovery
|
|
158
|
-
|
|
159
|
-
The load phase does NOT stamp `messageId` on `:Observation` (the chunked Haiku has no per-message provenance). To respect the `(:Message)-[:MENTIONS]->(:Person)` semantics, recover `messageId` from `snippet`:
|
|
160
|
-
|
|
161
|
-
```cypher
MATCH (m:Message {conversationId:$cid})
WHERE m.body CONTAINS $snippet
RETURN m.messageId AS messageId, m.dateSent AS sentAt
ORDER BY m.dateSent ASC
LIMIT 1
```
|
|
168
|
-
|
|
169
|
-
Three outcomes:
|
|
170
|
-
|
|
171
|
-
- **Unique or first-by-chronology match** → write `(:Message)-[:MENTIONS]->(:Person|:AdminUser)`:
|
|
172
|
-
|
|
173
|
-
  ```cypher
  MATCH (m:Message {conversationId:$cid, messageId:$messageId})
  MATCH (target) WHERE elementId(target) = $targetElementId AND (target:Person OR target:AdminUser)
  MERGE (m)-[r:MENTIONS]->(target)
    ON CREATE SET r.source='whatsapp', r.evidenceSnippet=$snippet,
                  r.createdByAgent='whatsapp-import-enrich', r.createdAt=datetime(),
                  r.createdBySession=$sessionId
  WITH r
  MATCH (o:Observation) WHERE elementId(o) = $observationElementId
  SET o.observationStatus = 'wired', o.wiredEdgeKind = 'MENTIONS-from-Message',
      o.wiredAt = datetime(), o.wiredBySession = $sessionId
  RETURN elementId(r) AS edgeId, count(o) AS affected
  ```
|
|
186
|
-
|
|
187
|
-
- **Zero match** (snippet was paraphrased / normalised by Haiku) → fall back to `:Conversation`-anchored mention:
|
|
188
|
-
|
|
189
|
-
  ```cypher
  MATCH (c:Conversation {conversationId:$cid})
  MATCH (target) WHERE elementId(target) = $targetElementId AND (target:Person OR target:AdminUser)
  MERGE (c)-[r:MENTIONS]->(target)
    ON CREATE SET r.source='whatsapp', r.evidenceSnippet=$snippet,
                  r.createdByAgent='whatsapp-import-enrich', r.createdAt=datetime(),
                  r.createdBySession=$sessionId
  WITH r
  MATCH (o:Observation) WHERE elementId(o) = $observationElementId
  SET o.observationStatus = 'wired', o.wiredEdgeKind = 'MENTIONS-from-Conversation-fallback',
      o.wiredAt = datetime(), o.wiredBySession = $sessionId
  RETURN elementId(r) AS edgeId, count(o) AS affected
  ```
|
|
202
|
-
|
|
203
|
-
The `wiredEdgeKind` property is the audit anchor — operator can grep wired observations to see which path the skill took. Surface one chat line per wire: `[whatsapp-import-enrich] observation kind=mention action=wired-mention edge=<edgeKind> elementId=<observationElementId>`.
|
|
204
|
-
|
|
205
|
-
### kind = 'task'
|
|
206
|
-
|
|
207
|
-
Surface a one-line proposal: `"Task: '<o.summary>' — evidence: '<o.snippet>'. Mint as :Task affecting this conversation?"`. On yes, call `mcp__tasks__task-create` with the task text and `affects=$conversationElementId` (the conversation elementId is the required adjacency — `:Task` requires ≥1 typed edge at creation per project-manager.md:44). Then mark wired:
|
|
208
|
-
|
|
209
|
-
```cypher
MATCH (o:Observation) WHERE elementId(o) = $observationElementId
SET o.observationStatus = 'wired', o.wiredEdgeKind = 'task-created',
    o.wiredTaskElementId = $taskElementId,
    o.wiredAt = datetime(), o.wiredBySession = $sessionId
RETURN count(o) AS affected
```
|
|
216
|
-
|
|
217
|
-
On no, mark `observationStatus='rejected'`. Log line: `action=task-created` or `action=rejected`.
|
|
218
|
-
|
|
219
|
-
### kind = 'preference'
|
|
220
|
-
|
|
221
|
-
Write a `:Preference` node with `:OBSERVED_IN` edge to the conversation. The `mcp__memory__memory-write` tool is schema-aware and the wrapped writer enforces `≥1 typed edge`. Pass:
|
|
222
|
-
|
|
223
|
-
```json
{
  "type": "Preference",
  "properties": {
    "subject": "<o.subject>",
    "preference": "<o.summary>",
    "source": "whatsapp",
    "scope": "<conversation scope>"
  },
  "relationships": [
    { "type": "OBSERVED_IN", "targetElementId": "<conversationElementId>" }
  ]
}
```
|
|
237
|
-
|
|
238
|
-
Then mark the observation wired (`wiredEdgeKind='preference-written', wiredPreferenceElementId=<id>`). Log line: `action=preference-written`.
|
|
239
|
-
|
|
240
|
-
### kind = 'observed-relationship'
|
|
241
|
-
|
|
242
|
-
Surface: `"Relationship: <o.from> --[<o.summary>]--> <o.to>. Confirm?"`. The endpoints (`o.from`, `o.to`) are participant display names from the chat — they may be auto-Persons (now possibly merged into existing `:Person` / `:AdminUser` after Walk 1). Resolve each endpoint via `mcp__memory__memory-search` against the display name AND scoped to participants of this conversation. Branches:
|
|
243
|
-
|
|
244
|
-
1. **Both endpoints resolve uniquely.** Operator-confirms; write the edge:
|
|
245
|
-
|
|
246
|
-
   ```cypher
   MATCH (a) WHERE elementId(a) = $fromElementId
   MATCH (b) WHERE elementId(b) = $toElementId
   MERGE (a)-[r:RELATED_TO {relationship: $relationship}]->(b)
     ON CREATE SET r.source='whatsapp', r.operatorConfirmed=true,
                   r.evidenceMessageIds=$evidenceMessageIds,
                   r.createdByAgent='whatsapp-import-enrich', r.createdAt=datetime(),
                   r.createdBySession=$sessionId
   WITH r
   MATCH (o:Observation) WHERE elementId(o) = $observationElementId
   SET o.observationStatus = 'wired', o.wiredEdgeKind='RELATED_TO',
       o.wiredAt = datetime(), o.wiredBySession = $sessionId
   RETURN elementId(r) AS edgeId, count(o) AS affected
   ```
|
|
260
|
-
|
|
261
|
-
`evidenceMessageIds` is best-effort — recover via `MATCH (m:Message {conversationId:$cid}) WHERE m.body CONTAINS $relationship OR m.body CONTAINS $fromName OR m.body CONTAINS $toName RETURN collect(m.messageId)[..5]` (cap at 5).
|
|
262
|
-
|
|
263
|
-
2. **Endpoint does not resolve.** Surface the candidate options or the missing-Person; operator decides whether to mint via `contact-create` first or reject the observation.
|
|
264
|
-
|
|
265
|
-
3. **Operator answers no.** Mark `observationStatus='rejected'`. Log line: `action=rejected`.
|
|
266
|
-
|
|
267
|
-
`operatorConfirmed=true` is mandatory for every `:RELATED_TO` write — the brief's anti-hallucination doctrine (mirrors [whatsapp-export-insight-write.ts](../../../memory/mcp/src/tools/whatsapp-export-insight-write.ts)). Never write `:RELATED_TO` without explicit operator yes.
|
|
268
|
-
|
|
269
|
-
## Status-update verification (Cypher silent-no-op trap)
|
|
270
|
-
|
|
271
|
-
`MATCH (n) WHERE elementId(n) = $id SET n.foo = $val` against a missing node returns zero rows and performs zero mutations, but the query still SUCCEEDS, so nothing signals the failure unless the result is checked. Neo4j Community read-committed isolation does not protect co-transactional writes. Every status-update Cypher in this skill therefore ends with `RETURN count(<bound-var>) AS affected`. The skill code path:
|
|
272
|
-
|
|
273
|
-
- Captures pre-walk counts (`$preParticipantCount`, `$preObservationCount` per kind) from the bulk-preview Cyphers.
- After each transition (`participantStatus='operator-confirmed'`, `observationStatus IN ['wired','rejected']`) records `affected` from the result.
- Post-walk re-runs the bulk-preview Cyphers. Asserts the new counts equal `pre - count(operations-that-claimed-affected=1)`. Mismatch is a hard blocker — surface `"Status-update silent-no-op detected: pre=<n> ops=<n> post=<n>; aborting before further writes"` and yield. Never claim done with a count mismatch.
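The bookkeeping in the steps above can be sketched as a pure function. This is an illustration under assumptions: `OpResult` and `verifyTransitions` are hypothetical names, and the counts would come from the bulk-preview Cyphers and the `affected` column of each status-update result.

```typescript
// Hypothetical record of one status-update write; `affected` is the returned count.
interface OpResult {
  affected: number;
}

interface Verification {
  ok: boolean;
  message: string;
}

// Checks post == pre - (number of operations that reported affected=1).
function verifyTransitions(pre: number, post: number, ops: OpResult[]): Verification {
  const claimed = ops.filter((op) => op.affected === 1).length;
  if (post === pre - claimed) {
    return { ok: true, message: `pre=${pre} ops=${claimed} post=${post} (mismatch=0)` };
  }
  return {
    ok: false,
    message:
      `Status-update silent-no-op detected: pre=${pre} ops=${claimed} post=${post}; ` +
      "aborting before further writes",
  };
}

// Ten rows before the walk, three writes claimed success, seven rows remain.
const healthy = verifyTransitions(10, 7, [{ affected: 1 }, { affected: 1 }, { affected: 1 }]);
// Same claims, but eight rows remain: one SET silently matched nothing.
const broken = verifyTransitions(10, 8, [{ affected: 1 }, { affected: 1 }, { affected: 1 }]);
```

Keeping the check pure makes it easy to run once per counter (participants, then each observation kind) with the same function.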
|
|
276
|
-
|
|
277
|
-
## Idempotency
|
|
278
|
-
|
|
279
|
-
The walk filters on `participantStatus='auto-created'` and `observationStatus='auto-extracted'`. Re-running surfaces only items still in those states. Already-wired observations (`'wired'` / `'rejected'`) and operator-confirmed participants are skipped naturally — no skill-side bookkeeping needed.
|
|
280
|
-
|
|
281
|
-
This means the skill is safe to re-run at any time. Operators can enrich incrementally (one ten-row session, then another) and re-imports of the same archive (which add only the message delta per Phase 1's idempotency contract) leave existing wired state untouched.
|
|
282
|
-
|
|
283
|
-
## Done — emit one chat line
|
|
284
|
-
|
|
285
|
-
After both walks complete, emit `[whatsapp-import-enrich] done conversationId=<cid> wired=<n> skipped=<n> rejected=<n> ms=<n>` and return a structured summary to the admin agent in the database-operator output contract shape (see [database-operator.md](../../../../templates/specialists/agents/database-operator.md#output-contract)):
|
|
286
|
-
|
|
287
|
-
> WhatsApp enrichment complete for conversation `<displayName>` (`<cid>`):
> Participants: `<promoted>` promoted to existing, `<minted>` minted new, `<merged>` cross-displayname merged, `<reattributed>` reattributed to operator, `<skipped>` left as auto-created.
> Observations: `<wiredMentions>` mentions wired (`<msgEdge>` from :Message, `<convoEdge>` from :Conversation fallback), `<wiredTasks>` tasks created, `<wiredPrefs>` preferences written, `<wiredRels>` relationships confirmed, `<rejected>` rejected.
> Status-update verification: pre=`<preCount>` ops=`<opCount>` post=`<postCount>` (mismatch=`<0|N>`).
|
|
291
|
-
|
|
292
|
-
## Verification (post-write — for operator audit)
|
|
293
|
-
|
|
294
|
-
Run via `mcp__graph__maxy-graph-read_neo4j_cypher`:
|
|
295
|
-
|
|
296
|
-
- `MATCH (o:Observation {accountId:$acct, observationStatus:'auto-extracted'})-[:OBSERVED_IN]->(c:Conversation {conversationId:$cid}) RETURN count(o)` — should be 0 after a complete enrich.
- `MATCH (p:Person {participantStatus:'auto-created', accountId:$acct, source:'whatsapp'})-[:PARTICIPANT_IN]->(c:Conversation {conversationId:$cid}) RETURN count(p)` — equals the count of skipped rows from the chat summary.
- `MATCH ()-[r:MENTIONS {createdByAgent:'whatsapp-import-enrich'}]->() WHERE r.evidenceSnippet IS NOT NULL RETURN count(r)` — equals `wiredMentions` from the chat summary.
- Re-run the skill against the same conversation immediately. Bulk preview should report `auto-participants=<skippedCount>` and `auto-observations=<rejectedAndUnwiredCount>` — never duplicate edges, never duplicate `:Task`/`:Preference`.
|
|
300
|
-
|
|
301
|
-
## Observability — log lines
|
|
302
|
-
|
|
303
|
-
Every line emitted to chat is mirrored into the per-conversation agent-stream log (greppable via `ssh neo@<host> "grep -nE '\[whatsapp-import-enrich\]' ~/<install>/data/accounts/<accountId>/logs/<conversationId>-claude-agent-stream.log"`):
|
|
304
|
-
|
|
305
|
-
- `[whatsapp-import-enrich] start conversationId=<id> auto-participants=<n> auto-observations=<n>`
- `[whatsapp-import-enrich] participant action=<promoted-existing|minted-new|merged-with-id|reattributed-to-owner|skipped> name=<name> elementId=<id>`
- `[whatsapp-import-enrich] observation kind=<mention|task|preference|observed-relationship> action=<wired-mention|task-created|preference-written|relationship-confirmed|rejected> elementId=<id>` (mention rows append `edge=MENTIONS-from-Message` or `edge=MENTIONS-from-Conversation-fallback`)
- `[whatsapp-import-enrich] done conversationId=<id> wired=<n> skipped=<n> rejected=<n> ms=<n>`
|
|
309
|
-
|
|
310
|
-
**Confirms correct behaviour:** one `start … done` pair per enrich run; every `:Observation` row transitions out of `auto-extracted`; every `participant` log line cites a real `elementId`; the `done` line's wired/skipped/rejected sum equals the `start` line's `auto-observations` count.
|
|
311
|
-
|
|
312
|
-
**Indicates failure:** post-run grep `'observationStatus="auto-extracted"'` non-zero (silent SET no-op); duplicate `participant action=promoted-existing` for the same elementId across reruns (idempotency violation); `done` line missing from a run that emitted `start` (mid-walk crash, no rollback).
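A log audit along these lines could be automated with a short script. The sketch below covers only two of the signals named above (a `start` without a `done`, and the `done` counters not summing to the `start` line's `auto-observations`); `readCounter` and `checkRun` are hypothetical helpers, and the sample lines are invented values in the documented format.

```typescript
const TAG = "[whatsapp-import-enrich]";

// Extracts an integer counter such as `wired=3` from one log line.
function readCounter(line: string, name: string): number {
  const match = new RegExp(`(?:^|\\s)${name}=(\\d+)`).exec(line);
  if (match === null) throw new Error(`counter ${name} missing in: ${line}`);
  return Number(match[1]);
}

// Checks one run's lines: a start/done pair must exist and the done counters must add up.
function checkRun(lines: string[]): { ok: boolean; reason: string } {
  const start = lines.find((l) => l.startsWith(`${TAG} start `));
  const done = lines.find((l) => l.startsWith(`${TAG} done `));
  if (start === undefined) return { ok: false, reason: "no start line" };
  if (done === undefined) {
    return { ok: false, reason: "start without done (possible mid-walk crash)" };
  }
  const expected = readCounter(start, "auto-observations");
  const total =
    readCounter(done, "wired") + readCounter(done, "skipped") + readCounter(done, "rejected");
  return total === expected
    ? { ok: true, reason: "counts match" }
    : { ok: false, reason: `done total ${total} != start auto-observations ${expected}` };
}

// Invented sample run in the documented line format.
const sampleRun = [
  `${TAG} start conversationId=c1 auto-participants=2 auto-observations=6`,
  `${TAG} observation kind=task action=task-created elementId=e1`,
  `${TAG} done conversationId=c1 wired=3 skipped=1 rejected=2 ms=4100`,
];
const sampleVerdict = checkRun(sampleRun);
```

The script reads only the log text, so it can run against the grepped agent-stream log without touching the graph.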
|
|
313
|
-
|
|
314
|
-
## Tools this skill uses
|
|
315
|
-
|
|
316
|
-
Every prescribed tool resolves on database-operator's frontmatter `tools:` list. The pre-publish gate `platform/scripts/verify-skill-tool-surface.sh` asserts this statically:
|
|
317
|
-
|
|
318
|
-
- `mcp__memory__whatsapp-export-insight-pass` — Phase 2a chunked-Haiku insight extraction (chunkSize=50, overlap=5, confidence>=0.8). Lays down `:Observation` rows the rest of this skill walks. Owns the LLM half of WhatsApp ingest — Phase 1 has none.
- `mcp__graph__maxy-graph-read_neo4j_cypher` — bulk preview, evidence reads, messageId recovery, owner-reconciliation lookup.
- `mcp__graph__maxy-graph-write_neo4j_cypher` — `apoc.refactor.mergeNodes`, `:MENTIONS` and `:RELATED_TO` MERGEs, status-update SETs.
- `mcp__memory__memory-search` — entity disambiguation for mentions and observed-relationship endpoints.
- `mcp__memory__memory-write` — `:Preference` node creation with `:OBSERVED_IN` edge.
- `mcp__contacts__contact-create` — mint-new-Person path.
- `mcp__tasks__task-create` — `:Task` node creation with `affects=$conversationElementId`.
|
|
325
|
-
|
|
326
|
-
Raw Cypher and `cypher-shell` are forbidden in this skill (per [database-operator's LOUD-FAIL prerogative](../../../../templates/specialists/agents/database-operator.md#prerogatives)). Every write goes through the MCP tool surface above. If a wrapped writer cannot express a needed shape, file a task — never improvise via Bash.
|
|
327
|
-
|
|
328
|
-
## What this is not
|
|
329
|
-
|
|
330
|
-
- **Not** Phase 1. Parse and archive-write live in `whatsapp-import` (the deterministic Bash entry, LLM-FREE). This skill never re-parses. The Haiku insight pass lives here — Step 0 above is the one sanctioned LLM entry for WhatsApp ingest, and it is invoked consciously by the operator, not silently on archive-write.
- **Not** automatic. Every transition out of `auto-created` / `auto-extracted` requires an operator action — no auto-promotion, no auto-mention-acceptance, no batch confirmation. Compress-at-ingest doctrine requires per-row operator judgement.
- **Not** cross-conversation. The walk is scoped to one Conversation. Cross-conversation participant deduplication (the same person under two conversations) is operator-driven graph hygiene via [database-operator.md §Dedup merges](../../../../templates/specialists/agents/database-operator.md#dedup-merges), not this skill.
- **Not** a backfill tool. This skill assumes the Phase 1 contract and refuses to walk a conversation without `c.lastImportedAt`.
|