ai-browser-profile 1.0.5

package/setup/SKILL.md ADDED
@@ -0,0 +1,177 @@
1
+ ---
2
+ name: ai-browser-profile-setup
3
+ description: "Set up ai-browser-profile for a new user. Installs via npm, creates Python venv, extracts browser data, and optionally enables semantic search. Use when: 'set up browser profile', 'install ai browser profile', 'configure browser profile'."
4
+ ---
5
+
6
+ # AI Browser Profile Setup
7
+
8
+ Interactive setup wizard for ai-browser-profile. Walk the user through installation and first extraction.
9
+
10
+ ## When to use
11
+
12
+ - First-time setup of ai-browser-profile
13
+ - Reinstalling after a fresh machine setup
14
+ - Troubleshooting a broken installation
15
+
16
+ ## Prerequisites
17
+
18
+ - Node.js 16+ (for `npx`)
19
+ - Python 3.10+
20
+ - macOS (browser paths are macOS-specific)
21
+
22
+ ---
23
+
24
+ ## Setup Flow
25
+
26
+ Run each step sequentially. **After each step, print a progress status to the user** so they can follow along:
27
+
28
+ ```
29
+ [1/6] Install ............ done (12s)
30
+ [2/6] Verify ............. done (1s)
31
+ [3/6] Extract ............ running...
32
+ ```
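Here is a minimal Python sketch of a formatter for these status lines. The function name `status_line` and the 20-character dot-padded label width are illustrative assumptions, not part of ai-browser-profile.

```python
from typing import Optional


def status_line(step: int, total: int, name: str, state: str,
                seconds: Optional[float] = None, width: int = 20) -> str:
    """Format one progress line, e.g. '[1/6] Install ............ done (12s)'."""
    # Pad "name " with dots so the state column lines up across steps
    label = f"{name} ".ljust(width, ".")
    line = f"[{step}/{total}] {label} {state}"
    if seconds is not None:
        line += f" ({seconds:.0f}s)"
    return line
```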
33
+
34
+ ### Step 1: Install via npm
35
+
36
+ Check if already installed:
37
+
38
+ ```bash
39
+ ls ~/ai-browser-profile/extract.py 2>/dev/null && echo "FOUND" || echo "NOT_FOUND"
40
+ ```
41
+
42
+ If NOT_FOUND, install:
43
+ ```bash
44
+ npx ai-browser-profile init
45
+ ```
46
+
47
+ This:
48
+ - Copies Python source + skills to `~/ai-browser-profile/`
49
+ - Creates a Python venv at `~/ai-browser-profile/.venv/`
50
+ - Installs core deps (`ccl_chromium_reader`, `numpy`)
51
+ - Symlinks skills into `~/.claude/skills/`
52
+
53
+ To update code later without touching data:
54
+ ```bash
55
+ npx ai-browser-profile update
56
+ ```
57
+
58
+ **Tell the user:** "Installed ai-browser-profile to ~/ai-browser-profile. Python venv created, core deps installed, skills symlinked."
59
+
60
+ ### Step 2: Verify the installation
61
+
62
+ ```bash
63
+ ~/ai-browser-profile/.venv/bin/python -c "
64
+ import sys
65
+ sys.path.insert(0, '$HOME/ai-browser-profile')
66
+ from ai_browser_profile import MemoryDB
67
+ print('MemoryDB imported successfully')
68
+ "
69
+ ```
70
+
71
+ Expected: `MemoryDB imported successfully`
72
+
73
+ If it fails, check:
74
+ - Python venv exists: `ls ~/ai-browser-profile/.venv/bin/python`
75
+ - Deps installed: `~/ai-browser-profile/.venv/bin/pip list | grep ccl`
76
+
77
+ **Tell the user:** "Python environment verified - MemoryDB loads correctly."
78
+
79
+ ### Step 3: Run extraction
80
+
81
+ **IMPORTANT:** Run extraction in the background so you can report progress to the user. The extraction has 8 stages and logs timing for each.
82
+
83
+ ```bash
84
+ cd ~/ai-browser-profile && source .venv/bin/activate && python extract.py 2>&1
85
+ ```
86
+
87
+ This scans all detected browsers (Arc, Chrome, Brave, Edge, Safari, Firefox) and extracts:
88
+ - Autofill profiles (names, emails, phones, addresses)
89
+ - Login data (accounts per domain)
90
+ - Browser history (tools/services used)
91
+ - Bookmarks (interests, tool usage)
92
+ - IndexedDB (WhatsApp contacts)
93
+ - Local Storage (LinkedIn connections)
94
+ - Notion (workspace contacts, if configured)
95
+ - Embeddings (semantic vectors, backfilled at end)
96
+
97
+ **INTERIM PROFILE:** The extraction pipeline prints an interim profile after the fast steps (autofill, history, bookmarks, logins, Notion — ~1s total) but before the slow steps (WhatsApp ~10s, embeddings ~3min). **As soon as you see the "Interim profile ready" log line, show the profile to the user immediately.** Don't wait for WhatsApp or embeddings to finish — the profile already has all identity, email, address, payment, account, and tool data. WhatsApp only adds a contact count.
98
+
99
+ Look for this in the logs:
100
+ ```
101
+ Interim profile ready (WhatsApp + embeddings still running):
102
+ ## User Profile
103
+ **Name:** ...
104
+ ```
105
+
106
+ **Show this to the user right away**, then let the extraction continue in the background. Tell them: "Here's your profile from browser data. WhatsApp contacts and semantic embeddings are still processing..."
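To act on that log line as soon as it appears, you can stream the extractor's output line by line. The sketch below assumes the marker text shown above. It also assumes, without confirmation from the source, that the interim block ends at the first blank line, so the end condition is a parameter you can adjust. `stream_lines` and `interim_profile` are hypothetical helpers.

```python
import subprocess
from typing import Callable, Iterable, Iterator, List, Optional

MARKER = "Interim profile ready"


def stream_lines(cmd: List[str], cwd: Optional[str] = None) -> Iterator[str]:
    """Yield merged stdout/stderr lines from `cmd` as soon as they are read."""
    proc = subprocess.Popen(cmd, cwd=cwd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, text=True, bufsize=1)
    assert proc.stdout is not None
    for line in proc.stdout:
        yield line.rstrip("\n")
    proc.wait()  # reached only once the whole stream has been consumed


def interim_profile(lines: Iterable[str],
                    is_end: Callable[[str], bool] = lambda s: not s.strip()) -> Optional[str]:
    """Collect lines after MARKER until `is_end` matches; None if MARKER never appears.

    The default end condition (first blank line) is an assumption about the log format.
    """
    seen = False
    captured: List[str] = []
    for line in lines:
        if not seen:
            seen = MARKER in line  # the marker line itself is not captured
        elif is_end(line):
            break
        else:
            captured.append(line)
    return "\n".join(captured) if seen else None
```

If you pass the generator from `stream_lines` to `interim_profile`, it stops consuming at the end of the profile block. You can then keep iterating the same generator for the remaining WhatsApp and embedding logs. When stdout is a pipe, Python block-buffers it, so run the extractor as `python -u extract.py` (or set `PYTHONUNBUFFERED=1`) so the marker arrives promptly.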
107
+
108
+ **After extraction and cleanup finish, report a final summary to the user, for example:**
109
+
110
+ ```
111
+ Extraction complete:
112
+ Browsers scanned: 8 profiles (Arc, Chrome, Safari, Firefox)
113
+ Raw memories: 5,878
114
+ After cleanup: 5,431
115
+ Time: 54s
116
+
117
+ Breakdown:
118
+ Autofill: 0.1s (forms, addresses, cards)
119
+ History: 1.8s (tools & services)
120
+ Bookmarks: 0.4s (interests & links)
121
+ Logins: 2.1s (saved accounts)
122
+ LinkedIn: 8.7s (connections)
123
+ Notion: 0.1s (contacts & pages)
124
+ WhatsApp: 15.3s (contacts)
125
+ Embeddings: 22.4s (semantic vectors)
126
+ ```
127
+
128
+ ### Step 4: Verify extraction
129
+
130
+ ```bash
131
+ ~/ai-browser-profile/.venv/bin/python -c "
132
+ import sys, os
133
+ sys.path.insert(0, os.path.expanduser('~/ai-browser-profile'))
134
+ from ai_browser_profile import MemoryDB
135
+ mem = MemoryDB(os.path.expanduser('~/ai-browser-profile/memories.db'))
136
+ stats = mem.stats()
137
+ print(f'Total memories: {stats[\"total_memories\"]}')
138
+ print()
139
+ print(mem.profile_text())
140
+ mem.close()
141
+ "
142
+ ```
143
+
144
+ **Show the profile to the user.** Check that the name, email, phone, and address look reasonable. If the primary email is wrong (for example, a contact's email ranked higher), note that the review pipeline (`/memory-review`) can correct this.
145
+
146
+ ### Step 5: Set up automation (optional)
147
+
148
+ Ask: "Do you want weekly automatic extraction + review? (y/n)"
149
+
150
+ If yes (macOS):
151
+ ```bash
152
+ ln -sf ~/ai-browser-profile/launchd/com.m13v.memory-review.plist ~/Library/LaunchAgents/
153
+ launchctl load ~/Library/LaunchAgents/com.m13v.memory-review.plist
154
+ ```
155
+
156
+ Schedule: extracts new browser data weekly, then runs Claude to review new entries.
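The package ships its own plist, and its contents aren't shown here. To illustrate how a weekly launchd schedule is usually expressed, this sketch builds a plist with Python's `plistlib`. The label, the command, and the Sunday 09:00 time are placeholders, not the values in `com.m13v.memory-review.plist`.

```python
import plistlib
from typing import List


def weekly_agent(label: str, program_args: List[str],
                 weekday: int = 0, hour: int = 9, minute: int = 0) -> bytes:
    """Serialize a launchd agent that runs `program_args` once a week.

    StartCalendarInterval uses launchd's Weekday/Hour/Minute keys; Weekday 0 (or 7) is Sunday.
    """
    job = {
        "Label": label,
        "ProgramArguments": program_args,
        "StartCalendarInterval": {"Weekday": weekday, "Hour": hour, "Minute": minute},
    }
    return plistlib.dumps(job)


# Placeholder label and command, not the values shipped with the package
xml = weekly_agent(
    "com.example.memory-review",
    ["/bin/bash", "-lc", "cd ~/ai-browser-profile && .venv/bin/python extract.py"],
)
print(xml.decode())
```

launchd does not expand `~` in `ProgramArguments`. Wrapping the command in `bash -lc` lets the shell perform that expansion.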
157
+
158
+ ### Step 6: Summary
159
+
160
+ Print a final status card:
161
+
162
+ ```
163
+ Setup Complete
164
+
165
+ Location: ~/ai-browser-profile
166
+ Database: ~/ai-browser-profile/memories.db
167
+ Python: ~/ai-browser-profile/.venv/bin/python
168
+ Skills: ~/.claude/skills/ai-browser-profile (+ 4 more)
169
+
170
+ Memories: 5,431
171
+ Embeddings: 5,431 vectors (semantic search enabled)
172
+ Automation: launchd weekly / not set up
173
+
174
+ Try it: Tell Claude "what's my email address"
175
+ Update: npx ai-browser-profile update
176
+ Review: /memory-review (Claude-powered cleanup)
177
+ ```
package/skill/SKILL.md ADDED
@@ -0,0 +1,180 @@
1
+ ---
2
+ name: ai-browser-profile
3
+ description: "Query the user's AI browser profile: identity, accounts, tools, contacts, addresses, payments extracted from browser data. Use when you need context about the user to help with any task: form filling, emailing, booking, payments, or any task where knowing the user's info helps."
4
+ ---
5
+
6
+ # AI Browser Profile
7
+
8
+ A self-ranking database of everything learned about the user from browser data. Memories are ranked by how often they're accessed vs how often they appear in search results — frequently useful memories rise, noise sinks.
9
+
10
+ ## Quick Reference
11
+
12
+ | Item | Value |
13
+ |------|-------|
14
+ | Database | `~/ai-browser-profile/memories.db` |
15
+ | Module | `~/ai-browser-profile/ai_browser_profile/` |
16
+ | Python | `~/ai-browser-profile/.venv/bin/python` |
17
+ | Rebuild | `~/ai-browser-profile/.venv/bin/python ~/ai-browser-profile/extract.py` |
18
+
19
+ ## How to Use
20
+
21
+ ### User profile (start here)
22
+
23
+ Get a compact overview of the user — name, emails, addresses, accounts, tools, contacts. This is deterministic (no LLM) and computed from the database. Use it as baseline context before doing any task.
24
+
25
+ ```python
26
+ import sys, os
27
+ sys.path.insert(0, os.path.expanduser("~/ai-browser-profile"))
28
+ from ai_browser_profile import MemoryDB
29
+
30
+ mem = MemoryDB(os.path.expanduser("~/ai-browser-profile/memories.db"))
31
+ print(mem.profile_text()) # markdown formatted, ~1.5KB
32
+ mem.close()
33
+ ```
34
+
35
+ The profile shows: name, all known emails, phone numbers, handles, addresses, payment info, companies, top tools/services, accounts grouped by email, Notion projects, and contact count. Values are ranked by frequency across browser profiles — higher frequency = more likely to be the user's own data.
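To make "ranked by frequency across browser profiles" concrete, this sketch counts how many distinct profiles each value appears in and orders the values by that count. The `(profile, value)` input shape, the case-folding, and the name `rank_by_profile_frequency` are assumptions for illustration. The package's actual ranking code may differ.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def rank_by_profile_frequency(observations: Iterable[Tuple[str, str]]) -> List[Tuple[str, int]]:
    """Order values by the number of distinct browser profiles they were seen in."""
    profiles_per_value: Dict[str, Set[str]] = defaultdict(set)
    for profile, value in observations:
        # Normalize so 'Me@Example.com' and 'me@example.com' count as one value
        profiles_per_value[value.strip().lower()].add(profile)
    counts = {v: len(ps) for v, ps in profiles_per_value.items()}
    # Highest count first; ties broken alphabetically for stable output
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
```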
36
+
37
+ ### Search by tags
38
+
39
+ ```python
40
+ import sys, os
41
+ sys.path.insert(0, os.path.expanduser("~/ai-browser-profile"))
42
+ from ai_browser_profile import MemoryDB
43
+
44
+ mem = MemoryDB(os.path.expanduser("~/ai-browser-profile/memories.db"))
45
+
46
+ # Search returns results ranked by hit_rate (accessed/appeared), then counts
47
+ # accessed_count and appeared_count are auto-incremented on every search call
48
+ results = mem.search(["identity", "contact_info"], limit=10)
49
+ for r in results:
50
+ print(f'{r["key"]}: {r["value"]}')
51
+
52
+ mem.close()
53
+ ```
54
+
55
+ ### Semantic search (natural language)
56
+
57
+ ```python
58
+ # Find memories by meaning, not just keywords
59
+ results = mem.semantic_search("what products does the user build")
60
+ for r in results[:5]:
61
+ print(f'{r["key"]}: {r["value"][:80]} (sim={r["similarity"]:.3f})')
62
+
63
+ # Falls back to text_search() if embeddings not installed
64
+ # Install with: npx ai-browser-profile install-embeddings
65
+ ```
66
+
67
+ ### Quick SQL queries
68
+
69
+ ```bash
70
+ sqlite3 ~/ai-browser-profile/memories.db
71
+ ```
72
+
73
+ ```sql
74
+ -- All identity info
75
+ SELECT m.key, m.value FROM memories m
76
+ JOIN memory_tags t ON m.id = t.memory_id WHERE t.tag = 'identity'
77
+ AND m.superseded_by IS NULL;
78
+
79
+ -- All contact info (emails, phones)
80
+ SELECT m.key, m.value, m.source FROM memories m
81
+ JOIN memory_tags t ON m.id = t.memory_id WHERE t.tag = 'contact_info'
82
+ AND m.superseded_by IS NULL;
83
+
84
+ -- All contacts
85
+ SELECT m.key, m.value FROM memories m
86
+ JOIN memory_tags t ON m.id = t.memory_id WHERE t.tag = 'contact'
87
+ AND m.superseded_by IS NULL
88
+ ORDER BY m.accessed_count DESC;
89
+
90
+ -- Most accessed memories (the ones that proved useful)
91
+ SELECT key, value, accessed_count, appeared_count,
92
+ CAST(accessed_count AS REAL) / MAX(appeared_count, 1) AS hit_rate
93
+ FROM memories WHERE accessed_count > 0
94
+ ORDER BY hit_rate DESC;
95
+
96
+ -- Search by key pattern
97
+ SELECT key, value FROM memories WHERE key LIKE 'account:%'
98
+ AND superseded_by IS NULL;
99
+ ```
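You can also run the tag query from Python's built-in `sqlite3` module. To keep the sketch self-contained, it builds a throwaway in-memory database that has only the columns the queries above reference. The real `memories.db` schema has more columns and may differ in detail. Against the real file, connect to `~/ai-browser-profile/memories.db` and skip the `CREATE`/`INSERT` setup.

```python
import sqlite3

TAG_QUERY = """
SELECT m.key, m.value FROM memories m
JOIN memory_tags t ON m.id = t.memory_id
WHERE t.tag = ? AND m.superseded_by IS NULL
ORDER BY m.id
"""


def memories_with_tag(conn: sqlite3.Connection, tag: str) -> list:
    """Return (key, value) rows for live (non-superseded) memories carrying `tag`."""
    return conn.execute(TAG_QUERY, (tag,)).fetchall()


# Minimal stand-in schema (assumption: only the columns used by the queries above)
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE memories (id INTEGER PRIMARY KEY, key TEXT, value TEXT,
                       superseded_by INTEGER, accessed_count INTEGER DEFAULT 0,
                       appeared_count INTEGER DEFAULT 0);
CREATE TABLE memory_tags (memory_id INTEGER, tag TEXT);
INSERT INTO memories (id, key, value, superseded_by) VALUES
  (1, 'first_name', 'Ada', NULL),
  (2, 'first_name', 'A.', 1);
INSERT INTO memory_tags VALUES (1, 'identity'), (2, 'identity');
""")
print(memories_with_tag(conn, "identity"))  # [('first_name', 'Ada')]
```

Binding the tag with `?` instead of formatting it into the SQL string keeps the query safe for arbitrary tag text.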
100
+
101
+ ## Canonical Tags
102
+
103
+ | Tag | What it covers | Example keys |
104
+ |-----|---------------|-------------|
105
+ | `identity` | Name, DOB, gender, job title, language | `first_name`, `last_name`, `full_name`, `date_of_birth` |
106
+ | `contact_info` | Email addresses, phone numbers | `email`, `phone` |
107
+ | `address` | Physical addresses | `street_address`, `city`, `state`, `zip`, `country` |
108
+ | `payment` | Card holder names, expiry | `card_holder_name`, `card_expiry`, `card_nickname` |
109
+ | `account` | Service accounts, login credentials | `account:{domain}` |
110
+ | `tool` | Tools/services used (from history) | `tool:GitHub`, `tool:Slack`, `tool:Stripe` |
111
+ | `contact` | People the user knows | `contact:{Name}`, `linkedin:{Name}` |
112
+ | `work` | Work-related (company, LinkedIn) | `company`, `linkedin:*` |
113
+ | `knowledge` | Interests, skills, projects, products | `product:*`, `project:*`, `interest:*` |
114
+ | `communication` | Messaging platforms | `tool:Slack`, `tool:WhatsApp` |
115
+ | `social` | Social platforms | `tool:LinkedIn`, `tool:X/Twitter` |
116
+ | `finance` | Financial tools | `tool:Stripe`, `tool:QuickBooks` |
117
+
118
+ ## Ranking System
119
+
120
+ Every `search()`, `semantic_search()`, and `text_search()` call automatically increments both `appeared_count` and `accessed_count` for all returned results. No manual `mark_accessed()` calls needed.
121
+
122
+ **hit_rate** = `accessed_count / appeared_count`
123
+
124
+ Memories that are frequently returned by searches rise in ranking. The system is fully automatic — no manual curation or agent instrumentation needed.
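Here is a sketch of the ordering described above. It sorts by hit rate first and uses the raw counts as tie-breakers. It guards against division by zero the same way the SQL example's `MAX(appeared_count, 1)` does. The dictionary keys mirror the column names, but `rank` is a hypothetical helper, not the internals of the library's `search()`.

```python
from typing import Dict, List


def hit_rate(m: Dict) -> float:
    """accessed_count / appeared_count, treating appeared_count below 1 as 1."""
    return m["accessed_count"] / max(m["appeared_count"], 1)


def rank(memories: List[Dict]) -> List[Dict]:
    """Sort by hit_rate, then accessed_count, then appeared_count, all descending."""
    return sorted(
        memories,
        key=lambda m: (hit_rate(m), m["accessed_count"], m["appeared_count"]),
        reverse=True,
    )
```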
125
+
126
+ ## Semantic Dedup
127
+
128
+ On `upsert()`, near-duplicate memories (cosine similarity >= 0.92 with same key prefix) are automatically superseded. This prevents storing "Screen recording tool for compliance" and "Screen recording tool launched on Product Hunt for compliance use cases" as separate entries.
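Here is the dedup rule sketched in plain Python. The source doesn't spell out two details, so both are assumptions. First, the "key prefix" is taken to be the text before the first `:`, so `product:A` and `product:B` share the prefix `product`. Second, embeddings are plain lists of floats. The package itself uses `numpy` for this math; this example sticks to the standard library so it has no dependencies.

```python
import math
from typing import Dict, List, Optional, Sequence

THRESHOLD = 0.92


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def key_prefix(key: str) -> str:
    # Assumption: prefix = text before the first ':' (the whole key if there is none)
    return key.split(":", 1)[0]


def find_duplicate(new_key: str, new_vec: Sequence[float],
                   existing: List[Dict]) -> Optional[Dict]:
    """Return the most similar existing memory with the same prefix at or above THRESHOLD."""
    best, best_sim = None, THRESHOLD
    for m in existing:
        if key_prefix(m["key"]) != key_prefix(new_key):
            continue
        sim = cosine(new_vec, m["vec"])
        if sim >= best_sim:
            best, best_sim = m, sim
    return best
```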
129
+
130
+ ## Task-Specific Tag Queries
131
+
132
+ | Task | Tags to search |
133
+ |------|---------------|
134
+ | Fill out a form | `["identity", "contact_info", "address"]` |
135
+ | Send an email | `["contact_info", "communication"]` + search contact by name |
136
+ | Book a flight/hotel | `["identity", "address", "payment"]` |
137
+ | Log into a service | `["account"]` |
138
+ | Invoice a client | `["identity", "work", "address", "payment"]` |
139
+ | Find a contact | `["contact"]` + filter by key pattern |
140
+ | Dev/deploy task | `["account", "tool"]` |
141
+ | Social media post | `["account", "social"]` |
142
+ | Research question | `mem.semantic_search("your question here")` |
143
+
144
+ ## Rebuilding Memories
145
+
146
+ To refresh from latest browser data:
147
+
148
+ ```bash
149
+ cd ~/ai-browser-profile
150
+ source .venv/bin/activate
151
+ python extract.py # full scan
152
+ python extract.py --browsers arc chrome # specific browsers
153
+ python extract.py --no-indexeddb --no-localstorage # fast, skip LevelDB
154
+ ```
155
+
156
+ ### Backfill embeddings (after install-embeddings)
157
+
158
+ ```python
159
+ import sys, os
160
+ sys.path.insert(0, os.path.expanduser("~/ai-browser-profile"))
161
+ from ai_browser_profile import MemoryDB
162
+ mem = MemoryDB(os.path.expanduser("~/ai-browser-profile/memories.db"))
163
+ n = mem.backfill_embeddings()
164
+ print(f"Embedded {n} memories")
165
+ mem.close()
166
+ ```
167
+
168
+ `extract.py` reads browser files directly (History, Login Data, Web Data, IndexedDB, Local Storage). The memory database preserves `appeared_count` and `accessed_count` across rebuilds via UPSERT logic, so rankings are never lost.
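SQLite's `INSERT ... ON CONFLICT ... DO UPDATE` (available since SQLite 3.24) can illustrate this counter-preserving behavior. The sketch assumes a hypothetical table with a `UNIQUE(key, value)` constraint. The package's real schema and conflict target are not documented here. The point is that a rebuild refreshes the data columns but leaves `accessed_count` and `appeared_count` untouched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE memories (
    key TEXT, value TEXT, source TEXT,
    accessed_count INTEGER NOT NULL DEFAULT 0,
    appeared_count INTEGER NOT NULL DEFAULT 0,
    UNIQUE (key, value)
)""")

UPSERT = """
INSERT INTO memories (key, value, source) VALUES (?, ?, ?)
ON CONFLICT (key, value) DO UPDATE SET source = excluded.source
"""


def rebuild_row(key: str, value: str, source: str) -> None:
    # A conflicting row keeps its counters; only `source` is refreshed
    conn.execute(UPSERT, (key, value, source))


rebuild_row("email", "me@example.com", "autofill:chrome")
conn.execute("UPDATE memories SET accessed_count = 5, appeared_count = 7")
rebuild_row("email", "me@example.com", "autofill:arc")  # simulated rebuild
print(conn.execute("SELECT source, accessed_count, appeared_count FROM memories").fetchall())
# [('autofill:arc', 5, 7)]
```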
169
+
170
+ ## Dependencies
171
+
172
+ **Core** (installed by `npx ai-browser-profile init`):
173
+ - `ccl_chromium_reader` — IndexedDB + Local Storage LevelDB files
174
+ - `numpy` — vector math for cosine similarity
175
+
176
+ **Embeddings** (optional, installed by `npx ai-browser-profile install-embeddings`):
177
+ - `onnxruntime` — ONNX model inference
178
+ - `huggingface_hub` — model downloading
179
+ - `tokenizers` — text tokenization
180
+ - Model: nomic-embed-text-v1.5 (~131MB, downloads on first use)
@@ -0,0 +1,321 @@
1
+ ---
2
+ name: whatsapp-analysis
3
+ description: "Analyze WhatsApp Web data — contacts, groups, social graph, AND decrypt actual message content via live browser interception. Use when: 'WhatsApp contacts', 'WhatsApp groups', 'WhatsApp analysis', 'who do I talk to', 'WhatsApp messages', 'decrypt WhatsApp', 'read WhatsApp messages', 'WhatsApp network', 'social graph', 'inner circle'."
4
+ ---
5
+
6
+ # WhatsApp Analysis
7
+
8
+ Two capabilities:
9
+ 1. **Live message decryption** — intercept `crypto.subtle.decrypt` via Playwright to read actual message text (requires open browser session)
10
+ 2. **Metadata analysis** — contacts, groups, social graph from IndexedDB (offline, fast)
11
+
12
+ ## Part 1: Live Message Decryption (Playwright)
13
+
14
+ ### How It Works
15
+
16
+ WhatsApp Web encrypts messages in IndexedDB (`msgRowOpaqueData`) using **AES-128-CBC** with keys derived via **HKDF-SHA256**. The crypto runs on the **main thread** (not in Web Workers). By wrapping `crypto.subtle` before any page JS loads, we capture the plaintext output. Because the wrappers force `extractable: true`, the raw keys can be exported as well.
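For reference, HKDF-SHA256 (RFC 5869) is a two-step extract-then-expand construction built on HMAC. This standard-library sketch shows how it produces a 128-bit key of the kind the interceptor reports in `derivedLen`. This document does not give WhatsApp's actual salt and `info` inputs, so the values below are placeholders. The AES-CBC decryption step needs a crypto library and is not shown.

```python
import hashlib
import hmac


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int) -> bytes:
    """RFC 5869 HKDF with SHA-256: extract a PRK, then expand it to `length` bytes."""
    if length > 255 * 32:
        raise ValueError("HKDF-SHA256 can output at most 8160 bytes")
    # Extract: an empty salt is treated as 32 zero bytes (RFC 5869, section 2.2)
    prk = hmac.new(salt or b"\x00" * 32, ikm, hashlib.sha256).digest()
    # Expand: T(i) = HMAC(PRK, T(i-1) || info || i), concatenated and truncated
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


# Placeholder inputs; a 16-byte output corresponds to derivedLen = 128
key = hkdf_sha256(b"\x0b" * 22, b"example-salt", b"example-info", 16)
print(len(key))  # 16
```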
17
+
18
+ ### Prerequisites
19
+
20
+ - MCP Playwright browser available
21
+ - WhatsApp Web logged in (or QR code scan needed)
22
+
23
+ ### Step 1: Navigate to WhatsApp Web
24
+
25
+ ```
26
+ Use browser_navigate to go to https://web.whatsapp.com
27
+ Wait for the page to load. If QR code appears, user must scan it.
28
+ Use browser_snapshot to verify the chat list is visible.
29
+ ```
30
+
31
+ ### Step 2: Install the crypto interceptor
32
+
33
+ Use `browser_run_code` with this **exact** code to install the interceptor via `addInitScript` (runs before any page JS on reload):
34
+
35
+ ```javascript
36
+ async (page) => {
37
+ await page.addInitScript(() => {
38
+ const scope = (typeof globalThis !== 'undefined') ? globalThis : self;
39
+ scope.__waCaptured = { decrypt: [], deriveKey: [], importKey: [] };
40
+
41
+ const origImportKey = crypto.subtle.importKey.bind(crypto.subtle);
42
+ crypto.subtle.importKey = function(format, keyData, algorithm, extractable, keyUsages) {
43
+ return origImportKey(format, keyData, algorithm, true, keyUsages);
44
+ };
45
+
46
+ const origGenerateKey = crypto.subtle.generateKey.bind(crypto.subtle);
47
+ crypto.subtle.generateKey = function(algorithm, extractable, keyUsages) {
48
+ return origGenerateKey(algorithm, true, keyUsages);
49
+ };
50
+
51
+ const origDeriveKey = crypto.subtle.deriveKey.bind(crypto.subtle);
52
+ crypto.subtle.deriveKey = async function(algorithm, baseKey, derivedKeyType, extractable, keyUsages) {
53
+ const result = await origDeriveKey(algorithm, baseKey, derivedKeyType, true, keyUsages);
54
+ try {
55
+ const raw = await crypto.subtle.exportKey('raw', result);
56
+ const b64 = btoa(String.fromCharCode(...new Uint8Array(raw)));
57
+ scope.__waCaptured.deriveKey.push({
58
+ ts: Date.now(),
59
+ alg: algorithm.name,
60
+ saltLen: algorithm.salt ? algorithm.salt.byteLength : 0,
61
+ derivedAlg: derivedKeyType.name,
62
+ derivedLen: derivedKeyType.length,
63
+ keyB64: b64,
64
+ usages: keyUsages
65
+ });
66
+ } catch(e) {}
67
+ return result;
68
+ };
69
+
70
+ const origDecrypt = crypto.subtle.decrypt.bind(crypto.subtle);
71
+ crypto.subtle.decrypt = async function(algorithm, key, data) {
72
+ const result = await origDecrypt(algorithm, key, data);
73
+ let keyB64 = null;
74
+ try {
75
+ const raw = await crypto.subtle.exportKey('raw', key);
76
+ keyB64 = btoa(String.fromCharCode(...new Uint8Array(raw)));
77
+ } catch(e) { keyB64 = 'export-failed'; }
78
+
79
+ const entry = {
80
+ ts: Date.now(),
81
+ alg: algorithm.name,
82
+ key: keyB64,
83
+ inSize: data.byteLength,
84
+ outSize: result.byteLength
85
+ };
86
+
87
+ if (result.byteLength > 0 && result.byteLength < 500000) {
88
+ const slice = new Uint8Array(result.slice(0, Math.min(1500, result.byteLength)));
89
+ entry.outB64 = btoa(String.fromCharCode(...slice));
90
+ }
91
+ scope.__waCaptured.decrypt.push(entry);
92
+ return result;
93
+ };
94
+ });
95
+
96
+ await page.reload({ waitUntil: 'networkidle' });
97
+ await page.waitForTimeout(8000);
98
+ return 'Interceptor installed and page reloaded. Decryptions are being captured.';
99
+ }
100
+ ```
101
+
102
+ ### Step 3: Collect captured decryptions
103
+
104
+ Wait 10-15 seconds after reload, then extract with `browser_run_code`:
105
+
106
+ ```javascript
107
+ async (page) => {
108
+ const data = await page.evaluate(() => {
109
+ const c = (globalThis || self).__waCaptured;
110
+ if (!c) return JSON.stringify({error: 'no captures'});
111
+ return JSON.stringify({
112
+ decryptCount: c.decrypt.length,
113
+ deriveKeyCount: c.deriveKey.length,
114
+ derivedKeys: c.deriveKey,
115
+ sample: c.decrypt.slice(0, 5)
116
+ });
117
+ });
118
+ return data;
119
+ }
120
+ ```
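Once the JSON string from this step is back on the agent side, a few lines of Python can sanity-check it. This sketch assumes exactly the fields the snippet above returns: `decryptCount`, `deriveKeyCount`, and `derivedKeys`, where each entry carries `keyB64`, `derivedAlg`, and `derivedLen`. It checks that each exported key's decoded size matches its declared bit length. `summarize_captures` is a hypothetical helper.

```python
import base64
import json
from collections import Counter


def summarize_captures(raw_json: str) -> dict:
    """Summarize the Step 3 JSON and check exported key sizes against derivedLen."""
    data = json.loads(raw_json)
    if "error" in data:
        return {"ok": False, "error": data["error"]}
    keys = data.get("derivedKeys", [])
    # derivedLen is in bits and can be missing (JSON.stringify drops undefined fields)
    key_lengths_ok = all(
        len(base64.b64decode(k["keyB64"])) * 8 == k["derivedLen"]
        for k in keys
        if "derivedLen" in k
    )
    return {
        "ok": True,
        "decrypts": data.get("decryptCount", 0),
        "derived_keys": data.get("deriveKeyCount", 0),
        "derived_algs": Counter(k.get("derivedAlg", "?") for k in keys),
        "key_lengths_ok": key_lengths_ok,
    }
```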
121
+
122
+ ### Step 4: Extract and parse message text
123
+
124
+ Decrypted output is **protobuf**. Extract text with `browser_run_code`:
125
+
126
+ ```javascript
127
+ async (page) => {
128
+ const result = await page.evaluate(() => {
129
+ const c = (globalThis || self).__waCaptured;
130
+ if (!c) return JSON.stringify({error: 'no captures'});
131
+
132
+ function readVarint(bytes, pos) {
133
+ let result = 0, shift = 0;
134
+ while (pos < bytes.length) {
135
+ const byte = bytes[pos];
136
+ result |= (byte & 0x7f) << shift;
137
+ shift += 7; pos++;
138
+ if (!(byte & 0x80)) return [result, pos];
139
+ }
140
+ return [null, pos];
141
+ }
142
+
143
+ function parseProtobufText(bytes) {
144
+ if (!bytes || bytes[0] !== 0x0a) return null;
145
+ let pos = 1;
146
+ let [outerLen, p1] = readVarint(bytes, pos);
147
+ if (outerLen === null || p1 >= bytes.length) return null;
148
+ pos = p1;
149
+ if (bytes[pos] !== 0x0a) return null;
150
+ pos++;
151
+ let [textLen, p2] = readVarint(bytes, pos);
152
+ if (textLen === null || textLen <= 0 || p2 + textLen > bytes.length) return null;
153
+ pos = p2;
154
+ try {
155
+ return new TextDecoder('utf-8').decode(bytes.slice(pos, pos + textLen));
156
+ } catch(e) { return null; }
157
+ }
158
+
159
+ function extractSender(bytes) {
160
+ try {
161
+ const str = Array.from(bytes).map(b => String.fromCharCode(b)).join('');
162
+ const match = str.match(/(\d+@(?:s\.whatsapp\.net|lid|g\.us))/);
163
+ return match ? match[1] : null;
164
+ } catch(e) { return null; }
165
+ }
166
+
167
+ const messages = [];
168
+ for (const d of c.decrypt) {
169
+ if (!d.outB64 || d.alg !== 'AES-CBC') continue;
170
+ try {
171
+ const bytes = Uint8Array.from(atob(d.outB64), ch => ch.charCodeAt(0));
172
+ const text = parseProtobufText(bytes);
173
+ if (text && text.length > 0) {
174
+ messages.push({
175
+ text: text,
176
+ sender: extractSender(bytes),
177
+ ts: d.ts
178
+ });
179
+ }
180
+ } catch(e) {}
181
+ }
182
+
183
+ return JSON.stringify({
184
+ totalDecrypts: c.decrypt.length,
185
+ textMessages: messages.length,
186
+ messages: messages
187
+ });
188
+ });
189
+ return result;
190
+ }
191
+ ```
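If you save the captured `outB64` values and want to parse them outside the browser, the same heuristic ports directly to Python. Like the JavaScript above, it only handles the layout `0x0a <len> 0x0a <len> <utf-8 text>`, that is, field 1 nested inside field 1. Any other layout returns `None`. The interceptor keeps only the first 1,500 bytes of each plaintext, so longer texts fail the length check here, just as they do in the browser. Invalid UTF-8 is replaced rather than rejected, matching `TextDecoder`'s default behavior.

```python
import base64
import re
from typing import Optional, Tuple


def read_varint(buf: bytes, pos: int) -> Tuple[Optional[int], int]:
    """Decode a protobuf base-128 varint at `pos`; return (value, next_pos)."""
    result, shift = 0, 0
    while pos < len(buf):
        byte = buf[pos]
        result |= (byte & 0x7F) << shift
        shift += 7
        pos += 1
        if not byte & 0x80:
            return result, pos
    return None, pos  # ran off the end: truncated varint


def parse_protobuf_text(buf: bytes) -> Optional[str]:
    if not buf or buf[0] != 0x0A:  # field 1, wire type 2 (length-delimited)
        return None
    outer_len, pos = read_varint(buf, 1)
    if outer_len is None or pos >= len(buf) or buf[pos] != 0x0A:
        return None
    text_len, pos = read_varint(buf, pos + 1)
    if text_len is None or text_len <= 0 or pos + text_len > len(buf):
        return None
    return buf[pos:pos + text_len].decode("utf-8", errors="replace")


SENDER_RE = re.compile(rb"(\d+@(?:s\.whatsapp\.net|lid|g\.us))")


def extract_sender(buf: bytes) -> Optional[str]:
    m = SENDER_RE.search(buf)
    return m.group(1).decode("ascii") if m else None


def parse_capture(out_b64: str) -> Optional[dict]:
    buf = base64.b64decode(out_b64)
    text = parse_protobuf_text(buf)
    return {"text": text, "sender": extract_sender(buf)} if text else None
```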
192
+
193
+ ### Step 5: Navigate to more chats for additional messages
194
+
195
+ WhatsApp only decrypts messages for loaded chats. To capture more:
196
+
197
+ ```
198
+ Use browser_snapshot to see the chat list.
199
+ Click on different chats using browser_click with the ref for each chat.
200
+ Wait 3-5 seconds between clicks for decryption to complete.
201
+ Re-run Step 4 to collect newly decrypted messages.
202
+ ```
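`__waCaptured.decrypt` keeps growing for the whole page session. As a result, each re-run of Step 4 returns the earlier messages again along with the new ones. A small merge step avoids double counting before Step 6. Keying on `(ts, sender, text)` is an assumption. It collapses copies of the same capture entry across re-runs. It does not merge a message that WhatsApp decrypts twice, because the second decryption gets a new `Date.now()` timestamp. `merge_messages` is a hypothetical helper.

```python
from typing import Dict, Iterable, List, Set, Tuple


def merge_messages(*batches: Iterable[Dict]) -> List[Dict]:
    """Combine Step 4 results from several runs, keeping the first copy of each entry."""
    seen: Set[Tuple] = set()
    merged: List[Dict] = []
    for batch in batches:
        for msg in batch:
            # The same capture entry has the same ts, sender and text on every re-run
            ident = (msg.get("ts"), msg.get("sender"), msg.get("text"))
            if ident not in seen:
                seen.add(ident)
                merged.append(msg)
    return merged
```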
203
+
204
+ ### Step 6: Store messages in memories.db
205
+
206
+ ```python
207
+ import sys, os
208
+ sys.path.insert(0, os.path.expanduser("~/ai-browser-profile"))
209
+ from ai_browser_profile import MemoryDB
210
+ from ai_browser_profile.ingestors.messages import ingest_messages, message_stats
211
+
212
+ mem = MemoryDB(os.path.expanduser("~/ai-browser-profile/memories.db"))
213
+
214
+ # messages = the `messages` list from Step 4's JSON output, e.g. json.loads(output)["messages"]
215
+ inserted = ingest_messages(mem, messages)
216
+ print(f"Inserted {inserted} new messages")
217
+
218
+ stats = message_stats(mem)
219
+ print(f"Total stored: {stats['total_messages']}")
220
+
221
+ mem.close()
222
+ ```
223
+
224
+ ### Step 7: Analyze messages and write relationship memories
225
+
226
+ ```python
227
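+ import sys, os
+ sys.path.insert(0, os.path.expanduser("~/ai-browser-profile"))
+ from ai_browser_profile import MemoryDB
+ from ai_browser_profile.ingestors.messages import get_messages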
228
+
229
+ mem = MemoryDB(os.path.expanduser("~/ai-browser-profile/memories.db"))
230
+
231
+ all_msgs = get_messages(mem, limit=1000)
232
+
233
+ # After analyzing, write relationship/interest memories:
234
+ mem.upsert("relationship:ContactName", "description of relationship",
235
+ ["contact", "relationship"], 0.8, "whatsapp:messages")
236
+
237
+ mem.conn.commit()
238
+ mem.close()
239
+ ```
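The steps above leave the analysis itself open. One possible starting point is to count messages per sender, which surfaces the most active conversations before you write `relationship:` memories. This sketch assumes each message is a dict with a `sender` field, as in Step 4's output. This document does not say whether `get_messages()` returns that exact shape.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def top_senders(messages: Iterable[Dict], n: int = 10) -> List[Tuple[str, int]]:
    """Return the n senders with the most messages, skipping unknown senders."""
    counts = Counter(m["sender"] for m in messages if m.get("sender"))
    return counts.most_common(n)
```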
240
+
241
+ ---
242
+
243
+ ## Part 2: Metadata Analysis (Offline from IndexedDB)
244
+
245
+ ### Prerequisites
246
+
247
+ Run the memory extraction first to populate contacts in `memories.db`:
248
+
249
+ ```bash
250
+ cd ~/ai-browser-profile
251
+ source .venv/bin/activate
252
+ python extract.py
253
+ ```
254
+
255
+ For deeper metadata analysis, read IndexedDB directly:
256
+
257
+ ```python
258
+ import shutil, tempfile, json
259
+ from pathlib import Path
260
+ from ccl_chromium_reader import ccl_chromium_indexeddb
261
+
262
+ APP_SUPPORT = Path.home() / "Library" / "Application Support"
263
+ arc_idb = APP_SUPPORT / "Arc" / "User Data" / "Default" / "IndexedDB"
264
+
265
+ for db_dir in arc_idb.glob("*whatsapp*_0.indexeddb.leveldb"):
266
+ tmp = Path(tempfile.mkdtemp())
267
+ shutil.copytree(db_dir, tmp / db_dir.name)
268
+ blob_dir = db_dir.parent / db_dir.name.replace(".leveldb", ".blob")
269
+ tmp_blob = None
270
+ if blob_dir.exists():
271
+ tmp_blob = Path(tempfile.mkdtemp())
272
+ shutil.copytree(blob_dir, tmp_blob / blob_dir.name)
273
+
274
+ wrapper = ccl_chromium_indexeddb.WrappedIndexDB(
275
+ str(tmp / db_dir.name),
276
+ str(tmp_blob / blob_dir.name) if tmp_blob else None,
277
+ )
278
+ # Now iterate stores...
279
+ ```
280
+
281
+ ### Data Available
282
+
283
+ WhatsApp Web stores 51 IndexedDB object stores. Message bodies are encrypted at rest (see Part 1), but the metadata below is stored in plaintext:
284
+
285
+ | Store | Records | What's in it |
286
+ |-------|---------|-------------|
287
+ | contact | 1000 | Phone numbers, names, isAddressBook, isBusiness |
288
+ | chat | 1000 | Chat IDs, last message timestamps, unread counts |
289
+ | group-metadata | 400+ | Group subjects, creation dates, owner phone |
290
+ | participant | 400+ | Group member phone lists |
291
+ | message | 1000 | Message type, from/to/author, timestamps (NOT body) |
292
+ | reactions | 900+ | Emoji reactions with sender, timestamp |
293
+
294
+ ### Inner Circle Analysis
295
+
296
+ Count shared group membership to find closest contacts:
297
+
298
+ ```python
299
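+ # Assumed inputs: `participants` (records from the participant store),
+ # `lid_to_phone` (LID -> phone mapping) and YOUR_NUMBER (your own phone number)
+ your_groups = set()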
300
+ person_groups = {}
301
+
302
+ for record in participants:
303
+ gid = record['groupId']
304
+ for p in record['participants']:
305
+ phone = p.split('@')[0]
306
+ resolved = lid_to_phone.get(phone, phone)
307
+ if resolved == YOUR_NUMBER:
308
+ your_groups.add(gid)
309
+ person_groups.setdefault(resolved, set()).add(gid)
310
+
311
+ shared = {p: len(gs & your_groups) for p, gs in person_groups.items() if p != YOUR_NUMBER}
312
+ top_connections = sorted(shared.items(), key=lambda x: -x[1])
313
+ ```
314
+
315
+ ## Known Limitations
316
+
317
+ - **Session-specific keys**: HKDF-derived keys change per browser session
318
+ - **Offline decryption**: Does not work across sessions
319
+ - **Record cap**: 1000 per IndexedDB store
320
+ - **Contact names**: Only available for address book contacts
321
+ - **Timestamps**: Unix epoch seconds — convert with `datetime.fromtimestamp(t, tz=timezone.utc)`
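Here is a short example of that conversion for a batch of timestamps. `to_utc_iso` is a hypothetical helper name. The `ts` fields captured in Part 1 come from `Date.now()`, which returns milliseconds, so divide those by 1000 before converting.

```python
from datetime import datetime, timezone
from typing import Iterable, List


def to_utc_iso(epoch_seconds: Iterable[float]) -> List[str]:
    """Convert Unix epoch seconds to ISO-8601 strings in UTC."""
    return [datetime.fromtimestamp(t, tz=timezone.utc).isoformat() for t in epoch_seconds]


print(to_utc_iso([0, 1700000000]))
# ['1970-01-01T00:00:00+00:00', '2023-11-14T22:13:20+00:00']
```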