ai-browser-profile 1.0.5
- package/README.md +118 -0
- package/ai_browser_profile/__init__.py +6 -0
- package/ai_browser_profile/db.py +929 -0
- package/ai_browser_profile/embeddings.py +196 -0
- package/ai_browser_profile/extract.py +108 -0
- package/ai_browser_profile/ingestors/__init__.py +0 -0
- package/ai_browser_profile/ingestors/bookmarks.py +185 -0
- package/ai_browser_profile/ingestors/browser_detect.py +100 -0
- package/ai_browser_profile/ingestors/constants.py +208 -0
- package/ai_browser_profile/ingestors/history.py +123 -0
- package/ai_browser_profile/ingestors/indexeddb.py +203 -0
- package/ai_browser_profile/ingestors/localstorage.py +66 -0
- package/ai_browser_profile/ingestors/logins.py +46 -0
- package/ai_browser_profile/ingestors/messages.py +151 -0
- package/ai_browser_profile/ingestors/notion.py +313 -0
- package/ai_browser_profile/ingestors/webdata.py +134 -0
- package/autofill/SKILL.md +252 -0
- package/bin/cli.js +315 -0
- package/clean.py +295 -0
- package/extract.py +53 -0
- package/package.json +40 -0
- package/review/SKILL.md +171 -0
- package/review/run.sh +82 -0
- package/setup/SKILL.md +177 -0
- package/skill/SKILL.md +180 -0
- package/whatsapp/SKILL.md +321 -0
package/setup/SKILL.md
ADDED
|
@@ -0,0 +1,177 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: ai-browser-profile-setup
|
|
3
|
+
description: "Set up ai-browser-profile for a new user. Installs via npm, creates Python venv, extracts browser data, and optionally enables semantic search. Use when: 'set up browser profile', 'install ai browser profile', 'configure browser profile'."
|
|
4
|
+
---
|
|
5
|
+
|
|
6
|
+
# AI Browser Profile Setup
|
|
7
|
+
|
|
8
|
+
Interactive setup wizard for ai-browser-profile. Walk the user through installation and first extraction.
|
|
9
|
+
|
|
10
|
+
## When to use
|
|
11
|
+
|
|
12
|
+
- First-time setup of ai-browser-profile
|
|
13
|
+
- Reinstalling after a fresh machine setup
|
|
14
|
+
- Troubleshooting a broken installation
|
|
15
|
+
|
|
16
|
+
## Prerequisites
|
|
17
|
+
|
|
18
|
+
- Node.js 16+ (for `npx`)
|
|
19
|
+
- Python 3.10+
|
|
20
|
+
- macOS (browser paths are macOS-specific)
|
|
21
|
+
|
|
22
|
+
---
|
|
23
|
+
|
|
24
|
+
## Setup Flow
|
|
25
|
+
|
|
26
|
+
Run each step sequentially. **After each step, print a progress status to the user** so they can follow along:
|
|
27
|
+
|
|
28
|
+
```
|
|
29
|
+
[1/6] Install ............ done (12s)
|
|
30
|
+
[2/6] Verify ............. done (1s)
|
|
31
|
+
[3/6] Extract ............ running...
|
|
32
|
+
```
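
A minimal sketch of how such a status line could be produced. The helper name `format_step` and the 20-character label column are assumptions read off the sample output above; neither is part of the package.

```python
def format_step(index: int, total: int, name: str, status: str) -> str:
    """Render one progress line in the '[i/n] Name ..... status' layout."""
    # Pad the step name with dots so that name + space + dots spans 20
    # characters, matching the spacing of the sample output (inferred).
    dots = "." * max(3, 19 - len(name))
    return f"[{index}/{total}] {name} {dots} {status}"


if __name__ == "__main__":
    total = 6  # this guide defines Step 1 through Step 6
    print(format_step(1, total, "Install", "done (12s)"))
    print(format_step(3, total, "Extract", "running..."))
```

Keeping `total` as a parameter means the same helper still works if steps are added or skipped.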
|
|
33
|
+
|
|
34
|
+
### Step 1: Install via npm
|
|
35
|
+
|
|
36
|
+
Check if already installed:
|
|
37
|
+
|
|
38
|
+
```bash
|
|
39
|
+
ls ~/ai-browser-profile/extract.py 2>/dev/null && echo "FOUND" || echo "NOT_FOUND"
|
|
40
|
+
```
|
|
41
|
+
|
|
42
|
+
If NOT_FOUND, install:
|
|
43
|
+
```bash
|
|
44
|
+
npx ai-browser-profile init
|
|
45
|
+
```
|
|
46
|
+
|
|
47
|
+
This:
|
|
48
|
+
- Copies Python source + skills to `~/ai-browser-profile/`
|
|
49
|
+
- Creates a Python venv at `~/ai-browser-profile/.venv/`
|
|
50
|
+
- Installs core deps (`ccl_chromium_reader`, `numpy`)
|
|
51
|
+
- Symlinks skills into `~/.claude/skills/`
|
|
52
|
+
|
|
53
|
+
To update code later without touching data:
|
|
54
|
+
```bash
|
|
55
|
+
npx ai-browser-profile update
|
|
56
|
+
```
|
|
57
|
+
|
|
58
|
+
**Tell the user:** "Installed ai-browser-profile to ~/ai-browser-profile. Python venv created, core deps installed, skills symlinked."
|
|
59
|
+
|
|
60
|
+
### Step 2: Verify the installation
|
|
61
|
+
|
|
62
|
+
```bash
|
|
63
|
+
~/ai-browser-profile/.venv/bin/python -c "
|
|
64
|
+
import sys
|
|
65
|
+
sys.path.insert(0, '$HOME/ai-browser-profile')
|
|
66
|
+
from ai_browser_profile import MemoryDB
|
|
67
|
+
print('MemoryDB imported successfully')
|
|
68
|
+
"
|
|
69
|
+
```
|
|
70
|
+
|
|
71
|
+
Expected: `MemoryDB imported successfully`
|
|
72
|
+
|
|
73
|
+
If it fails, check:
|
|
74
|
+
- Python venv exists: `ls ~/ai-browser-profile/.venv/bin/python`
|
|
75
|
+
- Deps installed: `~/ai-browser-profile/.venv/bin/pip list | grep ccl`
|
|
76
|
+
|
|
77
|
+
**Tell the user:** "Python environment verified - MemoryDB loads correctly."
|
|
78
|
+
|
|
79
|
+
### Step 3: Run extraction
|
|
80
|
+
|
|
81
|
+
**IMPORTANT:** Run extraction in the background so you can report progress to the user. The extraction has 8 stages and logs timing for each.
|
|
82
|
+
|
|
83
|
+
```bash
|
|
84
|
+
cd ~/ai-browser-profile && source .venv/bin/activate && python extract.py 2>&1
|
|
85
|
+
```
|
|
86
|
+
|
|
87
|
+
This scans all detected browsers (Arc, Chrome, Brave, Edge, Safari, Firefox) and extracts:
|
|
88
|
+
- Autofill profiles (names, emails, phones, addresses)
|
|
89
|
+
- Login data (accounts per domain)
|
|
90
|
+
- Browser history (tools/services used)
|
|
91
|
+
- Bookmarks (interests, tool usage)
|
|
92
|
+
- IndexedDB (WhatsApp contacts)
|
|
93
|
+
- Local Storage (LinkedIn connections)
|
|
94
|
+
- Notion (workspace contacts, if configured)
|
|
95
|
+
- Embeddings (semantic vectors, backfilled at end)
|
|
96
|
+
|
|
97
|
+
**INTERIM PROFILE:** The extraction pipeline prints an interim profile after the fast steps (autofill, history, bookmarks, logins, Notion — ~1s total) but before the slow steps (WhatsApp ~10s, embeddings ~3min). **As soon as you see the "Interim profile ready" log line, show the profile to the user immediately.** Don't wait for WhatsApp or embeddings to finish — the profile already has all identity, email, address, payment, account, and tool data. WhatsApp only adds a contact count.
|
|
98
|
+
|
|
99
|
+
Look for this in the logs:
|
|
100
|
+
```
|
|
101
|
+
Interim profile ready (WhatsApp + embeddings still running):
|
|
102
|
+
## User Profile
|
|
103
|
+
**Name:** ...
|
|
104
|
+
```
|
|
105
|
+
|
|
106
|
+
**Show this to the user right away**, then let the extraction continue in the background. Tell them: "Here's your profile from browser data. WhatsApp contacts and semantic embeddings are still processing..."
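
One way to detect the interim profile while the extraction is still running is to scan the log stream for the marker text. The sketch below is an illustration only: `split_at_interim` is a hypothetical helper, and the only thing taken from this guide is the marker string itself.

```python
from typing import Iterable, List, Tuple

MARKER = "Interim profile ready"  # log text quoted in this guide


def split_at_interim(lines: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (lines before the marker, marker line and everything after it)."""
    before: List[str] = []
    after: List[str] = []
    seen = False
    for line in lines:
        if not seen and MARKER in line:
            seen = True
        (after if seen else before).append(line)
    return before, after


if __name__ == "__main__":
    log = [
        "stage: autofill 0.1s",
        "Interim profile ready (WhatsApp + embeddings still running):",
        "## User Profile",
    ]
    _, profile = split_at_interim(log)
    print("\n".join(profile))
```

Any iterable of lines works as input, for example the `stdout` of a `subprocess.Popen(..., stdout=subprocess.PIPE, text=True)` process running `extract.py`.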
|
|
107
|
+
|
|
108
|
+
**After extraction + cleanup finish, report a final summary to the user:**
|
|
109
|
+
|
|
110
|
+
```
|
|
111
|
+
Extraction complete:
|
|
112
|
+
Browsers scanned: 8 profiles (Arc, Chrome, Safari, Firefox)
|
|
113
|
+
Raw memories: 5,878
|
|
114
|
+
After cleanup: 5,431
|
|
115
|
+
Time: 54s
|
|
116
|
+
|
|
117
|
+
Breakdown:
|
|
118
|
+
Autofill: 0.1s (forms, addresses, cards)
|
|
119
|
+
History: 1.8s (tools & services)
|
|
120
|
+
Bookmarks: 0.4s (interests & links)
|
|
121
|
+
Logins: 2.1s (saved accounts)
|
|
122
|
+
LinkedIn: 8.7s (connections)
|
|
123
|
+
Notion: 0.1s (contacts & pages)
|
|
124
|
+
WhatsApp: 15.3s (contacts)
|
|
125
|
+
Embeddings: 22.4s (semantic vectors)
|
|
126
|
+
```
|
|
127
|
+
|
|
128
|
+
### Step 4: Verify extraction
|
|
129
|
+
|
|
130
|
+
```bash
|
|
131
|
+
~/ai-browser-profile/.venv/bin/python -c "
|
|
132
|
+
import sys, os
|
|
133
|
+
sys.path.insert(0, os.path.expanduser('~/ai-browser-profile'))
|
|
134
|
+
from ai_browser_profile import MemoryDB
|
|
135
|
+
mem = MemoryDB(os.path.expanduser('~/ai-browser-profile/memories.db'))
|
|
136
|
+
stats = mem.stats()
|
|
137
|
+
print(f'Total memories: {stats[\"total_memories\"]}')
|
|
138
|
+
print()
|
|
139
|
+
print(mem.profile_text())
|
|
140
|
+
mem.close()
|
|
141
|
+
"
|
|
142
|
+
```
|
|
143
|
+
|
|
144
|
+
**Show the profile to the user.** Check that the name, email, phone, and address look reasonable. If the primary email is wrong (for example, a contact's email ranked higher), note that the review pipeline will fix this.
|
|
145
|
+
|
|
146
|
+
### Step 5: Set up automation (optional)
|
|
147
|
+
|
|
148
|
+
Ask: "Do you want weekly automatic extraction + review? (y/n)"
|
|
149
|
+
|
|
150
|
+
If yes (macOS):
|
|
151
|
+
```bash
|
|
152
|
+
ln -sf ~/ai-browser-profile/launchd/com.m13v.memory-review.plist ~/Library/LaunchAgents/
|
|
153
|
+
launchctl load ~/Library/LaunchAgents/com.m13v.memory-review.plist
|
|
154
|
+
```
|
|
155
|
+
|
|
156
|
+
Schedule: extracts new browser data weekly, then runs Claude to review new entries.
|
|
157
|
+
|
|
158
|
+
### Step 6: Summary
|
|
159
|
+
|
|
160
|
+
Print a final status card:
|
|
161
|
+
|
|
162
|
+
```
|
|
163
|
+
Setup Complete
|
|
164
|
+
|
|
165
|
+
Location: ~/ai-browser-profile
|
|
166
|
+
Database: ~/ai-browser-profile/memories.db
|
|
167
|
+
Python: ~/ai-browser-profile/.venv/bin/python
|
|
168
|
+
Skills: ~/.claude/skills/ai-browser-profile (+ 4 more)
|
|
169
|
+
|
|
170
|
+
Memories: 5,431
|
|
171
|
+
Embeddings: 5,431 vectors (semantic search enabled)
|
|
172
|
+
Automation: launchd weekly / not set up
|
|
173
|
+
|
|
174
|
+
Try it: Tell Claude "what's my email address"
|
|
175
|
+
Update: npx ai-browser-profile update
|
|
176
|
+
Review: /memory-review (Claude-powered cleanup)
|
|
177
|
+
```
|
package/skill/SKILL.md
ADDED
|
@@ -0,0 +1,180 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: ai-browser-profile
|
|
3
|
+
description: "Query the user's AI browser profile: identity, accounts, tools, contacts, addresses, payments extracted from browser data. Use when you need context about the user to help with any task: form filling, emailing, booking, payments, or any task where knowing the user's info helps."
|
|
4
|
+
---
|
|
5
|
+
|
|
6
|
+
# AI Browser Profile
|
|
7
|
+
|
|
8
|
+
A self-ranking database of everything learned about the user from browser data. Memories are ranked by how often they're accessed vs how often they appear in search results — frequently useful memories rise, noise sinks.
|
|
9
|
+
|
|
10
|
+
## Quick Reference
|
|
11
|
+
|
|
12
|
+
| Item | Value |
|
|
13
|
+
|------|-------|
|
|
14
|
+
| Database | `~/ai-browser-profile/memories.db` |
|
|
15
|
+
| Module | `~/ai-browser-profile/ai_browser_profile/` |
|
|
16
|
+
| Python | `~/ai-browser-profile/.venv/bin/python` |
|
|
17
|
+
| Rebuild | `~/ai-browser-profile/.venv/bin/python ~/ai-browser-profile/extract.py` |
|
|
18
|
+
|
|
19
|
+
## How to Use
|
|
20
|
+
|
|
21
|
+
### User profile (start here)
|
|
22
|
+
|
|
23
|
+
Get a compact overview of the user — name, emails, addresses, accounts, tools, contacts. This is deterministic (no LLM) and computed from the database. Use it as baseline context before doing any task.
|
|
24
|
+
|
|
25
|
+
```python
|
|
26
|
+
import sys, os
|
|
27
|
+
sys.path.insert(0, os.path.expanduser("~/ai-browser-profile"))
|
|
28
|
+
from ai_browser_profile import MemoryDB
|
|
29
|
+
|
|
30
|
+
mem = MemoryDB(os.path.expanduser("~/ai-browser-profile/memories.db"))
|
|
31
|
+
print(mem.profile_text()) # markdown formatted, ~1.5KB
|
|
32
|
+
mem.close()
|
|
33
|
+
```
|
|
34
|
+
|
|
35
|
+
The profile shows: name, all known emails, phone numbers, handles, addresses, payment info, companies, top tools/services, accounts grouped by email, Notion projects, and contact count. Values are ranked by frequency across browser profiles — higher frequency = more likely to be the user's own data.
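
The frequency idea can be illustrated with a small sketch. This is not how `profile_text()` is implemented (its internals are not shown here); `rank_by_frequency` and the lowercase/strip normalization are assumptions made for the example.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def rank_by_frequency(values: Iterable[str]) -> List[Tuple[str, int]]:
    """Count how often each value occurs; return (value, count), most frequent first."""
    # Normalization (strip + lowercase) is an assumption for this example.
    counts = Counter(v.strip().lower() for v in values if v and v.strip())
    # Sort by count descending, then alphabetically so ties are deterministic.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    seen = ["ada@example.com", "Ada@example.com ", "friend@example.org"]
    print(rank_by_frequency(seen))
```

A value that shows up in several browser profiles ends up first, which is the intuition behind treating it as the user's own data.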
|
|
36
|
+
|
|
37
|
+
### Search by tags
|
|
38
|
+
|
|
39
|
+
```python
|
|
40
|
+
import sys, os
|
|
41
|
+
sys.path.insert(0, os.path.expanduser("~/ai-browser-profile"))
|
|
42
|
+
from ai_browser_profile import MemoryDB
|
|
43
|
+
|
|
44
|
+
mem = MemoryDB(os.path.expanduser("~/ai-browser-profile/memories.db"))
|
|
45
|
+
|
|
46
|
+
# Search returns results ranked by hit_rate (accessed/appeared), then counts
|
|
47
|
+
# accessed_count and appeared_count are auto-incremented on every search call
|
|
48
|
+
results = mem.search(["identity", "contact_info"], limit=10)
|
|
49
|
+
for r in results:
    print(f'{r["key"]}: {r["value"]}')
|
|
51
|
+
|
|
52
|
+
mem.close()
|
|
53
|
+
```
|
|
54
|
+
|
|
55
|
+
### Semantic search (natural language)
|
|
56
|
+
|
|
57
|
+
```python
|
|
58
|
+
# Find memories by meaning, not just keywords.
# Assumes `mem` is an open MemoryDB, created as in the examples above.
|
|
59
|
+
results = mem.semantic_search("what products does the user build")
|
|
60
|
+
for r in results[:5]:
    print(f'{r["key"]}: {r["value"][:80]} (sim={r["similarity"]:.3f})')
|
|
62
|
+
|
|
63
|
+
# Falls back to text_search() if embeddings not installed
|
|
64
|
+
# Install with: npx ai-browser-profile install-embeddings
|
|
65
|
+
```
|
|
66
|
+
|
|
67
|
+
### Quick SQL queries
|
|
68
|
+
|
|
69
|
+
```bash
|
|
70
|
+
sqlite3 ~/ai-browser-profile/memories.db
|
|
71
|
+
```
|
|
72
|
+
|
|
73
|
+
```sql
|
|
74
|
+
-- All identity info
|
|
75
|
+
SELECT m.key, m.value FROM memories m
|
|
76
|
+
JOIN memory_tags t ON m.id = t.memory_id WHERE t.tag = 'identity'
|
|
77
|
+
AND m.superseded_by IS NULL;
|
|
78
|
+
|
|
79
|
+
-- All contact info (emails, phones)
|
|
80
|
+
SELECT m.key, m.value, m.source FROM memories m
|
|
81
|
+
JOIN memory_tags t ON m.id = t.memory_id WHERE t.tag = 'contact_info'
|
|
82
|
+
AND m.superseded_by IS NULL;
|
|
83
|
+
|
|
84
|
+
-- All contacts
|
|
85
|
+
SELECT m.key, m.value FROM memories m
|
|
86
|
+
JOIN memory_tags t ON m.id = t.memory_id WHERE t.tag = 'contact'
|
|
87
|
+
AND m.superseded_by IS NULL
|
|
88
|
+
ORDER BY m.accessed_count DESC;
|
|
89
|
+
|
|
90
|
+
-- Most accessed memories (the ones that proved useful)
|
|
91
|
+
SELECT key, value, accessed_count, appeared_count,
|
|
92
|
+
CAST(accessed_count AS REAL) / MAX(appeared_count, 1) AS hit_rate
|
|
93
|
+
FROM memories WHERE accessed_count > 0
|
|
94
|
+
ORDER BY hit_rate DESC;
|
|
95
|
+
|
|
96
|
+
-- Search by key pattern
|
|
97
|
+
SELECT key, value FROM memories WHERE key LIKE 'account:%'
|
|
98
|
+
AND superseded_by IS NULL;
|
|
99
|
+
```
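
The same tag query can be run from Python with the standard `sqlite3` module. The sketch below uses an in-memory database with a hypothetical minimal schema containing only the columns the queries above reference; the real `memories.db` schema may have more columns.

```python
import sqlite3

# Hypothetical minimal schema: only the columns used by the queries above.
SCHEMA = """
CREATE TABLE memories (
    id INTEGER PRIMARY KEY,
    key TEXT, value TEXT, source TEXT,
    superseded_by INTEGER,
    accessed_count INTEGER DEFAULT 0,
    appeared_count INTEGER DEFAULT 0
);
CREATE TABLE memory_tags (memory_id INTEGER, tag TEXT);
"""


def memories_with_tag(conn: sqlite3.Connection, tag: str):
    """Return (key, value) for non-superseded memories carrying `tag`."""
    return conn.execute(
        "SELECT m.key, m.value FROM memories m "
        "JOIN memory_tags t ON m.id = t.memory_id "
        "WHERE t.tag = ? AND m.superseded_by IS NULL "
        "ORDER BY m.id",
        (tag,),
    ).fetchall()


def demo_connection() -> sqlite3.Connection:
    """Build a throwaway in-memory database with three sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO memories (id, key, value, superseded_by) VALUES (?, ?, ?, ?)",
        [(1, "first_name", "Ada", None),
         (2, "first_name", "Adda", 1),  # superseded by row 1, so filtered out
         (3, "email", "ada@example.com", None)],
    )
    conn.executemany(
        "INSERT INTO memory_tags VALUES (?, ?)",
        [(1, "identity"), (2, "identity"), (3, "contact_info")],
    )
    return conn


if __name__ == "__main__":
    print(memories_with_tag(demo_connection(), "identity"))
```

Passing the tag as a bound parameter (`?`) avoids string interpolation into the SQL text.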
|
|
100
|
+
|
|
101
|
+
## Canonical Tags
|
|
102
|
+
|
|
103
|
+
| Tag | What it covers | Example keys |
|
|
104
|
+
|-----|---------------|-------------|
|
|
105
|
+
| `identity` | Name, DOB, gender, job title, language | `first_name`, `last_name`, `full_name`, `date_of_birth` |
|
|
106
|
+
| `contact_info` | Email addresses, phone numbers | `email`, `phone` |
|
|
107
|
+
| `address` | Physical addresses | `street_address`, `city`, `state`, `zip`, `country` |
|
|
108
|
+
| `payment` | Card holder names, expiry | `card_holder_name`, `card_expiry`, `card_nickname` |
|
|
109
|
+
| `account` | Service accounts, login credentials | `account:{domain}` |
|
|
110
|
+
| `tool` | Tools/services used (from history) | `tool:GitHub`, `tool:Slack`, `tool:Stripe` |
|
|
111
|
+
| `contact` | People the user knows | `contact:{Name}`, `linkedin:{Name}` |
|
|
112
|
+
| `work` | Work-related (company, LinkedIn) | `company`, `linkedin:*` |
|
|
113
|
+
| `knowledge` | Interests, skills, projects, products | `product:*`, `project:*`, `interest:*` |
|
|
114
|
+
| `communication` | Messaging platforms | `tool:Slack`, `tool:WhatsApp` |
|
|
115
|
+
| `social` | Social platforms | `tool:LinkedIn`, `tool:X/Twitter` |
|
|
116
|
+
| `finance` | Financial tools | `tool:Stripe`, `tool:QuickBooks` |
|
|
117
|
+
|
|
118
|
+
## Ranking System
|
|
119
|
+
|
|
120
|
+
Every `search()`, `semantic_search()`, and `text_search()` call automatically increments both `appeared_count` and `accessed_count` for all returned results. No manual `mark_accessed()` calls needed.
|
|
121
|
+
|
|
122
|
+
**hit_rate** = `accessed_count / appeared_count`
|
|
123
|
+
|
|
124
|
+
Memories that are frequently returned by searches rise in ranking. The system is fully automatic — no manual curation or agent instrumentation needed.
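
As a rough illustration of the formula, the sketch below computes `hit_rate` with the same guard against a zero denominator that the `MAX(appeared_count, 1)` expression in the SQL section uses. The `rank` helper and its tie-break order are assumptions for the example, not the package's code.

```python
def hit_rate(accessed_count: int, appeared_count: int) -> float:
    """accessed / appeared, with the denominator floored at 1."""
    return accessed_count / max(appeared_count, 1)


def rank(memories):
    """Sort dicts by hit_rate, then accessed_count, then appeared_count (descending)."""
    # The tie-break order on the two counts is an assumption.
    return sorted(
        memories,
        key=lambda m: (
            hit_rate(m["accessed_count"], m["appeared_count"]),
            m["accessed_count"],
            m["appeared_count"],
        ),
        reverse=True,
    )


if __name__ == "__main__":
    rows = [
        {"key": "email", "accessed_count": 3, "appeared_count": 4},
        {"key": "tool:Slack", "accessed_count": 1, "appeared_count": 10},
    ]
    print([r["key"] for r in rank(rows)])
```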
|
|
125
|
+
|
|
126
|
+
## Semantic Dedup
|
|
127
|
+
|
|
128
|
+
On `upsert()`, near-duplicate memories (cosine similarity >= 0.92 with same key prefix) are automatically superseded. This prevents storing "Screen recording tool for compliance" and "Screen recording tool launched on Product Hunt for compliance use cases" as separate entries.
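
The check described here can be sketched in plain Python. Only the 0.92 threshold comes from this section; the function names and the definition of "key prefix" as the text before the first `:` are assumptions, and the package itself may compute this differently (for example with `numpy`).

```python
import math
from typing import Sequence

THRESHOLD = 0.92  # similarity cutoff quoted in this section


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def key_prefix(key: str) -> str:
    # Assumption: the prefix is the part before the first ':' ('product' in 'product:foo').
    return key.split(":", 1)[0]


def is_near_duplicate(key_a, vec_a, key_b, vec_b, threshold: float = THRESHOLD) -> bool:
    """True when both keys share a prefix and the vectors are at least `threshold` similar."""
    return (
        key_prefix(key_a) == key_prefix(key_b)
        and cosine_similarity(vec_a, vec_b) >= threshold
    )
```

Requiring the same prefix keeps, say, a `tool:*` entry from superseding a `product:*` entry that happens to have similar wording.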
|
|
129
|
+
|
|
130
|
+
## Task-Specific Tag Queries
|
|
131
|
+
|
|
132
|
+
| Task | Tags to search |
|
|
133
|
+
|------|---------------|
|
|
134
|
+
| Fill out a form | `["identity", "contact_info", "address"]` |
|
|
135
|
+
| Send an email | `["contact_info", "communication"]` + search contact by name |
|
|
136
|
+
| Book a flight/hotel | `["identity", "address", "payment"]` |
|
|
137
|
+
| Log into a service | `["account"]` |
|
|
138
|
+
| Invoice a client | `["identity", "work", "address", "payment"]` |
|
|
139
|
+
| Find a contact | `["contact"]` + filter by key pattern |
|
|
140
|
+
| Dev/deploy task | `["account", "tool"]` |
|
|
141
|
+
| Social media post | `["account", "social"]` |
|
|
142
|
+
| Research question | `mem.semantic_search("your question here")` |
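
The table can also be kept as a lookup in code. The task names below are hypothetical labels chosen for this sketch; the tag lists are copied from the table.

```python
# Task labels are illustrative; tag lists mirror the table above.
TASK_TAGS = {
    "fill_form": ["identity", "contact_info", "address"],
    "send_email": ["contact_info", "communication"],
    "book_travel": ["identity", "address", "payment"],
    "login": ["account"],
    "invoice_client": ["identity", "work", "address", "payment"],
    "find_contact": ["contact"],
    "dev_deploy": ["account", "tool"],
    "social_post": ["account", "social"],
}


def tags_for(task: str) -> list:
    """Return a copy of the tag list for a task, or an empty list if unknown."""
    return list(TASK_TAGS.get(task, []))
```

With an open `MemoryDB`, the result can be passed straight to the search call shown earlier, e.g. `mem.search(tags_for("fill_form"), limit=10)`.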
|
|
143
|
+
|
|
144
|
+
## Rebuilding Memories
|
|
145
|
+
|
|
146
|
+
To refresh from latest browser data:
|
|
147
|
+
|
|
148
|
+
```bash
|
|
149
|
+
cd ~/ai-browser-profile
|
|
150
|
+
source .venv/bin/activate
|
|
151
|
+
python extract.py # full scan
|
|
152
|
+
python extract.py --browsers arc chrome # specific browsers
|
|
153
|
+
python extract.py --no-indexeddb --no-localstorage # fast, skip LevelDB
|
|
154
|
+
```
|
|
155
|
+
|
|
156
|
+
### Backfill embeddings (after install-embeddings)
|
|
157
|
+
|
|
158
|
+
```python
|
|
159
|
+
import sys, os
|
|
160
|
+
sys.path.insert(0, os.path.expanduser("~/ai-browser-profile"))
|
|
161
|
+
from ai_browser_profile import MemoryDB
|
|
162
|
+
mem = MemoryDB(os.path.expanduser("~/ai-browser-profile/memories.db"))
|
|
163
|
+
n = mem.backfill_embeddings()
|
|
164
|
+
print(f"Embedded {n} memories")
|
|
165
|
+
mem.close()
|
|
166
|
+
```
|
|
167
|
+
|
|
168
|
+
`extract.py` reads browser files directly (History, Login Data, Web Data, IndexedDB, Local Storage). The memory database preserves `appeared_count` and `accessed_count` across rebuilds via UPSERT logic, so rankings survive a rebuild.
|
|
169
|
+
|
|
170
|
+
## Dependencies
|
|
171
|
+
|
|
172
|
+
**Core** (installed by `npx ai-browser-profile init`):
|
|
173
|
+
- `ccl_chromium_reader` — IndexedDB + Local Storage LevelDB files
|
|
174
|
+
- `numpy` — vector math for cosine similarity
|
|
175
|
+
|
|
176
|
+
**Embeddings** (optional, installed by `npx ai-browser-profile install-embeddings`):
|
|
177
|
+
- `onnxruntime` — ONNX model inference
|
|
178
|
+
- `huggingface_hub` — model downloading
|
|
179
|
+
- `tokenizers` — text tokenization
|
|
180
|
+
- Model: nomic-embed-text-v1.5 (~131MB, downloads on first use)
|
package/whatsapp/SKILL.md
ADDED
|
@@ -0,0 +1,321 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: whatsapp-analysis
|
|
3
|
+
description: "Analyze WhatsApp Web data — contacts, groups, social graph, AND decrypt actual message content via live browser interception. Use when: 'WhatsApp contacts', 'WhatsApp groups', 'WhatsApp analysis', 'who do I talk to', 'WhatsApp messages', 'decrypt WhatsApp', 'read WhatsApp messages', 'WhatsApp network', 'social graph', 'inner circle'."
|
|
4
|
+
---
|
|
5
|
+
|
|
6
|
+
# WhatsApp Analysis
|
|
7
|
+
|
|
8
|
+
Two capabilities:
|
|
9
|
+
1. **Live message decryption** — intercept `crypto.subtle.decrypt` via Playwright to read actual message text (requires open browser session)
|
|
10
|
+
2. **Metadata analysis** — contacts, groups, social graph from IndexedDB (offline, fast)
|
|
11
|
+
|
|
12
|
+
## Part 1: Live Message Decryption (Playwright)
|
|
13
|
+
|
|
14
|
+
### How It Works
|
|
15
|
+
|
|
16
|
+
WhatsApp Web encrypts messages in IndexedDB (`msgRowOpaqueData`) using **AES-CBC** with 128-bit keys derived via **HKDF-SHA256**. The crypto runs on the **main thread** (not in Web Workers). By intercepting `crypto.subtle` before page JS loads, we capture the plaintext output and, because every key is forced to be extractable, the keys themselves.
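
For readers unfamiliar with HKDF, here is a minimal standard-library sketch of HKDF-SHA256 as specified in RFC 5869. It shows the general construction only: the input key material, salt, and `info` values that WhatsApp Web actually uses are not documented here, so the inputs in the example are placeholders.

```python
import hashlib
import hmac


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int) -> bytes:
    """HKDF (RFC 5869) with SHA-256: extract a PRK, then expand to `length` bytes."""
    hash_len = hashlib.sha256().digest_size  # 32 bytes
    if length > 255 * hash_len:
        raise ValueError("requested output is too long for HKDF-SHA256")
    # Extract: PRK = HMAC(salt, IKM); an empty salt becomes hash_len zero bytes.
    prk = hmac.new(salt or b"\x00" * hash_len, ikm, hashlib.sha256).digest()
    # Expand: T(i) = HMAC(PRK, T(i-1) || info || i), concatenated until long enough.
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


if __name__ == "__main__":
    # Placeholder inputs; 16 bytes is the size of a 128-bit AES key.
    key = hkdf_sha256(b"input key material", b"salt", b"context", 16)
    print(key.hex())
```

Because the derived key depends on the input material, a key captured in one session is only useful for data encrypted under that same derivation.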
|
|
17
|
+
|
|
18
|
+
### Prerequisites
|
|
19
|
+
|
|
20
|
+
- MCP Playwright browser available
|
|
21
|
+
- WhatsApp Web logged in (or QR code scan needed)
|
|
22
|
+
|
|
23
|
+
### Step 1: Navigate to WhatsApp Web
|
|
24
|
+
|
|
25
|
+
```
|
|
26
|
+
Use browser_navigate to go to https://web.whatsapp.com
|
|
27
|
+
Wait for the page to load. If QR code appears, user must scan it.
|
|
28
|
+
Use browser_snapshot to verify the chat list is visible.
|
|
29
|
+
```
|
|
30
|
+
|
|
31
|
+
### Step 2: Install the crypto interceptor
|
|
32
|
+
|
|
33
|
+
Use `browser_run_code` with this **exact** code to install the interceptor via `addInitScript` (runs before any page JS on reload):
|
|
34
|
+
|
|
35
|
+
```javascript
|
|
36
|
+
async (page) => {
|
|
37
|
+
await page.addInitScript(() => {
|
|
38
|
+
const scope = (typeof globalThis !== 'undefined') ? globalThis : self;
|
|
39
|
+
scope.__waCaptured = { decrypt: [], deriveKey: [], importKey: [] };
|
|
40
|
+
|
|
41
|
+
const origImportKey = crypto.subtle.importKey.bind(crypto.subtle);
|
|
42
|
+
crypto.subtle.importKey = function(format, keyData, algorithm, extractable, keyUsages) {
|
|
43
|
+
return origImportKey(format, keyData, algorithm, true, keyUsages);
|
|
44
|
+
};
|
|
45
|
+
|
|
46
|
+
const origGenerateKey = crypto.subtle.generateKey.bind(crypto.subtle);
|
|
47
|
+
crypto.subtle.generateKey = function(algorithm, extractable, keyUsages) {
|
|
48
|
+
return origGenerateKey(algorithm, true, keyUsages);
|
|
49
|
+
};
|
|
50
|
+
|
|
51
|
+
const origDeriveKey = crypto.subtle.deriveKey.bind(crypto.subtle);
|
|
52
|
+
crypto.subtle.deriveKey = async function(algorithm, baseKey, derivedKeyType, extractable, keyUsages) {
|
|
53
|
+
const result = await origDeriveKey(algorithm, baseKey, derivedKeyType, true, keyUsages);
|
|
54
|
+
try {
|
|
55
|
+
const raw = await crypto.subtle.exportKey('raw', result);
|
|
56
|
+
const b64 = btoa(String.fromCharCode(...new Uint8Array(raw)));
|
|
57
|
+
scope.__waCaptured.deriveKey.push({
|
|
58
|
+
ts: Date.now(),
|
|
59
|
+
alg: algorithm.name,
|
|
60
|
+
saltLen: algorithm.salt ? algorithm.salt.byteLength : 0,
|
|
61
|
+
derivedAlg: derivedKeyType.name,
|
|
62
|
+
derivedLen: derivedKeyType.length,
|
|
63
|
+
keyB64: b64,
|
|
64
|
+
usages: keyUsages
|
|
65
|
+
});
|
|
66
|
+
} catch(e) {}
|
|
67
|
+
return result;
|
|
68
|
+
};
|
|
69
|
+
|
|
70
|
+
const origDecrypt = crypto.subtle.decrypt.bind(crypto.subtle);
|
|
71
|
+
crypto.subtle.decrypt = async function(algorithm, key, data) {
|
|
72
|
+
const result = await origDecrypt(algorithm, key, data);
|
|
73
|
+
let keyB64 = null;
|
|
74
|
+
try {
|
|
75
|
+
const raw = await crypto.subtle.exportKey('raw', key);
|
|
76
|
+
keyB64 = btoa(String.fromCharCode(...new Uint8Array(raw)));
|
|
77
|
+
} catch(e) { keyB64 = 'export-failed'; }
|
|
78
|
+
|
|
79
|
+
const entry = {
|
|
80
|
+
ts: Date.now(),
|
|
81
|
+
alg: algorithm.name,
|
|
82
|
+
key: keyB64,
|
|
83
|
+
inSize: data.byteLength,
|
|
84
|
+
outSize: result.byteLength
|
|
85
|
+
};
|
|
86
|
+
|
|
87
|
+
if (result.byteLength > 0 && result.byteLength < 500000) {
|
|
88
|
+
const slice = new Uint8Array(result.slice(0, Math.min(1500, result.byteLength)));
|
|
89
|
+
entry.outB64 = btoa(String.fromCharCode(...slice));
|
|
90
|
+
}
|
|
91
|
+
scope.__waCaptured.decrypt.push(entry);
|
|
92
|
+
return result;
|
|
93
|
+
};
|
|
94
|
+
});
|
|
95
|
+
|
|
96
|
+
await page.reload({ waitUntil: 'networkidle' });
|
|
97
|
+
await page.waitForTimeout(8000);
|
|
98
|
+
return 'Interceptor installed and page reloaded. Decryptions are being captured.';
|
|
99
|
+
}
|
|
100
|
+
```
|
|
101
|
+
|
|
102
|
+
### Step 3: Collect captured decryptions
|
|
103
|
+
|
|
104
|
+
Wait 10-15 seconds after reload, then extract with `browser_run_code`:
|
|
105
|
+
|
|
106
|
+
```javascript
|
|
107
|
+
async (page) => {
|
|
108
|
+
const data = await page.evaluate(() => {
|
|
109
|
+
const c = ((typeof globalThis !== 'undefined') ? globalThis : self).__waCaptured;
|
|
110
|
+
if (!c) return JSON.stringify({error: 'no captures'});
|
|
111
|
+
return JSON.stringify({
|
|
112
|
+
decryptCount: c.decrypt.length,
|
|
113
|
+
deriveKeyCount: c.deriveKey.length,
|
|
114
|
+
derivedKeys: c.deriveKey,
|
|
115
|
+
sample: c.decrypt.slice(0, 5)
|
|
116
|
+
});
|
|
117
|
+
});
|
|
118
|
+
return data;
|
|
119
|
+
}
|
|
120
|
+
```
|
|
121
|
+
|
|
122
|
+
### Step 4: Extract and parse message text
|
|
123
|
+
|
|
124
|
+
Decrypted output is **protobuf**. Extract text with `browser_run_code`:
|
|
125
|
+
|
|
126
|
+
```javascript
|
|
127
|
+
async (page) => {
|
|
128
|
+
const result = await page.evaluate(() => {
|
|
129
|
+
const c = ((typeof globalThis !== 'undefined') ? globalThis : self).__waCaptured;
|
|
130
|
+
if (!c) return JSON.stringify({error: 'no captures'});
|
|
131
|
+
|
|
132
|
+
function readVarint(bytes, pos) {
|
|
133
|
+
let result = 0, shift = 0;
|
|
134
|
+
while (pos < bytes.length) {
|
|
135
|
+
const byte = bytes[pos];
|
|
136
|
+
result |= (byte & 0x7f) << shift;
|
|
137
|
+
shift += 7; pos++;
|
|
138
|
+
if (!(byte & 0x80)) return [result, pos];
|
|
139
|
+
}
|
|
140
|
+
return [null, pos];
|
|
141
|
+
}
|
|
142
|
+
|
|
143
|
+
function parseProtobufText(bytes) {
|
|
144
|
+
if (!bytes || bytes[0] !== 0x0a) return null;
|
|
145
|
+
let pos = 1;
|
|
146
|
+
let [outerLen, p1] = readVarint(bytes, pos);
|
|
147
|
+
if (outerLen === null || p1 >= bytes.length) return null;
|
|
148
|
+
pos = p1;
|
|
149
|
+
if (bytes[pos] !== 0x0a) return null;
|
|
150
|
+
pos++;
|
|
151
|
+
let [textLen, p2] = readVarint(bytes, pos);
|
|
152
|
+
if (textLen === null || textLen <= 0 || p2 + textLen > bytes.length) return null;
|
|
153
|
+
pos = p2;
|
|
154
|
+
try {
|
|
155
|
+
return new TextDecoder('utf-8').decode(bytes.slice(pos, pos + textLen));
|
|
156
|
+
} catch(e) { return null; }
|
|
157
|
+
}
|
|
158
|
+
|
|
159
|
+
function extractSender(bytes) {
|
|
160
|
+
try {
|
|
161
|
+
const str = Array.from(bytes).map(b => String.fromCharCode(b)).join('');
|
|
162
|
+
const match = str.match(/(\d+@(?:s\.whatsapp\.net|lid|g\.us))/);
|
|
163
|
+
return match ? match[1] : null;
|
|
164
|
+
} catch(e) { return null; }
|
|
165
|
+
}
|
|
166
|
+
|
|
167
|
+
const messages = [];
|
|
168
|
+
for (const d of c.decrypt) {
|
|
169
|
+
if (!d.outB64 || d.alg !== 'AES-CBC') continue;
|
|
170
|
+
try {
|
|
171
|
+
const bytes = Uint8Array.from(atob(d.outB64), ch => ch.charCodeAt(0));
|
|
172
|
+
const text = parseProtobufText(bytes);
|
|
173
|
+
if (text && text.length > 0) {
|
|
174
|
+
messages.push({
|
|
175
|
+
text: text,
|
|
176
|
+
sender: extractSender(bytes),
|
|
177
|
+
ts: d.ts
|
|
178
|
+
});
|
|
179
|
+
}
|
|
180
|
+
} catch(e) {}
|
|
181
|
+
}
|
|
182
|
+
|
|
183
|
+
return JSON.stringify({
|
|
184
|
+
totalDecrypts: c.decrypt.length,
|
|
185
|
+
textMessages: messages.length,
|
|
186
|
+
messages: messages
|
|
187
|
+
});
|
|
188
|
+
});
|
|
189
|
+
return result;
|
|
190
|
+
}
|
|
191
|
+
```
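
If the captured `outB64` values are exported for offline processing, the same parsing can be done in Python. The sketch below is a port of the `readVarint` / `parseProtobufText` logic above under the same layout assumption (a length-delimited field 1 nested inside a length-delimited field 1); the function names are illustrative.

```python
import base64
from typing import Optional, Tuple


def read_varint(data: bytes, pos: int) -> Tuple[Optional[int], int]:
    """Decode a protobuf varint starting at `pos`; return (value, next_pos)."""
    result, shift = 0, 0
    while pos < len(data):
        byte = data[pos]
        result |= (byte & 0x7F) << shift
        shift += 7
        pos += 1
        if not byte & 0x80:  # high bit clear: last byte of the varint
            return result, pos
    return None, pos  # ran out of bytes mid-varint


def parse_protobuf_text(data: bytes) -> Optional[str]:
    """Extract the string at field 1 inside field 1, or None if the layout differs."""
    if not data or data[0] != 0x0A:  # 0x0A = field 1, length-delimited
        return None
    outer_len, pos = read_varint(data, 1)
    if outer_len is None or pos >= len(data) or data[pos] != 0x0A:
        return None
    text_len, pos = read_varint(data, pos + 1)
    if text_len is None or text_len <= 0 or pos + text_len > len(data):
        return None
    try:
        # Strict decoding: invalid UTF-8 yields None instead of replacement characters.
        return data[pos:pos + text_len].decode("utf-8")
    except UnicodeDecodeError:
        return None


def text_from_b64(out_b64: str) -> Optional[str]:
    """Decode one captured `outB64` value and try to extract its message text."""
    return parse_protobuf_text(base64.b64decode(out_b64))
```

Note that the interceptor stores at most the first 1500 bytes of each decrypted buffer, so longer messages fail the length check and return `None` here, as they do in the JavaScript version.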
|
|
192
|
+
|
|
193
|
+
### Step 5: Navigate to more chats for additional messages
|
|
194
|
+
|
|
195
|
+
WhatsApp only decrypts messages for loaded chats. To capture more:
|
|
196
|
+
|
|
197
|
+
```
|
|
198
|
+
Use browser_snapshot to see the chat list.
|
|
199
|
+
Click on different chats using browser_click with the ref for each chat.
|
|
200
|
+
Wait 3-5 seconds between clicks for decryption to complete.
|
|
201
|
+
Re-run Step 4 to collect newly decrypted messages.
|
|
202
|
+
```
|
|
203
|
+
|
|
204
|
+
### Step 6: Store messages in memories.db
|
|
205
|
+
|
|
206
|
+
```python
|
|
207
|
+
import sys, os
|
|
208
|
+
sys.path.insert(0, os.path.expanduser("~/ai-browser-profile"))
|
|
209
|
+
from ai_browser_profile import MemoryDB
|
|
210
|
+
from ai_browser_profile.ingestors.messages import ingest_messages, message_stats
|
|
211
|
+
|
|
212
|
+
mem = MemoryDB(os.path.expanduser("~/ai-browser-profile/memories.db"))
|
|
213
|
+
|
|
214
|
+
# messages = the parsed messages list from Step 4
|
|
215
|
+
inserted = ingest_messages(mem, messages)
|
|
216
|
+
print(f"Inserted {inserted} new messages")
|
|
217
|
+
|
|
218
|
+
stats = message_stats(mem)
|
|
219
|
+
print(f"Total stored: {stats['total_messages']}")
|
|
220
|
+
|
|
221
|
+
mem.close()
|
|
222
|
+
```
|
|
223
|
+
|
|
224
|
+
### Step 7: Analyze messages and write relationship memories
|
|
225
|
+
|
|
226
|
+
```python
|
|
227
|
+
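import sys, os
sys.path.insert(0, os.path.expanduser("~/ai-browser-profile"))
from ai_browser_profile import MemoryDB
from ai_browser_profile.ingestors.messages import get_messages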
|
|
228
|
+
|
|
229
|
+
mem = MemoryDB(os.path.expanduser("~/ai-browser-profile/memories.db"))
|
|
230
|
+
|
|
231
|
+
all_msgs = get_messages(mem, limit=1000)
|
|
232
|
+
|
|
233
|
+
# After analyzing, write relationship/interest memories:
|
|
234
|
+
mem.upsert("relationship:ContactName", "description of relationship",
|
|
235
|
+
["contact", "relationship"], 0.8, "whatsapp:messages")
|
|
236
|
+
|
|
237
|
+
mem.conn.commit()
|
|
238
|
+
mem.close()
|
|
239
|
+
```
|
|
240
|
+
|
|
241
|
+
---
|
|
242
|
+
|
|
243
|
+
## Part 2: Metadata Analysis (Offline from IndexedDB)
|
|
244
|
+
|
|
245
|
+
### Prerequisites
|
|
246
|
+
|
|
247
|
+
Run the memory extraction first to populate contacts in `memories.db`:
|
|
248
|
+
|
|
249
|
+
```bash
|
|
250
|
+
cd ~/ai-browser-profile
|
|
251
|
+
source .venv/bin/activate
|
|
252
|
+
python extract.py
|
|
253
|
+
```
|
|
254
|
+
|
|
255
|
+
For deeper metadata analysis, read IndexedDB directly:
|
|
256
|
+
|
|
257
|
+
```python
|
|
258
|
+
import shutil, tempfile, json
from pathlib import Path
from ccl_chromium_reader import ccl_chromium_indexeddb

APP_SUPPORT = Path.home() / "Library" / "Application Support"
arc_idb = APP_SUPPORT / "Arc" / "User Data" / "Default" / "IndexedDB"

for db_dir in arc_idb.glob("*whatsapp*_0.indexeddb.leveldb"):
    tmp = Path(tempfile.mkdtemp())
    shutil.copytree(db_dir, tmp / db_dir.name)
    blob_dir = db_dir.parent / db_dir.name.replace(".leveldb", ".blob")
    tmp_blob = None
    if blob_dir.exists():
        tmp_blob = Path(tempfile.mkdtemp())
        shutil.copytree(blob_dir, tmp_blob / blob_dir.name)

    wrapper = ccl_chromium_indexeddb.WrappedIndexDB(
        str(tmp / db_dir.name),
        str(tmp_blob / blob_dir.name) if tmp_blob else None,
    )
    # Now iterate stores...
|
|
279
|
+
```
|
|
280
|
+
|
|
281
|
+
### Data Available
|
|
282
|
+
|
|
283
|
+
WhatsApp Web stores 51 IndexedDB object stores. Message bodies are encrypted (Signal protocol), but all metadata is plaintext:
|
|
284
|
+
|
|
285
|
+
| Store | Records | What's in it |
|
|
286
|
+
|-------|---------|-------------|
|
|
287
|
+
| contact | 1000 | Phone numbers, names, isAddressBook, isBusiness |
|
|
288
|
+
| chat | 1000 | Chat IDs, last message timestamps, unread counts |
|
|
289
|
+
| group-metadata | 400+ | Group subjects, creation dates, owner phone |
|
|
290
|
+
| participant | 400+ | Group member phone lists |
|
|
291
|
+
| message | 1000 | Message type, from/to/author, timestamps (NOT body) |
|
|
292
|
+
| reactions | 900+ | Emoji reactions with sender, timestamp |
|
|
293
|
+
|
|
294
|
+
### Inner Circle Analysis
|
|
295
|
+
|
|
296
|
+
Count shared group membership to find closest contacts:
|
|
297
|
+
|
|
298
|
+
```python
|
|
299
|
+
# `participants`, `lid_to_phone`, and YOUR_NUMBER must be defined beforehand:
# records from the `participant` store, a LID-to-phone mapping, and your own number.
your_groups = set()
person_groups = {}

for record in participants:
    gid = record['groupId']
    for p in record['participants']:
        phone = p.split('@')[0]
        resolved = lid_to_phone.get(phone, phone)
        if resolved == YOUR_NUMBER:
            your_groups.add(gid)
        person_groups.setdefault(resolved, set()).add(gid)

shared = {p: len(gs & your_groups) for p, gs in person_groups.items() if p != YOUR_NUMBER}
top_connections = sorted(shared.items(), key=lambda x: -x[1])
|
|
313
|
+
```
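
For experimenting without real data, the same counting logic can be wrapped in a function and run on a toy input. The record shape (`groupId`, `participants`) follows the snippet above; the function name, the sample identifiers, and the alphabetical tie-break are assumptions made for this sketch.

```python
from typing import Dict, Iterable, List, Mapping, Set, Tuple


def shared_group_counts(
    participants: Iterable[Mapping],
    your_number: str,
    lid_to_phone: Mapping[str, str],
) -> List[Tuple[str, int]]:
    """Rank contacts by how many groups they share with `your_number`."""
    your_groups: Set[str] = set()
    person_groups: Dict[str, Set[str]] = {}
    for record in participants:
        gid = record["groupId"]
        for jid in record["participants"]:
            user = jid.split("@")[0]
            resolved = lid_to_phone.get(user, user)  # map LID to phone when known
            if resolved == your_number:
                your_groups.add(gid)
            person_groups.setdefault(resolved, set()).add(gid)
    shared = {
        person: len(groups & your_groups)
        for person, groups in person_groups.items()
        if person != your_number
    }
    # Most shared groups first; ties broken by identifier for stable output.
    return sorted(shared.items(), key=lambda item: (-item[1], item[0]))


# Toy data: "111" is the user, "999@lid" resolves to phone "333".
sample = [
    {"groupId": "g1", "participants": ["111@s.whatsapp.net", "222@s.whatsapp.net"]},
    {"groupId": "g2", "participants": ["111@s.whatsapp.net", "222@s.whatsapp.net", "999@lid"]},
    {"groupId": "g3", "participants": ["333@s.whatsapp.net", "222@s.whatsapp.net"]},
]
ranking = shared_group_counts(sample, "111", {"999": "333"})
```

Here `ranking` puts `"222"` first because that contact shares two groups with the user, while `"333"` shares one.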
|
|
314
|
+
|
|
315
|
+
## Known Limitations
|
|
316
|
+
|
|
317
|
+
- **Session-specific keys**: HKDF-derived keys change per browser session
|
|
318
|
+
- **Offline decryption**: Does not work across sessions
|
|
319
|
+
- **Record cap**: 1000 per IndexedDB store
|
|
320
|
+
- **Contact names**: Only available for address book contacts
|
|
321
|
+
- **Timestamps**: Unix epoch seconds — convert with `datetime.fromtimestamp(t, tz=timezone.utc)`
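
The timestamp conversion mentioned in the last item can be wrapped in a small helper. A minimal sketch (the helper name is illustrative):

```python
from datetime import datetime, timezone


def epoch_to_utc(seconds: float) -> datetime:
    """Convert Unix epoch seconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


if __name__ == "__main__":
    print(epoch_to_utc(1700000000).isoformat())
```

The `ts` values recorded by the Part 1 interceptor come from `Date.now()`, which is in milliseconds, so those would need dividing by 1000 before being passed to this helper.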
|