@nathanvale/chatline 0.0.1
- package/CHANGELOG.md +1 -0
- package/LICENSE +21 -0
- package/README.md +1535 -0
- package/dist/bin/index.js +5121 -0
- package/dist/cli/commands/clean.d.ts +17 -0
- package/dist/cli/commands/clean.d.ts.map +1 -0
- package/dist/cli/commands/clean.js +142 -0
- package/dist/cli/commands/clean.js.map +1 -0
- package/dist/cli/commands/doctor.d.ts +17 -0
- package/dist/cli/commands/doctor.d.ts.map +1 -0
- package/dist/cli/commands/doctor.js +202 -0
- package/dist/cli/commands/doctor.js.map +1 -0
- package/dist/cli/commands/enrich-ai.d.ts +17 -0
- package/dist/cli/commands/enrich-ai.d.ts.map +1 -0
- package/dist/cli/commands/enrich-ai.js +371 -0
- package/dist/cli/commands/enrich-ai.js.map +1 -0
- package/dist/cli/commands/index.d.ts +16 -0
- package/dist/cli/commands/index.d.ts.map +1 -0
- package/dist/cli/commands/index.js +16 -0
- package/dist/cli/commands/index.js.map +1 -0
- package/dist/cli/commands/ingest-csv.d.ts +17 -0
- package/dist/cli/commands/ingest-csv.d.ts.map +1 -0
- package/dist/cli/commands/ingest-csv.js +138 -0
- package/dist/cli/commands/ingest-csv.js.map +1 -0
- package/dist/cli/commands/ingest-db.d.ts +17 -0
- package/dist/cli/commands/ingest-db.d.ts.map +1 -0
- package/dist/cli/commands/ingest-db.js +159 -0
- package/dist/cli/commands/ingest-db.js.map +1 -0
- package/dist/cli/commands/init.d.ts +17 -0
- package/dist/cli/commands/init.d.ts.map +1 -0
- package/dist/cli/commands/init.js +110 -0
- package/dist/cli/commands/init.js.map +1 -0
- package/dist/cli/commands/normalize-link.d.ts +16 -0
- package/dist/cli/commands/normalize-link.d.ts.map +1 -0
- package/dist/cli/commands/normalize-link.js +144 -0
- package/dist/cli/commands/normalize-link.js.map +1 -0
- package/dist/cli/commands/render-markdown.d.ts +17 -0
- package/dist/cli/commands/render-markdown.d.ts.map +1 -0
- package/dist/cli/commands/render-markdown.js +218 -0
- package/dist/cli/commands/render-markdown.js.map +1 -0
- package/dist/cli/commands/stats.d.ts +17 -0
- package/dist/cli/commands/stats.d.ts.map +1 -0
- package/dist/cli/commands/stats.js +175 -0
- package/dist/cli/commands/stats.js.map +1 -0
- package/dist/cli/commands/validate.d.ts +17 -0
- package/dist/cli/commands/validate.d.ts.map +1 -0
- package/dist/cli/commands/validate.js +152 -0
- package/dist/cli/commands/validate.js.map +1 -0
- package/dist/cli/index.d.ts +13 -0
- package/dist/cli/index.d.ts.map +1 -0
- package/dist/cli/index.js +121 -0
- package/dist/cli/index.js.map +1 -0
- package/dist/cli/types.d.ts +93 -0
- package/dist/cli/types.d.ts.map +1 -0
- package/dist/cli/types.js +7 -0
- package/dist/cli/types.js.map +1 -0
- package/dist/cli/utils.d.ts +29 -0
- package/dist/cli/utils.d.ts.map +1 -0
- package/dist/cli/utils.js +53 -0
- package/dist/cli/utils.js.map +1 -0
- package/dist/cli.d.ts +9 -0
- package/dist/cli.d.ts.map +1 -0
- package/dist/cli.js +1805 -0
- package/dist/config/generator.d.ts +90 -0
- package/dist/config/generator.d.ts.map +1 -0
- package/dist/config/generator.js +320 -0
- package/dist/config/generator.js.map +1 -0
- package/dist/config/loader.d.ts +107 -0
- package/dist/config/loader.d.ts.map +1 -0
- package/dist/config/loader.js +251 -0
- package/dist/config/loader.js.map +1 -0
- package/dist/config/schema.d.ts +107 -0
- package/dist/config/schema.d.ts.map +1 -0
- package/dist/config/schema.js +169 -0
- package/dist/config/schema.js.map +1 -0
- package/dist/enrich/audio-transcription.d.ts +77 -0
- package/dist/enrich/audio-transcription.d.ts.map +1 -0
- package/dist/enrich/audio-transcription.js +370 -0
- package/dist/enrich/audio-transcription.js.map +1 -0
- package/dist/enrich/checkpoint.d.ts +137 -0
- package/dist/enrich/checkpoint.d.ts.map +1 -0
- package/dist/enrich/checkpoint.js +205 -0
- package/dist/enrich/checkpoint.js.map +1 -0
- package/dist/enrich/idempotency.d.ts +90 -0
- package/dist/enrich/idempotency.d.ts.map +1 -0
- package/dist/enrich/idempotency.js +188 -0
- package/dist/enrich/idempotency.js.map +1 -0
- package/dist/enrich/image-analysis.d.ts +62 -0
- package/dist/enrich/image-analysis.d.ts.map +1 -0
- package/dist/enrich/image-analysis.js +264 -0
- package/dist/enrich/image-analysis.js.map +1 -0
- package/dist/enrich/index.d.ts +60 -0
- package/dist/enrich/index.d.ts.map +1 -0
- package/dist/enrich/index.js +74 -0
- package/dist/enrich/index.js.map +1 -0
- package/dist/enrich/link-enrichment.d.ts +37 -0
- package/dist/enrich/link-enrichment.d.ts.map +1 -0
- package/dist/enrich/link-enrichment.js +202 -0
- package/dist/enrich/link-enrichment.js.map +1 -0
- package/dist/enrich/pdf-video-handling.d.ts +49 -0
- package/dist/enrich/pdf-video-handling.d.ts.map +1 -0
- package/dist/enrich/pdf-video-handling.js +325 -0
- package/dist/enrich/pdf-video-handling.js.map +1 -0
- package/dist/enrich/progress-tracker.d.ts +120 -0
- package/dist/enrich/progress-tracker.d.ts.map +1 -0
- package/dist/enrich/progress-tracker.js +220 -0
- package/dist/enrich/progress-tracker.js.map +1 -0
- package/dist/enrich/providers/firecrawl.d.ts +18 -0
- package/dist/enrich/providers/firecrawl.d.ts.map +1 -0
- package/dist/enrich/providers/firecrawl.js +48 -0
- package/dist/enrich/providers/firecrawl.js.map +1 -0
- package/dist/enrich/providers/generic.d.ts +16 -0
- package/dist/enrich/providers/generic.d.ts.map +1 -0
- package/dist/enrich/providers/generic.js +36 -0
- package/dist/enrich/providers/generic.js.map +1 -0
- package/dist/enrich/providers/index.d.ts +14 -0
- package/dist/enrich/providers/index.d.ts.map +1 -0
- package/dist/enrich/providers/index.js +13 -0
- package/dist/enrich/providers/index.js.map +1 -0
- package/dist/enrich/providers/instagram.d.ts +16 -0
- package/dist/enrich/providers/instagram.d.ts.map +1 -0
- package/dist/enrich/providers/instagram.js +43 -0
- package/dist/enrich/providers/instagram.js.map +1 -0
- package/dist/enrich/providers/spotify.d.ts +16 -0
- package/dist/enrich/providers/spotify.d.ts.map +1 -0
- package/dist/enrich/providers/spotify.js +45 -0
- package/dist/enrich/providers/spotify.js.map +1 -0
- package/dist/enrich/providers/twitter.d.ts +16 -0
- package/dist/enrich/providers/twitter.d.ts.map +1 -0
- package/dist/enrich/providers/twitter.js +43 -0
- package/dist/enrich/providers/twitter.js.map +1 -0
- package/dist/enrich/providers/types.d.ts +47 -0
- package/dist/enrich/providers/types.d.ts.map +1 -0
- package/dist/enrich/providers/types.js +15 -0
- package/dist/enrich/providers/types.js.map +1 -0
- package/dist/enrich/providers/youtube.d.ts +16 -0
- package/dist/enrich/providers/youtube.d.ts.map +1 -0
- package/dist/enrich/providers/youtube.js +43 -0
- package/dist/enrich/providers/youtube.js.map +1 -0
- package/dist/enrich/rate-limiting.d.ts +118 -0
- package/dist/enrich/rate-limiting.d.ts.map +1 -0
- package/dist/enrich/rate-limiting.js +258 -0
- package/dist/enrich/rate-limiting.js.map +1 -0
- package/dist/index.d.ts +688 -0
- package/dist/index.d.ts.map +1 -0
- package/dist/index.js +1729 -0
- package/dist/index.js.map +1 -0
- package/dist/ingest/dedup-merge.d.ts +82 -0
- package/dist/ingest/dedup-merge.d.ts.map +1 -0
- package/dist/ingest/dedup-merge.js +262 -0
- package/dist/ingest/dedup-merge.js.map +1 -0
- package/dist/ingest/ingest-csv.d.ts +62 -0
- package/dist/ingest/ingest-csv.d.ts.map +1 -0
- package/dist/ingest/ingest-csv.js +300 -0
- package/dist/ingest/ingest-csv.js.map +1 -0
- package/dist/ingest/ingest-db.d.ts +64 -0
- package/dist/ingest/ingest-db.d.ts.map +1 -0
- package/dist/ingest/ingest-db.js +172 -0
- package/dist/ingest/ingest-db.js.map +1 -0
- package/dist/ingest/link-replies-and-tapbacks.d.ts +53 -0
- package/dist/ingest/link-replies-and-tapbacks.d.ts.map +1 -0
- package/dist/ingest/link-replies-and-tapbacks.js +381 -0
- package/dist/ingest/link-replies-and-tapbacks.js.map +1 -0
- package/dist/normalize/date-converters.d.ts +45 -0
- package/dist/normalize/date-converters.d.ts.map +1 -0
- package/dist/normalize/date-converters.js +166 -0
- package/dist/normalize/date-converters.js.map +1 -0
- package/dist/normalize/path-validator.d.ts +65 -0
- package/dist/normalize/path-validator.d.ts.map +1 -0
- package/dist/normalize/path-validator.js +221 -0
- package/dist/normalize/path-validator.js.map +1 -0
- package/dist/normalize/validate-normalized.d.ts +45 -0
- package/dist/normalize/validate-normalized.d.ts.map +1 -0
- package/dist/normalize/validate-normalized.js +144 -0
- package/dist/normalize/validate-normalized.js.map +1 -0
- package/dist/render/embeds-blockquotes.d.ts +84 -0
- package/dist/render/embeds-blockquotes.d.ts.map +1 -0
- package/dist/render/embeds-blockquotes.js +204 -0
- package/dist/render/embeds-blockquotes.js.map +1 -0
- package/dist/render/grouping.d.ts +78 -0
- package/dist/render/grouping.d.ts.map +1 -0
- package/dist/render/grouping.js +134 -0
- package/dist/render/grouping.js.map +1 -0
- package/dist/render/index.d.ts +47 -0
- package/dist/render/index.d.ts.map +1 -0
- package/dist/render/index.js +245 -0
- package/dist/render/index.js.map +1 -0
- package/dist/render/reply-rendering.d.ts +88 -0
- package/dist/render/reply-rendering.d.ts.map +1 -0
- package/dist/render/reply-rendering.js +196 -0
- package/dist/render/reply-rendering.js.map +1 -0
- package/dist/schema/message.d.ts +125 -0
- package/dist/schema/message.d.ts.map +1 -0
- package/dist/schema/message.js +331 -0
- package/dist/schema/message.js.map +1 -0
- package/dist/utils/delta-detection.d.ts +107 -0
- package/dist/utils/delta-detection.d.ts.map +1 -0
- package/dist/utils/delta-detection.js +199 -0
- package/dist/utils/delta-detection.js.map +1 -0
- package/dist/utils/enrichment-merge.d.ts +135 -0
- package/dist/utils/enrichment-merge.d.ts.map +1 -0
- package/dist/utils/enrichment-merge.js +280 -0
- package/dist/utils/enrichment-merge.js.map +1 -0
- package/dist/utils/human.d.ts +15 -0
- package/dist/utils/human.d.ts.map +1 -0
- package/dist/utils/human.js +27 -0
- package/dist/utils/human.js.map +1 -0
- package/dist/utils/incremental-state.d.ts +133 -0
- package/dist/utils/incremental-state.d.ts.map +1 -0
- package/dist/utils/incremental-state.js +237 -0
- package/dist/utils/incremental-state.js.map +1 -0
- package/dist/utils/logger.d.ts +40 -0
- package/dist/utils/logger.d.ts.map +1 -0
- package/dist/utils/logger.js +176 -0
- package/dist/utils/logger.js.map +1 -0
- package/package.json +165 -0
package/README.md
ADDED
|
@@ -0,0 +1,1535 @@
|
|
|
1
|
+
# Chatline
|
|
2
|
+
|
|
3
|
+
> Extract, enrich, and render your iMessage conversations into beautiful,
|
|
4
|
+
> AI-powered markdown timelines with full conversation threading and deep media
|
|
5
|
+
> analysis.
|
|
6
|
+
|
|
7
|
+
[](https://www.typescriptlang.org/)
|
|
8
|
+
[](https://nodejs.org/)
|
|
9
|
+
[](#license)
|
|
10
|
+
[](https://github.com/nathanvale/chatline/actions/workflows/pr-quality.yml)
|
|
11
|
+
[](https://github.com/nathanvale/chatline/actions/workflows/codeql.yml)
|
|
12
|
+
[](#testing)
|
|
13
|
+
[](#testing)
|
|
14
|
+
|
|
15
|
+
## Overview
|
|
16
|
+
|
|
17
|
+
**Chatline** is a data pipeline that transforms your
|
|
18
|
+
iMessage conversations into searchable, enriched markdown timelines. It
|
|
19
|
+
intelligently extracts messages from multiple sources (iMazing CSV exports,
|
|
20
|
+
macOS Messages.app SQLite database), deduplicates and links replies/reactions,
|
|
21
|
+
enriches with AI-powered analysis (image descriptions, audio transcription, link
|
|
22
|
+
summaries), and generates deterministic markdown files organized by date and
|
|
23
|
+
time-of-day.
|
|
24
|
+
|
|
25
|
+
Perfect for creating browsable conversation archives, enriched research notes,
|
|
26
|
+
or personal history exports.
|
|
27
|
+
|
|
28
|
+
## 📚 Documentation
|
|
29
|
+
|
|
30
|
+
**Full documentation available at:**
|
|
31
|
+
[https://nathanvale.github.io/chatline/](https://nathanvale.github.io/chatline/)
|
|
32
|
+
|
|
33
|
+
The documentation site includes:
|
|
34
|
+
|
|
35
|
+
- Getting Started Guide
|
|
36
|
+
- CLI Usage
|
|
37
|
+
- Pipeline Technical Specifications
|
|
38
|
+
- Best Practices
|
|
39
|
+
- Release Management
|
|
40
|
+
- Troubleshooting
|
|
41
|
+
|
|
42
|
+
### Automated Quality & Security
|
|
43
|
+
|
|
44
|
+
This repository is continuously checked by:
|
|
45
|
+
|
|
46
|
+
- **PR quality** workflow: lint, typecheck, full Vitest suite + V8 coverage,
|
|
47
|
+
delta quality checks
|
|
48
|
+
- **CodeQL**: static analysis for code vulnerabilities (JS/TS)
|
|
49
|
+
- **OSV Scanner**: open source vulnerability scanning of dependencies
|
|
50
|
+
- **Workflow Lint**: actionlint validation of GitHub Actions syntax
|
|
51
|
+
- **Dependency Review**: flags risky transitive additions on PRs
|
|
52
|
+
- **Package Hygiene**: build integrity (`publint`, type checks, dry pack)
|
|
53
|
+
- **Renovate**: automated npm/pnpm dependency maintenance (grouped, safe
|
|
54
|
+
automerge for dev minor/patch)
|
|
55
|
+
- **Dependabot**: weekly GitHub Actions version bumps only
|
|
56
|
+
|
|
57
|
+
Badges surface current status; failing checks block merges, ensuring a
|
|
58
|
+
high-signal, low-noise pipeline.
|
|
59
|
+
|
|
60
|
+
### Releases and automation
|
|
61
|
+
|
|
62
|
+
- Zero‑touch, canonical Changesets flow: CI opens a "Version Packages" PR,
|
|
63
|
+
auto‑merges it after required checks, publishes to npm (with tags), and
|
|
64
|
+
creates GitHub Releases with SBOM.
|
|
65
|
+
- Pre‑releases: enter/exit beta/rc via a workflow; CI publishes -beta.N/-rc.N to
|
|
66
|
+
the matching npm dist‑tag.
|
|
67
|
+
- Nightly snapshots (alpha): gated to run only while pre‑mode is active to avoid
|
|
68
|
+
confusion with stable releases.
|
|
69
|
+
|
|
70
|
+
#### Publishing and NPM_TOKEN
|
|
71
|
+
|
|
72
|
+
This repo uses Changesets to open a "Version Packages" PR and publish on merge.
|
|
73
|
+
|
|
74
|
+
- If an npm token is configured (repository secret `NPM_TOKEN`):
|
|
75
|
+
  - The Changesets workflow authenticates to npm and runs `pnpm release`
|
|
76
|
+
  - A GitHub Release is created for the tagged version
|
|
77
|
+
- If no npm token is configured:
|
|
78
|
+
  - The workflow does NOT fail main
|
|
79
|
+
  - It surfaces a clear warning in the logs and Job Summary:
|
|
80
|
+
    - "NPM_TOKEN not set; skipping publish. Configure repository secret
|
|
81
|
+
      'NPM_TOKEN' to enable publishing."
|
|
82
|
+
  - Versioning PR behavior still works; only publish is skipped
|
|
83
|
+
|
|
84
|
+
How to enable publishing:
|
|
85
|
+
|
|
86
|
+
1. Create an npm access token with publish rights
|
|
87
|
+
2. In GitHub → Settings → Secrets and variables → Actions → New repository
|
|
88
|
+
secret
|
|
89
|
+
3. Name: `NPM_TOKEN`, Value: your token
|
|
90
|
+
4. Re-run the "Changesets Manage & Publish" workflow on `main`
|
|
91
|
+
|
|
92
|
+
Tip: You can trigger the workflow manually from the Actions tab or by:
|
|
93
|
+
|
|
94
|
+
```bash
|
|
95
|
+
gh workflow run "Changesets Manage & Publish" -r main
|
|
96
|
+
```
|
|
97
|
+
|
|
98
|
+
Note: The workflow also posts an annotation and adds a short section to the Job
|
|
99
|
+
Summary whenever publish is skipped so it's visible in the GitHub UI without
|
|
100
|
+
failing the pipeline.
|
|
101
|
+
|
|
102
|
+
Docs:
|
|
103
|
+
|
|
104
|
+
- Canonical flow: `docs/releases/changesets-canonical.md`
|
|
105
|
+
- Branch protection policy: `docs/branch-protection-policy.md`
|
|
106
|
+
- CI standards: `docs/ci-workflow-standards.md`
|
|
107
|
+
|
|
108
|
+
### CI performance (optional)
|
|
109
|
+
|
|
110
|
+
- setup-node already caches pnpm. If installs become a bottleneck, consider
|
|
111
|
+
adding a pnpm store cache step to PR quality (restore/save `.pnpm-store`). We
|
|
112
|
+
can wire this later once baseline run times are known.
|
|
113
|
+
|
|
114
|
+
## CLI
|
|
115
|
+
|
|
116
|
+
The project publishes a CLI executable `chatline` and provides a fast
|
|
117
|
+
Bun-powered development loop.
|
|
118
|
+
|
|
119
|
+
- Dev (TypeScript direct): `pnpm dev -- --help`
|
|
120
|
+
- Built dist run: `pnpm cli -- --help`
|
|
121
|
+
- Installed (after publish): `chatline --help`
|
|
122
|
+
|
|
123
|
+
Docs:
|
|
124
|
+
|
|
125
|
+
- Detailed usage: `docs/cli-usage.md`
|
|
126
|
+
- Bun script rationale: `docs/bun-script-best-practices.md`
|
|
127
|
+
|
|
128
|
+
Pass arguments after `--` when using `pnpm dev` or `pnpm cli`.
|
|
129
|
+
|
|
130
|
+
### Key Features
|
|
131
|
+
|
|
132
|
+
- **Multiple Sources**: Ingest from iMazing CSV exports and macOS Messages.app
|
|
133
|
+
database
|
|
134
|
+
- **Intelligent Linking**: Automatically link replies to parents and associate
|
|
135
|
+
emoji reactions (tapbacks)
|
|
136
|
+
- **Smart Deduplication**: Merge CSV/DB sources with GUID matching and content
|
|
137
|
+
equivalence detection
|
|
138
|
+
- **AI Enrichment**:
|
|
139
|
+
  - Image analysis (HEIC/TIFF→JPG previews + Gemini Vision captions)
|
|
140
|
+
  - Audio transcription (with speaker labels and timestamps)
|
|
141
|
+
  - PDF summarization
|
|
142
|
+
  - Link context extraction (Firecrawl + provider fallbacks)
|
|
143
|
+
- **Resumable Processing**: Checkpoint support for crash recovery and
|
|
144
|
+
incremental enrichment for processing only new messages
|
|
145
|
+
- **Deterministic Output**: Identical input always produces identical markdown
|
|
146
|
+
(reproducible pipelines)
|
|
147
|
+
- **Privacy-First**: Local-only mode, no API key persistence, full data control
|
|
148
|
+
- **Conversation Threading**: Nested replies and tapbacks rendered as readable
|
|
149
|
+
blockquotes
|
|
150
|
+
- **Type-Safe**: 100% TypeScript with Zod schema validation
|
|
151
|
+
|
|
152
|
+
## For End Users
|
|
153
|
+
|
|
154
|
+
### Installation & Setup
|
|
155
|
+
|
|
156
|
+
#### Prerequisites
|
|
157
|
+
|
|
158
|
+
- **Node.js** 22.20+
|
|
159
|
+
- **macOS** (for database export; CSV import works on any OS)
|
|
160
|
+
- **Gemini API Key** (for AI enrichment, get free at
|
|
161
|
+
https://aistudio.google.com)
|
|
162
|
+
- **Firecrawl API Key** (optional, for link enrichment, get at
|
|
163
|
+
https://www.firecrawl.dev)
|
|
164
|
+
|
|
165
|
+
#### Install Global CLI
|
|
166
|
+
|
|
167
|
+
```bash
|
|
168
|
+
npm install -g @nathanvale/chatline
|
|
169
|
+
```
|
|
170
|
+
|
|
171
|
+
This installs the `chatline` command globally, available from any
|
|
172
|
+
directory.
|
|
173
|
+
|
|
174
|
+
#### Environment Setup
|
|
175
|
+
|
|
176
|
+
Create a `.env` file in your working directory:
|
|
177
|
+
|
|
178
|
+
```bash
|
|
179
|
+
GEMINI_API_KEY=your-api-key-here
|
|
180
|
+
FIRECRAWL_API_KEY=your-api-key-here
|
|
181
|
+
```
|
|
182
|
+
|
|
183
|
+
Or export them in your shell:
|
|
184
|
+
|
|
185
|
+
```bash
|
|
186
|
+
export GEMINI_API_KEY=your-api-key-here
|
|
187
|
+
export FIRECRAWL_API_KEY=your-api-key-here
|
|
188
|
+
```
|
|
189
|
+
|
|
190
|
+
#### Verify Installation
|
|
191
|
+
|
|
192
|
+
```bash
|
|
193
|
+
chatline doctor
|
|
194
|
+
```
|
|
195
|
+
|
|
196
|
+
All checks should pass.
|
|
197
|
+
|
|
198
|
+
### Quick Start (Consumer)
|
|
199
|
+
|
|
200
|
+
```bash
|
|
201
|
+
# Initialize config (creates imessage-config.yaml)
|
|
202
|
+
chatline init
|
|
203
|
+
|
|
204
|
+
# Ingest CSV export from iMazing
|
|
205
|
+
chatline ingest-csv -i messages.csv -o messages.csv.ingested.json
|
|
206
|
+
|
|
207
|
+
# Ingest from macOS Messages.app database
|
|
208
|
+
chatline ingest-db -i db-export.json -o messages.db.ingested.json
|
|
209
|
+
|
|
210
|
+
# Normalize and link messages (merge sources, deduplicate, link replies)
|
|
211
|
+
chatline normalize-link \
|
|
212
|
+
-i messages.csv.ingested.json messages.db.ingested.json \
|
|
213
|
+
-o messages.normalized.json
|
|
214
|
+
|
|
215
|
+
# Enrich with AI (images, audio, links)
|
|
216
|
+
chatline enrich-ai \
|
|
217
|
+
-i messages.normalized.json \
|
|
218
|
+
-o messages.enriched.json \
|
|
219
|
+
--enable-vision --enable-audio --enable-links
|
|
220
|
+
|
|
221
|
+
# Render to markdown
|
|
222
|
+
chatline render-markdown \
|
|
223
|
+
-i messages.enriched.json \
|
|
224
|
+
-o ./timeline
|
|
225
|
+
```
|
|
226
|
+
|
|
227
|
+
Output: A `timeline/` directory with daily markdown files, one per date.
|
|
228
|
+
|
|
229
|
+
## Library Usage (Programmatic API)
|
|
230
|
+
|
|
231
|
+
### Installation as Dependency
|
|
232
|
+
|
|
233
|
+
Install as a library in your Node.js/TypeScript project:
|
|
234
|
+
|
|
235
|
+
```bash
|
|
236
|
+
npm install @nathanvale/chatline
|
|
237
|
+
# or
|
|
238
|
+
pnpm add @nathanvale/chatline
|
|
239
|
+
# or
|
|
240
|
+
yarn add @nathanvale/chatline
|
|
241
|
+
```
|
|
242
|
+
|
|
243
|
+
### TypeScript/JavaScript Import
|
|
244
|
+
|
|
245
|
+
```typescript
|
|
246
|
+
import {
|
|
247
|
+
// Config Management
|
|
248
|
+
loadConfig,
|
|
249
|
+
generateConfigContent,
|
|
250
|
+
validateConfig,
|
|
251
|
+
|
|
252
|
+
// Ingest Functions
|
|
253
|
+
ingestCSV,
|
|
254
|
+
dedupAndMerge,
|
|
255
|
+
|
|
256
|
+
// Utilities
|
|
257
|
+
detectDelta,
|
|
258
|
+
mergeEnrichments,
|
|
259
|
+
|
|
260
|
+
// Rate Limiting
|
|
261
|
+
createRateLimiter,
|
|
262
|
+
|
|
263
|
+
// Types
|
|
264
|
+
type Message,
|
|
265
|
+
type Config,
|
|
266
|
+
type DeltaResult,
|
|
267
|
+
} from '@nathanvale/chatline'
|
|
268
|
+
```
|
|
269
|
+
|
|
270
|
+
### Example: Load and Validate Config
|
|
271
|
+
|
|
272
|
+
```typescript
|
|
273
|
+
import { loadConfig, validateConfig } from '@nathanvale/chatline'
|
|
274
|
+
|
|
275
|
+
// Load config with auto-discovery (looks for imessage-config.yaml/json)
|
|
276
|
+
const config = await loadConfig()
|
|
277
|
+
|
|
278
|
+
// Load specific config file
|
|
279
|
+
const customConfig = await loadConfig({ configPath: './custom-config.yaml' })
|
|
280
|
+
|
|
281
|
+
// Validate existing config object
|
|
282
|
+
const validated = validateConfig({
|
|
283
|
+
gemini: { apiKey: 'your-key' },
|
|
284
|
+
inputs: { csv: ['messages.csv'] },
|
|
285
|
+
})
|
|
286
|
+
```
|
|
287
|
+
|
|
288
|
+
### Example: Ingest Messages from CSV
|
|
289
|
+
|
|
290
|
+
```typescript
|
|
291
|
+
import { ingestCSV, createExportEnvelope } from '@nathanvale/chatline'
|
|
292
|
+
import type { Message, IngestOptions } from '@nathanvale/chatline'
|
|
293
|
+
|
|
294
|
+
const options: IngestOptions = {
|
|
295
|
+
attachmentDir: '/path/to/attachments',
|
|
296
|
+
strictMode: false,
|
|
297
|
+
}
|
|
298
|
+
|
|
299
|
+
const messages: Message[] = ingestCSV('./messages.csv', options)
|
|
300
|
+
|
|
301
|
+
// Wrap in export envelope
|
|
302
|
+
const envelope = createExportEnvelope(messages)
|
|
303
|
+
console.log(`Ingested ${envelope.totalMessages} messages`)
|
|
304
|
+
```
|
|
305
|
+
|
|
306
|
+
### Example: Deduplicate and Merge Sources
|
|
307
|
+
|
|
308
|
+
```typescript
|
|
309
|
+
import fs from 'node:fs'
import { dedupAndMerge, ingestCSV } from '@nathanvale/chatline'
|
|
310
|
+
import type { Message } from '@nathanvale/chatline'
|
|
311
|
+
|
|
312
|
+
// `options` is an IngestOptions object, as in the CSV ingest example
const csvMessages: Message[] = ingestCSV('./messages.csv', options)
|
|
313
|
+
const dbMessages: Message[] = JSON.parse(
|
|
314
|
+
fs.readFileSync('./db-export.json', 'utf-8'),
|
|
315
|
+
)
|
|
316
|
+
|
|
317
|
+
const result = dedupAndMerge(csvMessages, dbMessages)
|
|
318
|
+
|
|
319
|
+
console.log(`Merged ${result.mergedCount} messages`)
|
|
320
|
+
console.log(`Found ${result.stats.exactMatches} exact matches`)
|
|
321
|
+
console.log(`Deduped ${result.stats.duplicatesRemoved} duplicates`)
|
|
322
|
+
```
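
The two matching strategies named under Smart Deduplication (GUID matching, then content equivalence) can be illustrated with a small standalone sketch. This is not the package's `dedupAndMerge` implementation: `SimpleMessage`, `contentKey`, and `mergeSources` are hypothetical names, and treating "same sender, same second, same trimmed text" as equivalent is an assumption made only for this example.

```typescript
// Hypothetical minimal message shape; the real `Message` type has more fields.
interface SimpleMessage {
  guid: string
  date: string // ISO 8601 timestamp, assumed to be UTC in both sources
  handle: string // sender identifier
  text: string
}

// Assumed equivalence rule: same sender, same second, same trimmed text.
function contentKey(m: SimpleMessage): string {
  const second = m.date.slice(0, 19) // e.g. "2024-01-01T10:00:00"
  return `${m.handle}|${second}|${m.text.trim()}`
}

// Keep every primary message, then add only those secondary messages that
// match nothing seen so far, either by GUID or by content key.
function mergeSources(
  primary: SimpleMessage[],
  secondary: SimpleMessage[],
): { messages: SimpleMessage[]; duplicatesRemoved: number } {
  const seenGuids = new Set(primary.map((m) => m.guid))
  const seenContent = new Set(primary.map(contentKey))
  const messages = [...primary]
  let duplicatesRemoved = 0

  for (const m of secondary) {
    if (seenGuids.has(m.guid) || seenContent.has(contentKey(m))) {
      duplicatesRemoved++
      continue
    }
    seenGuids.add(m.guid)
    seenContent.add(contentKey(m))
    messages.push(m)
  }
  return { messages, duplicatesRemoved }
}
```

Checking the GUID first keeps the common case cheap; the content key only matters when the same message carries different identifiers in the two exports.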
|
|
323
|
+
|
|
324
|
+
### Example: Detect New Messages (Incremental Processing)
|
|
325
|
+
|
|
326
|
+
```typescript
|
|
327
|
+
import { detectDelta, extractGuidsFromMessages } from '@nathanvale/chatline'
|
|
328
|
+
import type { Message, DeltaResult } from '@nathanvale/chatline'
|
|
329
|
+
|
|
330
|
+
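// loadCurrentMessages, loadPreviousCheckpoint, and enrichOnlyNew are
// placeholders for your own helpers; they are not defined in this example
const currentMessages: Message[] = loadCurrentMessages()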
|
|
331
|
+
const previousMessages: Message[] = loadPreviousCheckpoint()
|
|
332
|
+
|
|
333
|
+
const delta: DeltaResult = detectDelta(currentMessages, previousMessages)
|
|
334
|
+
|
|
335
|
+
console.log(`New messages: ${delta.new.length}`)
|
|
336
|
+
console.log(`Modified messages: ${delta.changed.length}`)
|
|
337
|
+
console.log(`Removed messages: ${delta.removed.length}`)
|
|
338
|
+
|
|
339
|
+
// Process only new messages
|
|
340
|
+
const newGuids = extractGuidsFromMessages(delta.new)
|
|
341
|
+
await enrichOnlyNew(newGuids)
|
|
342
|
+
```
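
For intuition, a GUID-keyed delta with the same `new` / `changed` / `removed` shape used above could be computed roughly as follows. This is a sketch, not the package's `detectDelta`: `GuidMessage`, `SimpleDelta`, and `simpleDelta` are hypothetical names, and comparing messages via `JSON.stringify` is an assumption that only works when both snapshots serialize keys in the same order.

```typescript
// Hypothetical minimal shape: anything with a stable GUID.
interface GuidMessage {
  guid: string
  text?: string
}

interface SimpleDelta<T> {
  new: T[]
  changed: T[]
  removed: T[]
}

function simpleDelta<T extends GuidMessage>(
  current: T[],
  previous: T[],
): SimpleDelta<T> {
  const previousByGuid = new Map(previous.map((m) => [m.guid, m] as const))
  const currentGuids = new Set(current.map((m) => m.guid))
  const result: SimpleDelta<T> = { new: [], changed: [], removed: [] }

  for (const m of current) {
    const old = previousByGuid.get(m.guid)
    if (old === undefined) {
      result.new.push(m) // GUID not seen in the previous snapshot
    } else if (JSON.stringify(old) !== JSON.stringify(m)) {
      result.changed.push(m) // same GUID, different serialized content
    }
  }
  for (const m of previous) {
    if (!currentGuids.has(m.guid)) result.removed.push(m)
  }
  return result
}
```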
|
|
343
|
+
|
|
344
|
+
### Example: Rate Limiting for API Calls
|
|
345
|
+
|
|
346
|
+
```typescript
|
|
347
|
+
import { createRateLimiter } from '@nathanvale/chatline'
|
|
348
|
+
import type { RateLimitConfig } from '@nathanvale/chatline'
|
|
349
|
+
|
|
350
|
+
const limiter = createRateLimiter({
|
|
351
|
+
requestsPerSecond: 10,
|
|
352
|
+
maxRetries: 3,
|
|
353
|
+
retryDelayMs: 1000,
|
|
354
|
+
})
|
|
355
|
+
|
|
356
|
+
// Use with fetch or any async API call
|
|
357
|
+
const response = await limiter.execute(async () => {
|
|
358
|
+
return fetch('https://api.example.com/data')
|
|
359
|
+
})
|
|
360
|
+
|
|
361
|
+
console.log(`Status: ${response.status}`)
|
|
362
|
+
```
|
|
363
|
+
|
|
364
|
+
### Example: Generate Config Programmatically
|
|
365
|
+
|
|
366
|
+
```typescript
|
|
367
|
+
import { generateConfigContent, getDefaultConfigPath } from '@nathanvale/chatline'
|
|
368
|
+
import fs from 'node:fs/promises'
|
|
369
|
+
|
|
370
|
+
// Generate YAML config with defaults
|
|
371
|
+
const yamlContent = generateConfigContent('yaml')
|
|
372
|
+
const configPath = getDefaultConfigPath('yaml')
|
|
373
|
+
|
|
374
|
+
await fs.writeFile(configPath, yamlContent, 'utf-8')
|
|
375
|
+
console.log(`Config written to ${configPath}`)
|
|
376
|
+
|
|
377
|
+
// Generate JSON config
|
|
378
|
+
const jsonContent = generateConfigContent('json')
|
|
379
|
+
await fs.writeFile('./imessage-config.json', jsonContent, 'utf-8')
|
|
380
|
+
```
|
|
381
|
+
|
|
382
|
+
### TypeScript Type Definitions
|
|
383
|
+
|
|
384
|
+
The package includes full TypeScript definitions for all exports:
|
|
385
|
+
|
|
386
|
+
```typescript
|
|
387
|
+
import type {
|
|
388
|
+
// Core message types
|
|
389
|
+
Message,
|
|
390
|
+
MessageCore,
|
|
391
|
+
MediaMeta,
|
|
392
|
+
MediaEnrichment,
|
|
393
|
+
ReplyInfo,
|
|
394
|
+
TapbackInfo,
|
|
395
|
+
|
|
396
|
+
// Config types
|
|
397
|
+
Config,
|
|
398
|
+
ConfigFormat,
|
|
399
|
+
|
|
400
|
+
// Utility types
|
|
401
|
+
DeltaResult,
|
|
402
|
+
MergeStats,
|
|
403
|
+
IngestMergeResult,
|
|
404
|
+
EnrichmentMergeResult,
|
|
405
|
+
|
|
406
|
+
// Rate limiting types
|
|
407
|
+
RateLimitConfig,
|
|
408
|
+
RateLimitState,
|
|
409
|
+
ApiResponse,
|
|
410
|
+
} from '@nathanvale/chatline'
|
|
411
|
+
```
|
|
412
|
+
|
|
413
|
+
### Advanced: Custom Pipeline
|
|
414
|
+
|
|
415
|
+
```typescript
|
|
416
|
+
import {
|
|
417
|
+
loadConfig,
|
|
418
|
+
ingestCSV,
|
|
419
|
+
dedupAndMerge,
|
|
420
|
+
detectDelta,
|
|
421
|
+
mergeEnrichments,
|
|
422
|
+
createRateLimiter,
|
|
423
|
+
} from '@nathanvale/chatline'
|
|
424
|
+
import type { Message, Config } from '@nathanvale/chatline'
import fs from 'node:fs/promises'
|
|
425
|
+
|
|
426
|
+
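// loadPreviousState, saveState, and enrichImageWithGemini are placeholders
// for your own helpers; they are not defined in this example
async function runCustomPipeline() {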
|
|
427
|
+
// 1. Load configuration
|
|
428
|
+
const config: Config = await loadConfig()
|
|
429
|
+
|
|
430
|
+
// 2. Ingest from multiple sources
|
|
431
|
+
const csvMessages = ingestCSV(config.inputs.csv[0], {
|
|
432
|
+
attachmentDir: config.paths?.attachmentRoot,
|
|
433
|
+
strictMode: false,
|
|
434
|
+
})
|
|
435
|
+
|
|
436
|
+
const dbMessages = JSON.parse(await fs.readFile(config.inputs.db, 'utf-8'))
|
|
437
|
+
|
|
438
|
+
// 3. Merge and deduplicate
|
|
439
|
+
const merged = dedupAndMerge(csvMessages, dbMessages)
|
|
440
|
+
console.log(`Merged to ${merged.mergedCount} unique messages`)
|
|
441
|
+
|
|
442
|
+
// 4. Detect changes since last run
|
|
443
|
+
const previous = await loadPreviousState()
|
|
444
|
+
const delta = detectDelta(merged.messages, previous)
|
|
445
|
+
|
|
446
|
+
// 5. Enrich only new messages with rate limiting
|
|
447
|
+
const limiter = createRateLimiter({ requestsPerSecond: 5 })
|
|
448
|
+
|
|
449
|
+
for (const message of delta.new) {
|
|
450
|
+
if (message.media?.kind === 'image') {
|
|
451
|
+
const enrichment = await limiter.execute(() =>
|
|
452
|
+
enrichImageWithGemini(message),
|
|
453
|
+
)
|
|
454
|
+
message.media.enrichment = enrichment
|
|
455
|
+
}
|
|
456
|
+
}
|
|
457
|
+
|
|
458
|
+
// 6. Merge enrichments back into full dataset
|
|
459
|
+
const enriched = mergeEnrichments(merged.messages, delta.new)
|
|
460
|
+
|
|
461
|
+
// 7. Save checkpoint
|
|
462
|
+
await saveState(enriched.messages)
|
|
463
|
+
|
|
464
|
+
return enriched
|
|
465
|
+
}
|
|
466
|
+
```
|
|
467
|
+
|
|
468
|
+
### See Also
|
|
469
|
+
|
|
470
|
+
- **CLI Usage**: See `docs/cli-usage.md` for command-line interface examples
|
|
471
|
+
- **Dual Distribution**: See `docs/dual-mode-distribution-best-practices.md` for
|
|
472
|
+
packaging details
|
|
473
|
+
- **API Documentation**: See generated TypeDoc output (coming soon)
|
|
474
|
+
|
|
475
|
+
## For Developers
|
|
476
|
+
|
|
477
|
+
### Development Setup
|
|
478
|
+
|
|
479
|
+
#### Clone & Install
|
|
480
|
+
|
|
481
|
+
```bash
|
|
482
|
+
# Clone the repository
|
|
483
|
+
git clone https://github.com/nathanvale/chatline.git
|
|
484
|
+
cd chatline
|
|
485
|
+
|
|
486
|
+
# Install dependencies
|
|
487
|
+
pnpm install
|
|
488
|
+
|
|
489
|
+
# Build TypeScript
|
|
490
|
+
pnpm build
|
|
491
|
+
```
|
|
492
|
+
|
|
493
|
+
#### Local CLI Development
|
|
494
|
+
|
|
495
|
+
During development, run the CLI directly from TypeScript via Bun, or run from
|
|
496
|
+
the built `dist` output:
|
|
497
|
+
|
|
498
|
+
```bash
|
|
499
|
+
# Fast dev (TypeScript direct via Bun)
|
|
500
|
+
pnpm dev -- --help
|
|
501
|
+
pnpm dev -- --config examples/imessage-config.yaml
|
|
502
|
+
|
|
503
|
+
# Or build, then run individual commands from dist
|
|
504
|
+
pnpm build
|
|
505
|
+
pnpm cli doctor
|
|
506
|
+
pnpm cli ingest-csv -i messages.csv -o output.json
|
|
507
|
+
|
|
508
|
+
# Watch mode for development (typecheck/build)
|
|
509
|
+
pnpm watch
|
|
510
|
+
```
|
|
511
|
+
|
|
512
|
+
#### Running Tests
|
|
513
|
+
|
|
514
|
+
```bash
|
|
515
|
+
# Run all tests
|
|
516
|
+
pnpm test
|
|
517
|
+
|
|
518
|
+
# Watch mode
|
|
519
|
+
pnpm test:watch
|
|
520
|
+
|
|
521
|
+
# With UI
|
|
522
|
+
pnpm test:ui
|
|
523
|
+
|
|
524
|
+
# Coverage report
|
|
525
|
+
pnpm coverage
|
|
526
|
+
```
|
|
527
|
+
|
|
528
|
+
#### Code Quality
|
|
529
|
+
|
|
530
|
+
```bash
|
|
531
|
+
# Lint code
|
|
532
|
+
pnpm lint
|
|
533
|
+
pnpm lint:fix
|
|
534
|
+
|
|
535
|
+
# Format code
|
|
536
|
+
pnpm format
|
|
537
|
+
|
|
538
|
+
# Run quality checks (pre-commit hook)
|
|
539
|
+
pnpm quality-check
|
|
540
|
+
```
|
|
541
|
+
|
|
542
|
+
### Bun-powered dev and tooling
|
|
543
|
+
|
|
544
|
+
This project keeps pnpm as the package manager and Vitest on Node for stable
|
|
545
|
+
tests, while using Bun for fast local development and tooling:
|
|
546
|
+
|
|
547
|
+
- Primary (CI/stable): `pnpm build`, `pnpm test`, `pnpm test:ci`
|
|
548
|
+
- Local convenience (Bun):
|
|
549
|
+
  - `pnpm dev` – run the CLI from TypeScript via Bun
|
|
550
|
+
  - `pnpm typecheck` – no-emit typechecking via `bunx tsc`
|
|
551
|
+
  - `pnpm lint` / `pnpm lint:fix` – via `bunx eslint`
|
|
552
|
+
  - `pnpm format` – via `bunx prettier`
|
|
553
|
+
|
|
554
|
+
Notes:
|
|
555
|
+
|
|
556
|
+
- Native addons (sharp, better-sqlite3) may build from source under Bun; Node
|
|
557
|
+
remains the primary test substrate.
|
|
558
|
+
- Firecrawl SDK is compatible with Bun; the MCP server and tests remain on Node.

## Architecture

The pipeline follows a strict **4-stage architecture** with clear separation of
concerns:

```
CSV/DB Exports
     │
     ├─────────────────┬──────────────────┐
     ▼                 ▼                  ▼
Ingest-CSV         Ingest-DB       (Other sources)
     │                 │                  │
     └─────────────────┼──────────────────┘
                       ▼
               [Stage 1: Ingest]
               Parse & normalize
                       ▼
           messages.*.ingested.json
                       │
                       ▼
           [Stage 2: Normalize-Link]
      Deduplicate, link replies/tapbacks
                       ▼
           messages.normalized.json
                       │
                       ▼
             [Stage 3: Enrich-AI] ◄── Resumable & Incremental
              Add AI enrichments
                       ▼
            messages.enriched.json
                       │
                       ▼
          [Stage 4: Render-Markdown]
             Generate daily files
                       ▼
            timeline/*.md (output)
```

### Stage 1: Ingest

Extracts messages from a CSV export or SQLite database and normalizes them to a
unified schema.

**Responsibilities:**

- Parse rows with field mapping (handle CSV/DB dialect differences)
- Convert dates (CSV UTC → ISO 8601, Apple epoch → ISO 8601)
- Split rows into `text`/`media`/`notification`/`tapback` messages
- Resolve attachment paths to absolute paths when possible
- Create stable part GUIDs for multi-attachment DB messages:
  `p:<index>/<original_guid>`
- Preserve source metadata (CSV vs DB origin)

**Input:** iMazing CSV or Messages.app SQLite database

**Output:** Normalized `Message[]` in JSON envelope with metadata
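
The part-GUID convention above can be illustrated with a minimal sketch. The
helper name `makePartGuids` is hypothetical (it is not taken from the package),
and zero-based indices that follow attachment order are an assumption:

```typescript
// Hypothetical helper: derive stable part GUIDs for a DB message that carries
// several attachments, following the `p:<index>/<original_guid>` convention.
// Assumption: indices are zero-based and follow attachment order.
function makePartGuids(
  originalGuid: string,
  attachmentCount: number,
): string[] {
  const guids: string[] = []
  for (let index = 0; index < attachmentCount; index++) {
    guids.push(`p:${index}/${originalGuid}`)
  }
  return guids
}

console.log(makePartGuids('ABC-123', 2))
```

Because each GUID depends only on the original GUID and the attachment's
position, re-ingesting the same database should yield the same part GUIDs.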

### Stage 2: Normalize-Link

Merges multiple sources, deduplicates, links replies/tapbacks, and validates
schema.

**Responsibilities:**

- Link replies to parents:
  - Primary: DB `association_guid` (database-native association)
  - Fallback: Heuristics (±30s timestamp proximity, text similarity, sender
    difference)
- Link tapbacks (emoji reactions) to message parts
- Deduplicate across CSV/DB sources:
  - Exact GUID matching (primary)
  - Content equivalence (fuzzy text match, same sender, same timestamp)
  - Prefer DB-sourced data in conflicts (DB is authoritative for timestamps,
    handles, etc.)
- Enforce schema via Zod validation (camelCase, type correctness)

**Algorithm Complexity:** O(n log n) for deduplication with GUID indexing

**Input:** One or both ingest outputs

**Output:** Merged, deduplicated, linked `messages.normalized.json`
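
A minimal sketch of the exact-GUID deduplication pass, assuming a trimmed-down
message shape. `IngestedMessage` and `dedupeByGuid` are illustrative names, and
the content-equivalence pass is deliberately left out:

```typescript
type SourceType = 'csv' | 'db'

interface IngestedMessage {
  guid: string
  date: string // ISO 8601 UTC, so string order equals chronological order
  sourceType: SourceType
}

// Exact-GUID pass only: keep one message per GUID and prefer the DB copy when
// both sources contain it. Content-equivalence matching is not shown.
function dedupeByGuid(messages: IngestedMessage[]): IngestedMessage[] {
  const byGuid = new Map<string, IngestedMessage>()
  for (const message of messages) {
    const existing = byGuid.get(message.guid)
    const dbWins =
      existing !== undefined &&
      existing.sourceType === 'csv' &&
      message.sourceType === 'db'
    if (existing === undefined || dbWins) {
      byGuid.set(message.guid, message)
    }
  }
  // Sort by (date, guid) so the merged output is stable across runs.
  return Array.from(byGuid.values()).sort((a, b) => {
    if (a.date !== b.date) return a.date < b.date ? -1 : 1
    if (a.guid !== b.guid) return a.guid < b.guid ? -1 : 1
    return 0
  })
}
```

The `Map` lookup makes the dedup pass itself linear; the final sort is what
brings the sketch to the O(n log n) bound quoted above.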

### Stage 3: Enrich-AI

Augments messages with AI-powered analysis. Fully resumable and idempotent.

**Responsibilities:**

- Image analysis:
  - Convert HEIC/TIFF to JPG preview (cached by filename)
  - Gemini Vision API: structured prompt for caption + summary
- Audio transcription:
  - Structured prompt requesting timestamps and speaker labels
  - Handles long audio with chunking (streaming for >10min files)
- PDF summarization (key points extraction)
- Link enrichment:
  - Firecrawl for full web scraping
  - Provider-specific fallbacks (YouTube, Spotify, Twitter, Instagram)
  - Generic HTML meta tag fallback
  - Graceful degradation (never crashes, stores error in enrichment)
- Idempotent processing (skip if enrichment kind already exists)
- Checkpointing (save progress every N items)
- Resumable (load checkpoint, verify config hash, continue from last index)
- Rate limiting (jittered backoff for API limits)
- Incremental mode (process only new message GUIDs vs prior state)

**Idempotency Key:** `(message.media.id, enrichment.kind)`

**Input:** `messages.normalized.json`, optional checkpoint/state files

**Output:** `messages.enriched.json` with populated `media.enrichment[]` arrays
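
The "jittered backoff" bullet could look roughly like the sketch below. It uses
exponential backoff with full jitter, which is one common variant and not
necessarily the one the package implements. The names `backoffDelayMs` and
`withRetries` and all defaults are assumptions:

```typescript
// Delay before retry number `attempt` (0-based): exponential growth capped at
// `maxMs`, then scaled by a random factor ("full jitter").
// All names and defaults here are assumptions for illustration.
function backoffDelayMs(
  attempt: number,
  baseMs: number = 1000,
  maxMs: number = 30000,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(maxMs, baseMs * Math.pow(2, attempt))
  return Math.floor(random() * ceiling)
}

// Retry an async call up to `maxRetries` times, sleeping between attempts.
async function withRetries<T>(
  call: () => Promise<T>,
  maxRetries: number = 3,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await call()
    } catch (error) {
      if (attempt >= maxRetries) throw error
      const delay = backoffDelayMs(attempt)
      await new Promise<void>((resolve) => setTimeout(resolve, delay))
    }
  }
}
```

Passing the random source as a parameter keeps the delay calculation testable
without real timers.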

### Stage 4: Render-Markdown

Generates deterministic daily markdown files organized by date and time-of-day.

**Responsibilities:**

- Group messages by calendar date
- Sub-group by time-of-day sections (Morning 00:00-11:59, Afternoon 12:00-17:59,
  Evening 18:00-23:59)
- Render each message with:
  - Timestamp anchor for deep linking
  - Sender name / "Me" indicator
  - Message text or media preview
  - Enrichments (image captions, transcriptions, link contexts) as formatted
    blockquotes
- Render replies as nested blockquotes (up to configurable depth)
- Render tapbacks as emoji reactions (❤️ for "loved", etc.)
- Deterministic sorting by `(date, guid)` for reproducibility

**Determinism:** Identical input → identical output. No randomization, stable
key ordering.

**Input:** `messages.enriched.json`

**Output:** Daily markdown files (`timeline/YYYY-MM-DD.md`)
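
The date and time-of-day grouping can be sketched as two small pure functions.
The function names are hypothetical, and the sketch assumes grouping happens in
UTC, as described under Dates and Timezones:

```typescript
type TimeOfDay = 'Morning' | 'Afternoon' | 'Evening'

// Map an ISO 8601 UTC timestamp to the section it would be rendered under:
// Morning 00:00-11:59, Afternoon 12:00-17:59, Evening 18:00-23:59.
function timeOfDay(isoDate: string): TimeOfDay {
  const hour = new Date(isoDate).getUTCHours()
  if (hour < 12) return 'Morning'
  if (hour < 18) return 'Afternoon'
  return 'Evening'
}

// Derive the daily file key (YYYY-MM-DD) from the UTC calendar date.
function dayKey(isoDate: string): string {
  return new Date(isoDate).toISOString().slice(0, 10)
}

console.log(dayKey('2025-10-26T14:30:45.000Z'), timeOfDay('2025-10-26T14:30:45.000Z'))
```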

## Message Schema

The unified `Message` type represents all message kinds, discriminated by the
`messageKind` field:

```typescript
type Message = {
  guid: string // Unique identifier
  messageKind: 'text' | 'media' | 'tapback' | 'notification'
  date: string // ISO 8601 with Z suffix (UTC)
  isFromMe: boolean

  // Optional fields by kind
  text?: string // For text/notification messages
  media?: MediaMeta // For media messages (see below)
  tapback?: TapbackInfo // For tapback messages

  // Linking
  replyingTo?: ReplyInfo // Links to parent message GUID

  // Metadata
  service: string // SMS, iMessage, etc.
  handle?: string // Phone number or Apple ID
  senderName?: string // Display name
  groupGuid?: string // For split messages, original DB GUID

  // Preservation fields
  subject?: string
  isAudioMessage?: boolean
  isDeleted?: boolean

  // Provenance
  sourceType?: 'csv' | 'db'
  sourceMetadata?: Record<string, unknown>
}

type MediaMeta = {
  id: string // Unique media ID
  type: 'image' | 'audio' | 'pdf' | 'video' | 'document'
  filename?: string
  path?: string // Absolute path if file exists
  mimeType?: string
  size?: number
  duration?: number // For audio/video in seconds
  enrichment?: MediaEnrichment[] // AI analysis results
  provenance?: {
    originalPath?: string
    source: 'csv' | 'db'
    lastSeen?: string
  }
}

type MediaEnrichment = {
  kind: 'image_analysis' | 'transcription' | 'pdf_summary' | 'link_context'
  content: Record<string, unknown>
  provider: string // 'gemini', 'firecrawl', etc.
  model: string
  version: string
  createdAt: string // ISO 8601
  error?: string // If enrichment failed
}
```

All dates are **ISO 8601 with Z suffix** (UTC). See
[Dates and Timezones](#dates-and-timezones) for conversion details.
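
Because the kind-specific fields are optional, consumers typically narrow on
`messageKind` before touching them. The sketch below uses trimmed-down copies
of the types and a hypothetical `isMediaMessage` guard. Neither is exported by
the package as far as this README shows:

```typescript
// Trimmed-down copies of the schema types, just enough for the example.
type MediaMeta = {
  id: string
  type: 'image' | 'audio' | 'pdf' | 'video' | 'document'
}

type Message = {
  guid: string
  messageKind: 'text' | 'media' | 'tapback' | 'notification'
  date: string
  isFromMe: boolean
  text?: string
  media?: MediaMeta
}

type MediaMessage = Message & { messageKind: 'media'; media: MediaMeta }

// Hypothetical type guard: a media message that actually carries its payload.
function isMediaMessage(message: Message): message is MediaMessage {
  return message.messageKind === 'media' && message.media !== undefined
}

const sample: Message[] = [
  { guid: 'g1', messageKind: 'text', date: '2025-10-26T14:30:45.000Z', isFromMe: true, text: 'hi' },
  {
    guid: 'g2',
    messageKind: 'media',
    date: '2025-10-26T14:31:00.000Z',
    isFromMe: false,
    media: { id: 'att-1', type: 'image' },
  },
]

// After filtering with the guard, `media` is known to be present.
const mediaIds = sample.filter(isMediaMessage).map((m) => m.media.id)
console.log(mediaIds)
```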

## CLI Commands

### Main Pipeline Commands

#### `ingest-csv`

Import messages from iMazing CSV export.

```bash
pnpm cli ingest-csv \
  -i messages.csv \
  -o messages.csv.ingested.json \
  -a ~/Library/Messages/Attachments \
  -a /Volumes/Backup/old-attachments
```

**Options:**

- `-i, --input <path>` - iMazing CSV file (required)
- `-o, --output <path>` - Output JSON file (default:
  `./messages.csv.ingested.json`)
- `-a, --attachments <dirs...>` - Root directories containing media files

#### `ingest-db`

Extract messages from macOS Messages.app SQLite database.

```bash
pnpm cli ingest-db \
  -i ~/Library/Messages/chat.db \
  -o messages.db.ingested.json \
  --contact john@example.com
```

**Options:**

- `-i, --input <path>` - Messages.app database file (required)
- `-o, --output <path>` - Output JSON file (default:
  `./messages.db.ingested.json`)
- `--contact <id>` - Filter by contact (phone or Apple ID)
- `-a, --attachments <dirs...>` - Attachment root directories

#### `normalize-link`

Merge sources, deduplicate, link replies/tapbacks, and validate schema.

```bash
pnpm cli normalize-link \
  -i messages.csv.ingested.json messages.db.ingested.json \
  -o messages.normalized.json \
  -m all
```

**Options:**

- `-i, --input <paths...>` - Input JSON files (required, can specify multiple)
- `-o, --output <path>` - Output JSON file (default:
  `./messages.normalized.json`)
- `-m, --merge-strategy <strategy>` - `exact` (GUID only) | `content` (content
  equivalence) | `all` (both, default)

#### `enrich-ai`

Augment messages with AI analysis (images, audio, links). Resumable and
incremental.

```bash
pnpm cli enrich-ai \
  -i messages.normalized.json \
  -o messages.enriched.json \
  --resume \
  --incremental \
  --rate-limit 1000 \
  --max-retries 3 \
  --checkpoint-interval 100 \
  --enable-vision --enable-audio --enable-links \
  -v
```

**Options:**

- `-i, --input <path>` - Input normalized JSON (required)
- `-o, --output <path>` - Output JSON file (default: `./messages.enriched.json`)
- `-c, --checkpoint-dir <path>` - Checkpoint directory (default:
  `./.checkpoints`)
- `--resume` - Resume from last checkpoint
- `--incremental` - Only enrich messages new since last enrichment run
- `--state-file <path>` - Path to incremental state file (default:
  `./.imessage-state.json`)
- `--reset-state` - Clear incremental state and enrich all messages
- `--rate-limit <ms>` - Delay between API calls (default: 1000)
- `--max-retries <n>` - Max retries on API errors (default: 3)
- `--checkpoint-interval <n>` - Save checkpoint every N items (default: 100)
- `--enable-vision` - Enable image analysis (default: true)
- `--enable-audio` - Enable audio transcription (default: true)
- `--enable-links` - Enable link enrichment (default: true)

#### `render-markdown`

Generate daily markdown files from enriched messages.

```bash
pnpm cli render-markdown \
  -i messages.enriched.json \
  -o ./timeline \
  --group-by-time \
  --nested-replies \
  --max-nesting-depth 10 \
  --start-date 2025-01-01 \
  --end-date 2025-12-31
```

**Options:**

- `-i, --input <path>` - Input enriched JSON (required)
- `-o, --output <path>` - Output directory (default: `./timeline`)
- `--group-by-time` - Group by Morning/Afternoon/Evening (default: true)
- `--nested-replies` - Render replies as blockquotes (default: true)
- `--max-nesting-depth <n>` - Max blockquote nesting depth (default: 10)
- `--start-date <YYYY-MM-DD>` - Filter messages from this date
- `--end-date <YYYY-MM-DD>` - Filter messages until this date

### Utility Commands

#### `validate`

Validate JSON file against Message schema.

```bash
pnpm cli validate -i messages.json [-q]
```

**Options:**

- `-i, --input <path>` - JSON file to validate (required)
- `-q, --quiet` - Suppress detailed error output

**Output:** Exit code 0 on success, 1 on validation failure. Prints summary
stats.

#### `stats`

Show statistics about a message file.

```bash
pnpm cli stats -i messages.json [-v]
```

**Options:**

- `-i, --input <path>` - JSON file (required)
- `-v, --verbose` - Show per-kind breakdown

**Output:** Message count, breakdown by `messageKind`, date range, attachment
count, etc.

#### `doctor`

Run system diagnostics.

```bash
pnpm cli doctor [-v]
```

**Checks:**

- Node.js version (22+)
- Dependencies (pnpm packages)
- Config file exists and is readable
- API keys present (GEMINI_API_KEY, FIRECRAWL_API_KEY)
- Attachment directories accessible
- Write permissions to output directories

#### `init`

Generate starter configuration file.

```bash
pnpm cli init [-f json|yaml] [--force] [-o custom-path.yaml]
```

**Options:**

- `-f, --format <format>` - `json` or `yaml` (default: yaml)
- `--force` - Overwrite existing config
- `-o, --output <path>` - Custom config path

## Configuration

Configuration can be provided via `imessage-config.yaml` or
`imessage-config.json`. Create with `pnpm cli init` or manually:

```yaml
version: '1.0'

# Attachment directories to search for media files
attachmentRoots:
  - ~/Library/Messages/Attachments
  - /Volumes/Backup/old-attachments

# Google Gemini API configuration
gemini:
  apiKey: ${GEMINI_API_KEY} # Loaded from environment
  model: gemini-1.5-pro # Recommended model
  rateLimitDelay: 1000 # Milliseconds between requests
  maxRetries: 3 # Retry failed API calls

# Firecrawl (link enrichment) configuration
firecrawl:
  apiKey: ${FIRECRAWL_API_KEY} # Optional, for link context
  enabled: true

# Enrichment settings
enrichment:
  enableVisionAnalysis: true # Image captions/summaries
  enableAudioTranscription: true # Audio transcription
  enableLinkEnrichment: true # Link context extraction
  imageCacheDir: ./.cache/images # Preview cache location
  checkpointInterval: 100 # Items per checkpoint
  forceRefresh: false # Re-enrich existing

# Rendering settings
render:
  groupByTimeOfDay: true # Morning/Afternoon/Evening sections
  renderRepliesAsNested: true # Blockquote threading
  renderTapbacksAsEmoji: true # ❤️ instead of text
  maxNestingDepth: 10 # Max blockquote levels
```

**Environment Variables:**

- `GEMINI_API_KEY` - Google Gemini API key (required for enrichment)
- `FIRECRAWL_API_KEY` - Firecrawl API key (optional, for link enrichment)
- `TF_BUILD` - Set by CI systems (enables test reporters)

**Config Loading:**

- Looks for `imessage-config.yaml` or `imessage-config.json` in current
  directory
- Supports environment variable expansion: `${VARIABLE_NAME}`
- CLI `--config` flag overrides default path
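
The `${VARIABLE_NAME}` expansion can be approximated with a single regular
expression replace. This is a sketch only: `expandEnv` is a hypothetical name,
and expanding unset variables to an empty string is an assumption (the real
loader may throw or leave the placeholder untouched):

```typescript
// Replace `${NAME}` placeholders with values from an environment map.
// Assumption: unset variables expand to an empty string.
function expandEnv(
  input: string,
  env: Record<string, string | undefined>,
): string {
  return input.replace(
    /\$\{([A-Za-z_][A-Za-z0-9_]*)\}/g,
    (_match: string, name: string) => env[name] ?? '',
  )
}

console.log(expandEnv('apiKey: ${GEMINI_API_KEY}', { GEMINI_API_KEY: 'abc' }))
```

Taking the environment as a parameter, rather than reading `process.env`
directly, keeps the function easy to test.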

## Data Flows & Examples

### Example 1: Single Source (CSV Only)

```bash
# Ingest CSV
pnpm cli ingest-csv -i messages.csv -o messages.ingested.json

# Normalize (single source, minimal work)
pnpm cli normalize-link -i messages.ingested.json -o messages.normalized.json

# Enrich (first time, all messages)
pnpm cli enrich-ai -i messages.normalized.json -o messages.enriched.json

# Render
pnpm cli render-markdown -i messages.enriched.json -o ./timeline
```

### Example 2: Dual Source with Incremental Enrichment

```bash
# Ingest both sources
pnpm cli ingest-csv -i messages.csv -o messages.csv.ingested.json
pnpm cli ingest-db -i ~/Library/Messages/chat.db -o messages.db.ingested.json

# Normalize and merge
pnpm cli normalize-link \
  -i messages.csv.ingested.json messages.db.ingested.json \
  -o messages.normalized.json

# First enrichment
pnpm cli enrich-ai \
  -i messages.normalized.json \
  -o messages.enriched.json \
  --checkpoint-interval 100

# Later: new messages added, re-run incrementally
pnpm cli enrich-ai \
  -i messages.normalized.json \
  -o messages.enriched.json \
  --incremental \
  --resume
```

### Example 3: Resuming from Crash

```bash
# Enrichment stops mid-way (power loss, API timeout, etc.)
# Checkpoint saved: .checkpoints/enrich-checkpoint-abc123def.json

# Resume from checkpoint
pnpm cli enrich-ai \
  -i messages.normalized.json \
  -o messages.enriched.json \
  --resume
# → Continues from last processed index automatically
```

## Dates and Timezones

All dates in JSON outputs are **ISO 8601 UTC with Z suffix** (e.g.,
`2025-10-26T14:30:45.000Z`).

### CSV Import

- iMazing CSV format: `MM/DD/YYYY, HH:MM:SS` (exported in the local timezone,
  but interpreted as UTC on import)
- Converted to ISO 8601 with Z suffix
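
A minimal sketch of this conversion, reading the wall-clock fields as UTC as
described above. `csvDateToIso` is a hypothetical name, and the strict
two-digit pattern is an assumption about the export format:

```typescript
// Parse `MM/DD/YYYY, HH:MM:SS` and treat the fields as UTC.
// Assumption: all fields are zero-padded; other layouts are rejected.
function csvDateToIso(value: string): string {
  const pattern = /^(\d{2})\/(\d{2})\/(\d{4}), (\d{2}):(\d{2}):(\d{2})$/
  const match = pattern.exec(value.trim())
  if (match === null) {
    throw new Error(`Unrecognized CSV date: ${value}`)
  }
  const month = Number(match[1])
  const day = Number(match[2])
  const year = Number(match[3])
  const hour = Number(match[4])
  const minute = Number(match[5])
  const second = Number(match[6])
  // Date.UTC takes a zero-based month.
  const unixMs = Date.UTC(year, month - 1, day, hour, minute, second)
  return new Date(unixMs).toISOString()
}

console.log(csvDateToIso('10/26/2025, 14:30:45'))
```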

### Database Import

- Apple epoch: Seconds since 2001-01-01 00:00:00 UTC
- Formula: `unixMs = (appleSeconds + 978307200) * 1000` (shift to the Unix
  epoch, convert to milliseconds, then format as ISO)
- Result: ISO 8601 with Z suffix
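
The Apple-epoch conversion can be sketched as follows. `appleSecondsToIso` is a
hypothetical name, and the sketch assumes the input is already in seconds.
Newer Messages databases are commonly reported to store nanoseconds, which
would need dividing by 1e9 first:

```typescript
// Seconds between the Unix epoch (1970-01-01) and the Apple epoch (2001-01-01).
const APPLE_EPOCH_OFFSET_SECONDS = 978307200

// Convert seconds since the Apple epoch to an ISO 8601 UTC string.
// Assumption: the value is in seconds, not nanoseconds.
function appleSecondsToIso(appleSeconds: number): string {
  const unixMs = (appleSeconds + APPLE_EPOCH_OFFSET_SECONDS) * 1000
  return new Date(unixMs).toISOString()
}

console.log(appleSecondsToIso(0))
```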

### Markdown Rendering

- Timestamps displayed in UTC
- Grouped by calendar date (UTC)
- To display local time, convert the timestamps in a post-processing step

## Idempotency and Determinism

### Idempotent Enrichment

Enrichment is **idempotent** by design:

- Check if `enrichment.kind` already exists for a message
- Skip if present (already enriched)
- Use `--force-refresh` to re-enrich specific kinds

```bash
# First run: enrich all
pnpm cli enrich-ai -i messages.normalized.json -o messages.enriched.json

# Later: add new enrichment kind (e.g., link context)
# Existing image/audio enrichments preserved, new kind added
pnpm cli enrich-ai -i messages.normalized.json -o messages.enriched.json \
  --enable-links # Other kinds disabled
```
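
The skip logic can be sketched as a small predicate. The types are trimmed-down
stand-ins for the schema and `shouldEnrich` is a hypothetical name. How the
real implementation treats enrichments that recorded an `error` is not stated
here, so this sketch treats them as already present:

```typescript
type EnrichmentKind =
  | 'image_analysis'
  | 'transcription'
  | 'pdf_summary'
  | 'link_context'

interface EnrichmentEntry {
  kind: EnrichmentKind
  error?: string
}

interface MediaItem {
  id: string
  enrichment?: EnrichmentEntry[]
}

// Enrich only when this kind is missing for the media item, unless a refresh
// is forced. Assumption: failed entries also count as "already present".
function shouldEnrich(
  media: MediaItem,
  kind: EnrichmentKind,
  forceRefresh: boolean = false,
): boolean {
  if (forceRefresh) return true
  const existing = media.enrichment ?? []
  return !existing.some((entry) => entry.kind === kind)
}
```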

### Deterministic Rendering

Markdown output is **fully deterministic**:

- Messages sorted by `(date, guid)` before rendering
- Enrichments sorted by kind within each message
- JSON keys in sorted order
- No randomization or time-dependent output

This means the same `messages.enriched.json` (same SHA-256) always produces
`timeline/*.md` files with the same SHA-256 across runs.
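
Two of these rules, the `(date, guid)` ordering and the sorted JSON keys, can
be sketched as follows. `compareMessages` and `stableStringify` are
hypothetical names, and the serializer assumes plain JSON data (no functions,
symbols, or cycles):

```typescript
// Order messages by (date, guid). ISO 8601 UTC strings sort chronologically.
function compareMessages(
  a: { date: string; guid: string },
  b: { date: string; guid: string },
): number {
  if (a.date !== b.date) return a.date < b.date ? -1 : 1
  if (a.guid !== b.guid) return a.guid < b.guid ? -1 : 1
  return 0
}

// Serialize with object keys in sorted order so equal data gives equal bytes.
// Assumption: input is plain JSON data.
function stableStringify(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map((item) => stableStringify(item)).join(',')}]`
  }
  if (value !== null && typeof value === 'object') {
    const record = value as Record<string, unknown>
    const entries = Object.keys(record)
      .filter((key) => record[key] !== undefined)
      .sort()
      .map((key) => `${JSON.stringify(key)}:${stableStringify(record[key])}`)
    return `{${entries.join(',')}}`
  }
  return JSON.stringify(value ?? null)
}

console.log(stableStringify({ b: 1, a: [2, { d: 3, c: 4 }] }))
```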

### Checkpoint Consistency

Checkpoints include config hash verification:

- Each checkpoint stores a SHA-256 hash of the enrichment config
- Resume only if config unchanged
- Detects breaking changes (API key updates, disable/enable analysis modes)
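
A minimal sketch of the hash check using Node's built-in `crypto` module. The
config and checkpoint shapes (`EnrichConfig`, `Checkpoint`) and the function
names are assumptions, and the canonicalization below only handles a flat
config object:

```typescript
import { createHash } from 'node:crypto'

// Hypothetical, flat enrichment config used only for this example.
interface EnrichConfig {
  enableVision: boolean
  enableAudio: boolean
  enableLinks: boolean
  model: string
}

// Hypothetical checkpoint shape.
interface Checkpoint {
  configHash: string
  lastIndex: number
}

// Hash the config with keys serialized in sorted order, so the result does
// not depend on property insertion order.
function configHash(config: EnrichConfig): string {
  const canonical = JSON.stringify(config, Object.keys(config).sort())
  return createHash('sha256').update(canonical).digest('hex')
}

// Resume is allowed only when the stored hash matches the current config.
function canResume(checkpoint: Checkpoint, config: EnrichConfig): boolean {
  return checkpoint.configHash === configHash(config)
}
```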

## Performance & Optimization

### Concurrency

- **Ingest** (Stage 1): Single-threaded, fast (CSV parsing ~10k msgs/s)
- **Normalize-Link** (Stage 2): Single-threaded, O(n log n) complexity (~1k
  msgs/s for dedup)
- **Enrich-AI** (Stage 3): API call bound, respectful rate limiting (1-5
  msgs/min depending on Gemini quota)
- **Render** (Stage 4): Single-threaded, fast (~10k msgs/s)

### Memory

- **Streaming where possible**: Otherwise, large JSON files are loaded into
  memory once
- **Checkpoint interval**: Default 100 items keeps memory bounded
- **Image cache**: Reuses converted previews by filename

### Cost Optimization

- **Incremental mode**: Only enrich new messages (~80% cost reduction for mature
  datasets)
- **Selective enrichment**: Enable/disable analysis modes (`--enable-vision`,
  etc.)
- **Image caching**: Preview conversion cached by filename (avoid re-processing)
- **Fallback chain**: Try Firecrawl first and fall back to provider-specific
  parsing only when it fails (reduces API calls)
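
Incremental mode boils down to a set difference between the current message
GUIDs and those recorded by a previous run. The state shape below
(`IncrementalState` with an `enrichedGuids` array) is hypothetical. The actual
layout of `.imessage-state.json` is not documented here:

```typescript
// Hypothetical state shape for this example only.
interface IncrementalState {
  enrichedGuids: string[]
}

// Keep only messages whose GUID was not seen in the prior run.
function selectNewMessages<T extends { guid: string }>(
  messages: T[],
  state: IncrementalState,
): T[] {
  const seen = new Set<string>(state.enrichedGuids)
  return messages.filter((message) => !seen.has(message.guid))
}

const pending = selectNewMessages(
  [{ guid: 'a' }, { guid: 'b' }, { guid: 'c' }],
  { enrichedGuids: ['a', 'b'] },
)
console.log(pending)
```

Only the messages returned here would incur API calls, which is where the cost
saving comes from.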

### Suggested Workflow

```bash
# Initial run (expensive, one-time)
pnpm cli enrich-ai -i normalized.json -o enriched.json --checkpoint-interval 100

# Later: weekly incremental updates (cheap)
pnpm cli enrich-ai -i normalized.json -o enriched.json --incremental --resume

# Yearly: full re-enrichment with new models
pnpm cli enrich-ai -i normalized.json -o enriched.json --force-refresh
```

## Advanced Usage

### Merging Multiple CSV/DB Exports

Combine multiple conversations into a single timeline:

```bash
# Ingest each conversation separately
pnpm cli ingest-csv -i chat-with-alice.csv -o alice.ingested.json
pnpm cli ingest-csv -i chat-with-bob.csv -o bob.ingested.json

# Merge into single normalized file
pnpm cli normalize-link \
  -i alice.ingested.json bob.ingested.json \
  -o messages.normalized.json

# Enrich and render as single timeline
pnpm cli enrich-ai -i messages.normalized.json -o messages.enriched.json
pnpm cli render-markdown -i messages.enriched.json -o ./timeline
```

### Selective Date Range Rendering

Render only recent messages:

```bash
pnpm cli render-markdown \
  -i messages.enriched.json \
  -o ./timeline \
  --start-date 2025-10-01 \
  --end-date 2025-10-31
```

### Upgrading Enrichment Models

Re-enrich with newer Gemini models:

```bash
# Update config with new model
# imessage-config.yaml:
# gemini:
#   model: gemini-2-flash # (hypothetical future model)

# Re-enrich with new model
pnpm cli enrich-ai \
  -i messages.normalized.json \
  -o messages.enriched.json \
  --force-refresh
```

## Testing

See our testing guide for configuration rationale and patterns:
`docs/testing-best-practices.md`.

The project includes 50+ test files covering:

- Schema validation (happy path + invariant violations)
- CSV/DB ingestion (parsing, path resolution, date conversion)
- Linking algorithms (reply matching, tapback association)
- Deduplication (GUID matching, content equivalence)
- Enrichment idempotency (skip logic, force-refresh)
- Checkpoint recovery (save/load, config hash verification)
- Rendering (grouping, sorting, determinism)

### Run Tests

```bash
# All tests
pnpm test

# Watch mode (re-run on file change)
pnpm test --watch

# Coverage report
pnpm test:coverage
```

### Coverage

Maintained at **70%+ branch coverage**. Critical paths (linking, dedup,
enrichment) at 95%+.

## Contributing

Contributions welcome! Please:

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Add tests for new functionality
4. Ensure tests pass: `pnpm test`
5. Format code: `pnpm format`
6. Lint: `pnpm lint`
7. **Create a changeset**: `pnpm version:gen` (for user-facing changes)
8. Commit with semantic message: `git commit -m "feat: add amazing feature"`
   - ✅ Hooks automatically validate commit format and code quality
9. Push and create a pull request

### Development Setup

```bash
# Install with dev dependencies
pnpm install

# Start watch mode (auto-rebuild TypeScript)
pnpm watch

# Run tests in watch mode during development
pnpm test --watch

# Check code quality before committing
pnpm quality-check
```

### Branch protection and local push guard

To configure GitHub branch protection on `main` using our aggregate gate:

```bash
bash scripts/setup-branch-protection.sh nathanvale chatline main
```

This script enables auto-merge, enforces required checks (PR quality / gate,
Commitlint / commitlint, PR Title Lint / lint), requires signed commits, and
keeps history linear.

Local safeguard: direct pushes to `main`/`master` are blocked by a Husky
pre-push hook. Create a feature branch and open a PR instead.

If you absolutely must override locally (not recommended):

```bash
ALLOW_PUSH_PROTECTED=1 git push
```

### Code Style

- **TypeScript**: Strict mode, no `any`
- **Formatting**: Prettier with 80-char line limit
- **Linting**: ESLint with recommended rules
- **Testing**: Vitest with 70%+ coverage threshold
- **Commits**: Conventional commits (feat:, fix:, docs:, etc.)
  - See [Automated Release Workflow](./docs/automated-release-workflow.md) for
    commit format guide

### Release Process

This project uses **automated releases** with Changesets:

- **Create changeset** for user-facing changes: `pnpm version:gen`
- **Commit messages** validated automatically via Husky + commitlint
- **CI/CD** creates "Version Packages" PR when changesets are merged
- **Publishing** happens automatically when version PR is merged

📚 **Full documentation:**

- **[Automated Release Workflow](./docs/automated-release-workflow.md)** - Main
  release process
- **[Pre-Release Guide](./docs/pre-release-guide.md)** - Canary, beta, and RC
  releases

### Release Channels

We support prerelease channels for fast feedback and safe promotion:

- `next` for early adopters (canary builds)
- `beta` for feature-complete testing
- `rc` for release candidates
- `canary` snapshots for experimental builds
- `alpha` for automated nightly snapshots

**Quick commands:**

```bash
# Publish quick snapshot (only when NOT in pre-mode)
pnpm release:snapshot:canary

# Publish versioned pre-release (when in pre-mode)
pnpm changeset              # Create changeset
pnpm changeset version      # Version as 0.0.1-next.0
pnpm publish:pre            # Publish to @next tag

# Enter/exit pre-release mode
gh workflow run pre-mode.yml -f action=enter -f channel=next
gh workflow run pre-mode.yml -f action=exit -f channel=next
```

> ⚠️ **Note:** Snapshot releases (`pnpm release:snapshot:canary`) only work when
> NOT in pre-release mode. When in pre-mode, use versioned pre-releases instead.

📚 **See the detailed guides:**

- **[Pre-Release Guide](./docs/pre-release-guide.md)** - Step-by-step publishing
  instructions
- **[Release Channels Strategy](./docs/release-channels.md)** - Architecture and
  promotion flows

### Package Hygiene

We enforce a tight publish surface and solid metadata:

- Validated with publint and AreTheTypesWrong
- Minimal tarball via `files` whitelist
- Provenance enabled for trusted builds

See the full checklist and how to run the checks:
[Package Hygiene & Metadata Quality](./docs/package-hygiene.md)

### Prettier formatting

We use a minimal, opinionated Prettier setup:

- Global 80-char width, trailing commas, single quotes, no semicolons
- Deterministic JSON sorting via plugin
- Non-mutating check for CI/local validation

Docs and rationale:
[Prettier Best Practices & Formatting Strategy](./docs/prettier-best-practices.md)

## Troubleshooting

### "API rate limit exceeded"

**Solution:** Increase the `--rate-limit` delay:

```bash
pnpm cli enrich-ai -i messages.normalized.json -o enriched.json --rate-limit 2000
```

### "Checkpoint config hash mismatch"

**Cause:** The enrichment config changed (API key, analysis enabled/disabled).

**Solution:** Use `--reset-state` to clear the checkpoint, or manually delete
`.imessage-state.json`:

```bash
pnpm cli enrich-ai -i messages.normalized.json -o enriched.json --reset-state
```
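
To illustrate why this error appears, here is a minimal sketch of a config-hash
check. The names (`EnrichConfig`, `configHash`, `canResume`), the hashed fields,
and the simple djb2-style hash are all assumptions for illustration; the tool
itself may hash different fields with a different algorithm.

```typescript
// Hypothetical subset of the enrichment settings that feed the checkpoint hash.
interface EnrichConfig {
  apiKey: string
  enableVision: boolean
  enableAudio: boolean
  enableLinks: boolean
}

// djb2-style string hash over a stable serialization (illustrative choice).
function configHash(config: EnrichConfig): string {
  const serialized = JSON.stringify([
    config.enableVision,
    config.enableAudio,
    config.enableLinks,
    config.apiKey,
  ])
  let hash = 5381
  for (let i = 0; i < serialized.length; i++) {
    hash = ((hash * 33) ^ serialized.charCodeAt(i)) >>> 0
  }
  return hash.toString(16)
}

// A checkpoint can only be resumed if it was written under the same config.
function canResume(checkpointHash: string, current: EnrichConfig): boolean {
  return checkpointHash === configHash(current)
}
```

Under this model, any change to a hashed setting produces a different hash, so
the saved checkpoint no longer matches and has to be reset.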

### "Attachment paths not resolved"

**Cause:** Media file not found in the attachment directories.

**Check:**

1. Verify the path in config (`attachmentRoots`)
2. Check that the file exists on disk
3. Check file permissions

**Result:** When the file cannot be found, the path is stored as a filename
with provenance metadata.
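
The fallback behaviour can be sketched as follows. This is only an illustration:
`resolveAttachment`, the `ResolvedAttachment` shape, and the provenance fields
are hypothetical names, and the `exists` callback stands in for whatever
file-system check the tool performs.

```typescript
// Hypothetical result shape; the real provenance fields are not documented here.
interface ResolvedAttachment {
  path: string
  resolved: boolean
  provenance?: { searchedRoots: string[]; reason: string }
}

// `exists` is injected so the sketch stays independent of any file-system API.
function resolveAttachment(
  filename: string,
  attachmentRoots: string[],
  exists: (candidate: string) => boolean,
): ResolvedAttachment {
  for (const root of attachmentRoots) {
    // Strip trailing slashes so the joined path has exactly one separator.
    const candidate = `${root.replace(/\/+$/, '')}/${filename}`
    if (exists(candidate)) {
      return { path: candidate, resolved: true }
    }
  }
  // Fallback: keep the bare filename and record where we looked.
  return {
    path: filename,
    resolved: false,
    provenance: {
      searchedRoots: attachmentRoots,
      reason: 'not found in attachment roots',
    },
  }
}
```

Injecting `exists` keeps the lookup order (first matching root wins) easy to
test without touching the disk.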

### "Validation errors in normalized.json"

**Debug:**

```bash
pnpm cli validate -i messages.normalized.json -v
# Shows which fields failed validation
```

**Common causes:**

- Missing `messageKind` field
- Date not in ISO 8601 UTC format
- Inconsistent data types (string vs number)

Run `pnpm cli doctor` for system-level diagnostics.

## FAQ

**Q: Can I use this on Linux/Windows?** A: CSV ingestion works everywhere.
Database ingestion requires macOS (to access Messages.app). You can export from
macOS and process on other systems.

**Q: How much storage do the outputs take?** A: Enriched JSON is typically 2-3x
the size of the original normalized JSON (due to enrichment data). Markdown
files are 1-2x the size of the enriched JSON. A 1000-message conversation:
~5-10MB JSON, ~10-20MB markdown.

**Q: Can I re-use enriched.json if I change the render config?** A: Yes!
Rendering is deterministic and independent of enrichment, so you can change
render settings (grouping, nesting depth) and re-render without re-enriching.

**Q: What if I don't have API keys?** A: Enrichment is skipped and messages
remain as-is. To disable it explicitly, set
`--enable-vision false --enable-audio false --enable-links false`. Rendering
still works without enrichment.

**Q: How do I update my timeline when new messages arrive?** A: Re-export from
Messages.app/iMazing, then run the full pipeline OR use `--incremental --resume`
to process only new messages (80%+ faster).

**Q: Is my data private?** A: Yes. All processing is local, except that
enrichment requires API calls to Gemini/Firecrawl. Those calls never persist to
artifacts, and no data is retained after processing. Set API keys via
environment variables (not in config files).

## Technical Details

### Schema Invariants

Messages enforce cross-field constraints via Zod `superRefine()`:

- `messageKind='media'` → `media` field must exist and be complete
- `messageKind='tapback'` → `tapback` field must exist
- `messageKind='text'|'notification'` → may have text, must not have
  media/tapback
- All dates must be ISO 8601 with Z suffix
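
The constraints above can be sketched as a plain function. To stay
dependency-free, this example deliberately uses a hand-written check rather
than Zod's `superRefine()`. The `Message` shape and `checkInvariants` are
hypothetical and trimmed down, and the sketch only checks that `media` exists,
not that it is complete.

```typescript
type MessageKind = 'text' | 'media' | 'tapback' | 'notification'

// Hypothetical, trimmed-down message shape; the real schema has more fields.
interface Message {
  messageKind: MessageKind
  date: string
  text?: string
  media?: { id: string; filename: string }
  tapback?: { type: string }
}

// ISO 8601 UTC timestamp with a Z suffix, e.g. 2025-01-31T09:15:00.000Z
const ISO_UTC = /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?Z$/

// Returns a list of violations; an empty list means the message is valid.
function checkInvariants(msg: Message): string[] {
  const issues: string[] = []
  if (msg.messageKind === 'media' && !msg.media) {
    issues.push('media message requires a media field')
  }
  if (msg.messageKind === 'tapback' && !msg.tapback) {
    issues.push('tapback message requires a tapback field')
  }
  if (msg.messageKind === 'text' || msg.messageKind === 'notification') {
    if (msg.media) issues.push(`${msg.messageKind} message must not have media`)
    if (msg.tapback) {
      issues.push(`${msg.messageKind} message must not have tapback`)
    }
  }
  if (!ISO_UTC.test(msg.date)) {
    issues.push('date must be ISO 8601 with a Z suffix')
  }
  return issues
}
```

In a Zod schema, each `issues.push(...)` would roughly correspond to reporting
a custom issue from inside the `superRefine()` callback.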

### Linking Heuristics

Reply linking uses a confidence-scoring algorithm:

1. Check DB association (if present, use immediately)
2. Search ±30s timestamp window
3. Score candidates:
   - Timestamp distance: closer = higher score
   - Text similarity: matching keywords = higher score
   - Sender difference: different person = higher score (likely replying)
4. Select highest score (or log as ambiguous if tie)
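
The four steps above can be sketched like this. It is an illustration under
stated assumptions, not the tool's implementation: the `Msg` shape, the function
names, the keyword-overlap measure, and the weights (0.5 / 0.3 / 0.2) are all
invented for the example.

```typescript
// Hypothetical, minimal message shape for the sketch.
interface Msg {
  guid: string
  sender: string
  timestampMs: number
  text: string
}

const WINDOW_MS = 30_000 // the ±30s search window

// Fraction of the reply's words that also occur in the candidate (0..1).
function keywordOverlap(a: string, b: string): number {
  const wordsA = a.toLowerCase().split(/\W+/).filter(Boolean)
  const wordsB = new Set(b.toLowerCase().split(/\W+/).filter(Boolean))
  if (wordsA.length === 0) return 0
  return wordsA.filter((w) => wordsB.has(w)).length / wordsA.length
}

// Weights are illustrative assumptions, not the tool's actual values.
function scoreCandidate(reply: Msg, candidate: Msg): number {
  const distance = Math.abs(reply.timestampMs - candidate.timestampMs)
  const timeScore = 1 - distance / WINDOW_MS
  const textScore = keywordOverlap(reply.text, candidate.text)
  const senderScore = reply.sender !== candidate.sender ? 1 : 0
  return 0.5 * timeScore + 0.3 * textScore + 0.2 * senderScore
}

type LinkResult =
  | { status: 'linked'; parentGuid: string }
  | { status: 'ambiguous'; candidates: string[] }
  | { status: 'none' }

function linkReply(
  reply: Msg,
  messages: Msg[],
  dbParentGuid?: string,
): LinkResult {
  // Step 1: a DB association wins immediately.
  if (dbParentGuid) return { status: 'linked', parentGuid: dbParentGuid }
  // Steps 2-3: keep candidates inside the window and score them.
  const scored = messages
    .filter((m) => m.guid !== reply.guid)
    .filter((m) => Math.abs(m.timestampMs - reply.timestampMs) <= WINDOW_MS)
    .map((m) => ({ guid: m.guid, score: scoreCandidate(reply, m) }))
    .sort((a, b) => b.score - a.score)
  if (scored.length === 0) return { status: 'none' }
  // Step 4: highest score wins; an exact tie is reported as ambiguous.
  const top = scored.filter((s) => s.score === scored[0].score)
  return top.length === 1
    ? { status: 'linked', parentGuid: top[0].guid }
    : { status: 'ambiguous', candidates: top.map((s) => s.guid) }
}
```

Returning a tagged result instead of a bare GUID lets the caller log ambiguous
cases rather than silently picking one.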

### Deduplication Strategy

CSV/DB deduplication uses a multi-pass approach:

1. Exact GUID matching (primary)
2. Content equivalence (fuzzy text + same sender + same timestamp)
3. Prefer DB values in conflicts (the DB is authoritative)
4. Sort by GUID for determinism
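
A minimal sketch of these rules, folded into a single loop for brevity. The
`Rec` shape and function names are hypothetical, and "fuzzy" text matching is
reduced here to case- and whitespace-insensitive comparison, which is an
assumption; the real matcher may be more forgiving.

```typescript
// Hypothetical, minimal record shape for the sketch.
interface Rec {
  guid: string
  source: 'csv' | 'db'
  sender: string
  timestamp: string
  text: string
}

// Content key: same sender, same timestamp, normalized text.
function contentKey(r: Rec): string {
  const text = r.text.toLowerCase().replace(/\s+/g, ' ').trim()
  return `${r.sender}|${r.timestamp}|${text}`
}

function dedupe(records: Rec[]): Rec[] {
  const byGuid = new Map<string, Rec>()
  const byContent = new Map<string, Rec>()
  // Visit DB rows first so they win any conflict with CSV rows.
  const ordered = [...records].sort(
    (a, b) => Number(b.source === 'db') - Number(a.source === 'db'),
  )
  for (const rec of ordered) {
    if (byGuid.has(rec.guid)) continue // exact GUID match
    const key = contentKey(rec)
    if (byContent.has(key)) continue // content equivalence
    byGuid.set(rec.guid, rec)
    byContent.set(key, rec)
  }
  // Sort by GUID so the output order is deterministic.
  return Array.from(byGuid.values()).sort((a, b) =>
    a.guid < b.guid ? -1 : a.guid > b.guid ? 1 : 0,
  )
}
```

Because DB rows are visited first, a CSV row that duplicates a DB row (by GUID
or by content) is always the one that gets dropped.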

### Idempotency Design

Enrichment is idempotent via kind-based deduplication:

- Each enrichment entry has a `kind` (e.g., `'image_analysis'`,
  `'transcription'`)
- Check if `kind` already exists before enriching
- `forceRefresh` replaces specific kind (preserves others)
- Result: Safe to re-run without duplicating enrichments
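
As a rough sketch of kind-based deduplication: the `Enrichment` shape and
`upsertEnrichment` are hypothetical names chosen for the example, and only the
`kind` field and the `forceRefresh` option come from the description above.

```typescript
// Hypothetical enrichment entry; only `kind` matters for deduplication.
interface Enrichment {
  kind: string
  payload: string
}

interface EnrichOptions {
  forceRefresh?: boolean
}

// Adds `entry` unless an entry of the same kind already exists. With
// `forceRefresh`, the existing entry of that kind is replaced and entries of
// every other kind are kept.
function upsertEnrichment(
  existing: Enrichment[],
  entry: Enrichment,
  options: EnrichOptions = {},
): Enrichment[] {
  const hasKind = existing.some((e) => e.kind === entry.kind)
  if (hasKind && !options.forceRefresh) return existing
  return [...existing.filter((e) => e.kind !== entry.kind), entry]
}
```

In a real pipeline the `hasKind` check would presumably run before the API call
is made, so a re-run also avoids the cost of enriching again.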

## Roadmap

- [ ] Support for WhatsApp, Telegram exports
- [ ] Batch API calls to reduce Gemini quota usage
- [ ] Vector embeddings for similarity search
- [ ] Web UI for browsing/searching timeline
- [ ] Obsidian plugin for live sync
- [ ] Self-hosted LLM support (Ollama, etc.)
- [ ] Photo gallery view alongside markdown
- [ ] Sentiment analysis and conversation metrics
- [ ] Anonymous mode (redact PII)

## License

MIT © 2025

See [LICENSE](LICENSE) file for full text.

## Related Projects

- [iMazing](https://imazing.com/) - CSV export source
- [Firecrawl](https://www.firecrawl.dev/) - Link enrichment API
- [Google Gemini](https://ai.google.dev/) - Image/audio analysis API
- [Obsidian](https://obsidian.md/) - Markdown vault system

## Contact & Support

- **Issues & Bugs**:
  [GitHub Issues](https://github.com/yourusername/chatline/issues)
- **Discussions**:
  [GitHub Discussions](https://github.com/yourusername/chatline/discussions)
- **Email**: support@example.com (replace with actual contact)

---

**Enjoying iMessage Timeline?** Please star ⭐ the repo and share with friends!