@nathanvale/chatline 0.0.1

Files changed (216)
  1. package/CHANGELOG.md +1 -0
  2. package/LICENSE +21 -0
  3. package/README.md +1535 -0
  4. package/dist/bin/index.js +5121 -0
  5. package/dist/cli/commands/clean.d.ts +17 -0
  6. package/dist/cli/commands/clean.d.ts.map +1 -0
  7. package/dist/cli/commands/clean.js +142 -0
  8. package/dist/cli/commands/clean.js.map +1 -0
  9. package/dist/cli/commands/doctor.d.ts +17 -0
  10. package/dist/cli/commands/doctor.d.ts.map +1 -0
  11. package/dist/cli/commands/doctor.js +202 -0
  12. package/dist/cli/commands/doctor.js.map +1 -0
  13. package/dist/cli/commands/enrich-ai.d.ts +17 -0
  14. package/dist/cli/commands/enrich-ai.d.ts.map +1 -0
  15. package/dist/cli/commands/enrich-ai.js +371 -0
  16. package/dist/cli/commands/enrich-ai.js.map +1 -0
  17. package/dist/cli/commands/index.d.ts +16 -0
  18. package/dist/cli/commands/index.d.ts.map +1 -0
  19. package/dist/cli/commands/index.js +16 -0
  20. package/dist/cli/commands/index.js.map +1 -0
  21. package/dist/cli/commands/ingest-csv.d.ts +17 -0
  22. package/dist/cli/commands/ingest-csv.d.ts.map +1 -0
  23. package/dist/cli/commands/ingest-csv.js +138 -0
  24. package/dist/cli/commands/ingest-csv.js.map +1 -0
  25. package/dist/cli/commands/ingest-db.d.ts +17 -0
  26. package/dist/cli/commands/ingest-db.d.ts.map +1 -0
  27. package/dist/cli/commands/ingest-db.js +159 -0
  28. package/dist/cli/commands/ingest-db.js.map +1 -0
  29. package/dist/cli/commands/init.d.ts +17 -0
  30. package/dist/cli/commands/init.d.ts.map +1 -0
  31. package/dist/cli/commands/init.js +110 -0
  32. package/dist/cli/commands/init.js.map +1 -0
  33. package/dist/cli/commands/normalize-link.d.ts +16 -0
  34. package/dist/cli/commands/normalize-link.d.ts.map +1 -0
  35. package/dist/cli/commands/normalize-link.js +144 -0
  36. package/dist/cli/commands/normalize-link.js.map +1 -0
  37. package/dist/cli/commands/render-markdown.d.ts +17 -0
  38. package/dist/cli/commands/render-markdown.d.ts.map +1 -0
  39. package/dist/cli/commands/render-markdown.js +218 -0
  40. package/dist/cli/commands/render-markdown.js.map +1 -0
  41. package/dist/cli/commands/stats.d.ts +17 -0
  42. package/dist/cli/commands/stats.d.ts.map +1 -0
  43. package/dist/cli/commands/stats.js +175 -0
  44. package/dist/cli/commands/stats.js.map +1 -0
  45. package/dist/cli/commands/validate.d.ts +17 -0
  46. package/dist/cli/commands/validate.d.ts.map +1 -0
  47. package/dist/cli/commands/validate.js +152 -0
  48. package/dist/cli/commands/validate.js.map +1 -0
  49. package/dist/cli/index.d.ts +13 -0
  50. package/dist/cli/index.d.ts.map +1 -0
  51. package/dist/cli/index.js +121 -0
  52. package/dist/cli/index.js.map +1 -0
  53. package/dist/cli/types.d.ts +93 -0
  54. package/dist/cli/types.d.ts.map +1 -0
  55. package/dist/cli/types.js +7 -0
  56. package/dist/cli/types.js.map +1 -0
  57. package/dist/cli/utils.d.ts +29 -0
  58. package/dist/cli/utils.d.ts.map +1 -0
  59. package/dist/cli/utils.js +53 -0
  60. package/dist/cli/utils.js.map +1 -0
  61. package/dist/cli.d.ts +9 -0
  62. package/dist/cli.d.ts.map +1 -0
  63. package/dist/cli.js +1805 -0
  64. package/dist/config/generator.d.ts +90 -0
  65. package/dist/config/generator.d.ts.map +1 -0
  66. package/dist/config/generator.js +320 -0
  67. package/dist/config/generator.js.map +1 -0
  68. package/dist/config/loader.d.ts +107 -0
  69. package/dist/config/loader.d.ts.map +1 -0
  70. package/dist/config/loader.js +251 -0
  71. package/dist/config/loader.js.map +1 -0
  72. package/dist/config/schema.d.ts +107 -0
  73. package/dist/config/schema.d.ts.map +1 -0
  74. package/dist/config/schema.js +169 -0
  75. package/dist/config/schema.js.map +1 -0
  76. package/dist/enrich/audio-transcription.d.ts +77 -0
  77. package/dist/enrich/audio-transcription.d.ts.map +1 -0
  78. package/dist/enrich/audio-transcription.js +370 -0
  79. package/dist/enrich/audio-transcription.js.map +1 -0
  80. package/dist/enrich/checkpoint.d.ts +137 -0
  81. package/dist/enrich/checkpoint.d.ts.map +1 -0
  82. package/dist/enrich/checkpoint.js +205 -0
  83. package/dist/enrich/checkpoint.js.map +1 -0
  84. package/dist/enrich/idempotency.d.ts +90 -0
  85. package/dist/enrich/idempotency.d.ts.map +1 -0
  86. package/dist/enrich/idempotency.js +188 -0
  87. package/dist/enrich/idempotency.js.map +1 -0
  88. package/dist/enrich/image-analysis.d.ts +62 -0
  89. package/dist/enrich/image-analysis.d.ts.map +1 -0
  90. package/dist/enrich/image-analysis.js +264 -0
  91. package/dist/enrich/image-analysis.js.map +1 -0
  92. package/dist/enrich/index.d.ts +60 -0
  93. package/dist/enrich/index.d.ts.map +1 -0
  94. package/dist/enrich/index.js +74 -0
  95. package/dist/enrich/index.js.map +1 -0
  96. package/dist/enrich/link-enrichment.d.ts +37 -0
  97. package/dist/enrich/link-enrichment.d.ts.map +1 -0
  98. package/dist/enrich/link-enrichment.js +202 -0
  99. package/dist/enrich/link-enrichment.js.map +1 -0
  100. package/dist/enrich/pdf-video-handling.d.ts +49 -0
  101. package/dist/enrich/pdf-video-handling.d.ts.map +1 -0
  102. package/dist/enrich/pdf-video-handling.js +325 -0
  103. package/dist/enrich/pdf-video-handling.js.map +1 -0
  104. package/dist/enrich/progress-tracker.d.ts +120 -0
  105. package/dist/enrich/progress-tracker.d.ts.map +1 -0
  106. package/dist/enrich/progress-tracker.js +220 -0
  107. package/dist/enrich/progress-tracker.js.map +1 -0
  108. package/dist/enrich/providers/firecrawl.d.ts +18 -0
  109. package/dist/enrich/providers/firecrawl.d.ts.map +1 -0
  110. package/dist/enrich/providers/firecrawl.js +48 -0
  111. package/dist/enrich/providers/firecrawl.js.map +1 -0
  112. package/dist/enrich/providers/generic.d.ts +16 -0
  113. package/dist/enrich/providers/generic.d.ts.map +1 -0
  114. package/dist/enrich/providers/generic.js +36 -0
  115. package/dist/enrich/providers/generic.js.map +1 -0
  116. package/dist/enrich/providers/index.d.ts +14 -0
  117. package/dist/enrich/providers/index.d.ts.map +1 -0
  118. package/dist/enrich/providers/index.js +13 -0
  119. package/dist/enrich/providers/index.js.map +1 -0
  120. package/dist/enrich/providers/instagram.d.ts +16 -0
  121. package/dist/enrich/providers/instagram.d.ts.map +1 -0
  122. package/dist/enrich/providers/instagram.js +43 -0
  123. package/dist/enrich/providers/instagram.js.map +1 -0
  124. package/dist/enrich/providers/spotify.d.ts +16 -0
  125. package/dist/enrich/providers/spotify.d.ts.map +1 -0
  126. package/dist/enrich/providers/spotify.js +45 -0
  127. package/dist/enrich/providers/spotify.js.map +1 -0
  128. package/dist/enrich/providers/twitter.d.ts +16 -0
  129. package/dist/enrich/providers/twitter.d.ts.map +1 -0
  130. package/dist/enrich/providers/twitter.js +43 -0
  131. package/dist/enrich/providers/twitter.js.map +1 -0
  132. package/dist/enrich/providers/types.d.ts +47 -0
  133. package/dist/enrich/providers/types.d.ts.map +1 -0
  134. package/dist/enrich/providers/types.js +15 -0
  135. package/dist/enrich/providers/types.js.map +1 -0
  136. package/dist/enrich/providers/youtube.d.ts +16 -0
  137. package/dist/enrich/providers/youtube.d.ts.map +1 -0
  138. package/dist/enrich/providers/youtube.js +43 -0
  139. package/dist/enrich/providers/youtube.js.map +1 -0
  140. package/dist/enrich/rate-limiting.d.ts +118 -0
  141. package/dist/enrich/rate-limiting.d.ts.map +1 -0
  142. package/dist/enrich/rate-limiting.js +258 -0
  143. package/dist/enrich/rate-limiting.js.map +1 -0
  144. package/dist/index.d.ts +688 -0
  145. package/dist/index.d.ts.map +1 -0
  146. package/dist/index.js +1729 -0
  147. package/dist/index.js.map +1 -0
  148. package/dist/ingest/dedup-merge.d.ts +82 -0
  149. package/dist/ingest/dedup-merge.d.ts.map +1 -0
  150. package/dist/ingest/dedup-merge.js +262 -0
  151. package/dist/ingest/dedup-merge.js.map +1 -0
  152. package/dist/ingest/ingest-csv.d.ts +62 -0
  153. package/dist/ingest/ingest-csv.d.ts.map +1 -0
  154. package/dist/ingest/ingest-csv.js +300 -0
  155. package/dist/ingest/ingest-csv.js.map +1 -0
  156. package/dist/ingest/ingest-db.d.ts +64 -0
  157. package/dist/ingest/ingest-db.d.ts.map +1 -0
  158. package/dist/ingest/ingest-db.js +172 -0
  159. package/dist/ingest/ingest-db.js.map +1 -0
  160. package/dist/ingest/link-replies-and-tapbacks.d.ts +53 -0
  161. package/dist/ingest/link-replies-and-tapbacks.d.ts.map +1 -0
  162. package/dist/ingest/link-replies-and-tapbacks.js +381 -0
  163. package/dist/ingest/link-replies-and-tapbacks.js.map +1 -0
  164. package/dist/normalize/date-converters.d.ts +45 -0
  165. package/dist/normalize/date-converters.d.ts.map +1 -0
  166. package/dist/normalize/date-converters.js +166 -0
  167. package/dist/normalize/date-converters.js.map +1 -0
  168. package/dist/normalize/path-validator.d.ts +65 -0
  169. package/dist/normalize/path-validator.d.ts.map +1 -0
  170. package/dist/normalize/path-validator.js +221 -0
  171. package/dist/normalize/path-validator.js.map +1 -0
  172. package/dist/normalize/validate-normalized.d.ts +45 -0
  173. package/dist/normalize/validate-normalized.d.ts.map +1 -0
  174. package/dist/normalize/validate-normalized.js +144 -0
  175. package/dist/normalize/validate-normalized.js.map +1 -0
  176. package/dist/render/embeds-blockquotes.d.ts +84 -0
  177. package/dist/render/embeds-blockquotes.d.ts.map +1 -0
  178. package/dist/render/embeds-blockquotes.js +204 -0
  179. package/dist/render/embeds-blockquotes.js.map +1 -0
  180. package/dist/render/grouping.d.ts +78 -0
  181. package/dist/render/grouping.d.ts.map +1 -0
  182. package/dist/render/grouping.js +134 -0
  183. package/dist/render/grouping.js.map +1 -0
  184. package/dist/render/index.d.ts +47 -0
  185. package/dist/render/index.d.ts.map +1 -0
  186. package/dist/render/index.js +245 -0
  187. package/dist/render/index.js.map +1 -0
  188. package/dist/render/reply-rendering.d.ts +88 -0
  189. package/dist/render/reply-rendering.d.ts.map +1 -0
  190. package/dist/render/reply-rendering.js +196 -0
  191. package/dist/render/reply-rendering.js.map +1 -0
  192. package/dist/schema/message.d.ts +125 -0
  193. package/dist/schema/message.d.ts.map +1 -0
  194. package/dist/schema/message.js +331 -0
  195. package/dist/schema/message.js.map +1 -0
  196. package/dist/utils/delta-detection.d.ts +107 -0
  197. package/dist/utils/delta-detection.d.ts.map +1 -0
  198. package/dist/utils/delta-detection.js +199 -0
  199. package/dist/utils/delta-detection.js.map +1 -0
  200. package/dist/utils/enrichment-merge.d.ts +135 -0
  201. package/dist/utils/enrichment-merge.d.ts.map +1 -0
  202. package/dist/utils/enrichment-merge.js +280 -0
  203. package/dist/utils/enrichment-merge.js.map +1 -0
  204. package/dist/utils/human.d.ts +15 -0
  205. package/dist/utils/human.d.ts.map +1 -0
  206. package/dist/utils/human.js +27 -0
  207. package/dist/utils/human.js.map +1 -0
  208. package/dist/utils/incremental-state.d.ts +133 -0
  209. package/dist/utils/incremental-state.d.ts.map +1 -0
  210. package/dist/utils/incremental-state.js +237 -0
  211. package/dist/utils/incremental-state.js.map +1 -0
  212. package/dist/utils/logger.d.ts +40 -0
  213. package/dist/utils/logger.d.ts.map +1 -0
  214. package/dist/utils/logger.js +176 -0
  215. package/dist/utils/logger.js.map +1 -0
  216. package/package.json +165 -0
package/README.md ADDED
@@ -0,0 +1,1535 @@
1
+ # Chatline
2
+
3
+ > Extract, enrich, and render your iMessage conversations into beautiful,
4
+ > AI-powered markdown timelines with full conversation threading and deep media
5
+ > analysis.
6
+
7
+ [![TypeScript](https://img.shields.io/badge/TypeScript-5.9+-3178c6?logo=typescript)](https://www.typescriptlang.org/)
8
+ [![Node.js](https://img.shields.io/badge/Node.js-22%2B-339933?logo=nodedotjs)](https://nodejs.org/)
9
+ [![License](https://img.shields.io/badge/License-MIT-blue)](#license)
10
+ [![CI](https://github.com/nathanvale/chatline/actions/workflows/pr-quality.yml/badge.svg?branch=main)](https://github.com/nathanvale/chatline/actions/workflows/pr-quality.yml)
11
+ [![CodeQL](https://github.com/nathanvale/chatline/actions/workflows/codeql.yml/badge.svg?branch=main)](https://github.com/nathanvale/chatline/actions/workflows/codeql.yml)
12
+ [![Tests](https://img.shields.io/badge/Tests-50%2B-brightgreen)](#testing)
13
+ [![Coverage](https://img.shields.io/badge/Coverage-70%25%2B-brightgreen)](#testing)
14
+
15
+ ## Overview
16
+
17
+ **Chatline** is a sophisticated data pipeline that transforms your
18
+ iMessage conversations into searchable, enriched markdown timelines. It
19
+ intelligently extracts messages from multiple sources (iMazing CSV exports,
20
+ macOS Messages.app SQLite database), deduplicates and links replies/reactions,
21
+ enriches with AI-powered analysis (image descriptions, audio transcription, link
22
+ summaries), and generates deterministic markdown files organized by date and
23
+ time-of-day.
24
+
25
+ Perfect for creating browsable conversation archives, enriched research notes,
26
+ or personal history exports.
27
+
28
+ ## 📚 Documentation
29
+
30
+ **Full documentation available at:**
31
+ [https://nathanvale.github.io/chatline/](https://nathanvale.github.io/chatline/)
32
+
33
+ The documentation site includes:
34
+
35
+ - Getting Started Guide
36
+ - CLI Usage
37
+ - Pipeline Technical Specifications
38
+ - Best Practices
39
+ - Release Management
40
+ - Troubleshooting
41
+
42
+ ### Automated Quality & Security
43
+
44
+ This repository is continuously checked by:
45
+
46
+ - **PR quality** workflow: lint, typecheck, full Vitest suite + V8 coverage,
47
+ delta quality checks
48
+ - **CodeQL**: static analysis for code vulnerabilities (JS/TS)
49
+ - **OSV Scanner**: open source vulnerability scanning of dependencies
50
+ - **Workflow Lint**: actionlint validation of GitHub Actions syntax
51
+ - **Dependency Review**: flags risky transitive additions on PRs
52
+ - **Package Hygiene**: build integrity (`publint`, type checks, dry pack)
53
+ - **Renovate**: automated npm/pnpm dependency maintenance (grouped, safe
54
+ automerge for dev minor/patch)
55
+ - **Dependabot**: weekly GitHub Actions version bumps only
56
+
57
+ Badges surface current status; failing checks block merges, keeping the
58
+ pipeline high-signal and low-noise.
59
+
60
+ ### Releases and automation
61
+
62
+ - Zero‑touch, canonical Changesets flow: CI opens a "Version Packages" PR,
63
+ auto‑merges it after required checks, publishes to npm (with tags), and
64
+ creates GitHub Releases with SBOM.
65
+ - Pre‑releases: enter/exit beta/rc via a workflow; CI publishes -beta.N/-rc.N to
66
+ the matching npm dist‑tag.
67
+ - Nightly snapshots (alpha): gated to run only while pre‑mode is active to avoid
68
+ confusion with stable releases.
69
+
70
+ #### Publishing and NPM_TOKEN
71
+
72
+ This repo uses Changesets to open a "Version Packages" PR and publish on merge.
73
+
74
+ - If an npm token is configured (repository secret `NPM_TOKEN`):
75
+ - The Changesets workflow authenticates to npm and runs `pnpm release`
76
+ - A GitHub Release is created for the tagged version
77
+ - If no npm token is configured:
78
+ - The workflow does NOT fail main
79
+ - It surfaces a clear warning in the logs and Job Summary:
80
+ - "NPM_TOKEN not set; skipping publish. Configure repository secret
81
+ 'NPM_TOKEN' to enable publishing."
82
+ - Versioning PR behavior still works; only publish is skipped
83
+
84
+ How to enable publishing:
85
+
86
+ 1. Create an npm access token with publish rights
87
+ 2. In GitHub → Settings → Secrets and variables → Actions → New repository
88
+ secret
89
+ 3. Name: `NPM_TOKEN`, Value: your token
90
+ 4. Re-run the "Changesets Manage & Publish" workflow on `main`
91
+
92
+ Tip: You can trigger the workflow manually from the Actions tab or by:
93
+
94
+ ```bash
95
+ gh workflow run "Changesets Manage & Publish" -r main
96
+ ```
97
+
98
+ Note: The workflow also posts an annotation and adds a short section to the Job
99
+ Summary whenever publish is skipped so it's visible in the GitHub UI without
100
+ failing the pipeline.
101
+
102
+ Docs:
103
+
104
+ - Canonical flow: `docs/releases/changesets-canonical.md`
105
+ - Branch protection policy: `docs/branch-protection-policy.md`
106
+ - CI standards: `docs/ci-workflow-standards.md`
107
+
108
+ ### CI performance (optional)
109
+
110
+ - setup-node already caches pnpm. If installs become a bottleneck, consider
111
+ adding a pnpm store cache step to PR quality (restore/save `.pnpm-store`). We
112
+ can wire this later once baseline run times are known.
113
+
114
+ ## CLI
115
+
116
+ The project publishes a CLI executable `chatline` and provides a fast
117
+ Bun-powered development loop.
118
+
119
+ - Dev (TypeScript direct): `pnpm dev -- --help`
120
+ - Built dist run: `pnpm cli -- --help`
121
+ - Installed (after publish): `chatline --help`
122
+
123
+ Docs:
124
+
125
+ - Detailed usage: `docs/cli-usage.md`
126
+ - Bun script rationale: `docs/bun-script-best-practices.md`
127
+
128
+ Pass arguments after `--` when using `pnpm dev` or `pnpm cli`.
129
+
130
+ ### Key Features
131
+
132
+ - **Multiple Sources**: Ingest from iMazing CSV exports and macOS Messages.app
133
+ database
134
+ - **Intelligent Linking**: Automatically link replies to parents and associate
135
+ emoji reactions (tapbacks)
136
+ - **Smart Deduplication**: Merge CSV/DB sources with GUID matching and content
137
+ equivalence detection
138
+ - **AI Enrichment**:
139
+ - Image analysis (HEIC/TIFF→JPG previews + Gemini Vision captions)
140
+ - Audio transcription (with speaker labels and timestamps)
141
+ - PDF summarization
142
+ - Link context extraction (Firecrawl + provider fallbacks)
143
+ - **Resumable Processing**: Checkpoint support for crash recovery and
144
+ incremental enrichment for processing only new messages
145
+ - **Deterministic Output**: Identical input always produces identical markdown
146
+ (reproducible pipelines)
147
+ - **Privacy-First**: Local-only mode, no API key persistence, full data control
148
+ - **Conversation Threading**: Nested replies and tapbacks rendered as readable
149
+ blockquotes
150
+ - **Type-Safe**: 100% TypeScript with Zod schema validation
151
+
152
+ ## For End Users
153
+
154
+ ### Installation & Setup
155
+
156
+ #### Prerequisites
157
+
158
+ - **Node.js** 22.20+
159
+ - **macOS** (for database export; CSV import works on any OS)
160
+ - **Gemini API Key** (for AI enrichment, get free at
161
+ https://aistudio.google.com)
162
+ - **Firecrawl API Key** (optional, for link enrichment, get at
163
+ https://www.firecrawl.dev)
164
+
165
+ #### Install Global CLI
166
+
167
+ ```bash
168
+ npm install -g @nathanvale/chatline
169
+ ```
170
+
171
+ This installs the `chatline` command globally, available from any
172
+ directory.
173
+
174
+ #### Environment Setup
175
+
176
+ Create a `.env` file in your working directory:
177
+
178
+ ```bash
179
+ GEMINI_API_KEY=your-api-key-here
180
+ FIRECRAWL_API_KEY=your-api-key-here
181
+ ```
182
+
183
+ Or export them in your shell:
184
+
185
+ ```bash
186
+ export GEMINI_API_KEY=your-api-key-here
187
+ export FIRECRAWL_API_KEY=your-api-key-here
188
+ ```
189
+
190
+ #### Verify Installation
191
+
192
+ ```bash
193
+ chatline doctor
194
+ ```
195
+
196
+ The output should show all checks passing.
197
+
198
+ ### Quick Start (Consumer)
199
+
200
+ ```bash
201
+ # Initialize config (creates imessage-config.yaml)
202
+ chatline init
203
+
204
+ # Ingest CSV export from iMazing
205
+ chatline ingest-csv -i messages.csv -o messages.csv.ingested.json
206
+
207
+ # Ingest from macOS Messages.app database
208
+ chatline ingest-db -i db-export.json -o messages.db.ingested.json
209
+
210
+ # Normalize and link messages (merge sources, deduplicate, link replies)
211
+ chatline normalize-link \
212
+ -i messages.csv.ingested.json messages.db.ingested.json \
213
+ -o messages.normalized.json
214
+
215
+ # Enrich with AI (images, audio, links)
216
+ chatline enrich-ai \
217
+ -i messages.normalized.json \
218
+ -o messages.enriched.json \
219
+ --enable-vision --enable-audio --enable-links
220
+
221
+ # Render to markdown
222
+ chatline render-markdown \
223
+ -i messages.enriched.json \
224
+ -o ./timeline
225
+ ```
226
+
227
+ Output: A `timeline/` directory with daily markdown files, one per date.
228
+
229
+ ## Library Usage (Programmatic API)
230
+
231
+ ### Installation as Dependency
232
+
233
+ Install as a library in your Node.js/TypeScript project:
234
+
235
+ ```bash
236
+ npm install @nathanvale/chatline
237
+ # or
238
+ pnpm add @nathanvale/chatline
239
+ # or
240
+ yarn add @nathanvale/chatline
241
+ ```
242
+
243
+ ### TypeScript/JavaScript Import
244
+
245
+ ```typescript
246
+ import {
247
+ // Config Management
248
+ loadConfig,
249
+ generateConfigContent,
250
+ validateConfig,
251
+
252
+ // Ingest Functions
253
+ ingestCSV,
254
+ dedupAndMerge,
255
+
256
+ // Utilities
257
+ detectDelta,
258
+ mergeEnrichments,
259
+
260
+ // Rate Limiting
261
+ createRateLimiter,
262
+
263
+ // Types
264
+ type Message,
265
+ type Config,
266
+ type DeltaResult,
267
+ } from '@nathanvale/chatline'
268
+ ```
269
+
270
+ ### Example: Load and Validate Config
271
+
272
+ ```typescript
273
+ import { loadConfig, validateConfig } from '@nathanvale/chatline'
274
+
275
+ // Load config with auto-discovery (looks for imessage-config.yaml/json)
276
+ const config = await loadConfig()
277
+
278
+ // Load specific config file
279
+ const customConfig = await loadConfig({ configPath: './custom-config.yaml' })
280
+
281
+ // Validate existing config object
282
+ const validated = validateConfig({
283
+ gemini: { apiKey: 'your-key' },
284
+ inputs: { csv: ['messages.csv'] },
285
+ })
286
+ ```
287
+
288
+ ### Example: Ingest Messages from CSV
289
+
290
+ ```typescript
291
+ import { ingestCSV, createExportEnvelope } from '@nathanvale/chatline'
292
+ import type { Message, IngestOptions } from '@nathanvale/chatline'
293
+
294
+ const options: IngestOptions = {
295
+ attachmentDir: '/path/to/attachments',
296
+ strictMode: false,
297
+ }
298
+
299
+ const messages: Message[] = ingestCSV('./messages.csv', options)
300
+
301
+ // Wrap in export envelope
302
+ const envelope = createExportEnvelope(messages)
303
+ console.log(`Ingested ${envelope.totalMessages} messages`)
304
+ ```
305
+
306
+ ### Example: Deduplicate and Merge Sources
307
+
308
+ ```typescript
309
+ import fs from 'node:fs'
310
+ import { dedupAndMerge, ingestCSV, type Message } from '@nathanvale/chatline'
311
+
312
+ const csvMessages: Message[] = ingestCSV('./messages.csv', options)
313
+ const dbMessages: Message[] = JSON.parse(
314
+ fs.readFileSync('./db-export.json', 'utf-8'),
315
+ )
316
+
317
+ const result = dedupAndMerge(csvMessages, dbMessages)
318
+
319
+ console.log(`Merged ${result.mergedCount} messages`)
320
+ console.log(`Found ${result.stats.exactMatches} exact matches`)
321
+ console.log(`Deduped ${result.stats.duplicatesRemoved} duplicates`)
322
+ ```
323
+
324
+ ### Example: Detect New Messages (Incremental Processing)
325
+
326
+ ```typescript
327
+ import { detectDelta, extractGuidsFromMessages } from '@nathanvale/chatline'
328
+ import type { Message, DeltaResult } from '@nathanvale/chatline'
329
+
330
+ const currentMessages: Message[] = loadCurrentMessages()
331
+ const previousMessages: Message[] = loadPreviousCheckpoint()
332
+
333
+ const delta: DeltaResult = detectDelta(currentMessages, previousMessages)
334
+
335
+ console.log(`New messages: ${delta.new.length}`)
336
+ console.log(`Modified messages: ${delta.changed.length}`)
337
+ console.log(`Removed messages: ${delta.removed.length}`)
338
+
339
+ // Process only new messages
340
+ const newGuids = extractGuidsFromMessages(delta.new)
341
+ await enrichOnlyNew(newGuids)
342
+ ```
343
+
344
+ ### Example: Rate Limiting for API Calls
345
+
346
+ ```typescript
347
+ import { createRateLimiter } from '@nathanvale/chatline'
348
+ import type { RateLimitConfig } from '@nathanvale/chatline'
349
+
350
+ const limiter = createRateLimiter({
351
+ requestsPerSecond: 10,
352
+ maxRetries: 3,
353
+ retryDelayMs: 1000,
354
+ })
355
+
356
+ // Use with fetch or any async API call
357
+ const response = await limiter.execute(async () => {
358
+ return fetch('https://api.example.com/data')
359
+ })
360
+
361
+ console.log(`Status: ${response.status}`)
362
+ ```
363
+
364
+ ### Example: Generate Config Programmatically
365
+
366
+ ```typescript
367
+ import { generateConfigContent, getDefaultConfigPath } from '@nathanvale/chatline'
368
+ import fs from 'node:fs/promises'
369
+
370
+ // Generate YAML config with defaults
371
+ const yamlContent = generateConfigContent('yaml')
372
+ const configPath = getDefaultConfigPath('yaml')
373
+
374
+ await fs.writeFile(configPath, yamlContent, 'utf-8')
375
+ console.log(`Config written to ${configPath}`)
376
+
377
+ // Generate JSON config
378
+ const jsonContent = generateConfigContent('json')
379
+ await fs.writeFile('./imessage-config.json', jsonContent, 'utf-8')
380
+ ```
381
+
382
+ ### TypeScript Type Definitions
383
+
384
+ The package includes full TypeScript definitions for all exports:
385
+
386
+ ```typescript
387
+ import type {
388
+ // Core message types
389
+ Message,
390
+ MessageCore,
391
+ MediaMeta,
392
+ MediaEnrichment,
393
+ ReplyInfo,
394
+ TapbackInfo,
395
+
396
+ // Config types
397
+ Config,
398
+ ConfigFormat,
399
+
400
+ // Utility types
401
+ DeltaResult,
402
+ MergeStats,
403
+ IngestMergeResult,
404
+ EnrichmentMergeResult,
405
+
406
+ // Rate limiting types
407
+ RateLimitConfig,
408
+ RateLimitState,
409
+ ApiResponse,
410
+ } from '@nathanvale/chatline'
411
+ ```
412
+
413
+ ### Advanced: Custom Pipeline
414
+
415
+ ```typescript
416
+ import {
417
+ loadConfig,
418
+ ingestCSV,
419
+ dedupAndMerge,
420
+ detectDelta,
421
+ mergeEnrichments,
422
+ createRateLimiter,
423
+ } from '@nathanvale/chatline'
424
+ import type { Message, Config } from '@nathanvale/chatline'
425
+ import fs from 'node:fs/promises'
426
+ async function runCustomPipeline() {
427
+ // 1. Load configuration
428
+ const config: Config = await loadConfig()
429
+
430
+ // 2. Ingest from multiple sources
431
+ const csvMessages = ingestCSV(config.inputs.csv[0], {
432
+ attachmentDir: config.paths?.attachmentRoot,
433
+ strictMode: false,
434
+ })
435
+
436
+ const dbMessages = JSON.parse(await fs.readFile(config.inputs.db, 'utf-8'))
437
+
438
+ // 3. Merge and deduplicate
439
+ const merged = dedupAndMerge(csvMessages, dbMessages)
440
+ console.log(`Merged to ${merged.mergedCount} unique messages`)
441
+
442
+ // 4. Detect changes since last run
443
+ const previous = await loadPreviousState()
444
+ const delta = detectDelta(merged.messages, previous)
445
+
446
+ // 5. Enrich only new messages with rate limiting
447
+ const limiter = createRateLimiter({ requestsPerSecond: 5 })
448
+
449
+ for (const message of delta.new) {
450
+ if (message.media?.kind === 'image') {
451
+ const enrichment = await limiter.execute(() =>
452
+ enrichImageWithGemini(message),
453
+ )
454
+ message.media.enrichment = enrichment
455
+ }
456
+ }
457
+
458
+ // 6. Merge enrichments back into full dataset
459
+ const enriched = mergeEnrichments(merged.messages, delta.new)
460
+
461
+ // 7. Save checkpoint
462
+ await saveState(enriched.messages)
463
+
464
+ return enriched
465
+ }
466
+ ```
467
+
468
+ ### See Also
469
+
470
+ - **CLI Usage**: See `docs/cli-usage.md` for command-line interface examples
471
+ - **Dual Distribution**: See `docs/dual-mode-distribution-best-practices.md` for
472
+ packaging details
473
+ - **API Documentation**: See generated TypeDoc output (coming soon)
474
+
475
+ ## For Developers
476
+
477
+ ### Development Setup
478
+
479
+ #### Clone & Install
480
+
481
+ ```bash
482
+ # Clone the repository
483
+ git clone https://github.com/nathanvale/chatline.git
484
+ cd chatline
485
+
486
+ # Install dependencies
487
+ pnpm install
488
+
489
+ # Build TypeScript
490
+ pnpm build
491
+ ```
492
+
493
+ #### Local CLI Development
494
+
495
+ During development, run the CLI directly from TypeScript via Bun, or run from
496
+ the built `dist` output:
497
+
498
+ ```bash
499
+ # Fast dev (TypeScript direct via Bun)
500
+ pnpm dev -- --help
501
+ pnpm dev -- --config examples/imessage-config.yaml
502
+
503
+ # Or build, then run individual commands from dist
504
+ pnpm build
505
+ pnpm cli doctor
506
+ pnpm cli ingest-csv -i messages.csv -o output.json
507
+
508
+ # Watch mode for development (typecheck/build)
509
+ pnpm watch
510
+ ```
511
+
512
+ #### Running Tests
513
+
514
+ ```bash
515
+ # Run all tests
516
+ pnpm test
517
+
518
+ # Watch mode
519
+ pnpm test:watch
520
+
521
+ # With UI
522
+ pnpm test:ui
523
+
524
+ # Coverage report
525
+ pnpm coverage
526
+ ```
527
+
528
+ #### Code Quality
529
+
530
+ ```bash
531
+ # Lint code
532
+ pnpm lint
533
+ pnpm lint:fix
534
+
535
+ # Format code
536
+ pnpm format
537
+
538
+ # Run quality checks (pre-commit hook)
539
+ pnpm quality-check
540
+ ```
541
+
542
+ ### Bun-powered dev and tooling
543
+
544
+ This project keeps pnpm as the package manager and Vitest on Node for stable
545
+ tests, while using Bun for fast local development and tooling:
546
+
547
+ - Primary (CI/stable): `pnpm build`, `pnpm test`, `pnpm test:ci`
548
+ - Local convenience (Bun):
549
+ - `pnpm dev` – run the CLI from TypeScript via Bun
550
+ - `pnpm typecheck` – no-emit typechecking via `bunx tsc`
551
+ - `pnpm lint` / `pnpm lint:fix` – via `bunx eslint`
552
+ - `pnpm format` – via `bunx prettier`
553
+
554
+ Notes:
555
+
556
+ - Native addons (sharp, better-sqlite3) may build from source under Bun; Node
557
+ remains the primary test substrate.
558
+ - Firecrawl SDK is compatible with Bun; the MCP server and tests remain on Node.
559
+
560
+ ## Architecture
561
+
562
+ The pipeline follows a strict **4-stage architecture** with clear separation of
563
+ concerns:
564
+
565
+ ```
566
+ CSV/DB Exports
567
+
568
+ ├─────────────────┬──────────────────┐
569
+ ▼ ▼ ▼
570
+ Ingest-CSV Ingest-DB (Other sources)
571
+ │ │ │
572
+ └─────────────────┼──────────────────┘
573
+
574
+ [Stage 1: Ingest]
575
+ Parse & normalize
576
+
577
+ messages.*.ingested.json
578
+
579
+
580
+ [Stage 2: Normalize-Link]
581
+ Deduplicate, link replies/tapbacks
582
+
583
+ messages.normalized.json
584
+
585
+
586
+ [Stage 3: Enrich-AI] ◄── Resumable & Incremental
587
+ Add AI enrichments
588
+
589
+ messages.enriched.json
590
+
591
+
592
+ [Stage 4: Render-Markdown]
593
+ Generate daily files
594
+
595
+ timeline/*.md (output)
596
+ ```
597
+
598
+ ### Stage 1: Ingest
599
+
600
+ Extracts messages from CSV or SQLite database and normalizes to a unified
601
+ schema.
602
+
603
+ **Responsibilities:**
604
+
605
+ - Parse rows with field mapping (handle CSV/DB dialect differences)
606
+ - Convert dates (CSV UTC → ISO 8601, Apple epoch → ISO 8601)
607
+ - Split rows into `text`/`media`/`notification`/`tapback` messages
608
+ - Resolve attachment paths to absolute paths when possible
609
+ - Create stable part GUIDs for multi-attachment DB messages:
610
+ `p:<index>/<original_guid>`
611
+ - Preserve source metadata (CSV vs DB origin)
612
+
613
+ **Input:** iMazing CSV or Messages.app SQLite database
614
+ **Output:** Normalized `Message[]` in JSON envelope with metadata
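
The date conversion and part-GUID steps listed for this stage can be sketched as follows. This is an illustration only, not the package's own code: `appleEpochToIso` and `partGuid` are hypothetical names, and the rule that values above `1e12` are nanoseconds (and smaller values are seconds) is an assumption about how the raw Apple-epoch value is stored.

```typescript
// Seconds between the Unix epoch (1970-01-01) and the Apple epoch (2001-01-01), both UTC.
const APPLE_EPOCH_OFFSET_SECONDS = 978307200

// Hypothetical helper: converts an Apple-epoch timestamp to an ISO 8601 string.
// Assumption: values above 1e12 are nanoseconds, anything smaller is seconds.
function appleEpochToIso(value: number): string {
  const seconds = value > 1e12 ? value / 1e9 : value
  return new Date((seconds + APPLE_EPOCH_OFFSET_SECONDS) * 1000).toISOString()
}

// Hypothetical helper: builds a part GUID of the form p:<index>/<original_guid>.
function partGuid(index: number, originalGuid: string): string {
  return `p:${index}/${originalGuid}`
}
```

Because the part GUID depends only on the attachment index and the original GUID, re-ingesting the same database row yields the same identifiers.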
615
+
616
+ ### Stage 2: Normalize-Link
617
+
618
+ Merges multiple sources, deduplicates, links replies/tapbacks, and validates
619
+ schema.
620
+
621
+ **Responsibilities:**
622
+
623
+ - Link replies to parents:
624
+ - Primary: DB `association_guid` (database-native association)
625
+ - Fallback: Heuristics (±30s timestamp proximity, text similarity, sender
626
+ difference)
627
+ - Link tapbacks (emoji reactions) to message parts
628
+ - Deduplicate across CSV/DB sources:
629
+ - Exact GUID matching (primary)
630
+ - Content equivalence (fuzzy text match, same sender, same timestamp)
631
+ - Prefer DB-sourced data in conflicts (DB is authoritative for timestamps,
632
+ handles, etc.)
633
+ - Enforce schema via Zod validation (camelCase, type correctness)
634
+
635
+ **Algorithm Complexity:** O(n log n) for deduplication with GUID indexing
636
+
637
+ **Input:** One or both ingest outputs **Output:** Merged, deduplicated, linked
638
+ `messages.normalized.json`
639
+
640
+ ### Stage 3: Enrich-AI
641
+
642
+ Augments messages with AI-powered analysis. Fully resumable and idempotent.
643
+
644
+ **Responsibilities:**
645
+
646
+ - Image analysis:
647
+ - Convert HEIC/TIFF to JPG preview (cached by filename)
648
+ - Gemini Vision API: structured prompt for caption + summary
649
+ - Audio transcription:
650
+ - Structured prompt requesting timestamps and speaker labels
651
+ - Handles long audio with chunking (streaming for >10min files)
652
+ - PDF summarization (key points extraction)
653
+ - Link enrichment:
654
+ - Firecrawl for full web scraping
655
+ - Provider-specific fallbacks (YouTube, Spotify, Twitter, Instagram)
656
+ - Generic HTML meta tag fallback
657
+ - Graceful degradation (never crashes, stores error in enrichment)
658
+ - Idempotent processing (skip if enrichment kind already exists)
659
+ - Checkpointing (save progress every N items)
660
+ - Resumable (load checkpoint, verify config hash, continue from last index)
661
+ - Rate limiting (jittered backoff for API limits)
662
+ - Incremental mode (process only new message GUIDs vs prior state)
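
As a rough illustration of the jittered backoff mentioned above, the sketch below uses the common "full jitter" scheme. The function names, the 30-second cap, and the doubling factor are assumptions; the README does not specify the exact formula the package uses.

```typescript
// Exponential backoff with "full jitter": the delay is drawn uniformly from
// [0, min(maxMs, baseMs * 2^attempt)). Names and defaults are hypothetical.
function backoffDelayMs(
  attempt: number,
  baseMs: number = 1000,
  maxMs: number = 30000,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(maxMs, baseMs * Math.pow(2, attempt))
  return Math.floor(random() * ceiling)
}

// Delays for each retry of one failed call, e.g. with `--max-retries 3`.
function retrySchedule(
  maxRetries: number,
  random: () => number = Math.random,
): number[] {
  const delays: number[] = []
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    delays.push(backoffDelayMs(attempt, 1000, 30000, random))
  }
  return delays
}
```

Passing the random source as a parameter keeps the schedule testable; in normal use the default `Math.random` spreads retries so that concurrent clients do not hit the API in lockstep.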
663
+
664
+ **Idempotency Key:** `(message.media.id, enrichment.kind)`
665
+
666
+ **Input:** `messages.normalized.json`, optional checkpoint/state files
667
+ **Output:** `messages.enriched.json` with populated `media.enrichment[]` arrays
668
+
669
+ ### Stage 4: Render-Markdown
670
+
671
+ Generates deterministic daily markdown files organized by date and time-of-day.
672
+
673
+ **Responsibilities:**
674
+
675
+ - Group messages by calendar date
676
+ - Sub-group by time-of-day sections (Morning 00:00-11:59, Afternoon 12:00-17:59,
677
+ Evening 18:00-23:59)
678
+ - Render each message with:
679
+ - Timestamp anchor for deep linking
680
+ - Sender name / "Me" indicator
681
+ - Message text or media preview
682
+ - Enrichments (image captions, transcriptions, link contexts) as formatted
683
+ blockquotes
684
+ - Render replies as nested blockquotes (up to configurable depth)
685
+ - Render tapbacks as emoji reactions (❤️ for "loved", etc.)
686
+ - Deterministic sorting by `(date, guid)` for reproducibility
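
The grouping and ordering rules above could be sketched like this. The helper names are hypothetical, and the sketch assumes every `date` uses the same fixed-width ISO 8601 UTC format, so plain string comparison matches chronological order.

```typescript
type Section = 'Morning' | 'Afternoon' | 'Evening'
type Renderable = { guid: string; date: string }

// Map an ISO 8601 UTC timestamp to the time-of-day ranges listed above.
function timeOfDaySection(isoDate: string): Section {
  const hour = new Date(isoDate).getUTCHours()
  if (hour < 12) return 'Morning'
  if (hour < 18) return 'Afternoon'
  return 'Evening'
}

function compareText(a: string, b: string): number {
  return a < b ? -1 : a > b ? 1 : 0
}

// Deterministic order: by date first, then by guid to break ties.
function sortForRendering<T extends Renderable>(messages: T[]): T[] {
  return messages
    .slice()
    .sort((a, b) => compareText(a.date, b.date) || compareText(a.guid, b.guid))
}

// Group by UTC calendar date, i.e. the `YYYY-MM-DD` prefix of the timestamp.
function groupByDate<T extends Renderable>(messages: T[]): Map<string, T[]> {
  const groups = new Map<string, T[]>()
  for (const message of sortForRendering(messages)) {
    const day = message.date.slice(0, 10)
    const bucket = groups.get(day)
    if (bucket) bucket.push(message)
    else groups.set(day, [message])
  }
  return groups
}
```

Sorting a copy rather than the input array keeps the function free of side effects, which makes repeated renders easier to compare.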
687
+
688
+ **Determinism:** Identical input → identical output. No randomization, stable
689
+ key ordering.
690
+
691
+ **Input:** `messages.enriched.json` **Output:** Daily markdown files
692
+ (`timeline/YYYY-MM-DD.md`)
693
+
694
+ ## Message Schema
695
+
696
+ The unified `Message` type represents all message kinds with a discriminated
697
+ union:
698
+
699
+ ```typescript
700
+ type Message = {
701
+   guid: string // Unique identifier
702
+   messageKind: 'text' | 'media' | 'tapback' | 'notification'
703
+   date: string // ISO 8601 with Z suffix (UTC)
704
+   isFromMe: boolean
705
+
706
+   // Optional fields by kind
707
+   text?: string // For text/notification messages
708
+   media?: MediaMeta // For media messages (see below)
709
+   tapback?: TapbackInfo // For tapback messages
710
+
711
+   // Linking
712
+   replyingTo?: ReplyInfo // Links to parent message GUID
713
+
714
+   // Metadata
715
+   service: string // SMS, iMessage, etc.
716
+   handle?: string // Phone number or Apple ID
717
+   senderName?: string // Display name
718
+   groupGuid?: string // For split messages, original DB GUID
719
+
720
+   // Preservation fields
721
+   subject?: string
722
+   isAudioMessage?: boolean
723
+   isDeleted?: boolean
724
+
725
+   // Provenance
726
+   sourceType?: 'csv' | 'db'
727
+   sourceMetadata?: Record<string, unknown>
728
+ }
729
+
730
+ type MediaMeta = {
731
+   id: string // Unique media ID
732
+   type: 'image' | 'audio' | 'pdf' | 'video' | 'document'
733
+   filename?: string
734
+   path?: string // Absolute path if file exists
735
+   mimeType?: string
736
+   size?: number
737
+   duration?: number // For audio/video in seconds
738
+   enrichment?: MediaEnrichment[] // AI analysis results
739
+   provenance?: {
740
+     originalPath?: string
741
+     source: 'csv' | 'db'
742
+     lastSeen?: string
743
+   }
744
+ }
745
+
746
+ type MediaEnrichment = {
747
+   kind: 'image_analysis' | 'transcription' | 'pdf_summary' | 'link_context'
748
+   content: Record<string, unknown>
749
+   provider: string // 'gemini', 'firecrawl', etc.
750
+   model: string
751
+   version: string
752
+   createdAt: string // ISO 8601
753
+   error?: string // If enrichment failed
754
+ }
755
+ ```
756
+
757
+ All dates are **ISO 8601 with Z suffix** (UTC). See
758
+ [Dates and Timezones](#dates-and-timezones) for conversion details.
759
+
760
+ ## CLI Commands
761
+
762
+ ### Main Pipeline Commands
763
+
764
+ #### `ingest-csv`
765
+
766
+ Import messages from iMazing CSV export.
767
+
768
+ ```bash
769
+ pnpm cli ingest-csv \
770
+ -i messages.csv \
771
+ -o messages.csv.ingested.json \
772
+ -a ~/Library/Messages/Attachments \
773
+ -a /Volumes/Backup/old-attachments
774
+ ```
775
+
776
+ **Options:**
777
+
778
+ - `-i, --input <path>` - iMazing CSV file (required)
779
+ - `-o, --output <path>` - Output JSON file (default:
780
+ `./messages.csv.ingested.json`)
781
+ - `-a, --attachments <dirs...>` - Root directories containing media files
782
+
783
+ #### `ingest-db`
784
+
785
+ Extract messages from macOS Messages.app SQLite database.
786
+
787
+ ```bash
788
+ pnpm cli ingest-db \
789
+ -i ~/Library/Messages/chat.db \
790
+ -o messages.db.ingested.json \
791
+ --contact john@example.com
792
+ ```
793
+
794
+ **Options:**
795
+
796
+ - `-i, --input <path>` - Messages.app database file (required)
797
+ - `-o, --output <path>` - Output JSON file (default:
798
+ `./messages.db.ingested.json`)
799
+ - `--contact <id>` - Filter by contact (phone or Apple ID)
800
+ - `-a, --attachments <dirs...>` - Attachment root directories
801
+
802
+ #### `normalize-link`
803
+
804
+ Merge sources, deduplicate, link replies/tapbacks, and validate schema.
805
+
806
+ ```bash
807
+ pnpm cli normalize-link \
808
+ -i messages.csv.ingested.json messages.db.ingested.json \
809
+ -o messages.normalized.json \
810
+ -m all
811
+ ```
812
+
813
+ **Options:**
814
+
815
+ - `-i, --input <paths...>` - Input JSON files (required, can specify multiple)
816
+ - `-o, --output <path>` - Output JSON file (default:
817
+ `./messages.normalized.json`)
818
+ - `-m, --merge-strategy <strategy>` - `exact` (GUID only) | `content` (content
819
+ equivalence) | `all` (both, default)
820
+
821
+ #### `enrich-ai`
822
+
823
+ Augment messages with AI analysis (images, audio, links). Resumable and
824
+ incremental.
825
+
826
+ ```bash
827
+ pnpm cli enrich-ai \
828
+ -i messages.normalized.json \
829
+ -o messages.enriched.json \
830
+ --resume \
831
+ --incremental \
832
+ --rate-limit 1000 \
833
+ --max-retries 3 \
834
+ --checkpoint-interval 100 \
835
+ --enable-vision --enable-audio --enable-links \
836
+ -v
837
+ ```
838
+
839
+ **Options:**
840
+
841
+ - `-i, --input <path>` - Input normalized JSON (required)
842
+ - `-o, --output <path>` - Output JSON file (default: `./messages.enriched.json`)
843
+ - `-c, --checkpoint-dir <path>` - Checkpoint directory (default:
844
+ `./.checkpoints`)
845
+ - `--resume` - Resume from last checkpoint
846
+ - `--incremental` - Only enrich messages new since last enrichment run
847
+ - `--state-file <path>` - Path to incremental state file (default:
848
+ `./.imessage-state.json`)
849
+ - `--reset-state` - Clear incremental state and enrich all messages
850
+ - `--rate-limit <ms>` - Delay between API calls (default: 1000)
851
+ - `--max-retries <n>` - Max retries on API errors (default: 3)
852
+ - `--checkpoint-interval <n>` - Save checkpoint every N items (default: 100)
853
+ - `--enable-vision` - Enable image analysis (default: true)
854
+ - `--enable-audio` - Enable audio transcription (default: true)
855
+ - `--enable-links` - Enable link enrichment (default: true)
856
+
857
+ #### `render-markdown`
858
+
859
+ Generate daily markdown files from enriched messages.
860
+
861
+ ```bash
862
+ pnpm cli render-markdown \
863
+ -i messages.enriched.json \
864
+ -o ./timeline \
865
+ --group-by-time \
866
+ --nested-replies \
867
+ --max-nesting-depth 10 \
868
+ --start-date 2025-01-01 \
869
+ --end-date 2025-12-31
870
+ ```
871
+
872
+ **Options:**
873
+
874
+ - `-i, --input <path>` - Input enriched JSON (required)
875
+ - `-o, --output <path>` - Output directory (default: `./timeline`)
876
+ - `--group-by-time` - Group by Morning/Afternoon/Evening (default: true)
877
+ - `--nested-replies` - Render replies as blockquotes (default: true)
878
+ - `--max-nesting-depth <n>` - Max blockquote nesting depth (default: 10)
879
+ - `--start-date <YYYY-MM-DD>` - Filter messages from this date
880
+ - `--end-date <YYYY-MM-DD>` - Filter messages until this date
881
+
882
+ ### Utility Commands
883
+
884
+ #### `validate`
885
+
886
+ Validate JSON file against Message schema.
887
+
888
+ ```bash
889
+ pnpm cli validate -i messages.json [-q]
890
+ ```
891
+
892
+ **Options:**
893
+
894
+ - `-i, --input <path>` - JSON file to validate (required)
895
+ - `-q, --quiet` - Suppress detailed error output
896
+
897
+ **Output:** Exit code 0 on success, 1 on validation failure. Prints summary
898
+ stats.
899
+
900
+ #### `stats`
901
+
902
+ Show statistics about a message file.
903
+
904
+ ```bash
905
+ pnpm cli stats -i messages.json [-v]
906
+ ```
907
+
908
+ **Options:**
909
+
910
+ - `-i, --input <path>` - JSON file (required)
911
+ - `-v, --verbose` - Show per-kind breakdown
912
+
913
+ **Output:** Message count, breakdown by `messageKind`, date range, attachment
914
+ count, etc.
915
+
916
+ #### `doctor`
917
+
918
+ Run system diagnostics.
919
+
920
+ ```bash
921
+ pnpm cli doctor [-v]
922
+ ```
923
+
924
+ **Checks:**
925
+
926
+ - Node.js version (22+)
927
+ - Dependencies (pnpm packages)
928
+ - Config file exists and is readable
929
+ - API keys present (GEMINI_API_KEY, FIRECRAWL_API_KEY)
930
+ - Attachment directories accessible
931
+ - Write permissions to output directories
932
+
933
+ #### `init`
934
+
935
+ Generate starter configuration file.
936
+
937
+ ```bash
938
+ pnpm cli init [-f json|yaml] [--force] [-o custom-path.yaml]
939
+ ```
940
+
941
+ **Options:**
942
+
943
+ - `-f, --format <format>` - `json` or `yaml` (default: yaml)
944
+ - `--force` - Overwrite existing config
945
+ - `-o, --output <path>` - Custom config path
946
+
947
+ ## Configuration
948
+
949
+ Configuration can be provided via `imessage-config.yaml` or
950
+ `imessage-config.json`. Create with `pnpm cli init` or manually:
951
+
952
+ ```yaml
953
+ version: '1.0'
954
+
955
+ # Attachment directories to search for media files
956
+ attachmentRoots:
956
+   - ~/Library/Messages/Attachments
957
+   - /Volumes/Backup/old-attachments
958
+
959
+ # Google Gemini API configuration
960
+ gemini:
961
+   apiKey: ${GEMINI_API_KEY} # Loaded from environment
962
+   model: gemini-1.5-pro # Recommended model
963
+   rateLimitDelay: 1000 # Milliseconds between requests
964
+   maxRetries: 3 # Retry failed API calls
965
+
966
+ # Firecrawl (link enrichment) configuration
967
+ firecrawl:
968
+   apiKey: ${FIRECRAWL_API_KEY} # Optional, for link context
969
+   enabled: true
970
+
971
+ # Enrichment settings
972
+ enrichment:
973
+   enableVisionAnalysis: true # Image captions/summaries
974
+   enableAudioTranscription: true # Audio transcription
975
+   enableLinkEnrichment: true # Link context extraction
976
+   imageCacheDir: ./.cache/images # Preview cache location
977
+   checkpointInterval: 100 # Items per checkpoint
978
+   forceRefresh: false # Re-enrich existing
979
+
980
+ # Rendering settings
981
+ render:
982
+   groupByTimeOfDay: true # Morning/Afternoon/Evening sections
983
+   renderRepliesAsNested: true # Blockquote threading
984
+   renderTapbacksAsEmoji: true # ❤️ instead of text
985
+   maxNestingDepth: 10 # Max blockquote levels
987
+ ```
988
+
989
+ **Environment Variables:**
990
+
991
+ - `GEMINI_API_KEY` - Google Gemini API key (required for enrichment)
992
+ - `FIRECRAWL_API_KEY` - Firecrawl API key (optional, for link enrichment)
993
+ - `TF_BUILD` - Set by CI systems (enables test reporters)
994
+
995
+ **Config Loading:**
996
+
997
+ - Looks for `imessage-config.yaml` or `imessage-config.json` in current
998
+ directory
999
+ - Supports environment variable expansion: `${VARIABLE_NAME}`
1000
+ - CLI `--config` flag overrides default path
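
To make the `${VARIABLE_NAME}` expansion concrete, here is a small sketch. `expandEnvVars` is a hypothetical helper, and throwing on a missing variable is an assumption; the package may instead leave the placeholder untouched or substitute an empty string.

```typescript
// Replace each `${NAME}` placeholder with the matching entry from `env`.
// Hypothetical helper; throwing on a missing variable is an assumption.
function expandEnvVars(
  value: string,
  env: Record<string, string | undefined>,
): string {
  return value.replace(
    /\$\{([A-Za-z_][A-Za-z0-9_]*)\}/g,
    (_placeholder: string, name: string) => {
      const resolved = env[name]
      if (resolved === undefined) {
        throw new Error(`Missing environment variable: ${name}`)
      }
      return resolved
    },
  )
}
```

In a Node.js process the second argument would typically be `process.env`; passing it explicitly keeps the helper easy to test.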
1001
+
1002
+ ## Data Flows & Examples
1003
+
1004
+ ### Example 1: Single Source (CSV Only)
1005
+
1006
+ ```bash
1007
+ # Ingest CSV
1008
+ pnpm cli ingest-csv -i messages.csv -o messages.ingested.json
1009
+
1010
+ # Normalize (single source, minimal work)
1011
+ pnpm cli normalize-link -i messages.ingested.json -o messages.normalized.json
1012
+
1013
+ # Enrich (first time, all messages)
1014
+ pnpm cli enrich-ai -i messages.normalized.json -o messages.enriched.json
1015
+
1016
+ # Render
1017
+ pnpm cli render-markdown -i messages.enriched.json -o ./timeline
1018
+ ```
1019
+
1020
+ ### Example 2: Dual Source with Incremental Enrichment
1021
+
1022
+ ```bash
1023
+ # Ingest both sources
1024
+ pnpm cli ingest-csv -i messages.csv -o messages.csv.ingested.json
1025
+ pnpm cli ingest-db -i ~/Library/Messages/chat.db -o messages.db.ingested.json
1026
+
1027
+ # Normalize and merge
1028
+ pnpm cli normalize-link \
1029
+ -i messages.csv.ingested.json messages.db.ingested.json \
1030
+ -o messages.normalized.json
1031
+
1032
+ # First enrichment
1033
+ pnpm cli enrich-ai \
1034
+ -i messages.normalized.json \
1035
+ -o messages.enriched.json \
1036
+ --checkpoint-interval 100
1037
+
1038
+ # Later: new messages added, re-run incrementally
1039
+ pnpm cli enrich-ai \
1040
+ -i messages.normalized.json \
1041
+ -o messages.enriched.json \
1042
+ --incremental \
1043
+ --resume
1044
+ ```
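
The incremental re-run in Example 2 amounts to filtering out GUIDs that a previous run already handled. The sketch below assumes a hypothetical state shape (`enrichedGuids`); the real layout of `.imessage-state.json` is not documented in this README.

```typescript
// Hypothetical shape of the incremental state file.
type IncrementalState = { enrichedGuids: string[] }

// Keep only messages whose GUID was not seen in a previous run.
function selectNewMessages<T extends { guid: string }>(
  messages: T[],
  state: IncrementalState,
): T[] {
  const seen = new Set(state.enrichedGuids)
  return messages.filter((message) => !seen.has(message.guid))
}

// Return a new state that also covers the messages just processed.
// GUIDs are sorted so the saved state is stable across runs.
function recordEnriched(
  state: IncrementalState,
  processed: { guid: string }[],
): IncrementalState {
  const merged = new Set(state.enrichedGuids)
  for (const message of processed) merged.add(message.guid)
  return { enrichedGuids: Array.from(merged).sort() }
}
```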
1045
+
1046
+ ### Example 3: Resuming from Crash
1047
+
1048
+ ```bash
1049
+ # Enrichment stops mid-way (power loss, API timeout, etc.)
1050
+ # Checkpoint saved: .checkpoints/enrich-checkpoint-abc123def.json
1051
+
1052
+ # Resume from checkpoint
1053
+ pnpm cli enrich-ai \
1054
+ -i messages.normalized.json \
1055
+ -o messages.enriched.json \
1056
+ --resume
1057
+ # → Continues from last processed index automatically
1058
+ ```
1059
+
1060
+ ## Dates and Timezones
1061
+
1062
+ All dates in JSON outputs are **ISO 8601 UTC with Z suffix** (e.g.,
1063
+ `2025-10-26T14:30:45.000Z`).
1064
+
1065
+ ### CSV Import
1066
+
1067
+ - iMazing CSV format: `MM/DD/YYYY, HH:MM:SS` (local wall-clock time with no
1067
+ offset, interpreted as UTC)
1069
+ - Converted to ISO 8601 with Z suffix
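
A possible implementation of the CSV conversion is sketched below. `csvDateToIso` is a hypothetical name, and the strict two-digit pattern is an assumption about the export format.

```typescript
// Parse `MM/DD/YYYY, HH:MM:SS` and treat the fields as UTC.
// Hypothetical helper; the strict two-digit pattern is an assumption.
function csvDateToIso(csvDate: string): string {
  const pattern = /^(\d{2})\/(\d{2})\/(\d{4}), (\d{2}):(\d{2}):(\d{2})$/
  const match = pattern.exec(csvDate.trim())
  if (!match) throw new Error(`Unrecognized CSV date: ${csvDate}`)
  const parts = match.slice(1).map(Number)
  const [month = 1, day = 1, year = 1970, hour = 0, minute = 0, second = 0] =
    parts
  // Date.UTC uses zero-based months.
  const unixMs = Date.UTC(year, month - 1, day, hour, minute, second)
  return new Date(unixMs).toISOString()
}
```

For example, `csvDateToIso('10/26/2025, 14:30:45')` yields `2025-10-26T14:30:45.000Z`.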
1070
+
1071
+ ### Database Import
1072
+
1073
+ - Apple epoch: Seconds since 2001-01-01 00:00:00 UTC
1074
+ - Formula: `unixMs = (appleSeconds + 978307200) * 1000` (shift to the Unix
1074
+ epoch, convert to milliseconds, then format as ISO 8601)
1076
+ - Result: ISO 8601 with Z suffix
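
The Apple-epoch conversion can be written in a few lines. This is a sketch assuming the input is in whole or fractional seconds, as described above; `appleSecondsToIso` is a hypothetical name.

```typescript
// Seconds between the Unix epoch (1970-01-01) and the Apple epoch (2001-01-01).
const APPLE_EPOCH_OFFSET_SECONDS = 978307200

// Convert seconds since the Apple epoch to an ISO 8601 UTC string.
function appleSecondsToIso(appleSeconds: number): string {
  const unixMs = (appleSeconds + APPLE_EPOCH_OFFSET_SECONDS) * 1000
  return new Date(unixMs).toISOString()
}
```

For example, `appleSecondsToIso(0)` returns `2001-01-01T00:00:00.000Z`.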
1077
+
1078
+ ### Markdown Rendering
1079
+
1080
+ - Timestamps displayed in UTC
1081
+ - Grouped by calendar date (UTC)
1082
+ - To display in local timezone, render the timestamp differently in
1083
+ post-processing
1084
+
1085
+ ## Idempotency and Determinism
1086
+
1087
+ ### Idempotent Enrichment
1088
+
1089
+ Enrichment is **idempotent** by design:
1090
+
1091
+ - Check if `enrichment.kind` already exists for a message
1092
+ - Skip if present (already enriched)
1093
+ - Use `--force-refresh` to re-enrich specific kinds
1094
+
1095
+ ```bash
1096
+ # First run: enrich all
1097
+ pnpm cli enrich-ai -i messages.normalized.json -o messages.enriched.json
1098
+
1099
+ # Later: add new enrichment kind (e.g., link context)
1100
+ # Existing image/audio enrichments preserved, new kind added
1101
+ pnpm cli enrich-ai -i messages.normalized.json -o messages.enriched.json \
1102
+ --enable-links # Other kinds disabled
1103
+ ```
1104
+
1105
+ ### Deterministic Rendering
1106
+
1107
+ Markdown output is **fully deterministic**:
1108
+
1109
+ - Messages sorted by `(date, guid)` before rendering
1110
+ - Enrichments sorted by kind within each message
1111
+ - JSON keys in sorted order
1112
+ - No randomization or time-dependent output
1113
+
1114
+ This means: `sha256(messages.enriched.json) → sha256(timeline/*.md)` is
1115
+ consistent across runs.
1116
+
1117
+ ### Checkpoint Consistency
1118
+
1119
+ Checkpoints include config hash verification:
1120
+
1121
+ - Each checkpoint stores SHA256 of enrichment config
1122
+ - Resume only if config unchanged
1123
+ - Detects breaking changes (API key updates, disable/enable analysis modes)
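
The config check can be illustrated with a small fingerprint comparison. To stay dependency-free, this sketch swaps in a 32-bit FNV-1a hash as a stand-in for SHA256. The checkpoint shape, the function names, and resuming at `lastIndex + 1` are all assumptions.

```typescript
// Serialize with sorted object keys so equal configs give equal strings.
function stableStringify(value: unknown): string {
  if (value === undefined) return 'null'
  if (Array.isArray(value)) {
    return `[${value.map((item) => stableStringify(item)).join(',')}]`
  }
  if (value !== null && typeof value === 'object') {
    const record = value as Record<string, unknown>
    const entries = Object.keys(record)
      .sort()
      .map((key) => `${JSON.stringify(key)}:${stableStringify(record[key])}`)
    return `{${entries.join(',')}}`
  }
  return JSON.stringify(value)
}

// 32-bit FNV-1a, used here only as a stand-in for SHA256.
function fnv1aHex(text: string): string {
  let hash = 0x811c9dc5
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i)
    hash = Math.imul(hash, 0x01000193)
  }
  return (hash >>> 0).toString(16).padStart(8, '0')
}

function configFingerprint(config: unknown): string {
  return fnv1aHex(stableStringify(config))
}

// Hypothetical checkpoint shape.
type Checkpoint = { configHash: string; lastIndex: number }

// Return the index to continue from, or throw if the config changed.
function resumeIndex(checkpoint: Checkpoint | null, config: unknown): number {
  if (checkpoint === null) return 0
  if (checkpoint.configHash !== configFingerprint(config)) {
    throw new Error('Checkpoint config hash mismatch')
  }
  return checkpoint.lastIndex + 1
}
```

Sorting keys before hashing means that reordering fields in the config file does not invalidate a checkpoint, while any change to a value does.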
1124
+
1125
+ ## Performance & Optimization
1126
+
1127
+ ### Concurrency
1128
+
1129
+ - **Ingest** (Stage 1): Single-threaded, fast (CSV parsing ~10k msgs/s)
1130
+ - **Normalize-Link** (Stage 2): Single-threaded, O(n log n) complexity (~1k
1131
+ msgs/s for dedup)
1132
+ - **Enrich-AI** (Stage 3): API call bound, respectful rate limiting (1-5
1133
+ msgs/min depending on Gemini quota)
1134
+ - **Render** (Stage 4): Single-threaded, fast (~10k msgs/s)
1135
+
1136
+ ### Memory
1137
+
1138
+ - **Streaming where possible**: Large JSON files loaded once into memory
1139
+ - **Checkpoint interval**: Default 100 items keeps memory bounded
1140
+ - **Image cache**: Reuses converted previews by filename
1141
+
1142
+ ### Cost Optimization
1143
+
1144
+ - **Incremental mode**: Only enrich new messages (~80% cost reduction for mature
1145
+ datasets)
1146
+ - **Selective enrichment**: Enable/disable analysis modes (`--enable-vision`,
1147
+ etc.)
1148
+ - **Image caching**: Preview conversion cached by filename (avoid re-processing)
1149
+ - **Fallback chain**: Try Firecrawl before falling back to provider-specific
1149
+ parsing (reduces API calls)
1151
+
1152
+ ### Suggested Workflow
1153
+
1154
+ ```bash
1155
+ # Initial run (expensive, one-time)
1156
+ pnpm cli enrich-ai -i normalized.json -o enriched.json --checkpoint-interval 100
1157
+
1158
+ # Later: weekly incremental updates (cheap)
1159
+ pnpm cli enrich-ai -i normalized.json -o enriched.json --incremental --resume
1160
+
1161
+ # Yearly: full re-enrichment with new models
1162
+ pnpm cli enrich-ai -i normalized.json -o enriched.json --force-refresh
1163
+ ```
1164
+
1165
+ ## Advanced Usage
1166
+
1167
+ ### Merging Multiple CSV/DB Exports
1168
+
1169
+ Combine multiple conversations into a single timeline:
1170
+
1171
+ ```bash
1172
+ # Ingest each conversation separately
1173
+ pnpm cli ingest-csv -i chat-with-alice.csv -o alice.ingested.json
1174
+ pnpm cli ingest-csv -i chat-with-bob.csv -o bob.ingested.json
1175
+
1176
+ # Merge into single normalized file
1177
+ pnpm cli normalize-link \
1178
+ -i alice.ingested.json bob.ingested.json \
1179
+ -o messages.normalized.json
1180
+
1181
+ # Enrich and render as single timeline
1182
+ pnpm cli enrich-ai -i messages.normalized.json -o messages.enriched.json
1183
+ pnpm cli render-markdown -i messages.enriched.json -o ./timeline
1184
+ ```
1185
+
1186
+ ### Selective Date Range Rendering
1187
+
1188
+ Render only recent messages:
1189
+
1190
+ ```bash
1191
+ pnpm cli render-markdown \
1192
+ -i messages.enriched.json \
1193
+ -o ./timeline \
1194
+ --start-date 2025-10-01 \
1195
+ --end-date 2025-10-31
1196
+ ```
1197
+
1198
+ ### Upgrading Enrichment Models
1199
+
1200
+ Re-enrich with newer Gemini models:
1201
+
1202
+ ```bash
1203
+ # Update config with new model
1204
+ # imessage-config.yaml:
1205
+ # gemini:
1206
+ # model: gemini-2-flash # (hypothetical future model)
1207
+
1208
+ # Re-enrich with new model
1209
+ pnpm cli enrich-ai \
1210
+ -i messages.normalized.json \
1211
+ -o messages.enriched.json \
1212
+ --force-refresh
1213
+ ```
1214
+
1215
+ ## Testing
1216
+
1217
+ See our testing guide for configuration rationale and patterns:
1218
+ `docs/testing-best-practices.md`.
1219
+
1220
+ The project includes 50+ test files covering:
1221
+
1222
+ - Schema validation (happy path + invariant violations)
1223
+ - CSV/DB ingestion (parsing, path resolution, date conversion)
1224
+ - Linking algorithms (reply matching, tapback association)
1225
+ - Deduplication (GUID matching, content equivalence)
1226
+ - Enrichment idempotency (skip logic, force-refresh)
1227
+ - Checkpoint recovery (save/load, config hash verification)
1228
+ - Rendering (grouping, sorting, determinism)
1229
+
1230
+ ### Run Tests
1231
+
1232
+ ```bash
1233
+ # All tests
1234
+ pnpm test
1235
+
1236
+ # Watch mode (re-run on file change)
1237
+ pnpm test --watch
1238
+
1239
+ # Coverage report
1240
+ pnpm test:coverage
1241
+ ```
1242
+
1243
+ ### Coverage
1244
+
1245
+ Maintained at **70%+ branch coverage**. Critical paths (linking, dedup,
1246
+ enrichment) at 95%+.
1247
+
1248
+ ## Contributing
1249
+
1250
+ Contributions welcome! Please:
1251
+
1252
+ 1. Fork the repository
1253
+ 2. Create a feature branch (`git checkout -b feature/amazing-feature`)
1254
+ 3. Add tests for new functionality
1255
+ 4. Ensure tests pass: `pnpm test`
1256
+ 5. Format code: `pnpm format`
1257
+ 6. Lint: `pnpm lint`
1258
+ 7. **Create a changeset**: `pnpm version:gen` (for user-facing changes)
1259
+ 8. Commit with semantic message: `git commit -m "feat: add amazing feature"`
1260
+ - ✅ Hooks automatically validate commit format and code quality
1261
+ 9. Push and create a pull request
1262
+
1263
+ ### Development Setup
1264
+
1265
+ ```bash
1266
+ # Install with dev dependencies
1267
+ pnpm install
1268
+
1269
+ # Start watch mode (auto-rebuild TypeScript)
1270
+ pnpm watch
1271
+
1272
+ # Run tests in watch mode during development
1273
+ pnpm test --watch
1274
+
1275
+ # Check code quality before committing
1276
+ pnpm quality-check
1277
+ ```
1278
+
1279
+ ### Branch protection and local push guard
1280
+
1281
+ To configure GitHub branch protection on `main` using our aggregate gate:
1282
+
1283
+ ```bash
1284
+ bash scripts/setup-branch-protection.sh nathanvale chatline main
1285
+ ```
1286
+
1287
+ This script enables auto-merge, enforces required checks (PR quality / gate,
1288
+ Commitlint / commitlint, PR Title Lint / lint), requires signed commits, and
1289
+ keeps history linear.
1290
+
1291
+ Local safeguard: direct pushes to `main`/`master` are blocked by a Husky
1292
+ pre-push hook. Create a feature branch and open a PR instead.
1293
+
1294
+ If you absolutely must override locally (not recommended):
1295
+
1296
+ ```bash
1297
+ ALLOW_PUSH_PROTECTED=1 git push
1298
+ ```
1299
+
1300
+ ### Code Style
1301
+
1302
+ - **TypeScript**: Strict mode, no `any`
1303
+ - **Formatting**: Prettier with 80-char line limit
1304
+ - **Linting**: ESLint with recommended rules
1305
+ - **Testing**: Vitest with 70%+ coverage threshold
1306
+ - **Commits**: Conventional commits (feat:, fix:, docs:, etc.)
1307
+ - See [Automated Release Workflow](./docs/automated-release-workflow.md) for
1308
+ commit format guide
1309
+
1310
+ ### Release Process
1311
+
1312
+ This project uses **automated releases** with Changesets:
1313
+
1314
+ - **Create changeset** for user-facing changes: `pnpm version:gen`
1315
+ - **Commit messages** validated automatically via Husky + commitlint
1316
+ - **CI/CD** creates "Version Packages" PR when changesets are merged
1317
+ - **Publishing** happens automatically when version PR is merged
1318
+
1319
+ 📚 **Full documentation:**
1320
+
1321
+ - **[Automated Release Workflow](./docs/automated-release-workflow.md)** - Main
1322
+ release process
1323
+ - **[Pre-Release Guide](./docs/pre-release-guide.md)** - Canary, beta, and RC
1324
+ releases
1325
+
1326
+ ### Release Channels
1327
+
1328
+ We support prerelease channels for fast feedback and safe promotion:
1329
+
1330
+ - `next` for early adopters (canary builds)
1331
+ - `beta` for feature-complete testing
1332
+ - `rc` for release candidates
1333
+ - `canary` snapshots for experimental builds
1334
+ - `alpha` for automated nightly snapshots
1335
+
1336
+ **Quick commands:**
1337
+
1338
+ ```bash
1339
+ # Publish quick snapshot (only when NOT in pre-mode)
1340
+ pnpm release:snapshot:canary
1341
+
1342
+ # Publish versioned pre-release (when in pre-mode)
1343
+ pnpm changeset # Create changeset
1344
+ pnpm changeset version # Version as 0.0.1-next.0
1345
+ pnpm publish:pre # Publish to @next tag
1346
+
1347
+ # Enter/exit pre-release mode
1348
+ gh workflow run pre-mode.yml -f action=enter -f channel=next
1349
+ gh workflow run pre-mode.yml -f action=exit -f channel=next
1350
+ ```
1351
+
1352
+ > ⚠️ **Note:** Snapshot releases (`pnpm release:snapshot:canary`) only work when
1353
+ > NOT in pre-release mode. When in pre-mode, use versioned pre-releases instead.
1354
+
1355
+ 📚 **See the detailed guides:**
1356
+
1357
+ - **[Pre-Release Guide](./docs/pre-release-guide.md)** - Step-by-step publishing
1358
+ instructions
1359
+ - **[Release Channels Strategy](./docs/release-channels.md)** - Architecture and
1360
+ promotion flows
1361
+
1362
+ ### Package Hygiene
1363
+
1364
+ We enforce a tight publish surface and solid metadata:
1365
+
1366
+ - Validated with publint and AreTheTypesWrong
1367
+ - Minimal tarball via `files` whitelist
1368
+ - Provenance enabled for trusted builds
1369
+
1370
+ See the full checklist and how to run the checks:
1371
+ [Package Hygiene & Metadata Quality](./docs/package-hygiene.md)
1372
+
1373
+ ### Prettier formatting
1374
+
1375
+ We use a minimal, opinionated Prettier setup:
1376
+
1377
+ - Global 80-char width, trailing commas, single quotes, no semicolons
1378
+ - Deterministic JSON sorting via plugin
1379
+ - Non-mutating check for CI/local validation
1380
+
1381
+ Docs and rationale:
1382
+ [Prettier Best Practices & Formatting Strategy](./docs/prettier-best-practices.md)
1383
+
1384
+ ## Troubleshooting
1385
+
1386
+ ### "API rate limit exceeded"
1387
+
1388
+ **Solution:** Increase `--rate-limit` delay
1389
+
1390
+ ```bash
1391
+ pnpm cli enrich-ai -i messages.normalized.json -o enriched.json --rate-limit 2000
1392
+ ```
1393
+
1394
+ ### "Checkpoint config hash mismatch"
1395
+
1396
+ **Cause:** Changed enrichment config (API key, enable/disable analysis)
1397
+ **Solution:** Use `--reset-state` to clear or manually delete
1398
+ `.imessage-state.json`
1399
+
1400
+ ```bash
1401
+ pnpm cli enrich-ai -i messages.normalized.json -o enriched.json --reset-state
1402
+ ```
1403
+
1404
+ ### "Attachment paths not resolved"
1405
+
1406
+ **Cause:** Media file not found in attachment directories **Check:**
1407
+
1408
+ 1. Verify path in config (`attachmentRoots`)
1409
+ 2. Check file exists on disk
1410
+ 3. Check file permissions **Result:** Path stored as filename with provenance
1411
+ metadata
1412
+
1413
+ ### "Validation errors in normalized.json"
1414
+
1415
+ **Debug:**
1416
+
1417
+ ```bash
1418
+ pnpm cli validate -i messages.normalized.json -v
1419
+ # Shows which fields failed validation
1420
+ ```
1421
+
1422
+ **Common causes:**
1423
+
1424
+ - Missing `messageKind` field
1425
+ - Date not in ISO 8601 UTC format
1426
+ - Inconsistent data types (string vs number)
1427
+
1428
+ Run `pnpm cli doctor` for system-level diagnostics.
1429
+
1430
+ ## FAQ
1431
+
1432
+ **Q: Can I use this on Linux/Windows?** A: CSV ingestion works everywhere.
1433
+ Database ingestion requires macOS (to access Messages.app). You can export from
1434
+ macOS and process on other systems.
1435
+
1436
+ **Q: How much storage do the outputs take?** A: Enriched JSON is typically 2-3x
1437
+ original normalized JSON (due to enrichment data). Markdown files are 1-2x
1438
+ enriched JSON. A 1000-message conversation: ~5-10MB JSON, ~10-20MB markdown.
1439
+
1440
+ **Q: Can I re-use enriched.json if I change the render config?** A: Yes!
1441
+ Rendering is deterministic and independent of enrichment. Change render settings
1442
+ (grouping, nesting depth) and re-render without re-enriching.
1443
+
1444
+ **Q: What if I don't have API keys?** A: Enrichment is skipped (messages remain
1445
+ as-is). Set `--enable-vision false --enable-audio false --enable-links false` to
1446
+ disable. Rendering still works perfectly without enrichment.
1447
+
1448
+ **Q: How do I update my timeline when new messages arrive?** A: Re-export from
1449
+ Messages.app/iMazing, then run the full pipeline OR use `--incremental --resume`
1450
+ to process only new messages (80%+ faster).
1451
+
1452
+ **Q: Is my data private?** A: Yes. All processing is local except enrichment,
1453
+ which requires API calls to Gemini/Firecrawl; those calls never persist to
1454
+ artifacts. No data is retained after processing. Set API keys via environment
1455
+ variables (not in config files).
1456
+
1457
+ ## Technical Details
1458
+
1459
+ ### Schema Invariants
1460
+
1461
+ Messages enforce cross-field constraints via Zod `superRefine()`:
1462
+
1463
+ - `messageKind='media'` → `media` field must exist and be complete
1464
+ - `messageKind='tapback'` → `tapback` field must exist
1465
+ - `messageKind='text'|'notification'` → may have text, must not have
1466
+ media/tapback
1467
+ - All dates must be ISO 8601 with Z suffix
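
The constraints above can be expressed without any library, as in the sketch below. It is not the package's Zod schema: `checkInvariants` is a hypothetical, dependency-free stand-in, and it only checks that `media` is present, not that it is complete.

```typescript
type MessageLike = {
  messageKind: 'text' | 'media' | 'tapback' | 'notification'
  date: string
  text?: string
  media?: unknown
  tapback?: unknown
}

// ISO 8601 UTC with a Z suffix and optional fractional seconds.
const ISO_UTC_PATTERN = /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?Z$/

// Return a list of violated invariants; an empty list means the message passes.
function checkInvariants(message: MessageLike): string[] {
  const issues: string[] = []
  if (message.messageKind === 'media' && message.media === undefined) {
    issues.push('media message requires `media`')
  }
  if (message.messageKind === 'tapback' && message.tapback === undefined) {
    issues.push('tapback message requires `tapback`')
  }
  if (message.messageKind === 'text' || message.messageKind === 'notification') {
    if (message.media !== undefined) issues.push('unexpected `media`')
    if (message.tapback !== undefined) issues.push('unexpected `tapback`')
  }
  if (!ISO_UTC_PATTERN.test(message.date)) {
    issues.push('date must be ISO 8601 with Z suffix')
  }
  return issues
}
```

In a Zod schema, each pushed string would correspond to one issue reported from inside `superRefine()`.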
1468
+
1469
+ ### Linking Heuristics
1470
+
1471
+ Reply linking uses a confidence-scoring algorithm:
1472
+
1473
+ 1. Check DB association (if present, use immediately)
1474
+ 2. Search ±30s timestamp window
1475
+ 3. Score candidates:
1476
+ - Timestamp distance: closer = higher score
1477
+ - Text similarity: matching keywords = higher score
1478
+ - Sender difference: different person = higher score (likely replying)
1479
+ 4. Select highest score (or log as ambiguous if tie)
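
The scoring steps above might look like the following sketch for the heuristic path (step 1, the DB association, is omitted). The equal weighting of the three signals, the word-overlap measure, and all names are assumptions; the README does not give the actual weights.

```typescript
type Linkable = {
  guid: string
  date: string // ISO 8601 UTC
  text?: string
  handle?: string
  isFromMe: boolean
}

const WINDOW_MS = 30000 // the ±30s search window

function words(text: string | undefined): string[] {
  return (text ?? '').toLowerCase().split(/[^a-z0-9]+/).filter(Boolean)
}

function senderOf(message: Linkable): string {
  return message.isFromMe ? 'me' : message.handle ?? 'unknown'
}

function distanceMs(a: Linkable, b: Linkable): number {
  return Math.abs(Date.parse(a.date) - Date.parse(b.date))
}

// Sum of three signals, each in [0, 1]. Equal weights are an assumption.
function scoreCandidate(reply: Linkable, candidate: Linkable): number {
  const timeScore = 1 - distanceMs(reply, candidate) / WINDOW_MS
  const replyWords = words(reply.text)
  const candidateWords = new Set(words(candidate.text))
  const shared = replyWords.filter((word) => candidateWords.has(word)).length
  const textScore = replyWords.length === 0 ? 0 : shared / replyWords.length
  const senderScore = senderOf(reply) !== senderOf(candidate) ? 1 : 0
  return timeScore + textScore + senderScore
}

// Return the best-scoring candidate in the window, or null if none or tied.
function findParentGuid(reply: Linkable, candidates: Linkable[]): string | null {
  let bestGuid: string | null = null
  let bestScore = -Infinity
  let tied = false
  for (const candidate of candidates) {
    if (candidate.guid === reply.guid) continue
    if (distanceMs(reply, candidate) > WINDOW_MS) continue
    const score = scoreCandidate(reply, candidate)
    if (score > bestScore) {
      bestGuid = candidate.guid
      bestScore = score
      tied = false
    } else if (score === bestScore) {
      tied = true
    }
  }
  return tied ? null : bestGuid
}
```

Returning `null` on a tie mirrors step 4: an ambiguous match is left unlinked (and could be logged) rather than guessed.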
1480
+
1481
+ ### Deduplication Strategy
1482
+
1483
+ CSV/DB deduplication uses a multi-pass approach:
1484
+
1485
+ 1. Exact GUID matching (primary)
1486
+ 2. Content equivalence (fuzzy text + same sender + same timestamp)
1487
+ 3. Prefer DB values in conflicts (the DB is authoritative)
1488
+ 4. Sort by GUID for determinism
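
A simplified version of the multi-pass approach is sketched below. It substitutes exact matching on case- and whitespace-normalized text for the fuzzy text match, skips content matching for messages without text, and uses hypothetical names throughout.

```typescript
type Msg = {
  guid: string
  date: string
  text?: string
  handle?: string
  isFromMe: boolean
  sourceType?: 'csv' | 'db'
}

// Content key: sender + timestamp + normalized text (stand-in for fuzzy match).
// Returns null for messages without text so they only match by GUID.
function contentKey(message: Msg): string | null {
  const normalized = (message.text ?? '').toLowerCase().replace(/\s+/g, ' ').trim()
  if (normalized === '') return null
  const sender = message.isFromMe ? 'me' : message.handle ?? ''
  return `${sender}|${message.date}|${normalized}`
}

function dedupe(messages: Msg[]): Msg[] {
  const byGuid = new Map<string, Msg>()
  const guidByContent = new Map<string, string>()
  for (const message of messages) {
    const key = contentKey(message)
    // Pass 1: exact GUID. Pass 2: content equivalence.
    const existingGuid = byGuid.has(message.guid)
      ? message.guid
      : key === null
        ? undefined
        : guidByContent.get(key)
    if (existingGuid === undefined) {
      byGuid.set(message.guid, message)
      if (key !== null) guidByContent.set(key, message.guid)
      continue
    }
    const existing = byGuid.get(existingGuid)
    if (existing === undefined) continue
    // Pass 3: on conflict, a DB-sourced copy replaces a non-DB copy.
    if (existing.sourceType !== 'db' && message.sourceType === 'db') {
      byGuid.delete(existing.guid)
      byGuid.set(message.guid, message)
      const existingKey = contentKey(existing)
      if (existingKey !== null) guidByContent.set(existingKey, message.guid)
      if (key !== null) guidByContent.set(key, message.guid)
    }
  }
  // Pass 4: sort by GUID for deterministic output.
  return Array.from(byGuid.values()).sort((a, b) =>
    a.guid < b.guid ? -1 : a.guid > b.guid ? 1 : 0,
  )
}
```

This sketch keeps whole messages rather than merging individual fields; a fuller implementation would presumably merge per field while still preferring DB values.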
1489
+
1490
+ ### Idempotency Design
1491
+
1492
+ Enrichment is idempotent via kind-based deduplication:
1493
+
1494
+ - Each enrichment entry has a `kind` (e.g., `'image_analysis'`,
1495
+ `'transcription'`)
1496
+ - Check if `kind` already exists before enriching
1497
+ - `forceRefresh` replaces specific kind (preserves others)
1498
+ - Result: Safe to re-run without duplicating enrichments
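
The kind-based check can be sketched as two small functions. The types are trimmed versions of `MediaMeta` and `MediaEnrichment`, and the function names are hypothetical.

```typescript
type Enrichment = {
  kind: string
  content: Record<string, unknown>
  createdAt: string
}
type Media = { id: string; enrichment?: Enrichment[] }

// True when this kind still needs to be produced for the media item.
function needsEnrichment(
  media: Media,
  kind: string,
  forceRefresh: boolean = false,
): boolean {
  if (forceRefresh) return true
  return !(media.enrichment ?? []).some((entry) => entry.kind === kind)
}

// Add or replace the entry of the same kind, leaving other kinds untouched.
function upsertEnrichment(media: Media, next: Enrichment): Media {
  const others = (media.enrichment ?? []).filter(
    (entry) => entry.kind !== next.kind,
  )
  return { ...media, enrichment: [...others, next] }
}
```

Because `upsertEnrichment` replaces rather than appends, running it twice with the same kind leaves exactly one entry of that kind.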
1499
+
1500
+ ## Roadmap
1501
+
1502
+ - [ ] Support for WhatsApp, Telegram exports
1503
+ - [ ] Batch API calls to reduce Gemini quota usage
1504
+ - [ ] Vector embeddings for similarity search
1505
+ - [ ] Web UI for browsing/searching timeline
1506
+ - [ ] Obsidian plugin for live sync
1507
+ - [ ] Self-hosted LLM support (Ollama, etc.)
1508
+ - [ ] Photo gallery view alongside markdown
1509
+ - [ ] Sentiment analysis and conversation metrics
1510
+ - [ ] Anonymous mode (redact PII)
1511
+
1512
+ ## License
1513
+
1514
+ MIT © 2025
1515
+
1516
+ See [LICENSE](LICENSE) file for full text.
1517
+
1518
+ ## Related Projects
1519
+
1520
+ - [iMazing](https://imazing.com/) - CSV export source
1521
+ - [Firecrawl](https://www.firecrawl.dev/) - Link enrichment API
1522
+ - [Google Gemini](https://ai.google.dev/) - Image/audio analysis API
1523
+ - [Obsidian](https://obsidian.md/) - Markdown vault system
1524
+
1525
+ ## Contact & Support
1526
+
1527
+ - **Issues & Bugs**:
1528
+ [GitHub Issues](https://github.com/yourusername/chatline/issues)
1529
+ - **Discussions**:
1530
+ [GitHub Discussions](https://github.com/yourusername/chatline/discussions)
1531
+ - **Email**: support@example.com (replace with actual contact)
1532
+
1533
+ ---
1534
+
1535
+ **Enjoying iMessage Timeline?** Please star ⭐ the repo and share with friends!