agentel 0.2.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,837 @@
1
+ # History Source Handling
2
+
3
+ This document describes how agentlog currently discovers, imports, attributes,
4
+ and archives each supported history source. It is implementation documentation,
5
+ not a product promise: when provider storage formats change, this file should be
6
+ updated with the importer change.
7
+
8
+ ## Shared Import Pipeline
9
+
10
+ All supported sources are normalized into the same archive shape before write:
11
+
12
+ - `provider`: stable internal provider key, such as `codex`, `claude_code`, or
13
+ `cursor`.
14
+ - `sourceType`: the specific source family, such as `codex-cli-history` or
15
+ `cursor-agent-transcripts`.
16
+ - `sessionId`: provider-specific id when available; otherwise a stable hash.
17
+ - `cwd`: working directory if the source exposes one or agentlog can infer one.
18
+ - `repoCanonical`: git remote key from `cwd`, such as `github.com/org/repo`.
19
+ - `scopeCanonical`: non-repo storage scope for sessions without a reliable
20
+ working directory, such as `claude-desktop/uncategorized`.
21
+ - `messages`: normalized `user`, `assistant`, `system`, or `tool` messages with
22
+ ISO timestamps.
23
+ - `sourcePath`: local source file, directory, database, or export file.
24
+ - `sourceFiles`: optional list of concrete local files to copy into the raw
25
+ archive for sources backed by multiple files.
26
+ - `parserVersion`: centralized parser version for the source type.
27
+ - `events`: provider-independent canonical events generated from transcript
28
+ messages and structured tool metadata.
29
+ - stats metadata: `messageCount`, `userMessageCount`, aggregate `usage`, and
30
+ `models` are computed while writing the archive.
31
+
32
+ Before writing, message content is redacted by `src/redaction.js`. The redacted
33
+ transcript is written to `conversation.md`, `transcript.jsonl`, and
34
+ `events.jsonl`. Unredacted original source files are copied or referenced from
35
+ `session=<id>.raw/` with a manifest. Large multi-session stores such as Cursor
36
+ SQLite use a shared raw-source copy under `raw-sources/`, with each session raw
37
+ manifest pointing at that shared file instead of duplicating the same database
38
+ hundreds of times. The optional reveal cache stores the unredacted normalized
39
+ JSONL when enabled and when the session is repo-scoped. Reimports are skipped by
40
+ fingerprint unless the source file changes or the importer fingerprint version
41
+ changes.
42
+ Use `agentlog import --source cursor --since all --explain-skips` to print
43
+ Cursor skip reasons for one run. Add `--json` to inspect `skipReasons` counts
44
+ and `skippedItems[]` with session ids, source types, paths, and titles.
45
+
46
+ ## Stats Metadata Contract
47
+
48
+ The history web stats view consumes normalized archive metadata. It does not
49
+ recompute user-message counts, token totals, or model lists from old transcripts
50
+ as a hidden compatibility layer. When these fields or parser semantics change
51
+ before v1, update the importer/archive writer and do a clean rebuild:
52
+
53
+ Token totals are split by direction where possible. Cache-read/cache-creation
54
+ usage is preserved separately and repeated provider request ids are counted once,
55
+ which avoids inflating Claude Code/Desktop sessions that repeat the same request
56
+ usage across assistant text and tool-call rows.
57
+
58
+ ```sh
59
+ agentlog reset --yes
60
+ agentlog init
61
+ agentlog import --source all --since all
62
+ ```
63
+
64
+ `agentlog reset` removes agentlog state and archive objects only; it does not
65
+ delete source application histories such as Cursor, Codex, Claude, Gemini, or
66
+ Devin logs.
67
+
68
+ Archive paths are grouped by repo or scope:
69
+
70
+ ```text
71
+ <data>/agentlog/sessions/
72
+ repo=<repo-key>/provider=<provider>/year=YYYY/month=MM/day=DD/
73
+ scope=<scope-key>/provider=<provider>/year=YYYY/month=MM/day=DD/
74
+ ```
75
+
76
+ Each session directory contains:
77
+
78
+ ```text
79
+ session=<id>.metadata.json
80
+ session=<id>.conversation.md
81
+ session=<id>.transcript.jsonl
82
+ session=<id>.events.jsonl
83
+ session=<id>.raw/
84
+ manifest.json
85
+ 001-<original-file-name>
86
+ ```
87
+
88
+ Raw folders contain original source files, not redacted derivatives. SQLite
89
+ sources include existing `-wal` and `-shm` sidecars. For large shared SQLite
90
+ stores, `manifest.json` entries can be references with `sharedRawPath` instead
91
+ of files copied directly inside that session's `.raw/` directory.
92
+ Directory-backed sources copy only the concrete files listed by the importer;
93
+ agentlog does not blindly copy entire source directories.
94
+
95
+ ## Canonical Events
96
+
97
+ `events.jsonl` is the provider-independent archive/search substrate. It uses
98
+ schema version `agentlog.events.v1` and these event kinds:
99
+
100
+ - `session.started`
101
+ - `prompt.submitted`
102
+ - `response.generated`
103
+ - `tool.called`
104
+ - `tool.completed`
105
+
106
+ Agentlog intentionally ports only the portable Forge idea here: canonical
107
+ prompt/response/tool events with parser versions. It does not port Forge's
108
+ organization, sensor, device, WorkOS, NATS, ClickHouse, Postgres, or policy
109
+ control-plane fields.
110
+
111
+ Tool calls and results should be normalized before archive write. Importers may
112
+ preserve the provider's original category as `rawCategory`, while canonical
113
+ events add viewer-facing display metadata:
114
+
115
+ - `metadata.toolCalls[]`: `id`, `name`, `displayName`, `category`, `title`,
116
+ `status`, `argument`, `rawInputSummary`, `inputPreview`, `target`, `icon`,
117
+ `categoryLabel`, and `provider`.
118
+ - `metadata.toolResult`: `provider`, `kind`, `title`, `summary`, `output`,
119
+ `lineCount`, `collapsed`, `category`, `categoryLabel`, `icon`, and optional
120
+ `status`.
121
+
122
+ The viewer reads canonical events or normalized metadata first. Text patterns
123
+ such as `Grep(...)` are legacy fallback only.
124
+
125
+ Provider-generated context sometimes appears in upstream logs as `role: user`.
126
+ Agentlog preserves those records in transcripts, but reclassifies known shapes
127
+ as `system` messages with `metadata.providerGenerated = true` and a
128
+ `metadata.contextKind` so they do not become `prompt.submitted` recall events.
129
+ The current allowlist covers common Codex blocks such as
130
+ `<environment_context>`, `# AGENTS.md instructions`, `# Files mentioned by the
131
+ user`, `<subagent_notification>`, `<turn_aborted>`, `<skill>`, and interruption
132
+ markers. It also covers Claude Code blocks such as `<persisted-output>`,
133
+ `<task-notification>`, `<tool_use_error>`, `<command-message>`,
134
+ `<local-command-caveat>`, `<local-command-stdout>`, `<system-reminder>`, skill
135
+ context headers, and interruption markers.
136
+
137
+ ## Parser Versions
138
+
139
+ Parser versions live in `src/parser-versions.js`. They are semantic-version
140
+ strings. The first npm release uses `1.0.0` as the parser baseline for every
141
+ source type.
142
+
143
+ After release, bump the affected source type in the same change whenever parser
144
+ output changes for the same raw input: message roles, timestamps, tool-call
145
+ metadata, tool-result metadata, source classification, fingerprints, or
146
+ canonical event text. Use patch bumps for narrow correctness fixes, minor bumps
147
+ for additive parser enrichment, and major bumps for source identity,
148
+ fingerprint, archive-contract, or meaningfully incompatible output changes.
149
+
150
+ | Source type | Version |
151
+ | --- | --- |
152
+ | `codex-cli-history` | `1.0.0` |
153
+ | `codex-desktop-history` | `1.0.0` |
154
+ | `cli-history` | `1.0.0` |
155
+ | `claude-sdk-history` | `1.0.0` |
156
+ | `claude-code-desktop-metadata` | `1.0.0` |
157
+ | `claude-workspace-desktop` | `1.0.0` |
158
+ | `cursor-workspace-sqlite` | `1.0.0` |
159
+ | `cursor-global-sqlite` | `1.0.0` |
160
+ | `cursor-raw-sqlite-salvage` | `1.0.0` |
161
+ | `cursor-agent-transcripts` | `1.0.0` |
162
+ | `devin-cli-history` | `1.0.0` |
163
+ | `gemini-cli-history` | `1.0.0` |
164
+ | `cline-task-history` | `1.0.0` |
165
+ | `opencode-history` | `1.0.0` |
166
+ | `aider-chat-history` | `1.0.0` |
167
+ | `antigravity-history` | `1.0.0` |
168
+ | `web-chat-export` | `1.0.0` |
169
+ | `chatgpt-export` | `1.0.0` |
170
+ | `claude-web-export` | `1.0.0` |
171
+ | `claude-web-memory` | `1.0.0` |
172
+ | `import` | `1.0.0` |
173
+
174
+ `cursor-sqlite-history` and `antigravity-brain` are compatibility aliases for
175
+ older labels. Fingerprints include the parser version prefix, so changing the
176
+ version makes reimport replace stale archive copies.
177
+
178
+ ## Search And Recall Compatibility
179
+
180
+ `agentlog history` indexes `events.jsonl` first. Search results can include
181
+ `event_id`, `event_kind`, `message_index`, and `matched_text`, then aggregate
182
+ back to sessions for CLI/skill compatibility. Archives without `events.jsonl`
183
+ remain searchable through transcript/markdown fallback, and missing
184
+ `conversation.md` files are materialized from transcripts when needed.
185
+
186
+ Recall quality has deterministic tests in `test/recall-eval.test.js` with
187
+ fixtures under `test/fixtures/recall-evals.json`. Add a fixture when a vague
188
+ real-world query should reliably find a representative archived session.
189
+
190
+ ## Source Order
191
+
192
+ The setup UI, import defaults, and history source filters use this grouped order:
193
+
194
+ 1. OpenAI: Codex CLI, Codex Desktop, ChatGPT
195
+ 2. Anthropic: Claude Code CLI, Claude Code Desktop, Claude Workspace,
196
+ Claude.ai, Claude SDK jobs
197
+ 3. Google: Gemini CLI, Antigravity
198
+ 4. Cognition: Devin CLI
199
+ 5. Other: Cursor, Cline, OpenCode, Aider
200
+
201
+ `agentlog import --source all` uses the default import order from
202
+ `src/sources.js`: `codex-cli`, `codex-desktop`, `claude`,
203
+ `claude-code-desktop`, `claude-workspace`, `gemini-cli`, `antigravity`,
204
+ `devin-cli`, `cursor`, `cline`, `opencode`, `aider`. Claude SDK jobs are
205
+ intentionally opt-in. Windsurf is disabled for now because current Cascade
206
+ transcripts are encrypted binary stores.
207
+
208
+ The background supervisor polls the watcher source list selected near the end of
209
+ `agentlog init`. New configs still support `imports.autoDiscoverSources=true`,
210
+ but init now records the chosen watcher list exactly by setting
211
+ `imports.autoDiscoverSources=false`.
212
+
213
+ Supervisor imports use `imports.defaultSinceDays` as a rolling window. Cursor
214
+ SQLite store scans and raw recovery are disabled in supervisor ticks, so old
215
+ deleted/migrated fragments and legacy SQLite-only conversations are recovered
216
+ only by explicit full imports such as `agentlog import --source cursor --since
217
+ all`. The supervisor still imports newer Cursor agent transcript logs and prunes
218
+ duplicate transcript snapshots.
219
+
220
+ ## Supervisor And Full Import Contract
221
+
222
+ The supervisor is for going-forward archival. It should not silently perform
223
+ old-history repair work that belongs to an explicit full import. Keep these
224
+ rules in mind when adding or changing an importer:
225
+
226
+ - Supervisor ticks pass a rolling `--since` window from
227
+ `imports.defaultSinceDays`; parser backfills and old-history repairs should be
228
+ documented as explicit `agentlog import --source <source> --since all` flows.
229
+ - Cursor supervisor ticks set `cursorRecovery=false` and `supervisor=true`.
230
+ That skips raw SQLite salvage, raw companion merge backfill, workspace SQLite,
231
+ and global `cursorDiskKV` scans. Full Cursor imports keep those heavier
232
+ recovery paths enabled.
233
+ - Incremental Cursor pruning is scoped to source paths touched by that tick, so
234
+ the watcher can collapse duplicate live agent-transcript snapshots without
235
+ opportunistically rewriting old unrelated Cursor history.
236
+ - Cursor agent-transcript session ids are derived from the transcript root/thread
237
+ key, not message count. A growing live transcript should update one archived
238
+ session instead of minting snapshots every poll.
239
+ - `writeSession()` replaces existing archives with the same `sourcePath` by
240
+ default. Importers for many-sessions-per-container stores must opt out of that
241
+ behavior. Cursor workspace SQLite and Devin `sessions.db` are the important
242
+ examples; otherwise one session from the DB can delete its siblings. For
243
+ one-session-per-root snapshots such as Cursor agent transcripts, replacement
244
+ by `sourcePath` is desirable because it removes stale partial snapshots.
245
+ - Detached supervisor discovery does not have the same current working directory
246
+ as a manual shell import. Aider can discover the current repo during manual
247
+ imports, but the supervisor relies on configured roots and common directories
248
+ such as `~/Documents/GitHub`, `~/Developer`, `~/Projects`, `~/Code`, and
249
+ `~/Work`. Repos elsewhere need `AGENTLOG_AIDER_ROOT(S)` or equivalent config.
250
+
251
+ When introducing a new source, classify it before wiring it into the supervisor:
252
+
253
+ - one session per file/directory: sourcePath replacement is usually safe;
254
+ - many sessions per database/directory: disable sourcePath replacement and use a
255
+ stable provider session id;
256
+ - growing live transcript: make the session id stable across message-count
257
+ changes and allow sourcePath replacement to collapse snapshots;
258
+ - recovery/backfill parser: keep it behind explicit full imports unless the
259
+ source is cheap, current, and safe to repair incrementally.
260
+
261
+ ## Resume Commands
262
+
263
+ The web viewer exposes a copy-resume button only when agentlog can form a
264
+ stable local command for the archived source.
265
+
266
+ | Source | Resume command | Notes |
267
+ | --- | --- | --- |
268
+ | Codex CLI | `codex resume <session-id>` | Uses the Codex thread id from `~/.codex/state_5.sqlite`. |
269
+ | Codex Desktop | `codex resume <session-id>` | Uses the same Codex thread id. Codex decides whether the resumed session opens in the terminal flow. |
270
+ | Claude Code CLI | `claude -r <session-id>` | Uses the Claude Code JSONL session id. |
271
+ | Devin CLI | `devin -r <session-id>` | agentlog archives these as `devin-<session-id>` and strips that prefix for the resume command, for example `devin -r selective-lotus`. |
272
+ | Claude Code Desktop | No stable local resume command known. | Use Claude's own desktop/history surface or `agentlog show <session-id>`. |
273
+ | Claude Workspace | No stable local resume command known. | Workspace/local-agent session ids are not known to be accepted by Claude Code's CLI resume flag. |
274
+ | Claude SDK jobs | No interactive resume command. | These are programmatic/batch runs. |
275
+ | ChatGPT export | No local resume command. | Official exports are imported snapshots. |
276
+ | Claude.ai export | No local resume command. | Official exports are imported snapshots. |
277
+ | Gemini CLI | No stable local resume command is currently wired. | agentlog imports saved files but does not assume a Gemini CLI resume interface. |
278
+ | Antigravity | No stable local resume command known. | Imported artifacts are readable task/plan files. |
279
+ | Cursor | No stable local resume command known. | Cursor history should be reopened through Cursor if available. |
280
+ | Cline | No stable local resume command known. | Cline task folders can be restored through Cline's own history/recovery surfaces. |
281
+ | OpenCode | `opencode --session <session-id>` | agentlog archives these as `opencode-<session-id>` and strips that prefix for the resume command. |
282
+ | Aider | No stable local resume command known. | Aider histories are repo-local transcript snapshots. |
283
+
284
+ ## Codex CLI
285
+
286
+ - Import selector: `codex-cli`
287
+ - Provider: `codex`
288
+ - Source type: `codex-cli-history`
289
+ - Primary store: `~/.codex/state_5.sqlite`
290
+ - Session files: rollout paths referenced by the `threads` table, plus
291
+ unindexed `rollout-*.jsonl` files under `sessions` and `archived_sessions`
292
+ - Source split: `threads.source = "cli"`
293
+ - Overrides:
294
+ - `CODEX_STATE_DB` overrides the state database path.
295
+ - `CODEX_HOME` is used for the fallback sessions root.
296
+
297
+ The importer reads `id`, `rollout_path`, `created_at`, `updated_at`, `source`,
298
+ `cwd`, and `title` from the Codex state database using `sqlite3`. When the
299
+ database has the newer `stage1_outputs` table, agentlog also reads
300
+ `rollout_summary` and `raw_memory` as supplementary Codex summary documents and
301
+ adds them to the archived transcript. The importer also scans
302
+ `~/.codex/sessions` and `~/.codex/archived_sessions` for `rollout-*.jsonl` and
303
+ `rollout-*.jsonl.zst` files that are not referenced by the state database, so
304
+ older archived rollouts still get backed up.
305
+
306
+ The rollout JSONL parser captures readable `response_item` reasoning summaries,
307
+ Codex `event_msg` assistant/user messages, task and compaction markers, local
308
+ shell calls, web search calls, custom tool calls such as `apply_patch`, tool
309
+ outputs, and token-count usage deltas. Shell calls that run `apply_patch`
310
+ through a heredoc are promoted to edit tool calls with `patch`, `diff`, and
311
+ target path metadata. The working directory comes from the parsed transcript
312
+ first, then the `threads.cwd` column. If neither is available, the session is
313
+ archived under `codex/uncategorized` instead of inheriting the supervisor's
314
+ current directory. Repo attribution is computed from the resolved directory.
315
+ Reading `.zst` sessions requires `zstd` or `unzstd`.
316
+
317
+ ## Codex Desktop
318
+
319
+ - Import selector: `codex-desktop`
320
+ - Provider: `codex`
321
+ - Source type: `codex-desktop-history`
322
+ - Primary store: `~/.codex/state_5.sqlite`
323
+ - Session files: rollout paths referenced by the `threads` table
324
+ - Source split: `threads.source = "vscode"`
325
+ - Overrides: same as Codex CLI
326
+
327
+ Codex Desktop uses the same state database, summary-document handling, and
328
+ rollout parser as Codex CLI. The only distinction is the `threads.source` value.
329
+ This is why the web source dropdown can split Codex CLI and Codex Desktop even
330
+ though both archive under the same `codex` provider.
331
+
332
+ ## ChatGPT Export
333
+
334
+ - Import command: `agentlog import chatgpt --file <path> [--scope local|team]`
335
+ - Provider: `chatgpt`
336
+ - Source type: `web-chat-export`
337
+ - Source file: ChatGPT JSON export or ZIP containing a JSON export
338
+ - Default archive scope: `chatgpt`
339
+
340
+ ChatGPT is not scanned automatically from a desktop app. The user provides an
341
+ official export file. ZIP imports prefer `conversations.json`, then another JSON
342
+ file with `chat` in the name, then the first JSON file in the ZIP.
343
+
344
+ For OpenAI export mappings, agentlog reads each node message, normalizes
345
+ `author.role`, extracts `content.parts`, and uses `create_time` or `update_time`
346
+ as the timestamp. Web imports are scope-based by default because they generally
347
+ do not have a reliable local working directory.
348
+
349
+ ## Claude Code CLI
350
+
351
+ - Import selector: `claude`
352
+ - Provider: `claude_code`
353
+ - Source type: `cli-history`
354
+ - Primary store: `~/.claude/projects/*/*.jsonl`
355
+
356
+ Claude Code CLI files are discovered under `~/.claude/projects`. Each JSONL file
357
+ is classified before import. A file is treated as an interactive conversation
358
+ when the initial records include `type = "user"` or `type = "assistant"` with a
359
+ `message` object and no `entrypoint = "sdk-cli"`.
360
+
361
+ The Claude-specific JSONL parser extracts session ids, titles, cwd fields,
362
+ message roles, text content, timestamps, assistant thinking summaries,
363
+ `tool_use` calls, `tool_result` outputs, model, request id, stop status, and
364
+ token usage. Tool calls and results are normalized into the shared
365
+ `metadata.toolCalls[]`, `metadata.toolResult`, and `metadata.usage` shapes.
366
+ Bash or shell tool calls that invoke `apply_patch` are reclassified as edit
367
+ calls and retain the patch text under `arguments.diff`. Repo attribution is
368
+ computed from the parsed `cwd`; if no cwd is present the session is archived
369
+ under an uncategorized provider scope.
370
+
371
+ ## Claude SDK Jobs
372
+
373
+ - Import selector: `claude-sdk`
374
+ - Provider: `claude_sdk`
375
+ - Source type: `claude-sdk-history`
376
+ - Primary store: `~/.claude/projects/*/*.jsonl`
377
+ - Default setup state: unchecked
378
+
379
+ SDK jobs are stored in the same Claude project tree as interactive Claude Code
380
+ sessions. agentlog separates them by scanning the initial JSONL records for
381
+ `entrypoint = "sdk-cli"`. They are shown as a separate opt-in source because
382
+ batch runs can be much higher volume than interactive sessions.
383
+
384
+ When imported, SDK jobs use the same Claude-specific JSONL parser as Claude Code
385
+ CLI but archive under `claude_sdk`.
386
+
387
+ ## Claude Code Desktop
388
+
389
+ - Import selector: `claude-code-desktop`
390
+ - Provider: `claude_desktop`
391
+ - Source type: `claude-code-desktop-metadata`
392
+ - Primary store:
393
+ `~/Library/Application Support/Claude/claude-code-sessions/local_*.json`
394
+ - Audit transcript path:
395
+ `~/Library/Application Support/Claude/claude-code-sessions/local_<id>/audit.jsonl`
396
+ - Fallback scope: `claude-code-desktop/uncategorized`
397
+
398
+ Claude Code Desktop local files are JSON metadata records. When a matching
399
+ `audit.jsonl` exists, agentlog imports assistant, user, and tool summary events
400
+ from that audit file, including Anthropic-style `tool_use` and `tool_result`
401
+ blocks when the audit payload carries them. When no audit file exists, it
402
+ imports metadata-derived messages from `initialMessage` and selected folders
403
+ when present.
404
+
405
+ Discovery scans the Claude app storage once, but the user-facing source rows are
406
+ split by kind. `claude-code-desktop` is the Claude Code desktop-launch metadata
407
+ path; `claude-workspace` is Claude app local-agent/workspace mode. The older
408
+ generic `claude-desktop` aggregate is kept only as a compatibility import
409
+ selector and is not shown as a separate discovery row.
410
+
411
+ Working directory attribution comes from `originCwd`, then `cwd`, then the first
412
+ existing folder in `userSelectedFolders`. If no existing directory is available,
413
+ the session is archived under `claude-code-desktop/uncategorized` instead of
414
+ being assigned to whatever repo agentlog happens to run from.
415
+
416
+ ## Claude Workspace
417
+
418
+ - Import selector: `claude-workspace`
419
+ - Provider: `claude_desktop`
420
+ - Source type: `claude-workspace-desktop`
421
+ - Primary store:
422
+ `~/Library/Application Support/Claude/local-agent-mode-sessions/local_*.json`
423
+ - Audit transcript path:
424
+ `~/Library/Application Support/Claude/local-agent-mode-sessions/local_<id>/audit.jsonl`
425
+ - Fallback scope: `claude-desktop/uncategorized`
426
+
427
+ Claude Workspace uses the same parser as Claude Code Desktop but reads from the
428
+ Claude app local-agent mode directory. `audit.jsonl` is preferred when present.
429
+ Metadata fallback imports the initial prompt and selected folder context.
430
+
431
+ As with Claude Code Desktop, repo attribution only happens when an existing
432
+ working directory can be found. Otherwise the archive is intentionally
433
+ uncategorized.
434
+
435
+ ## Claude.ai Export
436
+
437
+ - Import command: `agentlog import claude-web --file <path> [--scope local|team]`
438
+ - Provider: `claude_web`
439
+ - Source types: `claude-web-export`, `claude-web-memory`
440
+ - Source file: Claude.ai JSON export or ZIP containing a JSON export
441
+ - Default archive scope: `claude_web`
442
+
443
+ Claude.ai is not scanned automatically from the desktop app. The user provides
444
+ an official export file. agentlog reads `chat_messages`, `messages`, or
445
+ `children`, normalizes sender/role fields, extracts text content, and uses
446
+ `created_at`, `updated_at`, or `timestamp`.
447
+
448
+ Project-file conversations are imported as project-scoped sessions. Top-level
449
+ conversation exports are assigned to a project only when Claude includes an
450
+ explicit project reference such as `project_uuid`, `project_id`, or a nested
451
+ `project` object. Some official Claude exports include `projects/*.json` and
452
+ project memories but omit the per-conversation project id in `conversations.json`;
453
+ those conversations remain under the account-level chat scope.
454
+
455
+ Memory exports are grouped under the synthetic `memory` chat folder instead of
456
+ the original project folder. Root memory is titled `Claude Memory`; project
457
+ memory is titled `Claude Project Memory: <project name>`. This keeps project
458
+ folders from implying that account-level conversations were reliably tagged to
459
+ Claude projects when the export did not preserve that relationship. Re-run
460
+ `agentlog import claude-web --file <path>` after importing an export that
461
+ contains conversation project ids or after memory parser semantics change.
462
+
463
+ Like ChatGPT export imports, Claude.ai imports are scope-based by default because
464
+ the export does not reliably describe a local repo.
465
+
466
+ ## Gemini CLI
467
+
468
+ - Import selector: `gemini-cli`
469
+ - Provider: `gemini_cli`
470
+ - Source type: `gemini-cli-history`
471
+ - Primary stores:
472
+ - `~/.gemini/tmp`
473
+ - `~/.gemini/history`
474
+ - `~/.gemini/sessions`
475
+ - `~/.gemini/exports`
476
+ - `$GEMINI_HOME` or `AGENTLOG_GEMINI_HOME_DIR` equivalents
477
+
478
+ agentlog scans Gemini CLI structured history files: `.json`, `.jsonl`, `.md`,
479
+ and `.markdown`. Under `~/.gemini/tmp`, it includes chat/checkpoint directories
480
+ and one-level JSON files, while excluding `shell_history`.
481
+
482
+ JSON and JSONL files use a Gemini-specific parser for `role` / `parts` content,
483
+ native Gemini CLI `type: "user"` / `type: "gemini"` content records, `model`
484
+ role normalization, `functionCall`, `functionResponse`, direct tool call/result
485
+ shapes, nested native `toolCalls[].result` entries, shell/code execution parts,
486
+ usage metadata, and checkpoint or restore events. Gemini tmp prompt logs and
487
+ chat JSONL sidecars with the same session id are coalesced so prompt-only
488
+ `logs.json` files do not overwrite richer assistant/tool transcripts. Markdown
489
+ files are split into messages by role headings such as
490
+ `# User`, `# Assistant`, or bold role labels. The working directory comes from
491
+ parsed cwd fields or Gemini tmp `.project_root` metadata. If no working
492
+ directory can be resolved, the session is archived under
493
+ `gemini-cli/uncategorized`.
494
+
495
+ ## Antigravity
496
+
497
+ - Import selector: `antigravity`
498
+ - Provider: `antigravity`
499
+ - Source type: `antigravity-brain`
500
+ - Primary readable store: `~/.gemini/antigravity/brain/*`
501
+ - Binary store counted but not decoded:
502
+ `~/.gemini/antigravity/conversations/*.pb`
503
+
504
+ agentlog imports readable Markdown artifacts from each task directory. Recognized
505
+ artifact names are `task.md`, `implementation_plan.md`, `walkthrough.md`, and
506
+ `plan.md`. Each artifact becomes an assistant message with a heading naming the
507
+ artifact. Timestamps come from artifact file mtimes.
508
+
509
+ The importer tries to infer a working directory from `file://...` links inside
510
+ the Markdown artifacts. If none can be inferred, it archives under
511
+ `antigravity/uncategorized`. Binary protobuf transcripts are counted in
512
+ discovery details but not imported as conversation messages yet.
513
+
514
+ ## Devin CLI
515
+
516
+ - Import selector: `devin-cli`
517
+ - Provider: `devin`
518
+ - Source type: `devin-cli-history`
519
+ - Primary store on macOS/Linux: `~/.local/share/devin/cli/sessions.db`
520
+ - Primary store on Windows: `%LOCALAPPDATA%\devin\cli\sessions.db`
521
+ - WAL files: `sessions.db-shm` and `sessions.db-wal`
522
+ - Override: `AGENTLOG_DEVIN_SESSIONS_DB` points at an alternate database file.
523
+
524
+ agentlog reads Devin for Terminal's SQLite store with `sqlite3`. It imports
525
+ visible rows from `sessions`, then reads `message_nodes` and reconstructs the
526
+ selected conversation branch by walking backward from `sessions.main_chain_id`
527
+ through `message_nodes.parent_node_id`. That avoids importing alternate branch
528
+ nodes that are present in the database but not part of the visible thread.
529
+
530
+ `message_nodes.chat_message` is JSON. agentlog normalizes `role`, `content`,
531
+ and `tool_calls`, skips system messages, and skips Devin context user messages
532
+ that begin with `<rules ...>`, `<available_skills>`, or `<git_status>`. Tool
533
+ results are preserved as `tool` messages with `metadata.toolResult`.
534
+ Assistant tool calls are stored in `metadata.toolCalls[]`; agentlog no longer
535
+ appends synthetic readable lines such as `Grep(TODO)` into assistant prose.
536
+ Devin's `metadata.extensions["chisel/tool_call_content"]` is used for small
537
+ display metadata (`title`, `status`, `kind`, and tool id) while arguments are
538
+ stored as redaction-aware summaries.
539
+
540
+ Timestamps come from `sessions.created_at`, `sessions.last_activity_at`, and
541
+ per-node `created_at` values. Devin currently stores these as Unix seconds.
542
+ The working directory comes from `sessions.working_directory`, so repo
543
+ attribution follows the project directory Devin was launched from.
544
+
545
+ If `message_nodes` contains no importable messages, agentlog falls back to
546
+ `prompt_history` so at least direct user prompts can be archived.
547
+
548
+ ## Cursor
549
+
550
+ - Import selector: `cursor`
551
+ - Provider: `cursor`
552
+ - Source types:
553
+ - `cursor-sqlite-history`
554
+ - `cursor-workspace-sqlite`
555
+ - `cursor-global-sqlite`
556
+ - `cursor-raw-sqlite-salvage`
557
+ - `cursor-agent-transcripts`
558
+ - Older workspace store:
559
+ `~/Library/Application Support/Cursor/User/workspaceStorage/*/state.vscdb`
560
+ - Global store used for Cursor composer and raw salvage data:
561
+ `~/Library/Application Support/Cursor/User/globalStorage/state.vscdb`
562
+ - Newer project transcript store:
563
+ `~/.cursor/projects/<project-slug>/agent-transcripts/**`
564
+ - Overrides:
565
+ - `AGENTLOG_CURSOR_WORKSPACE_STORAGE_DIR` overrides the workspace SQLite root.
566
+ - `AGENTLOG_CURSOR_GLOBAL_STORAGE_DIR` overrides the global SQLite root.
567
+ - `AGENTLOG_CURSOR_GLOBAL_STORAGE_DB` points at one explicit global SQLite DB.
568
+ - `AGENTLOG_CURSOR_PROJECTS_DIR` overrides the project transcript root.
569
+ - `AGENTLOG_CURSOR_HOME_DIR` is available for tests and alternate home roots.
570
+
571
+ For older Cursor stores, agentlog reads `state.vscdb` with `sqlite3` and selects
572
+ the `workbench.panel.aichat.view.aichat.chatdata`, `composer.composerData`,
573
+ `aiService.prompts`, and `aiService.generations` keys from `ItemTable`. It
574
+ recursively finds `bubbles` arrays, converts Cursor bubble types into user or
575
+ assistant messages, appends terminal/file-selection context, and uses bubble
576
+ timestamps when present. Cursor tool call, tool result, usage, model, status,
577
+ request id, composer id, and edit/diff-like records are normalized into the
578
+ shared `metadata.toolCalls[]`, `metadata.toolResult`, and `metadata.usage`
579
+ shapes when those fields appear in the stored blobs.
580
+
581
+ When an old Cursor workspace has no full `bubbles` transcript but still has
582
+ composer headers plus `aiService` prompt/generation breadcrumbs, agentlog imports
583
+ a fallback searchable timeline. Composer headers provide titles and time ranges;
584
+ matching `composer` generation rows become user messages and `apply` generation
585
+ rows become assistant "Applied changes" messages. This recovers older
586
+ UI-visible Cursor work where the local database preserved activity summaries but
587
+ not the full assistant prose.
588
+
589
+ Cursor also stores many UI-visible Composer/Agent conversations in the global
590
+ `cursorDiskKV` table instead of the per-workspace `ItemTable`. Those records are
591
+ usually split: `composerData:<composerId>` stores the title, original
592
+ created/updated times, workspace/context hints, and ordered
593
+ `fullConversationHeadersOnly`, while `bubbleId:<composerId>:<bubbleId>` stores
594
+ the individual message bodies. Some older rows instead put the ordered message
595
+ records directly in `composerData.conversation`; when headers are empty or
596
+ missing, that inline conversation array is the authoritative ordering source.
597
+ Agentlog imports these as `cursor-global-sqlite` by reading the best available
598
+ header/conversation list, loading only the matching bubble rows when needed,
599
+ ordering bubbles by that list, and using the composer-level timestamps instead of
600
+ Cursor's migrated bubble timestamps. This is the path that recovers old Cursor UI
601
+ conversations whose workspace `state.vscdb` now only shows selected composer ids
602
+ or prompt history.
603
+
604
+ When the global composer record omits a workspace folder, agentlog cross-checks
605
+ workspace `composer.composerData` and chat state for the same composer id and
606
+ uses that workspace folder for repo attribution. It also mines absolute paths
607
+ from Cursor bubble context, including object keys such as nested
608
+ `context.mentions.fileSelections["file:///..."]`, `relevantFiles`,
609
+ `attachedFolders`, workspace URIs, and recently viewed files. When old file
610
+ paths in the bubble context no longer exist, cwd inference walks up to the
611
+ nearest existing parent so the session can still resolve to the repo.
612
+
613
+ During dedupe, richer Cursor sources win over fallback sources. Newer project
614
+ agent transcripts rank highest, global `cursorDiskKV` composer records rank
615
+ above workspace SQLite, and `aiService` prompt/apply history ranks as a
616
+ breadcrumb fallback. This matters for old Cursor threads where the same UI
617
+ conversation appears twice: once as a full global composer with real historical
618
+ timestamps, and once as a workspace prompt-history snapshot stamped with the
619
+ SQLite file mtime. Exact cross-source duplicates, same-title near duplicates,
620
+ and prompt-history snippets with matching user prompts are dropped in favor of
621
+ the richer source.
622
+
623
+ Cursor can also leave older UI-visible conversations only as deleted or migrated
624
+ SQLite page fragments. To recover those, agentlog performs a best-effort raw
625
+ salvage pass over each workspace `state.vscdb` plus sibling `state.vscdb.backup`
626
+ and `state.vscdb-wal` files, and over the global storage `state.vscdb` with the
627
+ same backup/WAL siblings. This pass streams the bytes without mutating Cursor's
628
+ files, searches for parseable Cursor JSON records whose keys look like
629
+ `composerData:<id>`, `_composerData:<id>`, or
630
+ `bubbleId:<composerId>:<bubbleId>`, and imports them as
631
+ `cursor-raw-sqlite-salvage`. `composerData` fragments are parsed as whole
632
+ conversation containers when possible; `bubbleId` fragments are grouped by
633
+ composer id and sorted by Cursor timestamps or raw-file offset. Working
634
+ directories are inferred from workspace identifiers, selected file paths, tool
635
+ call metadata, absolute paths in tool output, and recovered shell prompts such
636
+ as `web-a37 %`. Unresolved recovered sessions remain under
637
+ `cursor/uncategorized`.
638
+
639
+ Raw salvage is intentionally conservative. It skips corrupt or incomplete JSON,
640
+ caps individual fragments so one damaged free-list page cannot stall an import,
641
+ deduplicates recovered sessions against the normal Cursor sources, and runs a
642
+ second conservative merge pass for raw companion fragments. Assistant/tool-only
643
+ raw fragments can attach to a same-project user session when the recovered
644
+ assistant prose has enough keyword overlap with the target. Raw fragments that
645
+ contain both user prompts and assistant/tool responses can attach to the best
646
+ matching workspace or agent transcript only when multiple recovered user prompts
647
+ already appear in that target. When a raw fragment is just a timestamp-shifted
648
+ copy of content already merged, it is dropped as contained. Merged messages are
649
+ annotated with `metadata.mergeReason = "cursor-raw-assistant-only"` or
650
+ `"cursor-raw-companion"`, plus `metadata.mergedFromSessionId`, so the recovery
651
+ remains auditable. The target session fingerprint includes the recovered
652
+ companion content, so a rerun can refresh a previously archived workspace thread
653
+ instead of skipping it as already imported. It can recover full assistant prose
654
+ when that prose still exists in parseable raw SQLite bytes; if Cursor has
655
+ compacted, vacuumed, encrypted, or synced only a server-side copy, the fallback
656
+ may only find `aiService` prompt/apply breadcrumbs or nothing.
657
+
658
+ Some raw fragments do not contain durable Cursor timestamps. When that happens,
659
+ the parser may need a synthetic timestamp while constructing the archive object,
660
+ but the history viewer treats timestamps that line up with Cursor SQLite file
661
+ mtimes as recovered/unknown rather than as fresh activity. This prevents a clean
662
+ reinstall or large raw-salvage import from making old Cursor threads appear as
663
+ if they all happened minutes ago.
664
+
665
+ Cursor progress bars count scan units for the current phase, not final
666
+ conversations. For example, `workspace stores: 79/123` means 79 of 123 Cursor
667
+ workspace SQLite files have been scanned. The final discovery row reports the
668
+ deduped session count after workspace rows, global `cursorDiskKV`, raw salvage,
669
+ and project transcript sources have all been merged.
670
+
671
+ The older SQLite path gets its working directory from the workspace
672
+ `workspace.json` file next to `state.vscdb`. If that file is missing or
673
+ unreadable, the session is archived under `cursor/uncategorized`.
674
+
675
+ For newer Cursor agent transcripts, agentlog scans
676
+ `~/.cursor/projects/<project-slug>/agent-transcripts` for `.json` and `.jsonl`
677
+ files and groups files by transcript session directory. It parses top-level
678
+ `role` plus `message.content` shapes, generic nested message shapes, and common
679
+ timestamp fields. JSON and JSONL transcripts also get Cursor-specific extraction
680
+ for `tool_calls`, `toolCalls`, `toolResults`, command outputs, edit records,
681
+ diff records, model/status/request metadata, and token usage. When no
682
+ per-message timestamp exists, it uses the source file's mtime with stable
683
+ millisecond offsets so imports do not get stamped with the time of import.
684
+
685
+ Cursor project slugs are decoded back to local paths when possible. For example,
686
+ `Users-bzhou-Documents-GitHub-spring-next` resolves to
687
+ `/Users/bzhou/Documents/GitHub/spring-next` if that directory exists. If no
688
+ working directory can be resolved for a newer transcript, it archives under
689
+ `cursor/uncategorized` instead of assigning the session to the current repo.
690
+
691
+ ## Cline
692
+
693
+ - Import selector: `cline`
694
+ - Provider: `cline`
695
+ - Source type: `cline-task-history`
696
+ - VS Code roots:
697
+ - `~/Library/Application Support/Code/User/globalStorage/saoudrizwan.claude-dev`
698
+ - `~/Library/Application Support/Code - Insiders/User/globalStorage/saoudrizwan.claude-dev`
699
+ - Linux/Windows equivalents under Code globalStorage
700
+ - JetBrains roots:
701
+ - `~/Library/Application Support/JetBrains/<IDE>/globalStorage/saoudrizwan.claude-dev`
702
+ - Linux/Windows equivalents under JetBrains globalStorage
703
+ - Primary task files:
704
+ - `state/taskHistory.json`
705
+ - `tasks/<task-id>/api_conversation_history.json`
706
+ - `tasks/<task-id>/ui_messages.json`
707
+ - `tasks/<task-id>/task_metadata.json`
708
+ - Override: `AGENTLOG_CLINE_ROOT` or `AGENTLOG_CLINE_ROOTS`
709
+
710
+ agentlog treats each `tasks/<task-id>` directory as one session. The API
711
+ conversation history is preferred because it preserves user/assistant roles and
712
+ Anthropic-style `tool_use` / `tool_result` blocks. If it is missing or empty,
713
+ agentlog falls back to `ui_messages.json` so the visible task can still be
714
+ archived. Raw backup includes the task files and the task history index when
715
+ present.
716
+
717
+ Cline task metadata and task history are used for title, model, working
718
+ directory, and timestamps when available. Checkpoint metadata, patch/diff files,
719
+ search/replace edit records, and checkpoint shadow git repositories are decoded
720
+ into assistant edit tool calls when present. When a nearby assistant turn exists,
721
+ the checkpoint diff is attached to that turn; otherwise it remains a
722
+ supplementary assistant event. Those tool calls carry unified diff text or
723
+ old/new string payloads so the web viewer can render the edits inline while the
724
+ original checkpoint files remain in raw backups.
725
+
726
+ ## OpenCode
727
+
728
+ - Import selector: `opencode`
729
+ - Provider: `opencode`
730
+ - Source type: `opencode-history`
731
+ - Primary data root: `~/.local/share/opencode`
732
+ - Storage roots:
733
+ - `~/.local/share/opencode/storage`
734
+ - `~/.local/share/opencode/project/<project-slug>/storage`
735
+ - Primary files:
736
+ - `storage/session/<project-id>/<session-id>.json`
737
+ - `storage/message/<session-id>/<message-id>.json`
738
+ - `storage/part/<message-id>/<part-id>.json`
739
+ - `storage/session_diff/<session-id>.json`
740
+ - `storage/project/<project-id>.json`
741
+ - Overrides:
742
+ - `AGENTLOG_OPENCODE_DATA_DIR` overrides the data root.
743
+ - `AGENTLOG_OPENCODE_STORAGE_DIR` or `AGENTLOG_OPENCODE_STORAGE_ROOTS`
744
+ points directly at one or more `storage` directories.
745
+
746
+ agentlog reads OpenCode's JSON session store directly. Sessions provide the
747
+ archive id and project id; message and part files provide role text, reasoning
748
+ text, tool calls, and tool outputs. Tool parts are normalized into
749
+ `metadata.toolCalls[]` and `metadata.toolResult` records so the web viewer can
750
+ render them with the same cards used for Codex, Claude, Devin, and Cursor.
751
+
752
+ When `session_diff/<session-id>.json` is present, agentlog adds a supplementary
753
+ edit tool call with the diff payload. Unified diff text is rendered inline in
754
+ the history web UI, and the original diff JSON remains in the raw archive.
755
+
756
+ ## Aider
757
+
758
+ - Import selector: `aider`
759
+ - Provider: `aider`
760
+ - Source type: `aider-chat-history`
761
+ - Default chat history file: `.aider.chat.history.md`
762
+ - Optional raw sidecars:
763
+ - `.aider.llm.history`
764
+ - `.aider.input.history`
765
+ - Scan roots:
766
+ - current process directory, unless it is the filesystem root or home directory
767
+ - common project roots such as `~/Documents/GitHub`, `~/Developer`,
768
+ `~/Projects`, `~/Code`, and `~/Work`
769
+ - Overrides:
770
+ - `AGENTLOG_AIDER_ROOT` or `AGENTLOG_AIDER_ROOTS`
771
+ - `AGENTLOG_AIDER_CHAT_HISTORY_FILE` or `AIDER_CHAT_HISTORY_FILE`
772
+ - `AGENTLOG_AIDER_LLM_HISTORY_FILE` / `AIDER_LLM_HISTORY_FILE`
773
+ - `AGENTLOG_AIDER_INPUT_HISTORY_FILE` / `AIDER_INPUT_HISTORY_FILE`
774
+
775
+ agentlog parses Aider's markdown transcript by treating each `#### <prompt>`
776
+ heading as a user message and the following Markdown block as the assistant
777
+ reply. `.aider.llm.history` is parsed when present to enrich assistant turns
778
+ with model, request id, and token usage metadata. The repo root is inferred by
779
+ walking upward to `.git`; otherwise the history file directory is used.
780
+
781
+ When a real git repository is available, agentlog conservatively correlates
782
+ nearby Aider auto-commits with transcript turns and attaches matching commit
783
+ diffs as edit tool-call metadata. Multiple matching commits can attach to the
784
+ same assistant turn and are recorded in `metadata.gitCommits`. Unrelated commits
785
+ are ignored; the original chat, LLM, and input history sidecars remain in raw
786
+ backups.
787
+
788
+ ## Windsurf
789
+
790
+ - Import selector: `windsurf`
791
+ - Provider: `windsurf`
792
+ - Source type: `windsurf-cascade-brain`
793
+ - Primary readable store: `~/.codeium/windsurf/brain/*`
794
+ - Binary store counted but not decoded: `~/.codeium/windsurf/cascade/*.pb`
795
+ - Status: disabled from setup, default imports, and history filters
796
+
797
+ Windsurf support is currently disabled. Current Cascade sessions are written as
798
+ encrypted binary stores, so agentlog can detect session IDs and workspace
799
+ metadata but cannot archive readable conversation text from the local files.
800
+
801
+ The older experimental helper can still read Markdown artifacts from Windsurf
802
+ Cascade brain directories when present. Recognized artifact names are `plan.md`,
803
+ `task.md`, `implementation_plan.md`, and `walkthrough.md`. Each artifact becomes
804
+ an assistant message with a heading naming the artifact. Timestamps come from
805
+ file mtimes.
806
+
807
+ The importer tries to infer a working directory from `file://...` links in the
808
+ Markdown. If none can be inferred, it archives under `windsurf/uncategorized`.
809
+ Binary Cascade protobuf stores are counted in discovery details but not decoded.
810
+
811
+ ## Collector And Live Monitoring
812
+
813
+ `agentlog start` runs the supervisor. The current supervisor periodically imports
814
+ the watcher source list selected during init and can run archive sync. The
815
+ collector path accepts OTLP JSON and stores telemetry payloads under the archive
816
+ telemetry directory.
817
+ For Cursor, the supervisor handles incremental logs going forward; explicit
818
+ full imports handle raw SQLite recovery/backfill.
819
+
820
+ Telemetry bridge setup is a one-time integration written during init when
821
+ selected. Claude Code and Gemini CLI receive native settings merges. Cline uses
822
+ documented environment-variable overrides, so agentlog writes an env file and
823
+ launcher helper under `~/.agentlog/cline/`. These bridges are not the same thing
824
+ as importing transcript history.
825
+
826
+ ## Known Gaps
827
+
828
+ - ChatGPT and Claude.ai are import-by-export only; agentlog does not read their
829
+ desktop app local stores.
830
+ - Windsurf is disabled because Cascade protobuf transcripts appear encrypted.
831
+ - Antigravity protobuf transcripts are counted but not decoded.
832
+ - Cursor older `state.vscdb` stores are best-effort because Cursor has changed
833
+ local storage layouts over time.
834
+ - Claude Desktop metadata-only sessions may contain only the initial prompt and
835
+ selected folders when `audit.jsonl` is unavailable.
836
+ - Any source without a reliable cwd may be archived under an uncategorized scope
837
+ rather than a repo.