npm - agentel - Versions diffs - 0.2.0 - Mend

agentel 0.2.0


Files changed (36)

package/LICENSE +21 -0
package/README.md +452 -0
package/agentlog-spec.md +551 -0
package/bin/agentlog-recall.js +8 -0
package/bin/agentlog.js +14 -0
package/docs/code-reference.md +1108 -0
package/docs/history-source-handling.md +837 -0
package/docs/release.md +69 -0
package/package.json +57 -0
package/src/archive.js +1130 -0
package/src/autostart.js +182 -0
package/src/canonical-events.js +575 -0
package/src/cli.js +7928 -0
package/src/collector.js +113 -0
package/src/commands/logs.js +51 -0
package/src/commands/server.js +11 -0
package/src/config.js +240 -0
package/src/doctor.js +102 -0
package/src/importers/aider.js +553 -0
package/src/importers/claude.js +349 -0
package/src/importers/cline.js +471 -0
package/src/importers/gemini.js +795 -0
package/src/importers/providers.js +149 -0
package/src/importers/shared.js +15 -0
package/src/importers.js +7063 -0
package/src/mcp.js +148 -0
package/src/parser-versions.js +62 -0
package/src/paths.js +61 -0
package/src/redaction.js +228 -0
package/src/repo.js +106 -0
package/src/search.js +619 -0
package/src/sources.js +86 -0
package/src/supervisor.js +217 -0
package/src/sync.js +677 -0
package/src/version.js +7 -0
package/src/web-accounts.js +122 -0

package/docs/history-source-handling.md ADDED

@@ -0,0 +1,837 @@
+# History Source Handling
+This document describes how agentlog currently discovers, imports, attributes,
+and archives each supported history source. It is implementation documentation,
+not a product promise: when provider storage formats change, this file should be
+updated with the importer change.
+## Shared Import Pipeline
+All supported sources are normalized into the same archive shape before write:
+- `provider`: stable internal provider key, such as `codex`, `claude_code`, or
+  `cursor`.
+- `sourceType`: the specific source family, such as `codex-cli-history` or
+  `cursor-agent-transcripts`.
+- `sessionId`: provider-specific id when available; otherwise a stable hash.
+- `cwd`: working directory if the source exposes one or agentlog can infer one.
+- `repoCanonical`: git remote key from `cwd`, such as `github.com/org/repo`.
+- `scopeCanonical`: non-repo storage scope for sessions without a reliable
+  working directory, such as `claude-desktop/uncategorized`.
+- `messages`: normalized `user`, `assistant`, `system`, or `tool` messages with
+  ISO timestamps.
+- `sourcePath`: local source file, directory, database, or export file.
+- `sourceFiles`: optional list of concrete local files to copy into the raw
+  archive for sources backed by multiple files.
+- `parserVersion`: centralized parser version for the source type.
+- `events`: provider-independent canonical events generated from transcript
+  messages and structured tool metadata.
+- stats metadata: `messageCount`, `userMessageCount`, aggregate `usage`, and
+  `models` are computed while writing the archive.
+Before writing, message content is redacted by `src/redaction.js`. The redacted
+transcript is written to `conversation.md`, `transcript.jsonl`, and
+`events.jsonl`. Unredacted original source files are copied or referenced from
+`session=<id>.raw/` with a manifest. Large multi-session stores such as Cursor
+SQLite use a shared raw-source copy under `raw-sources/`, with each session raw
+manifest pointing at that shared file instead of duplicating the same database
+hundreds of times. The optional reveal cache stores the unredacted normalized
+JSONL when enabled and when the session is repo-scoped. Reimports are skipped by
+fingerprint unless the source file changes or the importer fingerprint version
+changes.
+Use `agentlog import --source cursor --since all --explain-skips` to print
+Cursor skip reasons for one run. Add `--json` to inspect `skipReasons` counts
+and `skippedItems[]` with session ids, source types, paths, and titles.
+## Stats Metadata Contract
+The history web stats view consumes normalized archive metadata. It does not
+recompute user-message counts, token totals, or model lists from old transcripts
+as a hidden compatibility layer.
+Token totals are split by direction where possible. Cache-read/cache-creation
+usage is preserved separately and repeated provider request ids are counted once,
+which avoids inflating Claude Code/Desktop sessions that repeat the same request
+usage across assistant text and tool-call rows.
+When these fields or parser semantics change before v1, update the
+importer/archive writer and do a clean rebuild:
+```sh
+agentlog reset --yes
+agentlog init
+agentlog import --source all --since all
+```
+`agentlog reset` removes agentlog state and archive objects only; it does not
+delete source application histories such as Cursor, Codex, Claude, Gemini, or
+Devin logs.
+Archive paths are grouped by repo or scope:
+```text
+<data>/agentlog/sessions/
+  repo=<repo-key>/provider=<provider>/year=YYYY/month=MM/day=DD/
+  scope=<scope-key>/provider=<provider>/year=YYYY/month=MM/day=DD/
+```
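As an annotation on the layout above, here is a minimal sketch of how such a dated session directory could be derived. The function name `sessionDir`, its argument shape, and the use of the UTC date are assumptions for illustration; the package's own path code may differ.

```javascript
const path = require("node:path");

// Hypothetical helper: build the dated session directory described above.
// Exactly one of `repoKey` or `scopeKey` is expected to be set.
// Note: a repo key containing slashes yields nested directories here; how
// agentlog encodes the key on disk is not stated in the document.
function sessionDir(dataRoot, { repoKey, scopeKey, provider, startedAt }) {
  const date = new Date(startedAt);
  const group = repoKey ? `repo=${repoKey}` : `scope=${scopeKey}`;
  const pad = (n) => String(n).padStart(2, "0");
  return path.posix.join(
    dataRoot,
    "agentlog",
    "sessions",
    group,
    `provider=${provider}`,
    `year=${date.getUTCFullYear()}`,
    `month=${pad(date.getUTCMonth() + 1)}`,
    `day=${pad(date.getUTCDate())}`
  );
}
```

For a scope-based session, `sessionDir("/data", { scopeKey: "chatgpt", provider: "chatgpt", startedAt: "2024-03-05T10:00:00Z" })` returns `/data/agentlog/sessions/scope=chatgpt/provider=chatgpt/year=2024/month=03/day=05`.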
+Each session directory contains:
+```text
+session=<id>.metadata.json
+session=<id>.conversation.md
+session=<id>.transcript.jsonl
+session=<id>.events.jsonl
+session=<id>.raw/
+  manifest.json
+  001-<original-file-name>
+```
+Raw folders contain original source files, not redacted derivatives. SQLite
+sources include existing `-wal` and `-shm` sidecars. For large shared SQLite
+stores, `manifest.json` entries can be references with `sharedRawPath` instead
+of files copied directly inside that session's `.raw/` directory.
+Directory-backed sources copy only the concrete files listed by the importer;
+agentlog does not blindly copy entire source directories.
+## Canonical Events
+`events.jsonl` is the provider-independent archive/search substrate. It uses
+schema version `agentlog.events.v1` and these event kinds:
+- `session.started`
+- `prompt.submitted`
+- `response.generated`
+- `tool.called`
+- `tool.completed`
+Agentlog intentionally ports only the portable Forge idea here: canonical
+prompt/response/tool events with parser versions. It does not port Forge's
+organization, sensor, device, WorkOS, NATS, ClickHouse, Postgres, or policy
+control-plane fields.
+Tool calls and results should be normalized before archive write. Importers may
+preserve the provider's original category as `rawCategory`, while canonical
+events add viewer-facing display metadata:
+- `metadata.toolCalls[]`: `id`, `name`, `displayName`, `category`, `title`,
+  `status`, `argument`, `rawInputSummary`, `inputPreview`, `target`, `icon`,
+  `categoryLabel`, and `provider`.
+- `metadata.toolResult`: `provider`, `kind`, `title`, `summary`, `output`,
+  `lineCount`, `collapsed`, `category`, `categoryLabel`, `icon`, and optional
+  `status`.
+The viewer reads canonical events or normalized metadata first. Text patterns
+such as `Grep(...)` are legacy fallback only.
+Provider-generated context sometimes appears in upstream logs as `role: user`.
+Agentlog preserves those records in transcripts, but reclassifies known shapes
+as `system` messages with `metadata.providerGenerated = true` and a
+`metadata.contextKind` so they do not become `prompt.submitted` recall events.
+The current allowlist covers common Codex blocks such as
+`<environment_context>`, `# AGENTS.md instructions`, `# Files mentioned by the
+user`, `<subagent_notification>`, `<turn_aborted>`, `<skill>`, and interruption
+markers. It also covers Claude Code blocks such as `<persisted-output>`,
+`<task-notification>`, `<tool_use_error>`, `<command-message>`,
+`<local-command-caveat>`, `<local-command-stdout>`, `<system-reminder>`, skill
+context headers, and interruption markers.
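The reclassification described above can be sketched as a prefix check against an allowlist. This is an illustration only: the marker list is a small subset, the `contextKind` values and the helper name are hypothetical, and matching by leading prefix is an assumption about how "known shapes" are detected.

```javascript
// Illustrative subset of the markers named above; not the full allowlist.
// The `kind` strings are hypothetical labels.
const CONTEXT_MARKERS = [
  { prefix: "<environment_context>", kind: "environment_context" },
  { prefix: "# AGENTS.md instructions", kind: "agents_instructions" },
  { prefix: "<system-reminder>", kind: "system_reminder" },
  { prefix: "<task-notification>", kind: "task_notification" },
];

// Hypothetical helper: return the message unchanged unless it is a `user`
// record whose text starts with a known provider-generated marker.
function reclassifyProviderContext(message) {
  if (message.role !== "user") return message;
  const text = String(message.content ?? "").trimStart();
  const hit = CONTEXT_MARKERS.find((marker) => text.startsWith(marker.prefix));
  if (!hit) return message;
  return {
    ...message,
    role: "system",
    metadata: { ...message.metadata, providerGenerated: true, contextKind: hit.kind },
  };
}
```

The record is kept rather than dropped, so it still appears in the transcript, but its `system` role keeps it out of anything that selects on user prompts.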
+## Parser Versions
+Parser versions live in `src/parser-versions.js`. They are semantic-version
+strings. The first npm release uses `1.0.0` as the parser baseline for every
+source type.
+After release, bump the affected source type in the same change whenever parser
+output changes for the same raw input: message roles, timestamps, tool-call
+metadata, tool-result metadata, source classification, fingerprints, or
+canonical event text. Use patch bumps for narrow correctness fixes, minor bumps
+for additive parser enrichment, and major bumps for source identity,
+fingerprint, archive-contract, or meaningfully incompatible output changes.
+| Source type | Version |
+| --- | --- |
+| `codex-cli-history` | `1.0.0` |
+| `codex-desktop-history` | `1.0.0` |
+| `cli-history` | `1.0.0` |
+| `claude-sdk-history` | `1.0.0` |
+| `claude-code-desktop-metadata` | `1.0.0` |
+| `claude-workspace-desktop` | `1.0.0` |
+| `cursor-workspace-sqlite` | `1.0.0` |
+| `cursor-global-sqlite` | `1.0.0` |
+| `cursor-raw-sqlite-salvage` | `1.0.0` |
+| `cursor-agent-transcripts` | `1.0.0` |
+| `devin-cli-history` | `1.0.0` |
+| `gemini-cli-history` | `1.0.0` |
+| `cline-task-history` | `1.0.0` |
+| `opencode-history` | `1.0.0` |
+| `aider-chat-history` | `1.0.0` |
+| `antigravity-history` | `1.0.0` |
+| `web-chat-export` | `1.0.0` |
+| `chatgpt-export` | `1.0.0` |
+| `claude-web-export` | `1.0.0` |
+| `claude-web-memory` | `1.0.0` |
+| `import` | `1.0.0` |
+`cursor-sqlite-history` and `antigravity-brain` are compatibility aliases for
+older labels. Fingerprints include the parser version prefix, so changing the
+version makes reimport replace stale archive copies.
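To illustrate why a parser version bump forces a reimport, here is a minimal sketch of a version-prefixed fingerprint. The hash algorithm, the `version:digest` format, and both function names are assumptions; the document only states that fingerprints include the parser version prefix.

```javascript
const crypto = require("node:crypto");

// Hypothetical fingerprint: parser version prefix plus a content hash.
function sourceFingerprint(parserVersion, sourceBytes) {
  const digest = crypto.createHash("sha256").update(sourceBytes).digest("hex");
  return `${parserVersion}:${digest}`;
}

// A reimport is skipped only when the stored fingerprint still matches,
// so either a changed source or a changed parser version triggers a rewrite.
function shouldSkipReimport(storedFingerprint, parserVersion, sourceBytes) {
  return storedFingerprint === sourceFingerprint(parserVersion, sourceBytes);
}
```

With this shape, an unchanged source file imported under `1.0.0` is skipped on the next run, while the same bytes under `1.0.1` produce a different fingerprint and are imported again.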
+## Search And Recall Compatibility
+`agentlog history` indexes `events.jsonl` first. Search results can include
+`event_id`, `event_kind`, `message_index`, and `matched_text`, then aggregate
+back to sessions for CLI/skill compatibility. Archives without `events.jsonl`
+remain searchable through transcript/markdown fallback, and missing
+`conversation.md` files are materialized from transcripts when needed.
+Recall quality has deterministic tests in `test/recall-eval.test.js` with
+fixtures under `test/fixtures/recall-evals.json`. Add a fixture when a vague
+real-world query should reliably find a representative archived session.
+## Source Order
+The setup UI, import defaults, and history source filters use this grouped order:
+1. OpenAI: Codex CLI, Codex Desktop, ChatGPT
+2. Anthropic: Claude Code CLI, Claude Code Desktop, Claude Workspace,
+   Claude.ai, Claude SDK jobs
+3. Google: Gemini CLI, Antigravity
+4. Cognition: Devin CLI
+5. Other: Cursor, Cline, OpenCode, Aider
+`agentlog import --source all` uses the default import order from
+`src/sources.js`: `codex-cli`, `codex-desktop`, `claude`,
+`claude-code-desktop`, `claude-workspace`, `gemini-cli`, `antigravity`,
+`devin-cli`, `cursor`, `cline`, `opencode`, `aider`. Claude SDK jobs are
+intentionally opt-in. Windsurf is disabled for now because current Cascade
+transcripts are encrypted binary stores.
+The background supervisor polls the watcher source list selected near the end of
+`agentlog init`. New configs still support `imports.autoDiscoverSources=true`,
+but init now records the chosen watcher list exactly by setting
+`imports.autoDiscoverSources=false`.
+Supervisor imports use `imports.defaultSinceDays` as a rolling window. Cursor
+SQLite store scans and raw recovery are disabled in supervisor ticks, so old
+deleted/migrated fragments and legacy SQLite-only conversations are recovered
+only by explicit full imports such as `agentlog import --source cursor --since
+all`. The supervisor still imports newer Cursor agent transcript logs and prunes
+duplicate transcript snapshots.
+## Supervisor And Full Import Contract
+The supervisor is for going-forward archival. It should not silently perform
+old-history repair work that belongs to an explicit full import. Keep these
+rules in mind when adding or changing an importer:
+- Supervisor ticks pass a rolling `--since` window from
+  `imports.defaultSinceDays`; parser backfills and old-history repairs should be
+  documented as explicit `agentlog import --source <source> --since all` flows.
+- Cursor supervisor ticks set `cursorRecovery=false` and `supervisor=true`.
+  That skips raw SQLite salvage, raw companion merge backfill, workspace SQLite,
+  and global `cursorDiskKV` scans. Full Cursor imports keep those heavier
+  recovery paths enabled.
+- Incremental Cursor pruning is scoped to source paths touched by that tick, so
+  the watcher can collapse duplicate live agent-transcript snapshots without
+  opportunistically rewriting old unrelated Cursor history.
+- Cursor agent-transcript session ids are derived from the transcript root/thread
+  key, not message count. A growing live transcript should update one archived
+  session instead of minting snapshots every poll.
+- `writeSession()` replaces existing archives with the same `sourcePath` by
+  default. Importers for many-sessions-per-container stores must opt out of that
+  behavior. Cursor workspace SQLite and Devin `sessions.db` are the important
+  examples; otherwise one session from the DB can delete its siblings. For
+  one-session-per-root snapshots such as Cursor agent transcripts, replacement
+  by `sourcePath` is desirable because it removes stale partial snapshots.
+- Detached supervisor discovery does not have the same current working directory
+  as a manual shell import. Aider can discover the current repo during manual
+  imports, but the supervisor relies on configured roots and common directories
+  such as `~/Documents/GitHub`, `~/Developer`, `~/Projects`, `~/Code`, and
+  `~/Work`. Repos elsewhere need `AGENTLOG_AIDER_ROOT(S)` or equivalent config.
+When introducing a new source, classify it before wiring it into the supervisor:
+- one session per file/directory: sourcePath replacement is usually safe;
+- many sessions per database/directory: disable sourcePath replacement and use a
+  stable provider session id;
+- growing live transcript: make the session id stable across message-count
+  changes and allow sourcePath replacement to collapse snapshots;
+- recovery/backfill parser: keep it behind explicit full imports unless the
+  source is cheap, current, and safe to repair incrementally.
+## Resume Commands
+The web viewer exposes a copy-resume button only when agentlog can form a
+stable local command for the archived source.
+| Source | Resume command | Notes |
+| --- | --- | --- |
+| Codex CLI | `codex resume <session-id>` | Uses the Codex thread id from `~/.codex/state_5.sqlite`. |
+| Codex Desktop | `codex resume <session-id>` | Uses the same Codex thread id. Codex decides whether the resumed session opens in the terminal flow. |
+| Claude Code CLI | `claude -r <session-id>` | Uses the Claude Code JSONL session id. |
+| Devin CLI | `devin -r <session-id>` | agentlog archives these as `devin-<session-id>` and strips that prefix for the resume command, for example `devin -r selective-lotus`. |
+| Claude Code Desktop | No stable local resume command known. | Use Claude's own desktop/history surface or `agentlog show <session-id>`. |
+| Claude Workspace | No stable local resume command known. | Workspace/local-agent session ids are not known to be accepted by Claude Code's CLI resume flag. |
+| Claude SDK jobs | No interactive resume command. | These are programmatic/batch runs. |
+| ChatGPT export | No local resume command. | Official exports are imported snapshots. |
+| Claude.ai export | No local resume command. | Official exports are imported snapshots. |
+| Gemini CLI | No stable local resume command is currently wired. | agentlog imports saved files but does not assume a Gemini CLI resume interface. |
+| Antigravity | No stable local resume command known. | Imported artifacts are readable task/plan files. |
+| Cursor | No stable local resume command known. | Cursor history should be reopened through Cursor if available. |
+| Cline | No stable local resume command known. | Cline task folders can be restored through Cline's own history/recovery surfaces. |
+| OpenCode | `opencode --session <session-id>` | agentlog archives these as `opencode-<session-id>` and strips that prefix for the resume command. |
+| Aider | No stable local resume command known. | Aider histories are repo-local transcript snapshots. |
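The table above can be read as a small lookup. The sketch below keys it by import selector, which is an assumption, and the helper name is hypothetical; only the command shapes and the prefix stripping come from the table.

```javascript
// Sources that have a resume command in the table above.
// Keying by import selector is an assumption made for this sketch.
const RESUME_BUILDERS = {
  "codex-cli": (id) => `codex resume ${id}`,
  "codex-desktop": (id) => `codex resume ${id}`,
  "claude": (id) => `claude -r ${id}`,
  "devin-cli": (id) => `devin -r ${id.replace(/^devin-/, "")}`,
  "opencode": (id) => `opencode --session ${id.replace(/^opencode-/, "")}`,
};

// Hypothetical helper: return a resume command, or null when the table
// lists no stable local command for that source.
function resumeCommand(source, sessionId) {
  const known = Object.prototype.hasOwnProperty.call(RESUME_BUILDERS, source);
  return known ? RESUME_BUILDERS[source](sessionId) : null;
}
```

For example, `resumeCommand("devin-cli", "devin-selective-lotus")` returns `devin -r selective-lotus`, and `resumeCommand("cursor", "abc")` returns `null`.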
+## Codex CLI
+- Import selector: `codex-cli`
+- Provider: `codex`
+- Source type: `codex-cli-history`
+- Primary store: `~/.codex/state_5.sqlite`
+- Session files: rollout paths referenced by the `threads` table, plus
+  unindexed `rollout-*.jsonl` files under `sessions` and `archived_sessions`
+- Source split: `threads.source = "cli"`
+- Overrides:
+  - `CODEX_STATE_DB` overrides the state database path.
+  - `CODEX_HOME` is used for the fallback sessions root.
+The importer reads `id`, `rollout_path`, `created_at`, `updated_at`, `source`,
+`cwd`, and `title` from the Codex state database using `sqlite3`. When the
+database has the newer `stage1_outputs` table, agentlog also reads
+`rollout_summary` and `raw_memory` as supplementary Codex summary documents and
+adds them to the archived transcript. The importer also scans
+`~/.codex/sessions` and `~/.codex/archived_sessions` for `rollout-*.jsonl` and
+`rollout-*.jsonl.zst` files that are not referenced by the state database, so
+older archived rollouts still get backed up.
+The rollout JSONL parser captures readable `response_item` reasoning summaries,
+Codex `event_msg` assistant/user messages, task and compaction markers, local
+shell calls, web search calls, custom tool calls such as `apply_patch`, tool
+outputs, and token-count usage deltas. Shell calls that run `apply_patch`
+through a heredoc are promoted to edit tool calls with `patch`, `diff`, and
+target path metadata. The working directory comes from the parsed transcript
+first, then the `threads.cwd` column. If neither is available, the session is
+archived under `codex/uncategorized` instead of inheriting the supervisor's
+current directory. Repo attribution is computed from the resolved directory.
+Reading `.zst` sessions requires `zstd` or `unzstd`.
+## Codex Desktop
+- Import selector: `codex-desktop`
+- Provider: `codex`
+- Source type: `codex-desktop-history`
+- Primary store: `~/.codex/state_5.sqlite`
+- Session files: rollout paths referenced by the `threads` table
+- Source split: `threads.source = "vscode"`
+- Overrides: same as Codex CLI
+Codex Desktop uses the same state database, summary-document handling, and
+rollout parser as Codex CLI. The only distinction is the `threads.source` value.
+This is why the web source dropdown can split Codex CLI and Codex Desktop even
+though both archive under the same `codex` provider.
+## ChatGPT Export
+- Import command: `agentlog import chatgpt --file <path> [--scope local|team]`
+- Provider: `chatgpt`
+- Source type: `web-chat-export`
+- Source file: ChatGPT JSON export or ZIP containing a JSON export
+- Default archive scope: `chatgpt`
+ChatGPT is not scanned automatically from a desktop app. The user provides an
+official export file. ZIP imports prefer `conversations.json`, then another JSON
+file with `chat` in the name, then the first JSON file in the ZIP.
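The ZIP entry preference above can be sketched as a pure function over entry names. The helper name is hypothetical, and comparing lower-cased base names is an assumption; the document does not say whether matching is case-sensitive or path-aware.

```javascript
// Hypothetical helper: pick which ZIP entry to parse, given entry names.
// Preference order: conversations.json, then a JSON file with "chat" in
// its name, then the first JSON file; null when the ZIP has no JSON.
function pickExportEntry(entryNames) {
  const json = entryNames.filter((name) => name.toLowerCase().endsWith(".json"));
  const base = (name) => name.split("/").pop().toLowerCase();
  return (
    json.find((name) => base(name) === "conversations.json") ??
    json.find((name) => base(name).includes("chat")) ??
    json[0] ??
    null
  );
}
```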
+For OpenAI export mappings, agentlog reads each node message, normalizes
+`author.role`, extracts `content.parts`, and uses `create_time` or `update_time`
+as the timestamp. Web imports are scope-based by default because they generally
+do not have a reliable local working directory.
+## Claude Code CLI
+- Import selector: `claude`
+- Provider: `claude_code`
+- Source type: `cli-history`
+- Primary store: `~/.claude/projects/*/*.jsonl`
+Claude Code CLI files are discovered under `~/.claude/projects`. Each JSONL file
+is classified before import. A file is treated as an interactive conversation
+when the initial records include `type = "user"` or `type = "assistant"` with a
+`message` object and no `entrypoint = "sdk-cli"`.
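As a rough illustration of this classification, the sketch below inspects the first parsed records of one JSONL file. The function name and the `"sdk"` / `"interactive"` / `"unknown"` labels are hypothetical, and how many records count as "initial" is not stated in the document, so the caller decides.

```javascript
// Hypothetical classifier over the first parsed records of one JSONL file.
// Any record with entrypoint "sdk-cli" marks the file as an SDK job;
// otherwise a user/assistant record with a `message` object marks it as
// an interactive conversation.
function classifyClaudeJsonl(initialRecords) {
  if (initialRecords.some((record) => record.entrypoint === "sdk-cli")) {
    return "sdk";
  }
  const hasTurn = initialRecords.some((record) => {
    const isTurn = record.type === "user" || record.type === "assistant";
    return isTurn && record.message !== null && typeof record.message === "object";
  });
  return hasTurn ? "interactive" : "unknown";
}
```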
+The Claude-specific JSONL parser extracts session ids, titles, cwd fields,
+message roles, text content, timestamps, assistant thinking summaries,
+`tool_use` calls, `tool_result` outputs, model, request id, stop status, and
+token usage. Tool calls and results are normalized into the shared
+`metadata.toolCalls[]`, `metadata.toolResult`, and `metadata.usage` shapes.
+Bash or shell tool calls that invoke `apply_patch` are reclassified as edit
+calls and retain the patch text under `arguments.diff`. Repo attribution is
+computed from the parsed `cwd`; if no cwd is present the session is archived
+under an uncategorized provider scope.
+## Claude SDK Jobs
+- Import selector: `claude-sdk`
+- Provider: `claude_sdk`
+- Source type: `claude-sdk-history`
+- Primary store: `~/.claude/projects/*/*.jsonl`
+- Default setup state: unchecked
+SDK jobs are stored in the same Claude project tree as interactive Claude Code
+sessions. agentlog separates them by scanning the initial JSONL records for
+`entrypoint = "sdk-cli"`. They are shown as a separate opt-in source because
+batch runs can be much higher volume than interactive sessions.
+When imported, SDK jobs use the same Claude-specific JSONL parser as Claude Code
+CLI but archive under `claude_sdk`.
+## Claude Code Desktop
+- Import selector: `claude-code-desktop`
+- Provider: `claude_desktop`
+- Source type: `claude-code-desktop-metadata`
+- Primary store:
+  `~/Library/Application Support/Claude/claude-code-sessions/local_*.json`
+- Audit transcript path:
+  `~/Library/Application Support/Claude/claude-code-sessions/local_<id>/audit.jsonl`
+- Fallback scope: `claude-code-desktop/uncategorized`
+Claude Code Desktop local files are JSON metadata records. When a matching
+`audit.jsonl` exists, agentlog imports assistant, user, and tool summary events
+from that audit file, including Anthropic-style `tool_use` and `tool_result`
+blocks when the audit payload carries them. When no audit file exists, it
+imports metadata-derived messages from `initialMessage` and selected folders
+when present.
+Discovery scans the Claude app storage once, but the user-facing source rows are
+split by kind. `claude-code-desktop` is the Claude Code desktop-launch metadata
+path; `claude-workspace` is Claude app local-agent/workspace mode. The older
+generic `claude-desktop` aggregate is kept only as a compatibility import
+selector and is not shown as a separate discovery row.
+Working directory attribution comes from `originCwd`, then `cwd`, then the first
+existing folder in `userSelectedFolders`. If no existing directory is available,
+the session is archived under `claude-code-desktop/uncategorized` instead of
+being assigned to whatever repo agentlog happens to run from.
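The attribution order above can be sketched as a first-match search with an existence check. The helper name and return shape are hypothetical, and checking `originCwd` and `cwd` for existence (not only the selected folders) is an assumption drawn from the "no existing directory" wording.

```javascript
const fs = require("node:fs");

// Hypothetical helper mirroring the fallback order described above:
// originCwd, then cwd, then the first existing user-selected folder.
// `exists` is injectable so the logic can be exercised without touching disk.
function resolveDesktopCwd(meta, exists = fs.existsSync) {
  const folders = Array.isArray(meta.userSelectedFolders) ? meta.userSelectedFolders : [];
  const candidates = [meta.originCwd, meta.cwd, ...folders];
  const cwd = candidates.find((dir) => typeof dir === "string" && dir !== "" && exists(dir));
  return cwd ? { cwd } : { scope: "claude-code-desktop/uncategorized" };
}
```

Returning a scope instead of falling back to `process.cwd()` is the point of the design: a detached process never attributes a session to whatever directory it happens to run in.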
+## Claude Workspace
+- Import selector: `claude-workspace`
+- Provider: `claude_desktop`
+- Source type: `claude-workspace-desktop`
+- Primary store:
+  `~/Library/Application Support/Claude/local-agent-mode-sessions/local_*.json`
+- Audit transcript path:
+  `~/Library/Application Support/Claude/local-agent-mode-sessions/local_<id>/audit.jsonl`
+- Fallback scope: `claude-desktop/uncategorized`
+Claude Workspace uses the same parser as Claude Code Desktop but reads from the
+Claude app local-agent mode directory. `audit.jsonl` is preferred when present.
+Metadata fallback imports the initial prompt and selected folder context.
+As with Claude Code Desktop, repo attribution only happens when an existing
+working directory can be found. Otherwise the archive is intentionally
+uncategorized.
+## Claude.ai Export
+- Import command: `agentlog import claude-web --file <path> [--scope local|team]`
+- Provider: `claude_web`
+- Source types: `claude-web-export`, `claude-web-memory`
+- Source file: Claude.ai JSON export or ZIP containing a JSON export
+- Default archive scope: `claude_web`
+Claude.ai is not scanned automatically from the desktop app. The user provides
+an official export file. agentlog reads `chat_messages`, `messages`, or
+`children`, normalizes sender/role fields, extracts text content, and uses
+`created_at`, `updated_at`, or `timestamp`.
+Project-file conversations are imported as project-scoped sessions. Top-level
+conversation exports are assigned to a project only when Claude includes an
+explicit project reference such as `project_uuid`, `project_id`, or a nested
+`project` object. Some official Claude exports include `projects/*.json` and
+project memories but omit the per-conversation project id in `conversations.json`;
+those conversations remain under the account-level chat scope.
+Memory exports are grouped under the synthetic `memory` chat folder instead of
+the original project folder. Root memory is titled `Claude Memory`; project
+memory is titled `Claude Project Memory: <project name>`. This keeps project
+folders from implying that account-level conversations were reliably tagged to
+Claude projects when the export did not preserve that relationship. Re-run
+`agentlog import claude-web --file <path>` after importing an export that
+contains conversation project ids or after memory parser semantics change.
+Like ChatGPT export imports, Claude.ai imports are scope-based by default because
+the export does not reliably describe a local repo.
+## Gemini CLI
+- Import selector: `gemini-cli`
+- Provider: `gemini_cli`
+- Source type: `gemini-cli-history`
+- Primary stores:
+  - `~/.gemini/tmp`
+  - `~/.gemini/history`
+  - `~/.gemini/sessions`
+  - `~/.gemini/exports`
+  - `$GEMINI_HOME` or `AGENTLOG_GEMINI_HOME_DIR` equivalents
+agentlog scans Gemini CLI structured history files: `.json`, `.jsonl`, `.md`,
+and `.markdown`. Under `~/.gemini/tmp`, it includes chat/checkpoint directories
+and one-level JSON files, while excluding `shell_history`.
+JSON and JSONL files use a Gemini-specific parser for `role` / `parts` content,
+native Gemini CLI `type: "user"` / `type: "gemini"` content records, `model`
+role normalization, `functionCall`, `functionResponse`, direct tool call/result
+shapes, nested native `toolCalls[].result` entries, shell/code execution parts,
+usage metadata, and checkpoint or restore events. Gemini tmp prompt logs and
+chat JSONL sidecars with the same session id are coalesced so prompt-only
+`logs.json` files do not overwrite richer assistant/tool transcripts. Markdown
+files are split into messages by role headings such as
+`# User`, `# Assistant`, or bold role labels. The working directory comes from
+parsed cwd fields or Gemini tmp `.project_root` metadata. If no working
+directory can be resolved, the session is archived under
+`gemini-cli/uncategorized`.
+## Antigravity
+- Import selector: `antigravity`
+- Provider: `antigravity`
+- Source type: `antigravity-history` (older alias: `antigravity-brain`)
+- Primary readable store: `~/.gemini/antigravity/brain/*`
+- Binary store counted but not decoded:
+  `~/.gemini/antigravity/conversations/*.pb`
+agentlog imports readable Markdown artifacts from each task directory. Recognized
+artifact names are `task.md`, `implementation_plan.md`, `walkthrough.md`, and
+`plan.md`. Each artifact becomes an assistant message with a heading naming the
+artifact. Timestamps come from artifact file mtimes.
+The importer tries to infer a working directory from `file://...` links inside
+the Markdown artifacts. If none can be inferred, it archives under
+`antigravity/uncategorized`. Binary protobuf transcripts are counted in
+discovery details but not imported as conversation messages yet.
+## Devin CLI
+- Import selector: `devin-cli`
+- Provider: `devin`
+- Source type: `devin-cli-history`
+- Primary store on macOS/Linux: `~/.local/share/devin/cli/sessions.db`
+- Primary store on Windows: `%LOCALAPPDATA%\devin\cli\sessions.db`
+- SQLite WAL-mode sidecars: `sessions.db-shm` and `sessions.db-wal`
+- Override: `AGENTLOG_DEVIN_SESSIONS_DB` points at an alternate database file.
+agentlog reads Devin for Terminal's SQLite store with `sqlite3`. It imports
+visible rows from `sessions`, then reads `message_nodes` and reconstructs the
+selected conversation branch by walking backward from `sessions.main_chain_id`
+through `message_nodes.parent_node_id`. That avoids importing alternate branch
+nodes that are present in the database but not part of the visible thread.
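The backward walk described above can be sketched over already-loaded rows. The helper name and the `id` column name are assumptions, as is treating `main_chain_id` as the id of the newest node on the visible branch; `parent_node_id` comes from the document.

```javascript
// Hypothetical helper: `nodes` are message_nodes rows with `id` (assumed
// column name) and `parent_node_id`; `mainChainId` is sessions.main_chain_id.
function visibleBranch(nodes, mainChainId) {
  const byId = new Map(nodes.map((node) => [node.id, node]));
  const seen = new Set(); // guards against cycles in malformed data
  const chain = [];
  let current = byId.get(mainChainId);
  while (current && !seen.has(current.id)) {
    seen.add(current.id);
    chain.push(current);
    current = byId.get(current.parent_node_id);
  }
  return chain.reverse(); // oldest message first
}
```

Nodes on alternate branches are never visited, because the walk only follows parent links from the selected tip.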
+`message_nodes.chat_message` is JSON. agentlog normalizes `role`, `content`,
+and `tool_calls`, skips system messages, and skips Devin context user messages
+that begin with `<rules ...>`, `<available_skills>`, or `<git_status>`. Tool
+results are preserved as `tool` messages with `metadata.toolResult`.
+Assistant tool calls are stored in `metadata.toolCalls[]`; agentlog no longer
+appends synthetic readable lines such as `Grep(TODO)` into assistant prose.
+Devin's `metadata.extensions["chisel/tool_call_content"]` is used for small
+display metadata (`title`, `status`, `kind`, and tool id) while arguments are
+stored as redaction-aware summaries.
+Timestamps come from `sessions.created_at`, `sessions.last_activity_at`, and
+per-node `created_at` values. Devin currently stores these as Unix seconds.
+The working directory comes from `sessions.working_directory`, so repo
+attribution follows the project directory Devin was launched from.
+If `message_nodes` contains no importable messages, agentlog falls back to
+`prompt_history` so at least direct user prompts can be archived.
+## Cursor
+- Import selector: `cursor`
+- Provider: `cursor`
+- Source types:
+  - `cursor-sqlite-history` (compatibility alias for older labels)
+  - `cursor-workspace-sqlite`
+  - `cursor-global-sqlite`
+  - `cursor-raw-sqlite-salvage`
+  - `cursor-agent-transcripts`
+- Older workspace store:
+  `~/Library/Application Support/Cursor/User/workspaceStorage/*/state.vscdb`
+- Global store used for Cursor composer and raw salvage data:
+  `~/Library/Application Support/Cursor/User/globalStorage/state.vscdb`
+- Newer project transcript store:
+  `~/.cursor/projects/<project-slug>/agent-transcripts/**`
+- Overrides:
+  - `AGENTLOG_CURSOR_WORKSPACE_STORAGE_DIR` overrides the workspace SQLite root.
+  - `AGENTLOG_CURSOR_GLOBAL_STORAGE_DIR` overrides the global SQLite root.
+  - `AGENTLOG_CURSOR_GLOBAL_STORAGE_DB` points at one explicit global SQLite DB.
+  - `AGENTLOG_CURSOR_PROJECTS_DIR` overrides the project transcript root.
+  - `AGENTLOG_CURSOR_HOME_DIR` is available for tests and alternate home roots.
+For older Cursor stores, agentlog reads `state.vscdb` with `sqlite3` and selects
+the `workbench.panel.aichat.view.aichat.chatdata`, `composer.composerData`,
+`aiService.prompts`, and `aiService.generations` keys from `ItemTable`. It
+recursively finds `bubbles` arrays, converts Cursor bubble types into user or
+assistant messages, appends terminal/file-selection context, and uses bubble
+timestamps when present. Cursor tool call, tool result, usage, model, status,
+request id, composer id, and edit/diff-like records are normalized into the
+shared `metadata.toolCalls[]`, `metadata.toolResult`, and `metadata.usage`
+shapes when those fields appear in the stored blobs.
+When an old Cursor workspace has no full `bubbles` transcript but still has
+composer headers plus `aiService` prompt/generation breadcrumbs, agentlog imports
+a fallback searchable timeline. Composer headers provide titles and time ranges;
+matching `composer` generation rows become user messages and `apply` generation
+rows become assistant "Applied changes" messages. This recovers older
+UI-visible Cursor work where the local database preserved activity summaries but
+not the full assistant prose.
+Cursor also stores many UI-visible Composer/Agent conversations in the global
+`cursorDiskKV` table instead of the per-workspace `ItemTable`. Those records are
+usually split: `composerData:<composerId>` stores the title, original
+created/updated times, workspace/context hints, and ordered
+`fullConversationHeadersOnly`, while `bubbleId:<composerId>:<bubbleId>` stores
+the individual message bodies. Some older rows instead put the ordered message
+records directly in `composerData.conversation`; when headers are empty or
+missing, that inline conversation array is the authoritative ordering source.
+Agentlog imports these as `cursor-global-sqlite` by reading the best available
+header/conversation list, loading only the matching bubble rows when needed,
+ordering bubbles by that list, and using the composer-level timestamps instead of
+Cursor's migrated bubble timestamps. This is the path that recovers old Cursor UI
+conversations whose workspace `state.vscdb` now only shows selected composer ids
+or prompt history.
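
As a rough sketch of the header-driven ordering above, assume the `cursorDiskKV` rows have already been loaded into a `Map` of key to parsed JSON. The function name `orderedBubbles` and the `bubbleId` field on header entries are assumptions made for illustration.

```javascript
// rows: Map of cursorDiskKV key -> parsed JSON value.
// Assumption: each header entry carries a `bubbleId` field.
function orderedBubbles(rows, composerId) {
  const composer = rows.get(`composerData:${composerId}`);
  if (!composer) return [];
  const headers = composer.fullConversationHeadersOnly ?? [];
  if (headers.length === 0) {
    // Older rows: the inline conversation array is the ordering source.
    return composer.conversation ?? [];
  }
  // Load only the bubble rows named by the headers, in header order.
  return headers
    .map((header) => rows.get(`bubbleId:${composerId}:${header.bubbleId}`))
    .filter(Boolean);
}

// Example rows, invented for illustration.
const exampleRows = new Map([
  ["composerData:c1", { fullConversationHeadersOnly: [{ bubbleId: "b2" }, { bubbleId: "b1" }] }],
  ["bubbleId:c1:b1", { text: "second" }],
  ["bubbleId:c1:b2", { text: "first" }],
  ["composerData:c2", { fullConversationHeadersOnly: [], conversation: [{ text: "inline" }] }],
]);
const splitOrder = orderedBubbles(exampleRows, "c1").map((b) => b.text);
const inlineOrder = orderedBubbles(exampleRows, "c2").map((b) => b.text);
```
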
+When the global composer record omits a workspace folder, agentlog cross-checks
+workspace `composer.composerData` and chat state for the same composer id and
+uses that workspace folder for repo attribution. It also mines absolute paths
+from Cursor bubble context, including object keys such as nested
+`context.mentions.fileSelections["file:///..."]`, `relevantFiles`,
+`attachedFolders`, workspace URIs, and recently viewed files. When old file
+paths in the bubble context no longer exist, cwd inference walks up to the
+nearest existing parent so the session can still resolve to the repo.
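
The walk-up step can be illustrated with a small helper. `nearestExistingDir` is a hypothetical name, and the sketch only covers the parent walk, not the later repo resolution.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Walk up from a possibly stale file path to the closest directory that
// still exists on disk. Returns null only if nothing along the way exists.
function nearestExistingDir(filePath) {
  let current = path.resolve(filePath);
  while (true) {
    try {
      if (fs.statSync(current).isDirectory()) return current;
    } catch {
      // Path is missing: keep walking up.
    }
    const parent = path.dirname(current);
    if (parent === current) return null; // reached the filesystem root
    current = parent;
  }
}
```

In practice a caller would probably also want to reject results that are too shallow, such as the filesystem root or the home directory, before using them for repo attribution.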
+During dedupe, richer Cursor sources win over fallback sources. Newer project
+agent transcripts rank highest, global `cursorDiskKV` composer records rank
+above workspace SQLite, and `aiService` prompt/apply history ranks as a
+breadcrumb fallback. This matters for old Cursor threads where the same UI
+conversation appears twice: once as a full global composer with real historical
+timestamps, and once as a workspace prompt-history snapshot stamped with the
+SQLite file mtime. Exact cross-source duplicates, same-title near duplicates,
+and prompt-history snippets with matching user prompts are dropped in favor of
+the richer source.
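
One way to picture "richer sources win" is a rank table consulted during dedupe. In this sketch only `cursor-global-sqlite` is a source-type label taken from this document; the other labels, the `dedupeByRichness` name, and the title-based key are hypothetical.

```javascript
// Higher rank wins. Labels other than "cursor-global-sqlite" are hypothetical.
const SOURCE_RANK = {
  "cursor-agent-transcript": 3,
  "cursor-global-sqlite": 2,
  "cursor-workspace-sqlite": 1,
  "cursor-prompt-history": 0,
};

// Keep one session per dedupe key, preferring the highest-ranked source.
function dedupeByRichness(sessions, keyOf) {
  const best = new Map();
  for (const session of sessions) {
    const key = keyOf(session);
    const kept = best.get(key);
    if (!kept || SOURCE_RANK[session.sourceType] > SOURCE_RANK[kept.sourceType]) {
      best.set(key, session);
    }
  }
  return [...best.values()];
}

// Example: the same thread seen as prompt history and as a global composer.
const deduped = dedupeByRichness(
  [
    { title: "Fix login", sourceType: "cursor-prompt-history" },
    { title: "fix login", sourceType: "cursor-global-sqlite" },
    { title: "Other thread", sourceType: "cursor-workspace-sqlite" },
  ],
  (session) => session.title.trim().toLowerCase(),
);
```
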
+Cursor can also leave older UI-visible conversations only as deleted or migrated
+SQLite page fragments. To recover those, agentlog performs a best-effort raw
+salvage pass over each workspace `state.vscdb` plus sibling `state.vscdb.backup`
+and `state.vscdb-wal` files, and over the global storage `state.vscdb` with the
+same backup/WAL siblings. This pass streams the bytes without mutating Cursor's
+files, searches for parseable Cursor JSON records whose keys look like
+`composerData:<id>`, `_composerData:<id>`, or
+`bubbleId:<composerId>:<bubbleId>`, and imports them as
+`cursor-raw-sqlite-salvage`. `composerData` fragments are parsed as whole
+conversation containers when possible; `bubbleId` fragments are grouped by
+composer id and sorted by Cursor timestamps or raw-file offset. Working
+directories are inferred from workspace identifiers, selected file paths, tool
+call metadata, absolute paths in tool output, and recovered shell prompts such
+as `web-a37 %`. Unresolved recovered sessions remain under
+`cursor/uncategorized`.
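
The key scan can be sketched roughly as below. This is a simplified illustration: it reads a whole `Buffer` instead of streaming, and the names `salvageRecords` and `readBalancedObject`, the 64-byte gap limit, and the fragment cap are all assumptions.

```javascript
const KEY_PATTERN = /(?:_?composerData:[\w-]+|bubbleId:[\w-]+:[\w-]+)/g;

// Return the balanced JSON object text starting at `start` (a "{"),
// or null when the object is truncated or exceeds maxLength.
function readBalancedObject(text, start, maxLength) {
  let depth = 0;
  let inString = false;
  let escaped = false;
  const end = Math.min(text.length, start + maxLength);
  for (let i = start; i < end; i++) {
    const ch = text[i];
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
    } else if (ch === '"') inString = true;
    else if (ch === "{") depth++;
    else if (ch === "}" && --depth === 0) return text.slice(start, i + 1);
  }
  return null;
}

function salvageRecords(buffer, maxLength = 1 << 20) {
  // latin1 maps each byte to one character, so offsets stay byte-accurate.
  const text = buffer.toString("latin1");
  const records = [];
  for (const match of text.matchAll(KEY_PATTERN)) {
    const keyEnd = match.index + match[0].length;
    const start = text.indexOf("{", keyEnd);
    if (start === -1 || start - keyEnd > 64) continue; // value not adjacent
    const raw = readBalancedObject(text, start, maxLength);
    if (raw === null) continue;
    try {
      const json = Buffer.from(raw, "latin1").toString("utf8");
      records.push({ key: match[0], offset: match.index, value: JSON.parse(json) });
    } catch {
      // Corrupt or incomplete JSON: skip the fragment.
    }
  }
  return records;
}
```

The recorded `offset` gives a fallback sort key for fragments that carry no usable timestamp.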
+Raw salvage is intentionally conservative. It skips corrupt or incomplete JSON,
+caps individual fragments so one damaged free-list page cannot stall an import,
+deduplicates recovered sessions against the normal Cursor sources, and runs a
+second conservative merge pass for raw companion fragments. Assistant/tool-only
+raw fragments can attach to a same-project user session when the recovered
+assistant prose has enough keyword overlap with the target. Raw fragments that
+contain both user prompts and assistant/tool responses can attach to the best
+matching workspace or agent transcript only when multiple recovered user prompts
+already appear in that target. When a raw fragment is just a timestamp-shifted
+copy of content already merged, it is dropped as contained. Merged messages are
+annotated with `metadata.mergeReason = "cursor-raw-assistant-only"` or
+`"cursor-raw-companion"`, plus `metadata.mergedFromSessionId`, so the recovery
+remains auditable. The target session fingerprint includes the recovered
+companion content, so a rerun can refresh a previously archived workspace thread
+instead of skipping it as already imported. Raw salvage can recover full
+assistant prose when that prose still exists in parseable raw SQLite bytes; if
+Cursor has compacted, vacuumed, or encrypted the database, or synced only a
+server-side copy, the fallback may find only `aiService` prompt/apply
+breadcrumbs or nothing.
+Some raw fragments do not contain durable Cursor timestamps. When that happens,
+the parser may need a synthetic timestamp while constructing the archive object,
+but the history viewer treats timestamps that line up with Cursor SQLite file
+mtimes as recovered/unknown rather than as fresh activity. This prevents a clean
+reinstall or large raw-salvage import from making old Cursor threads appear as
+if they all happened minutes ago.
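
The viewer-side check could look something like this. `isRecoveredTimestamp` and the two-second tolerance are assumptions for illustration, not values taken from agentlog.

```javascript
// Treat a timestamp as recovered/unknown when it lines up with the mtime
// of any Cursor SQLite file that the session was salvaged from.
function isRecoveredTimestamp(timestampMs, sqliteMtimesMs, toleranceMs = 2000) {
  return sqliteMtimesMs.some((mtime) => Math.abs(timestampMs - mtime) <= toleranceMs);
}

// Example with invented values.
const fileMtimes = [Date.UTC(2025, 0, 15, 12, 0, 0)];
const synthetic = isRecoveredTimestamp(Date.UTC(2025, 0, 15, 12, 0, 1), fileMtimes);
const genuine = isRecoveredTimestamp(Date.UTC(2024, 5, 1, 9, 30, 0), fileMtimes);
```
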
+Cursor progress bars count scan units for the current phase, not final
+conversations. For example, `workspace stores: 79/123` means 79 of 123 Cursor
+workspace SQLite files have been scanned. The final discovery row reports the
+deduped session count after workspace rows, global `cursorDiskKV`, raw salvage,
+and project transcript sources have all been merged.
+The older SQLite path gets its working directory from the workspace
+`workspace.json` file next to `state.vscdb`. If that file is missing or
+unreadable, the session is archived under `cursor/uncategorized`.
+For newer Cursor agent transcripts, agentlog scans
+`~/.cursor/projects/<project-slug>/agent-transcripts` for `.json` and `.jsonl`
+files and groups files by transcript session directory. It parses top-level
+`role` plus `message.content` shapes, generic nested message shapes, and common
+timestamp fields. JSON and JSONL transcripts also get Cursor-specific extraction
+for `tool_calls`, `toolCalls`, `toolResults`, command outputs, edit records,
+diff records, model/status/request metadata, and token usage. When no
+per-message timestamp exists, it uses the source file's mtime with stable
+millisecond offsets so imports do not get stamped with the time of import.
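
A minimal sketch of the mtime fallback, assuming messages are plain objects with an optional `timestamp` field; `withFallbackTimestamps` is a hypothetical name.

```javascript
// Fill missing timestamps from the file mtime, adding the message index in
// milliseconds so the order is stable and reruns produce the same values.
function withFallbackTimestamps(messages, fileMtimeMs) {
  return messages.map((message, index) => ({
    ...message,
    timestamp: message.timestamp ?? new Date(fileMtimeMs + index).toISOString(),
  }));
}

const stamped = withFallbackTimestamps(
  [{ role: "user" }, { role: "assistant" }, { role: "user", timestamp: "2024-03-01T10:00:00.000Z" }],
  Date.UTC(2024, 0, 1),
);
```

Because the offsets depend only on the file mtime and message position, importing the same unchanged file twice yields identical timestamps.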
+Cursor project slugs are decoded back to local paths when possible. For example,
+`Users-bzhou-Documents-GitHub-spring-next` resolves to
+`/Users/bzhou/Documents/GitHub/spring-next` if that directory exists. If no
+working directory can be resolved for a newer transcript, it archives under
+`cursor/uncategorized` instead of assigning the session to the current repo.
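
Slug decoding is ambiguous because a hyphen may be either a path separator or part of a directory name. One possible approach, sketched here for POSIX-style paths, tries both readings and keeps the first candidate that exists. `decodeProjectSlug` is a hypothetical name, and the `isDirectory` callback is injected so the sketch can run without touching the real filesystem.

```javascript
// Try each hyphen as either "/" or a literal "-", pruning branches whose
// parent directory does not exist. Returns null when nothing resolves.
function decodeProjectSlug(slug, isDirectory) {
  const tokens = slug.split("-");
  function search(index, current) {
    if (index === tokens.length) return isDirectory(current) ? current : null;
    if (isDirectory(current)) {
      // Reading 1: the hyphen was a path separator.
      const found = search(index + 1, `${current}/${tokens[index]}`);
      if (found) return found;
    }
    // Reading 2: the hyphen belongs to the directory name.
    return search(index + 1, `${current}-${tokens[index]}`);
  }
  return search(1, `/${tokens[0]}`);
}

// Example using an in-memory directory set instead of the filesystem.
const knownDirs = new Set([
  "/Users",
  "/Users/bzhou",
  "/Users/bzhou/Documents",
  "/Users/bzhou/Documents/GitHub",
  "/Users/bzhou/Documents/GitHub/spring-next",
]);
const decoded = decodeProjectSlug(
  "Users-bzhou-Documents-GitHub-spring-next",
  (dir) => knownDirs.has(dir),
);
```
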
+## Cline
+- Import selector: `cline`
+- Provider: `cline`
+- Source type: `cline-task-history`
+- VS Code roots:
+  - `~/Library/Application Support/Code/User/globalStorage/saoudrizwan.claude-dev`
+  - `~/Library/Application Support/Code - Insiders/User/globalStorage/saoudrizwan.claude-dev`
+  - Linux/Windows equivalents under Code globalStorage
+- JetBrains roots:
+  - `~/Library/Application Support/JetBrains/<IDE>/globalStorage/saoudrizwan.claude-dev`
+  - Linux/Windows equivalents under JetBrains globalStorage
+- Primary task files:
+  - `state/taskHistory.json`
+  - `tasks/<task-id>/api_conversation_history.json`
+  - `tasks/<task-id>/ui_messages.json`
+  - `tasks/<task-id>/task_metadata.json`
+- Override: `AGENTLOG_CLINE_ROOT` or `AGENTLOG_CLINE_ROOTS`
+agentlog treats each `tasks/<task-id>` directory as one session. The API
+conversation history is preferred because it preserves user/assistant roles and
+Anthropic-style `tool_use` / `tool_result` blocks. If it is missing or empty,
+agentlog falls back to `ui_messages.json` so the visible task can still be
+archived. Raw backup includes the task files and the task history index when
+present.
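
The preference order between the two task files can be sketched as follows. `loadClineTask` and `readJsonArray` are hypothetical names; only the file names come from the list above.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Read a JSON array from disk, treating missing or invalid files as empty.
function readJsonArray(file) {
  try {
    const parsed = JSON.parse(fs.readFileSync(file, "utf8"));
    return Array.isArray(parsed) ? parsed : [];
  } catch {
    return [];
  }
}

// Prefer the API conversation history; fall back to the UI messages.
function loadClineTask(taskDir) {
  const api = readJsonArray(path.join(taskDir, "api_conversation_history.json"));
  if (api.length > 0) return { source: "api_conversation_history.json", messages: api };
  const ui = readJsonArray(path.join(taskDir, "ui_messages.json"));
  return { source: "ui_messages.json", messages: ui };
}
```
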
+Cline task metadata and task history are used for title, model, working
+directory, and timestamps when available. Checkpoint metadata, patch/diff files,
+search/replace edit records, and checkpoint shadow git repositories are decoded
+into assistant edit tool calls when present. When a nearby assistant turn exists,
+the checkpoint diff is attached to that turn; otherwise it remains a
+supplementary assistant event. Those tool calls carry unified diff text or
+old/new string payloads so the web viewer can render the edits inline while the
+original checkpoint files remain in raw backups.
+## OpenCode
+- Import selector: `opencode`
+- Provider: `opencode`
+- Source type: `opencode-history`
+- Primary data root: `~/.local/share/opencode`
+- Storage roots:
+  - `~/.local/share/opencode/storage`
+  - `~/.local/share/opencode/project/<project-slug>/storage`
+- Primary files:
+  - `storage/session/<project-id>/<session-id>.json`
+  - `storage/message/<session-id>/<message-id>.json`
+  - `storage/part/<message-id>/<part-id>.json`
+  - `storage/session_diff/<session-id>.json`
+  - `storage/project/<project-id>.json`
+- Overrides:
+  - `AGENTLOG_OPENCODE_DATA_DIR` overrides the data root.
+  - `AGENTLOG_OPENCODE_STORAGE_DIR` or `AGENTLOG_OPENCODE_STORAGE_ROOTS`
+    points directly at one or more `storage` directories.
+agentlog reads OpenCode's JSON session store directly. Sessions provide the
+archive id and project id; message and part files provide role text, reasoning
+text, tool calls, and tool outputs. Tool parts are normalized into
+`metadata.toolCalls[]` and `metadata.toolResult` records so the web viewer can
+render them with the same cards used for Codex, Claude, Devin, and Cursor.
+When `session_diff/<session-id>.json` is present, agentlog adds a supplementary
+edit tool call with the diff payload. Unified diff text is rendered inline in
+the history web UI, and the original diff JSON remains in the raw archive.
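
Joining the session, message, and part directories might look like the sketch below. The directory layout follows the list above; the `id`, `role`, `type`, and `text` field names and the filename-based ordering are assumptions, and `loadOpenCodeSession` is a hypothetical name.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Parse every .json file in a directory, in filename order.
function readJsonFiles(dir) {
  if (!fs.existsSync(dir)) return [];
  return fs
    .readdirSync(dir)
    .filter((name) => name.endsWith(".json"))
    .sort()
    .map((name) => JSON.parse(fs.readFileSync(path.join(dir, name), "utf8")));
}

// Load one session's messages and attach each message's parts.
function loadOpenCodeSession(storageDir, sessionId) {
  return readJsonFiles(path.join(storageDir, "message", sessionId)).map((message) => {
    const parts = readJsonFiles(path.join(storageDir, "part", message.id));
    const text = parts
      .filter((part) => part.type === "text")
      .map((part) => part.text)
      .join("\n");
    return { role: message.role, text, parts };
  });
}
```
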
+## Aider
+- Import selector: `aider`
+- Provider: `aider`
+- Source type: `aider-chat-history`
+- Default chat history file: `.aider.chat.history.md`
+- Optional raw sidecars:
+  - `.aider.llm.history`
+  - `.aider.input.history`
+- Scan roots:
+  - current process directory, unless it is the filesystem root or home directory
+  - common project roots such as `~/Documents/GitHub`, `~/Developer`,
+    `~/Projects`, `~/Code`, and `~/Work`
+- Overrides:
+  - `AGENTLOG_AIDER_ROOT` or `AGENTLOG_AIDER_ROOTS`
+  - `AGENTLOG_AIDER_CHAT_HISTORY_FILE` or `AIDER_CHAT_HISTORY_FILE`
+  - `AGENTLOG_AIDER_LLM_HISTORY_FILE` / `AIDER_LLM_HISTORY_FILE`
+  - `AGENTLOG_AIDER_INPUT_HISTORY_FILE` / `AIDER_INPUT_HISTORY_FILE`
+agentlog parses Aider's Markdown transcript by treating each `#### <prompt>`
+heading as a user message and the following Markdown block as the assistant
+reply. `.aider.llm.history` is parsed when present to enrich assistant turns
+with model, request id, and token usage metadata. The repo root is inferred by
+walking upward to `.git`; otherwise the history file directory is used.
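
The heading-based split can be sketched in a few lines. `parseAiderHistory` is a hypothetical name, and this simplified version ignores everything before the first prompt and does not merge multi-line prompts.

```javascript
// Each "#### <prompt>" line starts a user message; the lines that follow,
// up to the next prompt heading, form the assistant reply.
function parseAiderHistory(markdown) {
  const messages = [];
  let reply = null; // stays null until the first prompt heading is seen
  const flush = () => {
    const text = (reply ?? []).join("\n").trim();
    if (text) messages.push({ role: "assistant", content: text });
  };
  for (const line of markdown.split(/\r?\n/)) {
    if (line.startsWith("#### ")) {
      flush();
      messages.push({ role: "user", content: line.slice(5).trim() });
      reply = [];
    } else if (reply) {
      reply.push(line);
    }
  }
  flush();
  return messages;
}

const parsed = parseAiderHistory(
  ["# session header", "", "#### add a test", "", "Sure, here it is.", "", "#### thanks", ""].join("\n"),
);
```
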
+When a real git repository is available, agentlog conservatively correlates
+nearby Aider auto-commits with transcript turns and attaches matching commit
+diffs as edit tool-call metadata. Multiple matching commits can attach to the
+same assistant turn and are recorded in `metadata.gitCommits`. Unrelated commits
+are ignored; the original chat, LLM, and input history sidecars remain in raw
+backups.
+## Windsurf
+- Import selector: `windsurf`
+- Provider: `windsurf`
+- Source type: `windsurf-cascade-brain`
+- Primary readable store: `~/.codeium/windsurf/brain/*`
+- Binary store counted but not decoded: `~/.codeium/windsurf/cascade/*.pb`
+- Status: disabled from setup, default imports, and history filters
+Windsurf support is currently disabled. Current Cascade sessions appear to be
+written as encrypted binary stores, so agentlog can detect session IDs and
+workspace metadata but cannot archive readable conversation text from the local
+files.
+The older experimental helper can still read Markdown artifacts from Windsurf
+Cascade brain directories when present. Recognized artifact names are `plan.md`,
+`task.md`, `implementation_plan.md`, and `walkthrough.md`. Each artifact becomes
+an assistant message with a heading naming the artifact. Timestamps come from
+file mtimes.
+The importer tries to infer a working directory from `file://...` links in the
+Markdown. If none can be inferred, it archives under `windsurf/uncategorized`.
+Binary Cascade protobuf stores are counted in discovery details but not decoded.
+## Collector And Live Monitoring
+`agentlog start` runs the supervisor. The current supervisor periodically imports
+the watcher sources selected during init and can also run archive sync. The
+collector path accepts OTLP JSON and stores telemetry payloads under the archive
+telemetry directory.
+For Cursor, the supervisor handles incremental logs going forward; explicit
+full imports handle raw SQLite recovery/backfill.
+Telemetry bridge setup is a one-time integration written during init when
+selected. Claude Code and Gemini CLI receive native settings merges. Cline uses
+documented environment-variable overrides, so agentlog writes an env file and
+launcher helper under `~/.agentlog/cline/`. These bridges are not the same thing
+as importing transcript history.
+## Known Gaps
+- ChatGPT and Claude.ai are import-by-export only; agentlog does not read their
+  desktop app local stores.
+- Windsurf is disabled because Cascade protobuf transcripts appear encrypted.
+- Antigravity protobuf transcripts are counted but not decoded.
+- Older Cursor `state.vscdb` stores are imported on a best-effort basis because
+  Cursor has changed local storage layouts over time.
+- Claude Desktop metadata-only sessions may contain only the initial prompt and
+  selected folders when `audit.jsonl` is unavailable.
+- Any source without a reliable cwd may be archived under an uncategorized scope
+  rather than a repo.