
agentel 0.2.6 → 0.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (36)

package/README.md +260 -79
package/docs/code-reference.md +130 -42
package/docs/history-source-handling.md +685 -153
package/docs/release.md +35 -8
package/npm-shrinkwrap.json +478 -0
package/package.json +20 -4
package/scripts/postinstall.js +156 -0
package/src/archive.js +1342 -50
package/src/canonical-events.js +346 -35
package/src/cli.js +8835 -843
package/src/collector.js +42 -4
package/src/config.js +26 -4
package/src/diffs.js +156 -0
package/src/doctor.js +48 -5
package/src/importers/claude.js +51 -4
package/src/importers/copilot.js +385 -0
package/src/importers/cursor-recovery.js +22 -0
package/src/importers/factory.js +396 -0
package/src/importers/gemini.js +41 -1
package/src/importers/grok.js +367 -0
package/src/importers/pi.js +422 -0
package/src/importers/providers.js +64 -5
package/src/importers.js +6429 -747
package/src/mcp.js +1 -0
package/src/memory-sources.js +671 -0
package/src/memory-store.js +0 -0
package/src/parser-versions.js +13 -0
package/src/pricing.js +84 -0
package/src/search.js +641 -215
package/src/session-store.js +405 -0
package/src/source-watch.js +293 -0
package/src/sources.js +60 -11
package/src/supervisor.js +197 -9
package/src/sync.js +6 -0
package/src/unavailable-sources.js +358 -0
package/src/web-export-instructions.js +6 -4

package/docs/history-source-handling.md CHANGED

@@ -17,7 +17,7 @@ All supported sources are normalized into the same archive shape before write:
 - `cwd`: working directory if the source exposes one or agentlog can infer one.
 - `repoCanonical`: git remote key from `cwd`, such as `github.com/org/repo`.
 - `scopeCanonical`: non-repo storage scope for sessions without a reliable
-  working directory, such as `claude-desktop/uncategorized`.
+  working directory, such as `claude-cowork/uncategorized`.
 - `messages`: normalized `user`, `assistant`, `system`, or `tool` messages with
   ISO timestamps.
 - `sourcePath`: local source file, directory, database, or export file.
@@ -59,6 +59,44 @@ usage is preserved separately and repeated provider request ids are counted once
 which avoids inflating Claude Code/Desktop sessions that repeat the same request
 usage across assistant text and tool-call rows.
+Session metadata stores compact `toolUsage` summaries derived from canonical
+`tool.called`/`tool.completed` events so global and project stats can aggregate
+most-used tools without rereading transcripts. Existing archives without that
+field need a clean reimport/rebuild before most-used tool stats are complete;
+the web stats layer does not reread event files in the request path.
+Archive schema v6 adds import-time summaries for work and durable context.
+`outputTokenWork`
+classifies assistant output tokens into `text`, `toolUse`, `reasoning`, and
+`unknown` buckets when message usage and message shape support it; mixed
+text/tool messages are counted as `unknown` instead of being split by a hidden
+heuristic. `outcomes` stores lightweight counts for edit tool calls, unique
+files touched by edit tools, and durable knowledge captures such as `AGENTS.md`,
+`CLAUDE.md`, provider memory files, skill definitions, project rules, and
+planning/decision docs. Canonical events v5 additionally emits `memory.read`,
+`memory.write`, and `memory.loaded` events for memory-file activity, and
+`outcomes` records `memoryReads`, `memoryWrites`, and `memoryLoads` separately
+from generic edit/tool counts. The stats API aggregates these fields into daily
+rows for output job-mix charts and tokens-per-meaningful-event ratios. Older
+archives should be rebuilt or reimported for coverage; the viewer does not scan
+old transcripts to recover these fields.
+View schema v9 carries those memory events through the compact browser payload
+and renders memory-file reads, writes, and loads as memory activity rather than
+ordinary file diff cards.
+Spend charts are derived from actual provider/export `costUsd` values when
+available, otherwise from a versioned model-pricing table and known token
+directions. Tokens that cannot be priced confidently remain in unpriced coverage
+fields rather than being multiplied by a single blended rate. Estimated spend
+payloads carry pricing source/version metadata from `src/pricing.js`, while
+provider-reported cost stays labeled as actual. The stats API also emits a
+30-day usage summary for the web viewer: latest-day spend/tokens/prompts,
+7-day spend, 30-day spend/tokens/prompts/sessions, and the top known model in
+that window. Browser stats payloads are range-shaped: the default response
+includes all-time scalar totals, the visible chart window, and the rolling
+activity heatmap; older activity years and all-time daily breakdowns are fetched
+on demand instead of being bundled into every first load.
 When provider token usage is missing, message-level estimates use visible text
 length plus visible tool-call summaries at roughly four characters per token.
 The generic archive estimate is stored as `metadata.tokenEstimate` on each
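The `outputTokenWork` bucketing added in this hunk can be sketched as follows. This is a minimal illustration, not code from the package: the function name `classifyOutputTokens` and the message shape (`outputTokens`, `hasText`, `hasToolUse`, `hasReasoning`) are assumptions. Only the four bucket names and the rule that mixed text/tool messages count as `unknown` come from the text; treating every other multi-kind or kind-less message as `unknown` is a further assumption.

```javascript
// Hypothetical message shape: { outputTokens, hasText, hasToolUse, hasReasoning }.
function classifyOutputTokens(messages) {
  const buckets = { text: 0, toolUse: 0, reasoning: 0, unknown: 0 };
  for (const message of messages) {
    const tokens = message.outputTokens;
    // Skip messages without usable provider usage.
    if (!Number.isFinite(tokens) || tokens <= 0) continue;
    const kinds = [];
    if (message.hasText) kinds.push("text");
    if (message.hasToolUse) kinds.push("toolUse");
    if (message.hasReasoning) kinds.push("reasoning");
    // Exactly one kind: attribute all tokens to it.
    // Zero or several kinds: do not split by heuristic, count as unknown.
    const bucket = kinds.length === 1 ? kinds[0] : "unknown";
    buckets[bucket] += tokens;
  }
  return buckets;
}

const sample = classifyOutputTokens([
  { outputTokens: 100, hasText: true },
  { outputTokens: 40, hasToolUse: true },
  { outputTokens: 60, hasText: true, hasToolUse: true },
]);
console.log(sample);
```

Keeping mixed messages in a separate bucket means the `text` and `toolUse` totals stay lower bounds rather than guesses.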
@@ -84,6 +122,12 @@ own SDK aggregate fields plus `split_stats.sdk`, and the web view renders an SDK
 jobs card and heatmap. This keeps high-volume batch automation searchable and
 auditable without letting it swamp interactive usage stats.
+Subagent child sessions are also excluded from primary stats by default. Parent
+sessions keep compact subagent run metadata and provider-level thread counters,
+while child sessions remain direct-addressable for transcript inspection and can
+be included by the web stats Subagents toggle or other explicit
+subagent-inclusive stats/search paths.
 Cursor sessions that still lack provider-reported usage get a separate
 `estimatedUsage` metadata field instead of synthetic `usage`. The estimate uses
 empirical per-assistant-turn Cursor rates by model family, with visible
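The default exclusion of subagent child sessions from primary stats can be pictured as a filter on session kind. The sketch below is an assumption-laden illustration rather than the package's code: `selectPrimarySessions` and the `includeSubagents` option are hypothetical names, and the suffix check relies on the `*_subagent` naming this document uses for `conversationKind` values such as `codex_subagent` and `claude_subagent`.

```javascript
// Hypothetical helper: hide subagent child sessions unless explicitly included.
function selectPrimarySessions(sessions, { includeSubagents = false } = {}) {
  if (includeSubagents) return sessions.slice();
  return sessions.filter((session) => {
    const kind = session.conversationKind || "";
    return !kind.endsWith("_subagent");
  });
}

const sessions = [
  { id: "parent-1", conversationKind: "chat" },
  { id: "child-1", conversationKind: "codex_subagent" },
  { id: "child-2", conversationKind: "claude_subagent" },
];
console.log(selectPrimarySessions(sessions).map((s) => s.id));
console.log(selectPrimarySessions(sessions, { includeSubagents: true }).length);
```

The child records stay in the input list either way, which matches the idea that they remain direct-addressable while being hidden from default aggregates.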
@@ -95,15 +139,35 @@ split as non-assistant input, assistant output, and Claude thinking output, not
 reconstructed billing context windows.
 ```sh
-agentlog update --yes --since all
+agentlog update --yes
 ```
 `agentlog update` preserves config preferences, redaction settings, web account
-labels, source histories, and recall integrations. It removes derived local
-archive/import/index state and reimports configured local sources. Manual web
-exports still need to be imported again from the original export file when those
-archives need to be rebuilt. `agentlog reset` is the heavier path: it removes
-agentlog state and archive objects, including config, while still leaving source
+labels, manually imported ChatGPT/Claude.ai archive objects, local archive
+objects whose original source files are no longer present, source histories,
+and recall integrations. It removes derived local agent archive/import/index
+state and reimports configured local sources. Sessions whose source transcripts
+have already been cleaned up by the source application are restored from the
+previous agentlog archive instead of being silently dropped, and update/doctor
+report a source/provider/sourceType breakdown for those preserved unavailable
+sessions. When the preserved session is an old Claude Code archive with an
+Agentlog raw backup, restore the source transcript explicitly before a clean
+reimport:
+```sh
+agentlog repair claude-code-backups --dry-run
+agentlog repair claude-code-backups --yes
+agentlog update --yes --since all --sources claude
+```
+The default rebuild window is `imports.updateSince`, saved from the initial
+backfill or explicit all-source imports, falling back to `all` for legacy configs.
+The watcher's rolling
+`imports.defaultSinceDays` is not used by `agentlog update`; `--since` still
+overrides it for one run. Manual web exports only need to be
+imported again from the original export file when those chat archives themselves
+need to be rebuilt. `agentlog reset` is the heavier path: it removes agentlog
+state and archive objects, including config, while still leaving source
 application histories such as Cursor, Codex, Claude, Gemini, or Devin logs
 untouched.
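The rebuild-window precedence described in this hunk (`--since` for one run, then the saved `imports.updateSince`, then `all` for legacy configs) might be resolved along these lines. This is a sketch under assumptions: `resolveUpdateSince` is a hypothetical name, the config is modeled as a plain object, and the `"90d"` value is only a placeholder, since the accepted value formats are not shown here.

```javascript
// Hypothetical resolver for the `agentlog update` rebuild window.
function resolveUpdateSince(config, cliSince) {
  // An explicit --since value wins for this run only.
  if (cliSince !== undefined && cliSince !== null && cliSince !== "") {
    return cliSince;
  }
  // Otherwise use the window saved by the initial backfill or all-source import.
  const saved = config && config.imports && config.imports.updateSince;
  // Legacy configs without a saved window fall back to "all".
  // imports.defaultSinceDays (the watcher's rolling window) is deliberately ignored.
  return saved || "all";
}

console.log(resolveUpdateSince({ imports: { updateSince: "90d", defaultSinceDays: 7 } }));
console.log(resolveUpdateSince({ imports: { defaultSinceDays: 7 } }));
console.log(resolveUpdateSince({ imports: { updateSince: "90d" } }, "all"));
```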
@@ -137,13 +201,16 @@ agentlog does not blindly copy entire source directories.
 ## Canonical Events
 `events.jsonl` is the provider-independent archive/search substrate. It uses
-schema version `agentlog.events.v2` and these event kinds:
+schema version `agentlog.events.v5` and these event kinds:
 - `session.started`
 - `prompt.submitted`
 - `response.generated`
 - `tool.called`
 - `tool.completed`
+- `memory.read`
+- `memory.write`
+- `memory.loaded`
 Agentlog intentionally ports only the portable Forge idea here: canonical
 prompt/response/tool events with parser versions. It does not port Forge's
@@ -156,10 +223,12 @@ events add viewer-facing display metadata:
 - `metadata.toolCalls[]`: `id`, `name`, `displayName`, `category`, `title`,
   `status`, `argument`, `rawInputSummary`, `inputPreview`, `target`, `icon`,
-  `categoryLabel`, and `provider`.
+  `categoryLabel`, `provider`, and optional `structuredPatch` hunks when the
+  source exposes line-aware unified diffs.
 - `metadata.toolResult`: `id`, `name`, `provider`, `kind`, `title`, `summary`,
-  `output`, `lineCount`, `collapsed`, `category`, `categoryLabel`, `icon`, and
-  optional `status`.
+  `output`, `lineCount`, `collapsed`, `category`, `categoryLabel`, `icon`,
+  optional `status`, and optional provider diff hunks such as
+  `structuredPatch`.
 `tool.completed.parentEventId` links to the matching `tool.called` event when a
 provider exposes stable ids or matching tool names. When those are absent,
@@ -168,6 +237,9 @@ such as Devin CLI still preserve the call/result relationship.
 The viewer reads canonical events or normalized metadata first. Text patterns
 such as `Grep(...)` are legacy fallback only.
+For memory-file activity, the viewer follows the canonical
+`tool.called -> tool.completed -> memory.*` chain and labels the visible tool
+card as a memory read/write/load instead of a generic read or edit.
 Provider-generated context sometimes appears in upstream logs as `role: user`.
 Agentlog preserves those records in transcripts, but reclassifies known shapes
@@ -195,36 +267,49 @@ package-prefixed scheme.
 | Source type | Version |
 | --- | --- |
-| `codex-cli-history` | `0.2.6.0` |
-| `codex-desktop-history` | `0.2.6.0` |
-| `codex-sdk-history` | `0.2.6.0` |
-| `cli-history` | `0.2.6.0` |
-| `claude-sdk-history` | `0.2.6.0` |
-| `claude-code-desktop-metadata` | `0.2.6.0` |
-| `claude-workspace-desktop` | `0.2.6.0` |
-| `cursor-workspace-sqlite` | `0.2.6.0` |
-| `cursor-global-sqlite` | `0.2.6.0` |
-| `cursor-raw-sqlite-salvage` | `0.2.6.0` |
-| `cursor-agent-transcripts` | `0.2.6.0` |
-| `devin-cli-history` | `0.2.6.0` |
-| `gemini-cli-history` | `0.2.6.0` |
-| `cline-task-history` | `0.2.6.0` |
-| `opencode-cli-history` | `0.2.6.0` |
-| `opencode-cli-sqlite-history` | `0.2.6.0` |
-| `opencode-desktop-history` | `0.2.6.0` |
-| `opencode-desktop-sqlite-history` | `0.2.6.0` |
-| `opencode-web-sqlite-history` | `0.2.6.0` |
-| `opencode-history` | `0.2.6.0` |
-| `opencode-sqlite-history` | `0.2.6.0` |
-| `aider-chat-history` | `0.2.6.0` |
-| `antigravity-history` | `0.2.6.0` |
-| `antigravity-trajectory-summary` | `0.2.6.0` |
-| `windsurf-trajectory-export` | `0.2.6.0` |
-| `web-chat-export` | `0.2.6.0` |
-| `chatgpt-export` | `0.2.6.0` |
-| `claude-web-export` | `0.2.6.0` |
-| `claude-web-memory` | `0.2.6.0` |
-| `import` | `0.2.6.0` |
+| `codex-cli-history` | `0.3.0.0` |
+| `codex-desktop-history` | `0.3.0.0` |
+| `codex-sdk-history` | `0.3.0.0` |
+| `cli-history` | `0.3.0.0` |
+| `claude-sdk-history` | `0.3.0.0` |
+| `claude-code-desktop-metadata` | `0.3.0.0` |
+| `claude-workspace-desktop` | `0.3.0.0` |
+| `cursor-workspace-sqlite` | `0.3.0.0` |
+| `cursor-global-sqlite` | `0.3.0.0` |
+| `cursor-raw-sqlite-salvage` | `0.3.0.0` |
+| `cursor-agent-transcripts` | `0.3.0.0` |
+| `devin-cli-history` | `0.3.0.0` |
+| `devin-desktop-acp-events` | `0.3.0.0` |
+| `copilot-cli-history` | `0.3.0.0` |
+| `factory-droid-history` | `0.3.0.0` |
+| `grok-build-history` | `0.3.0.0` |
+| `pi-cli-history` | `0.3.0.0` |
+| `gemini-cli-history` | `0.3.0.0` |
+| `cline-task-history` | `0.3.0.0` |
+| `opencode-cli-history` | `0.3.0.0` |
+| `opencode-cli-sqlite-history` | `0.3.0.0` |
+| `opencode-desktop-history` | `0.3.0.0` |
+| `opencode-desktop-sqlite-history` | `0.3.0.0` |
+| `opencode-web-sqlite-history` | `0.3.0.0` |
+| `opencode-history` | `0.3.0.0` |
+| `opencode-sqlite-history` | `0.3.0.0` |
+| `aider-chat-history` | `0.3.0.0` |
+| `antigravity-history` | `0.3.0.0` |
+| `antigravity-transcript-log` | `0.3.0.0` |
+| `antigravity-cli-transcript-log` | `0.3.0.0` |
+| `antigravity-cli-brain` | `0.3.0.0` |
+| `antigravity-ide-transcript-log` | `0.3.0.0` |
+| `antigravity-ide-brain` | `0.3.0.0` |
+| `antigravity-summary-proto` | `0.3.0.0` |
+| `antigravity-trajectory-summary` | `0.3.0.0` |
+| `windsurf-cascade-brain` | `0.3.0.0` |
+| `windsurf-cascade-protobuf` | `0.3.0.0` |
+| `windsurf-trajectory-export` | `0.3.0.0` |
+| `web-chat-export` | `0.3.0.0` |
+| `chatgpt-export` | `0.3.0.0` |
+| `claude-web-export` | `0.3.0.0` |
+| `claude-web-memory` | `0.3.0.0` |
+| `import` | `0.3.0.0` |
 `cursor-sqlite-history` and `antigravity-brain` are compatibility aliases for
 older labels. Fingerprints include the parser version prefix, so changing the
@@ -238,21 +323,25 @@ back to sessions for CLI/skill compatibility. Archives without `events.jsonl`
 remain searchable through transcript/markdown fallback, and missing
 `conversation.md` files are materialized from transcripts when needed.
 The web session API reads pre-baked `view.json` for the default readable pane.
-When the raw Markdown source view is requested, the browser asks for a
-Markdown-only payload instead of downloading the full transcript again. Browser
-session payloads compact duplicated tool output and use ETag revalidation so
+`view.json` is a display cache, not the source of truth: it keeps transcript
+message content visible but omits duplicated tool-output bodies from structured
+metadata and canonical-event text so very long sessions can still be written and
+loaded. The full redacted transcript and canonical events remain in
+`transcript.jsonl` and `events.jsonl`. When the raw Markdown source view is
+requested, the browser asks for a Markdown-only payload instead of downloading
+the full transcript again. Browser session payloads use ETag revalidation so
 revisits and live refresh checks can avoid reparsing unchanged transcripts.
-The local BM25 JSON index stores term postings plus document metadata for
-compatibility. A SQLite FTS5 sidecar stores the same chunks for interactive
-search so browser, terminal, and MCP recall queries do not parse a large JSON
-index in short-lived search processes. Index format bumps trigger a rebuild from
-existing `transcript.jsonl` and `events.jsonl`; they do not require reparsing
-provider source files. The web search endpoint is optimized for typing: it uses
-the compatible warm FTS/index when present, skips obsolete or stale indexes
-rather than parsing/rebuilding inline, and does not scan every rendered Markdown
-archive as a fallback. Terminal and MCP recall search also avoid synchronous
-rebuilds and BM25 JSON parses, then fall back to the bounded Markdown search for
-legacy archives or misses.
+The normal local index rebuild writes a small JSON summary plus a SQLite FTS5
+sidecar for browser, terminal, and MCP recall queries. The older full BM25 JSON
+index still exists for explicit compatibility callers, but routine update and
+rebuild flows avoid generating that large serialized object. Index format bumps
+trigger a rebuild from existing `transcript.jsonl` and `events.jsonl`; they do
+not require reparsing provider source files. The web search endpoint is
+optimized for typing: it uses the compatible warm FTS/index when present, skips
+obsolete or stale indexes rather than parsing/rebuilding inline, and does not
+scan every rendered Markdown archive as a fallback. Terminal and MCP recall
+search also avoid synchronous rebuilds and BM25 JSON parses, then fall back to
+the bounded Markdown search for legacy archives or misses.
 Recall quality has deterministic tests in `test/recall-eval.test.js` with
 fixtures under `test/fixtures/recall-evals.json`. Add a fixture when a vague
@@ -263,25 +352,33 @@ real-world query should reliably find a representative archived session.
 The setup UI, import defaults, and history source filters use this grouped order:
 1. OpenAI: Codex CLI, Codex Desktop, Codex SDK jobs, ChatGPT
-2. Anthropic: Claude Code CLI, Claude Code Desktop, Claude Workspace,
+2. Anthropic: Claude Code CLI, Claude Code Desktop, Claude Cowork,
    Claude.ai, Claude SDK jobs
-3. Google: Gemini CLI, Antigravity
-4. Cognition: Devin CLI
-5. Other: Cursor, Cline, OpenCode CLI, OpenCode Desktop, OpenCode Web, Aider
+3. Google: Gemini CLI, Antigravity CLI, Antigravity 2.0, Antigravity IDE
+4. Cognition: Devin CLI, Devin Desktop, Windsurf
+5. GitHub: GitHub Copilot CLI
+6. Factory: Factory Droid
+7. xAI: Grok Build
+8. Other: Cursor, pi, Cline, OpenCode CLI, OpenCode Desktop, OpenCode Web, Aider
 `agentlog import --source all` uses the default import order from
 `src/sources.js`: `codex-cli`, `codex-desktop`, `claude`,
-`claude-code-desktop`, `claude-workspace`, `gemini-cli`, `antigravity`,
-`devin-cli`, `cursor`, `cline`, `opencode-cli`, `opencode-desktop`,
+`claude-code-desktop`, `claude-cowork`, `gemini-cli`, `antigravity-cli`, `antigravity`,
+`antigravity-ide`, `devin-cli`, `devin-desktop`, `windsurf`, `copilot-cli`, `factory`, `grok-build`,
+`cursor`, `pi`, `cline`, `opencode-cli`, `opencode-desktop`,
 `opencode-web`, `aider`. Codex SDK jobs and Claude SDK jobs are intentionally
-opt-in. Windsurf local cache scanning is disabled for now because current
-Cascade transcripts are encrypted binary stores, but downloaded trajectory
-Markdown exports are importable with an explicit path.
-The background watcher polls the watcher source list selected near the end of
-`agentlog init`. New configs still support `imports.autoDiscoverSources=true`,
-but init now records the chosen watcher list exactly by setting
-`imports.autoDiscoverSources=false`.
+opt-in. Windsurf local imports are intentionally partial: readable Cascade plan
+artifacts are archived when present, matching Cascade protobuf files are
+preserved as raw sources, and downloaded trajectory Markdown remains the stable
+full transcript path.
+The background watcher covers the watcher source list selected near the end of
+`agentlog init`. Sources with watchable history roots (`src/source-watch.js`)
+import a few seconds after a filesystem event and are otherwise re-polled on a
+15-minute heartbeat; sources without watch roots poll every 30 seconds with a
+5-minute idle cadence. New configs still support
+`imports.autoDiscoverSources=true`, but init now records the chosen watcher
+list exactly by setting `imports.autoDiscoverSources=false`.
 Supervisor imports use `imports.defaultSinceDays` as a rolling window. Cursor
 SQLite store scans and raw recovery are disabled in supervisor ticks, so old
@@ -344,7 +441,7 @@ stable local command for the archived source.
 | Claude Code CLI | `claude -r <session-id>` | Uses the Claude Code JSONL session id. |
 | Devin CLI | `devin -r <session-id>` | agentlog archives these as `devin-<session-id>` and strips that prefix for the resume command, for example `devin -r selective-lotus`. |
 | Claude Code Desktop | No stable local resume command known. | Use Claude's own desktop/history surface or `agentlog show <session-id>`. |
-| Claude Workspace | No stable local resume command known. | Workspace/local-agent session ids are not known to be accepted by Claude Code's CLI resume flag. |
+| Claude Cowork | No stable local resume command known. | Cowork/local-agent session ids are not known to be accepted by Claude Code's CLI resume flag. |
 | Claude SDK jobs | No interactive resume command. | These are programmatic/batch runs. |
 | ChatGPT export | No local resume command. | Official exports are imported snapshots. |
 | Claude.ai export | No local resume command. | Official exports are imported snapshots. |
@@ -360,29 +457,76 @@ stable local command for the archived source.
 - Import selector: `codex-cli`
 - Provider: `codex`
 - Source type: `codex-cli-history`
-- Primary store: `~/.codex/state_5.sqlite`
+- Primary stores: `~/.codex/state_5.sqlite` and
+  `~/.codex/session_index.jsonl`
 - Session files: rollout paths referenced by the `threads` table, plus
   unindexed `rollout-*.jsonl` files under `sessions` and `archived_sessions`
 - Source split: `threads.source = "cli"`
 - Overrides:
   - `CODEX_STATE_DB` overrides the state database path.
+  - `CODEX_SESSION_INDEX` overrides the session index path.
   - `CODEX_HOME` is used for the fallback sessions root.
 The importer reads `id`, `rollout_path`, `created_at`, `updated_at`, `source`,
-`cwd`, and `title` from the Codex state database using `sqlite3`. When the
-database has the newer `stage1_outputs` table, agentlog also reads
+`cwd`, `title`, and available subagent metadata columns from the Codex state
+database using `sqlite3`. It also reads `thread_spawn_edges` when present. It then
+prefers `~/.codex/session_index.jsonl` when a matching `thread_name` entry is
+present, because Codex Desktop can now keep the sidebar title there while
+leaving `threads.title` as the full first prompt. If the index has no title for
+a session, the parser falls back to the rollout `thread_name_updated` event
+when Codex emits one, then to non-prompt-shaped state titles and finally to
+first-user-message inference. When a prompt starts with `$agentlog-recall` and
+then continues with a separate task paragraph, fallback title inference skips
+the recall lookup line and titles the session from the task body. If existing
+Codex archives show long context titles, recall-query titles, or stale first
+prompts instead of the Codex sidebar title, reimport them with
+`agentlog import --source codex-desktop --since all` or
+`agentlog import --source codex-cli --since all`. When the database has the
+newer `stage1_outputs` table, agentlog also reads
 `rollout_summary` and `raw_memory` as supplementary Codex summary documents and
 adds them to the archived transcript. The importer also scans
 `~/.codex/sessions` and `~/.codex/archived_sessions` for `rollout-*.jsonl` and
 `rollout-*.jsonl.zst` files that are not referenced by the state database, so
 older archived rollouts still get backed up.
+Codex subagents are stored as ordinary rollout threads whose `threads.source`
+can be a JSON `subagent.thread_spawn` object, with parent/child relationships in
+`thread_spawn_edges` and optional `agent_nickname`, `agent_role`, and
+`agent_path` columns. Agentlog resolves those rows back to the parent's source
+split (`codex-cli-history`, `codex-desktop-history`, or `codex-sdk-history`),
+imports each child as `conversationKind = "codex_subagent"` with
+`parentComposerId` set to the parent thread id, and attaches compact run
+metadata to the parent as `metadata.sessionSummary.codexSubagentRuns`. The web
+viewer renders those runs inline and opens the child transcript in the same
+subagent modal used for Claude Code. Existing Codex archives need a full
+reimport, for example `agentlog import --source codex-desktop --since all`, to
+gain the child-session links.
+Subagent child sessions remain archived as direct-addressable session records so
+the modal, raw archive, and direct `agentlog show <child-id>` path can load the
+full transcript, but normal history lists, stats, and search hide
+`*_subagent` sessions unless an explicit subagent-inclusive path opts in.
 The rollout JSONL parser captures readable `response_item` reasoning summaries,
 Codex `event_msg` assistant/user messages, task and compaction markers, local
 shell calls, web search calls, custom tool calls such as `apply_patch`, tool
 outputs, and token-count usage deltas. Shell calls that run `apply_patch`
 through a heredoc are promoted to edit tool calls with `patch`, `diff`, and
-target path metadata. The working directory comes from the parsed transcript
+target path metadata. Codex token totals are normalized
+from `event_msg.token_count.info.total_token_usage`: `input_tokens` is split
+into fresh input and `cached_input_tokens`, output tokens are preserved, and
+`reasoning_output_tokens` is stored as a visible sub-count that is already
+included in Codex output totals. When the Codex state database exposes
+`threads.tokens_used`, agentlog stores it as the session-level provider total so
+the stats page can reconcile rollout splits with Codex's own thread counter.
+Because these fields are import-time metadata, changing Codex token semantics
+requires a full reimport, for example:
+```bash
+agentlog import --source codex-desktop --since all
+agentlog import --source codex-cli --since all
+```
+The working directory comes from the parsed transcript
 first, then the `threads.cwd` column. If neither is available, the session is
 archived under `codex/uncategorized` instead of inheriting the supervisor's
 current directory. Repo attribution is computed from the resolved directory.
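The Codex token normalization in this hunk can be illustrated with a small function. It is a sketch, not the importer's code: `normalizeCodexTotals` and the output field names are hypothetical, and the `output_tokens` input key is an assumption (the text names `input_tokens`, `cached_input_tokens`, and `reasoning_output_tokens` but not the output key).

```javascript
// Hypothetical normalizer for event_msg.token_count.info.total_token_usage.
function normalizeCodexTotals(totalTokenUsage) {
  const input = totalTokenUsage.input_tokens || 0;
  const cached = totalTokenUsage.cached_input_tokens || 0;
  const output = totalTokenUsage.output_tokens || 0; // assumed key name
  const reasoning = totalTokenUsage.reasoning_output_tokens || 0;
  return {
    // input_tokens includes cached tokens, so split it into two parts.
    freshInput: Math.max(0, input - cached),
    cachedInput: cached,
    output,
    // Visible sub-count only: already included in `output`, never added on top.
    reasoningOutput: reasoning,
    total: input + output,
  };
}

console.log(
  normalizeCodexTotals({
    input_tokens: 1000,
    cached_input_tokens: 600,
    output_tokens: 200,
    reasoning_output_tokens: 50,
  })
);
```

Because reasoning tokens are a subset of output, summing `output` and `reasoningOutput` would double count; the sketch keeps `total` as input plus output only.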
@@ -393,15 +537,17 @@ Reading `.zst` sessions requires `zstd` or `unzstd`.
 - Import selector: `codex-desktop`
 - Provider: `codex`
 - Source type: `codex-desktop-history`
-- Primary store: `~/.codex/state_5.sqlite`
+- Primary stores: `~/.codex/state_5.sqlite` and
+  `~/.codex/session_index.jsonl`
 - Session files: rollout paths referenced by the `threads` table
 - Source split: `threads.source = "vscode"`
 - Overrides: same as Codex CLI
-Codex Desktop uses the same state database, summary-document handling, and
-rollout parser as Codex CLI. The only distinction is the `threads.source` value.
-This is why the web source dropdown can split Codex CLI and Codex Desktop even
-though both archive under the same `codex` provider.
+Codex Desktop uses the same state database, session-index title handling,
+summary-document handling, and rollout parser as Codex CLI. The only distinction
+is the `threads.source` value. This is why the web source dropdown can split
+Codex CLI and Codex Desktop even though both archive under the same `codex`
+provider.
 ## Codex SDK Jobs
@@ -427,22 +573,49 @@ in the separate SDK jobs aggregate instead of primary interactive totals.
 - Import command: `agentlog import chatgpt <path> [--scope local|team]`
 - Provider: `chatgpt`
 - Source type: `chatgpt-export`
-- Source file: ChatGPT JSON export or ZIP containing a JSON export
+- Source file: ChatGPT JSON export, OpenAI export ZIP, extracted
+  `OpenAI-export`, or `User Online Activity` folder
 - Default archive scope: `chatgpt`
-ChatGPT is not scanned automatically from a desktop app. The import command
-without a path prints official export instructions for OpenAI's Privacy Portal
-and ChatGPT Data Controls. The user then provides the downloaded official export
-file. ZIP imports prefer `conversations.json`, then another JSON file with
-`chat` in the name, then the first JSON file in the ZIP.
+ChatGPT is not scanned automatically from a desktop app. In a terminal, the
+import command without a path starts a walkthrough that asks for the export path
+or paths, account username/email, and display name.
+Use `agentlog import chatgpt --instructions` for static Privacy Portal and
+ChatGPT Data Controls instructions. Older ChatGPT exports usually contain a
+single `conversations.json`. Newer OpenAI privacy exports can arrive as
+`OpenAI-export/User Online Activity` with conversation data split across ZIPs or
+folders such as
+`Conversations__<account-hash>-chatgpt-0001-part-0001` and
+`...part-0002`. Import the parent `User Online Activity` folder when possible.
+The walkthrough also accepts each split `Conversations__...chatgpt...part`
+folder one at a time, ending on a blank line, so agentlog sees all split JSON
+files, manifests, `chat.html`, conversation ZIPs, and attached files together. Very large outer
+`OpenAI-export.zip` files should be unzipped first because Node and unzip tooling
+can hit multi-gigabyte file limits.
+ChatGPT attachment files are preserved in the shared raw export archive and are
+shown from normalized message metadata in the readable transcript. Fresh imports
+render image/file attachment cards instead of folding `[Attachment: ...]`
+placeholders into message text. Reimport ChatGPT exports after upgrading to
+populate the attachment metadata and viewer URLs.
+File cards are only linked when the exported raw archive actually contains the
+file bytes; ChatGPT privacy exports may list some uploaded PDFs or documents in
+conversation metadata without including the original file. ChatGPT tool calls
+such as `web.run` are normalized into tool-call cards, uploaded-file parsing
+messages are normalized as file tool results, and private-use citation markers
+including file citations render as citation labels instead of unsupported glyph
+boxes.
 For OpenAI export mappings, agentlog reads each node message, normalizes
-`author.role`, extracts `content.parts`, and uses `create_time` or `update_time`
-as the timestamp. Web imports are scope-based by default because they generally
-do not have a reliable local working directory. Since official exports do not
-usually include usage, the importer archives estimated per-message
-`metadata.usage` from native message content and marks the resulting session
-usage as estimated.
+`author.role`, extracts `content.parts`, records attachment and asset-pointer
+metadata, and uses `create_time` or `update_time` as the timestamp. Non-chat JSON
+such as `user_settings.json` is available for account metadata but is not counted
+as a conversation. Extensionless binary attachment files are preserved as raw
+files rather than parsed as JSON. Web imports are scope-based by default because
+they generally do not have a reliable local working directory. Since official
+exports do not usually include usage, the importer archives estimated
+per-message `metadata.usage` from native message content and marks the resulting
+session usage as estimated.
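The export-mapping walk described above (read each node message, normalize `author.role`, extract `content.parts`, use `create_time` or `update_time`) can be sketched as below. This is an illustration under assumptions, not the importer itself: `extractMessages` and `normalizeRole` are hypothetical names, mapping unknown roles to `system` is an arbitrary choice for the example, and attachment or asset-pointer parts are simply skipped here rather than recorded as metadata.

```javascript
const KNOWN_ROLES = ["user", "assistant", "system", "tool"];

// Assumption: unknown roles are folded into "system" for this sketch.
function normalizeRole(role) {
  return KNOWN_ROLES.includes(role) ? role : "system";
}

// Hypothetical walk over one conversation's `mapping` object.
function extractMessages(conversation) {
  const messages = [];
  for (const node of Object.values(conversation.mapping || {})) {
    const message = node && node.message;
    if (!message) continue; // placeholder nodes carry no message
    const parts = (message.content && message.content.parts) || [];
    // Keep string parts only; object parts (e.g. asset pointers) are skipped.
    const text = parts.filter((part) => typeof part === "string").join("\n");
    if (!text) continue;
    // Timestamps are epoch seconds; prefer create_time, then update_time.
    const seconds = message.create_time ?? message.update_time ?? null;
    messages.push({
      role: normalizeRole(message.author && message.author.role),
      text,
      timestamp: seconds === null ? null : new Date(seconds * 1000).toISOString(),
    });
  }
  return messages;
}
```

The sketch returns messages in object-key order; a real importer would likely need to follow parent/child links or sort by timestamp to recover conversation order.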
 ## Claude Code CLI
@@ -476,6 +649,12 @@ Tool calls and results are normalized into the shared
 `metadata.toolCalls[]`, `metadata.toolResult`, and `metadata.usage` shapes.
 Bash or shell tool calls that invoke `apply_patch` are reclassified as edit
 calls and retain the patch text under `arguments.diff`.
+Claude Code `Edit`/`Write` results also preserve provider `structuredPatch`
+hunks with absolute line starts so the web viewer can render numbered diffs.
+Existing Claude archives need a source reimport to gain this field; run
+`agentlog import --source claude --since all` and repeat for
+`claude-code-desktop`, `claude-cowork`, or `claude-sdk` if those sources are
+enabled.
 Tool results are matched back to prior `tool_use` ids when possible so result
 cards inherit the tool name instead of displaying only the raw tool-use id.
 Remote Control lifecycle records are also converted into provider-generated
@@ -489,6 +668,29 @@ also include Remote Control attachment counts/details, available tool names,
 MCP server names, queue timing/content, agent ids, slugs, API error counts, and
 MCP structured-content counts.
+For each Claude Code session with a working directory, agentlog also snapshots
+Claude subagent definitions from the user-level `~/.claude/agents` directory and
+the nearest project `.claude/agents` directory. It parses the Markdown
+frontmatter fields that Claude uses for subagents (`name`, `description`,
+`tools`, and `model`), records the effective project-over-user definition set in
+`metadata.sessionSummary.claudeSubagents`, and preserves the source `.md` files
+in the session raw manifest. The transcript is not padded with full subagent
+instructions; use the raw archive when the complete definition body is needed.
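The frontmatter parsing and project-over-user precedence described above can be sketched as follows. This is a minimal illustration, not agentlog's implementation: `parseSubagentFrontmatter`, `effectiveSubagents`, and the `scope` field are hypothetical names, and the parser assumes flat `key: value` frontmatter only.

```javascript
// Hypothetical helpers; agentlog's real parser may differ.
// Reads only flat `key: value` pairs between the leading `---` fences.
function parseSubagentFrontmatter(markdown) {
  const match = /^---\r?\n([\s\S]*?)\r?\n---/.exec(markdown);
  if (!match) return null;
  const wanted = new Set(["name", "description", "tools", "model"]);
  const fields = {};
  for (const line of match[1].split(/\r?\n/)) {
    const idx = line.indexOf(":");
    if (idx === -1) continue;
    const key = line.slice(0, idx).trim();
    if (wanted.has(key)) fields[key] = line.slice(idx + 1).trim();
  }
  // A definition without a name cannot be merged by name, so skip it.
  return fields.name ? fields : null;
}

// Project definitions override user-level definitions with the same name.
function effectiveSubagents(userDefs, projectDefs) {
  const byName = new Map();
  for (const def of userDefs) byName.set(def.name, { ...def, scope: "user" });
  for (const def of projectDefs) byName.set(def.name, { ...def, scope: "project" });
  return [...byName.values()];
}
```

Keying the merge on `name` is an assumption about how "project-over-user" is resolved; the source text only states that the effective set favors project definitions.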
+Claude Code subagent run transcripts stored under
+`~/.claude/projects/<project>/<parent-session-id>/subagents/*.jsonl` are also
+attached to the parent session as `metadata.sessionSummary.claudeSubagentRuns`
+and imported as child sessions with `conversationKind = "claude_subagent"` and
+`parentComposerId` set to the parent Claude Code session id. The parent summary
+keeps compact run metadata, prompts, result previews, model names, usage totals,
+and tool counts; the child session carries the full normalized transcript and
+preserves both the JSONL and any sibling `.meta.json` file in raw storage. The
+web viewer renders run summaries inline at their transcript timestamps and links
+to the child session instead of dumping every subagent run at the top. Child run
+sessions are direct-addressable archive records, but normal history lists,
+stats, and search hide `*_subagent` sessions unless subagents are explicitly
+included.
 When the Claude desktop app has a matching
 `~/Library/Application Support/Claude/claude-code-sessions/**/local_*.json`
 record with `cliSessionId`, the CLI importer uses that sidecar's generated
@@ -552,33 +754,41 @@ when present.
 Discovery scans the Claude app storage once, but the user-facing source rows are
 split by kind. `claude-code-desktop` is the Claude Code desktop-launch metadata
-path; `claude-workspace` is Claude app local-agent/workspace mode. The older
-generic `claude-desktop` aggregate is kept only as a compatibility import
-selector and is not shown as a separate discovery row.
+path; `claude-cowork` is Claude app Cowork/local-agent mode. The older
+`claude-workspace` selector is accepted as an alias, and the generic
+`claude-desktop` aggregate is kept only as a compatibility import selector and
+is not shown as a separate discovery row.
 Working directory attribution comes from `originCwd`, then `cwd`, then the first
 existing folder in `userSelectedFolders`. If no existing directory is available,
 the session is archived under `claude-code-desktop/uncategorized` instead of
 being assigned to whatever repo agentlog happens to run from.
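The attribution order above can be sketched as a small resolver. The function name, the injected `exists` predicate (for example, a wrapper around `fs.existsSync`), and the returned shape are illustrative assumptions; the sketch also assumes every candidate, not only `userSelectedFolders`, must exist on disk.

```javascript
// Hypothetical sketch of the originCwd -> cwd -> userSelectedFolders order.
// `exists` is injected so the logic can be exercised without touching disk.
function resolveDesktopCwd(record, exists) {
  const folders = Array.isArray(record.userSelectedFolders)
    ? record.userSelectedFolders
    : [];
  const candidates = [record.originCwd, record.cwd, ...folders];
  const cwd = candidates.find(
    (dir) => typeof dir === "string" && dir.length > 0 && exists(dir)
  );
  // Never fall back to the process working directory.
  return cwd
    ? { cwd, scope: null }
    : { cwd: null, scope: "claude-code-desktop/uncategorized" };
}
```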
-## Claude Workspace
+## Claude Cowork
-- Import selector: `claude-workspace`
+- Import selector: `claude-cowork` (`claude-workspace` is accepted as a legacy alias)
 - Provider: `claude_desktop`
 - Source type: `claude-workspace-desktop`
 - Primary store:
   `~/Library/Application Support/Claude/local-agent-mode-sessions/local_*.json`
 - Audit transcript path:
   `~/Library/Application Support/Claude/local-agent-mode-sessions/local_<id>/audit.jsonl`
-- Fallback scope: `claude-desktop/uncategorized`
+- Fallback scope: `claude-cowork/uncategorized`
-Claude Workspace uses the same parser as Claude Code Desktop but reads from the
+Claude Cowork uses the same parser as Claude Code Desktop but reads from the
 Claude app local-agent mode directory. `audit.jsonl` is preferred when present.
-Metadata fallback imports the initial prompt and selected folder context.
+Metadata fallback imports the initial prompt and selected folder context. The
+session-level `model` in the local-agent metadata is recorded as authoritative
+`modelUsage`; audit `tool_use` rows are normalized back to that session model so
+internal tool orchestration does not appear as an extra user-visible model.
 As with Claude Code Desktop, repo attribution only happens when an existing
-working directory can be found. Otherwise the archive is intentionally
-uncategorized.
+project directory can be found. Claude's synthetic `/sessions/...` and
+`local-agent-mode-sessions/.../outputs` directories are ignored; selected user
+folders are preferred before falling back to the Cowork uncategorized scope.
+Existing archives with the old `claude-desktop/uncategorized` fallback or
+synthetic cwd attribution should be rebuilt with a full reimport, for example
+`agentlog import --source claude-cowork --since all`.
 ## Claude.ai Export
@@ -655,41 +865,135 @@ Gemini archives need `agentlog import --source gemini-cli --since all` after
 this parser bump to populate those titles. When Gemini shutdown interaction
 summaries appear in structured history, they are parsed into session metadata
 for tool-call counts, timing, model usage, and resume commands instead of being
-inserted into the chat transcript. Markdown
-files are split into messages by role headings such as
+inserted into the chat transcript.
+Gemini CLI subagents stored under
+`~/.gemini/tmp/<project>/chats/<parent-session-id>/<agent-id>.jsonl` are
+imported as child sessions with `conversationKind = "gemini_subagent"` and
+`parentComposerId` set to the parent Gemini session id. Parent `invoke_agent`
+tool calls preserve agent id/name, prompt, status, and progress/result summaries
+when those fields are present, and the parent session gets
+`metadata.sessionSummary.geminiSubagentRuns`. Re-run
+`agentlog import --source gemini-cli --since all` to populate normalized Gemini
+subagent links in existing archives.
+Markdown files are split into messages by role headings such as
 `# User`, `# Assistant`, or bold role labels. The working directory comes from
 parsed cwd fields or Gemini tmp `.project_root` metadata. If no working
 directory can be resolved, the session is archived under
 `gemini-cli/uncategorized`.
+## Antigravity CLI
+- Import selector: `antigravity-cli`
+- Provider: `antigravity_cli` (separate from the desktop app's `antigravity`)
+- Source types: `antigravity-cli-transcript-log`, `antigravity-cli-brain`
+- Primary transcript store:
+  `~/.gemini/antigravity-cli/brain/*/.system_generated/logs/transcript_full.jsonl`
+  (untruncated), falling back to `transcript.jsonl` (truncates long tool
+  outputs and marks them with `truncated_fields`)
+- Workspace attribution: `~/.gemini/antigravity-cli/history.jsonl` (one
+  `{display, timestamp, workspace}` line per user prompt)
+- Binary conversation store preserved but not decoded:
+  `~/.gemini/antigravity-cli/conversations/*.pb`
+- Environment overrides: `AGENTLOG_ANTIGRAVITY_CLI_HOME_DIR`
+The Antigravity CLI (Google's Gemini CLI successor, May 2026) reuses the
+desktop brain/conversations layout under its own home, so the importer reuses
+the whole desktop pipeline with the home redirected and CLI-specific source
+types. Transcript logs do not record the workspace, and deriving cwd from file
+links in responses can land on a subdirectory, so the prompt history file is
+the authoritative cwd source. It is matched to a session by first-user-message
+text and nearest timestamp, and it is intentionally excluded from `sourceFiles`
+because it grows with every prompt and would churn session fingerprints. The desktop
+app's Electron state DB is never read for CLI sessions, and CLI/desktop
+conversation ids are distinct UUIDs, so the two sources do not collide.
+The transcript logs carry no model field or token usage. The only model
+signal is `<USER_SETTINGS_CHANGE>` text injected into USER_INPUT events
+("The user changed setting \`Model Selection\` from X to Y."); the parser
+tracks these changes, stamps the active model onto subsequent
+PLANNER_RESPONSE messages, and records all observed models in
+`sessionSummary.modelUsage`. This also applies to desktop Antigravity
+transcript logs.
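A minimal sketch of the model tracking described above. The event shape (`type`, `text`), the `model` field stamped onto responses, and the decision to count the "from" model as observed are all assumptions made for illustration; only the setting-change sentence and the event type names come from the text.

```javascript
// Matches the injected setting-change sentence. The greedy second group runs
// to the last period on the line so dotted model names are not cut short.
const MODEL_CHANGE =
  /The user changed setting `Model Selection` from (.+?) to (.+)\./;

// Hypothetical event shape: { type: "USER_INPUT" | "PLANNER_RESPONSE", text }.
function stampModels(events) {
  let active = null;
  const observed = new Set();
  const stamped = events.map((event) => {
    if (event.type === "USER_INPUT") {
      const m = MODEL_CHANGE.exec(event.text ?? "");
      if (m) {
        observed.add(m[1]); // assumption: the previous model also counts
        observed.add(m[2]);
        active = m[2];
      }
      return event;
    }
    if (event.type === "PLANNER_RESPONSE" && active) {
      return { ...event, model: active };
    }
    return event;
  });
  return { events: stamped, modelUsage: [...observed] };
}
```

Responses that precede the first setting change are left without a model in this sketch, since the logs carry no other model signal.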
+## Antigravity IDE
+- Import selector: `antigravity-ide`
+- Provider: `antigravity_ide`
+- Source types: `antigravity-ide-transcript-log`, `antigravity-ide-brain`
+- Primary store: `~/.gemini/antigravity-ide/` (same brain/conversations layout
+  as the 2.0 app)
+- Environment overrides: `AGENTLOG_ANTIGRAVITY_IDE_HOME_DIR`
+The Antigravity IDE split off from the 2.0 agent platform but kept the same
+data layout, and its home starts as a migration copy of the 2.0 home. The
+importer therefore drops conversations whose 2.0 copy is byte-identical or at
+least as fresh; the IDE only owns conversations it created or continued after
+the split. IDE session ids are prefixed `antigravity-ide-` because migrated
+conversation ids collide with 2.0's. The IDE's own knowledge/implicit memory
+stores are covered by memory backup under provider `antigravity_ide`.
 ## Antigravity
 - Import selector: `antigravity`
 - Provider: `antigravity`
-- Source types: `antigravity-brain`, `antigravity-trajectory-summary`
-- Primary readable store: `~/.gemini/antigravity/brain/*`
+- Source types: `antigravity-transcript-log`, `antigravity-brain`,
+  `antigravity-summary-proto`, `antigravity-trajectory-summary`
+- Primary transcript store:
+  `~/.gemini/antigravity/brain/*/.system_generated/logs/transcript.jsonl`
+- Legacy readable store: `~/.gemini/antigravity/brain/*`
+- Summary protobuf store: `~/.gemini/antigravity/agyhub_summaries_proto.pb`
 - Partial metadata store:
   `Application Support/Antigravity/User/globalStorage/state.vscdb`
-- Binary store counted but not decoded:
+- Binary conversation store preserved but not decoded:
   `~/.gemini/antigravity/conversations/*.pb`
-agentlog imports readable Markdown artifacts from each task directory. Recognized
-artifact names are `task.md`, `implementation_plan.md`, `walkthrough.md`, and
-`plan.md`. Each artifact becomes an assistant message with a heading naming the
-artifact. Timestamps come from artifact file mtimes.
-When a conversation has no readable Markdown artifacts, agentlog also imports
-Antigravity trajectory summaries from the app's VS Code-style global state DB.
-Those summaries preserve the conversation id, visible prompt/title, timestamps,
-and workspace path when present. They are marked as partial summaries and do not
-claim to be full transcripts. The global state DB is referenced in the raw
-manifest but not copied, since it can contain auth-bearing settings.
-The importer tries to infer a working directory from `file://...` links inside
-the Markdown artifacts or trajectory summary. If none can be inferred, it
-archives under `antigravity/uncategorized`. Binary protobuf transcripts are
-counted in discovery details but not imported as conversation messages yet.
-Antigravity token usage is therefore not populated from the current local stores.
+agentlog ranks Antigravity sources by fidelity and keeps one archive per
+conversation id: transcript logs first, then readable brain Markdown artifacts,
+then `agyhub_summaries_proto.pb`, then the legacy state DB summary row, then a
+binary-only discovery placeholder if no readable source exists. This keeps the
+v1 archive contract simple while still preserving the lower-fidelity evidence.
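The fidelity ranking can be illustrated with a small selection function. The mapping of ranking steps to source type strings is inferred from the source type list above, and `binary-placeholder` is a hypothetical label for the discovery placeholder; the candidate shape is also an assumption.

```javascript
// Highest fidelity first. The last entry is a hypothetical label.
const FIDELITY_ORDER = [
  "antigravity-transcript-log",
  "antigravity-brain",
  "antigravity-summary-proto",
  "antigravity-trajectory-summary",
  "binary-placeholder",
];

// Keep one candidate per conversation id, preferring the best-ranked source.
function pickBestSources(candidates) {
  const best = new Map();
  for (const candidate of candidates) {
    const rank = FIDELITY_ORDER.indexOf(candidate.sourceType);
    if (rank === -1) continue; // unknown source types are ignored here
    const current = best.get(candidate.conversationId);
    if (!current || rank < FIDELITY_ORDER.indexOf(current.sourceType)) {
      best.set(candidate.conversationId, candidate);
    }
  }
  return [...best.values()];
}
```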
+Transcript logs import user messages, planner responses, system/error/history
+messages, and tool-like step outputs from `.system_generated/logs/transcript.jsonl`.
+Older `.system_generated/logs/overview.txt` files are also discovered; JSONL
+events are parsed when present, and plain overview text is imported as a partial
+assistant summary. Matching `conversations/<id>.pb` files are copied into the
+raw archive manifest for the session, but are not parsed into messages because
+they are not a stable plain protobuf transcript format.
+When transcript logs contain `INVOKE_SUBAGENT` steps, agentlog links the spawned
+Antigravity conversation id back to the parent. The child transcript is imported
+as a normal direct-addressable session with `conversationKind =
+"antigravity_subagent"` and `parentComposerId` set to the parent conversation
+id, while the parent gets `metadata.sessionSummary.antigravitySubagentRuns`.
+The web viewer displays those runs with the same subagent card/modal path used
+for Claude, Codex, and Cursor child sessions. Default history/search paths hide
+`*_subagent` child sessions unless subagents are explicitly included.
+Readable Markdown artifacts remain supported as the legacy Antigravity CLI path.
+Recognized artifact names are `task.md`, `implementation_plan.md`,
+`walkthrough.md`, and `plan.md`. Each artifact becomes an assistant message with
+a heading naming the artifact. If a summary protobuf or state DB summary exists
+for the same conversation id, agentlog uses it to fill title, timestamps, and
+workspace metadata while keeping the Markdown artifact as the imported content.
+`agyhub_summaries_proto.pb` and the VS Code-style global state DB row both
+produce partial summary sessions. They preserve conversation id, visible
+prompt/title, timestamps, step count, and workspace path when present, and are
+marked as partial summaries so they do not claim to be full transcripts. The
+global state DB is referenced in the raw manifest but not copied, since it can
+contain auth-bearing settings.
+The importer tries to infer a working directory from summary metadata,
+`CWD:` lines, and `file://...` links in Antigravity content. If none can be
+inferred, it archives under `antigravity/uncategorized`. Antigravity token usage
+is not populated from the current local stores.
+Existing Antigravity archives created before transcript-log and summary-protobuf
+support should be rebuilt with `agentlog import --source antigravity --since all`
+so the archive uses the new source ranking and raw protobuf preservation.
 ## Devin CLI
@@ -731,6 +1035,53 @@ attribution follows the project directory Devin was launched from.
 If `message_nodes` contains no importable messages, agentlog falls back to
 `prompt_history` so at least direct user prompts can be archived.
+Devin spawns subagents through the `run_subagent` tool (built-in profiles
+`subagent_explore` / `subagent_general`, plus custom `AGENT.md` profiles).
+agentlog records each spawn in the parent's
+`sessionSummary.devinSubagentRuns` with the profile, prompt preview, and
+foreground/background mode. When the tool arguments expose a child session id
+that matches another imported Devin session, the child is marked
+`conversationKind = "devin_subagent"` with `parentComposerId` pointing at the
+parent, so it collapses out of top-level lists like other `*_subagent`
+sessions. Spawns without a resolvable child still produce run entries with
+status `spawned`. Re-run `agentlog import --source devin-cli --since all` to
+populate links in existing archives.
+## Devin Desktop
+- Import selector: `devin-desktop`
+- Provider: `devin`
+- Source type: `devin-desktop-acp-events`
+- Event logs on macOS:
+  `~/Library/Application Support/Devin/User/acp-events/*.ndjson`
+- Metadata/index DB on macOS:
+  `~/Library/Application Support/Devin/User/globalStorage/state.vscdb`
+- Linux/Windows app roots follow the same VS Code/Electron layout under
+  `~/.config/Devin` or `%APPDATA%\Devin`.
+- Overrides:
+  - `AGENTLOG_DEVIN_DESKTOP_APP_SUPPORT_DIR` points at an alternate Devin app
+    support root.
+  - `AGENTLOG_DEVIN_DESKTOP_ACP_EVENTS_DIR` points directly at an ACP event log
+    directory.
+  - `AGENTLOG_DEVIN_DESKTOP_GLOBAL_STORAGE_DB` points at one explicit
+    `state.vscdb`.
+Devin Desktop writes ACP session event logs as UUID-named NDJSON files and keeps
+the session id to UUID mapping in `ItemTable['windsurf.acp.eventLog.index']`.
+The key prefix is inherited from Windsurf, but agentlog treats this as a Devin
+Desktop source and imports only `acp/devin-cli/*` and `acp/devin-cloud/*`
+sessions under provider `devin`. Cascade/Windsurf history remains the separate
+`windsurf` source.
+The importer joins streamed `user_message_chunk`, `agent_message_chunk`, and
+`agent_thought_chunk` rows. Devin thinking is preserved as supplementary
+assistant messages with `summaryKind=thinking`. ACP `tool_call` and
+`tool_call_update` rows are normalized into `metadata.toolCalls[]` and
+`metadata.toolResult`, and `usage_update` rows are aggregated into
+`sessionSummary.usage` for stats. The global state DB is used as an index but is
+recorded as a raw reference rather than copied into every session raw folder
+because it can contain auth/session material.
 ## Cursor
 - Import selector: `cursor`
@@ -883,12 +1234,31 @@ for `tool_calls`, `toolCalls`, `toolResults`, command outputs, edit records,
 diff records, model/status/request metadata, and token usage. When no
 per-message timestamp exists, it uses the source file's mtime with stable
 millisecond offsets so imports do not get stamped with the time of import.
+Transcript folders under
+`agent-transcripts/<parent-composer-id>/subagents/<subagent-id>` are imported as
+child sessions with `conversationKind = "cursor_subagent"` and
+`parentComposerId` set to the parent composer id. When the parent transcript is
+available in the same import, agentlog also attaches compact run metadata to the
+parent as `metadata.sessionSummary.cursorSubagentRuns`, and the web viewer shows
+those runs inline with a link to the child transcript. Existing Cursor transcript
+archives need a full reimport, for example
+`agentlog import --source cursor --since all`, to gain the normalized subagent
+metadata. As with Codex and Claude Code child runs, those child session records
+are direct-addressable but hidden from normal history lists, stats, and search
+unless subagents are explicitly included.
 Cursor project slugs are decoded back to local paths when possible. For example,
-`Users-bzhou-Documents-GitHub-spring-next` resolves to
-`/Users/bzhou/Documents/GitHub/spring-next` if that directory exists. If no
-working directory can be resolved for a newer transcript, it archives under
-`cursor/uncategorized` instead of assigning the session to the current repo.
+`Users-alex-Documents-GitHub-spring-next` resolves to
+`/Users/alex/Documents/GitHub/spring-next` if that directory exists. Newer
+agent-transcript imports prefer explicit paths in the transcript itself (tool
+arguments, command text, and nested metadata) before falling back to that slug,
+so a transcript stored under a stale Cursor project directory can still be
+attributed to the repository it actually read or edited. Existing Cursor
+transcript archives with stale project-slug attribution need a full reimport,
+for example `agentlog import --source cursor --since all`, to regenerate their
+metadata and archive location. If no working directory can be resolved for a
+newer transcript, it archives under `cursor/uncategorized` instead of assigning
+the session to the current repo.
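Slug decoding is ambiguous because a hyphen can stand for either a path separator or a literal hyphen in a directory name. The sketch below shows one way to resolve that by checking which candidate exists; `decodeProjectSlug` and the injected `exists` predicate are hypothetical, POSIX-style paths are assumed, and Cursor's real encoding may replace other characters as well.

```javascript
// Hypothetical sketch: try "/" at each hyphen first, fall back to a literal
// "-", and accept only a path that `exists` confirms.
function decodeProjectSlug(slug, exists) {
  const tokens = slug.split("-");
  function walk(index, path) {
    if (index === tokens.length) return exists(path) ? path : null;
    // Only descend with "/" when the prefix is itself an existing directory.
    const asSeparator = exists(path)
      ? walk(index + 1, `${path}/${tokens[index]}`)
      : null;
    if (asSeparator) return asSeparator;
    return walk(index + 1, `${path}-${tokens[index]}`);
  }
  return walk(1, `/${tokens[0]}`);
}
```

A `null` result corresponds to the `cursor/uncategorized` fallback. Per the text above, explicit paths found in the transcript take precedence over this slug-based guess.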
 ## Cline
@@ -925,6 +1295,19 @@ supplementary assistant event. Those tool calls carry unified diff text or
 old/new string payloads so the web viewer can render the edits inline while the
 original checkpoint files remain in raw backups.
+Cline spawn linking distinguishes two tool shapes. Explicit subagent tools
+(`subagent`, `run_subagent`, `spawn_subagent`) mark the spawned task as
+`conversationKind = "cline_subagent"` with `parentComposerId` set to the
+parent task id. The `new_task` tool is a context handoff: the new task replaces
+the old one rather than running under it, so the child stays a top-level
+session but is still linked to the parent (`parentComposerId` plus
+`sessionSummary.clineSpawnedBy`) and listed in the parent's
+`sessionSummary.clineSubagentRuns`. Children are resolved by explicit task id
+when the arguments carry one, else by matching the handoff context against the
+child task's first user message inside a 30-minute window, else by an
+unambiguous single task started within a few minutes of the call. Ambiguous
+spawns produce run entries with status `spawned` and no link.
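The three-step child resolution can be sketched as below. Several details are assumptions: the field names, the `linked` status and `via` labels, substring matching for the handoff context, windows measured forward from the call, and a 5-minute value for "a few minutes". Only the 30-minute window and the `spawned` status come from the text.

```javascript
const HANDOFF_WINDOW_MS = 30 * 60 * 1000;
const NEARBY_WINDOW_MS = 5 * 60 * 1000; // assumption: "a few minutes"

// Hypothetical shapes: call = { taskId?, context?, timestamp },
// tasks = [{ id, startedAt, firstUserMessage }], times in epoch milliseconds.
function resolveSpawnedTask(call, tasks) {
  const link = (task, via) => ({ status: "linked", childId: task.id, via });
  const startedWithin = (task, windowMs) =>
    task.startedAt >= call.timestamp &&
    task.startedAt - call.timestamp <= windowMs;

  // 1. Explicit task id in the tool arguments.
  if (call.taskId) {
    const explicit = tasks.find((task) => task.id === call.taskId);
    if (explicit) return link(explicit, "task-id");
  }
  // 2. Handoff context found in the child's first user message.
  if (call.context) {
    const byContext = tasks.filter(
      (task) =>
        startedWithin(task, HANDOFF_WINDOW_MS) &&
        (task.firstUserMessage ?? "").includes(call.context)
    );
    if (byContext.length === 1) return link(byContext[0], "context");
  }
  // 3. A single unambiguous task started shortly after the call.
  const nearby = tasks.filter((task) => startedWithin(task, NEARBY_WINDOW_MS));
  if (nearby.length === 1) return link(nearby[0], "time");

  // Ambiguous or unmatched: keep the run entry but do not link a child.
  return { status: "spawned", childId: null, via: null };
}
```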
 ## OpenCode
 - Import selectors: `opencode-cli`, `opencode-desktop`, `opencode-web`, or `opencode` for all three
@@ -969,14 +1352,27 @@ created by Desktop, CLI, and Web clients, so agentlog classifies each SQLite
 session row individually. Desktop sessions are identified by session ids in
 OpenCode Desktop sidecar state such as `ai.opencode.desktop/*.dat`; sub-sessions
 inherit Desktop classification from a Desktop parent. CLI sessions are
-identified by session-level `agent` or `model` metadata. Remaining non-`local`
-shared core rows are labeled as Web sessions. Rows without reliable client
-evidence stay on the legacy `opencode-sqlite-history` source type. The
+identified by session-level `agent` or `model` metadata, or by CLI evidence in
+the sanitized message metadata when session rows omit those fields. Remaining
+non-`local` shared core rows are labeled as Web sessions. Rows without reliable
+client evidence stay on the legacy `opencode-sqlite-history` source type. The
 `session`, `message`, `part`, and `project` tables provide session
 metadata, working directory, user/assistant messages, reasoning text, tool
-calls, tool outputs, model/provider ids, cost, and token usage. Because the
-database is a multi-session source, raw preservation stores it as a shared raw
-source instead of duplicating the same file into every session archive.
+calls, tool outputs, model/provider ids, cost, and token usage. During SQLite
+reads, agentlog removes the bulky `message.data.summary` object before JSON
+transport; canonical transcript text still comes from the `part` table, and raw
+preservation keeps the original database byte-for-byte. Because the database is a
+multi-session source, raw preservation stores it as a shared raw source instead
+of duplicating the same file into every session archive.
+OpenCode subagent sessions are linked from the structured `session.parent_id`
+field. Child sessions are imported as `conversationKind = "opencode_subagent"`
+with `parentComposerId` set to the raw OpenCode parent session id, and the parent
+gets `metadata.sessionSummary.opencodeSubagentRuns`. Task-tool metadata such as
+`subagent_type`, `prompt`, `description`, and child `sessionId` is preserved when
+present so the web viewer can show the same subagent run cards used for other
+providers. Existing OpenCode archives should be rebuilt with `agentlog import
+--source opencode --since all` to populate normalized subagent links.
 agentlog also reads OpenCode's JSON session store directly. Sessions provide the
 archive id and project id; message and part files provide role text, reasoning
@@ -998,6 +1394,106 @@ When `session_diff/<session-id>.json` is present, agentlog adds a supplementary
 edit tool call with the diff payload. Unified diff text is rendered inline in
 the history web UI, and the original diff JSON remains in the raw archive.
+## GitHub Copilot CLI
+- Import selector: `copilot-cli`
+- Provider: `copilot`
+- Source type: `copilot-cli-history`
+- Primary store: `~/.copilot/session-state/<session-uuid>/events.jsonl`
+- Metadata sidecar: `workspace.yaml` (flat scalars: id, name, cwd, git_root,
+  repository, branch, created_at, updated_at); `plan.md` preserved as a source
+  file when present
+- Environment overrides: `AGENTLOG_COPILOT_SESSION_STATE_DIR`,
+  `AGENTLOG_COPILOT_HOME`, `COPILOT_HOME`
+Each `events.jsonl` line is `{type, data, id, timestamp, parentId}`. agentlog maps
+`user.message`/`assistant.message` to messages, `tool.execution_start` to
+assistant tool calls (arguments may arrive as a JSON string and are re-parsed),
+`tool.execution_complete` to tool results, and `session.model_change`,
+`session.compaction_complete`, and `subagent.started/completed` to system
+messages. Session-level usage comes from the `session.shutdown` event, which is
+only written for cleanly ended sessions on CLI 0.0.422+: per-model
+`modelMetrics` token splits are aggregated into `sessionSummary.usage` and
+`modelUsage`, with premium requests, AI credits (`totalNanoAiu` / 1e9), code
+change counters, and `session.task_complete` summaries recorded under
+`sessionSummary.copilotCli`. The `session-store.db` SQLite database is a
+rebuildable index of the same data and is not read. Older
+`~/.copilot/history-session-state/` legacy sessions are not imported.
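The event mapping, argument re-parsing, and credit conversion above can be sketched as follows. The kind labels, function names, and the `{ raw }` fallback for unparseable arguments are illustrative assumptions, and `subagent.started/completed` is read as two event types.

```javascript
// Event type -> normalized kind, following the mapping described above.
const EVENT_KINDS = {
  "user.message": "user",
  "assistant.message": "assistant",
  "tool.execution_start": "tool_call",
  "tool.execution_complete": "tool_result",
  "session.model_change": "system",
  "session.compaction_complete": "system",
  "subagent.started": "system",
  "subagent.completed": "system",
};

function classifyEvent(event) {
  return EVENT_KINDS[event.type] ?? null; // null: not mapped in this sketch
}

// Tool arguments may arrive as a JSON string; re-parse when possible.
function normalizeToolArguments(args) {
  if (typeof args !== "string") return args ?? {};
  try {
    return JSON.parse(args);
  } catch {
    return { raw: args }; // assumption: keep unparseable text as-is
  }
}

// AI credits are reported in nano-units: totalNanoAiu / 1e9.
function nanoAiuToCredits(totalNanoAiu) {
  return totalNanoAiu / 1e9;
}
```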
+## Factory Droid
+- Import selector: `factory`
+- Provider: `factory`
+- Source type: `factory-droid-history`
+- Primary store: `~/.factory/sessions/<cwd-slug>/<session-uuid>.jsonl`, plus the
+  legacy flat layout `~/.factory/sessions/<session-uuid>.jsonl`
+- Metadata sidecar: `<session-uuid>.settings.json`
+- Environment overrides: `AGENTLOG_FACTORY_SESSIONS_DIR`, `FACTORY_HOME_OVERRIDE`
+Transcripts require a `session_start` header line carrying id, title, owner, and
+cwd. `message` lines wrap Anthropic-style content blocks: text, thinking
+(stored as `metadata.thinking`), tool_use (assistant tool calls), and
+tool_result (tool messages). Block keys drift between snake_case and camelCase
+across droid versions (`tool_use_id` vs `toolUseId`), and both are accepted.
+File-op tool results that carry `diffLines` with per-line old/new line numbers
+are converted to `structuredPatch` hunks. `todo_state` lines and unknown types
+are skipped. The same session id can exist in both legacy and per-project
+layouts after migration; the file with the newer mtime wins. No per-message
+token usage exists on disk, so the sidecar's aggregate `tokenUsage` becomes
+`sessionSummary.usage`. The model (with BYOK `custom:`/`[Provider]` wrappers
+stripped for `modelUsage`), reasoning effort, autonomy mode, Factory credits,
+archive state, and subagent attribution (`decompSessionType`,
+`callingSessionId`, `childInclusiveTokenUsageBySessionId`) are recorded under
+`sessionSummary.factoryDroid`.
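Two of the rules above, accepting both key spellings and preferring the newer file for a duplicated session id, can be sketched as below. The helper names and the `{ sessionId, mtimeMs, path }` file shape are hypothetical.

```javascript
// Hypothetical helper: read a block field under either spelling.
function readBlockKey(block, snakeKey, camelKey) {
  return block[snakeKey] ?? block[camelKey];
}

// Keep one file per session id, preferring the newer mtime.
function newestPerSession(files) {
  const bySession = new Map();
  for (const file of files) {
    const current = bySession.get(file.sessionId);
    if (!current || file.mtimeMs > current.mtimeMs) {
      bySession.set(file.sessionId, file);
    }
  }
  return [...bySession.values()];
}
```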
+## Grok Build
+- Import selector: `grok-build`
+- Provider: `grok`
+- Source type: `grok-build-history`
+- Primary store: `~/.grok/sessions/<percent-encoded-cwd>/<session-id>/updates.jsonl`
+- Metadata sidecars: `summary.json`, `signals.json`; `events.jsonl` and
+  `plan.md` preserved as source files when present
+- Environment overrides: `AGENTLOG_GROK_SESSIONS_DIR`, `AGENTLOG_GROK_HOME`,
+  `GROK_HOME`
+`updates.jsonl` lines are ACP (Agent Client Protocol) JSON-RPC `session/update`
+notifications. Streaming chunks (`user_message_chunk`, `agent_message_chunk`,
+`agent_thought_chunk`) are concatenated into whole messages; `tool_call`
+becomes an assistant tool call and `tool_call_update` with completed/failed
+status becomes a tool result; `plan` and `current_mode_update` become system
+messages. The cwd comes from percent-decoding the parent directory name. Token
+telemetry is a cumulative `_meta.totalTokens` counter that can rewind during
+streaming, so the maximum observed value is recorded as the authoritative
+session total. `signals.json` rollups (turn count, context tokens, models used,
+session duration) land under `sessionSummary.grokBuild`. The old unofficial
+superagent grok-cli also uses `~/.grok` but stores transcripts in
+`~/.grok/grok.db` SQLite; it is a different product and is not imported.
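A minimal sketch of the cwd decoding and token-total handling described above. Where `_meta` sits inside a `session/update` notification is not stated in the text, so this sketch assumes each update object exposes `_meta` directly; the function name and return shape are also hypothetical.

```javascript
// Hypothetical sketch: decode the parent directory name into a cwd and take
// the maximum cumulative token count, since the counter can rewind.
function summarizeGrokSession(cwdDirName, updates) {
  let cwd = null;
  try {
    cwd = decodeURIComponent(cwdDirName);
  } catch {
    cwd = null; // malformed percent-encoding: leave cwd unresolved
  }

  let maxTotalTokens = null;
  for (const update of updates) {
    const total = update?._meta?.totalTokens;
    if (typeof total !== "number" || !Number.isFinite(total)) continue;
    if (maxTotalTokens === null || total > maxTotalTokens) {
      maxTotalTokens = total;
    }
  }
  return { cwd, totalTokens: maxTotalTokens };
}
```

Taking the maximum rather than the last value means a rewind late in the stream does not understate the session total.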
+## pi
+- Import selector: `pi`
+- Provider: `pi`
+- Source type: `pi-cli-history`
+- Primary store: `~/.pi/agent/sessions/--<encoded-cwd>--/<timestamp>_<session-id>.jsonl`
+- Environment overrides: `AGENTLOG_PI_SESSION_DIR`,
+  `PI_CODING_AGENT_SESSION_DIR`, `PI_CODING_AGENT_DIR`
+pi session files start with a `session` header line (id, timestamp, cwd; v1
+headers also carry provider/modelId, v3 adds `parentSession` fork lineage).
+Entries form a tree via `id`/`parentId`; agentlog imports all entries in file
+order, including abandoned branches, and preserves the ids as
+`providerMessageId`/`parentMessageId`. Assistant messages carry per-message
+usage with token splits and dollar cost (`usage.cost.total` →
+`metadata.usage.costUsd`); `toolResult` messages become tool messages;
+`bashExecution` entries (`!` commands) become a user message plus a shell tool result
+with exit code. `model_change`, `thinking_level_change`, `compaction`,
+`branch_summary`, and `custom_message` entries become system messages;
+`custom` extension state and `label` entries are skipped. Note that custom
+`--session-dir` configurations store files flat rather than in per-cwd
+subdirectories; both layouts are scanned. The oh-my-pi fork uses `~/.omp` with
+a diverged schema and is not covered by this importer.
 ## Aider
 - Import selector: `aider`
@@ -1034,18 +1530,39 @@ backups.
 - Import selector: `windsurf`
 - Provider: `windsurf`
-- Source types: `windsurf-trajectory-export`, `windsurf-cascade-brain`
+- Source types: `windsurf-trajectory-export`, `windsurf-cascade-brain`,
+  `windsurf-cascade-protobuf`
 - Explicit export import: `agentlog import windsurf <downloaded-trajectory.md>`
   or `agentlog import windsurf <folder-of-trajectories>`
+- Repair-stub import: `agentlog import windsurf --claim <token>
+  <downloaded-trajectory.md>`
 - Primary readable store: `~/.codeium/windsurf/brain/*`
-- Binary store counted but not decoded: `~/.codeium/windsurf/cascade/*.pb`
-- Status: encrypted local cache scanning is disabled from setup, default imports,
-  and history filters; downloaded trajectory Markdown exports are importable
-Windsurf local cache scanning is currently disabled. Current Cascade sessions
-are written as encrypted binary stores, so agentlog can detect session IDs and
-workspace metadata but cannot archive readable conversation text from those
-local files.
+- Binary store preserved/counted but not decoded:
+  `~/.codeium/windsurf/cascade/*.pb`
+- Metadata cache: Windsurf global state `ItemTable['windsurf.acp.metadataCache']`
+  in `~/Library/Application Support/Windsurf/User/globalStorage/state.vscdb`
+  and backups
+- Environment overrides: `AGENTLOG_WINDSURF_HOME_DIR`,
+  `AGENTLOG_CODEIUM_HOME_DIR`, `AGENTLOG_WINDSURF_BRAIN_DIR`,
+  `AGENTLOG_WINDSURF_CASCADE_DIR`, `AGENTLOG_WINDSURF_GLOBAL_STORAGE_DB`,
+  `AGENTLOG_WINDSURF_APP_SUPPORT_DIR`
+Windsurf local imports are partial. Current Cascade sessions keep the full
+conversation body in high-entropy binary protobuf files, so agentlog preserves
+matching `cascade/<conversation-id>.pb` files as raw sources but does not decode
+them into transcript messages. When `brain/<conversation-id>/plan.md`,
+`task.md`, `implementation_plan.md`, or `walkthrough.md` exists, each readable
+artifact is archived as an assistant message and the matching metadata/pb files
+are preserved in the session raw folder.
+Agentlog also reads Windsurf's ACP metadata cache from global storage to attach
+titles, working directories, created timestamps, and updated timestamps to local
+Cascade session IDs. Protobuf-only conversations are archived as zero-message
+repair stubs when there is no readable brain artifact. The stubs preserve the
+raw `.pb`, show up in the web viewer with a deterministic `ws-...` repair token,
+and produce no canonical events, so normal history search and MCP recall do not
+index them as transcript content. The state DB is referenced in raw metadata but
+not copied because it can contain auth tokens.
 Windsurf's "Download trajectory" action produces Markdown headed
 `# Cascade Chat Conversation`. Agentlog imports those files explicitly with
@@ -1054,21 +1571,31 @@ provider transcript surface: `### User Input` sections become user messages,
 `### Planner Response` / assistant-like sections become assistant messages, and
 the original Markdown file is preserved in raw backups.
+When a web-viewer repair stub corresponds to the export, copy its token and run
+`agentlog import windsurf --claim <token> <downloaded-trajectory.md>`. Claimed
+imports replace the zero-message stub with the full Markdown transcript under
+the same `windsurf-<conversation-id>` archive id.
 Bulk export tools that automate Windsurf's hidden "Download trajectory" button
 can write many `*.md` files into one directory. Agentlog intentionally treats
-that directory as user-selected export input rather than scanning Windsurf's
-encrypted cache: run `agentlog import windsurf ~/windsurf-cascade-export` after
-the bulk exporter finishes.
+that directory as user-selected export input rather than local cache recovery:
+run `agentlog import windsurf ~/windsurf-cascade-export` after the bulk exporter
+finishes.
-The older experimental helper can still read Markdown artifacts from Windsurf
-Cascade brain directories when present. Recognized artifact names are `plan.md`,
-`task.md`, `implementation_plan.md`, and `walkthrough.md`. Each artifact becomes
-an assistant message with a heading naming the artifact. Timestamps come from
-file mtimes.
+The local importer reads Markdown artifacts from Windsurf Cascade brain
+directories when present. Recognized artifact names are `plan.md`, `task.md`,
+`implementation_plan.md`, and `walkthrough.md`. Each artifact becomes an
+assistant message with a heading naming the artifact. Timestamps come from file
+mtimes.
 The importer tries to infer a working directory from `file://...` links in the
 Markdown. If none can be inferred, it archives under `windsurf/uncategorized`.
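The document does not say how the working directory is inferred from the links. One plausible heuristic, shown here purely as an assumption, is to take the longest directory prefix shared by every `file://` link. `inferCwdFromFileLinks` is a hypothetical name, and the sketch handles POSIX-style paths only.

```javascript
// Hypothetical sketch: guess a working directory from file:// links in
// Markdown. The common-prefix heuristic is an assumption, not agentlog's
// documented behavior.
function inferCwdFromFileLinks(markdown) {
  // Collect file:// URLs, stopping at whitespace and common Markdown delimiters.
  const links = markdown.match(/file:\/\/[^\s)>\]"'`]+/g) || [];
  const dirs = [];

  for (const link of links) {
    let pathname;
    try {
      pathname = decodeURIComponent(new URL(link).pathname);
    } catch {
      continue; // skip malformed links
    }
    const parts = pathname.split("/").filter(Boolean);
    parts.pop(); // drop the file name, keep its directory
    if (parts.length > 0) dirs.push(parts);
  }
  if (dirs.length === 0) return null;

  // Reduce to the longest directory prefix shared by all links.
  let common = dirs[0];
  for (const parts of dirs.slice(1)) {
    let i = 0;
    while (i < common.length && i < parts.length && common[i] === parts[i]) i++;
    common = common.slice(0, i);
  }
  return common.length > 0 ? "/" + common.join("/") : null;
}
```

A `null` result corresponds to the fallback described above, where the session is archived under `windsurf/uncategorized`.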
-Binary Cascade protobuf stores are counted in discovery details but not decoded.
+Binary Cascade protobuf stores are counted in discovery details and preserved
+when they match an imported brain artifact or a repair stub, but not decoded. If
+import warnings name partial Windsurf conversations, use Download trajectory for
+a stable full transcript. Alternatively, if a newer Windsurf build starts writing
+readable artifacts, reopen the conversation in Windsurf, send `/recall` or a
+short message, and rerun `agentlog import --source windsurf --since all`.
 ## Collector And Live Monitoring
@@ -1094,8 +1621,13 @@ as importing transcript history.
 - ChatGPT and Claude.ai are import-by-export only; agentlog does not read their
   desktop app local stores.
-- Windsurf encrypted cache scanning is disabled; downloaded trajectory Markdown
-  exports are supported.
+- Windsurf local imports are partial; downloaded trajectory Markdown exports are
+  the supported path for stable full Cascade transcripts.
+- Windsurf has no subagent linking: Cascade conversation bodies are undecoded
+  protobufs and the ACP metadata cache exposes no parent/child or spawn fields,
+  so there is nothing to parse into `windsurf_subagent` sessions today. The
+  viewer accepts `sessionSummary.windsurfSubagentRuns` so runs render if a
+  future Windsurf build exposes spawn metadata.
 - Antigravity protobuf transcripts are counted but not decoded.
 - Cursor older `state.vscdb` stores are best-effort because Cursor has changed
   local storage layouts over time.