agentel 0.2.5 → 0.2.8 (npm package version diff, Mend)

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (18)

package/README.md +77 -37
package/docs/code-reference.md +26 -13
package/docs/history-source-handling.md +247 -82
package/docs/release.md +1 -1
package/package.json +5 -2
package/src/archive.js +200 -17
package/src/canonical-events.js +74 -25
package/src/cli.js +2561 -204
package/src/config.js +11 -0
package/src/doctor.js +2 -0
package/src/importers/claude.js +309 -11
package/src/importers/gemini.js +2 -1
package/src/importers/providers.js +22 -0
package/src/importers.js +2142 -212
package/src/parser-versions.js +1 -0
package/src/search.js +417 -176
package/src/sources.js +1 -0
package/src/web-export-instructions.js +79 -0

package/README.md CHANGED

@@ -9,8 +9,8 @@ Core capabilities:
 - markdown-primary, redacted local archive under `~/.agentlog/data/agentlog/`
 - canonical event JSONL alongside each transcript for provider-independent search
 - canonical repo keying from git remotes, first commits, or path hashes
-- Codex CLI, Codex Desktop, ChatGPT export, Claude Code CLI, Claude Code
-  Desktop, Claude Workspace, Claude.ai export, Gemini CLI, Antigravity,
+- Codex CLI, Codex Desktop, Codex SDK jobs, ChatGPT export, Claude Code CLI,
+  Claude Code Desktop, Claude Workspace, Claude.ai export, Gemini CLI, Antigravity,
   Devin CLI, and Cursor imports
 - event-first `agentlog history` search with markdown/transcript fallback
 - `agentlog-recall` MCP stdio server exposing `search_past_sessions`
@@ -41,7 +41,7 @@ ref for repeatable installs:
 ```sh
 npm install -g brianlzhou/agentlog
 # or
-npm install -g brianlzhou/agentlog#v0.2.5
+npm install -g brianlzhou/agentlog#v0.2.8
 agentlog init
 ```
@@ -77,7 +77,10 @@ npm test
 AGENTLOG_HOME=/tmp/agentlog-demo node ./bin/agentlog.js init --yes --skip-import --no-autostart --no-claude --no-recall --no-telemetry
 AGENTLOG_HOME=/tmp/agentlog-demo node ./bin/agentlog.js import --source codex-cli --since 30d
 AGENTLOG_HOME=/tmp/agentlog-demo node ./bin/agentlog.js import --source codex-desktop --since all
+AGENTLOG_HOME=/tmp/agentlog-demo node ./bin/agentlog.js import chatgpt
+AGENTLOG_HOME=/tmp/agentlog-demo node ./bin/agentlog.js import claude-web
 AGENTLOG_HOME=/tmp/agentlog-demo node ./bin/agentlog.js import chatgpt ~/Downloads/chatgpt-export.zip --username you@example.com
+AGENTLOG_HOME=/tmp/agentlog-demo node ./bin/agentlog.js import chatgpt "~/Downloads/OpenAI-export/User Online Activity" --username you@example.com
 AGENTLOG_HOME=/tmp/agentlog-demo node ./bin/agentlog.js import claude-web ~/Downloads/claude-export --username you --display-name "Personal Claude"
 AGENTLOG_HOME=/tmp/agentlog-demo node ./bin/agentlog.js import --source claude --since 30d
 AGENTLOG_HOME=/tmp/agentlog-demo node ./bin/agentlog.js import --source claude-code-desktop --since all
@@ -219,7 +222,7 @@ build step. Archives still keep stable `path:<hash>` keys for folders without
 git identity, but the UI displays the local path.
 Provider filters use one stable order: OpenAI (`codex-cli`, `codex-desktop`,
-`chatgpt`), Anthropic (`claude`, `claude-code-desktop`, `claude-workspace`,
+`codex-sdk`, `chatgpt`), Anthropic (`claude`, `claude-code-desktop`, `claude-workspace`,
 `claude-web`, `claude-sdk`), Google (`gemini-cli`, `antigravity`), Cognition
 (`devin-cli`), then other local tools (`cursor`, `cline`, `opencode`,
 `aider`).
@@ -362,23 +365,37 @@ alongside it:
 For large multi-session stores such as Cursor SQLite, the per-session raw
 manifest may reference one shared copy under `raw-sources/` instead of copying
 the same database into every session folder.
+Web chat imports may also reference a shared raw export archive; ChatGPT
+attachments remain preserved there and fresh imports render image/file cards in
+the readable transcript when the export includes the file bytes.
-`events.jsonl` uses the local `agentlog.events.v1` canonical event shape:
+`events.jsonl` uses the local `agentlog.events.v2` canonical event shape:
 `session.started`, `prompt.submitted`, `response.generated`, `tool.called`, and
-`tool.completed`. Parser versions are stamped by source type so importer output
-changes can trigger reimport with a new fingerprint. Recall/search builds a
-keyword index over event text first and falls back to transcript/markdown for
-legacy archives without events. The local search index stores compact term
-postings for CLI compatibility plus a SQLite FTS5 sidecar for fast web
-queries; when either index format changes, `agentlog history` and `agentlog
-index` rebuild it from archived transcripts/events without a full source
+`tool.completed`; completed tool events link back to the matching call when the
+source exposes stable ids or matching names. Parser versions are stamped by
+source type so importer output changes can trigger reimport with a new
+fingerprint. Recall/search builds a keyword index over event text first and
+falls back to transcript/markdown for legacy archives without events. The local
+search index stores compact term postings for CLI compatibility plus a SQLite
+FTS5 sidecar for fast web queries; when either index format changes,
+`agentlog history` and `agentlog index` rebuild it from archived
+transcripts/events without a full source
 reimport. The web viewer avoids doing that rebuild on a keystroke so a large
 old index, or a full-archive Markdown fallback, cannot block interactive
 search.
 Stats are import-time metadata, not viewer-time transcript repair. Archive
 metadata stores message counts, user-message counts, token usage, and models for
-each session, and the web stats view reads those fields directly. Cursor sessions
+each session, and the web stats view reads those fields directly. Token totals
+include cache-read/cache-write tokens when providers report them, while the
+stats payload and UI also keep input, output, cache, and reasoning sub-counts
+separately when available. Codex imports preserve `threads.tokens_used` as the
+provider total and split rollout `token_count` events into fresh input, cache
+read, output, and reasoning metadata. Codex SDK and
+Claude SDK batch jobs are kept out of primary activity totals, streaks, folder
+rankings, and provider/model charts; the stats payload and web view expose them
+as a separate SDK jobs section so high-volume automation does not drown out
+interactive work. Cursor sessions
 without provider-reported token usage can also carry separately labeled
 `estimatedUsage`, which the stats view includes while reporting estimated token
 coverage. ChatGPT and Claude.ai exports without provider usage get estimated
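
The token accounting described in this hunk (a total that includes cache-read and cache-write tokens, with input, output, cache, and reasoning sub-counts kept separately) can be sketched roughly as below. This is a minimal illustration, not agentlog's code: the function name `summarizeUsage` and the field names are hypothetical, and the choice to leave reasoning tokens out of the sum is an assumption that they are already counted inside output tokens.

```javascript
// Hypothetical usage shape; field names are illustrative, not agentlog's schema.
function summarizeUsage(usage) {
  // Treat missing or non-numeric counts as zero so partial provider data still sums.
  const n = (value) => (Number.isFinite(value) ? value : 0);
  const parts = {
    input: n(usage.input),
    output: n(usage.output),
    cacheRead: n(usage.cacheRead),
    cacheWrite: n(usage.cacheWrite),
    reasoning: n(usage.reasoning),
  };
  // Assumption: reasoning tokens are reported inside output, so they stay a sub-count only.
  const total = parts.input + parts.output + parts.cacheRead + parts.cacheWrite;
  return { total, ...parts };
}

const summary = summarizeUsage({ input: 1200, output: 300, cacheRead: 5000, reasoning: 120 });
console.log(summary.total); // 6500
```

Keeping the sub-counts next to the total lets a stats view show both the headline number and how much of it came from cache traffic.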
@@ -388,21 +405,34 @@ parts. During pre-v1 development, if those stats fields or parser semantics
 change, rebuild the local archive with
 `agentlog update --yes --since all`.
-ChatGPT and Claude.ai web exports are imported manually from an official `.zip`,
-an unzipped export folder, or a direct JSON file. These imports are stored as
-local scoped web-chat archives and displayed through virtual conversation roots
-such as `[chatgpt]conversations/<account-id>` and
+ChatGPT and Claude.ai are manual export providers. Run `agentlog import chatgpt`
+or `agentlog import claude-web` for current export instructions; after the
+provider emails a download link, pass the official `.zip`, unzipped export
+folder, or direct JSON file back to agentlog. These imports are stored as local
+scoped web-chat archives and displayed through virtual conversation roots such
+as `[chatgpt]conversations/<account-id>` and
 `[claude]conversations/<account-id>/<project>`. The importer records account
 metadata in `~/.agentlog/state/web-accounts.json`; use
 `agentlog import accounts list` to inspect mappings and
 `agentlog import accounts rename <provider> <account-id-or-username> --display-name <name>`
-to change the viewer display name. Claude.ai exports preserve conversation
-summaries and split structured thinking parts from visible assistant answers
-when the export includes that detail. Repeated manual uploads are incremental:
-unchanged conversations are skipped, and updated conversations replace the
-stable session for that provider/account/conversation id. Existing malformed
-pre-v1 web-chat archives are not migrated automatically; reimport from the
-original export after a reset or cleanup.
+to change the viewer display name.
+For newer OpenAI privacy exports named `OpenAI-export`, unzip the download and
+import the `User Online Activity` folder. Running `agentlog import chatgpt`
+without a path starts a walkthrough that asks for export paths one at a time,
+then account username/email and display name. ChatGPT
+conversations may be split across multiple
+`Conversations__...chatgpt...part-000N` ZIPs or folders; passing the parent
+folder is best, but the walkthrough can also collect the split part folders
+individually and preserve `chat.html`, manifests, ZIPs, and attached files in
+the shared raw export archive. Claude.ai exports preserve conversation summaries
+and split structured
+thinking parts from visible assistant answers when the export includes that
+detail. Repeated manual uploads are incremental: unchanged conversations are
+skipped, and updated conversations replace the stable session for that
+provider/account/conversation id. Existing malformed pre-v1 web-chat archives
+are not migrated automatically; reimport from the original export after a reset
+or cleanup.
 Tool calls and tool results are normalized before archive write where provider
 data is available. For example, Devin tool calls live in
@@ -422,16 +452,19 @@ importer/parser logic to rebuild the local archive without redoing setup:
 ```sh
 npm install -g agentel@latest
-agentlog update --yes --since all
+agentlog update --yes
 ```
 `agentlog update` preserves `config.json`, redaction settings, web account
-labels, source histories, and recall integrations. It removes derived local
-archive, import, index, cache, and sync bookkeeping, then reimports configured
-local sources from the stored preferences. It does not touch remote sync objects
-by default; use `agentlog sync replace` when the remote should match the rebuilt
-local archive. It also does not rediscover manual ChatGPT/Claude.ai export
-files; reimport those web exports from the original ZIP/folder when needed.
+labels, manually imported ChatGPT/Claude.ai archives, source histories, and
+recall integrations. It removes derived local archive, import, index, cache, and
+sync bookkeeping, then reimports configured local sources from the stored
+preferences. The rebuild window comes from the initial backfill or an explicit
+all-source import such as `agentlog import --source all --since all`; the
+fallback for legacy configs is `all`. The watcher's rolling
+`imports.defaultSinceDays` is not used by `agentlog update`. It does not touch
+remote sync objects by default; use `agentlog sync replace` when the remote
+should match the rebuilt local archive.
 Use `agentlog config` to change `~/.agentlog/config.json` without rerunning the
 init wizard:
@@ -471,16 +504,19 @@ local stores.
 After discovery, init offers a checkbox-style source picker. Rows marked `[x]`
 are selected; type one or more row numbers, such as `1 3 8`, to toggle sources
 on or off, then press Enter with no input to accept the current selection.
-Claude SDK jobs are shown as a separate opt-in source because batch SDK traffic
-can exceed interactive sessions. The selected sources are saved in config and
-used by later `agentlog import --source all` runs unless `--sources` is provided
-explicitly.
+Codex SDK jobs and Claude SDK jobs are shown as separate opt-in sources because
+batch SDK traffic can exceed interactive sessions. The selected sources are
+saved in config and used by later `agentlog import --source all` runs unless
+`--sources` is provided explicitly.
 Default init sources:
 - Codex CLI sessions and Codex Desktop sessions from Codex state, shown as
-  separate toggles
-- Claude Code CLI transcripts from `~/.claude/projects`
+  separate toggles, including linked Codex subagent child sessions when
+  `thread_spawn_edges` metadata is present; Codex SDK jobs are available as an
+  opt-in batch source
+- Claude Code CLI transcripts from `~/.claude/projects`, including subagent
+  definition snapshots and `subagents/*.jsonl` runs imported as child sessions
 - Claude Code Desktop metadata and Claude Workspace/local-agent sessions from
   the Claude app data, shown as separate toggles
 - Gemini CLI saved chats/checkpoints under `~/.gemini/tmp`, plus session/export JSONL stores with tool, usage, and checkpoint metadata
@@ -531,7 +567,11 @@ agentlog import --source all --since all
 agentlog import --sources codex-cli,codex-desktop,claude,claude-code-desktop,claude-workspace,gemini-cli,antigravity,devin-cli,cursor,cline,opencode-cli,opencode-desktop,opencode-web,aider --since all
 agentlog import --source codex-desktop --since 90d
 agentlog import --source codex-cli --since 30d
+agentlog import --source codex-sdk --since all
+agentlog import chatgpt
+agentlog import claude-web
 agentlog import chatgpt ~/Downloads/chatgpt-export.zip --username you@example.com
+agentlog import chatgpt "~/Downloads/OpenAI-export/User Online Activity" --username you@example.com
 agentlog import claude-web ~/Downloads/claude-export --username you --display-name "Personal Claude"
 agentlog import --source claude --since 30d
 agentlog import --source claude-code-desktop --since all
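
One caveat on the quoted path in the examples above: POSIX shells do not expand a tilde inside double quotes, so `"~/Downloads/OpenAI-export/User Online Activity"` reaches the program as a literal `~/...` string. This diff does not show whether agentlog expands it; writing `"$HOME/Downloads/..."` in the shell avoids the question. The sketch below shows how a Node CLI could expand a leading tilde itself. `expandTilde` is a hypothetical helper, not a function from this package.

```javascript
const os = require("node:os");
const path = require("node:path");

// Hypothetical helper: expand a leading "~" or "~/" to the user's home directory.
// Other forms such as "~otheruser/..." are returned unchanged.
function expandTilde(input, home = os.homedir()) {
  if (input === "~") return home;
  if (input.startsWith("~/")) return path.join(home, input.slice(2));
  return input;
}

console.log(expandTilde("~/Downloads/OpenAI-export/User Online Activity"));
```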

package/docs/code-reference.md CHANGED

@@ -99,11 +99,12 @@ low-signal filtering.
 Exports:
 - `CANONICAL_EVENT_SCHEMA_VERSION`: current event schema id,
-  `agentlog.events.v1`.
+  `agentlog.events.v2`.
 - `EVENT_KINDS`: constants for `session.started`, `prompt.submitted`,
   `response.generated`, `tool.called`, and `tool.completed`.
 - `normalizeSessionEvents(session, messages, options)`: maps transcript
-  messages into canonical events.
+  messages into canonical events and links `tool.completed` events to matching
+  `tool.called` parents.
 - `messageToCanonicalEvents(message, session, options)`: maps one message into
   zero or more canonical events.
 - `stableEventId(sessionId, messageIndex, kind, ordinal, content)`: creates a
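
The new linking behavior of `normalizeSessionEvents` (attaching each `tool.completed` event to its `tool.called` parent by stable id, or by matching name when no id exists) might look roughly like the sketch below. The event fields `toolCallId`, `toolName`, and `parentId` and the function `linkToolEvents` are hypothetical; the actual implementation in `canonical-events.js` is not shown in this hunk.

```javascript
// Hypothetical event shape: { id, kind, toolCallId?, toolName? }.
function linkToolEvents(events) {
  const open = []; // tool.called events that have not been matched yet
  return events.map((event) => {
    if (event.kind === "tool.called") {
      open.push(event);
      return event;
    }
    if (event.kind !== "tool.completed") return event;

    let index = -1;
    if (event.toolCallId) {
      // Prefer a stable call id when the source exposes one.
      index = open.findIndex((call) => call.toolCallId === event.toolCallId);
    } else if (event.toolName) {
      // Otherwise fall back to the oldest unanswered call with the same name.
      index = open.findIndex((call) => call.toolName === event.toolName);
    }
    if (index === -1) return event; // no parent found; leave the event unlinked

    const [parent] = open.splice(index, 1);
    return { ...event, parentId: parent.id };
  });
}

const linked = linkToolEvents([
  { id: "e1", kind: "tool.called", toolCallId: "c1", toolName: "shell" },
  { id: "e2", kind: "tool.called", toolName: "read" },
  { id: "e3", kind: "tool.completed", toolName: "read" },
  { id: "e4", kind: "tool.completed", toolCallId: "c1" },
]);
console.log(linked.map((event) => event.parentId)); // [ undefined, undefined, 'e2', 'e1' ]
```

Removing a call from the open list once it is matched keeps repeated calls to the same tool from all linking to the first one.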
@@ -219,7 +220,9 @@ Command handlers:
 - `integrationsCommand(args, env)`: canonical integration command group for
   recall surfaces.
 - `mcpCommand(args, flags, env)`: canonical MCP server command group.
-- `importCommand(args, flags, env)`: imports local sources or web export files.
+- `importCommand(args, flags, env)`: imports local sources and downloaded web
+  export files, or prints manual ChatGPT/Claude.ai export instructions when no
+  web export path is supplied.
 - `recallCommand(args, env)`: handles recall server/install/show/reindex flows.
 - `showRecallSession(sessionId, env)`: prints a session through the recall path.
 - `showCommand(sessionId, flags, env)`: prints, opens, or JSON-serializes a
@@ -430,7 +433,10 @@ Embedded history web app functions:
 - `renderMarkdownLink(label, href)`: renders safe links.
 - `renderSkillLink(name, skillPath)`: renders `$skill` links.
 - `compactSkillPath(value)`: shortens skill paths for display.
-- `setView(mode)`: toggles readable and raw Markdown views.
+- `captureRelativeScrollPosition()`: captures the detail pane scroll ratio.
+- `restoreRelativeScrollPosition(position, serial)`: restores a captured scroll
+  ratio after a view-mode layout swap.
+- `setView(mode, options)`: toggles readable and raw Markdown views.
 - `sessionDetailsText(payload)`: builds copyable session details.
 - `copyText(value)`: writes text to the clipboard.
 - `copySessionDetails()`: copies session metadata for pasting into agents.
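
The scroll-ratio idea behind `captureRelativeScrollPosition` and `restoreRelativeScrollPosition` can be illustrated with a small sketch. It assumes the pane exposes the standard DOM `scrollTop`, `scrollHeight`, and `clientHeight` properties; the function names and the plain objects standing in for DOM elements are hypothetical, and the `serial` argument from the documented signature is omitted because its role is not described here.

```javascript
// Capture how far down the pane is scrolled as a ratio between 0 and 1.
function captureRelativeScroll(pane) {
  const range = pane.scrollHeight - pane.clientHeight;
  return range > 0 ? pane.scrollTop / range : 0;
}

// Apply a captured ratio to a pane whose content height may have changed.
function restoreRelativeScroll(pane, ratio) {
  const range = Math.max(0, pane.scrollHeight - pane.clientHeight);
  pane.scrollTop = Math.round(ratio * range);
  return pane.scrollTop;
}

// Plain objects stand in for the detail pane before and after a view swap.
const readable = { scrollTop: 300, scrollHeight: 1600, clientHeight: 400 };
const raw = { scrollTop: 0, scrollHeight: 2400, clientHeight: 400 };

const ratio = captureRelativeScroll(readable);
console.log(ratio); // 0.25
console.log(restoreRelativeScroll(raw, ratio)); // 500
```

Storing a ratio rather than a pixel offset keeps the reader at roughly the same place when the readable and raw views have different heights.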
@@ -555,9 +561,9 @@ Import dispatch and generic providers:
   Desktop/Workspace metadata and audit sessions.
 - `matchesImportedSessionRepo(session, repo, wantedRepos)`: checks repo filters
   for sessions with repo or scope attribution.
-- `importCodexProvider(provider, since, options, env)`: imports Codex CLI or
-  Desktop threads from state DB, rollout files, and Codex supplementary
-  summaries when available.
+- `importCodexProvider(provider, since, options, env)`: imports Codex CLI,
+  Desktop, or opt-in exec/SDK threads from the state DB, session index, rollout
+  files, and Codex supplementary summaries when available.
 - `importCursorProvider(provider, since, options, env)`: imports Cursor SQLite
   and Cursor project transcript sessions; supervisor calls set
   `cursorRecovery=false` to skip raw SQLite salvage/backfill.
@@ -581,7 +587,10 @@ Generic parsing helpers:
   shapes.
 - `extractClaudeMessagesFromEvent(event, provider, context)`: Claude Code/SDK
   JSONL parser for text, thinking, tool calls/results, model, request id, and
-  usage metadata, including `apply_patch` shell calls promoted to edit diffs.
+  usage metadata, including lineage fields, agent/slug/tool-use ids, MCP
+  structured content, API error metadata, richer Claude usage extras, Remote
+  Control lifecycle context messages, tool result name repair from prior
+  `tool_use` ids, and `apply_patch` shell calls promoted to edit diffs.
 - `updateClaudeParseContext(event, provider, context)`: keeps Claude model,
   session, and cwd context while parsing JSONL records.
 - `extractCodexSummaryMessage(event, provider)`: extracts readable Codex
@@ -599,6 +608,8 @@ Generic parsing helpers:
   summary for tool-call metadata.
 - `normalizeWebConversations(provider, data)`: normalizes web export
   conversations.
+- `webExportInstructions(source)`: returns the provider-specific manual export
+  instruction payload used by ChatGPT and Claude.ai import commands.
 - `chatgptMessages(conversation)`: parses ChatGPT export conversation nodes
   and attaches provider or estimated message usage.
 - `claudeMessages(conversation)`: parses Claude.ai export messages, separating
@@ -610,8 +621,8 @@ Discovery and summary helpers:
 - `summarizeFiles(files)`: counts files, projects, and oldest mtime.
 - `summarizeCodex(env, source)`: summary wrapper for Codex threads.
-- `summarizeCodexThreads(allThreads, source)`: summarizes Codex CLI/Desktop
-  counts.
+- `summarizeCodexThreads(allThreads, source)`: summarizes Codex CLI/Desktop or
+  opt-in exec/SDK counts.
 - `summarizeClaude()`: summarizes Claude Code CLI files.
 - `summarizeClaudeScan(scan)`: formats Claude scan results.
 - `summarizeClaudeSdk()`: summarizes Claude SDK job files.
@@ -643,11 +654,13 @@ Source location and file helpers:
 - `scanClaudeProjectFiles(options)`: scans and classifies Claude project JSONL.
 - `isClaudeConversationFile(file)`: tests whether a Claude file is interactive.
 - `classifyClaudeFile(file)`: classifies Claude JSONL as conversation, SDK job,
-  or other.
+  or other; Remote Control `sdk-cli` transcripts with Remote Control deferred
+  tool names are kept with interactive Claude Code conversations.
 - `readInitialLines(file, maxLines, maxBytes)`: reads a bounded prefix of a
   large JSONL file.
-- `readCodexThreads(env)`: queries Codex state DB for top-level threads and
-  optional `stage1_outputs` summary documents.
+- `readCodexThreads(env)`: queries Codex state DB for top-level threads, merges
+  `session_index.jsonl` titles, and reads optional `stage1_outputs` summary
+  documents.
 - `sqliteTableExists(dbPath, tableName)`: checks optional SQLite tables before
   querying version-dependent Codex state.
 - `codexStateDb(env)`: resolves Codex state DB path.