npm - @misha_misha/agentwatch - Versions diffs - 0.0.2 → 0.0.3 - Mend

@misha_misha/agentwatch 0.0.2 → 0.0.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (7) hide show

package/README.md +368 -312
package/dist/index.js +4602 -841
package/package.json +15 -3
package/dist/chunk-5QNDC2VP.js +0 -98
package/dist/chunk-EMMMIDXY.js +0 -37
package/dist/detect-JH6COHZ5.js +0 -7
package/dist/workspace-N6FQVHKD.js +0 -9

package/README.md CHANGED Viewed

@@ -4,9 +4,11 @@
 **See what every AI coding agent on your machine is doing — in one terminal.**
-Local-only observability for Claude Code, OpenClaw, and Cursor — all the
-tool calls, file reads & writes, shell commands, prompts, responses, costs,
-and permissions, in a single timeline.
+Local-only observability for Claude Code, Codex, Gemini CLI, Cursor, and
+OpenClaw — unified timeline, real token + cost accounting, compaction and
+anomaly detection, an MCP server agents can query their own history from,
+and an OpenTelemetry exporter with `gen_ai.*` semantic conventions. All
+local. No cloud. No telemetry. No sign-in.
 [![npm](https://img.shields.io/npm/v/@misha_misha/agentwatch.svg)](https://www.npmjs.com/package/@misha_misha/agentwatch)
 [![CI](https://github.com/mishanefedov/agentwatch/actions/workflows/ci.yml/badge.svg)](https://github.com/mishanefedov/agentwatch/actions/workflows/ci.yml)
@@ -15,19 +17,8 @@ and permissions, in a single timeline.
 </div>
-<!--
-  Hero demo GIF — recorded on a real machine, ~30s, showing:
-  1. `npm i -g @misha_misha/agentwatch`
-  2. `agentwatch` launches, events streaming
-  3. Press `?` → hotkey reference
-  4. Press `P` → projects grid → Enter → sessions → Enter → scoped timeline
-  5. Press Enter on a Bash event → full stdout + duration
-  6. Press `p` → permissions view with flagged risks
--->
 <div align="center">
-  <img src="./docs/demo.gif" alt="agentwatch demo" width="820"
-       onerror="this.style.display='none'" />
+  <img src="./docs/demo.gif" alt="agentwatch demo" width="820" />
 </div>
 ---
@@ -37,23 +28,16 @@ and permissions, in a single timeline.
 - [Why this exists](#why-this-exists)
 - [Install](#install)
 - [First 60 seconds](#first-60-seconds)
+- [Agent coverage](#agent-coverage)
 - [Features](#features)
-  - [Live multi-agent timeline](#live-multi-agent-timeline)
-  - [Event detail pane](#event-detail-pane)
-  - [Subagent drilldown](#subagent-drilldown)
-  - [Project and session navigation](#project-and-session-navigation)
-  - [Full-text search](#full-text-search)
-  - [Per-agent permission surface](#per-agent-permission-surface)
-  - [Per-session cost with cache accounting](#per-session-cost-with-cache-accounting)
-  - [Desktop notifications](#desktop-notifications)
-  - [Yank to clipboard](#yank-to-clipboard)
 - [Keyboard reference](#keyboard-reference)
-- [What agentwatch reads](#what-agentwatch-reads)
 - [Configuration](#configuration)
+- [What agentwatch reads](#what-agentwatch-reads)
+- [MCP server mode](#mcp-server-mode)
+- [OpenTelemetry exporter](#opentelemetry-exporter)
 - [How it compares](#how-it-compares)
 - [Limitations](#limitations)
 - [Non-goals](#non-goals)
-- [Roadmap](#roadmap)
 - [Architecture](#architecture)
 - [Development](#development)
 - [Security](#security)
@@ -63,27 +47,19 @@ and permissions, in a single timeline.
 ## Why this exists
-You're running three AI coding agents on one laptop. Claude Code in a
-terminal, Cursor as your IDE, OpenClaw churning on long-running tasks.
-Every one of them has its own log file in a different place, its own
-permission model, its own idea of what a "session" is. None of them tells
-you what the others are doing.
+You run three AI coding agents on one laptop. Claude Code in a terminal,
+Codex alongside it, Cursor as your IDE, maybe Gemini CLI for a quick
+review, maybe an OpenClaw sub-agent churning on a long task. Every one of
+them has its own log file, its own permission model, its own idea of what
+a "session" is. None of them tells you what the others are doing.
 When something goes wrong — a file rewritten unexpectedly, a spend spike,
-an `rm` you don't remember running — you're piecing it together from three
+an `rm` you don't remember running — you're piecing it together from five
 JSONLs and guessing.
-[`claude-devtools`](https://github.com/matt1398/claude-devtools) solves
-this beautifully for Claude Code. **agentwatch does the same thing for the
-whole multi-agent stack, in the terminal, with zero infra.**
-- Single timeline: every agent's tool call, file op, shell exec, prompt,
-  response
-- Per-agent attribution: `[auraqu] Bash: git log --oneline` or `[content_agent / sub:ab3c99fc] WebFetch: https://…`
-- One unified permissions view: Claude allow/deny, Cursor approval mode,
-  OpenClaw sub-agents
-- Local-only by design. No cloud. No telemetry. No sign-in. `lsof` will
-  show you there's nothing outbound.
+[`claude-devtools`](https://github.com/matt1398/claude-devtools) does this
+well for Claude Code. **agentwatch does it for the whole multi-agent
+stack, in the terminal, with zero infrastructure and zero network.**
 ---
@@ -97,56 +73,73 @@ agentwatch
 Requires:
 - **Node ≥ 20** (tested on 20 + 22 in CI)
-- **macOS or Linux** (Windows is intentionally out of scope for v0)
+- **macOS or Linux** (Windows intentionally out of scope for v0.x)
-agentwatch is published under the `@misha_misha` npm scope — the unscoped
-`agentwatch` name was already taken by a different tool from CyberArk.
-Once installed the binary on your `$PATH` is simply `agentwatch`.
+Published under the `@misha_misha` npm scope — the unscoped `agentwatch`
+name was already taken by a CyberArk tool. The installed binary on your
+`$PATH` is simply `agentwatch`.
 ---
 ## First 60 seconds
 ```bash
-# 1. Sanity-check that agentwatch sees your installed AI agents
-agentwatch doctor
+agentwatch doctor   # detects installed agents + readiness
+agentwatch          # launches the TUI
+agentwatch mcp      # runs the MCP stdio server (for agents, not humans)
+agentwatch --help
 ```
-You should see something like:
+`doctor` output looks like:
 ```
 workspace: /Users/you/IdeaProjects
 agents:
-  ● Claude Code    installed
-    config: /Users/you/.claude/settings.json
-  ○ Codex          not detected
-    config: /Users/you/.codex/config.toml
-  ● Cursor         installed
-    config: /Users/you/.cursor/mcp.json
-  ● Gemini CLI     installed
-    config: /Users/you/.gemini/settings.json
-  ● OpenClaw       installed
-    config: /Users/you/.openclaw
+  ● Claude Code        installed (events captured)
+  ● Codex              installed (events captured)
+  ● Gemini CLI         installed (events captured)
+  ● Cursor             installed (config-level only)
+  ● OpenClaw           installed (events captured)
+  ○ Aider              not detected
+  ○ Cline (VS Code)    not detected
 ```
-Then:
-```bash
-# 2. Launch the TUI
-agentwatch
-```
+Launch the TUI and every event your agents emit streams in. The last 4 MB
+of each active session is backfilled on startup so you have immediate
+context. Press **`?`** to see every hotkey.
-You land in the live timeline. Everything your agents do from this moment
-on streams in. The last ~64 KB of each active session is backfilled so you
-have immediate context.
+---
-- Press **`?`** — see every hotkey
-- Press **`P`** — open the projects grid (one card per workspace)
-- Use Claude Code in another terminal and watch events appear
-- Press **Enter** on a row to see the full content (file diff, full Bash
-  output, full prompt text, extended thinking)
-- Press **`q`** — quit, your shell scrollback is restored
+## Agent coverage
+What actually works per agent, as of v0.0.3. Features not listed here
+work across every agent (timeline, export, syntax highlighting, notifications,
+triggers, search, stale detection, clipboard yank).
+| Feature                        | Claude Code | Codex | Gemini CLI | Cursor | OpenClaw |
+| ------------------------------ | :---------: | :---: | :--------: | :----: | :------: |
+| Live events on timeline        | ✅          | ✅    | ✅         | 🟡     | ✅       |
+| Token usage + cost             | ✅          | ✅    | ✅         | ❌     | ✅       |
+| Tool call + result pairing     | ✅          | ✅    | ✅         | ❌     | 🟡       |
+| Per-turn token attribution     | ✅          | ✅    | ✅         | ❌     | ✅       |
+| Budget alarms (session + day)  | ✅          | ✅    | ✅         | ❌     | ✅       |
+| Anomaly detection (cost/loops) | ✅          | ✅    | ✅         | 🟡     | ✅       |
+| Compaction visualizer          | ✅          | ✅    | ❌         | —      | ❌       |
+| Permissions view               | ✅          | ✅    | ✅         | ✅     | ✅       |
+| Cross-session search           | ✅          | ✅    | ✅         | ❌     | ❌       |
+| Subagent drilldown             | ✅          | —     | 🟡         | —      | 🟡       |
+| Agent memory file overhead     | `CLAUDE.md` | `AGENTS.md` | `GEMINI.md` | `.cursorrules` | `OPENCLAW.md` |
+| OTel span coverage             | ✅          | ✅    | ✅         | 🟡     | ✅       |
+| MCP server exposes history     | ✅          | ✅    | ✅ (raw)   | ❌     | ❌       |
+- **Cursor** exposes config state (MCP servers, `.cursorrules`, approval
+  mode, sandbox) but its actual AI activity lives in a SQLite database we
+  haven't parsed yet. A thin read-only adapter is a follow-up.
+- **Gemini CLI** doesn't persist context-compaction markers to disk, so
+  compaction detection is Claude + Codex only.
+- **OpenClaw** doesn't persist tool_result content or compaction markers
+  to its JSONL — structural limit of what's on disk, not an adapter gap.
 ---
@@ -154,130 +147,173 @@ have immediate context.
 ### Live multi-agent timeline
-The main screen. Every event your agents emit, in reverse-chronological
-order by event timestamp (not arrival order — backfill from different
-session files is sorted correctly). Columns: time · agent · type ·
-`[project / sub-agent]` summary · duration · error flag.
+Main screen. Every event your agents emit, ordered by event timestamp (not
+arrival order, so backfill from different sessions merges correctly).
+Columns: time · agent · type · `[project]` summary · duration · error.
 ```
-09:54:01  openclaw     response       [content_agent] <think> Checked the knowledge base…
+09:54:01  openclaw     response       [content_agent] <think> Checked the KB…
 09:52:53  claude-code  response       [auraqu] Commit bddc363. q now exits instantly…
-09:52:48  claude-code  tool_call      [auraqu] Bash: git log -5 · 12ms
+09:52:48  codex        shell_exec     [dataset_research] ls -la · 12ms
 09:52:43  claude-code  tool_call      [auraqu] Edit: src/ui/App.tsx · 7ms
-09:51:51  claude-code  tool_call      [auraqu] Agent: Competitive landscape 2026 ▸ 52 child events
-09:28:06  claude-code  prompt         [mishanefedov] I mean what's the link, do I go through…
+09:51:51  gemini       file_write     [landing] write_file: public/llms.txt
+09:51:51  claude-code  tool_call      [auraqu] Agent: Competitive landscape ▸ 52 child events
 ```
-Event types: `prompt`, `response`, `tool_call`, `shell_exec`, `file_read`,
-`file_write`, `file_change`, `session_start`. Each gets a risk-based color
-(green / yellow / orange / red) so destructive shell execs and sensitive
-file writes jump out visually.
+Rows with an anomaly fire a red `◎` prefix on the type column.
 ### Event detail pane
-Press **`Enter`** on any focused row. Opens a full-screen pane with:
+Press **`Enter`** on any row. Opens a full-screen pane with:
-- Event metadata (time, agent, type, tool, path, cmd)
-- **Tokens / cost / duration** breakdown — `in=6 cache_create=25508 cache_read=16827 out=353` · `cost: $0.08 (claude-opus-4-6)` · `duration: 151ms`
-- **Tool result** — stdout / stderr for Bash, full file content for Read,
-  search matches for Grep
-- **Full text** — untruncated prompt or response
-- **Extended thinking** block when present
-- **Tool input** — JSON-pretty for Task / WebFetch / Grep arguments
+- Metadata (time, agent, type, tool, path, cmd)
+- Tokens / cost / duration (`in=6 cache_create=25508 cache_read=16827 out=353` · `$0.08 (claude-opus-4-6)` · `151ms`)
+- Tool result — stdout for Bash, file content for Read/Write, search matches for Grep — with syntax highlighting inferred from the tool + file extension
+- Full prompt or response text
+- Extended thinking block when present
+- Tool input JSON
 Scrollable with `↑↓` or `j/k`. `esc` closes.
 ### Subagent drilldown
-Parent `Agent` tool_use events show `▸ 52 child events`. Press **`x`** on
-one to scope the timeline to only that subagent's inner tool calls — every
-Bash, WebFetch, Grep, Read it ran during that Task. `X` unscopes.
-This is the only multi-agent-observability feature claude-devtools has that
-agentwatch matches *and* extends — because we cover non-Claude agents too.
+Parent `Agent` tool_use events show `▸ 52 child events`. Press **`x`** to
+scope the timeline to only that subagent's inner tool calls. `X` unscopes.
+Applies to Claude Code (Task tool) and partially to OpenClaw (per-agent
+delegation) and Gemini (subagent sessions).
-### Project and session navigation
+### Project + session navigation
 ```
-P  →  projects grid     (every workspace across every agent)
-↓  →  pick one, Enter
-       ↓
-       sessions list     (grouped Today / Yesterday / Last 7d / Older)
-       ↓  pick one, Enter
-              ↓
-              scoped timeline  (only events from this session)
+P → projects grid (one workspace per row, across all agents)
+     ↓ enter → sessions list (grouped Today / Yesterday / 7d / Older)
+             ↓ enter → scoped timeline
 ```
-Projects grid aggregates across agents: one row per filesystem path, with
-per-agent session counts, total cost, last activity. Sessions list shows
-the first user prompt + event count + duration + cost per session.
-`esc` always walks back one level.
+Projects grid aggregates across agents: per-agent session counts, total
+cost, last activity. `esc` walks back one level.
-### Full-text search
+### Cross-session search (`?`)
-Press **`/`**. Type a query. The timeline narrows to matches against
-summary, path, cmd, tool, agent, full text, or extended thinking. Live
-match count shown below the timeline. `esc` clears.
+Press **`?`** — fuzzy-substring search across every session file on disk
+(`~/.claude`, `~/.codex`, `~/.gemini`). Uses ripgrep if installed, falls
+back to a native scan. Enter on a hit scopes the timeline to that session.
-```
-/ Bash: rm   →   filters to every Bash with "rm" in its command
+Different from in-buffer search:
+- **`/`** — search the 500-event live buffer
+- **`?`** — search every session file ever written
+### Per-session cost with cache accounting
+Naive token counters are 3–10× wrong on Claude because `cache_read` is
+billed at 10% of input and `cache_creation` at 125%. agentwatch ships a
+per-model rate table (Claude opus/sonnet/haiku, GPT-5 / GPT-5-mini,
+Gemini 2.5 Pro/Flash) and computes true USD cost per turn. Cost shows:
+- Per-agent total in the side panel
+- Per-event in the detail pane
+- Per-session in the sessions list
+- Aggregate in the session's token attribution view (`[t]`)
+### Per-turn token attribution (`[t]`)
+Inside a scoped session, press **`t`**. Stacked bar per turn showing:
+- `user` — the preceding prompt (tokenized with `gpt-tokenizer`)
+- `memory file` — CLAUDE.md / AGENTS.md / GEMINI.md / .cursorrules / etc., read from the session's cwd
+- `tool I/O` — tool_input JSON + tool_result text
+- `thinking` — extended thinking block
+- `input (fresh)` / `cache read` / `cache create` / `output` — exact from the model's own usage record
+### Compaction visualizer (`[C]`)
+Inside a scoped session, press **`C`**. Horizontal bar of context fill %
+across turns, with `⋈` markers where the agent auto-compacted. Selected
+compaction shows before / after token counts and the dropped-token delta.
+Works on Claude Code (via `isCompactSummary`) and Codex (via
+`event_msg/turn_truncated`).
+### Budget alarms
+`~/.agentwatch/budgets.json`:
+```json
+{ "perSessionUsd": 5, "perDayUsd": 20 }
 ```
-### Per-agent permission surface
+Red banner in the Header when either cap is crossed; OS notification
+fires once per crossing. No kill switch — we don't control agents; we
+just shout.
-Press **`p`**. Scrollable view of:
+### Anomaly detection
-- **Claude Code** — allow / deny / `defaultMode`, plus flagged risks:
-  `Bash(*)` allows arbitrary shell; missing `~/.ssh` / `.aws` / `.gnupg`
-  denies; auto or bypass mode surfaces as red.
-- **Cursor** — approval mode, sandbox state, allow/deny counts, MCP server
-  list, discovered `.cursorrules` file paths.
-- **OpenClaw** — default workspace + per-sub-agent breakdown (name, emoji,
-  model, workspace).
+Three detectors, all fully local, all running on the 500-event buffer:
-Gemini CLI is documented and omitted — it exposes no permission model
-beyond auth.
+- **MAD z-score outliers** on cost, duration, and input tokens per agent
+  (`|z| > 3.5` by default — tune in `~/.agentwatch/anomaly.json`)
+- **Stuck-loop detector** with periods 1–4 — catches `A-A-A-…` and
+  `A-B-A-B-…` "apologize and retry" loops
+- Per-session rollup + OS notification on first flag + timeline `◎` marker
+  + `[D]` to dismiss the banner
-### Per-session cost with cache accounting
+### User-defined notification triggers
-Naive token summers are **3–10× wrong** on Claude because `cache_read` is
-billed at 10% of input while `cache_creation` is billed at 125%. agentwatch
-embeds a per-model rate table (opus-4-6, sonnet-4-6, haiku-4-5) and
-computes true USD cost per turn.
+`~/.agentwatch/triggers.json` — live-reloaded via chokidar:
-- Per-agent total cost shows in the side panel
-- Per-event cost shows in the detail pane
-- Per-session cost shows in the sessions list
+```json
+[
+  { "match": "curl .* \\| (bash|sh)", "title": "pipe-to-shell", "body": "{{agent}}: {{cmd}}" },
+  { "type": "file_write", "pathMatch": "^/etc/", "title": "/etc write" },
+  { "thresholdUsd": 0.5, "title": "expensive turn", "body": "cost {{cost}}" }
+]
+```
+Placeholders: `{{agent}} {{type}} {{cmd}} {{path}} {{tool}} {{summary}} {{cost}}`.
 ### Desktop notifications
-Fires only for events that happen **after** the TUI was launched (backfill
-is silent). Rate-limited to one alert per rule per 60s.
+Built-in alerts fire on sensitive events — `.env` access, `~/.ssh` /
+`~/.aws` / `~/.gnupg` paths, `rm -rf`, `sudo`, `curl | sh`, tool errors,
+budget breach, anomaly. Rate-limited (60s per rule key). Silent during
+backfill.
-Triggers built-in:
+Platform dispatch: `osascript` on macOS, `notify-send` on Linux,
+PowerShell `MessageBox` on Windows. Zero third-party dependencies.
-- `.env` file read or write
-- `~/.ssh`, `~/.aws`, `~/.gnupg` paths touched
-- `rm -rf`, `sudo`, `curl | sh` in shell_exec
-- Tool errors (`is_error: true`)
+### Per-agent permission surface (`[p]`)
-Platform dispatch: `osascript` on macOS, `notify-send` on Linux,
-PowerShell fallback on Windows. Zero dependencies.
+Scrollable view showing:
+- **Claude Code** — allow / deny / defaultMode; flagged risks (`Bash(*)`, missing `.ssh` denies, `auto` / `bypass` modes in red)
+- **Codex** — config.toml projects + trust_level; latest session's sandbox_policy, approval_policy, writable_roots, network_access, model
+- **Gemini CLI** — auth type, selected model, tool allow/block lists, trusted folders
+- **Cursor** — approval mode, sandbox state, MCP servers, discovered `.cursorrules`
+- **OpenClaw** — default workspace + per-sub-agent (name, emoji, model, workspace)
+### Session export (`[e]`)
+From a session list or scoped timeline, press **`e`**. Writes
+`./agentwatch-export/<agent>-<session>-<ts>.md` (human-readable transcript
+with tool calls as fenced blocks) and `.json` (raw events). Path copied to
+clipboard.
+### Syntax highlighting in the detail pane
-Custom regex triggers (e.g. "alert on any `psql.*prod`") are planned for
-v0.5.
+`cli-highlight` (tiny ANSI highlighter) applies to:
+- Tool input JSON
+- Tool result when the tool is Bash or the file extension is known (`.ts`, `.py`, `.rs`, `.go`, etc.)
+- Fenced blocks in user/assistant text
-### Yank to clipboard
+### Stale-session detection
-Press **`y`** on any focused event. Copies the most useful payload:
+Sessions and projects idle for > 5 minutes render dimmed with a `⊘ stale`
+badge. Un-greys on the next event.
-- Tool result content if available (the actual stdout / file body)
-- Otherwise full text (prompt / response)
-- Otherwise cmd / path / summary
+### Clipboard yank (`[y]`)
-macOS `pbcopy`, Linux `wl-copy` / `xclip` / `xsel`, Windows `clip`.
-Confirmation flashes in green at the footer (`✓ copied 4,210 chars`) or red
-if no clipboard tool is available.
+Copies the most useful payload (tool result > full text > cmd / path /
+summary). Uses `pbcopy`, `wl-copy` / `xclip` / `xsel`, or `clip`.
+Confirmation flashes at the footer.
 ---
@@ -292,8 +328,8 @@ Press **`?`** anytime to open this inside the TUI.
 | `↑ ↓` / `j k`      | move selection in the timeline                 |
 | `Enter`            | open event detail pane                         |
 | `esc`              | close current view / clear selection           |
-| `P`                | projects grid — every workspace on this machine |
-| `Enter` on project | sessions list for that project (by date)       |
+| `P`                | projects grid                                  |
+| `Enter` on project | sessions list for that project                 |
 | `Enter` on session | scoped timeline for that session               |
 | `q` / `Ctrl-C`     | quit                                           |
@@ -301,107 +337,181 @@ Press **`?`** anytime to open this inside the TUI.
 | Key  | Action                                                       |
 | ---- | ------------------------------------------------------------ |
-| `/`  | full-text search (summary, path, cmd, tool, text, thinking)  |
-| `f`  | cycle agent filter (Claude only → OpenClaw only → …)         |
+| `/`  | in-buffer search (last 500 events)                           |
+| `?`  | cross-session search (every session file on disk)            |
+| `f`  | cycle agent filter                                           |
 | `a`  | toggle agent side panel                                      |
 | `x`  | drill selected Agent event into its subagent run             |
 | `X`  | unscope subagent                                             |
 | `A`  | clear project filter                                         |
+| `Z`  | clear all filters                                            |
 ### Actions
 | Key       | Action                                      |
 | --------- | ------------------------------------------- |
 | `y`       | yank selected event content to clipboard    |
+| `e`       | export current session to `.md` + `.json`   |
 | `space`   | pause / resume live event stream            |
 | `c`       | clear event buffer                          |
+| `D`       | dismiss the current anomaly banner          |
-### Info views
+### Info overlays (only in a scoped session)
-| Key    | Action                                                    |
-| ------ | --------------------------------------------------------- |
-| `p`    | permissions view (Claude + Cursor + OpenClaw)             |
-| `↑↓`   | scroll inside permissions or detail pane                  |
+| Key    | Action                                    |
+| ------ | ----------------------------------------- |
+| `t`    | per-turn token attribution                |
+| `C`    | context compaction visualizer             |
+| `p`    | permissions view (works anywhere)         |
+---
+## Configuration
+Four config files, all optional. Loaded on startup; triggers reload live.
+| File                             | Purpose                                                  |
+| -------------------------------- | -------------------------------------------------------- |
+| `~/.agentwatch/triggers.json`    | User-defined notification rules (live-reloaded)          |
+| `~/.agentwatch/budgets.json`     | `perSessionUsd` / `perDayUsd` spend caps                 |
+| `~/.agentwatch/anomaly.json`     | `zScore`, `loopWindow`, `loopMinRepeats`, `minSamples`   |
+Environment variables:
+| Variable                       | Default                     | Purpose                                               |
+| ------------------------------ | --------------------------- | ----------------------------------------------------- |
+| `WORKSPACE_ROOT`               | `~/IdeaProjects` (fallback) | Where the generic filesystem watcher looks for edits  |
+| `AGENTWATCH_CONTEXT_WINDOW`    | `200000`                    | Tokens per window — used by compaction % calculation  |
+| `AGENTWATCH_OTLP_ENDPOINT`     | unset                       | Enables the OTel exporter when set                    |
+| `NO_COLOR`                     | unset                       | Standard honoring: disables ANSI colors if set        |
+Workspace fallback chain (used when `WORKSPACE_ROOT` isn't set):
+`~/IdeaProjects` → `~/src` → `~/code` → `~/Projects` → `~/dev` → `$HOME`.
 ---
 ## What agentwatch reads
-Everything is read-only. agentwatch writes to exactly two places: your
-terminal, and the clipboard (on explicit `y`).
+Read-only. agentwatch writes to exactly two places: your terminal and the
+clipboard (on explicit `y`) / disk (on explicit `e` to export).
+| Path                                                         | What                                     |
+| ------------------------------------------------------------ | ---------------------------------------- |
+| `~/.claude/projects/**/*.jsonl`                              | Claude Code session transcripts          |
+| `~/.claude/projects/**/subagents/*.jsonl`                    | Claude Code Task-spawned subagents       |
+| `~/.claude/settings.json`                                    | Claude permissions                       |
+| `~/.codex/sessions/**/rollout-*.jsonl`                       | Codex session transcripts                |
+| `~/.codex/config.toml`                                       | Codex permissions + trust levels         |
+| `~/.gemini/tmp/**/chats/*.json`                              | Gemini CLI transcripts + tool calls      |
+| `~/.gemini/settings.json` + `trustedFolders.json`            | Gemini permissions                       |
+| `~/.openclaw/agents/*/sessions/*.jsonl`                      | OpenClaw sub-agent sessions              |
+| `~/.openclaw/logs/config-audit.jsonl` + `openclaw.json`      | OpenClaw config audit + agent roster     |
+| `~/.cursor/{mcp.json, cli-config.json, ide_state.json}`      | Cursor config state                      |
+| Any `.cursorrules` / `.cursor/rules/*.mdc` under WORKSPACE   | Cursor project rules                     |
+| `{CLAUDE,AGENTS,GEMINI,OPENCLAW}.md` + `.windsurfrules` etc. | Per-agent memory files for token attribution |
+| `~/.agentwatch/*.json`                                       | User config (triggers / budgets / anomaly) |
+| `$WORKSPACE_ROOT` tree                                       | Filesystem change events                 |
+`SECURITY.md` carries the authoritative list and details of what is *not* read.
+---
+## MCP server mode
+Run agentwatch as an MCP server so other agents can query their own
+history. Install:
+```bash
+claude mcp add agentwatch -- npx -y @misha_misha/agentwatch mcp
+# or edit ~/.claude.json / ~/.cursor/mcp.json manually
+```
+Tools exposed:
-| Path                                          | What                                |
-| --------------------------------------------- | ----------------------------------- |
-| `~/.claude/projects/**/*.jsonl`               | Claude Code session transcripts     |
-| `~/.claude/projects/**/subagents/*.jsonl`     | Claude Code Task-spawned subagents  |
-| `~/.claude/settings.json`                     | Claude permissions                  |
-| `~/.openclaw/agents/*/sessions/*.jsonl`       | OpenClaw sub-agent sessions         |
-| `~/.openclaw/logs/config-audit.jsonl`         | OpenClaw config audit trail         |
-| `~/.openclaw/openclaw.json`                   | OpenClaw agent roster               |
-| `~/.cursor/{mcp.json, cli-config.json, ide_state.json}` | Cursor state               |
-| Any `.cursorrules` under `$WORKSPACE_ROOT`    | Cursor project rules                |
-| `$WORKSPACE_ROOT` tree (default `~/IdeaProjects`) | Filesystem change events        |
+| Tool                      | Args                              | Returns                                               |
+| ------------------------- | --------------------------------- | ----------------------------------------------------- |
+| `list_recent_sessions`    | `limit?: 1-100`                   | `[{agent, sessionId, project, lastActivity, sizeBytes}]` |
+| `get_session_events`      | `sessionId`, `maxBytes?: 1K-10M`  | Raw JSONL (tail-capped) for that session              |
+| `search_sessions`         | `query`, `limit?: 1-50`           | `[{session, agent, line}]` substring hits             |
+| `get_tool_usage_stats`    | `sessionId?`, `limit?: 1-500`     | Per-tool counts, totalDurationMs, errorCount          |
+| `get_session_cost`        | `sessionId`                       | `{totalCostUsd, turns, tokens, byModel}`              |
-The SECURITY.md file carries the authoritative list and the details on
-what is *not* read.
+See [`docs/features/mcp-server.md`](./docs/features/mcp-server.md).
 ---
-## Configuration
+## OpenTelemetry exporter
-No config file yet — every behavior has a sensible default and the
-scope is deliberately narrow. Two environment variables:
+Set `AGENTWATCH_OTLP_ENDPOINT=http://localhost:4318/v1/traces` to emit
+OTLP/HTTP spans for every agent event. Uses the OpenTelemetry GenAI
+semantic conventions so any consumer (Jaeger, Tempo, Honeycomb, Grafana)
+can interpret the data without custom dashboards.
-| Variable          | Default                             | Purpose                                                 |
-| ----------------- | ----------------------------------- | ------------------------------------------------------- |
-| `WORKSPACE_ROOT`  | `~/IdeaProjects` (fallback chain)   | Where the generic filesystem watcher looks for edits    |
-| `NO_COLOR`        | unset                               | Standard honoring: disables ANSI colors if set          |
+Attributes emitted:
-The workspace fallback chain (used when `WORKSPACE_ROOT` isn't set) is:
-`~/IdeaProjects` → `~/src` → `~/code` → `~/Projects` → `~/dev` → `$HOME`.
+- `gen_ai.system` (anthropic | openai | google | cursor | …)
+- `gen_ai.operation.name` (chat | tool_use | context_compaction | …)
+- `gen_ai.request.model` / `gen_ai.response.model`
+- `gen_ai.usage.input_tokens` / `gen_ai.usage.output_tokens`
+- `gen_ai.tool.name` / `gen_ai.tool.call.id`
+- `error.type` on tool errors
+- `agentwatch.session.id` / `agentwatch.cost_usd`
+- `agentwatch.cache_read_tokens` / `agentwatch.cache_create_tokens` / `agentwatch.cache_hit_ratio`
+- `agentwatch.context.fill_pct`
+- `agentwatch.risk_score`
-Per-user config files and per-adapter toggles land in v0.5 alongside
-custom notification triggers and budget alarms.
+OTel deps are loaded dynamically only when the env var is set — zero
+runtime cost when disabled.
 ---
 ## How it compares
-|                                  | agentwatch                          | claude-devtools            | Unfucked                   | Langfuse / Phoenix               |
-| -------------------------------- | ----------------------------------- | -------------------------- | -------------------------- | -------------------------------- |
-| Runs locally only                | ✓                                   | ✓                          | ✓                          | self-host possible               |
-| Multi-agent                      | ✓ Claude + OpenClaw + Cursor        | Claude only                | agent-agnostic, file-only  | production LLM apps              |
-| Per-agent event attribution      | ✓                                   | ✓                          | ✗ (file-level only)        | N/A                              |
-| Permission surface view          | ✓ (Claude + Cursor + OpenClaw)      | ✗                          | ✗                          | ✗                                |
-| Cost with cache-hit accounting   | ✓                                   | ✓                          | ✗                          | ✓                                |
-| Subagent drilldown               | ✓                                   | ✓                          | ✗                          | ✓ (LangChain)                    |
-| Install                          | `npm i -g`                          | Homebrew / Electron         | Homebrew / Rust binary     | Docker + Postgres                |
-| UI                               | TUI (ink)                           | Electron + web standalone  | CLI only                   | Web                              |
-| Bundle size                      | ~50 KB source                       | ~150 MB Electron app       | ~10 MB Rust binary         | Postgres + app                   |
-| Telemetry                        | none                                | none                       | none                       | opt-in                           |
+|                                    | **agentwatch**                                 | claude-devtools       | Claudex               | ccflare            | Langfuse / Phoenix           |
+| ---------------------------------- | ---------------------------------------------- | --------------------- | --------------------- | ------------------ | ---------------------------- |
+| Runs locally only                  | ✅                                             | ✅                    | ✅                    | ✅                 | self-host possible           |
+| Multi-agent                        | ✅ Claude, Codex, Gemini, Cursor (config), OpenClaw | Claude only           | Claude only           | Claude only        | production LLM apps          |
+| Real token + cost with cache       | ✅                                             | ✅                    | 🟡                    | ✅ (proxy-level)   | ✅                           |
+| Per-turn token attribution         | ✅                                             | ✅                    | ❌                    | ❌                 | ❌                           |
+| Compaction visualizer              | ✅                                             | ✅                    | ❌                    | ❌                 | ❌                           |
+| **Anomaly detection**              | **✅ MAD + stuck-loop**                         | rule-based only       | ❌                    | ❌                 | ❌                           |
+| **Budget alarms w/ OS notification** | **✅**                                         | ❌                    | ❌                    | ❌                 | ❌                           |
+| **User triggers (regex/threshold)**  | **✅ live-reload**                              | ❌                    | ❌                    | ❌                 | ❌                           |
+| **OTel exporter (gen_ai.*)**         | **✅**                                         | ❌                    | ❌                    | ❌                 | ✅ (its own format)          |
+| MCP server (self-query)            | ✅                                             | ❌                    | ✅                    | ❌                 | ❌                           |
+| Permission surface view            | ✅ 5 agents                                     | ❌                    | ❌                    | ❌                 | ❌                           |
+| Subagent drilldown                 | ✅                                             | ✅                    | ❌                    | ❌                 | ✅ (LangChain-specific)      |
+| Install                            | `npm i -g`                                     | Homebrew / Electron   | `npm i -g`            | Bun repo           | Docker + Postgres            |
+| UI                                 | TUI (Ink)                                      | Electron + standalone | Web UI                | Web + TUI          | Web                          |
+| Telemetry                          | none                                           | none                  | none                  | none               | opt-in                       |
+Three moats are genuinely unique: **anomaly detection** (statistical, not
+rule-based), **budget alarms**, and **OTel with gen_ai.* conventions**.
 ---
 ## Limitations
-- **Backfill isn't unbounded.** agentwatch reads the last ~64 KB of each
-  session file on startup. Older events still live in your jsonl — open
-  them directly if you need deeper history.
-- **Cursor activity is config-level only.** Cursor stores its full AI
-  activity in a SQLite DB we don't parse yet. We capture config changes +
-  recently-viewed files as a live signal. Full activity parsing is planned.
-- **Gemini CLI and Codex aren't instrumented yet.** Detection works in
-  `agentwatch doctor`; events don't land in the timeline. Follows in v0.4
-  and v0.5.
-- **macOS and Linux only.** Windows isn't supported in v0. chokidar edge
-  cases and notification dispatch need more testing before we promise it.
-- **Subagent correlation is best-effort.** We match parent `Agent`
-  tool_use to its subagent jsonl via the `agentId` in the `tool_result`
-  payload. If Claude Code changes that convention upstream, drilldown
-  needs an adapter tweak.
-- **The timeline window is 40 rows + header.** Scrolling deep into older
-  events doesn't smoothly extend the window yet; use the projects → sessions
-  drilldown or `/` search to find older events.
+- **agentwatch is a viewer, not a daemon.** It captures events only while
+  the TUI is running. A background-capture daemon is planned.
+- **Backfill is bounded.** On launch we read the last ~4 MB of each
+  active session file (roughly hundreds of events). For long gaps on
+  very active sessions, earliest events may fall out of the backfill
+  window. Keep agentwatch open in a tmux pane for zero gaps.
+- **Cursor activity is config-level only.** Cursor's AI activity lives in
+  a SQLite database we don't parse yet. We capture config changes +
+  `.cursorrules` + MCP servers + `.cursor/rules/*.mdc`. Full activity
+  parsing is a follow-up.
+- **Gemini and OpenClaw have data-structure gaps.** Gemini CLI doesn't
+  persist compaction markers to disk. OpenClaw doesn't persist
+  tool_result content or compaction markers. Not fixable from our side.
+- **Windsurf, Aider, Cline** are detected but not instrumented yet.
+- **macOS and Linux only.** Windows needs more chokidar + notifier
+  testing before we promise it.
+- **tokenizer is cl100k_base (gpt-tokenizer)**, which is ~5% off for
+  Claude. Exact tokens for input / cache / output come from the model's
+  own usage record; the ~5% approximation only affects the user /
+  thinking / tool I/O / memory-file categories in the attribution view.
 ---
@@ -411,84 +521,44 @@ Hard scope boundaries so agentwatch stays small and maintainable.
 - **Not cloud. Not SaaS. Not ever.**
 - **Not an agent itself.** It watches agents; it doesn't take actions.
-- **Not production LLM-app tracing.** [Langfuse](https://langfuse.com)
-  owns that.
+- **Not production LLM-app tracing.** [Langfuse](https://langfuse.com) owns that.
 - **Not enterprise compliance.** Anthropic's Compliance API covers that.
-- **Not orchestration.** Use [Mission Control](https://github.com/MeisnerDan/mission-control)
-  or [Stoneforge](https://stoneforge.ai) for running agents in parallel.
+- **Not orchestration.** Use Mission Control / Stoneforge for running agents in parallel.
 - **Not memory.** Use [claude-mem](https://github.com/thedotmack/claude-mem).
-- **Not governance / policy enforcement.** Use [DashClaw](https://github.com/ucsandman/DashClaw)
-  or [Castra](https://github.com/amangsingh/castra).
----
-## Roadmap
-The full, ticket-level roadmap lives in the [Linear project](https://linear.app/auraqu/project/agentwatch-748d6aa1c20a).
-Headlines:
-### v0.4 — parity mop-up
-- 7-category token attribution (CLAUDE.md / skills / mentions / tool I/O / thinking / team / user)
-- Context compaction visualizer
-- Syntax highlighting in detail pane
-- Markdown + JSON session export
-- Stale-session detection (dim after 5 min idle)
-- UX polish: breadcrumbs, consistent `esc`, home key, first-run tour
-### v0.5 — moats
-- User-defined regex / threshold notification triggers
-- Budget alarms with hard spend caps
-- OpenTelemetry exporter with rich semantic conventions
-- Cross-session ripgrep-fast search
-- MCP server mode — agent self-query (`get_timeline`, `get_cost`,
-  `search_sessions`)
-- **Web UI companion** — `agentwatch serve` on localhost:3456, same
-  adapters, Preact-based renderer with inline diffs and stacked token bars
-### v1.0 — ambitious
-- Semantic session search ("the session where I broke the build")
-- Diff-attribution (file change → prompt that caused it)
-- Cross-agent session correlation (Claude → Cursor stitched)
-- Tauri desktop app (3-5 MB bundle, not Electron)
-- Anomaly detection (stuck loops, cost spikes, error bursts)
-Feature requests: [GitHub issues](https://github.com/mishanefedov/agentwatch/issues).
-Before opening one, please skim the roadmap — most ideas are already
-tracked.
+- **Not governance / policy enforcement.** Use DashClaw / Castra.
 ---
 ## Architecture
-TypeScript monorepo shape. Three-layer mental model:
+TypeScript monorepo. Three-layer mental model:
 ```
-┌─────────────────────────────────────────────────────┐
-│  TUI layer  (ink / React)                           │
-│    Timeline · EventDetail · Permissions · Projects  │
-│    Sessions · AgentPanel · HelpView                 │
-└───────────────▲─────────────────────────────────────┘
-                │  EventSink.emit / enrich
-┌───────────────┴─────────────────────────────────────┐
-│  Adapter layer  (per agent)                         │
-│    claude-code  · openclaw  · cursor  · fs-watcher  │
-└───────────────▲─────────────────────────────────────┘
-                │  files read-only
-┌───────────────┴─────────────────────────────────────┐
-│  OS  (log files, config files, clipboard, notifier) │
-└─────────────────────────────────────────────────────┘
+┌─────────────────────────────────────────────────────────────┐
+│  TUI layer  (ink / React)                                   │
+│    Timeline · EventDetail · Permissions · Projects          │
+│    Sessions · Tokens · Compaction · CrossSearch · Header    │
+│                                                             │
+│  MCP server  (stdio — programmatic, not a UI)               │
+│    list_recent_sessions · get_session_events                │
+│    search_sessions · get_tool_usage_stats · get_session_cost │
+└─────────────────────────▲───────────────────────────────────┘
+                          │  EventSink.emit / enrich
+┌─────────────────────────┴───────────────────────────────────┐
+│  Adapter layer  (one per agent)                             │
+│    claude-code · codex · gemini · cursor · openclaw         │
+│    fs-watcher (generic)                                     │
+└─────────────────────────▲───────────────────────────────────┘
+                          │  files read-only
+┌─────────────────────────┴───────────────────────────────────┐
+│  OS  (log files, config files, clipboard, notifier)         │
+└─────────────────────────────────────────────────────────────┘
 ```
-- Adapters read files, translate raw log lines into a canonical
-  `AgentEvent`, emit through an `EventSink`.
-- The sink's `enrich(id, patch)` lets an adapter update a previously-
-  emitted event (e.g. when a Claude `tool_result` arrives late and needs
-  to attach duration + output to the original `tool_use`).
-- The TUI is a pure reducer over the event buffer. Filtering, search,
-  scope are all derived views — no mutation.
+- Adapters read files, translate raw log lines into canonical `AgentEvent`s, emit through an `EventSink`.
+- `EventSink.enrich(id, patch)` lets an adapter update a previously-emitted event (e.g. when a tool_result arrives late and needs to attach duration + output to the original tool_use).
+- The TUI is a pure reducer over the event buffer. Filtering, search, scope are derived views — no mutation.
+- The MCP server is a peer of the TUI: it reads the same session files on demand, via its own scan (no shared in-memory state with the TUI). This is a known duplication; see Linear for the refactor ticket.
 See `src/schema.ts` for the canonical event shape.
@@ -501,31 +571,18 @@ git clone https://github.com/mishanefedov/agentwatch.git
 cd agentwatch
 npm install
 npm run dev           # launch the TUI directly from source (tsx)
-npm test              # vitest, 14 tests
+npm test              # vitest — 97 tests
 npm run typecheck     # strict TypeScript
 npm run build         # tsup → dist/
 ```
-See [CONTRIBUTING.md](./CONTRIBUTING.md) for the contribution workflow,
-issue triage policy, and what's in scope for PRs.
-## Docs
+See [CONTRIBUTING.md](./CONTRIBUTING.md) for the contribution workflow.
-In-depth documentation lives in `docs/`:
+### Docs
-- **[`docs/features/`](./docs/features/)** — one spec per feature:
-  what it does, how to invoke, inputs, outputs, failure modes,
-  interactions. 12 files covering timeline, detail pane, subagent
-  drilldown, projects + sessions navigation, search, permissions,
-  cost accounting, notifications, clipboard yank, filesystem watcher,
-  agent detection.
-- **[`docs/testing/`](./docs/testing/)** — manual test procedures per
-  feature, plus a 15-minute pre-release walkthrough
-  ([`TEST-SCRIPT.md`](./docs/testing/TEST-SCRIPT.md)).
-- **[`docs/use-cases/`](./docs/use-cases/)** — real-world scenarios:
-  multi-agent triage on a monorepo, cost-overrun investigation,
-  security audit, stuck-loop detection, subagent post-mortem, .env
-  leak alert.
+- **[`docs/features/`](./docs/features/)** — feature specs (scope, inputs, outputs, failure modes). Being extended feature-by-feature.
+- **[`docs/testing/`](./docs/testing/)** — manual test procedures + a pre-release walkthrough.
+- **[`docs/use-cases/`](./docs/use-cases/)** — multi-agent triage, cost-overrun investigation, security audit, stuck-loop detection, subagent post-mortem, .env leak alert.
 ---
@@ -533,9 +590,9 @@ In-depth documentation lives in `docs/`:
 Local-first is a hard invariant.
-- **Zero network calls.** Verify with `lsof -i -p $(pgrep -n agentwatch)`.
+- **Zero network calls** unless you explicitly set `AGENTWATCH_OTLP_ENDPOINT` (to a host *you* chose, OTel output only).
 - **Zero telemetry.** Not opt-in, not opt-out — simply not there.
-- **All files read-only** except clipboard (on `y`) and terminal output.
+- **All files read-only** except the clipboard (on `y`) and `./agentwatch-export/` (on `e`).
 - Every path agentwatch reads is documented in [SECURITY.md](./SECURITY.md).
 Report vulnerabilities privately: `misha@auraqu.com` or via a
@@ -551,7 +608,6 @@ MIT © Misha Nefedov. See [LICENSE](./LICENSE).
 <div align="center">
-If agentwatch saves you a debugging hour, a ⭐ on the repo makes the effort
-worth it.
+If agentwatch saves you a debugging hour, a ⭐ on the repo makes the effort worth it.
 </div>