PyPI - tokenjam - Versions diffs - 0.3.3__tar.gz → 0.3.5__tar.gz - Mend

tokenjam 0.3.3 tar.gz → 0.3.5 tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (274)

{tokenjam-0.3.3 → tokenjam-0.3.5}/CHANGELOG.md RENAMED

@@ -54,6 +54,11 @@ This release pivots TokenJam toward cost-optimization for autonomous agents. Fou
 - **`tj status` surfaces unknown plan tiers.** When sessions exist with `plan_tier = 'unknown'`, prints a one-line note pointing the user at `tj onboard --reconfigure`. Exit code unchanged.
 - **`tj optimize` plan-tier-aware rendering.** When every session in the window has `plan_tier = 'unknown'`, dollar figures are suppressed and a header note explains why. Mixed / partial-unknown windows render normally with an advisory note.
 - **MCP `get_optimize_report` tool.** Now accepts `findings: list[str]` (was `only: str`). Docstring surfaces for both API-billing and subscription-plan-efficiency phrasings.
+- **`tj tokenmaxx` tier ladder expanded to six tiers and renamed.** Two highest tiers renamed from `TokenChad` / `TokenGigaChad` to `TokenSuperMaxxer` / `TokenGigaMaxxer`, and a new `TokenMegaMaxxer` tier slots between them covering the 20× – 50× multiplier range. The previous top tier started at 20×; the new top tier starts at 50×, so the absolute headline for very-heavy users is more meaningful. Fire-emoji escalation matches the new tier count: 🔥 → 🔥🔥 → 🔥🔥🔥. The quip that previously belonged to `TokenGigaChad` ("Touch grass. Then run `tj optimize`.") now belongs to `TokenMegaMaxxer`; `TokenGigaMaxxer` gets its own escalated quip. JSON output's `tier` field carries the new label string verbatim; consumers reading the `tier` value must update accordingly.
+### Fixed
+- **Cache-only spans were costed at $0.** A prompt-cache hit (0 new input/output tokens but non-zero cache-read tokens) bills the cache-read rate, but both `calculate_cost` and `CostEngine.process_span` short-circuited on input/output alone, dropping the span as a no-op and under-reporting spend. The early-return guards now fire only when *all* token counts are zero, so cache-read (and cache-write) costs are charged correctly.
+- **Cache-write (cache-creation) tokens were dropped on the live ingest path.** The SDK integrations emit `gen_ai.usage.cache_creation_tokens`, the pricing table carries a `cache_write_per_mtok` rate, and `calculate_cost` already priced it — but the OTLP span parser and provider reader only read cache-read tokens, so cache-creation tokens never reached `CostEngine.process_span` and their (higher-rate) cost was never charged. `NormalizedSpan` now carries `cache_write_tokens`, both parsers populate it, and `process_span` charges it. Only the backfill path costed cache-write before; the live path now matches.
 ### Internal
 - **Registry-driven optimize analyzers.** `tokenjam/core/optimize.py` split into `tokenjam/core/optimize/` package with `registry.py`, `runner.py`, `types.py`, and `analyzers/` subpackage using `pkgutil` auto-discovery. New analyzers drop a file under `analyzers/` with a `@register("name")` decorator — nothing else needs editing. See `tokenjam/core/optimize/README.md`.
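
Note on the two cache fixes in the "Fixed" entries above: the following is a minimal sketch of the guard change they describe, not the package's actual code. `SketchRates`, `sketch_cost`, `pre_fix_cost`, and the rate values are hypothetical names and numbers chosen for illustration. Only the `cache_write_per_mtok` field name, the "all four token counts" guard, and the rounding to 8 decimal places come from the diff text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchRates:
    """Hypothetical USD rates per million tokens (values are made up)."""
    input_per_mtok: float
    output_per_mtok: float
    cache_read_per_mtok: float
    cache_write_per_mtok: float  # field name taken from the changelog entry


def sketch_cost(rates, input_tokens=0, output_tokens=0,
                cache_read_tokens=0, cache_write_tokens=0):
    """Post-fix behaviour: return early only when every count is zero."""
    counts = (input_tokens, output_tokens, cache_read_tokens, cache_write_tokens)
    if not any(counts):
        return 0.0
    total = (
        input_tokens * rates.input_per_mtok
        + output_tokens * rates.output_per_mtok
        + cache_read_tokens * rates.cache_read_per_mtok
        + cache_write_tokens * rates.cache_write_per_mtok
    ) / 1_000_000
    return round(total, 8)


def pre_fix_cost(rates, input_tokens=0, output_tokens=0,
                 cache_read_tokens=0, cache_write_tokens=0):
    """Reconstruction of the described bug: guard looks at input/output only."""
    if input_tokens == 0 and output_tokens == 0:
        return 0.0  # a cache-only span is dropped here as a no-op
    return sketch_cost(rates, input_tokens, output_tokens,
                       cache_read_tokens, cache_write_tokens)


# Illustrative rates only; real rates live in the package's pricing table.
RATES = SketchRates(input_per_mtok=3.0, output_per_mtok=15.0,
                    cache_read_per_mtok=0.30, cache_write_per_mtok=3.75)

if __name__ == "__main__":
    # A prompt-cache hit: no new input/output tokens, only cache reads.
    print("pre-fix :", pre_fix_cost(RATES, cache_read_tokens=200_000))
    print("post-fix:", sketch_cost(RATES, cache_read_tokens=200_000))
```

Under these assumed rates the pre-fix guard reports zero for a cache-only span, while the post-fix guard charges the cache-read rate. The same applies to a span that carries only cache-write tokens.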

{tokenjam-0.3.3 → tokenjam-0.3.5}/CLAUDE.md RENAMED

@@ -4,7 +4,7 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
 ## Project Overview
-`tj` (TokenJam) is a local-first, OTel-native observability CLI for AI agents. No cloud backend, no signup. It captures telemetry from agent runtimes, stores it in a local DuckDB database, and exposes a CLI + local REST API for querying. Install via `pip install tokenjam`, run via `tj <subcommand>`. Requires Python >=3.10.
+`tj` (TokenJam) is a local-first, OTel-native **cost-optimization layer** for AI agents (with a full observability stack underneath). No cloud backend, no signup. It captures telemetry from agent runtimes, stores it in a local DuckDB database, and runs four named analyzers (`downsize` / `cache` / `script` / `trim`) that surface cost-saving candidates from real usage — plus a CLI, local REST API, web UI, and MCP server for querying. Install via `pipx install tokenjam` (recommended — sidesteps PEP 668 on Homebrew Python and Debian 12+/Ubuntu 24+) or `pip install tokenjam` in a venv. Run via `tj <subcommand>`. Requires Python >=3.10.
 ## Build & Development
@@ -61,10 +61,17 @@ Post-ingest hooks run synchronously after each span is written to DB:
 - **`tokenjam/core/db.py`**: `StorageBackend` protocol + `DuckDBBackend` + `InMemoryBackend` (for tests) + migration runner. Migrations are `(version, sql)` tuples in a `MIGRATIONS` list — never modify existing ones, only append. **Note:** `StorageBackend` doesn't cover every query. Some callers (e.g. `CostEngine`, `cmd_status`) access `db.conn` directly for queries not in the protocol (cost updates, active session lookups). Helper `_row_to_session()` is used to convert raw DuckDB rows.
 - **`tokenjam/core/ingest.py`**: `IngestPipeline` (central hub), `SpanSanitizer` (rejects oversized/malformed spans), `strip_captured_content()`. Post-ingest hooks (cost, alerts, schema) are optional and error-tolerant — hook failures are logged, never propagated.
 - **`tokenjam/core/pricing.py`**: `ModelRates` (frozen dataclass), `load_pricing_table()` (LRU-cached), `get_rates(provider, model)`. Falls back to default rates for unknown models.
-- **`tokenjam/core/cost.py`**: `calculate_cost()` (pure function, rounds to 8dp) + `CostEngine` (post-ingest hook that updates `spans.cost_usd` and `sessions.total_cost_usd` via `db.conn` — see db.py note). Pricing loaded from `pricing/models.toml`.
+- **`tokenjam/core/cost.py`**: `calculate_cost()` (pure function, rounds to 8dp) + `CostEngine` (post-ingest hook that updates `spans.cost_usd` and `sessions.total_cost_usd` via `db.conn` — see db.py note). Pricing loaded from `tokenjam/pricing/models.toml`. **Cache-read vs cache-write are separate fields** on `NormalizedSpan` (`cache_tokens` = read, `cache_write_tokens` = create); they bill at different rates and `calculate_cost` charges each at its own rate. The early-return no-op guard checks all four token counts (input/output/cache_read/cache_write) — see PR #90 and PR #92 for the cache-only-span and cache-write-on-live-path fixes.
 - **`tokenjam/core/alerts.py`**: `AlertEngine` with 13 alert types, `CooldownTracker` (in-memory, per agent+type, resets on restart), `AlertDispatcher` routing to 6 channel types (stdout, file, ntfy, webhook, Discord, Telegram). `AlertEngine.fire()` is the external entry point for other modules (SchemaValidator, DriftDetector) to fire alerts. Suppressed alerts are still persisted to DB but not dispatched to channels. Hardcoded thresholds: retry loop fires at 4+ identical tool calls in last 6 spans; failure rate fires at >20% errors in last 20 spans (checked every 5th error); session duration default 3600s. Stdout and file channels always include full detail regardless of `include_captured_content` config.
 - **`tokenjam/core/drift.py`**: `DriftDetector` — Z-score based behavioral drift detection, fires at session end.
-- **`tokenjam/core/optimize/`**: Package powering `tj optimize` and the `get_optimize_report` MCP tool. Public API re-exported from `__init__.py`: `build_report()` (orchestrator), `report_to_dict()`, `ANALYZER_REGISTRY`, `ANALYZER_ORDER`, plus result dataclasses. Architecture: `registry.py` holds the `@register("name")` decorator and `ANALYZER_REGISTRY` dict; `runner.py` defines `ANALYZER_ORDER` and orchestrates execution; `types.py` holds `AnalyzerContext` + result dataclasses + `MODEL_DOWNGRADE_CAVEAT`. Individual analyzers live in `analyzers/`, each as a single file registering via `@register`: `model_downgrade.py` (structural candidates — input < 5K tokens AND output < 500 tokens AND tool_calls ≤ 5; never claims quality equivalence, caveat baked into dataclass default), `budget_projection.py` (per-provider cycle spend vs `[budget.<provider>]` ceiling; only fires when budget > 0), `cache_efficacy.py`, `cache_recommend.py`, `prompt_bloat.py`, `workflow_restructure.py`. Analyzers receive an `AnalyzerContext` and operate on `db.conn` directly. To add a new analyzer: drop a file under `analyzers/`, decorate with `@register("name")`, append to `ANALYZER_ORDER` if ordering matters — `cmd_optimize --finding` choices auto-derive from the registry.
+- **`tokenjam/core/optimize/`**: Package powering `tj optimize` and the `get_optimize_report` MCP tool. Public API re-exported from `__init__.py`: `build_report()` (orchestrator), `report_to_dict()`, `ANALYZER_REGISTRY`, `ANALYZER_ORDER`, plus result dataclasses. Architecture: `registry.py` holds the `@register("name")` decorator and `ANALYZER_REGISTRY` dict; `runner.py` defines `ANALYZER_ORDER` and orchestrates execution; `types.py` holds `AnalyzerContext` + result dataclasses + `MODEL_DOWNGRADE_CAVEAT`. Individual analyzers live in `analyzers/`, each as a single file registering via `@register`. **Registry strings (the user-facing names) and file names are decoupled**:
+  - `model_downgrade.py` → `@register("downsize")` — structural candidates (input < 5K tokens AND output < 500 tokens AND tool_calls ≤ 5; never claims quality equivalence, caveat baked into dataclass default)
+  - `budget_projection.py` → `@register("budget-projection")` — per-provider cycle spend vs `[budget.<provider>]` ceiling; only fires when budget > 0
+  - `cache_efficacy.py` → `@register("cache")` — current cache-read efficacy per (provider, model)
+  - `cache_recommend.py` → `@register("cache-recommend")` — Anthropic-only structural prefix detection for `cache_control` placement
+  - `workflow_restructure.py` → `@register("script")` — `(tool_name, arg_shape)` cluster detection for deterministic-script candidates
+  - `prompt_bloat.py` → `@register("trim")` — LLMLingua-2 token-significance classification (requires `tokenjam[bloat]` extra)
+  Analyzers receive an `AnalyzerContext` and operate on `db.conn` directly. To add a new analyzer: drop a file under `analyzers/`, decorate with `@register("name")`, append to `ANALYZER_ORDER` if ordering matters — `cmd_optimize`'s positional `findings` Click choices auto-derive from the registry.
 - **`tokenjam/core/ingest_adapters/`**: Third-party trace-export adapters that normalize external payloads (`langfuse.py`, `helicone.py`, `otlp.py`) into `NormalizedSpan` for ingest. Each is reachable as a `tj backfill <name>` subcommand and accepts `--source-url` (live API) or `--source-file` (offline JSON dump). Adapters write deterministic span IDs derived from the source's identifiers so re-runs are idempotent. `otlp.py` shares span-mapping logic with the live `POST /api/v1/spans` route via `tokenjam/otel/otlp_parsing.py`.
 - **`tokenjam/core/export/`**: Routing-config snippet generators for `tj optimize --export-config`. Currently `claude_code.py` emits a JSONC fragment under a `tokenjam.routing_recommendations` namespace with honest-framing caveat comments baked in. Writes to `~/.config/tokenjam/exports/`; never touches `~/.claude/settings.json` or other external configs (no `--apply` flag — Claude Code doesn't currently honor TokenJam routing keys, so auto-writing would change nothing and erode trust).
 - **`tokenjam/core/backfill.py`**: Parses Claude Code on-disk session JSONL files into `NormalizedSpan`s. Cost is recomputed from `pricing/models.toml` because the on-disk format has no `cost_usd`. The parser tolerates the dated `claude-<family>-<ver>-YYYYMMDD` model-name suffixes Anthropic ships (handled by `core/pricing.py.get_rates()`, which strips the trailing 8-digit date suffix when no exact pricing match exists). Idempotency relies on deterministic span IDs derived from `(session_id, message uuid)` / `(session_id, tool_use id)`.
@@ -92,11 +99,12 @@ Post-ingest hooks run synchronously after each span is written to DB:
 - **`tj demo [scenario]`** (`cmd_demo.py`) — runs Agent Incident Library scenarios (zero-config, no API keys). `tj demo` lists all; `tj demo retry-loop` runs one.
 - **`tj doctor`** (`cmd_doctor.py`) — health checks (config, DB, secrets, webhooks, drift readiness, schema-vs-capture consistency). Exit 0 = ok, 1 = warnings, 2 = errors.
-- **`tj optimize`** (`cmd_optimize.py`) — six analyzers, registry-driven: `model-downgrade`, `budget-projection`, `cache-efficacy`, `cache-recommend`, `workflow-restructure`, `prompt-bloat`. Flags: `--since 30d`, `--finding <name>` (repeatable; choices auto-derive from `ANALYZER_REGISTRY` at click decoration time), `--budget <provider>`, `--budget-usd <amount>`, `--compare <period>` (window-cost diff vs prior period; accepts `previous` / `last-week` / `last-month` / `last-7d` / `last-30d` / `YYYY-MM-DD:YYYY-MM-DD`), `--export-config <target>` (writes a routing snippet — currently `claude-code` — under `~/.config/tokenjam/exports/`; no `--apply` flag by design). Plan-tier-aware rendering: subscription users see "implied API value" framing and token-share savings (never dollar "spend"); local users see token-only framing; unknown-plan users see dollar figures suppressed with a `tj onboard --reconfigure` hint. Opens the live DB read-only so it works alongside a running `tj serve`.
+- **`tj optimize`** (`cmd_optimize.py`) — six analyzers, registry-driven. **Analyzers are positional args** (not `--finding <name>`): `tj optimize downsize cache trim` runs three; bare `tj optimize` runs all. Registered names: `downsize`, `cache`, `cache-recommend`, `script`, `trim`, `budget-projection`. Flags: `--since 30d`, `--budget <provider>`, `--budget-usd <amount>`, `--compare <period>` (window-cost diff vs prior period; accepts `previous` / `last-week` / `last-month` / `last-7d` / `last-30d` / `YYYY-MM-DD:YYYY-MM-DD`), `--export-config <target>` (writes a routing snippet — currently `claude-code` — under `~/.config/tokenjam/exports/`; no `--apply` flag by design). Plan-tier-aware rendering: subscription users see "implied API value" framing and token-share savings (never dollar "spend"); local users see token-only framing; unknown-plan users see dollar figures suppressed with a `tj onboard --reconfigure` hint. Works alongside a running `tj serve` via the `/api/v1/optimize` HTTP fallback when the DuckDB write lock is held by the daemon.
+- **`tj tokenmaxx`** (`cmd_tokenmaxx.py`) — shareable spend-tier command. Reads last 30 days of usage, classifies into a 6-tier ladder (Sipper / Moderator / Maxxer / SuperMaxxer / MegaMaxxer / GigaMaxxer) using the multiplier vs the user's declared subscription plan as the primary classifier, with absolute USD/mo thresholds as the API-user fallback. Output is a bordered Panel designed for screenshotting. Plan-aware: shows the multiplier line only when the user has `[budget.<provider>] plan = "max_5x"` (or pro / max_20x / plus) configured. The companion landing page is `tokenjam.dev/tokenmaxxing`. Designed to never exit without an actionable next step — pairs the tier callout with the downsize savings figure inline.
 - **`tj cost`** (`cmd_cost.py`) — cost breakdown by `--group-by agent|model|day|tool`. Same `--compare <period>` flag as `tj optimize` for window-over-window diffs (▲/▼ indicators, per-agent and per-model top-shifts, dollar + token deltas).
 - **`tj backfill <source>`** (`cmd_backfill.py`) — ingest historical telemetry from external sources. Subcommands: `claude-code` (parses `~/.claude/projects/*.jsonl`, auto-invoked at the end of `tj onboard --claude-code`), `langfuse` (live API or JSON dump), `helicone` (live API or JSON dump), `otlp` (raw OTLP JSON via URL or file — reuses the same parser as the live `POST /api/v1/spans` route). All idempotent via deterministic span IDs.
 - **`tj onboard`** (`cmd_onboard.py`) — `--claude-code` and `--codex` flags trigger integration-specific flows. Prompts for plan tier (api / pro / max_5x / max_20x for Anthropic; api / plus / team / enterprise for OpenAI) and writes it to `[budget.<provider>] plan = "..."`. Supports `--reconfigure` to re-prompt against an existing config, and `--plan <tier>` for non-interactive use. Does NOT auto-write a default `usd = 200` cycle ceiling — subscription users get only the `plan` field; API users are explicitly asked whether they want a self-imposed ceiling.
-- **`tj report`** (`cmd_report.py`) — generates standalone HTML visualizations of analyzer findings (e.g. `tj report --bloat [<agent_id>]` renders the prompt-bloat analyzer's per-token significance). Writes to `~/.cache/tokenjam/reports/` (override via `TOKENJAM_REPORT_DIR`) and opens in the default browser.
+- **`tj report`** (`cmd_report.py`) — generates standalone HTML visualizations of analyzer findings. Currently `tj report --trim [<agent_id>]` renders the Trim analyzer's per-token significance (was `--bloat` pre-0.3.1, renamed alongside the analyzer's registry string). Writes to `~/.cache/tokenjam/reports/` (override via `TOKENJAM_REPORT_DIR`) and opens in the default browser.
 - **`tj policy list`** (`cmd_policy.py`) — read-only preview of the unified policy surface. Consolidates existing `[alerts]`, `[alerts.channels]`, `[defaults.budget]`, `[budget.<provider>]`, per-agent `budget`/`drift`/`sensitive_actions`/`output_schema`, and `[capture]` config into one table; each row carries its source TOML section. Supports `--json`. `tj policy add | edit | apply | remove | test` are intentionally absent this sprint — the unified config migration is next sprint's work. `policy` is in `no_db_commands` in `cli/main.py` so it doesn't open the DB. Rich source-section strings (`[budget.anthropic]`, `[[alerts.channels]]`) must be passed through `rich.markup.escape()` before rendering — otherwise Rich consumes them as style tags.
 All commands support `--json` for machine-readable output. Commands that query alerts use exit code 1 if active (unacknowledged, unsuppressed) alerts exist.
@@ -139,11 +147,13 @@ When a span has a `conversation_id` matching an existing session, it's attribute
 10. **Use semconv constants** — reference `GenAIAttributes` and `TjAttributes` from `tokenjam/otel/semconv.py` instead of hardcoding OTel attribute name strings.
 11. **OTel TracerProvider is global and set-once** — `trace.set_tracer_provider()` only works once per process. In tests, set the provider once at module level (not per-test in a fixture) and clear spans between tests. Use a custom `_CollectingExporter(SpanExporter)` since `InMemorySpanExporter` is not available in the installed OTel version. See `tests/agents/test_mock_scenarios.py` for the SDK test pattern and `tests/integration/test_full_pipeline.py` for the pipeline pattern.
 12. **New SDK integrations must call `ensure_initialised()`** — every `patch_*()` convenience function must call `from tokenjam.sdk.bootstrap import ensure_initialised; ensure_initialised()` before installing hooks. This lazily bootstraps the TracerProvider + IngestPipeline on first use.
-13. **PyPI package name is `tokenjam`, not `ocw`** — `pip install tokenjam` is the correct install command. The CLI command is `tj` and the Python package directory is `tokenjam/`. The published package name on PyPI is `tokenjam`. Never write `pip install ocw` in docs, examples, or comments.
-14. **`tj optimize` output must never claim quality equivalence** — the model-downgrade finding flags structural candidates only. Every user-visible string says "looks like" / "candidate" / "review before switching" — never "safe to downgrade" or "would have worked." The `MODEL_DOWNGRADE_CAVEAT` constant lives on `DowngradeFinding` as a dataclass default so it can't be removed by accident; it must also appear in human-readable CLI output. The same honesty discipline applies to all other analyzers — `cache-efficacy` ("you're getting X% of available caching"), `cache-recommend` (Anthropic-only, structural prefix detection), `workflow-restructure` ("structural shape matches", "review before replacing with a script"), `prompt-bloat` ("predicted low-significance regions; review before editing"). `tj optimize --export-config` snippets bake the caveat block into the JSONC output as comments.
+13. **PyPI package name is `tokenjam`, not `ocw`** — the package on PyPI is `tokenjam`. The CLI command is `tj`. The Python package directory is `tokenjam/`. **Recommended install: `pipx install tokenjam`** (sidesteps PEP 668 on Homebrew Python and Debian 12+/Ubuntu 24+). `pip install tokenjam` works inside a clean venv but fails on system Python with a misleading externally-managed-environment error. Never write `pip install ocw` in docs, examples, or comments.
+14. **`tj optimize` output must never claim quality equivalence** — the `downsize` finding flags structural candidates only. Every user-visible string says "looks like" / "candidate" / "review before switching" — never "safe to downgrade" or "would have worked." The `MODEL_DOWNGRADE_CAVEAT` constant lives on `DowngradeFinding` as a dataclass default so it can't be removed by accident; it must also appear in human-readable CLI output. The same honesty discipline applies to all other analyzers — `cache` ("you're getting X% of available caching"), `cache-recommend` (Anthropic-only, structural prefix detection), `script` ("structural shape matches", "review before replacing with a script"), `trim` ("predicted low-significance regions; review before editing"). `tj optimize --export-config` snippets bake the caveat block into the JSONC output as comments.
 15. **Version bump on release** — both `pyproject.toml` (`version = "X.Y.Z"`) and `sdk-ts/package.json` (`"version": "X.Y.Z"`) must be bumped to the new version before creating a GitHub release. The publish workflows (`publish-pypi.yml`, `publish-npm.yml`) trigger on `release published` events and will fail with 403 if the version already exists on PyPI/npm.
-16. **New optimize analyzers self-register** — drop a `.py` file under `tokenjam/core/optimize/analyzers/` with a function decorated `@register("name")` taking `AnalyzerContext`. Auto-discovery in `analyzers/__init__.py` walks the directory at import time. `cmd_optimize.py`'s `--finding` choices read from `ANALYZER_REGISTRY.keys()` at click decoration — no edits needed there. If your analyzer depends on (or is depended on by) another, append it to `ANALYZER_ORDER` in `runner.py` at the right position. Wave-2 analyzers attach their findings to `OptimizeReport.findings[name]` (generic dict); the older `model-downgrade` / `budget-projection` analyzers retain typed slots on `OptimizeReport` for backwards compat with `cmd_optimize` and the MCP server.
+16. **New optimize analyzers self-register** — drop a `.py` file under `tokenjam/core/optimize/analyzers/` with a function decorated `@register("name")` taking `AnalyzerContext`. Auto-discovery in `analyzers/__init__.py` walks the directory at import time. `cmd_optimize.py`'s positional `findings` Click choices read from `ANALYZER_REGISTRY.keys()` at decoration — no edits needed there. If your analyzer depends on (or is depended on by) another, append it to `ANALYZER_ORDER` in `runner.py` at the right position. Wave-2 analyzers attach their findings to `OptimizeReport.findings[name]` (generic dict); the older `downsize` (registered name; file is `model_downgrade.py`) and `budget-projection` analyzers retain typed slots on `OptimizeReport` for backwards compat with `cmd_optimize` and the MCP server.
 17. **OTLP parsing has one home** — `tokenjam/otel/otlp_parsing.py`. Both the live `POST /api/v1/spans` route and the `tj backfill otlp` adapter import `parse_otlp_span` and `extract_resource_attrs` from there. If you need to extend OTLP attribute extraction, do it once in that module; do not copy-paste into either caller.
+18. **Web UI must work fully offline** — `tokenjam/ui/index.html` is the served dashboard. It is intentionally a single-file SPA with **zero external HTTP loads at render time**. Preact + hooks + htm are vendored under `tokenjam/ui/vendor/` and wired via a `<script type="importmap">`; fonts use system-font fallbacks (no Google Fonts); the favicon is inlined as a `data:` URL. The FastAPI app mounts `/ui/vendor` as `StaticFiles`. The `tests/unit/test_ui_offline.py` regression test asserts no render-time external URLs exist anywhere outside `<a href>` (clickable links to github.com are fine — they only fetch on click). If you add a CDN font, script, or stylesheet, that test will fail. Vendor the asset locally instead. See issue #87 + PR #88.
+19. **Analyzer registry names ≠ file names** — registry strings (`downsize`, `cache`, `script`, `trim`) are decoupled from Python module filenames (`model_downgrade.py`, `cache_efficacy.py`, `workflow_restructure.py`, `prompt_bloat.py`). The 0.3.1 rename only changed `@register("...")` strings; file names stayed for git-blame continuity. When grepping for an analyzer, search both the registry string AND the older file-name keyword.
 ## Config
@@ -233,10 +243,11 @@ Key runtime dependency: `pytz` is required by DuckDB for `TIMESTAMPTZ` column ha
 - **[docs/installation.md](docs/installation.md)** — base install vs optional extras matrix. Documents `tokenjam[bloat]` (the ~2GB torch + transformers extra used by the Trim analyzer), framework adapter extras (`[langchain]` / `[crewai]` / `[autogen]` / `[litellm]`), and the MCP / dev extras.
 - **[docs/configuration.md](docs/configuration.md)** — full TOML config surface plus the "Content capture and privacy" section explaining the four `[capture]` toggles and how they interact with `alerts.include_captured_content`.
 - **Optimize product pages** — one per user-facing product, all under `docs/optimize/`:
-  - [`downsize.md`](docs/optimize/downsize.md) — model-downgrade candidate flagging (internal: `model-downgrade`)
-  - [`cache.md`](docs/optimize/cache.md) — `cache-efficacy` (current caching ratio) + `cache-recommend` (Anthropic-only breakpoint suggestions)
-  - [`script.md`](docs/optimize/script.md) — `workflow-restructure` clustering by `(tool_name, arg_shape)` signature
-  - [`trim.md`](docs/optimize/trim.md) — LLMLingua-2 token-significance classifier (`prompt-bloat`), install + capture requirements, performance numbers
+  - [`downsize.md`](docs/optimize/downsize.md) — cheaper-model candidate flagging (registry: `downsize`, file: `model_downgrade.py`)
+  - [`cache.md`](docs/optimize/cache.md) — `cache` (current caching ratio) + `cache-recommend` (Anthropic-only breakpoint suggestions)
+  - [`script.md`](docs/optimize/script.md) — `script` clustering by `(tool_name, arg_shape)` signature (file: `workflow_restructure.py`)
+  - [`trim.md`](docs/optimize/trim.md) — LLMLingua-2 token-significance classifier (`trim`, file: `prompt_bloat.py`), install + capture requirements, performance numbers
+- **[AGENTS.md](AGENTS.md)** — codebase conventions for contributors (referenced from the top-level README).
 - **Backfill adapters** — `docs/backfill/overview.md` lists the four sources (`claude-code` / `langfuse` / `helicone` / `otlp`) with the partnership-posture framing; per-adapter pages document modes (URL / file), field mapping, idempotency, and v1 limitations.
 - **[docs/policy/overview.md](docs/policy/overview.md)** — read-only preview of the unified policy surface (`tj policy list`). Notes that the `add` / `edit` / `apply` subcommands and the underlying `[policy]` config migration land next sprint.
 - **Internal specs** — `docs/internal/specs/` is reserved for canonical specs that production code references long-term. Currently empty (sprint specs have been cleaned up after merge); add new ones here when a feature needs a stable, code-referenced source of truth.
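
Note on the analyzer registry described in the CLAUDE.md hunks above: the sketch below shows one way a `@register("name")` decorator can decouple the user-facing registry string from the Python function or file name, using the `downsize` thresholds quoted in the diff. It is a simplified illustration, not the package's implementation. The real analyzers are described as taking an `AnalyzerContext` and querying `db.conn`, whereas `SessionStats` and `run_analyzers` here are hypothetical stand-ins, and "5K tokens" is assumed to mean 5,000.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Registry keyed by the user-facing name, independent of the function name.
ANALYZER_REGISTRY: Dict[str, Callable] = {}


def register(name: str):
    """Decorator that files an analyzer under its user-facing registry string."""
    def decorator(fn: Callable) -> Callable:
        if name in ANALYZER_REGISTRY:
            raise ValueError(f"analyzer {name!r} is already registered")
        ANALYZER_REGISTRY[name] = fn
        return fn
    return decorator


@dataclass
class SessionStats:
    """Hypothetical per-session summary used only for this sketch."""
    input_tokens: int
    output_tokens: int
    tool_calls: int


@register("downsize")  # registry string differs from the function name
def model_downgrade(sessions: List[SessionStats]) -> List[SessionStats]:
    """Flag structural candidates only; says nothing about output quality."""
    return [
        s for s in sessions
        if s.input_tokens < 5_000 and s.output_tokens < 500 and s.tool_calls <= 5
    ]


def run_analyzers(sessions: List[SessionStats],
                  findings: Optional[List[str]] = None) -> Dict[str, list]:
    """Run the named analyzers, or every registered one when none are named."""
    names = findings or list(ANALYZER_REGISTRY)
    return {name: ANALYZER_REGISTRY[name](sessions) for name in names}


SAMPLE = [
    SessionStats(input_tokens=1_200, output_tokens=150, tool_calls=2),
    SessionStats(input_tokens=48_000, output_tokens=3_000, tool_calls=14),
]

if __name__ == "__main__":
    report = run_analyzers(SAMPLE, ["downsize"])
    print(len(report["downsize"]), "candidate session(s) to review")
```

Because lookups go through the registry key, grepping for `downsize` alone would miss the `model_downgrade` function, which is the situation gotcha 19 in the diff warns about.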

{tokenjam-0.3.3 → tokenjam-0.3.5}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: tokenjam
-Version: 0.3.3
+Version: 0.3.5
 Summary: TokenJam — local-first OTel-native observability for Autonomous AI agents
 Project-URL: Homepage, https://opencla.watch
 Project-URL: Repository, https://github.com/Metabuilder-Labs/openclawwatch
@@ -23,6 +23,7 @@ Requires-Dist: apscheduler>=3.10
 Requires-Dist: click>=8.1
 Requires-Dist: duckdb>=0.10
 Requires-Dist: fastapi>=0.110
+Requires-Dist: fastmcp>=0.2
 Requires-Dist: genson>=1.2
 Requires-Dist: httpx>=0.27
 Requires-Dist: jsonschema>=4.0
@@ -53,7 +54,6 @@ Requires-Dist: langchain>=0.2; extra == 'langchain'
 Provides-Extra: litellm
 Requires-Dist: litellm>=1.40; extra == 'litellm'
 Provides-Extra: mcp
-Requires-Dist: fastmcp; extra == 'mcp'
 Description-Content-Type: text/markdown
 <div align="center">
@@ -74,9 +74,11 @@ TokenJam reads your agent's telemetry and tells you when to downsize, when to tr
 [![OTel](https://img.shields.io/badge/OTel-GenAI%20SemConv-3d8eff?labelColor=0d1117)](https://opentelemetry.io/docs/specs/semconv/gen-ai/)
 ```
-pip install tokenjam
+pipx install tokenjam
 ```
+<sub>Don't have pipx? `brew install pipx` on macOS, `apt install pipx` on Debian/Ubuntu, or see [docs/installation.md](docs/installation.md). `pip install tokenjam` also works in a clean venv.</sub>
 **No cloud · No signup · No vendor lock-in**
 </div>
@@ -147,11 +149,13 @@ Run all four with `tj optimize`. Run several with `tj optimize downsize cache tr
 For **Claude Code** users — zero code, auto-backfills your last 30 days:
 ```bash
-pip install "tokenjam[mcp]"
+pipx install 'tokenjam[mcp]'
 tj onboard --claude-code
 tj optimize          # cost-saving candidates from your actual usage
 ```
+To upgrade later: `pipx upgrade tokenjam` (then `tj stop && tj serve &` to reload the daemon, and `tj --version` to verify). See [docs/installation.md](docs/installation.md#upgrading).
 For any Python agent:
 ```python

{tokenjam-0.3.3 → tokenjam-0.3.5}/README.md RENAMED

@@ -16,9 +16,11 @@ TokenJam reads your agent's telemetry and tells you when to downsize, when to tr
 [![OTel](https://img.shields.io/badge/OTel-GenAI%20SemConv-3d8eff?labelColor=0d1117)](https://opentelemetry.io/docs/specs/semconv/gen-ai/)
 ```
-pip install tokenjam
+pipx install tokenjam
 ```
+<sub>Don't have pipx? `brew install pipx` on macOS, `apt install pipx` on Debian/Ubuntu, or see [docs/installation.md](docs/installation.md). `pip install tokenjam` also works in a clean venv.</sub>
 **No cloud · No signup · No vendor lock-in**
 </div>
@@ -89,11 +91,13 @@ Run all four with `tj optimize`. Run several with `tj optimize downsize cache tr
 For **Claude Code** users — zero code, auto-backfills your last 30 days:
 ```bash
-pip install "tokenjam[mcp]"
+pipx install 'tokenjam[mcp]'
 tj onboard --claude-code
 tj optimize          # cost-saving candidates from your actual usage
 ```
+To upgrade later: `pipx upgrade tokenjam` (then `tj stop && tj serve &` to reload the daemon, and `tj --version` to verify). See [docs/installation.md](docs/installation.md#upgrading).
 For any Python agent:
 ```python

{tokenjam-0.3.3 → tokenjam-0.3.5}/docs/claude-code-integration.md RENAMED

@@ -3,7 +3,7 @@
 Monitor every Claude Code session — costs, tool calls, API requests, errors — with two commands:
 ```bash
-pip install "tokenjam[mcp]"
+pipx install 'tokenjam[mcp]'
 tj onboard --claude-code
 # Restart Claude Code, then:
 tj status --agent claude-code-<project>

{tokenjam-0.3.3 → tokenjam-0.3.5}/docs/installation.md RENAMED

@@ -5,10 +5,36 @@ TokenJam ships as a Python package on PyPI and a TypeScript SDK on npm. Pick the
 ## Base install
 ```bash
+pipx install tokenjam
+```
+This is the recommended install path on **all platforms**. `pipx` automatically creates an isolated venv for the `tj` CLI, which means:
+- It works on macOS with Homebrew Python (which refuses `pip install` into its managed environment by default — [PEP 668](https://peps.python.org/pep-0668/)).
+- It works on Debian 12+ / Ubuntu 24+ (same PEP 668 enforcement).
+- It doesn't pollute your system Python or any project's venv.
+Don't have `pipx`? Install it with one of:
+| Platform | Command |
+|---|---|
+| macOS | `brew install pipx` |
+| Debian / Ubuntu | `apt install pipx` |
+| Windows | `py -m pip install --user pipx` |
+| Anywhere else | `python3 -m pip install --user pipx` |
+Then ensure pipx's bin dir is on your `PATH` with `pipx ensurepath`.
+### Alternative: pip in a venv
+If you prefer plain pip (or need to install into an existing project venv):
+```bash
+python3 -m venv .venv && source .venv/bin/activate
 pip install tokenjam
 ```
-This is enough for the CLI (`tj`), local REST API (`tj serve`), the four out-of-box optimize analyzers that don't need ML models, and every native SDK integration except LLMLingua-based Trim. Requires Python ≥ 3.10.
+Either path is enough for the CLI (`tj`), local REST API (`tj serve`), the four out-of-box optimize analyzers that don't need ML models, and every native SDK integration except LLMLingua-based Trim. Requires Python ≥ 3.10.
 After install, run:
@@ -37,7 +63,7 @@ TokenJam keeps heavyweight ML dependencies, framework adapters, and the MCP serv
 Combine multiple extras:
 ```bash
-pip install "tokenjam[mcp,bloat]"
+pipx install 'tokenjam[mcp,bloat]'
 ```
 ### Bloat extra details
@@ -48,6 +74,21 @@ If you run `tj optimize trim` without the extra installed, the analyzer self-reg
 See [`docs/optimize/trim.md`](optimize/trim.md) for performance numbers, capture requirements, and what the analyzer actually reports.
+## Upgrading
+```bash
+pipx upgrade tokenjam          # if you installed via pipx (recommended)
+pip install --upgrade tokenjam # if you're in a pip + venv setup
+```
+After upgrading:
+1. Restart the daemon to pick up the new code: `tj stop && tj serve &`
+2. DB migrations apply automatically on the next `tj` invocation — no manual step required
+3. Verify with `tj --version`
+PyPI's CDN occasionally lags ~1–2 min after a release. If `pipx upgrade` reports "already at the latest version" but the reported `tj --version` is older than what's on the [releases page](https://github.com/Metabuilder-Labs/tokenjam/releases), wait a minute and retry.
 ## TypeScript SDK
 ```bash

{tokenjam-0.3.3 → tokenjam-0.3.5}/docs/optimize/trim.md RENAMED

@@ -22,7 +22,7 @@ LLMLingua-2 pulls in PyTorch and transformers (~2GB). Kept out of the
 base install:
 ```bash
-pip install "tokenjam[bloat]"
+pipx install 'tokenjam[bloat]'
 ```
 The base `pip install tokenjam` does NOT pull torch. Trim shows up in

{tokenjam-0.3.3 → tokenjam-0.3.5}/docs/python-sdk.md RENAMED

@@ -5,7 +5,7 @@ For any Python agent — Anthropic, OpenAI, Gemini, Bedrock, LangChain, CrewAI,
 ## Install
 ```bash
-pip install tokenjam
+pipx install tokenjam
 tj onboard    # creates config, generates ingest secret
 tj doctor     # verify your setup
 ```

{tokenjam-0.3.3 → tokenjam-0.3.5}/examples/openclaw/README.md RENAMED

@@ -5,7 +5,7 @@ This is a config-only integration — no Python code needed. OpenClaw's built-in
 ## Step 1: Start TokenJam
 ```bash
-pip install tokenjam
+pipx install tokenjam
 tj onboard
 tj serve &
 ```

{tokenjam-0.3.3 → tokenjam-0.3.5}/incidents/hallucination-drift/README.md RENAMED

@@ -1,6 +1,6 @@
 # My agent worked yesterday. Today it's possessed.
-**Run it:** `pip install tokenjam && tj demo hallucination-drift`
+**Run it:** `pipx install tokenjam && tj demo hallucination-drift`
 ---
@@ -60,7 +60,7 @@ The demo uses `baseline_sessions = 5` for speed. In production, 10–50 sessions
 ## Try it yourself
 ```bash
-pip install tokenjam
+pipx install tokenjam
 tj demo hallucination-drift
 ```
@@ -75,4 +75,4 @@ To track drift on your real agent, wire up the TokenJam SDK, enable drift in `tj
 ---
-[TokenJam](https://github.com/Metabuilder-Labs/TokenJam) is a local-first, zero-signup observability CLI for AI agents. No cloud. No account. Just `pip install tokenjam` and start seeing what your agent actually does.
+[TokenJam](https://github.com/Metabuilder-Labs/TokenJam) is a local-first, zero-signup observability CLI for AI agents. No cloud. No account. Just `pipx install tokenjam` and start seeing what your agent actually does.

{tokenjam-0.3.3 → tokenjam-0.3.5}/incidents/retry-loop/README.md RENAMED

@@ -1,6 +1,6 @@
 # Your agent isn't flaky. You're blind.
-**Run it:** `pip install tokenjam && tj demo retry-loop`
+**Run it:** `pipx install tokenjam && tj demo retry-loop`
 ---
@@ -57,7 +57,7 @@ The loop was visible from span #4. Your logs didn't surface it until a user comp
 ## Try it yourself
 ```bash
-pip install tokenjam
+pipx install tokenjam
 tj demo retry-loop
 ```
@@ -72,4 +72,4 @@ To catch this in your real agent, wire up the TokenJam SDK (`@watch()` + `patch_
 ---
-[TokenJam](https://github.com/Metabuilder-Labs/TokenJam) is a local-first, zero-signup observability CLI for AI agents. No cloud. No account. Just `pip install tokenjam` and start seeing what your agent actually does.
+[TokenJam](https://github.com/Metabuilder-Labs/TokenJam) is a local-first, zero-signup observability CLI for AI agents. No cloud. No account. Just `pipx install tokenjam` and start seeing what your agent actually does.

{tokenjam-0.3.3 → tokenjam-0.3.5}/incidents/surprise-cost/README.md RENAMED

@@ -1,6 +1,6 @@
 # Why did my agent just spend $47 on a hello world?
-**Run it:** `pip install tokenjam && tj demo surprise-cost`
+**Run it:** `pipx install tokenjam && tj demo surprise-cost`
 ---
@@ -75,7 +75,7 @@ TokenJam fires `cost_budget_session` and `cost_budget_daily` alerts when limits
 ## Try it yourself
 ```bash
-pip install tokenjam
+pipx install tokenjam
 tj demo surprise-cost
 ```
@@ -90,4 +90,4 @@ To track real spend, instrument your agent with the tokenjam SDK and run `tj ser
 ---
-[TokenJam](https://github.com/Metabuilder-Labs/TokenJam) is a local-first, zero-signup observability CLI for AI agents. No cloud. No account. Just `pip install tokenjam` and start seeing what your agent actually does.
+[TokenJam](https://github.com/Metabuilder-Labs/TokenJam) is a local-first, zero-signup observability CLI for AI agents. No cloud. No account. Just `pipx install tokenjam` and start seeing what your agent actually does.

{tokenjam-0.3.3 → tokenjam-0.3.5}/pyproject.toml RENAMED

@@ -4,7 +4,7 @@ build-backend = "hatchling.build"
 [project]
 name = "tokenjam"
-version = "0.3.3"
+version = "0.3.5"
 description = "TokenJam — local-first OTel-native observability for Autonomous AI agents"
 readme = "README.md"
 requires-python = ">=3.10"
@@ -41,6 +41,11 @@ dependencies = [
     "httpx>=0.27",
     "apscheduler>=3.10",
     "websockets>=12.0",
+    # fastmcp ships in the base install (was in the [mcp] extra) so `tj mcp`
+    # works on a fresh `pipx install tokenjam` without requiring users to
+    # remember the extra. Claude Code's MCP integration is now a primary
+    # use case rather than an opt-in. Issue #101.
+    "fastmcp>=0.2",
 ]
 [project.urls]
@@ -54,7 +59,10 @@ crewai    = ["crewai>=0.28"]
 autogen   = ["pyautogen>=0.2"]
 litellm   = ["litellm>=1.40"]
 dev       = ["pytest", "pytest-asyncio", "httpx", "ruff", "mypy"]
-mcp       = ["fastmcp"]
+# Kept as a no-op extra for back-compat — `pipx install 'tokenjam[mcp]'` still
+# works, just installs the same fastmcp that's now in the base dependencies.
+# Documented in `docs/installation.md` so users know they no longer need it.
+mcp       = []
 # Trim analyzer (`tj optimize --finding prompt-bloat`). LLMLingua-2 pulls in
 # PyTorch and transformers, ~2GB total. Kept optional so the base install
 # stays small — most users don't run the bloat analyzer.
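
The comments in this hunk describe an `mcp` extra that stays valid but no longer adds anything, with `fastmcp` moved into the base dependencies. A minimal sketch of how that split could be checked from `Requires-Dist` strings follows. The helper name `split_requirements` and the sample strings are illustrative assumptions, not code or metadata taken from the package.

```python
import re

def split_requirements(requires_dist):
    """Split Requires-Dist strings into base deps and per-extra deps.

    Hypothetical helper for illustration; not part of tokenjam.
    """
    base, extras = [], {}
    for req in requires_dist:
        name_and_spec = req.split(";")[0].strip()
        # Requirements gated on an extra carry an `extra == "name"` marker.
        match = re.search(r"extra\s*==\s*['\"]([^'\"]+)['\"]", req)
        if match:
            extras.setdefault(match.group(1), []).append(name_and_spec)
        else:
            base.append(name_and_spec)
    return base, extras

# Sample strings shaped like the dependency lists in this hunk (illustrative).
sample = [
    "httpx>=0.27",
    "fastmcp>=0.2",
    'crewai>=0.28; extra == "crewai"',
]
base, extras = split_requirements(sample)
print(base)
print(extras.get("mcp", []))  # an empty extra contributes no requirements
```

Against a real installation, `importlib.metadata.requires("tokenjam")` returns a list of strings in this form (or `None` when the distribution declares no requirements), so the same helper could be pointed at it.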

{tokenjam-0.3.3 → tokenjam-0.3.5}/sdk-ts/package.json RENAMED

@@ -1,6 +1,6 @@
 {
   "name": "@tokenjam/sdk",
-  "version": "0.3.3",
+  "version": "0.3.5",
   "description": "TypeScript SDK for TokenJam — local-first observability for AI agents",
   "main": "dist/index.js",
   "types": "dist/index.d.ts",

{tokenjam-0.3.3 → tokenjam-0.3.5}/tests/factories.py RENAMED

@@ -19,6 +19,7 @@ def make_llm_span(
     input_tokens: int = 1000,
     output_tokens: int = 200,
     cache_tokens: int = 0,
+    cache_write_tokens: int = 0,
     cost_usd: float | None = None,
     tool_name: str | None = None,
     status: str = "ok",
@@ -59,6 +60,7 @@ def make_llm_span(
         input_tokens=input_tokens,
         output_tokens=output_tokens,
         cache_tokens=cache_tokens,
+        cache_write_tokens=cache_write_tokens,
         cost_usd=cost_usd,
         conversation_id=conversation_id,
         attributes=attrs,

{tokenjam-0.3.3 → tokenjam-0.3.5}/tests/integration/test_db.py RENAMED

@@ -42,8 +42,8 @@ def _insert_agent(db, agent_id="test-agent"):
 def test_migrations_run_on_empty_db():
     backend = InMemoryBackend()
     rows = backend.conn.execute("SELECT version FROM schema_migrations").fetchall()
-    assert len(rows) == 4
-    assert {r[0] for r in rows} == {1, 2, 3, 4}
+    assert len(rows) == 5
+    assert {r[0] for r in rows} == {1, 2, 3, 4, 5}
     backend.close()
@@ -52,7 +52,7 @@ def test_migrations_are_idempotent():
     # Running migrations again should not raise
     run_migrations(backend.conn)
     rows = backend.conn.execute("SELECT version FROM schema_migrations").fetchall()
-    assert len(rows) == 4
+    assert len(rows) == 5
     backend.close()
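
The two tests above assert that each migration is recorded exactly once in `schema_migrations` and that running the migrations a second time changes nothing. The pattern can be sketched as below, using the standard-library `sqlite3` module as a stand-in for the project's database. `MIGRATIONS` and `apply_migrations` are hypothetical names, and the SQL is invented for illustration; the diff does not show the real `run_migrations`.

```python
import sqlite3

# Hypothetical migration list of (version, SQL). Not TokenJam's real migrations.
MIGRATIONS = [
    (1, "CREATE TABLE spans (id INTEGER PRIMARY KEY, cache_tokens INTEGER)"),
    (2, "ALTER TABLE spans ADD COLUMN cache_write_tokens INTEGER DEFAULT 0"),
]

def apply_migrations(conn):
    """Apply every migration that is not yet recorded in schema_migrations."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_migrations (version INTEGER PRIMARY KEY)"
    )
    applied = {row[0] for row in conn.execute("SELECT version FROM schema_migrations")}
    for version, sql in MIGRATIONS:
        if version in applied:
            continue  # already applied, which is what makes re-runs idempotent
        conn.execute(sql)
        conn.execute("INSERT INTO schema_migrations (version) VALUES (?)", (version,))
    conn.commit()

conn = sqlite3.connect(":memory:")
apply_migrations(conn)
apply_migrations(conn)  # second run should be a no-op

versions = [
    row[0]
    for row in conn.execute("SELECT version FROM schema_migrations ORDER BY version")
]
columns = [row[1] for row in conn.execute("PRAGMA table_info(spans)")]
print(versions, columns)
```

Recording the version in the same pass as the schema change is what lets a test assert on the row count, as the updated assertions in this file do.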

{tokenjam-0.3.3 → tokenjam-0.3.5}/tests/integration/test_full_pipeline.py RENAMED

@@ -34,7 +34,7 @@ from tokenjam.core.db import InMemoryBackend
 from tokenjam.core.ingest import IngestPipeline
 from tokenjam.core.models import AgentRecord, NormalizedSpan, SpanKind, SpanStatus
 from tokenjam.core.schema_validator import SchemaValidator
-from tokenjam.otel.provider import TjSpanExporter
+from tokenjam.otel.provider import TjSpanExporter, convert_otel_span
 from tokenjam.otel.semconv import GenAIAttributes
 from tokenjam.sdk.agent import watch, AgentSession, record_llm_call, record_tool_call
 from tokenjam.utils.time_parse import utcnow
@@ -159,6 +159,40 @@ def full_stack():
     db.close()
+# ── OTel ReadableSpan -> NormalizedSpan ──────────────────────────────────
+def test_convert_otel_span_extracts_cache_read_and_write_tokens():
+    """convert_otel_span indexes both cache-read and cache-creation tokens.
+    Regression: provider previously read only CACHE_READ_TOKENS, dropping
+    cache-creation tokens so cache-write cost was never charged on this path.
+    """
+    collected: list[ReadableSpan] = []
+    class _Collector(SpanExporter):
+        def export(self, spans: Sequence[ReadableSpan]) -> SpanExportResult:
+            collected.extend(spans)
+            return SpanExportResult.SUCCESS
+        def shutdown(self) -> None:
+            pass
+    provider = TracerProvider()
+    provider.add_span_processor(SimpleSpanProcessor(_Collector()))
+    tracer = provider.get_tracer("test")
+    with tracer.start_as_current_span("gen_ai.llm.call") as span:
+        span.set_attribute(GenAIAttributes.REQUEST_MODEL, "claude-haiku-4-5")
+        span.set_attribute(GenAIAttributes.CACHE_READ_TOKENS, 1000)
+        span.set_attribute(GenAIAttributes.CACHE_CREATE_TOKENS, 2000)
+    assert len(collected) == 1
+    normalized = convert_otel_span(collected[0])
+    assert normalized.cache_tokens == 1000
+    assert normalized.cache_write_tokens == 2000
 # ── SDK -> Pipeline -> DB ─────────────────────────────────────────────────
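
The regression named in the new test's docstring (only cache-read tokens were read from the span, so cache-creation tokens were dropped) comes down to mapping two attributes instead of one. A minimal sketch over a plain attribute dict follows. The key strings are placeholders, since the diff does not show the values behind `GenAIAttributes.CACHE_READ_TOKENS` and `GenAIAttributes.CACHE_CREATE_TOKENS`, and `extract_cache_tokens` is a hypothetical helper rather than the body of `convert_otel_span`.

```python
# Placeholder attribute keys; the real values live in GenAIAttributes.
CACHE_READ_KEY = "example.cache_read_tokens"
CACHE_CREATE_KEY = "example.cache_create_tokens"

def extract_cache_tokens(attributes):
    """Return (cache_read, cache_write) token counts, defaulting each to 0."""
    read = int(attributes.get(CACHE_READ_KEY) or 0)
    write = int(attributes.get(CACHE_CREATE_KEY) or 0)
    return read, write

span_attributes = {CACHE_READ_KEY: 1000, CACHE_CREATE_KEY: 2000}
cache_tokens, cache_write_tokens = extract_cache_tokens(span_attributes)
print(cache_tokens, cache_write_tokens)
```

Defaulting a missing attribute to 0 keeps spans from providers that report no cache usage on the same code path.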

{tokenjam-0.3.3 → tokenjam-0.3.5}/tests/manual-new-release-tests.md RENAMED

@@ -13,8 +13,15 @@ Run this after a new release publishes to PyPI to verify it works end-to-end. Th
 tj uninstall --yes 2>/dev/null
 rm -rf ~/.tj ~/.config/tj .tj
-pip3 install --upgrade tokenjam
+# Recommended install path (PEP 668-safe on Homebrew Python and
+# Debian 12+/Ubuntu 24+). `--force` so we reinstall even if a prior
+# version is present.
+pipx install --force tokenjam
 tj --version
+# Older `pip3 install --upgrade tokenjam` path still works inside
+# a clean venv but fails on system Python — that's the bug pipx
+# solves, and verifying pipx is what we ship docs telling users to do.
 ```
 **Pass criteria:** version matches the release being tested.
@@ -66,6 +73,32 @@ tj optimize --json | python3 -c \
 **Pass criteria:** every positional analyzer name runs without crashing. Optional analyzers (`cache-recommend`, `trim`) surface clear hints when their prereqs aren't met instead of erroring.
+## 4b. TokenMaxx tier classification
+```bash
+tj tokenmaxx
+# [ ] Bordered "TokenJam TokenMaxxing Report" panel renders
+# [ ] On api plan: shows absolute spend; no multiplier line
+# [ ] Action line surfaces either downsize savings or "no obvious
+#     savings flagged yet" (both are valid)
+# Verify the JSON tier label is one of the six valid v0.3.4 tiers.
+tj tokenmaxx --json | python3 -c \
+  "import json,sys;d=json.load(sys.stdin);ok={'TokenSipper','TokenModerator','TokenMaxxer','TokenSuperMaxxer','TokenMegaMaxxer','TokenGigaMaxxer'};assert d['tier'] in ok,d['tier'];print('ok:',d['tier'])"
+# Reconfigure to a subscription plan and re-run — the multiplier line
+# should appear. Pick whichever plan matches your test config.
+tj onboard --claude-code --reconfigure --plan max_5x
+tj tokenmaxx
+# [ ] Multiplier line "That's N× your Max 5x plan cost ($100/mo flat)."
+# [ ] Tier may shift if the multiplier crosses a boundary
+# Flip back to api so subsequent steps render dollar figures.
+tj onboard --claude-code --reconfigure --plan api
+```
+**Pass criteria:** the report renders without crashing, the JSON `tier` field carries one of the 6 v0.3.4 tier names, and the multiplier line appears under a subscription plan.
 ## 5. Backfill adapters (smoke against committed fixtures)
 ```bash
@@ -123,10 +156,45 @@ Spot-check:
 - [ ] Cost page shows non-zero USD values
 - [ ] Sidebar theme toggle works
+### Offline-UI verification (v0.3.4 — PR #88)
+Open Chrome DevTools (or your browser's equivalent) → **Network tab** → reload `http://127.0.0.1:7391/`.
+- [ ] **Zero failed requests** to `fonts.googleapis.com`, `fonts.gstatic.com`, `esm.sh`, or `tokenjam.dev`
+- [ ] Dashboard interactivity works (sidebar nav, tab switches) — proves the vendored Preact / htm under `/ui/vendor/` is being served, not loading from the CDN
+- [ ] Favicon renders (data: URL, no external fetch)
+Bonus: turn off wifi entirely, hard-refresh, and confirm the page still renders + the JS still hydrates. The whole dashboard must work air-gapped.
 ```bash
 tj stop
 ```
+## 9. Cache cost-correctness (v0.3.4 — PRs #90 + #92)
+Cache-only spans (cache_read > 0, input/output = 0) used to be costed at $0. Cache-creation tokens on the live OTLP path used to be silently dropped. Both fixed in v0.3.4.
+```bash
+# Spans table now has cache_write_tokens (migration 5).
+duckdb ~/.tj/telemetry.duckdb "PRAGMA table_info(spans)" 2>/dev/null \
+  | grep cache_write_tokens \
+  && echo "ok: cache_write_tokens column present"
+# Any captured Anthropic cache-hit span should have non-zero cost_usd.
+duckdb ~/.tj/telemetry.duckdb "
+  SELECT COUNT(*) AS hits,
+         MIN(cost_usd) AS min_cost
+  FROM spans
+  WHERE cache_tokens > 0
+    AND (input_tokens = 0 OR input_tokens IS NULL)
+    AND (output_tokens = 0 OR output_tokens IS NULL)
+" 2>/dev/null
+# [ ] If hits > 0: min_cost > 0 (cache hits ARE being costed; was $0 pre-0.3.4)
+# [ ] If hits = 0: this release's runs didn't trigger a pure cache-only span — fine, unit tests cover the path
+```
+If you don't have `duckdb` CLI installed, skip the SQL checks — the unit + synthetic tests covering these paths run in CI and are the canonical verification.
 ---
 ## Claude Code integration (smoke)
@@ -166,14 +234,16 @@ tj stop
 | Step | Pass criteria |
 |------|--------------|
-| 1 | `pip install --upgrade tokenjam` succeeds, version matches release |
+| 1 | `pipx install --force tokenjam` succeeds, version matches release |
 | 2 | Onboard prompts for plan tier; config records it; no auto `usd = 200` written |
 | 3 | Example runs without DB-lock errors; CLI shows real USD values |
-| 4 | All four optimize analyzers run; caveat appears in downgrade JSON; `plan` + `pricing_mode` in JSON output |
+| 4 | All optimize analyzers run; caveat appears in downgrade JSON; `plan` + `pricing_mode` in JSON output |
+| 4b | `tj tokenmaxx` renders the bordered report panel; JSON `tier` is one of the 6 v0.3.4 tier names; subscription plan shows multiplier line |
 | 5 | All three backfill adapters ingest from fixtures; re-runs are idempotent |
 | 6 | `--compare previous` produces a diff report; `--export-config` writes a snippet with caveat comments |
 | 7 | `tj policy list` renders the unified table |
-| 8 | `tj serve` starts, web UI loads, HTTP fallback works while server holds lock |
+| 8 | `tj serve` starts, web UI loads, HTTP fallback works while server holds lock; **zero external requests in DevTools Network tab** (offline-UI fix shipped in v0.3.4) |
+| 9 | `cache_write_tokens` column present on the spans table (migration 5); cache-hit spans show non-zero cost_usd |
 | Claude Code | Onboard writes settings.json + projects.json; re-run is a no-op |
 | Codex | Onboard writes `[otel]` + `[mcp_servers.tj]` to codex config; secret synced |
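
Step 9 in the checklist above expects cache-only spans to carry a non-zero cost and cache-write tokens to be charged. The arithmetic behind that expectation can be sketched as follows. The per-million-token rates are placeholders chosen for illustration, and `Rates` and `span_cost` are hypothetical names; neither reflects TokenJam's pricing table or its cost function.

```python
from dataclasses import dataclass

@dataclass
class Rates:
    """USD per one million tokens. Placeholder values, not real pricing."""
    input: float
    output: float
    cache_read: float
    cache_write: float

def span_cost(rates, input_tokens=0, output_tokens=0,
              cache_tokens=0, cache_write_tokens=0):
    """Sum all four token classes so a cache-only span is still costed."""
    total = (
        input_tokens * rates.input
        + output_tokens * rates.output
        + cache_tokens * rates.cache_read
        + cache_write_tokens * rates.cache_write
    )
    return total / 1_000_000

rates = Rates(input=1.0, output=5.0, cache_read=0.1, cache_write=1.25)

# A span with cache reads only (input/output both 0), as in the SQL check.
cache_only = span_cost(rates, cache_tokens=1000)
# The same span with cache-creation tokens included.
with_writes = span_cost(rates, cache_tokens=1000, cache_write_tokens=2000)
print(cache_only > 0, with_writes > cache_only)
```

A cost function that only multiplied input and output tokens would return 0 for the first call, which is the behavior the checklist describes as the pre-fix state.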
