PyPI - tokenjam - Versions diffs - 0.3.4__tar.gz → 0.4.0__tar.gz - Mend

tokenjam-0.3.4.tar.gz → tokenjam-0.4.0.tar.gz


Files changed (296)

{tokenjam-0.3.4 → tokenjam-0.4.0}/.gitignore RENAMED

@@ -50,3 +50,7 @@ tj.toml
 # Superpowers skill output (internal planning artifacts, not for OSS)
 docs/superpowers/
 .gstack/
+# Per-release pre-release test logs are committed (for reference) — see
+# tests/results/. They're small markdown files, one per release run, and
+# serve as a record of what was verified at release time.

{tokenjam-0.3.4 → tokenjam-0.4.0}/CLAUDE.md RENAMED

@@ -4,7 +4,7 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
 ## Project Overview
-`tj` (TokenJam) is a local-first, OTel-native observability CLI for AI agents. No cloud backend, no signup. It captures telemetry from agent runtimes, stores it in a local DuckDB database, and exposes a CLI + local REST API for querying. Install via `pip install tokenjam`, run via `tj <subcommand>`. Requires Python >=3.10.
+`tj` (TokenJam) is a local-first, OTel-native **cost-optimization layer** for AI agents (with a full observability stack underneath). No cloud backend, no signup. It captures telemetry from agent runtimes, stores it in a local DuckDB database, and runs four named analyzers (`downsize` / `cache` / `script` / `trim`) that surface cost-saving candidates from real usage — plus a CLI, local REST API, web UI, and MCP server for querying. Install via `pipx install tokenjam` (recommended — sidesteps PEP 668 on Homebrew Python and Debian 12+/Ubuntu 24+) or `pip install tokenjam` in a venv. Run via `tj <subcommand>`. Requires Python >=3.10.
 ## Build & Development
@@ -14,7 +14,7 @@ pip install -e ".[dev]"
 # Linting and type checking
 ruff check tokenjam/                  # line-length=100, target py310
-mypy tokenjam/                        # strict mode
+mypy tokenjam/                        # partial config (not --strict; see [tool.mypy] in pyproject.toml)
 # Tests (CI runs all except e2e)
 pytest tests/unit/ tests/synthetic/ tests/agents/ tests/integration/
@@ -37,6 +37,27 @@ cd sdk-ts && npm install && npm test
 ```
+## Working with concurrent agents
+When more than one agent is editing this repo in parallel, **each agent must operate in its own git worktree**. A single working directory shares one `HEAD`, so two `git commit` calls from different agents land on whichever branch was checked out last — leading to commits leaking into the wrong PR. We've hit this multiple times.
+Spin up a per-task worktree before starting:
+```bash
+git worktree add ../tokenjam-<task> main
+cd ../tokenjam-<task>
+git checkout -b feat/<task>
+```
+When the PR merges and the branch is deleted, prune the worktree:
+```bash
+git worktree remove ../tokenjam-<task>
+```
+Symptom of a missed worktree: `git log` shows a commit on a branch you didn't intend (because another agent's `HEAD` was the checked-out one when your `git commit` ran). If you see this, do **not** force-push — rebase the stray commit off your branch first, and only force-push if you own every commit being rewritten.
+`.tj/config.toml` is intentionally untracked (see PR #145 + Critical Rule 20) and gets mutated at runtime by `tj onboard` / `tj serve` regenerating the local `ingest_secret`. Don't `git add` it back. The CI test `tests/unit/test_no_tracked_dev_secrets.py` guards against this.
 ## Architecture
 ### Data Flow
@@ -61,16 +82,25 @@ Post-ingest hooks run synchronously after each span is written to DB:
 - **`tokenjam/core/db.py`**: `StorageBackend` protocol + `DuckDBBackend` + `InMemoryBackend` (for tests) + migration runner. Migrations are `(version, sql)` tuples in a `MIGRATIONS` list — never modify existing ones, only append. **Note:** `StorageBackend` doesn't cover every query. Some callers (e.g. `CostEngine`, `cmd_status`) access `db.conn` directly for queries not in the protocol (cost updates, active session lookups). Helper `_row_to_session()` is used to convert raw DuckDB rows.
 - **`tokenjam/core/ingest.py`**: `IngestPipeline` (central hub), `SpanSanitizer` (rejects oversized/malformed spans), `strip_captured_content()`. Post-ingest hooks (cost, alerts, schema) are optional and error-tolerant — hook failures are logged, never propagated.
 - **`tokenjam/core/pricing.py`**: `ModelRates` (frozen dataclass), `load_pricing_table()` (LRU-cached), `get_rates(provider, model)`. Falls back to default rates for unknown models.
-- **`tokenjam/core/cost.py`**: `calculate_cost()` (pure function, rounds to 8dp) + `CostEngine` (post-ingest hook that updates `spans.cost_usd` and `sessions.total_cost_usd` via `db.conn` — see db.py note). Pricing loaded from `pricing/models.toml`.
+- **`tokenjam/core/cost.py`**: `calculate_cost()` (pure function, rounds to 8dp) + `CostEngine` (post-ingest hook that updates `spans.cost_usd` and `sessions.total_cost_usd` via `db.conn` — see db.py note). Pricing loaded from `tokenjam/pricing/models.toml`. **Cache-read vs cache-write are separate fields** on `NormalizedSpan` (`cache_tokens` = read, `cache_write_tokens` = create); they bill at different rates and `calculate_cost` charges each at its own rate. The early-return no-op guard checks all four token counts (input/output/cache_read/cache_write) — see PR #90 and PR #92 for the cache-only-span and cache-write-on-live-path fixes.
 - **`tokenjam/core/alerts.py`**: `AlertEngine` with 13 alert types, `CooldownTracker` (in-memory, per agent+type, resets on restart), `AlertDispatcher` routing to 6 channel types (stdout, file, ntfy, webhook, Discord, Telegram). `AlertEngine.fire()` is the external entry point for other modules (SchemaValidator, DriftDetector) to fire alerts. Suppressed alerts are still persisted to DB but not dispatched to channels. Hardcoded thresholds: retry loop fires at 4+ identical tool calls in last 6 spans; failure rate fires at >20% errors in last 20 spans (checked every 5th error); session duration default 3600s. Stdout and file channels always include full detail regardless of `include_captured_content` config.
 - **`tokenjam/core/drift.py`**: `DriftDetector` — Z-score based behavioral drift detection, fires at session end.
-- **`tokenjam/core/optimize/`**: Package powering `tj optimize` and the `get_optimize_report` MCP tool. Public API re-exported from `__init__.py`: `build_report()` (orchestrator), `report_to_dict()`, `ANALYZER_REGISTRY`, `ANALYZER_ORDER`, plus result dataclasses. Architecture: `registry.py` holds the `@register("name")` decorator and `ANALYZER_REGISTRY` dict; `runner.py` defines `ANALYZER_ORDER` and orchestrates execution; `types.py` holds `AnalyzerContext` + result dataclasses + `MODEL_DOWNGRADE_CAVEAT`. Individual analyzers live in `analyzers/`, each as a single file registering via `@register`: `model_downgrade.py` (structural candidates — input < 5K tokens AND output < 500 tokens AND tool_calls ≤ 5; never claims quality equivalence, caveat baked into dataclass default), `budget_projection.py` (per-provider cycle spend vs `[budget.<provider>]` ceiling; only fires when budget > 0), `cache_efficacy.py`, `cache_recommend.py`, `prompt_bloat.py`, `workflow_restructure.py`. Analyzers receive an `AnalyzerContext` and operate on `db.conn` directly. To add a new analyzer: drop a file under `analyzers/`, decorate with `@register("name")`, append to `ANALYZER_ORDER` if ordering matters — `cmd_optimize --finding` choices auto-derive from the registry.
+- **`tokenjam/core/optimize/`**: Package powering `tj optimize` and the `get_optimize_report` MCP tool. Public API re-exported from `__init__.py`: `build_report()` (orchestrator), `report_to_dict()`, `ANALYZER_REGISTRY`, `ANALYZER_ORDER`, plus result dataclasses. Architecture: `registry.py` holds the `@register("name")` decorator and `ANALYZER_REGISTRY` dict; `runner.py` defines `ANALYZER_ORDER` and orchestrates execution; `types.py` holds `AnalyzerContext` + result dataclasses + `MODEL_DOWNGRADE_CAVEAT`. Individual analyzers live in `analyzers/`, each as a single file registering via `@register`. **Registry strings (the user-facing names) and file names are decoupled**:
+  - `model_downgrade.py` → `@register("downsize")` — structural candidates (input < 5K tokens AND output < 500 tokens AND tool_calls ≤ 5; never claims quality equivalence, caveat baked into dataclass default)
+  - `budget_projection.py` → `@register("budget-projection")` — per-provider cycle spend vs `[budget.<provider>]` ceiling; only fires when budget > 0
+  - `cache_efficacy.py` → `@register("cache")` — current cache-read efficacy per (provider, model)
+  - `cache_recommend.py` → `@register("cache-recommend")` — Anthropic-only structural prefix detection for `cache_control` placement
+  - `workflow_restructure.py` → `@register("script")` — `(tool_name, arg_shape)` cluster detection for deterministic-script candidates
+  - `prompt_bloat.py` → `@register("trim")` — LLMLingua-2 token-significance classification (requires `tokenjam[bloat]` extra)
+  Analyzers receive an `AnalyzerContext` and operate on `db.conn` directly. To add a new analyzer: drop a file under `analyzers/`, decorate with `@register("name")`, append to `ANALYZER_ORDER` if ordering matters — `cmd_optimize`'s positional `findings` Click choices auto-derive from the registry.
+  **Recoverable-savings contract** (issues #111/#122): every *savings* analyzer's result dataclass carries `estimated_recoverable_usd` / `estimated_recoverable_tokens` / `estimate_basis` / `estimate_confidence` (`"heuristic"`). All four are on **one time basis — recoverable over the analyzed window** (`downsize` keeps a separate `monthly_savings_usd` for its CLI projection line, but `estimated_recoverable_usd` is the window figure so Overview tiles are comparable). `cache-recommend` and `budget-projection` deliberately carry **no** recoverable field (not savings analyzers); the Overview waste band is registry-driven off the presence of `estimated_recoverable_usd`, so a future analyzer (e.g. reuse) appears with no UI change. `report_to_dict`/`report_from_dict` round-trip these fields. Honesty discipline (Critical Rule 14) is mandatory — every estimate is "estimated recoverable", never "saves you".
 - **`tokenjam/core/ingest_adapters/`**: Third-party trace-export adapters that normalize external payloads (`langfuse.py`, `helicone.py`, `otlp.py`) into `NormalizedSpan` for ingest. Each is reachable as a `tj backfill <name>` subcommand and accepts `--source-url` (live API) or `--source-file` (offline JSON dump). Adapters write deterministic span IDs derived from the source's identifiers so re-runs are idempotent. `otlp.py` shares span-mapping logic with the live `POST /api/v1/spans` route via `tokenjam/otel/otlp_parsing.py`.
 - **`tokenjam/core/export/`**: Routing-config snippet generators for `tj optimize --export-config`. Currently `claude_code.py` emits a JSONC fragment under a `tokenjam.routing_recommendations` namespace with honest-framing caveat comments baked in. Writes to `~/.config/tokenjam/exports/`; never touches `~/.claude/settings.json` or other external configs (no `--apply` flag — Claude Code doesn't currently honor TokenJam routing keys, so auto-writing would change nothing and erode trust).
 - **`tokenjam/core/backfill.py`**: Parses Claude Code on-disk session JSONL files into `NormalizedSpan`s. Cost is recomputed from `pricing/models.toml` because the on-disk format has no `cost_usd`. The parser tolerates the dated `claude-<family>-<ver>-YYYYMMDD` model-name suffixes Anthropic ships (handled by `core/pricing.py.get_rates()`, which strips the trailing 8-digit date suffix when no exact pricing match exists). Idempotency relies on deterministic span IDs derived from `(session_id, message uuid)` / `(session_id, tool_use id)`.
 - **`tokenjam/core/schema_validator.py`**: Validates tool outputs against declared or genson-inferred JSON Schema. Only fires on `gen_ai.tool.call` spans with `gen_ai.tool.output` in attributes. Schema priority: 1) declared file from agent config `output_schema`, 2) inferred schema from `DriftBaseline.output_schema_inferred`. Caches schemas in-memory per agent.
 - **`tokenjam/core/models.py`**: All domain dataclasses — `NormalizedSpan`, `SessionRecord`, `Alert`, `DriftBaseline`, filter types, etc. `NormalizedSpan` carries `billing_account` (provider-only: `anthropic` / `openai` / `google` / `bedrock` / `local.ollama`). `SessionRecord` carries `plan_tier` (api / pro / max_5x / max_20x / plus / team / enterprise / local / unknown) plus a derived `pricing_mode` property (`local` / `subscription` / `api` / `unknown`). Spans inherit plan via the session FK — analyzers JOIN through `SessionRecord` when they need plan context. See [`docs/architecture.md`](docs/architecture.md) → "OTel semconv extensions" for the full derivation rules.
 - **`tokenjam/core/config.py`**: `TjConfig` dataclass tree, TOML loading/writing, config file discovery. `ProviderBudget` carries an optional `plan` field (set by `tj onboard`'s plan-tier prompt) that `IngestPipeline._build_or_update_session` reads to populate `SessionRecord.plan_tier` at session creation. `CaptureConfig` has four fine-grained content-capture toggles (`prompts` / `completions` / `tool_inputs` / `tool_outputs`); `strip_captured_content()` in `core/ingest.py` enforces them at the single ingest-pipeline gate.
+- **`tokenjam/core/framing.py`**: **Single source of truth for plan-tier-aware rendering** (issue #110). `compute_framing(config, window_summary, by_provider_breakdown) -> Framing` decides whether dollar figures are shown verbatim (`api`), suppressed for token-share framing (`subscription`), shown as tokens-only (`local`), or shown with an "may overstate" qualifier (`unknown`). Plus `render_dollar()` / `render_savings()` (UI-facing compact formatters), and the shared helpers `pricing_mode_for` / `dominant_plan` / `config_declared_plan` (with the #106 global-config fallback) / `plan_tier_mix`. **Consumed by both the CLI (`cmd_optimize`, `cmd_tokenmaxx`) and the REST API** (which emits `Framing.to_dict()` as the `framing` block) — neither re-derives the rules. This module *reads* plan-tier/pricing-mode; the canonical derivation still lives on `SessionRecord.pricing_mode` + `SUBSCRIPTION_PLAN_TIERS` (semconv). When adding a dollar-bearing surface, consume this — do not re-implement the suppression rules.
 - **`tokenjam/sdk/agent.py`**: `@watch()` decorator creates session spans only. `record_llm_call()` and `record_tool_call()` create child spans for manual instrumentation. LLM call spans from provider clients require `patch_anthropic()`, `patch_openai()`, etc.
 - **`tokenjam/sdk/transport.py`**: `HttpTransport` — buffers up to 1000 spans, retries with exponential backoff (3 attempts, 2s base). Used when `tj serve` runs as a separate process.
 - **`tokenjam/sdk/bootstrap.py`**: `ensure_initialised()` — lazy, thread-safe, idempotent bootstrap of config -> DB -> IngestPipeline -> TracerProvider. Called automatically by `@watch()` and all `patch_*()` functions. Registers atexit flush.
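The `core/cost.py` entry above describes three behaviours: cache-read and cache-write tokens bill at separate rates, the no-op guard checks all four token counts, and the result is rounded to 8 decimal places. A minimal sketch of that shape follows. It is not the package's actual `calculate_cost()`; the `Rates` dataclass, its field names, the per-million-token unit, and the numbers in the usage lines are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Hypothetical USD rates per million tokens (names and unit are assumptions)."""
    input_per_mtok: float
    output_per_mtok: float
    cache_read_per_mtok: float
    cache_write_per_mtok: float


def sketch_cost(
    rates: Rates,
    input_tokens: int,
    output_tokens: int,
    cache_tokens: int,        # cache reads
    cache_write_tokens: int,  # cache creation
) -> float:
    # Guard on all four counts, so a span that only touched the cache
    # is still priced instead of being treated as a no-op.
    if not (input_tokens or output_tokens or cache_tokens or cache_write_tokens):
        return 0.0
    total = (
        input_tokens * rates.input_per_mtok
        + output_tokens * rates.output_per_mtok
        + cache_tokens * rates.cache_read_per_mtok
        + cache_write_tokens * rates.cache_write_per_mtok
    ) / 1_000_000
    return round(total, 8)


# Made-up rates, purely for illustration.
example_rates = Rates(1.0, 2.0, 0.5, 4.0)
cache_only = sketch_cost(example_rates, 0, 0, 1_000_000, 0)
mixed = sketch_cost(example_rates, 1_000, 500, 2_000, 100)
```

A guard that looked only at input and output tokens would return zero for the `cache_only` case, which is the kind of gap the entry attributes to the cache-only-span fix.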
@@ -79,10 +109,10 @@ Post-ingest hooks run synchronously after each span is written to DB:
 - **`tokenjam/otel/exporters.py`**: Prometheus metric reader setup via `build_prometheus_exporter()`.
 - **`tokenjam/otel/otlp_parsing.py`**: Shared OTLP JSON → `NormalizedSpan` parser. Two callers: `api/routes/spans.py` (live `POST /api/v1/spans`) and `core/ingest_adapters/otlp.py` (`tj backfill otlp`). Keep parsing in this one place — the live receive path and the backfill adapter must agree on attribute extraction, billing_account derivation, and timestamp handling.
 - **`tokenjam/otel/semconv.py`**: `GenAIAttributes`, `TjAttributes` (includes `BILLING_ACCOUNT` and `PLAN_TIER`), `VALID_PLAN_TIERS` and `SUBSCRIPTION_PLAN_TIERS` frozensets — OTel GenAI semantic convention constants plus tj-specific extensions.
-- **`tokenjam/api/app.py`**: FastAPI app factory. `tj serve` starts it with uvicorn. Accepts `db`, `config`, `ingest_pipeline` for testability. Registers all routers under `/api/v1` plus `/metrics`.
+- **`tokenjam/api/app.py`**: FastAPI app factory (OpenAPI title `"TokenJam Lens"`). `tj serve` starts it with uvicorn. Accepts `db`, `config`, `ingest_pipeline` for testability. Registers all routers under `/api/v1` plus `/metrics`, `/health`, and the SPA at `/`. **`index.html` is read into a module string once at `create_app()` time** (`_index_html`) — so editing `tokenjam/ui/index.html` requires a `tj serve` restart to take effect; tests read the file from disk directly and aren't affected. Mounts `/ui/vendor` as `StaticFiles`.
 - **`tokenjam/api/middleware.py`**: `IngestAuthMiddleware` — protects `POST /api/v1/spans` with Bearer token. Returns `JSONResponse(401)` directly (not `HTTPException`, which doesn't propagate from `BaseHTTPMiddleware.dispatch`).
 - **`tokenjam/api/deps.py`**: `require_api_key` — FastAPI dependency for optional API key auth on GET endpoints. Only enforced when `api.auth.enabled = true` in config.
-- **`tokenjam/api/routes/`**: One file per resource — `spans.py` (OTLP JSON ingest), `traces.py`, `cost.py`, `tools.py`, `alerts.py`, `drift.py`, `metrics.py` (Prometheus text format from DB queries).
+- **`tokenjam/api/routes/`**: One file per resource — `spans.py` (OTLP JSON ingest), `traces.py`, `cost.py`, `cost_compare.py`, `tools.py`, `alerts.py`, `drift.py`, `optimize.py`, `budget.py`, `status.py`, `agents.py`, `metrics.py` (Prometheus text format from DB queries), `version.py` (unauthenticated `GET /health` → `{"status":"ok","version":...}` mounted with no prefix, plus `GET /api/v1/version`; the version is derived at runtime via `importlib.metadata.version("tokenjam")` — no hardcoded literal). **The dollar-bearing read routes (`/cost`, `/cost/compare`, `/optimize`, `/budget`) each return a `framing` block** (see `core/framing.py`) so the web UI renders plan-tier-aware figures without re-deriving the rules in JS. `/optimize` takes `?fast=true` to skip the expensive Trim analyzer (returns `skipped_analyzers`) for the polling Overview; `/cost` returns a window-bucketed `series` for the chart (see Web UI below). **Concurrency caveat:** the sync (`def`) read routes run in Starlette's threadpool and share the daemon's single DuckDB connection, which is not safe for concurrent use — fan-out callers (the Overview) must fetch sequentially. See issue #124.
 - **`tokenjam/mcp/server.py`**: FastMCP stdio server exposing observability data to Claude Code. Uses either a read-only DuckDB connection or HTTP proxy to `tj serve`. Initialized via `init()` from `cmd_mcp.py`.
 - **`tokenjam/cli/main.py`**: Root Click group with global options (`--config`, `--json`, `--no-color`, `--db`, `--agent`, `-v`). Registers all subcommands.
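The `core/optimize/` entry describes a decorator-based registry in which the user-facing name is decoupled from the file or function name. A minimal sketch of that pattern is shown here. It reuses the names `register` and `ANALYZER_REGISTRY` from the text, but the bodies, the plain `dict` standing in for `AnalyzerContext`, the duplicate-name check, and the `run_selected` helper are assumptions rather than the package's code.

```python
from typing import Callable, Dict, Iterable, List

# Maps the user-facing name to the analyzer callable.
ANALYZER_REGISTRY: Dict[str, Callable[[dict], dict]] = {}


def register(name: str):
    """Register an analyzer under a user-facing name."""
    def decorator(fn: Callable[[dict], dict]) -> Callable[[dict], dict]:
        if name in ANALYZER_REGISTRY:
            # Assumption: duplicates are rejected rather than overwritten.
            raise ValueError(f"analyzer {name!r} is already registered")
        ANALYZER_REGISTRY[name] = fn
        return fn
    return decorator


# The function name mirrors the file name; the registry string is what users type.
@register("downsize")
def model_downgrade(ctx: dict) -> dict:
    return {"analyzer": "downsize", "window": ctx.get("since")}


@register("trim")
def prompt_bloat(ctx: dict) -> dict:
    return {"analyzer": "trim", "window": ctx.get("since")}


def run_selected(names: Iterable[str], ctx: dict) -> List[dict]:
    """Hypothetical runner: an empty selection means 'run everything'."""
    selected = list(names) or list(ANALYZER_REGISTRY)
    return [ANALYZER_REGISTRY[n](ctx) for n in selected]


results = run_selected(["trim"], {"since": "30d"})
```

Because the CLI choices are described as derived from the registry keys, adding an analyzer in a sketch like this only requires a new decorated function.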
@@ -92,11 +122,12 @@ Post-ingest hooks run synchronously after each span is written to DB:
 - **`tj demo [scenario]`** (`cmd_demo.py`) — runs Agent Incident Library scenarios (zero-config, no API keys). `tj demo` lists all; `tj demo retry-loop` runs one.
 - **`tj doctor`** (`cmd_doctor.py`) — health checks (config, DB, secrets, webhooks, drift readiness, schema-vs-capture consistency). Exit 0 = ok, 1 = warnings, 2 = errors.
-- **`tj optimize`** (`cmd_optimize.py`) — six analyzers, registry-driven: `model-downgrade`, `budget-projection`, `cache-efficacy`, `cache-recommend`, `workflow-restructure`, `prompt-bloat`. Flags: `--since 30d`, `--finding <name>` (repeatable; choices auto-derive from `ANALYZER_REGISTRY` at click decoration time), `--budget <provider>`, `--budget-usd <amount>`, `--compare <period>` (window-cost diff vs prior period; accepts `previous` / `last-week` / `last-month` / `last-7d` / `last-30d` / `YYYY-MM-DD:YYYY-MM-DD`), `--export-config <target>` (writes a routing snippet — currently `claude-code` — under `~/.config/tokenjam/exports/`; no `--apply` flag by design). Plan-tier-aware rendering: subscription users see "implied API value" framing and token-share savings (never dollar "spend"); local users see token-only framing; unknown-plan users see dollar figures suppressed with a `tj onboard --reconfigure` hint. Opens the live DB read-only so it works alongside a running `tj serve`.
+- **`tj optimize`** (`cmd_optimize.py`) — six analyzers, registry-driven. **Analyzers are positional args** (not `--finding <name>`): `tj optimize downsize cache trim` runs three; bare `tj optimize` runs all. Registered names: `downsize`, `cache`, `cache-recommend`, `script`, `trim`, `budget-projection`. Flags: `--since 30d`, `--budget <provider>`, `--budget-usd <amount>`, `--compare <period>` (window-cost diff vs prior period; accepts `previous` / `last-week` / `last-month` / `last-7d` / `last-30d` / `YYYY-MM-DD:YYYY-MM-DD`), `--export-config <target>` (writes a routing snippet — currently `claude-code` — under `~/.config/tokenjam/exports/`; no `--apply` flag by design). Plan-tier-aware rendering: subscription users see "implied API value" framing and token-share savings (never dollar "spend"); local users see token-only framing; unknown-plan users see dollar figures suppressed with a `tj onboard --reconfigure` hint. Works alongside a running `tj serve` via the `/api/v1/optimize` HTTP fallback when the DuckDB write lock is held by the daemon.
+- **`tj tokenmaxx`** (`cmd_tokenmaxx.py`) — shareable spend-tier command. Reads last 30 days of usage, classifies into a 6-tier ladder (Sipper / Moderator / Maxxer / SuperMaxxer / MegaMaxxer / GigaMaxxer) using the multiplier vs the user's declared subscription plan as the primary classifier, with absolute USD/mo thresholds as the API-user fallback. Output is a bordered Panel designed for screenshotting. Plan-aware: shows the multiplier line only when the user has `[budget.<provider>] plan = "max_5x"` (or pro / max_20x / plus) configured — the declared-plan lookup uses `core/framing.config_declared_plan`, which falls back to the global `~/.config/tj/config.toml` when the active project config has no `[budget]` section (issue #106). The companion landing page is `tokenjam.dev/tokenmaxxing`. Designed to never exit without an actionable next step — pairs the tier callout with the downsize savings figure inline.
 - **`tj cost`** (`cmd_cost.py`) — cost breakdown by `--group-by agent|model|day|tool`. Same `--compare <period>` flag as `tj optimize` for window-over-window diffs (▲/▼ indicators, per-agent and per-model top-shifts, dollar + token deltas).
 - **`tj backfill <source>`** (`cmd_backfill.py`) — ingest historical telemetry from external sources. Subcommands: `claude-code` (parses `~/.claude/projects/*.jsonl`, auto-invoked at the end of `tj onboard --claude-code`), `langfuse` (live API or JSON dump), `helicone` (live API or JSON dump), `otlp` (raw OTLP JSON via URL or file — reuses the same parser as the live `POST /api/v1/spans` route). All idempotent via deterministic span IDs.
-- **`tj onboard`** (`cmd_onboard.py`) — `--claude-code` and `--codex` flags trigger integration-specific flows. Prompts for plan tier (api / pro / max_5x / max_20x for Anthropic; api / plus / team / enterprise for OpenAI) and writes it to `[budget.<provider>] plan = "..."`. Supports `--reconfigure` to re-prompt against an existing config, and `--plan <tier>` for non-interactive use. Does NOT auto-write a default `usd = 200` cycle ceiling — subscription users get only the `plan` field; API users are explicitly asked whether they want a self-imposed ceiling.
-- **`tj report`** (`cmd_report.py`) — generates standalone HTML visualizations of analyzer findings (e.g. `tj report --bloat [<agent_id>]` renders the prompt-bloat analyzer's per-token significance). Writes to `~/.cache/tokenjam/reports/` (override via `TOKENJAM_REPORT_DIR`) and opens in the default browser.
+- **`tj onboard`** (`cmd_onboard.py`) — `--claude-code` and `--codex` flags trigger integration-specific flows (writing to the **global** config). All paths — including plain `tj onboard` — prompt for plan tier (api / pro / max_5x / max_20x for Anthropic; api / plus / team / enterprise for OpenAI) and write it to `[budget.<provider>] plan = "..."`; `--plan <tier>` sets it non-interactively (issue #4). The plain path is Claude-first: its interactive prompt offers the Anthropic tiers, and an OpenAI-only `--plan` (plus/team/enterprise) is routed to `[budget.openai]`. Supports `--reconfigure` to re-prompt against an existing config. Does NOT auto-write a default `usd = 200` cycle ceiling — subscription users get only the `plan` field; API users are explicitly asked whether they want a self-imposed ceiling.
+- **`tj report`** (`cmd_report.py`) — generates standalone HTML visualizations of analyzer findings. Currently `tj report --trim [<agent_id>]` renders the Trim analyzer's per-token significance (was `--bloat` pre-0.3.1, renamed alongside the analyzer's registry string). Writes to `~/.cache/tokenjam/reports/` (override via `TOKENJAM_REPORT_DIR`) and opens in the default browser.
 - **`tj policy list`** (`cmd_policy.py`) — read-only preview of the unified policy surface. Consolidates existing `[alerts]`, `[alerts.channels]`, `[defaults.budget]`, `[budget.<provider>]`, per-agent `budget`/`drift`/`sensitive_actions`/`output_schema`, and `[capture]` config into one table; each row carries its source TOML section. Supports `--json`. `tj policy add | edit | apply | remove | test` are intentionally absent this sprint — the unified config migration is next sprint's work. `policy` is in `no_db_commands` in `cli/main.py` so it doesn't open the DB. Rich source-section strings (`[budget.anthropic]`, `[[alerts.channels]]`) must be passed through `rich.markup.escape()` before rendering — otherwise Rich consumes them as style tags.
 All commands support `--json` for machine-readable output. Commands that query alerts use exit code 1 if active (unacknowledged, unsuppressed) alerts exist.
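The `tj backfill` entry states that every source is idempotent via deterministic span IDs. One way such IDs can be derived is sketched below. The hashing scheme (SHA-256 over the two identifiers, truncated to 16 hex characters to match the 8-byte OTel span ID width) and the function name are assumptions, not the package's actual derivation.

```python
import hashlib
from typing import Dict


def deterministic_span_id(session_id: str, source_id: str) -> str:
    """Hypothetical helper: same inputs always yield the same span ID."""
    # A NUL separator keeps ("ab", "c") and ("a", "bc") from colliding.
    payload = f"{session_id}\x00{source_id}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:16]


# Re-ingesting the same records overwrites rows instead of duplicating them.
store: Dict[str, dict] = {}
records = [("sess-1", "msg-a"), ("sess-1", "msg-b")]
for _ in range(2):  # simulate running the backfill twice
    for session_id, message_uuid in records:
        span_id = deterministic_span_id(session_id, message_uuid)
        store[span_id] = {"session_id": session_id, "source_id": message_uuid}
```

After both passes `store` holds two entries, one per source record, which is the property that makes a re-run safe.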
@@ -121,6 +152,18 @@ For `GET /api/v1/drift`, if `agent_id` is missing, return `JSONResponse(status_c
 Integration tests use `httpx.AsyncClient` with `httpx.ASGITransport(app=app)` against `InMemoryBackend`. Synthetic alert tests use `unittest.mock.MagicMock` for the DB — you must explicitly set up `db.get_recent_spans.return_value` before calling `engine.evaluate()`, and silence channels with `engine.dispatcher.channels = []`.
+### Web UI ("TokenJam Lens")
+`tokenjam/ui/index.html` is the served dashboard — a **single-file Preact + htm SPA** (no build step, no TypeScript, no client-side router). "TokenJam Lens" is the **brand only**: it appears in `<title>`, the sidebar wordmark, and the OpenAPI title, but never in module names, route paths, or config keys. Screens: **Overview** (the default landing route — a triage front door), Status, Traces, Cost, Alerts, Drift, Optimize, Budget.
+- **Offline-first (Critical Rule 18):** every JS/CSS dep is vendored under `tokenjam/ui/vendor/` — Preact + hooks + htm (ESM via `<script type="importmap">`) and **uPlot** (vendored IIFE global `uPlot` + CSS, pinned in `docs/internal/lens-vendor-versions.md`). No render-time external HTTP. `tests/unit/test_ui_offline.py` enforces this; clickable `<a href>` links are the only allowed external URLs.
+- **Single compute path:** the UI reads everything from the REST API and **never re-implements analysis, aggregation, or plan-tier framing in JS** — it consumes the `framing` block (see `core/framing.py`). If the UI needs a number, extend the endpoint; don't compute it client-side.
+- **URL is the source of truth for filters:** state lives in the hash + query params (`#/cost?since=7d&group_by=model`); `getRoute()` parses it, `navigate()` writes it back omitting defaults. Window vocabulary matches the CLI (`1h`/`24h`/`7d`/`30d`/`90d` + `YYYY-MM-DD:YYYY-MM-DD`). The default landing route is Overview (empty hash → `getRoute()` returns `overview`; do **not** re-introduce a render-time `location.hash = ...` redirect — it raced the first render, issue #132).
+- **Charts:** `SpendChart` wraps uPlot, reads CSS custom properties (`--chart-1..5`) so it re-themes, and has a cursor tooltip. The spend chart spans the **full selected window** with zero-fill: `/api/v1/cost` returns a window-bucketed `series` (hourly buckets for ≤2-day windows, daily otherwise; epoch-second `bucket` keys) plus `series_bucket` + `window_start`/`window_end`, and the UI builds a continuous grid + pins the x-scale to the window (issues #133/#136).
+- **Run-rate** is a single linear figure projected to the end of the current calendar cycle (`daily_rate × days-remaining`), captioned "not a forecast". The forecasting boundary is deliberate: linear run-rate only — no EWMA, seasonality, or anomaly detection.
+- **Polling:** the Overview auto-refreshes every 30s only while the tab is visible (`document.visibilityState`) and **fetches its endpoints sequentially** (the daemon's single DuckDB connection isn't concurrency-safe — see the REST API caveat / #124). Detail screens refresh on user action.
+- **Testing the UI (no JS runner in CI):** the Python `test` job can't run JS, so UI fixes are guarded by **static-grep regression tests** in `tests/unit/test_lens_ui_regression.py` (assert buggy patterns are *gone* and new helpers are present) plus `test_ui_offline.py`. When iterating locally, validate syntax with `node --check` on the extracted `<script type="module">` block, and verify visually by running `tj serve` (or a seeded `create_app` + uvicorn on an alt port) and screenshotting with headless Chrome — there is intentionally no Playwright/Cypress.
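The Charts bullet above describes a window-bucketed series (hourly buckets for windows of at most two days, daily otherwise, keyed by epoch seconds) that is zero-filled into a continuous grid. The real grid is built in the UI's JavaScript; the Python sketch below only illustrates the arithmetic. The function names and the choice to align buckets to multiples of the bucket size are assumptions.

```python
from typing import Dict, List, Tuple

HOUR = 3600
DAY = 86400


def bucket_seconds(window_start: int, window_end: int) -> int:
    """Hourly buckets for windows of at most two days, daily otherwise."""
    return HOUR if (window_end - window_start) <= 2 * DAY else DAY


def zero_fill(
    series: Dict[int, float], window_start: int, window_end: int
) -> List[Tuple[int, float]]:
    """Return one (bucket, value) pair per bucket, using 0.0 where data is missing."""
    step = bucket_seconds(window_start, window_end)
    # Assumption: bucket keys are aligned to multiples of the bucket size.
    t = window_start - (window_start % step)
    grid: List[Tuple[int, float]] = []
    while t < window_end:
        grid.append((t, series.get(t, 0.0)))
        t += step
    return grid


# A three-day window with spend recorded only on the second day.
daily_grid = zero_fill({DAY: 1.5}, 0, 3 * DAY)
hourly_grid = zero_fill({}, 0, DAY)
```

Here `daily_grid` covers all three days with explicit zeros around the single data point, so a chart pinned to the window shows the gaps rather than stretching one bar across it.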
 ### Session Continuity
 When a span has a `conversation_id` matching an existing session, it's attributed to that session (even across process restarts). New `conversation_id` = new session.
@@ -139,11 +182,14 @@ When a span has a `conversation_id` matching an existing session, it's attribute
 10. **Use semconv constants** — reference `GenAIAttributes` and `TjAttributes` from `tokenjam/otel/semconv.py` instead of hardcoding OTel attribute name strings.
 11. **OTel TracerProvider is global and set-once** — `trace.set_tracer_provider()` only works once per process. In tests, set the provider once at module level (not per-test in a fixture) and clear spans between tests. Use a custom `_CollectingExporter(SpanExporter)` since `InMemorySpanExporter` is not available in the installed OTel version. See `tests/agents/test_mock_scenarios.py` for the SDK test pattern and `tests/integration/test_full_pipeline.py` for the pipeline pattern.
 12. **New SDK integrations must call `ensure_initialised()`** — every `patch_*()` convenience function must call `from tokenjam.sdk.bootstrap import ensure_initialised; ensure_initialised()` before installing hooks. This lazily bootstraps the TracerProvider + IngestPipeline on first use.
-13. **PyPI package name is `tokenjam`, not `ocw`** — `pip install tokenjam` is the correct install command. The CLI command is `tj` and the Python package directory is `tokenjam/`. The published package name on PyPI is `tokenjam`. Never write `pip install ocw` in docs, examples, or comments.
-14. **`tj optimize` output must never claim quality equivalence** — the model-downgrade finding flags structural candidates only. Every user-visible string says "looks like" / "candidate" / "review before switching" — never "safe to downgrade" or "would have worked." The `MODEL_DOWNGRADE_CAVEAT` constant lives on `DowngradeFinding` as a dataclass default so it can't be removed by accident; it must also appear in human-readable CLI output. The same honesty discipline applies to all other analyzers — `cache-efficacy` ("you're getting X% of available caching"), `cache-recommend` (Anthropic-only, structural prefix detection), `workflow-restructure` ("structural shape matches", "review before replacing with a script"), `prompt-bloat` ("predicted low-significance regions; review before editing"). `tj optimize --export-config` snippets bake the caveat block into the JSONC output as comments.
+13. **PyPI package name is `tokenjam`, not `ocw`** — the package on PyPI is `tokenjam`. The CLI command is `tj`. The Python package directory is `tokenjam/`. **Recommended install: `pipx install tokenjam`** (sidesteps PEP 668 on Homebrew Python and Debian 12+/Ubuntu 24+). `pip install tokenjam` works inside a clean venv but fails on system Python with a misleading externally-managed-environment error. Never write `pip install ocw` in docs, examples, or comments.
+14. **`tj optimize` output must never claim quality equivalence** — the `downsize` finding flags structural candidates only. Every user-visible string says "looks like" / "candidate" / "review before switching" — never "safe to downgrade" or "would have worked." The `MODEL_DOWNGRADE_CAVEAT` constant lives on `DowngradeFinding` as a dataclass default so it can't be removed by accident; it must also appear in human-readable CLI output. The same honesty discipline applies to all other analyzers — `cache` ("you're getting X% of available caching"), `cache-recommend` (Anthropic-only, structural prefix detection), `script` ("structural shape matches", "review before replacing with a script"), `trim` ("predicted low-significance regions; review before editing"). `tj optimize --export-config` snippets bake the caveat block into the JSONC output as comments.
 15. **Version bump on release** — both `pyproject.toml` (`version = "X.Y.Z"`) and `sdk-ts/package.json` (`"version": "X.Y.Z"`) must be bumped to the new version before creating a GitHub release. The publish workflows (`publish-pypi.yml`, `publish-npm.yml`) trigger on `release published` events and will fail with 403 if the version already exists on PyPI/npm.
-16. **New optimize analyzers self-register** — drop a `.py` file under `tokenjam/core/optimize/analyzers/` with a function decorated `@register("name")` taking `AnalyzerContext`. Auto-discovery in `analyzers/__init__.py` walks the directory at import time. `cmd_optimize.py`'s `--finding` choices read from `ANALYZER_REGISTRY.keys()` at click decoration — no edits needed there. If your analyzer depends on (or is depended on by) another, append it to `ANALYZER_ORDER` in `runner.py` at the right position. Wave-2 analyzers attach their findings to `OptimizeReport.findings[name]` (generic dict); the older `model-downgrade` / `budget-projection` analyzers retain typed slots on `OptimizeReport` for backwards compat with `cmd_optimize` and the MCP server.
+16. **New optimize analyzers self-register** — drop a `.py` file under `tokenjam/core/optimize/analyzers/` with a function decorated `@register("name")` taking `AnalyzerContext`. Auto-discovery in `analyzers/__init__.py` walks the directory at import time. `cmd_optimize.py`'s positional `findings` Click choices read from `ANALYZER_REGISTRY.keys()` at decoration — no edits needed there. If your analyzer depends on (or is depended on by) another, append it to `ANALYZER_ORDER` in `runner.py` at the right position. Wave-2 analyzers attach their findings to `OptimizeReport.findings[name]` (generic dict); the older `downsize` (registered name; file is `model_downgrade.py`) and `budget-projection` analyzers retain typed slots on `OptimizeReport` for backwards compat with `cmd_optimize` and the MCP server.
 17. **OTLP parsing has one home** — `tokenjam/otel/otlp_parsing.py`. Both the live `POST /api/v1/spans` route and the `tj backfill otlp` adapter import `parse_otlp_span` and `extract_resource_attrs` from there. If you need to extend OTLP attribute extraction, do it once in that module; do not copy-paste into either caller.
+18. **Web UI must work fully offline** — `tokenjam/ui/index.html` is the served dashboard ("TokenJam Lens"; see Architecture → Web UI). It is intentionally a single-file SPA with **zero external HTTP loads at render time**. Preact + hooks + htm + **uPlot** are vendored under `tokenjam/ui/vendor/` (ESM via `<script type="importmap">`; uPlot as a plain `<script>` IIFE global); fonts use system-font fallbacks (no Google Fonts); the favicon is inlined as a `data:` URL. The FastAPI app mounts `/ui/vendor` as `StaticFiles`. The `tests/unit/test_ui_offline.py` regression test asserts no render-time external URLs exist anywhere outside `<a href>` (clickable links to github.com are fine — they only fetch on click) and that vendored CSS has no external `url()`. If you add a CDN font, script, or stylesheet, that test will fail. Vendor the asset locally instead. See issue #87 + PR #88.
+19. **Analyzer registry names ≠ file names** — registry strings (`downsize`, `cache`, `script`, `trim`) are decoupled from Python module filenames (`model_downgrade.py`, `cache_efficacy.py`, `workflow_restructure.py`, `prompt_bloat.py`). The 0.3.1 rename only changed `@register("...")` strings; file names stayed for git-blame continuity. When grepping for an analyzer, search both the registry string AND the older file-name keyword.
+20. **`.tj/config.toml` is untracked and must stay that way** — the file contains a live per-install `ingest_secret` and is regenerated by `tj onboard` / `tj serve`. It was committed in error from v0.2.0 through v0.3.5 (leaked secret in git history; see PR #145 + issue #141 finding #6). `.gitignore` covers it, and `tests/unit/test_no_tracked_dev_secrets.py` fails CI if it's re-added to the index. If you see `.tj/config.toml` in your `git status` as modified or new, that's expected — just don't `git add` it.
 ## Config
@@ -217,7 +263,7 @@ If a version already exists on PyPI or npm, the publish workflow fails with 403
 ## Packaging
-Build system is hatchling. The `pyproject.toml` requires `[tool.hatch.build.targets.wheel] packages = ["tj"]` because the package name (`tokenjam`) differs from the directory name (`tj`). Without this, `pip install -e .` fails.
+Build system is hatchling. `[tool.hatch.build.targets.wheel] packages = ["tokenjam"]` — the package directory is `tokenjam/` (matching the PyPI name); only the *CLI command* is `tj` (`[project.scripts] tj = "tokenjam.cli.main:cli"`). Non-`.py` assets under the package ship in the wheel automatically — this is how the vendored UI (`tokenjam/ui/index.html`, `tokenjam/ui/vendor/*`) and `tokenjam/pricing/models.toml` reach users.
 Key runtime dependency: `pytz` is required by DuckDB for `TIMESTAMPTZ` column handling — it's listed explicitly in `dependencies` because DuckDB doesn't declare it on all platforms.
@@ -233,10 +279,11 @@ Key runtime dependency: `pytz` is required by DuckDB for `TIMESTAMPTZ` column ha
 - **[docs/installation.md](docs/installation.md)** — base install vs optional extras matrix. Documents `tokenjam[bloat]` (the ~2GB torch + transformers extra used by the Trim analyzer), framework adapter extras (`[langchain]` / `[crewai]` / `[autogen]` / `[litellm]`), and the MCP / dev extras.
 - **[docs/configuration.md](docs/configuration.md)** — full TOML config surface plus the "Content capture and privacy" section explaining the four `[capture]` toggles and how they interact with `alerts.include_captured_content`.
 - **Optimize product pages** — one per user-facing product, all under `docs/optimize/`:
-  - [`downsize.md`](docs/optimize/downsize.md) — model-downgrade candidate flagging (internal: `model-downgrade`)
-  - [`cache.md`](docs/optimize/cache.md) — `cache-efficacy` (current caching ratio) + `cache-recommend` (Anthropic-only breakpoint suggestions)
-  - [`script.md`](docs/optimize/script.md) — `workflow-restructure` clustering by `(tool_name, arg_shape)` signature
-  - [`trim.md`](docs/optimize/trim.md) — LLMLingua-2 token-significance classifier (`prompt-bloat`), install + capture requirements, performance numbers
+  - [`downsize.md`](docs/optimize/downsize.md) — cheaper-model candidate flagging (registry: `downsize`, file: `model_downgrade.py`)
+  - [`cache.md`](docs/optimize/cache.md) — `cache` (current caching ratio) + `cache-recommend` (Anthropic-only breakpoint suggestions)
+  - [`script.md`](docs/optimize/script.md) — `script` clustering by `(tool_name, arg_shape)` signature (file: `workflow_restructure.py`)
+  - [`trim.md`](docs/optimize/trim.md) — LLMLingua-2 token-significance classifier (`trim`, file: `prompt_bloat.py`), install + capture requirements, performance numbers
+- **[AGENTS.md](AGENTS.md)** — codebase conventions for contributors (referenced from the top-level README).
 - **Backfill adapters** — `docs/backfill/overview.md` lists the four sources (`claude-code` / `langfuse` / `helicone` / `otlp`) with the partnership-posture framing; per-adapter pages document modes (URL / file), field mapping, idempotency, and v1 limitations.
 - **[docs/policy/overview.md](docs/policy/overview.md)** — read-only preview of the unified policy surface (`tj policy list`). Notes that the `add` / `edit` / `apply` subcommands and the underlying `[policy]` config migration land next sprint.
 - **Internal specs** — `docs/internal/specs/` is reserved for canonical specs that production code references long-term. Currently empty (sprint specs have been cleaned up after merge); add new ones here when a feature needs a stable, code-referenced source of truth.
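
Rules 16 and 19 in the CLAUDE.md hunk above describe a decorator-based analyzer registry whose names are decoupled from module and function names. The following is a minimal sketch of that pattern, not TokenJam's actual code: `AnalyzerContext`, `ANALYZER_REGISTRY`, and `register` are simplified stand-ins with assumed shapes, and `run_all` is a hypothetical helper added only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


# Hypothetical stand-in: the real AnalyzerContext in tokenjam may carry
# different fields. Here it only holds a list of span dicts.
@dataclass
class AnalyzerContext:
    spans: List[dict] = field(default_factory=list)


# Registry name -> analyzer function (assumed shape).
ANALYZER_REGISTRY: Dict[str, Callable[[AnalyzerContext], dict]] = {}


def register(name: str):
    """Record the decorated function under a registry name."""
    def decorator(func):
        if name in ANALYZER_REGISTRY:
            raise ValueError(f"analyzer {name!r} already registered")
        ANALYZER_REGISTRY[name] = func
        return func
    return decorator


# The registry string ("downsize") is independent of the function name,
# which is why a search should cover both spellings.
@register("downsize")
def model_downgrade(ctx: AnalyzerContext) -> dict:
    # Illustrative result only; a real analyzer would inspect the spans.
    return {"candidates": len(ctx.spans)}


def run_all(ctx: AnalyzerContext, order: List[str]) -> dict:
    """Hypothetical runner: call registered analyzers in the given order."""
    return {
        name: ANALYZER_REGISTRY[name](ctx)
        for name in order
        if name in ANALYZER_REGISTRY
    }
```

Because registration happens as a side effect of the decorator, importing a module is enough to make its analyzer visible, which is what allows a CLI to build its choices from the registry keys without further edits.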

{tokenjam-0.3.4 → tokenjam-0.4.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: tokenjam
-Version: 0.3.4
+Version: 0.4.0
 Summary: TokenJam — local-first OTel-native observability for Autonomous AI agents
 Project-URL: Homepage, https://opencla.watch
 Project-URL: Repository, https://github.com/Metabuilder-Labs/openclawwatch
@@ -23,6 +23,7 @@ Requires-Dist: apscheduler>=3.10
 Requires-Dist: click>=8.1
 Requires-Dist: duckdb>=0.10
 Requires-Dist: fastapi>=0.110
+Requires-Dist: fastmcp>=0.2
 Requires-Dist: genson>=1.2
 Requires-Dist: httpx>=0.27
 Requires-Dist: jsonschema>=4.0
@@ -53,7 +54,6 @@ Requires-Dist: langchain>=0.2; extra == 'langchain'
 Provides-Extra: litellm
 Requires-Dist: litellm>=1.40; extra == 'litellm'
 Provides-Extra: mcp
-Requires-Dist: fastmcp; extra == 'mcp'
 Description-Content-Type: text/markdown
 <div align="center">
@@ -154,6 +154,8 @@ tj onboard --claude-code
 tj optimize          # cost-saving candidates from your actual usage
 ```
+To upgrade later: `pipx upgrade tokenjam` (then `tj stop && tj serve &` to reload the daemon, and `tj --version` to verify). See [docs/installation.md](docs/installation.md#upgrading).
 For any Python agent:
 ```python
@@ -259,6 +261,8 @@ tj serve               # start the web UI + REST API
 **Shipped in 0.3.x:** Downsize · Cache · Script · Trim · Claude Code + Codex onboarding · MCP server · Web UI · Backfill adapters (Langfuse, Helicone, OTLP) · Period comparison · Routing-config export · Read-only policy preview
 **Up next:**
+- [ ] **[TokenJam Lens](https://github.com/Metabuilder-Labs/tokenjam/milestone/1)** — local dashboard rebrand: new Overview triage front-door, Optimize detail tab, real spend-over-time charts, cross-screen drill-through
+- [ ] **[Reuse analyzer](https://github.com/Metabuilder-Labs/tokenjam/milestone/2)** — fifth analyzer: detects clusters of sessions with repeated planning, exports reviewable skeleton templates you can convert into slash commands or scripts
 - [ ] `tj policy add | edit | apply` — unified rule surface
 - [ ] `tj replay` — replay captured sessions against new model versions
 - [ ] TypeScript framework patches (LangChain JS, OpenAI Agents SDK)

{tokenjam-0.3.4 → tokenjam-0.4.0}/README.md RENAMED

@@ -96,6 +96,8 @@ tj onboard --claude-code
 tj optimize          # cost-saving candidates from your actual usage
 ```
+To upgrade later: `pipx upgrade tokenjam` (then `tj stop && tj serve &` to reload the daemon, and `tj --version` to verify). See [docs/installation.md](docs/installation.md#upgrading).
 For any Python agent:
 ```python
@@ -201,6 +203,8 @@ tj serve               # start the web UI + REST API
 **Shipped in 0.3.x:** Downsize · Cache · Script · Trim · Claude Code + Codex onboarding · MCP server · Web UI · Backfill adapters (Langfuse, Helicone, OTLP) · Period comparison · Routing-config export · Read-only policy preview
 **Up next:**
+- [ ] **[TokenJam Lens](https://github.com/Metabuilder-Labs/tokenjam/milestone/1)** — local dashboard rebrand: new Overview triage front-door, Optimize detail tab, real spend-over-time charts, cross-screen drill-through
+- [ ] **[Reuse analyzer](https://github.com/Metabuilder-Labs/tokenjam/milestone/2)** — fifth analyzer: detects clusters of sessions with repeated planning, exports reviewable skeleton templates you can convert into slash commands or scripts
 - [ ] `tj policy add | edit | apply` — unified rule surface
 - [ ] `tj replay` — replay captured sessions against new model versions
 - [ ] TypeScript framework patches (LangChain JS, OpenAI Agents SDK)

{tokenjam-0.3.4 → tokenjam-0.4.0}/docs/installation.md RENAMED

@@ -74,6 +74,21 @@ If you run `tj optimize trim` without the extra installed, the analyzer self-reg
 See [`docs/optimize/trim.md`](optimize/trim.md) for performance numbers, capture requirements, and what the analyzer actually reports.
+## Upgrading
+```bash
+pipx upgrade tokenjam          # if you installed via pipx (recommended)
+pip install --upgrade tokenjam # if you're in a pip + venv setup
+```
+After upgrading:
+1. Restart the daemon to pick up the new code: `tj stop && tj serve &`
+2. DB migrations apply automatically on the next `tj` invocation — no manual step required
+3. Verify with `tj --version`
+PyPI's CDN occasionally lags ~1–2 min after a release. If `pipx upgrade` reports "already at the latest version" but the reported `tj --version` is older than what's on the [releases page](https://github.com/Metabuilder-Labs/tokenjam/releases), wait a minute and retry.
 ## TypeScript SDK
 ```bash

tokenjam-0.4.0/docs/internal/lens-vendor-versions.md ADDED

@@ -0,0 +1,22 @@
+# Vendored front-end libraries
+The local web UI (`tokenjam/ui/index.html`) is offline-first (CLAUDE.md Critical
+Rule 18): every JS/CSS dependency is vendored under `tokenjam/ui/vendor/` and
+served by the FastAPI `/ui/vendor` StaticFiles mount. No CDN loads at render
+time. One line per vendored library below.
+| Library | Version | Files | License | Notes |
+|---|---|---|---|---|
+| Preact | (as vendored) | `preact.js`, `preact-hooks.js` | MIT | ESM, via importmap |
+| htm | (as vendored) | `htm.js` | Apache-2.0 | ESM, via importmap |
+| uPlot | 1.6.32 | `uplot.js`, `uplot.css` | MIT | IIFE global `uPlot`, loaded via plain `<script>` (issue #112) |
+## Bump procedure
+1. Download the new release's `dist/` files from the upstream repo
+   (uPlot: `dist/uPlot.iife.min.js` + `dist/uPlot.min.css`).
+2. Replace the file under `tokenjam/ui/vendor/`, keeping the version-pin header
+   comment at the top.
+3. Update the version in the table above.
+4. Run `pytest tests/unit/test_ui_offline.py` — it asserts no render-time
+   external URLs and that the vendored files exist and ship in the wheel.
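
The offline rule checked in step 4 (no render-time external URLs, with clickable `<a href>` links exempt) can be approximated with the standard-library HTML parser. This is a hedged sketch of the idea only: it is not the contents of `tests/unit/test_ui_offline.py`, the names `ExternalLoadFinder` and `find_external_loads` are hypothetical, and it does not cover CSS `url()` references or inline styles.

```python
from html.parser import HTMLParser


class ExternalLoadFinder(HTMLParser):
    """Collect http(s) URLs in attributes that are fetched at render time."""

    def __init__(self):
        super().__init__()
        self.external = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name not in ("src", "href") or not value:
                continue
            if tag == "a" and name == "href":
                continue  # clickable link: only fetched when the user clicks
            if value.startswith(("http://", "https://", "//")):
                self.external.append((tag, value))


def find_external_loads(html: str):
    """Return (tag, url) pairs for every render-time external reference."""
    finder = ExternalLoadFinder()
    finder.feed(html)
    return finder.external
```

Relative paths such as `/ui/vendor/uplot.js` and `data:` URLs pass this check, while a CDN stylesheet or script is reported.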

{tokenjam-0.3.4 → tokenjam-0.4.0}/pyproject.toml RENAMED

@@ -4,7 +4,7 @@ build-backend = "hatchling.build"
 [project]
 name = "tokenjam"
-version = "0.3.4"
+version = "0.4.0"
 description = "TokenJam — local-first OTel-native observability for Autonomous AI agents"
 readme = "README.md"
 requires-python = ">=3.10"
@@ -41,6 +41,11 @@ dependencies = [
     "httpx>=0.27",
     "apscheduler>=3.10",
     "websockets>=12.0",
+    # fastmcp ships in the base install (was in the [mcp] extra) so `tj mcp`
+    # works on a fresh `pipx install tokenjam` without requiring users to
+    # remember the extra. Claude Code's MCP integration is now a primary
+    # use case rather than an opt-in. Issue #101.
+    "fastmcp>=0.2",
 ]
 [project.urls]
@@ -54,7 +59,10 @@ crewai    = ["crewai>=0.28"]
 autogen   = ["pyautogen>=0.2"]
 litellm   = ["litellm>=1.40"]
 dev       = ["pytest", "pytest-asyncio", "httpx", "ruff", "mypy"]
-mcp       = ["fastmcp"]
+# Kept as a no-op extra for back-compat — `pipx install 'tokenjam[mcp]'` still
+# works, just installs the same fastmcp that's now in the base dependencies.
+# Documented in `docs/installation.md` so users know they no longer need it.
+mcp       = []
 # Trim analyzer (`tj optimize --finding prompt-bloat`). LLMLingua-2 pulls in
 # PyTorch and transformers, ~2GB total. Kept optional so the base install
 # stays small — most users don't run the bloat analyzer.

{tokenjam-0.3.4 → tokenjam-0.4.0}/sdk-ts/package.json RENAMED

@@ -1,6 +1,6 @@
 {
   "name": "@tokenjam/sdk",
-  "version": "0.3.4",
+  "version": "0.4.0",
   "description": "TypeScript SDK for TokenJam — local-first observability for AI agents",
   "main": "dist/index.js",
   "types": "dist/index.d.ts",

{tokenjam-0.3.4 → tokenjam-0.4.0}/tests/integration/test_api.py RENAMED

@@ -238,6 +238,154 @@ async def test_get_cost_returns_aggregated_rows(client):
     assert "total_cost_usd" in data
+async def test_trace_detail_includes_cache_write_tokens(db, client):
+    """The traces API exposes cache_write_tokens so per-span cost reconciles
+    from the displayed columns (#17 — it was the ~91% cost driver, hidden)."""
+    sp = make_llm_span(agent_id="a", model="claude-opus-4-8", provider="anthropic",
+                       input_tokens=2, output_tokens=465, cache_tokens=243597,
+                       cache_write_tokens=209000, cost_usd=1.4423)
+    db.insert_span(sp)
+    resp = await client.get(f"/api/v1/traces/{sp.trace_id}")
+    assert resp.status_code == 200
+    span = resp.json()["spans"][0]
+    assert span["cache_tokens"] == 243597
+    assert span["cache_write_tokens"] == 209000
+async def test_cost_rows_carry_cache_tokens(db, client):
+    """`/api/v1/cost` rows + totals include cache-read and cache-write (#17)."""
+    sp = make_llm_span(agent_id="a", model="claude-opus-4-8", provider="anthropic",
+                       input_tokens=2, output_tokens=465, cache_tokens=243597,
+                       cache_write_tokens=209000, cost_usd=1.4423)
+    db.insert_span(sp)
+    data = (await client.get("/api/v1/cost?group_by=model")).json()
+    assert data["rows"][0]["cache_tokens"] == 243597
+    assert data["rows"][0]["cache_write_tokens"] == 209000
+    assert data["total_cache_write_tokens"] == 209000
+async def test_cost_includes_window_series_for_chart(client):
+    """/api/v1/cost carries a window-bucketed series (per bucket+agent+model)
+    plus the bucket size and window bounds so the chart can span the full
+    selected window with zero-fill (#113/#133)."""
+    await _ingest_sample_span(client)
+    resp = await client.get("/api/v1/cost", params={"since": "7d", "group_by": "day"})
+    assert resp.status_code == 200
+    data = resp.json()
+    assert "series" in data and isinstance(data["series"], list)
+    assert data["series_bucket"] in ("hour", "day")
+    assert isinstance(data["window_start"], int) and isinstance(data["window_end"], int)
+    assert data["window_end"] >= data["window_start"]
+    if data["series"]:
+        item = data["series"][0]
+        assert {"bucket", "agent_id", "model", "cost_usd",
+                "input_tokens", "output_tokens"} <= set(item)
+        assert isinstance(item["bucket"], int)  # epoch seconds
+async def test_cost_series_buckets_hourly_for_short_window(client):
+    """A ≤2-day window buckets hourly so 24h renders with hourly ticks (#133)."""
+    await _ingest_sample_span(client)
+    resp = await client.get("/api/v1/cost", params={"since": "24h"})
+    assert resp.json()["series_bucket"] == "hour"
+_FRAMING_KEYS = {
+    "pricing_mode", "plan_tier", "plan_label", "plan_monthly_usd",
+    "subscription_share_pct", "api_share_pct", "display_rule", "qualifier_text",
+}
+async def test_cost_response_includes_framing_block(client):
+    await _ingest_sample_span(client)
+    resp = await client.get("/api/v1/cost")
+    assert resp.status_code == 200
+    framing = resp.json()["framing"]
+    assert _FRAMING_KEYS <= set(framing)
+async def test_optimize_response_includes_framing_block(client):
+    await _ingest_sample_span(client)
+    resp = await client.get("/api/v1/optimize?since=30d")
+    assert resp.status_code == 200
+    data = resp.json()
+    if data.get("error") == "no_data":
+        pytest.skip("no spans landed for optimize in this fixture")
+    assert _FRAMING_KEYS <= set(data["framing"])
+async def test_optimize_response_always_carries_downgrade_key(client):
+    """The downsize typed slot must always be present (null when no candidates)
+    so the UI can always render a Downsize section (#126)."""
+    await _ingest_sample_span(client)
+    resp = await client.get("/api/v1/optimize?since=30d")
+    data = resp.json()
+    if data.get("error") == "no_data":
+        pytest.skip("no spans landed for optimize in this fixture")
+    assert "downgrade" in data  # present even when null
+async def test_root_serves_lens_title(client):
+    """Brand pass (#114): the served dashboard <title> is 'TokenJam Lens'."""
+    resp = await client.get("/")
+    assert resp.status_code == 200
+    assert "<title>TokenJam Lens</title>" in resp.text
+async def test_optimize_chain_framing_and_recoverable_fields(client):
+    """The framing block (#110) + per-finding recoverable fields (#111) are
+    both present on /api/v1/optimize — validates the chain Overview relies on."""
+    await _ingest_sample_span(client)
+    resp = await client.get("/api/v1/optimize?since=30d")
+    data = resp.json()
+    if data.get("error") == "no_data":
+        pytest.skip("no spans landed for optimize in this fixture")
+    assert _FRAMING_KEYS <= set(data["framing"])
+    # The savings analyzers carry the recoverable contract fields (#111).
+    # cache-recommend / budget-projection intentionally do not.
+    findings = data.get("findings") or {}
+    for name in ("cache", "script", "trim"):
+        if name in findings:
+            assert "estimated_recoverable_usd" in findings[name]
+            assert "estimate_basis" in findings[name]
+async def test_optimize_fast_skips_trim(client):
+    """fast=true skips the expensive Trim analyzer and reports it (#114)."""
+    await _ingest_sample_span(client)
+    resp = await client.get("/api/v1/optimize?since=30d&fast=true")
+    assert resp.status_code == 200
+    data = resp.json()
+    if data.get("error") == "no_data":
+        pytest.skip("no spans landed for optimize in this fixture")
+    assert "trim" in data.get("skipped_analyzers", [])
+    assert "trim" not in (data.get("findings") or {})
+async def test_budget_framing_reflects_configured_subscription_plan(db):
+    """The budget surface has no window, so framing falls back to the
+    declared plan in config (#110)."""
+    from tokenjam.core.config import ProviderBudget
+    cfg = TjConfig(
+        version="1",
+        security=SecurityConfig(ingest_secret=INGEST_SECRET),
+        api=ApiConfig(auth=ApiAuthConfig(enabled=False)),
+    )
+    cfg.budgets["anthropic"] = ProviderBudget(plan="max_5x")
+    pipeline = IngestPipeline(db=db, config=cfg)
+    app = create_app(config=cfg, db=db, ingest_pipeline=pipeline)
+    transport = httpx.ASGITransport(app=app)
+    async with httpx.AsyncClient(transport=transport, base_url="http://test") as c:
+        resp = await c.get("/api/v1/budget")
+    assert resp.status_code == 200
+    framing = resp.json()["framing"]
+    assert framing["pricing_mode"] == "subscription"
+    assert framing["plan_tier"] == "max_5x"
+    assert framing["plan_label"] == "Max 5x plan"
+    assert framing["display_rule"] == "suppress_dollars_for_subscription_share"
 async def test_get_alerts_returns_list(client):
     resp = await client.get("/api/v1/alerts")
     assert resp.status_code == 200
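
The series tests above imply two behaviours: a window of two days or less is bucketed hourly, and the chart series spans the whole window with zero-fill. The sketch below illustrates one way that could work, assuming epoch-second bucket keys as in the test. It is not the server's implementation; `pick_bucket` and `zero_fill` are hypothetical names, and the exact threshold and alignment rules are assumptions drawn from the test docstrings.

```python
HOUR = 3600
DAY = 86400


def pick_bucket(window_start: int, window_end: int) -> str:
    """Choose the bucket size for a window given in epoch seconds."""
    # Assumed threshold: windows of two days or less bucket hourly.
    return "hour" if window_end - window_start <= 2 * DAY else "day"


def zero_fill(points: dict, window_start: int, window_end: int, bucket: str) -> list:
    """Return (bucket_start, cost) pairs covering the window, 0.0 where empty.

    `points` maps bucket start (epoch seconds) to a cost in USD.
    """
    step = HOUR if bucket == "hour" else DAY
    # Align the first bucket to a step boundary (UTC epoch alignment assumed).
    first = window_start - window_start % step
    return [(t, points.get(t, 0.0)) for t in range(first, window_end, step)]
```

A client that receives `series_bucket`, `window_start`, and `window_end` alongside sparse rows could apply the same fill locally, which keeps the x-axis stable even when most buckets have no spend.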
