PyPI - tokenjam - Versions diffs - 0.3.4__tar.gz → 0.3.5__tar.gz - Mend

tokenjam 0.3.4tar.gz → 0.3.5tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (273) hide show

{tokenjam-0.3.4 → tokenjam-0.3.5}/CLAUDE.md RENAMED Viewed

@@ -4,7 +4,7 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
 ## Project Overview
-`tj` (TokenJam) is a local-first, OTel-native observability CLI for AI agents. No cloud backend, no signup. It captures telemetry from agent runtimes, stores it in a local DuckDB database, and exposes a CLI + local REST API for querying. Install via `pip install tokenjam`, run via `tj <subcommand>`. Requires Python >=3.10.
+`tj` (TokenJam) is a local-first, OTel-native **cost-optimization layer** for AI agents (with a full observability stack underneath). No cloud backend, no signup. It captures telemetry from agent runtimes, stores it in a local DuckDB database, and runs four named analyzers (`downsize` / `cache` / `script` / `trim`) that surface cost-saving candidates from real usage — plus a CLI, local REST API, web UI, and MCP server for querying. Install via `pipx install tokenjam` (recommended — sidesteps PEP 668 on Homebrew Python and Debian 12+/Ubuntu 24+) or `pip install tokenjam` in a venv. Run via `tj <subcommand>`. Requires Python >=3.10.
 ## Build & Development
@@ -61,10 +61,17 @@ Post-ingest hooks run synchronously after each span is written to DB:
 - **`tokenjam/core/db.py`**: `StorageBackend` protocol + `DuckDBBackend` + `InMemoryBackend` (for tests) + migration runner. Migrations are `(version, sql)` tuples in a `MIGRATIONS` list — never modify existing ones, only append. **Note:** `StorageBackend` doesn't cover every query. Some callers (e.g. `CostEngine`, `cmd_status`) access `db.conn` directly for queries not in the protocol (cost updates, active session lookups). Helper `_row_to_session()` is used to convert raw DuckDB rows.
 - **`tokenjam/core/ingest.py`**: `IngestPipeline` (central hub), `SpanSanitizer` (rejects oversized/malformed spans), `strip_captured_content()`. Post-ingest hooks (cost, alerts, schema) are optional and error-tolerant — hook failures are logged, never propagated.
 - **`tokenjam/core/pricing.py`**: `ModelRates` (frozen dataclass), `load_pricing_table()` (LRU-cached), `get_rates(provider, model)`. Falls back to default rates for unknown models.
-- **`tokenjam/core/cost.py`**: `calculate_cost()` (pure function, rounds to 8dp) + `CostEngine` (post-ingest hook that updates `spans.cost_usd` and `sessions.total_cost_usd` via `db.conn` — see db.py note). Pricing loaded from `pricing/models.toml`.
+- **`tokenjam/core/cost.py`**: `calculate_cost()` (pure function, rounds to 8dp) + `CostEngine` (post-ingest hook that updates `spans.cost_usd` and `sessions.total_cost_usd` via `db.conn` — see db.py note). Pricing loaded from `tokenjam/pricing/models.toml`. **Cache-read vs cache-write are separate fields** on `NormalizedSpan` (`cache_tokens` = read, `cache_write_tokens` = create); they bill at different rates and `calculate_cost` charges each at its own rate. The early-return no-op guard checks all four token counts (input/output/cache_read/cache_write) — see PR #90 and PR #92 for the cache-only-span and cache-write-on-live-path fixes.
 - **`tokenjam/core/alerts.py`**: `AlertEngine` with 13 alert types, `CooldownTracker` (in-memory, per agent+type, resets on restart), `AlertDispatcher` routing to 6 channel types (stdout, file, ntfy, webhook, Discord, Telegram). `AlertEngine.fire()` is the external entry point for other modules (SchemaValidator, DriftDetector) to fire alerts. Suppressed alerts are still persisted to DB but not dispatched to channels. Hardcoded thresholds: retry loop fires at 4+ identical tool calls in last 6 spans; failure rate fires at >20% errors in last 20 spans (checked every 5th error); session duration default 3600s. Stdout and file channels always include full detail regardless of `include_captured_content` config.
 - **`tokenjam/core/drift.py`**: `DriftDetector` — Z-score based behavioral drift detection, fires at session end.
-- **`tokenjam/core/optimize/`**: Package powering `tj optimize` and the `get_optimize_report` MCP tool. Public API re-exported from `__init__.py`: `build_report()` (orchestrator), `report_to_dict()`, `ANALYZER_REGISTRY`, `ANALYZER_ORDER`, plus result dataclasses. Architecture: `registry.py` holds the `@register("name")` decorator and `ANALYZER_REGISTRY` dict; `runner.py` defines `ANALYZER_ORDER` and orchestrates execution; `types.py` holds `AnalyzerContext` + result dataclasses + `MODEL_DOWNGRADE_CAVEAT`. Individual analyzers live in `analyzers/`, each as a single file registering via `@register`: `model_downgrade.py` (structural candidates — input < 5K tokens AND output < 500 tokens AND tool_calls ≤ 5; never claims quality equivalence, caveat baked into dataclass default), `budget_projection.py` (per-provider cycle spend vs `[budget.<provider>]` ceiling; only fires when budget > 0), `cache_efficacy.py`, `cache_recommend.py`, `prompt_bloat.py`, `workflow_restructure.py`. Analyzers receive an `AnalyzerContext` and operate on `db.conn` directly. To add a new analyzer: drop a file under `analyzers/`, decorate with `@register("name")`, append to `ANALYZER_ORDER` if ordering matters — `cmd_optimize --finding` choices auto-derive from the registry.
+- **`tokenjam/core/optimize/`**: Package powering `tj optimize` and the `get_optimize_report` MCP tool. Public API re-exported from `__init__.py`: `build_report()` (orchestrator), `report_to_dict()`, `ANALYZER_REGISTRY`, `ANALYZER_ORDER`, plus result dataclasses. Architecture: `registry.py` holds the `@register("name")` decorator and `ANALYZER_REGISTRY` dict; `runner.py` defines `ANALYZER_ORDER` and orchestrates execution; `types.py` holds `AnalyzerContext` + result dataclasses + `MODEL_DOWNGRADE_CAVEAT`. Individual analyzers live in `analyzers/`, each as a single file registering via `@register`. **Registry strings (the user-facing names) and file names are decoupled**:
+  - `model_downgrade.py` → `@register("downsize")` — structural candidates (input < 5K tokens AND output < 500 tokens AND tool_calls ≤ 5; never claims quality equivalence, caveat baked into dataclass default)
+  - `budget_projection.py` → `@register("budget-projection")` — per-provider cycle spend vs `[budget.<provider>]` ceiling; only fires when budget > 0
+  - `cache_efficacy.py` → `@register("cache")` — current cache-read efficacy per (provider, model)
+  - `cache_recommend.py` → `@register("cache-recommend")` — Anthropic-only structural prefix detection for `cache_control` placement
+  - `workflow_restructure.py` → `@register("script")` — `(tool_name, arg_shape)` cluster detection for deterministic-script candidates
+  - `prompt_bloat.py` → `@register("trim")` — LLMLingua-2 token-significance classification (requires `tokenjam[bloat]` extra)
+  Analyzers receive an `AnalyzerContext` and operate on `db.conn` directly. To add a new analyzer: drop a file under `analyzers/`, decorate with `@register("name")`, append to `ANALYZER_ORDER` if ordering matters — `cmd_optimize`'s positional `findings` Click choices auto-derive from the registry.
 - **`tokenjam/core/ingest_adapters/`**: Third-party trace-export adapters that normalize external payloads (`langfuse.py`, `helicone.py`, `otlp.py`) into `NormalizedSpan` for ingest. Each is reachable as a `tj backfill <name>` subcommand and accepts `--source-url` (live API) or `--source-file` (offline JSON dump). Adapters write deterministic span IDs derived from the source's identifiers so re-runs are idempotent. `otlp.py` shares span-mapping logic with the live `POST /api/v1/spans` route via `tokenjam/otel/otlp_parsing.py`.
 - **`tokenjam/core/export/`**: Routing-config snippet generators for `tj optimize --export-config`. Currently `claude_code.py` emits a JSONC fragment under a `tokenjam.routing_recommendations` namespace with honest-framing caveat comments baked in. Writes to `~/.config/tokenjam/exports/`; never touches `~/.claude/settings.json` or other external configs (no `--apply` flag — Claude Code doesn't currently honor TokenJam routing keys, so auto-writing would change nothing and erode trust).
 - **`tokenjam/core/backfill.py`**: Parses Claude Code on-disk session JSONL files into `NormalizedSpan`s. Cost is recomputed from `pricing/models.toml` because the on-disk format has no `cost_usd`. The parser tolerates the dated `claude-<family>-<ver>-YYYYMMDD` model-name suffixes Anthropic ships (handled by `core/pricing.py.get_rates()`, which strips the trailing 8-digit date suffix when no exact pricing match exists). Idempotency relies on deterministic span IDs derived from `(session_id, message uuid)` / `(session_id, tool_use id)`.
@@ -92,11 +99,12 @@ Post-ingest hooks run synchronously after each span is written to DB:
 - **`tj demo [scenario]`** (`cmd_demo.py`) — runs Agent Incident Library scenarios (zero-config, no API keys). `tj demo` lists all; `tj demo retry-loop` runs one.
 - **`tj doctor`** (`cmd_doctor.py`) — health checks (config, DB, secrets, webhooks, drift readiness, schema-vs-capture consistency). Exit 0 = ok, 1 = warnings, 2 = errors.
-- **`tj optimize`** (`cmd_optimize.py`) — six analyzers, registry-driven: `model-downgrade`, `budget-projection`, `cache-efficacy`, `cache-recommend`, `workflow-restructure`, `prompt-bloat`. Flags: `--since 30d`, `--finding <name>` (repeatable; choices auto-derive from `ANALYZER_REGISTRY` at click decoration time), `--budget <provider>`, `--budget-usd <amount>`, `--compare <period>` (window-cost diff vs prior period; accepts `previous` / `last-week` / `last-month` / `last-7d` / `last-30d` / `YYYY-MM-DD:YYYY-MM-DD`), `--export-config <target>` (writes a routing snippet — currently `claude-code` — under `~/.config/tokenjam/exports/`; no `--apply` flag by design). Plan-tier-aware rendering: subscription users see "implied API value" framing and token-share savings (never dollar "spend"); local users see token-only framing; unknown-plan users see dollar figures suppressed with a `tj onboard --reconfigure` hint. Opens the live DB read-only so it works alongside a running `tj serve`.
+- **`tj optimize`** (`cmd_optimize.py`) — six analyzers, registry-driven. **Analyzers are positional args** (not `--finding <name>`): `tj optimize downsize cache trim` runs three; bare `tj optimize` runs all. Registered names: `downsize`, `cache`, `cache-recommend`, `script`, `trim`, `budget-projection`. Flags: `--since 30d`, `--budget <provider>`, `--budget-usd <amount>`, `--compare <period>` (window-cost diff vs prior period; accepts `previous` / `last-week` / `last-month` / `last-7d` / `last-30d` / `YYYY-MM-DD:YYYY-MM-DD`), `--export-config <target>` (writes a routing snippet — currently `claude-code` — under `~/.config/tokenjam/exports/`; no `--apply` flag by design). Plan-tier-aware rendering: subscription users see "implied API value" framing and token-share savings (never dollar "spend"); local users see token-only framing; unknown-plan users see dollar figures suppressed with a `tj onboard --reconfigure` hint. Works alongside a running `tj serve` via the `/api/v1/optimize` HTTP fallback when the DuckDB write lock is held by the daemon.
+- **`tj tokenmaxx`** (`cmd_tokenmaxx.py`) — shareable spend-tier command. Reads last 30 days of usage, classifies into a 6-tier ladder (Sipper / Moderator / Maxxer / SuperMaxxer / MegaMaxxer / GigaMaxxer) using the multiplier vs the user's declared subscription plan as the primary classifier, with absolute USD/mo thresholds as the API-user fallback. Output is a bordered Panel designed for screenshotting. Plan-aware: shows the multiplier line only when the user has `[budget.<provider>] plan = "max_5x"` (or pro / max_20x / plus) configured. The companion landing page is `tokenjam.dev/tokenmaxxing`. Designed to never exit without an actionable next step — pairs the tier callout with the downsize savings figure inline.
 - **`tj cost`** (`cmd_cost.py`) — cost breakdown by `--group-by agent|model|day|tool`. Same `--compare <period>` flag as `tj optimize` for window-over-window diffs (▲/▼ indicators, per-agent and per-model top-shifts, dollar + token deltas).
 - **`tj backfill <source>`** (`cmd_backfill.py`) — ingest historical telemetry from external sources. Subcommands: `claude-code` (parses `~/.claude/projects/*.jsonl`, auto-invoked at the end of `tj onboard --claude-code`), `langfuse` (live API or JSON dump), `helicone` (live API or JSON dump), `otlp` (raw OTLP JSON via URL or file — reuses the same parser as the live `POST /api/v1/spans` route). All idempotent via deterministic span IDs.
 - **`tj onboard`** (`cmd_onboard.py`) — `--claude-code` and `--codex` flags trigger integration-specific flows. Prompts for plan tier (api / pro / max_5x / max_20x for Anthropic; api / plus / team / enterprise for OpenAI) and writes it to `[budget.<provider>] plan = "..."`. Supports `--reconfigure` to re-prompt against an existing config, and `--plan <tier>` for non-interactive use. Does NOT auto-write a default `usd = 200` cycle ceiling — subscription users get only the `plan` field; API users are explicitly asked whether they want a self-imposed ceiling.
-- **`tj report`** (`cmd_report.py`) — generates standalone HTML visualizations of analyzer findings (e.g. `tj report --bloat [<agent_id>]` renders the prompt-bloat analyzer's per-token significance). Writes to `~/.cache/tokenjam/reports/` (override via `TOKENJAM_REPORT_DIR`) and opens in the default browser.
+- **`tj report`** (`cmd_report.py`) — generates standalone HTML visualizations of analyzer findings. Currently `tj report --trim [<agent_id>]` renders the Trim analyzer's per-token significance (was `--bloat` pre-0.3.1, renamed alongside the analyzer's registry string). Writes to `~/.cache/tokenjam/reports/` (override via `TOKENJAM_REPORT_DIR`) and opens in the default browser.
 - **`tj policy list`** (`cmd_policy.py`) — read-only preview of the unified policy surface. Consolidates existing `[alerts]`, `[alerts.channels]`, `[defaults.budget]`, `[budget.<provider>]`, per-agent `budget`/`drift`/`sensitive_actions`/`output_schema`, and `[capture]` config into one table; each row carries its source TOML section. Supports `--json`. `tj policy add | edit | apply | remove | test` are intentionally absent this sprint — the unified config migration is next sprint's work. `policy` is in `no_db_commands` in `cli/main.py` so it doesn't open the DB. Rich source-section strings (`[budget.anthropic]`, `[[alerts.channels]]`) must be passed through `rich.markup.escape()` before rendering — otherwise Rich consumes them as style tags.
 All commands support `--json` for machine-readable output. Commands that query alerts use exit code 1 if active (unacknowledged, unsuppressed) alerts exist.
@@ -139,11 +147,13 @@ When a span has a `conversation_id` matching an existing session, it's attribute
 10. **Use semconv constants** — reference `GenAIAttributes` and `TjAttributes` from `tokenjam/otel/semconv.py` instead of hardcoding OTel attribute name strings.
 11. **OTel TracerProvider is global and set-once** — `trace.set_tracer_provider()` only works once per process. In tests, set the provider once at module level (not per-test in a fixture) and clear spans between tests. Use a custom `_CollectingExporter(SpanExporter)` since `InMemorySpanExporter` is not available in the installed OTel version. See `tests/agents/test_mock_scenarios.py` for the SDK test pattern and `tests/integration/test_full_pipeline.py` for the pipeline pattern.
 12. **New SDK integrations must call `ensure_initialised()`** — every `patch_*()` convenience function must call `from tokenjam.sdk.bootstrap import ensure_initialised; ensure_initialised()` before installing hooks. This lazily bootstraps the TracerProvider + IngestPipeline on first use.
-13. **PyPI package name is `tokenjam`, not `ocw`** — `pip install tokenjam` is the correct install command. The CLI command is `tj` and the Python package directory is `tokenjam/`. The published package name on PyPI is `tokenjam`. Never write `pip install ocw` in docs, examples, or comments.
-14. **`tj optimize` output must never claim quality equivalence** — the model-downgrade finding flags structural candidates only. Every user-visible string says "looks like" / "candidate" / "review before switching" — never "safe to downgrade" or "would have worked." The `MODEL_DOWNGRADE_CAVEAT` constant lives on `DowngradeFinding` as a dataclass default so it can't be removed by accident; it must also appear in human-readable CLI output. The same honesty discipline applies to all other analyzers — `cache-efficacy` ("you're getting X% of available caching"), `cache-recommend` (Anthropic-only, structural prefix detection), `workflow-restructure` ("structural shape matches", "review before replacing with a script"), `prompt-bloat` ("predicted low-significance regions; review before editing"). `tj optimize --export-config` snippets bake the caveat block into the JSONC output as comments.
+13. **PyPI package name is `tokenjam`, not `ocw`** — the package on PyPI is `tokenjam`. The CLI command is `tj`. The Python package directory is `tokenjam/`. **Recommended install: `pipx install tokenjam`** (sidesteps PEP 668 on Homebrew Python and Debian 12+/Ubuntu 24+). `pip install tokenjam` works inside a clean venv but fails on system Python with a misleading externally-managed-environment error. Never write `pip install ocw` in docs, examples, or comments.
+14. **`tj optimize` output must never claim quality equivalence** — the `downsize` finding flags structural candidates only. Every user-visible string says "looks like" / "candidate" / "review before switching" — never "safe to downgrade" or "would have worked." The `MODEL_DOWNGRADE_CAVEAT` constant lives on `DowngradeFinding` as a dataclass default so it can't be removed by accident; it must also appear in human-readable CLI output. The same honesty discipline applies to all other analyzers — `cache` ("you're getting X% of available caching"), `cache-recommend` (Anthropic-only, structural prefix detection), `script` ("structural shape matches", "review before replacing with a script"), `trim` ("predicted low-significance regions; review before editing"). `tj optimize --export-config` snippets bake the caveat block into the JSONC output as comments.
 15. **Version bump on release** — both `pyproject.toml` (`version = "X.Y.Z"`) and `sdk-ts/package.json` (`"version": "X.Y.Z"`) must be bumped to the new version before creating a GitHub release. The publish workflows (`publish-pypi.yml`, `publish-npm.yml`) trigger on `release published` events and will fail with 403 if the version already exists on PyPI/npm.
-16. **New optimize analyzers self-register** — drop a `.py` file under `tokenjam/core/optimize/analyzers/` with a function decorated `@register("name")` taking `AnalyzerContext`. Auto-discovery in `analyzers/__init__.py` walks the directory at import time. `cmd_optimize.py`'s `--finding` choices read from `ANALYZER_REGISTRY.keys()` at click decoration — no edits needed there. If your analyzer depends on (or is depended on by) another, append it to `ANALYZER_ORDER` in `runner.py` at the right position. Wave-2 analyzers attach their findings to `OptimizeReport.findings[name]` (generic dict); the older `model-downgrade` / `budget-projection` analyzers retain typed slots on `OptimizeReport` for backwards compat with `cmd_optimize` and the MCP server.
+16. **New optimize analyzers self-register** — drop a `.py` file under `tokenjam/core/optimize/analyzers/` with a function decorated `@register("name")` taking `AnalyzerContext`. Auto-discovery in `analyzers/__init__.py` walks the directory at import time. `cmd_optimize.py`'s positional `findings` Click choices read from `ANALYZER_REGISTRY.keys()` at decoration — no edits needed there. If your analyzer depends on (or is depended on by) another, append it to `ANALYZER_ORDER` in `runner.py` at the right position. Wave-2 analyzers attach their findings to `OptimizeReport.findings[name]` (generic dict); the older `downsize` (registered name; file is `model_downgrade.py`) and `budget-projection` analyzers retain typed slots on `OptimizeReport` for backwards compat with `cmd_optimize` and the MCP server.
 17. **OTLP parsing has one home** — `tokenjam/otel/otlp_parsing.py`. Both the live `POST /api/v1/spans` route and the `tj backfill otlp` adapter import `parse_otlp_span` and `extract_resource_attrs` from there. If you need to extend OTLP attribute extraction, do it once in that module; do not copy-paste into either caller.
+18. **Web UI must work fully offline** — `tokenjam/ui/index.html` is the served dashboard. It is intentionally a single-file SPA with **zero external HTTP loads at render time**. Preact + hooks + htm are vendored under `tokenjam/ui/vendor/` and wired via an `<script type="importmap">`; fonts use system-font fallbacks (no Google Fonts); the favicon is inlined as a `data:` URL. The FastAPI app mounts `/ui/vendor` as `StaticFiles`. The `tests/unit/test_ui_offline.py` regression test asserts no render-time external URLs exist anywhere outside `<a href>` (clickable links to github.com are fine — they only fetch on click). If you add a CDN font, script, or stylesheet, that test will fail. Vendor the asset locally instead. See issue #87 + PR #88.
+19. **Analyzer registry names ≠ file names** — registry strings (`downsize`, `cache`, `script`, `trim`) are decoupled from Python module filenames (`model_downgrade.py`, `cache_efficacy.py`, `workflow_restructure.py`, `prompt_bloat.py`). The 0.3.1 rename only changed `@register("...")` strings; file names stayed for git-blame continuity. When grepping for an analyzer, search both the registry string AND the older file-name keyword.
 ## Config
@@ -233,10 +243,11 @@ Key runtime dependency: `pytz` is required by DuckDB for `TIMESTAMPTZ` column ha
 - **[docs/installation.md](docs/installation.md)** — base install vs optional extras matrix. Documents `tokenjam[bloat]` (the ~2GB torch + transformers extra used by the Trim analyzer), framework adapter extras (`[langchain]` / `[crewai]` / `[autogen]` / `[litellm]`), and the MCP / dev extras.
 - **[docs/configuration.md](docs/configuration.md)** — full TOML config surface plus the "Content capture and privacy" section explaining the four `[capture]` toggles and how they interact with `alerts.include_captured_content`.
 - **Optimize product pages** — one per user-facing product, all under `docs/optimize/`:
-  - [`downsize.md`](docs/optimize/downsize.md) — model-downgrade candidate flagging (internal: `model-downgrade`)
-  - [`cache.md`](docs/optimize/cache.md) — `cache-efficacy` (current caching ratio) + `cache-recommend` (Anthropic-only breakpoint suggestions)
-  - [`script.md`](docs/optimize/script.md) — `workflow-restructure` clustering by `(tool_name, arg_shape)` signature
-  - [`trim.md`](docs/optimize/trim.md) — LLMLingua-2 token-significance classifier (`prompt-bloat`), install + capture requirements, performance numbers
+  - [`downsize.md`](docs/optimize/downsize.md) — cheaper-model candidate flagging (registry: `downsize`, file: `model_downgrade.py`)
+  - [`cache.md`](docs/optimize/cache.md) — `cache` (current caching ratio) + `cache-recommend` (Anthropic-only breakpoint suggestions)
+  - [`script.md`](docs/optimize/script.md) — `script` clustering by `(tool_name, arg_shape)` signature (file: `workflow_restructure.py`)
+  - [`trim.md`](docs/optimize/trim.md) — LLMLingua-2 token-significance classifier (`trim`, file: `prompt_bloat.py`), install + capture requirements, performance numbers
+- **[AGENTS.md](AGENTS.md)** — codebase conventions for contributors (referenced from the top-level README).
 - **Backfill adapters** — `docs/backfill/overview.md` lists the four sources (`claude-code` / `langfuse` / `helicone` / `otlp`) with the partnership-posture framing; per-adapter pages document modes (URL / file), field mapping, idempotency, and v1 limitations.
 - **[docs/policy/overview.md](docs/policy/overview.md)** — read-only preview of the unified policy surface (`tj policy list`). Notes that the `add` / `edit` / `apply` subcommands and the underlying `[policy]` config migration land next sprint.
 - **Internal specs** — `docs/internal/specs/` is reserved for canonical specs that production code references at long-term. Currently empty (sprint specs have been cleaned up after merge); add new ones here when a feature needs a stable, code-referenced source of truth.

{tokenjam-0.3.4 → tokenjam-0.3.5}/PKG-INFO RENAMED Viewed

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: tokenjam
-Version: 0.3.4
+Version: 0.3.5
 Summary: TokenJam — local-first OTel-native observability for Autonomous AI agents
 Project-URL: Homepage, https://opencla.watch
 Project-URL: Repository, https://github.com/Metabuilder-Labs/openclawwatch
@@ -23,6 +23,7 @@ Requires-Dist: apscheduler>=3.10
 Requires-Dist: click>=8.1
 Requires-Dist: duckdb>=0.10
 Requires-Dist: fastapi>=0.110
+Requires-Dist: fastmcp>=0.2
 Requires-Dist: genson>=1.2
 Requires-Dist: httpx>=0.27
 Requires-Dist: jsonschema>=4.0
@@ -53,7 +54,6 @@ Requires-Dist: langchain>=0.2; extra == 'langchain'
 Provides-Extra: litellm
 Requires-Dist: litellm>=1.40; extra == 'litellm'
 Provides-Extra: mcp
-Requires-Dist: fastmcp; extra == 'mcp'
 Description-Content-Type: text/markdown
 <div align="center">
@@ -154,6 +154,8 @@ tj onboard --claude-code
 tj optimize          # cost-saving candidates from your actual usage
 ```
+To upgrade later: `pipx upgrade tokenjam` (then `tj stop && tj serve &` to reload the daemon, and `tj --version` to verify). See [docs/installation.md](docs/installation.md#upgrading).
 For any Python agent:
 ```python

{tokenjam-0.3.4 → tokenjam-0.3.5}/README.md RENAMED Viewed

@@ -96,6 +96,8 @@ tj onboard --claude-code
 tj optimize          # cost-saving candidates from your actual usage
 ```
+To upgrade later: `pipx upgrade tokenjam` (then `tj stop && tj serve &` to reload the daemon, and `tj --version` to verify). See [docs/installation.md](docs/installation.md#upgrading).
 For any Python agent:
 ```python

{tokenjam-0.3.4 → tokenjam-0.3.5}/docs/installation.md RENAMED Viewed

@@ -74,6 +74,21 @@ If you run `tj optimize trim` without the extra installed, the analyzer self-reg
 See [`docs/optimize/trim.md`](optimize/trim.md) for performance numbers, capture requirements, and what the analyzer actually reports.
+## Upgrading
+```bash
+pipx upgrade tokenjam          # if you installed via pipx (recommended)
+pip install --upgrade tokenjam # if you're in a pip + venv setup
+```
+After upgrading:
+1. Restart the daemon to pick up the new code: `tj stop && tj serve &`
+2. DB migrations apply automatically on the next `tj` invocation — no manual step required
+3. Verify with `tj --version`
+PyPI's CDN occasionally lags ~1–2 min after a release. If `pipx upgrade` reports "already at the latest version" but the reported `tj --version` is older than what's on the [releases page](https://github.com/Metabuilder-Labs/tokenjam/releases), wait a minute and retry.
 ## TypeScript SDK
 ```bash

{tokenjam-0.3.4 → tokenjam-0.3.5}/pyproject.toml RENAMED Viewed

@@ -4,7 +4,7 @@ build-backend = "hatchling.build"
 [project]
 name = "tokenjam"
-version = "0.3.4"
+version = "0.3.5"
 description = "TokenJam — local-first OTel-native observability for Autonomous AI agents"
 readme = "README.md"
 requires-python = ">=3.10"
@@ -41,6 +41,11 @@ dependencies = [
     "httpx>=0.27",
     "apscheduler>=3.10",
     "websockets>=12.0",
+    # fastmcp ships in the base install (was in the [mcp] extra) so `tj mcp`
+    # works on a fresh `pipx install tokenjam` without requiring users to
+    # remember the extra. Claude Code's MCP integration is now a primary
+    # use case rather than an opt-in. Issue #101.
+    "fastmcp>=0.2",
 ]
 [project.urls]
@@ -54,7 +59,10 @@ crewai    = ["crewai>=0.28"]
 autogen   = ["pyautogen>=0.2"]
 litellm   = ["litellm>=1.40"]
 dev       = ["pytest", "pytest-asyncio", "httpx", "ruff", "mypy"]
-mcp       = ["fastmcp"]
+# Kept as a no-op extra for back-compat — `pipx install 'tokenjam[mcp]'` still
+# works, just installs the same fastmcp that's now in the base dependencies.
+# Documented in `docs/installation.md` so users know they no longer need it.
+mcp       = []
 # Trim analyzer (`tj optimize --finding prompt-bloat`). LLMLingua-2 pulls in
 # PyTorch and transformers, ~2GB total. Kept optional so the base install
 # stays small — most users don't run the bloat analyzer.

{tokenjam-0.3.4 → tokenjam-0.3.5}/sdk-ts/package.json RENAMED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "@tokenjam/sdk",
-  "version": "0.3.4",
+  "version": "0.3.5",
   "description": "TypeScript SDK for TokenJam — local-first observability for AI agents",
   "main": "dist/index.js",
   "types": "dist/index.d.ts",

{tokenjam-0.3.4 → tokenjam-0.3.5}/tests/manual-new-release-tests.md RENAMED Viewed

@@ -13,8 +13,15 @@ Run this after a new release publishes to PyPI to verify it works end-to-end. Th
 tj uninstall --yes 2>/dev/null
 rm -rf ~/.tj ~/.config/tj .tj
-pip3 install --upgrade tokenjam
+# Recommended install path (PEP 668-safe on Homebrew Python and
+# Debian 12+/Ubuntu 24+). `--force` so we reinstall even if a prior
+# version is present.
+pipx install --force tokenjam
 tj --version
+# Older `pip3 install --upgrade tokenjam` path still works inside
+# a clean venv but fails on system Python — that's the bug pipx
+# solves, and verifying pipx is what we ship docs telling users to do.
 ```
 **Pass criteria:** version matches the release being tested.
@@ -66,6 +73,32 @@ tj optimize --json | python3 -c \
 **Pass criteria:** every positional analyzer name runs without crashing. Optional analyzers (`cache-recommend`, `trim`) surface clear hints when their prereqs aren't met instead of erroring.
+## 4b. TokenMaxx tier classification
+```bash
+tj tokenmaxx
+# [ ] Bordered "TokenJam TokenMaxxing Report" panel renders
+# [ ] On api plan: shows absolute spend; no multiplier line
+# [ ] Action line surfaces either downsize savings or "no obvious
+#     savings flagged yet" (both are valid)
+# Verify the JSON tier label is one of the six valid v0.3.4 tiers.
+tj tokenmaxx --json | python3 -c \
+  "import json,sys;d=json.load(sys.stdin);ok={'TokenSipper','TokenModerator','TokenMaxxer','TokenSuperMaxxer','TokenMegaMaxxer','TokenGigaMaxxer'};assert d['tier'] in ok,d['tier'];print('ok:',d['tier'])"
+# Reconfigure to a subscription plan and re-run — the multiplier line
+# should appear. Pick whichever plan matches your test config.
+tj onboard --claude-code --reconfigure --plan max_5x
+tj tokenmaxx
+# [ ] Multiplier line "That's N× your Max 5x plan cost ($100/mo flat)."
+# [ ] Tier may shift if the multiplier crosses a boundary
+# Flip back to api so subsequent steps render dollar figures.
+tj onboard --claude-code --reconfigure --plan api
+```
+**Pass criteria:** the report renders without crashing, the JSON `tier` field carries one of the 6 v0.3.4 tier names, and the multiplier line appears under a subscription plan.
 ## 5. Backfill adapters (smoke against committed fixtures)
 ```bash
@@ -123,10 +156,45 @@ Spot-check:
 - [ ] Cost page shows non-zero USD values
 - [ ] Sidebar theme toggle works
+### Offline-UI verification (v0.3.4 — PR #88)
+Open Chrome DevTools (or your browser's equivalent) → **Network tab** → reload `http://127.0.0.1:7391/`.
+- [ ] **Zero failed requests** to `fonts.googleapis.com`, `fonts.gstatic.com`, `esm.sh`, or `tokenjam.dev`
+- [ ] Dashboard interactivity works (sidebar nav, tab switches) — proves the vendored Preact / htm under `/ui/vendor/` is being served, not loading from the CDN
+- [ ] Favicon renders (data: URL, no external fetch)
+Bonus: turn off wifi entirely, hard-refresh, and confirm the page still renders + the JS still hydrates. The whole dashboard must work air-gapped.
 ```bash
 tj stop
 ```
+## 9. Cache cost-correctness (v0.3.4 — PRs #90 + #92)
+Cache-only spans (cache_read > 0, input/output = 0) used to be costed at $0. Cache-creation tokens on the live OTLP path used to be silently dropped. Both fixed in v0.3.4.
+```bash
+# Spans table now has cache_write_tokens (migration 5).
+duckdb ~/.tj/telemetry.duckdb "PRAGMA table_info(spans)" 2>/dev/null \
+  | grep cache_write_tokens \
+  && echo "ok: cache_write_tokens column present"
+# Any captured Anthropic cache-hit span should have non-zero cost_usd.
+duckdb ~/.tj/telemetry.duckdb "
+  SELECT COUNT(*) AS hits,
+         MIN(cost_usd) AS min_cost
+  FROM spans
+  WHERE cache_tokens > 0
+    AND (input_tokens = 0 OR input_tokens IS NULL)
+    AND (output_tokens = 0 OR output_tokens IS NULL)
+" 2>/dev/null
+# [ ] If hits > 0: min_cost > 0 (cache hits ARE being costed; was $0 pre-0.3.4)
+# [ ] If hits = 0: this release's runs didn't trigger a pure cache-only span — fine, unit tests cover the path
+```
+If you don't have `duckdb` CLI installed, skip the SQL checks — the unit + synthetic tests covering these paths run in CI and are the canonical verification.
 ---
 ## Claude Code integration (smoke)
@@ -166,14 +234,16 @@ tj stop
 | Step | Pass criteria |
 |------|--------------|
-| 1 | `pip install --upgrade tokenjam` succeeds, version matches release |
+| 1 | `pipx install --force tokenjam` succeeds, version matches release |
 | 2 | Onboard prompts for plan tier; config records it; no auto `usd = 200` written |
 | 3 | Example runs without DB-lock errors; CLI shows real USD values |
-| 4 | All four optimize analyzers run; caveat appears in downgrade JSON; `plan` + `pricing_mode` in JSON output |
+| 4 | All optimize analyzers run; caveat appears in downgrade JSON; `plan` + `pricing_mode` in JSON output |
+| 4b | `tj tokenmaxx` renders the bordered report panel; JSON `tier` is one of the 6 v0.3.4 tier names; subscription plan shows multiplier line |
 | 5 | All three backfill adapters ingest from fixtures; re-runs are idempotent |
 | 6 | `--compare previous` produces a diff report; `--export-config` writes a snippet with caveat comments |
 | 7 | `tj policy list` renders the unified table |
-| 8 | `tj serve` starts, web UI loads, HTTP fallback works while server holds lock |
+| 8 | `tj serve` starts, web UI loads, HTTP fallback works while server holds lock; **zero external requests in DevTools Network tab** (offline-UI fix shipped in v0.3.4) |
+| 9 | `cache_write_tokens` column present on the spans table (migration 5); cache-hit spans show non-zero cost_usd |
 | Claude Code | Onboard writes settings.json + projects.json; re-run is a no-op |
 | Codex | Onboard writes `[otel]` + `[mcp_servers.tj]` to codex config; secret synced |

{tokenjam-0.3.4 → tokenjam-0.3.5}/tests/manual-pre-release-testing.md RENAMED Viewed

@@ -176,6 +176,33 @@ tj optimize --json | python3 -c \
   "import json,sys;r=json.load(sys.stdin);assert 'plan' in r and 'pricing_mode' in r;print('ok: plan-tier metadata present')"
 ```
+### 6f. TokenMaxx (v0.3.4 — six-tier ladder)
+```bash
+tj tokenmaxx
+# [ ] Bordered "TokenJam TokenMaxxing Report" panel renders
+# [ ] On api plan: shows absolute spend; no multiplier line
+# [ ] `tj optimize` reference in the action line renders in green bold
+# [ ] Share-line is teal, mentions @tokenjamdev (NOT #tokenmaxx)
+# Tier label must be one of the v0.3.4 six.
+tj tokenmaxx --json | python3 -c \
+  "import json,sys;d=json.load(sys.stdin);ok={'TokenSipper','TokenModerator','TokenMaxxer','TokenSuperMaxxer','TokenMegaMaxxer','TokenGigaMaxxer'};assert d['tier'] in ok,d['tier'];print('ok:',d['tier'])"
+# Force a different tier by exercising a subscription plan (use whatever
+# plan matches your test data — heavy usage on a Pro plan will land you
+# higher up the ladder than the same usage on a Max-20x plan).
+tj onboard --claude-code --reconfigure --plan max_5x
+tj tokenmaxx
+# [ ] Output now shows "That's N× your Max 5x plan cost ($100/mo flat)."
+# [ ] Tier name follows the v0.3.4 ladder (Sipper / Moderator / Maxxer /
+#     SuperMaxxer / MegaMaxxer / GigaMaxxer)
+# [ ] At thresholds: 1× / 4× / 10× / 20× / 50× crosses tier boundaries
+# Reset to api before continuing
+tj onboard --claude-code --reconfigure --plan api
+```
 ## 7. Plan-tier-aware rendering
 Reconfigure to a subscription plan and re-run `tj optimize` — output should reframe.
@@ -317,6 +344,41 @@ Spot-check (don't repeat every theme/typography detail — those are one-time UI
 If any UI element regresses *visibly broken* relative to main, dig in. Otherwise move on.
+### Offline-UI verification (v0.3.4 — issue #87 / PR #88)
+The dashboard must work fully offline. The "local-first, no data egress" pitch breaks the moment a render-time external load happens.
+Open Chrome DevTools (or your browser's equivalent) → **Network tab** → reload `http://127.0.0.1:7391/`.
+- [ ] **Zero failed requests** to `fonts.googleapis.com`, `fonts.gstatic.com`, `esm.sh`, or `tokenjam.dev`
+- [ ] Dashboard interactivity works (sidebar nav, tab switches) — proves the vendored Preact / htm under `/ui/vendor/` is serving correctly
+- [ ] Favicon renders (data: URL, no external fetch)
+The `tests/unit/test_ui_offline.py` regression test pins this contract in CI, but a manual eyeball catches anything the regex assertions miss (e.g. background-image URLs in CSS, inline `fetch()` calls in JS).
+### Cache cost-correctness (v0.3.4 — PRs #90 + #92)
+```bash
+# Spans table has cache_write_tokens (migration 5).
+duckdb ~/.tj/telemetry.duckdb "PRAGMA table_info(spans)" 2>/dev/null \
+  | grep cache_write_tokens \
+  && echo "ok: cache_write_tokens column present"
+# Any Anthropic cache-hit span (cache_read>0, no input/output) is now
+# costed. Pre-0.3.4 these were dropped as no-ops and silently $0.
+duckdb ~/.tj/telemetry.duckdb "
+  SELECT COUNT(*) AS hits, MIN(cost_usd) AS min_cost
+  FROM spans
+  WHERE cache_tokens > 0
+    AND (input_tokens = 0 OR input_tokens IS NULL)
+    AND (output_tokens = 0 OR output_tokens IS NULL)
+" 2>/dev/null
+# [ ] If hits > 0: min_cost > 0
+# [ ] If hits = 0: this run didn't trigger a pure cache-only span — fine
+```
+If `duckdb` CLI isn't installed, skip — the unit + synthetic tests covering these paths run in CI.
 ## 14. Clean up
 ```bash
@@ -402,6 +464,7 @@ tj status && tj traces && tj cost --since 1h
 tj optimize           # all analyzers
 tj optimize downsize
 tj optimize cache
+tj tokenmaxx          # six-tier ladder
 tj backfill langfuse --source-file tests/fixtures/langfuse_real_response.json
 tj backfill helicone --source-file tests/fixtures/helicone_real_response.json
 tj backfill otlp --source-file tests/fixtures/otlp_sample.json

{tokenjam-0.3.4 → tokenjam-0.3.5}/tests/unit/test_cost.py RENAMED Viewed

@@ -2,6 +2,8 @@
 from __future__ import annotations
 import logging
+import pytest
 from tokenjam.core.cost import calculate_cost
 from tokenjam.core.pricing import load_pricing_table, get_rates
@@ -56,12 +58,60 @@ def test_calculate_cost_cache_read_only():
 def test_calculate_cost_unknown_model_uses_default(caplog):
+    # Use a unique provider/model so the dedupe set doesn't suppress this run.
     with caplog.at_level(logging.WARNING, logger="tokenjam.core.cost"):
-        cost = calculate_cost("unknown_provider", "unknown_model", 1_000_000, 1_000_000)
+        cost = calculate_cost("test_unknown_provider", "test_unknown_model", 1_000_000, 1_000_000)
     # Default rates: 0.50 input, 2.00 output per MTok
     # (1M/1M * 0.50) + (1M/1M * 2.00) = 2.50
     assert cost == 2.5
-    assert "No pricing data for unknown_provider/unknown_model" in caplog.text
+    assert "No pricing data for test_unknown_provider/test_unknown_model" in caplog.text
+def test_calculate_cost_unknown_model_warns_only_once_per_pair(caplog):
+    """Backfilling many spans of the same unknown model used to spam the
+    warning N times. Now it's emitted once per (provider, model) per
+    process. Issue #98."""
+    import tokenjam.core.cost as cost_mod
+    # Reset dedupe set so this test is isolated.
+    cost_mod._UNKNOWN_MODEL_WARNED.clear()
+    with caplog.at_level(logging.WARNING, logger="tokenjam.core.cost"):
+        for _ in range(5):
+            calculate_cost("test_provider_xyz", "test_model_xyz", 1000, 200)
+    # Exactly one warning, not five.
+    matching = [r for r in caplog.records if "test_provider_xyz/test_model_xyz" in r.message]
+    assert len(matching) == 1, f"expected 1 warning, got {len(matching)}"
+    # A DIFFERENT unknown model in the same process should still warn (once).
+    caplog.clear()
+    with caplog.at_level(logging.WARNING, logger="tokenjam.core.cost"):
+        for _ in range(3):
+            calculate_cost("test_provider_xyz", "different_model", 1000, 200)
+    matching = [r for r in caplog.records if "different_model" in r.message]
+    assert len(matching) == 1
+def test_deprecated_anthropic_base_models_are_priced():
+    """Dated variants (claude-sonnet-4-20250514, etc.) resolve via the
+    YYYYMMDD-stripping fallback to the deprecated base entries we added in
+    pricing/models.toml. Issue #98 — was previously falling through to
+    defaults and spamming warnings."""
+    # Sonnet 4 (deprecated): $3 / $15 per MTok
+    cost = calculate_cost("anthropic", "claude-sonnet-4-20250514", 1_000_000, 1_000_000)
+    assert cost == pytest.approx(18.0)  # 3 + 15
+    # Opus 4 (deprecated): $15 / $75 per MTok
+    cost = calculate_cost("anthropic", "claude-opus-4-20250514", 1_000_000, 1_000_000)
+    assert cost == pytest.approx(90.0)  # 15 + 75
+    # Opus 4.1 (deprecated): $15 / $75 per MTok
+    cost = calculate_cost("anthropic", "claude-opus-4-1-20250805", 1_000_000, 1_000_000)
+    assert cost == pytest.approx(90.0)
+    # Haiku 3.5 (retired): $0.80 / $4 per MTok
+    cost = calculate_cost("anthropic", "claude-haiku-3-5-20241022", 1_000_000, 1_000_000)
+    assert cost == pytest.approx(4.8)  # 0.8 + 4
 def test_calculate_cost_zero_tokens_returns_zero_no_warning(caplog):

tokenjam-0.3.5/tokenjam/__init__.py ADDED Viewed

@@ -0,0 +1,6 @@
+from importlib.metadata import PackageNotFoundError, version as _pkg_version
+try:
+    __version__ = _pkg_version("tokenjam")
+except PackageNotFoundError:
+    __version__ = "0.0.0+unknown"

{tokenjam-0.3.4 → tokenjam-0.3.5}/tokenjam/api/app.py RENAMED Viewed

@@ -75,6 +75,7 @@ def create_app(
     from tokenjam.api.routes.agents import router as agents_router
     from tokenjam.api.routes.optimize import router as optimize_router
     from tokenjam.api.routes.cost_compare import router as cost_compare_router
+    from tokenjam.api.routes.version import router as version_router, health_router
     app.include_router(spans_router, prefix="/api/v1")
     app.include_router(traces_router, prefix="/api/v1")
@@ -87,6 +88,8 @@ def create_app(
     app.include_router(agents_router, prefix="/api/v1")
     app.include_router(optimize_router, prefix="/api/v1")
     app.include_router(cost_compare_router, prefix="/api/v1")
+    app.include_router(version_router, prefix="/api/v1")
+    app.include_router(health_router)  # /health — no prefix, for uptime probes
     app.include_router(metrics_router)  # /metrics — no prefix
     app.include_router(otlp_router)  # /v1/traces, /v1/metrics, /v1/logs — no prefix

tokenjam-0.3.5/tokenjam/api/routes/version.py ADDED Viewed

@@ -0,0 +1,20 @@
+"""GET /api/v1/version — package version, used by the UI footer.
+GET /health — process liveness probe (alias for uptime tooling, no prefix)."""
+from __future__ import annotations
+from fastapi import APIRouter
+from tokenjam import __version__
+router = APIRouter()
+health_router = APIRouter()
+@router.get("/version")
+async def get_version() -> dict:
+    return {"version": __version__}
+@health_router.get("/health")
+async def get_health() -> dict:
+    return {"status": "ok", "version": __version__}

{tokenjam-0.3.4 → tokenjam-0.3.5}/tokenjam/cli/cmd_report.py RENAMED Viewed

@@ -19,6 +19,7 @@ from datetime import datetime, timezone
 from pathlib import Path
 import click
+from rich.markup import escape as _rich_escape
 from tokenjam.utils.formatting import console
@@ -82,7 +83,7 @@ def _render_trim_report(
     if not finding.enabled:
         # Show the hint inline rather than producing an empty HTML file.
-        console.print(f"[yellow]Trim analyzer not ready:[/yellow]\n{finding.hint}")
+        console.print(f"[yellow]Trim analyzer not ready:[/yellow]\n{_rich_escape(finding.hint)}")
         return
     if not finding.per_prompt:

{tokenjam-0.3.4 → tokenjam-0.3.5}/tokenjam/cli/cmd_tokenmaxx.py RENAMED Viewed

@@ -193,12 +193,41 @@ def _fetch(db, config, since_dt, until_dt, since_str) -> tuple[float, float, int
 # ───────────────────────────── helpers ────────────────────────────────────
 def _config_declared_plan(config) -> str | None:
-    """Mirror cmd_optimize._config_declared_plan."""
+    """Return the user's declared subscription plan tier.
+    Checks the active config first; if no `[budget.<provider>].plan` is set
+    (common when running from a project dir whose `.tj/config.toml` has no
+    `[budget]` section), falls back to peeking at the global config at
+    `~/.config/tj/config.toml`. Without this fallback, tokenmaxx silently
+    rendered api-pricing framing in subdirectories even when the user had
+    set their plan globally via `tj onboard`. Issue #106.
+    """
     budgets = getattr(config, "budgets", None) or {}
     for provider in sorted(budgets.keys()):
         plan = getattr(budgets[provider], "plan", None)
         if plan:
             return str(plan)
+    # Active config has no plan — peek at the global config file directly.
+    try:
+        import sys
+        from pathlib import Path
+        if sys.version_info >= (3, 11):
+            import tomllib
+        else:
+            import tomli as tomllib  # type: ignore[no-redef]
+        global_path = Path.home() / ".config" / "tj" / "config.toml"
+        if not global_path.exists():
+            return None
+        with open(global_path, "rb") as f:
+            raw = tomllib.load(f)
+        budget_block = raw.get("budget") or {}
+        for provider in sorted(budget_block.keys()):
+            plan = (budget_block[provider] or {}).get("plan")
+            if plan:
+                return str(plan)
+    except (OSError, Exception):  # noqa: BLE001
+        return None
     return None

{tokenjam-0.3.4 → tokenjam-0.3.5}/tokenjam/cli/main.py RENAMED Viewed

@@ -3,7 +3,11 @@ from tokenjam.core.config import load_config
 from tokenjam.core.db import open_db
-@click.group()
+@click.group(
+    epilog="Upgrade with: pipx upgrade tokenjam "
+           "(then `tj stop && tj serve &` to reload the daemon). "
+           "Verify with `tj --version`.",
+)
 @click.version_option(package_name="tokenjam")
 @click.option("--config", "config_path", default=None, envvar="TJ_CONFIG",
               help="Config file path (default: auto-discover)")

tokenjam 0.3.4__tar.gz → 0.3.5__tar.gz

tokenjam 0.3.4tar.gz → 0.3.5tar.gz