PyPI - tokenjam - Versions diffs - 0.2.2__tar.gz → 0.2.3__tar.gz - Mend

tokenjam 0.2.2.tar.gz → 0.2.3.tar.gz


Files changed (215)

{tokenjam-0.2.2 → tokenjam-0.2.3}/CLAUDE.md RENAMED

@@ -92,6 +92,8 @@ Post-ingest hooks run synchronously after each span is written to DB:
 - **`tokenjam/core/cost.py`**: `calculate_cost()` (pure function, rounds to 8dp) + `CostEngine` (post-ingest hook that updates `spans.cost_usd` and `sessions.total_cost_usd` via `db.conn` — see db.py note). Pricing loaded from `pricing/models.toml`.
 - **`tokenjam/core/alerts.py`**: `AlertEngine` with 13 alert types, `CooldownTracker` (in-memory, per agent+type, resets on restart), `AlertDispatcher` routing to 6 channel types (stdout, file, ntfy, webhook, Discord, Telegram). `AlertEngine.fire()` is the external entry point for other modules (SchemaValidator, DriftDetector) to fire alerts. Suppressed alerts are still persisted to DB but not dispatched to channels. Hardcoded thresholds: retry loop fires at 4+ identical tool calls in last 6 spans; failure rate fires at >20% errors in last 20 spans (checked every 5th error); session duration default 3600s. Stdout and file channels always include full detail regardless of `include_captured_content` config.
 - **`tokenjam/core/drift.py`**: `DriftDetector` — Z-score based behavioral drift detection, fires at session end.
+- **`tokenjam/core/optimize.py`**: Two analyzers used by `tj optimize` and the `get_optimize_report` MCP tool. `analyze_model_downgrade()` flags sessions whose structural shape (input < 5K tokens AND output < 500 tokens AND tool_calls ≤ 5) matches a class of work where a cheaper alternative model is worth reviewing — never claims quality equivalence. `MODEL_DOWNGRADE_CAVEAT` is in the dataclass default so it cannot be removed accidentally. `project_budget()` projects current cycle spend against a `[budget.<provider>]` ceiling; only fires when budget > 0. Both functions operate on `db.conn` directly.
+- **`tokenjam/core/backfill.py`**: Parses Claude Code on-disk session JSONL files into `NormalizedSpan`s. Cost is recomputed from `pricing/models.toml` because the on-disk format has no `cost_usd`. The parser tolerates the dated `claude-<family>-<ver>-YYYYMMDD` model-name suffixes Anthropic ships (handled by `core/pricing.py.get_rates()`, which strips the trailing 8-digit date suffix when no exact pricing match exists). Idempotency relies on deterministic span IDs derived from `(session_id, message uuid)` / `(session_id, tool_use id)`.
 - **`tokenjam/core/schema_validator.py`**: Validates tool outputs against declared or genson-inferred JSON Schema. Only fires on `gen_ai.tool.call` spans with `gen_ai.tool.output` in attributes. Schema priority: 1) declared file from agent config `output_schema`, 2) inferred schema from `DriftBaseline.output_schema_inferred`. Caches schemas in-memory per agent.
 - **`tokenjam/core/models.py`**: All domain dataclasses — `NormalizedSpan`, `SessionRecord`, `Alert`, `DriftBaseline`, filter types, etc.
 - **`tokenjam/core/config.py`**: `TjConfig` dataclass tree, TOML loading/writing, config file discovery.
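Note on the `backfill.py` entry above: the dated-suffix fallback it describes can be sketched as follows. This is a minimal illustration, not the code in `core/pricing.py`; the function name `lookup_rates`, the table layout, and the placeholder rate values are assumptions made for the example.

```python
import re

# Matches a trailing 8-digit date suffix such as "-20250101".
_DATE_SUFFIX = re.compile(r"-\d{8}$")


def lookup_rates(model, rates):
    """Return pricing for `model`, retrying without a trailing date suffix.

    Hypothetical helper: an exact match always wins; the suffix is only
    stripped when the full model name has no pricing entry.
    """
    if model in rates:
        return rates[model]
    base_name = _DATE_SUFFIX.sub("", model)
    return rates.get(base_name)  # None when neither form is priced


# Placeholder numbers, not real prices.
EXAMPLE_RATES = {"claude-opus-4-7": {"input": 1.0, "output": 2.0}}

print(lookup_rates("claude-opus-4-7-20250101", EXAMPLE_RATES))
```

Trying the exact name first means a dated model with its own pricing entry is never shadowed by the undated one.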
@@ -129,6 +131,8 @@ Post-ingest hooks run synchronously after each span is written to DB:
 | `tj mcp` | `cmd_mcp.py` | Start the stdio MCP server for Claude Code integration |
 | `tj uninstall` | `cmd_uninstall.py` | Remove all TokenJam data, config, and daemon |
 | `tj doctor` | `cmd_doctor.py` | Health checks (config, DB, secrets, webhooks, drift readiness, schema-vs-capture consistency). Exit 0 = ok, 1 = warnings, 2 = errors |
+| `tj optimize` | `cmd_optimize.py` | Two analyzers: model-downgrade candidates + per-provider budget projection. `--since 30d`, `--only model\|budget`, `--budget <provider>`, `--budget-usd <amount>`. JSON output supported. Opens the live DB read-only so it works alongside a running `tj serve`. |
+| `tj backfill claude-code` | `cmd_backfill.py` | Parse `~/.claude/projects/*.jsonl` and ingest historical sessions. Idempotent — deterministic span IDs (SHA-256 of `session_id + uuid`) mean re-runs skip already-ingested rows. Auto-invoked at the end of `tj onboard --claude-code`. Future agent log formats (Codex, etc.) plug in as additional subcommands. |
 All commands support `--json` for machine-readable output. Commands that query alerts use exit code 1 if active (unacknowledged, unsuppressed) alerts exist.
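Note on the `tj backfill claude-code` row above: the idempotency it relies on can be illustrated with a small sketch. The diff only states that span IDs are a SHA-256 of `session_id + uuid`; the `:` separator, the truncation to 16 hex characters, and the in-memory `store` dict are assumptions for this example, not the project's actual code.

```python
import hashlib


def deterministic_span_id(session_id, item_id):
    """Derive a stable span ID from a session ID and a message/tool-use ID.

    Assumed details: a ':' separator and truncation to 16 hex characters
    (the width of an OpenTelemetry span ID).
    """
    digest = hashlib.sha256(f"{session_id}:{item_id}".encode("utf-8")).hexdigest()
    return digest[:16]


def ingest(store, session_id, item_id, payload):
    """Insert a span unless its deterministic ID is already present."""
    span_id = deterministic_span_id(session_id, item_id)
    if span_id in store:
        return False  # already ingested, so a re-run skips it
    store[span_id] = payload
    return True


store = {}
first = ingest(store, "sess-1", "msg-a", {"tokens": 10})
second = ingest(store, "sess-1", "msg-a", {"tokens": 10})
print(first, second, len(store))
```

Because the ID depends only on the input identifiers, a second run over the same JSONL files produces the same keys and writes nothing new.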
@@ -167,12 +171,17 @@ When a span has a `conversation_id` matching an existing session, it's attribute
 11. **OTel TracerProvider is global and set-once** — `trace.set_tracer_provider()` only works once per process. In tests, set the provider once at module level (not per-test in a fixture) and clear spans between tests. Use a custom `_CollectingExporter(SpanExporter)` since `InMemorySpanExporter` is not available in the installed OTel version. See `tests/agents/test_mock_scenarios.py` for the SDK test pattern and `tests/integration/test_full_pipeline.py` for the pipeline pattern.
 12. **New SDK integrations must call `ensure_initialised()`** — every `patch_*()` convenience function must call `from tokenjam.sdk.bootstrap import ensure_initialised; ensure_initialised()` before installing hooks. This lazily bootstraps the TracerProvider + IngestPipeline on first use.
 13. **PyPI package name is `tokenjam`, not `ocw`** — `pip install tokenjam` is the correct install command. The CLI command is `tj` and the Python package directory is `tokenjam/`. The published package name on PyPI is `tokenjam`. Never write `pip install ocw` in docs, examples, or comments.
-14. **Version bump on release** — both `pyproject.toml` (`version = "X.Y.Z"`) and `sdk-ts/package.json` (`"version": "X.Y.Z"`) must be bumped to the new version before creating a GitHub release. The publish workflows (`publish-pypi.yml`, `publish-npm.yml`) trigger on `release published` events and will fail with 403 if the version already exists on PyPI/npm.
+14. **`tj optimize` output must never claim quality equivalence** — the model-downgrade finding flags structural candidates only. Every user-visible string says "looks like" / "candidate" / "review before switching" — never "safe to downgrade" or "would have worked." The `MODEL_DOWNGRADE_CAVEAT` constant lives on `DowngradeFinding` as a dataclass default so it can't be removed by accident; it must also appear in human-readable CLI output. Equivalent honesty applies to future optimize analyzers (cache-opportunity, prompt-bloat).
+15. **Version bump on release** — both `pyproject.toml` (`version = "X.Y.Z"`) and `sdk-ts/package.json` (`"version": "X.Y.Z"`) must be bumped to the new version before creating a GitHub release. The publish workflows (`publish-pypi.yml`, `publish-npm.yml`) trigger on `release published` events and will fail with 403 if the version already exists on PyPI/npm.
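Note on the version-bump rule above: a pre-release check along these lines could catch a mismatch between the two manifests before the publish workflows run. It is a sketch only; the function names are hypothetical, and it assumes the version lives under `[project]` in `pyproject.toml` and at the top level of `sdk-ts/package.json`, as the hunks later in this diff suggest.

```python
import json

try:
    import tomllib  # standard library from Python 3.11
except ModuleNotFoundError:  # assumption: the tomli backport is installed on 3.10
    import tomli as tomllib


def release_versions(pyproject_text, package_json_text):
    """Return (python_version, npm_version) parsed from the two manifests."""
    python_version = tomllib.loads(pyproject_text)["project"]["version"]
    npm_version = json.loads(package_json_text)["version"]
    return python_version, npm_version


def versions_in_sync(pyproject_text, package_json_text):
    """True when both manifests declare the same version string."""
    python_version, npm_version = release_versions(pyproject_text, package_json_text)
    return python_version == npm_version


pyproject = '[project]\nname = "tokenjam"\nversion = "0.2.3"\n'
package_json = '{"name": "@tokenjam/sdk", "version": "0.2.3"}'
print(versions_in_sync(pyproject, package_json))
```

In a real checkout the two strings would be read from disk, for example with `pathlib.Path("pyproject.toml").read_text()`.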
 ## Config
 Config is TOML, discovered at: `tj.toml` -> `.tj/config.toml` -> `~/.config/tj/config.toml`. Override with `--config` or `TJ_CONFIG` env var. Full config hierarchy is in `tokenjam/core/config.py` (`TjConfig` dataclass).
+Two distinct budget concepts coexist — do not conflate:
+- **`[defaults.budget]` / `[agents.<id>.budget]`** (`daily_usd`, `session_usd`) — per-agent alert thresholds checked on every span by `AlertEngine`.
+- **`[budget.<provider>]`** (`usd`, `cycle_start_day`, `applies_to_services`) — periodic monthly ceilings used only by `tj optimize` projections. Read-only at projection time; no alerts fire from these. `tj onboard --claude-code` writes a default `[budget.anthropic] usd = 200` if no provider budget is configured. The analyzer scopes spend by `provider` column and (optionally) by `agent_id IN applies_to_services`.
 `tj onboard --claude-code` and `tj onboard --codex` always write to the **global** config (`~/.config/tj/config.toml`) regardless of cwd. This is intentional: each coding-agent integration reads one ingest secret from a single global location (`~/.claude/settings.json` or `~/.codex/config.toml`), and per-project configs would rotate that secret on every onboard, breaking auth for previously onboarded projects. Onboarded Claude Code project paths are tracked in `~/.config/tj/projects.json` for clean uninstall. Codex onboarding is fully project-agnostic — Codex hardcodes `service.name=codex_exec` in its binary, so there is one Codex agent ID for all projects.
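Note on the `[budget.<provider>]` ceilings described above: the diff does not show how `project_budget()` computes its projection, so the following is only a linear-extrapolation sketch under stated assumptions. The names `Projection` and `project_cycle` are hypothetical, as is the choice to extrapolate from the average daily spend so far in the cycle. The one behavior taken from the text is that nothing is projected unless the budget is greater than zero.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Projection:
    projected_total: float  # extrapolated spend for the whole cycle
    overage: float          # amount above the ceiling, 0.0 if within budget


def project_cycle(spend_so_far, days_elapsed, cycle_days, budget_usd) -> Optional[Projection]:
    """Extrapolate cycle spend linearly from the average daily spend so far."""
    if budget_usd <= 0:
        return None  # no ceiling configured, so nothing to project against
    if days_elapsed <= 0:
        raise ValueError("days_elapsed must be positive")
    daily_rate = spend_so_far / days_elapsed
    projected_total = daily_rate * cycle_days
    return Projection(projected_total, max(0.0, projected_total - budget_usd))


print(project_cycle(spend_so_far=50.0, days_elapsed=10, cycle_days=30, budget_usd=100.0))
```

A linear run rate is the simplest possible model; it ignores weekly usage patterns, which is one reason to treat the result as an estimate.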
 ## Daemon (launchd / systemd)

{tokenjam-0.2.2 → tokenjam-0.2.3}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: tokenjam
-Version: 0.2.2
+Version: 0.2.3
 Summary: TokenJam — local-first OTel-native observability for Autonomous AI agents
 Project-URL: Homepage, https://opencla.watch
 Project-URL: Repository, https://github.com/Metabuilder-Labs/openclawwatch
@@ -85,6 +85,46 @@ Your agent sends emails, writes files, calls APIs, and spends your money — all
 ## What you get
+**Cost optimization for Claude Code — out of the box.** Run `tj onboard --claude-code` and TokenJam reads your existing Claude Code session logs (up to 30 days, whatever your local retention has kept) so you can run `tj optimize` immediately:
+```
+$ tj optimize --agent claude-code-myproj
+Analyzing 39 sessions, 1.8M tokens, $160.3500 spend (last 30d,
+claude-code-myproj)…
+  ① Model downgrade: 13% of sessions match a smaller-model candidate shape
+     • 5 of 39 sessions matched structural heuristics
+     • Would have cost ~$0.0140 on the smaller model vs $2.2500 actual (in
+window)
+     • Projected savings if pattern holds: $2.2400/mo
+     • Pattern: claude-opus-4-7 → claude-haiku-4-5
+     Examples:
+       2cce7903..  2 tool calls   0.8s   $0.4500  (claude-opus-4-7)
+       e292ccbe..  2 tool calls   0.8s   $0.4500  (claude-opus-4-7)
+       d59cb502..  2 tool calls   0.8s   $0.4500  (claude-opus-4-7)
+     ! Candidate-flagging heuristic, not a quality judgment. Review the
+example sessions before changing models.
+  ② Budget projection (anthropic, $200.0000/cycle): comfortably within budget
+     Run rate $160.3500/mo — 19% of cycle budget unused.
+```
+Two analyzers reading the same spans you'd otherwise pay LangSmith to host: structural model-downgrade candidate flagging (never claims quality equivalence — surfaces examples to review) and per-provider monthly budget projection. Works with **any** agent already sending TokenJam data, not just Claude Code.
+Try a tighter budget to see the over-budget renderer:
+```
+$ tj optimize --budget anthropic --budget-usd 50
+  ② Budget projection (anthropic, $50.0000/cycle): projected to exceed cycle
+budget
+     • Monthly run rate: $160.3500 (3.2× the budget)
+     • At current pace, budget exhausted on 2026-05-15 (0.0 day(s) from now)
+     • Days remaining in cycle: 16
+     • Projected cycle total: $162.8700, overage: $112.8700
+```
 **Real-time cost tracking.** Every LLM call is priced as it happens — by agent, model, session, and tool. Budget alerts fire before you hit the limit, not after.
 **Safety alerts.** Configure any tool call as a sensitive action (`send_email`, `delete_file`, `submit_form`) and get notified instantly via ntfy, Discord, Telegram, webhook, or stdout.
@@ -108,10 +148,11 @@ For **Claude Code**, **Codex**, and any agent that already emits OpenTelemetry.
 ```bash
 pip install "tokenjam[mcp]"
 tj onboard --claude-code    # or: tj onboard --codex
-# Restart your coding agent
+tj optimize                 # see cost-saving candidates + budget projection
+# Restart your coding agent for live telemetry
 ```
-Every session, API call, tool use, and error is now a tracked span with cost and alert evaluation. The MCP server gives your coding agent 13 tools to query its own telemetry mid-session — just ask "how much have I spent today?" or "are there any active alerts?"
+`tj onboard --claude-code` auto-backfills your existing session logs from `~/.claude/projects/` so `tj optimize` works on the first run — no waiting for new data to accumulate. The MCP server gives your coding agent 14 tools to query its own telemetry mid-session — just ask "how much have I spent today?" or "where could I save money?"
 [Full Claude Code & Codex setup →](#claude-code--coding-agents)
@@ -194,9 +235,6 @@ export OTEL_EXPORTER_OTLP_ENDPOINT=http://127.0.0.1:7391
 ```
 tj status
-```
-```
 ● my-email-agent   completed   (2m 14s)
   Cost today:     $0.0340 / $5.0000 limit
@@ -207,17 +245,17 @@ tj status
   send_email called (sensitive action: critical)
 ```
-https://github.com/user-attachments/assets/b94d13f6-1432-40d4-b093-6958d74f0e65
 ```bash
-tj status           # current state, cost, active alerts
-tj traces           # full span history with waterfall view
-tj cost --since 7d  # cost breakdown by agent, model, day
-tj alerts           # everything that fired while you were away
-tj budget           # view and set daily/session cost limits
-tj drift            # behavioral drift Z-scores vs baseline
-tj tools            # tool call history with error rates
-tj serve            # start the web UI + REST API
+tj status              # current state, cost, active alerts
+tj traces              # full span history with waterfall view
+tj cost --since 7d     # cost breakdown by agent, model, day
+tj optimize            # cost-saving candidates + budget projection
+tj backfill claude-code  # ingest historical sessions from ~/.claude/projects/
+tj alerts              # everything that fired while you were away
+tj budget              # view and set daily/session cost limits
+tj drift               # behavioral drift Z-scores vs baseline
+tj tools               # tool call history with error rates
+tj serve               # start the web UI + REST API
 ```
 ---
@@ -226,8 +264,6 @@ tj serve            # start the web UI + REST API
 `tj serve` starts a local dashboard at `http://127.0.0.1:7391/`.
-https://github.com/user-attachments/assets/ff09caec-3487-4542-8628-d62b7d92591f
 - **Status** — agent overview with cost, tokens, tool calls, and active alerts
 - **Traces** — trace list with span waterfall visualization
 - **Cost** — breakdown by agent, model, day, or tool
@@ -237,6 +273,24 @@ https://github.com/user-attachments/assets/ff09caec-3487-4542-8628-d62b7d92591f
 No signup, no cloud — runs entirely on your machine.
+### Screenshots
+<table>
+<tr>
+<td width="50%"><strong>Status</strong> — agent overview with cost, tokens, tool calls, and active alerts.<br><br><img src="docs/screenshots/tj-status.png" alt="tj status page" /></td>
+<td width="50%"><strong>Traces</strong> — recent traces with cost, duration, and span count. Click a row for the waterfall view.<br><br><img src="docs/screenshots/tj-traces.png" alt="tj traces page" /></td>
+</tr>
+<tr>
+<td width="50%"><strong>Cost</strong> — spend broken down by day, agent, model, or tool.<br><br><img src="docs/screenshots/tj-cost.png" alt="tj cost page" /></td>
+<td width="50%"><strong>Alerts</strong> — full alert history with severity filter and inline detail expansion.<br><br><img src="docs/screenshots/tj-alerts.png" alt="tj alerts page" /></td>
+</tr>
+<tr>
+<td colspan="2"><strong>Budget</strong> — view and edit daily/per-session cost limits per agent, with recent budget alerts inline.<br><br><img src="docs/screenshots/tj-budget.png" alt="tj budget page" /></td>
+</tr>
+</table>
 ---
 ## tj vs LangSmith vs Langfuse
@@ -248,6 +302,7 @@ LangSmith and Langfuse are excellent for tracing LLM API calls and running evals
 | Signup required | ❌ | ✅ | ✅ | ✅ |
 | Data leaves your machine | ❌ | ✅ | cloud only | ✅ |
 | Real-time sensitive action alerts | ✅ | ❌ | ❌ | ❌ |
+| Model-downgrade cost recommendations | ✅ | ❌ | ❌ | ❌ |
 | Behavioral drift detection | ✅ | ❌ | ❌ | ❌ |
 | Local-first, no cloud required | ✅ | ❌ | self-host only | ❌ |
 | OTel GenAI SemConv native | ✅ | partial | partial | partial |
@@ -261,13 +316,13 @@ LangSmith and Langfuse are excellent for tracing LLM API calls and running evals
 ### Claude Code
-Monitor every Claude Code session — costs, tool calls, API requests, errors — with two commands:
+Monitor every Claude Code session and get cost-optimization recommendations from your existing usage in three commands:
 ```bash
 pip install "tokenjam[mcp]"
-tj onboard --claude-code
-# Restart Claude Code, then:
-tj status --agent claude-code-<project>
+tj onboard --claude-code   # auto-backfills your existing session logs
+tj optimize                # cost-saving candidates + budget projection
+# Then restart Claude Code so live telemetry starts flowing
 ```
 `tj onboard --claude-code` does everything in one shot:
@@ -277,9 +332,28 @@ tj status --agent claude-code-<project>
 - Registers the MCP server globally (`claude mcp add --scope user tj -- tj mcp`)
 - Installs a background daemon (launchd on macOS, systemd on Linux)
 - Adds Docker harness-compatible OTLP env vars to `~/.zshrc`
+- **Reads your existing `~/.claude/projects/*.jsonl` session logs** and ingests them into the local DB so `tj optimize` returns real numbers on first run (idempotent — safe to re-run)
+- Writes a sensible default `[budget.anthropic] usd = 200` for the budget projector to project against — edit `~/.config/tj/config.toml` to change
 **Claude Code must be restarted** after running `tj onboard --claude-code`.
+#### `tj optimize` — what you actually get
+Two analyzers run over the spans TokenJam has captured. The output is read-only recommendations — `tj optimize` never changes how your agent runs.
+**① Model-downgrade candidates.** Flags sessions whose structural shape (short input, short output, few tool calls) matches a class of work where a cheaper model in the same provider family is worth reviewing. Never asserts the cheaper model *would have produced the same answer* — only that the shape is worth a look. Real examples are surfaced so you can spot-check before changing models.
+**② Budget projection.** Per-provider monthly projection against any `[budget.<provider>]` ceiling you've configured. Scopes spend by provider — an Anthropic budget excludes OpenAI spend. Shows exhaustion date, projected overage, and what the run rate would drop to if you acted on the downgrade candidates.
+```bash
+tj optimize                                # both analyzers, last 30 days
+tj optimize --only budget                  # just the projection
+tj optimize --budget anthropic --budget-usd 50   # test a different ceiling
+tj optimize --json                         # machine-readable for piping
+```
+Works alongside a running `tj serve` (read-only fallback). Also exposed as the `get_optimize_report` MCP tool — your coding agent can ask itself "where could I save money?" mid-session.
 **Adding more projects** — run once per project directory:
 ```bash
@@ -292,10 +366,11 @@ Each project gets its own agent ID (`claude-code-<repo-name>`), all sharing one
 ### MCP server
-The MCP server gives Claude Code direct access to your observability data inside the session. 13 tools available after restart:
+The MCP server gives Claude Code direct access to your observability data inside the session. 14 tools available after restart:
 | Tool | What it does |
 |---|---|
+| `get_optimize_report` | Cost-saving candidates and budget projection — fires for either question (e.g. "where could I save money?" / "will I exceed my budget?") |
 | `get_status` | Current agent state — tokens, cost, active alerts |
 | `get_budget_headroom` | Budget limit vs spend |
 | `list_active_sessions` | All running sessions across agents |

{tokenjam-0.2.2 → tokenjam-0.2.3}/README.md RENAMED

@@ -29,6 +29,46 @@ Your agent sends emails, writes files, calls APIs, and spends your money — all
 ## What you get
+**Cost optimization for Claude Code — out of the box.** Run `tj onboard --claude-code` and TokenJam reads your existing Claude Code session logs (up to 30 days, whatever your local retention has kept) so you can run `tj optimize` immediately:
+```
+$ tj optimize --agent claude-code-myproj
+Analyzing 39 sessions, 1.8M tokens, $160.3500 spend (last 30d,
+claude-code-myproj)…
+  ① Model downgrade: 13% of sessions match a smaller-model candidate shape
+     • 5 of 39 sessions matched structural heuristics
+     • Would have cost ~$0.0140 on the smaller model vs $2.2500 actual (in
+window)
+     • Projected savings if pattern holds: $2.2400/mo
+     • Pattern: claude-opus-4-7 → claude-haiku-4-5
+     Examples:
+       2cce7903..  2 tool calls   0.8s   $0.4500  (claude-opus-4-7)
+       e292ccbe..  2 tool calls   0.8s   $0.4500  (claude-opus-4-7)
+       d59cb502..  2 tool calls   0.8s   $0.4500  (claude-opus-4-7)
+     ! Candidate-flagging heuristic, not a quality judgment. Review the
+example sessions before changing models.
+  ② Budget projection (anthropic, $200.0000/cycle): comfortably within budget
+     Run rate $160.3500/mo — 19% of cycle budget unused.
+```
+Two analyzers reading the same spans you'd otherwise pay LangSmith to host: structural model-downgrade candidate flagging (never claims quality equivalence — surfaces examples to review) and per-provider monthly budget projection. Works with **any** agent already sending TokenJam data, not just Claude Code.
+Try a tighter budget to see the over-budget renderer:
+```
+$ tj optimize --budget anthropic --budget-usd 50
+  ② Budget projection (anthropic, $50.0000/cycle): projected to exceed cycle
+budget
+     • Monthly run rate: $160.3500 (3.2× the budget)
+     • At current pace, budget exhausted on 2026-05-15 (0.0 day(s) from now)
+     • Days remaining in cycle: 16
+     • Projected cycle total: $162.8700, overage: $112.8700
+```
 **Real-time cost tracking.** Every LLM call is priced as it happens — by agent, model, session, and tool. Budget alerts fire before you hit the limit, not after.
 **Safety alerts.** Configure any tool call as a sensitive action (`send_email`, `delete_file`, `submit_form`) and get notified instantly via ntfy, Discord, Telegram, webhook, or stdout.
@@ -52,10 +92,11 @@ For **Claude Code**, **Codex**, and any agent that already emits OpenTelemetry.
 ```bash
 pip install "tokenjam[mcp]"
 tj onboard --claude-code    # or: tj onboard --codex
-# Restart your coding agent
+tj optimize                 # see cost-saving candidates + budget projection
+# Restart your coding agent for live telemetry
 ```
-Every session, API call, tool use, and error is now a tracked span with cost and alert evaluation. The MCP server gives your coding agent 13 tools to query its own telemetry mid-session — just ask "how much have I spent today?" or "are there any active alerts?"
+`tj onboard --claude-code` auto-backfills your existing session logs from `~/.claude/projects/` so `tj optimize` works on the first run — no waiting for new data to accumulate. The MCP server gives your coding agent 14 tools to query its own telemetry mid-session — just ask "how much have I spent today?" or "where could I save money?"
 [Full Claude Code & Codex setup →](#claude-code--coding-agents)
@@ -138,9 +179,6 @@ export OTEL_EXPORTER_OTLP_ENDPOINT=http://127.0.0.1:7391
 ```
 tj status
-```
-```
 ● my-email-agent   completed   (2m 14s)
   Cost today:     $0.0340 / $5.0000 limit
@@ -151,17 +189,17 @@ tj status
   send_email called (sensitive action: critical)
 ```
-https://github.com/user-attachments/assets/b94d13f6-1432-40d4-b093-6958d74f0e65
 ```bash
-tj status           # current state, cost, active alerts
-tj traces           # full span history with waterfall view
-tj cost --since 7d  # cost breakdown by agent, model, day
-tj alerts           # everything that fired while you were away
-tj budget           # view and set daily/session cost limits
-tj drift            # behavioral drift Z-scores vs baseline
-tj tools            # tool call history with error rates
-tj serve            # start the web UI + REST API
+tj status              # current state, cost, active alerts
+tj traces              # full span history with waterfall view
+tj cost --since 7d     # cost breakdown by agent, model, day
+tj optimize            # cost-saving candidates + budget projection
+tj backfill claude-code  # ingest historical sessions from ~/.claude/projects/
+tj alerts              # everything that fired while you were away
+tj budget              # view and set daily/session cost limits
+tj drift               # behavioral drift Z-scores vs baseline
+tj tools               # tool call history with error rates
+tj serve               # start the web UI + REST API
 ```
 ---
@@ -170,8 +208,6 @@ tj serve            # start the web UI + REST API
 `tj serve` starts a local dashboard at `http://127.0.0.1:7391/`.
-https://github.com/user-attachments/assets/ff09caec-3487-4542-8628-d62b7d92591f
 - **Status** — agent overview with cost, tokens, tool calls, and active alerts
 - **Traces** — trace list with span waterfall visualization
 - **Cost** — breakdown by agent, model, day, or tool
@@ -181,6 +217,24 @@ https://github.com/user-attachments/assets/ff09caec-3487-4542-8628-d62b7d92591f
 No signup, no cloud — runs entirely on your machine.
+### Screenshots
+<table>
+<tr>
+<td width="50%"><strong>Status</strong> — agent overview with cost, tokens, tool calls, and active alerts.<br><br><img src="docs/screenshots/tj-status.png" alt="tj status page" /></td>
+<td width="50%"><strong>Traces</strong> — recent traces with cost, duration, and span count. Click a row for the waterfall view.<br><br><img src="docs/screenshots/tj-traces.png" alt="tj traces page" /></td>
+</tr>
+<tr>
+<td width="50%"><strong>Cost</strong> — spend broken down by day, agent, model, or tool.<br><br><img src="docs/screenshots/tj-cost.png" alt="tj cost page" /></td>
+<td width="50%"><strong>Alerts</strong> — full alert history with severity filter and inline detail expansion.<br><br><img src="docs/screenshots/tj-alerts.png" alt="tj alerts page" /></td>
+</tr>
+<tr>
+<td colspan="2"><strong>Budget</strong> — view and edit daily/per-session cost limits per agent, with recent budget alerts inline.<br><br><img src="docs/screenshots/tj-budget.png" alt="tj budget page" /></td>
+</tr>
+</table>
 ---
 ## tj vs LangSmith vs Langfuse
@@ -192,6 +246,7 @@ LangSmith and Langfuse are excellent for tracing LLM API calls and running evals
 | Signup required | ❌ | ✅ | ✅ | ✅ |
 | Data leaves your machine | ❌ | ✅ | cloud only | ✅ |
 | Real-time sensitive action alerts | ✅ | ❌ | ❌ | ❌ |
+| Model-downgrade cost recommendations | ✅ | ❌ | ❌ | ❌ |
 | Behavioral drift detection | ✅ | ❌ | ❌ | ❌ |
 | Local-first, no cloud required | ✅ | ❌ | self-host only | ❌ |
 | OTel GenAI SemConv native | ✅ | partial | partial | partial |
@@ -205,13 +260,13 @@ LangSmith and Langfuse are excellent for tracing LLM API calls and running evals
 ### Claude Code
-Monitor every Claude Code session — costs, tool calls, API requests, errors — with two commands:
+Monitor every Claude Code session and get cost-optimization recommendations from your existing usage in three commands:
 ```bash
 pip install "tokenjam[mcp]"
-tj onboard --claude-code
-# Restart Claude Code, then:
-tj status --agent claude-code-<project>
+tj onboard --claude-code   # auto-backfills your existing session logs
+tj optimize                # cost-saving candidates + budget projection
+# Then restart Claude Code so live telemetry starts flowing
 ```
 `tj onboard --claude-code` does everything in one shot:
@@ -221,9 +276,28 @@ tj status --agent claude-code-<project>
 - Registers the MCP server globally (`claude mcp add --scope user tj -- tj mcp`)
 - Installs a background daemon (launchd on macOS, systemd on Linux)
 - Adds Docker harness-compatible OTLP env vars to `~/.zshrc`
+- **Reads your existing `~/.claude/projects/*.jsonl` session logs** and ingests them into the local DB so `tj optimize` returns real numbers on first run (idempotent — safe to re-run)
+- Writes a sensible default `[budget.anthropic] usd = 200` for the budget projector to project against — edit `~/.config/tj/config.toml` to change
 **Claude Code must be restarted** after running `tj onboard --claude-code`.
+#### `tj optimize` — what you actually get
+Two analyzers run over the spans TokenJam has captured. The output is read-only recommendations — `tj optimize` never changes how your agent runs.
+**① Model-downgrade candidates.** Flags sessions whose structural shape (short input, short output, few tool calls) matches a class of work where a cheaper model in the same provider family is worth reviewing. Never asserts the cheaper model *would have produced the same answer* — only that the shape is worth a look. Real examples are surfaced so you can spot-check before changing models.
+**② Budget projection.** Per-provider monthly projection against any `[budget.<provider>]` ceiling you've configured. Scopes spend by provider — an Anthropic budget excludes OpenAI spend. Shows exhaustion date, projected overage, and what the run rate would drop to if you acted on the downgrade candidates.
+```bash
+tj optimize                                # both analyzers, last 30 days
+tj optimize --only budget                  # just the projection
+tj optimize --budget anthropic --budget-usd 50   # test a different ceiling
+tj optimize --json                         # machine-readable for piping
+```
+Works alongside a running `tj serve` (read-only fallback). Also exposed as the `get_optimize_report` MCP tool — your coding agent can ask itself "where could I save money?" mid-session.
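Note on the model-downgrade analyzer described above: the CLAUDE.md hunk in this release gives the structural thresholds as input < 5K tokens AND output < 500 tokens AND tool_calls ≤ 5. A minimal sketch of that predicate follows. `SessionShape` and `is_downgrade_candidate` are hypothetical names, and reading "5K" as 5,000 is an assumption.

```python
from dataclasses import dataclass

# Thresholds as stated in CLAUDE.md for 0.2.3 ("5K" read as 5,000 here).
MAX_INPUT_TOKENS = 5_000
MAX_OUTPUT_TOKENS = 500
MAX_TOOL_CALLS = 5


@dataclass(frozen=True)
class SessionShape:
    input_tokens: int
    output_tokens: int
    tool_calls: int


def is_downgrade_candidate(shape):
    """Flag a session as worth reviewing; this says nothing about quality."""
    return (
        shape.input_tokens < MAX_INPUT_TOKENS
        and shape.output_tokens < MAX_OUTPUT_TOKENS
        and shape.tool_calls <= MAX_TOOL_CALLS
    )


print(is_downgrade_candidate(SessionShape(input_tokens=1000, output_tokens=200, tool_calls=2)))
```

All three conditions must hold, so a session that is short on tokens but heavy on tool calls is not flagged. A match only marks the session for review, in line with the rule that the output never claims quality equivalence.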
 **Adding more projects** — run once per project directory:
 ```bash
@@ -236,10 +310,11 @@ Each project gets its own agent ID (`claude-code-<repo-name>`), all sharing one
 ### MCP server
-The MCP server gives Claude Code direct access to your observability data inside the session. 13 tools available after restart:
+The MCP server gives Claude Code direct access to your observability data inside the session. 14 tools available after restart:
 | Tool | What it does |
 |---|---|
+| `get_optimize_report` | Cost-saving candidates and budget projection — fires for either question (e.g. "where could I save money?" / "will I exceed my budget?") |
 | `get_status` | Current agent state — tokens, cost, active alerts |
 | `get_budget_headroom` | Budget limit vs spend |
 | `list_active_sessions` | All running sessions across agents |

tokenjam-0.2.3/docs/screenshots/tj-alerts.png ADDED

Binary file

tokenjam-0.2.3/docs/screenshots/tj-budget.png ADDED

Binary file

tokenjam-0.2.3/docs/screenshots/tj-cost.png ADDED

Binary file

tokenjam-0.2.3/docs/screenshots/tj-status.png ADDED

Binary file

tokenjam-0.2.3/docs/screenshots/tj-traces.png ADDED

Binary file

{tokenjam-0.2.2 → tokenjam-0.2.3}/pyproject.toml RENAMED

@@ -4,7 +4,7 @@ build-backend = "hatchling.build"
 [project]
 name = "tokenjam"
-version = "0.2.2"
+version = "0.2.3"
 description = "TokenJam — local-first OTel-native observability for Autonomous AI agents"
 readme = "README.md"
 requires-python = ">=3.10"

{tokenjam-0.2.2 → tokenjam-0.2.3}/sdk-ts/package-lock.json RENAMED

@@ -1,12 +1,12 @@
 {
   "name": "@tokenjam/sdk",
-  "version": "0.2.2",
+  "version": "0.2.3",
   "lockfileVersion": 3,
   "requires": true,
   "packages": {
     "": {
       "name": "@tokenjam/sdk",
-      "version": "0.2.2",
+      "version": "0.2.3",
       "license": "MIT",
       "devDependencies": {
         "@types/node": "^25.5.0",

{tokenjam-0.2.2 → tokenjam-0.2.3}/sdk-ts/package.json RENAMED

@@ -1,6 +1,6 @@
 {
   "name": "@tokenjam/sdk",
-  "version": "0.2.2",
+  "version": "0.2.3",
   "description": "TypeScript SDK for TokenJam — local-first observability for AI agents",
   "main": "dist/index.js",
   "types": "dist/index.d.ts",

{tokenjam-0.2.2 → tokenjam-0.2.3}/tests/integration/test_cli.py RENAMED

@@ -579,6 +579,84 @@ def test_budget_set_agent_writes_config(runner, db, config, tmp_path):
     assert saved_config.agents["test-agent"].budget.session_usd == 0.25
+def test_optimize_empty_db_outputs_friendly_message(runner, db, config):
+    result = _invoke(runner, db, config, ["optimize"])
+    assert result.exit_code == 0
+    assert "No usage data found" in result.output
+def test_optimize_flags_downgrade_candidate(runner, db, config):
+    """A small Opus session in the window should appear as a candidate."""
+    from datetime import timedelta
+    from tests.factories import make_llm_span
+    from tokenjam.utils.time_parse import utcnow
+    start = utcnow() - timedelta(days=2)
+    span = make_llm_span(
+        agent_id="test-agent",
+        model="claude-opus-4-7",
+        provider="anthropic",
+        input_tokens=1000,
+        output_tokens=200,
+        cost_usd=0.030,
+        session_id="s-opus",
+        start_time=start,
+    )
+    db.insert_span(span)
+    result = _invoke(runner, db, config, ["optimize"])
+    assert result.exit_code == 0
+    assert "Model downgrade" in result.output
+    # Mandatory caveat must appear in human output
+    assert "Candidate-flagging heuristic" in result.output
+def test_optimize_json_output_includes_caveat(runner, db, config):
+    from datetime import timedelta
+    from tests.factories import make_llm_span
+    from tokenjam.utils.time_parse import utcnow
+    span = make_llm_span(
+        agent_id="test-agent", model="claude-opus-4-7", provider="anthropic",
+        input_tokens=1000, output_tokens=200, cost_usd=0.030,
+        session_id="s", start_time=utcnow() - timedelta(days=1),
+    )
+    db.insert_span(span)
+    result = _invoke(runner, db, config, ["optimize", "--json"])
+    assert result.exit_code == 0
+    data = json.loads(result.output)
+    assert data["downgrade"] is not None
+    assert "Candidate-flagging heuristic" in data["downgrade"]["caveat"]
+def test_optimize_budget_projection_from_config(runner, db):
+    """Budget configured via [budget.anthropic] should surface a projection."""
+    from datetime import timedelta
+    from tests.factories import make_llm_span
+    from tokenjam.core.config import ProviderBudget
+    from tokenjam.utils.time_parse import utcnow
+    cfg = TjConfig(
+        version="1",
+        agents={"test-agent": AgentConfig(budget=BudgetConfig(daily_usd=5.0))},
+        budgets={"anthropic": ProviderBudget(usd=10.0, cycle_start_day=1)},
+    )
+    # Insert spend that exceeds the small budget
+    for i in range(5):
+        span = make_llm_span(
+            agent_id="test-agent", model="claude-opus-4-7", provider="anthropic",
+            input_tokens=10_000, output_tokens=1_000, cost_usd=20.0,
+            session_id=f"s{i}", start_time=utcnow() - timedelta(days=1),
+        )
+        db.insert_span(span)
+    result = _invoke(runner, db, cfg, ["optimize", "--only", "budget"])
+    assert result.exit_code == 0
+    assert "Budget projection" in result.output
+    assert "anthropic" in result.output
 def test_budget_set_negative_daily_rejected(runner, db, config, tmp_path):
     """tj budget --daily -5 should error, not silently clear the limit."""
     config_file = tmp_path / "config.toml"
