PyPI - tokenjam - Versions diffs - 0.3.2__tar.gz → 0.3.4__tar.gz - Mend

tokenjam 0.3.2.tar.gz → 0.3.4.tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (273)

{tokenjam-0.3.2 → tokenjam-0.3.4}/CHANGELOG.md RENAMED

@@ -54,6 +54,11 @@ This release pivots TokenJam toward cost-optimization for autonomous agents. Fou
 - **`tj status` surfaces unknown plan tiers.** When sessions exist with `plan_tier = 'unknown'`, prints a one-line note pointing the user at `tj onboard --reconfigure`. Exit code unchanged.
 - **`tj optimize` plan-tier-aware rendering.** When every session in the window has `plan_tier = 'unknown'`, dollar figures are suppressed and a header note explains why. Mixed / partial-unknown windows render normally with an advisory note.
 - **MCP `get_optimize_report` tool.** Now accepts `findings: list[str]` (was `only: str`). Docstring surfaces for both API-billing and subscription-plan-efficiency phrasings.
+- **`tj tokenmaxx` tier ladder expanded to six tiers and renamed.** Two highest tiers renamed from `TokenChad` / `TokenGigaChad` to `TokenSuperMaxxer` / `TokenGigaMaxxer`, and a new `TokenMegaMaxxer` tier slots between them covering the 20× – 50× multiplier range. The previous top tier started at 20×; the new top tier starts at 50×, so the absolute headline for very-heavy users is more meaningful. Fire-emoji escalation matches the new tier count: 🔥 → 🔥🔥 → 🔥🔥🔥. The quip that previously belonged to `TokenGigaChad` ("Touch grass. Then run `tj optimize`.") now belongs to `TokenMegaMaxxer`; `TokenGigaMaxxer` gets its own escalated quip. JSON output's `tier` field carries the new label string verbatim; consumers reading the `tier` value must update accordingly.
+### Fixed
+- **Cache-only spans were costed at $0.** A prompt-cache hit (0 new input/output tokens but non-zero cache-read tokens) bills the cache-read rate, but both `calculate_cost` and `CostEngine.process_span` short-circuited on input/output alone, dropping the span as a no-op and under-reporting spend. The early-return guards now fire only when *all* token counts are zero, so cache-read (and cache-write) costs are charged correctly.
+- **Cache-write (cache-creation) tokens were dropped on the live ingest path.** The SDK integrations emit `gen_ai.usage.cache_creation_tokens`, the pricing table carries a `cache_write_per_mtok` rate, and `calculate_cost` already priced it — but the OTLP span parser and provider reader only read cache-read tokens, so cache-creation tokens never reached `CostEngine.process_span` and their (higher-rate) cost was never charged. `NormalizedSpan` now carries `cache_write_tokens`, both parsers populate it, and `process_span` charges it. Only the backfill path costed cache-write before; the live path now matches.
 ### Internal
 - **Registry-driven optimize analyzers.** `tokenjam/core/optimize.py` split into `tokenjam/core/optimize/` package with `registry.py`, `runner.py`, `types.py`, and `analyzers/` subpackage using `pkgutil` auto-discovery. New analyzers drop a file under `analyzers/` with a `@register("name")` decorator — nothing else needs editing. See `tokenjam/core/optimize/README.md`.
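The two cache-costing fixes in the changelog hunk above can be illustrated with a minimal sketch. This is not the project's `calculate_cost`; the function name `sketch_cost` and the `HAIKU_RATES` table are hypothetical. The cache-read and cache-write rates are the ones quoted in the test comments of this diff, while the input and output rates are assumptions chosen so that 1000 input plus 200 output tokens give the 0.0016 figure the tests expect.

```python
from __future__ import annotations

# Rates in USD per million tokens. The cache rates come from test comments in
# this diff; the input/output rates are assumptions picked to reproduce the
# 0.0016 total the tests expect for 1000 input + 200 output tokens.
HAIKU_RATES = {
    "input_per_mtok": 0.80,   # assumption
    "output_per_mtok": 4.00,  # assumption
    "cache_read_per_mtok": 0.08,
    "cache_write_per_mtok": 1.00,
}


def sketch_cost(
    rates: dict[str, float],
    input_tokens: int = 0,
    output_tokens: int = 0,
    cache_read_tokens: int = 0,
    cache_write_tokens: int = 0,
) -> float | None:
    """Return the cost in USD, or None when there is nothing to bill."""
    counts = (input_tokens, output_tokens, cache_read_tokens, cache_write_tokens)
    # The guard fires only when *all* counts are zero, so a cache-only span
    # is still priced instead of being dropped as a no-op.
    if not any(counts):
        return None
    return (
        input_tokens * rates["input_per_mtok"]
        + output_tokens * rates["output_per_mtok"]
        + cache_read_tokens * rates["cache_read_per_mtok"]
        + cache_write_tokens * rates["cache_write_per_mtok"]
    ) / 1_000_000
```

With a guard that checked only input and output tokens, the call `sketch_cost(HAIKU_RATES, cache_read_tokens=1_000_000)` would have returned `None` rather than the cache-read charge, which is the under-reporting the changelog entry describes.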

{tokenjam-0.3.2 → tokenjam-0.3.4}/CLAUDE.md RENAMED

@@ -191,9 +191,9 @@ The Agent Incident Library at `incidents/` is separate: each scenario is a `scen
 ## Pricing
-Model pricing lives in `pricing/models.toml` (USD per million tokens). Structure: `[provider.model_name]` with `input_per_mtok`, `output_per_mtok`, and optional `cache_read_per_mtok`/`cache_write_per_mtok`. Unknown models fall back to default rates ($0.50/$2.00 per MTok) with a logged warning. The pricing table is LRU-cached at process startup — restart to pick up changes.
+Model pricing lives in `tokenjam/pricing/models.toml` (USD per million tokens) — the packaged file `core/pricing.py` loads via `PRICING_FILE = Path(__file__).parent.parent / "pricing" / "models.toml"`. There is no repo-root `pricing/` copy (it was moved into the package in v0.1.x so it ships in the wheel; editing a repo-root file would have no runtime effect). Structure: `[provider.model_name]` with `input_per_mtok`, `output_per_mtok`, and optional `cache_read_per_mtok`/`cache_write_per_mtok`. Unknown models fall back to default rates ($0.50/$2.00 per MTok) with a logged warning. The pricing table is LRU-cached at process startup — restart to pick up changes.
-Pricing is community-maintained: submit a PR editing `pricing/models.toml` when provider prices change. No code changes needed — the file is loaded at runtime.
+Pricing is community-maintained: submit a PR editing `tokenjam/pricing/models.toml` when provider prices change. No code changes needed — the file is loaded at runtime.
 ## CI

{tokenjam-0.3.2 → tokenjam-0.3.4}/CONTRIBUTING.md RENAMED

@@ -43,7 +43,7 @@ tokenjam/sdk/               @watch() decorator and provider/framework patches
 tokenjam/otel/              OTel TracerProvider and span exporter wiring
 tokenjam/utils/             Formatting, time parsing, ID generation
 sdk-ts/src/            TypeScript SDK (@tokenjam/sdk)
-pricing/models.toml    Community-maintained model pricing — PRs welcome here
+tokenjam/pricing/models.toml  Community-maintained model pricing — PRs welcome here
 tests/factories.py     Span factory — use this in all synthetic tests, never
                        construct NormalizedSpan directly
 ```
@@ -57,7 +57,7 @@ This project was built using parallel Claude Code agents. The `.claude/` directo
 ## Pricing table contributions
-The file `pricing/models.toml` is intentionally community-maintained. If a model is missing or prices have changed, open a PR with the update — no issue needed, just update the TOML and verify the format matches existing entries.
+The file `tokenjam/pricing/models.toml` is intentionally community-maintained. If a model is missing or prices have changed, open a PR with the update — no issue needed, just update the TOML and verify the format matches existing entries. (This is the file the cost engine loads at runtime; there is no separate repo-root copy.)
 ## Reporting issues

{tokenjam-0.3.2 → tokenjam-0.3.4}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: tokenjam
-Version: 0.3.2
+Version: 0.3.4
 Summary: TokenJam — local-first OTel-native observability for Autonomous AI agents
 Project-URL: Homepage, https://opencla.watch
 Project-URL: Repository, https://github.com/Metabuilder-Labs/openclawwatch
@@ -74,9 +74,11 @@ TokenJam reads your agent's telemetry and tells you when to downsize, when to tr
 [![OTel](https://img.shields.io/badge/OTel-GenAI%20SemConv-3d8eff?labelColor=0d1117)](https://opentelemetry.io/docs/specs/semconv/gen-ai/)
 ```
-pip install tokenjam
+pipx install tokenjam
 ```
+<sub>Don't have pipx? `brew install pipx` on macOS, `apt install pipx` on Debian/Ubuntu, or see [docs/installation.md](docs/installation.md). `pip install tokenjam` also works in a clean venv.</sub>
 **No cloud · No signup · No vendor lock-in**
 </div>
@@ -147,7 +149,7 @@ Run all four with `tj optimize`. Run several with `tj optimize downsize cache tr
 For **Claude Code** users — zero code, auto-backfills your last 30 days:
 ```bash
-pip install "tokenjam[mcp]"
+pipx install 'tokenjam[mcp]'
 tj onboard --claude-code
 tj optimize          # cost-saving candidates from your actual usage
 ```

{tokenjam-0.3.2 → tokenjam-0.3.4}/README.md RENAMED

@@ -16,9 +16,11 @@ TokenJam reads your agent's telemetry and tells you when to downsize, when to tr
 [![OTel](https://img.shields.io/badge/OTel-GenAI%20SemConv-3d8eff?labelColor=0d1117)](https://opentelemetry.io/docs/specs/semconv/gen-ai/)
 ```
-pip install tokenjam
+pipx install tokenjam
 ```
+<sub>Don't have pipx? `brew install pipx` on macOS, `apt install pipx` on Debian/Ubuntu, or see [docs/installation.md](docs/installation.md). `pip install tokenjam` also works in a clean venv.</sub>
 **No cloud · No signup · No vendor lock-in**
 </div>
@@ -89,7 +91,7 @@ Run all four with `tj optimize`. Run several with `tj optimize downsize cache tr
 For **Claude Code** users — zero code, auto-backfills your last 30 days:
 ```bash
-pip install "tokenjam[mcp]"
+pipx install 'tokenjam[mcp]'
 tj onboard --claude-code
 tj optimize          # cost-saving candidates from your actual usage
 ```

{tokenjam-0.3.2 → tokenjam-0.3.4}/docs/claude-code-integration.md RENAMED

@@ -3,7 +3,7 @@
 Monitor every Claude Code session — costs, tool calls, API requests, errors — with two commands:
 ```bash
-pip install "tokenjam[mcp]"
+pipx install 'tokenjam[mcp]'
 tj onboard --claude-code
 # Restart Claude Code, then:
 tj status --agent claude-code-<project>

{tokenjam-0.3.2 → tokenjam-0.3.4}/docs/installation.md RENAMED

@@ -5,10 +5,36 @@ TokenJam ships as a Python package on PyPI and a TypeScript SDK on npm. Pick the
 ## Base install
 ```bash
+pipx install tokenjam
+```
+This is the recommended install path on **all platforms**. `pipx` automatically creates an isolated venv for the `tj` CLI, which means:
+- It works on macOS with Homebrew Python (which refuses `pip install` into its managed environment by default — [PEP 668](https://peps.python.org/pep-0668/)).
+- It works on Debian 12+ / Ubuntu 24+ (same PEP 668 enforcement).
+- It doesn't pollute your system Python or any project's venv.
+Don't have `pipx`? Install it with one of:
+| Platform | Command |
+|---|---|
+| macOS | `brew install pipx` |
+| Debian / Ubuntu | `apt install pipx` |
+| Windows | `py -m pip install --user pipx` |
+| Anywhere else | `python3 -m pip install --user pipx` |
+Then ensure pipx's bin dir is on your `PATH` with `pipx ensurepath`.
+### Alternative: pip in a venv
+If you prefer plain pip (or need to install into an existing project venv):
+```bash
+python3 -m venv .venv && source .venv/bin/activate
 pip install tokenjam
 ```
-This is enough for the CLI (`tj`), local REST API (`tj serve`), the four out-of-box optimize analyzers that don't need ML models, and every native SDK integration except LLMLingua-based Trim. Requires Python ≥ 3.10.
+Either path is enough for the CLI (`tj`), local REST API (`tj serve`), the four out-of-box optimize analyzers that don't need ML models, and every native SDK integration except LLMLingua-based Trim. Requires Python ≥ 3.10.
 After install, run:
@@ -37,7 +63,7 @@ TokenJam keeps heavyweight ML dependencies, framework adapters, and the MCP serv
 Combine multiple extras:
 ```bash
-pip install "tokenjam[mcp,bloat]"
+pipx install 'tokenjam[mcp,bloat]'
 ```
 ### Bloat extra details
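The installation hunk above cites PEP 668 as the reason plain `pip install` is refused on Homebrew Python and recent Debian/Ubuntu. As a rough sketch of how to check whether a given interpreter is affected: PEP 668 defines an `EXTERNALLY-MANAGED` marker file in the interpreter's stdlib directory, and the restriction does not apply inside a virtual environment. The helper names below are hypothetical, and the check approximates the rule rather than reproducing pip's own logic.

```python
import sys
import sysconfig
from pathlib import Path


def in_virtualenv() -> bool:
    """True when running inside a venv (prefix differs from the base prefix)."""
    return sys.prefix != sys.base_prefix


def has_externally_managed_marker() -> bool:
    """True when the PEP 668 marker file sits in the stdlib directory."""
    stdlib_dir = sysconfig.get_path("stdlib")
    return stdlib_dir is not None and (Path(stdlib_dir) / "EXTERNALLY-MANAGED").is_file()


def pip_install_likely_blocked() -> bool:
    """Approximate check: marker present and not inside a virtual environment."""
    return not in_virtualenv() and has_externally_managed_marker()


if __name__ == "__main__":
    if pip_install_likely_blocked():
        print("System Python is externally managed; use pipx or a venv.")
    else:
        print("Plain pip install should be allowed in this environment.")
```

Both install paths in the hunk sidestep the marker in the same way: `pipx` creates a dedicated venv for the CLI, and the `python3 -m venv` alternative creates one by hand.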

{tokenjam-0.3.2 → tokenjam-0.3.4}/docs/optimize/trim.md RENAMED

@@ -22,7 +22,7 @@ LLMLingua-2 pulls in PyTorch and transformers (~2GB). Kept out of the
 base install:
 ```bash
-pip install "tokenjam[bloat]"
+pipx install 'tokenjam[bloat]'
 ```
 The base `pip install tokenjam` does NOT pull torch. Trim shows up in

{tokenjam-0.3.2 → tokenjam-0.3.4}/docs/python-sdk.md RENAMED

@@ -5,7 +5,7 @@ For any Python agent — Anthropic, OpenAI, Gemini, Bedrock, LangChain, CrewAI,
 ## Install
 ```bash
-pip install tokenjam
+pipx install tokenjam
 tj onboard    # creates config, generates ingest secret
 tj doctor     # verify your setup
 ```

{tokenjam-0.3.2 → tokenjam-0.3.4}/examples/openclaw/README.md RENAMED

@@ -5,7 +5,7 @@ This is a config-only integration — no Python code needed. OpenClaw's built-in
 ## Step 1: Start TokenJam
 ```bash
-pip install tokenjam
+pipx install tokenjam
 tj onboard
 tj serve &
 ```

{tokenjam-0.3.2 → tokenjam-0.3.4}/incidents/hallucination-drift/README.md RENAMED

@@ -1,6 +1,6 @@
 # My agent worked yesterday. Today it's possessed.
-**Run it:** `pip install tokenjam && tj demo hallucination-drift`
+**Run it:** `pipx install tokenjam && tj demo hallucination-drift`
 ---
@@ -60,7 +60,7 @@ The demo uses `baseline_sessions = 5` for speed. In production, 10–50 sessions
 ## Try it yourself
 ```bash
-pip install tokenjam
+pipx install tokenjam
 tj demo hallucination-drift
 ```
@@ -75,4 +75,4 @@ To track drift on your real agent, wire up the TokenJam SDK, enable drift in `tj
 ---
-[TokenJam](https://github.com/Metabuilder-Labs/TokenJam) is a local-first, zero-signup observability CLI for AI agents. No cloud. No account. Just `pip install tokenjam` and start seeing what your agent actually does.
+[TokenJam](https://github.com/Metabuilder-Labs/TokenJam) is a local-first, zero-signup observability CLI for AI agents. No cloud. No account. Just `pipx install tokenjam` and start seeing what your agent actually does.

{tokenjam-0.3.2 → tokenjam-0.3.4}/incidents/retry-loop/README.md RENAMED

@@ -1,6 +1,6 @@
 # Your agent isn't flaky. You're blind.
-**Run it:** `pip install tokenjam && tj demo retry-loop`
+**Run it:** `pipx install tokenjam && tj demo retry-loop`
 ---
@@ -57,7 +57,7 @@ The loop was visible from span #4. Your logs didn't surface it until a user comp
 ## Try it yourself
 ```bash
-pip install tokenjam
+pipx install tokenjam
 tj demo retry-loop
 ```
@@ -72,4 +72,4 @@ To catch this in your real agent, wire up the TokenJam SDK (`@watch()` + `patch_
 ---
-[TokenJam](https://github.com/Metabuilder-Labs/TokenJam) is a local-first, zero-signup observability CLI for AI agents. No cloud. No account. Just `pip install tokenjam` and start seeing what your agent actually does.
+[TokenJam](https://github.com/Metabuilder-Labs/TokenJam) is a local-first, zero-signup observability CLI for AI agents. No cloud. No account. Just `pipx install tokenjam` and start seeing what your agent actually does.

{tokenjam-0.3.2 → tokenjam-0.3.4}/incidents/surprise-cost/README.md RENAMED

@@ -1,6 +1,6 @@
 # Why did my agent just spend $47 on a hello world?
-**Run it:** `pip install tokenjam && tj demo surprise-cost`
+**Run it:** `pipx install tokenjam && tj demo surprise-cost`
 ---
@@ -75,7 +75,7 @@ TokenJam fires `cost_budget_session` and `cost_budget_daily` alerts when limits
 ## Try it yourself
 ```bash
-pip install tokenjam
+pipx install tokenjam
 tj demo surprise-cost
 ```
@@ -90,4 +90,4 @@ To track real spend, instrument your agent with the tokenjam SDK and run `tj ser
 ---
-[TokenJam](https://github.com/Metabuilder-Labs/TokenJam) is a local-first, zero-signup observability CLI for AI agents. No cloud. No account. Just `pip install tokenjam` and start seeing what your agent actually does.
+[TokenJam](https://github.com/Metabuilder-Labs/TokenJam) is a local-first, zero-signup observability CLI for AI agents. No cloud. No account. Just `pipx install tokenjam` and start seeing what your agent actually does.

{tokenjam-0.3.2 → tokenjam-0.3.4}/pyproject.toml RENAMED

@@ -4,7 +4,7 @@ build-backend = "hatchling.build"
 [project]
 name = "tokenjam"
-version = "0.3.2"
+version = "0.3.4"
 description = "TokenJam — local-first OTel-native observability for Autonomous AI agents"
 readme = "README.md"
 requires-python = ">=3.10"

{tokenjam-0.3.2 → tokenjam-0.3.4}/sdk-ts/package.json RENAMED

@@ -1,6 +1,6 @@
 {
   "name": "@tokenjam/sdk",
-  "version": "0.3.2",
+  "version": "0.3.4",
   "description": "TypeScript SDK for TokenJam — local-first observability for AI agents",
   "main": "dist/index.js",
   "types": "dist/index.d.ts",

{tokenjam-0.3.2 → tokenjam-0.3.4}/tests/factories.py RENAMED

@@ -19,6 +19,7 @@ def make_llm_span(
     input_tokens: int = 1000,
     output_tokens: int = 200,
     cache_tokens: int = 0,
+    cache_write_tokens: int = 0,
     cost_usd: float | None = None,
     tool_name: str | None = None,
     status: str = "ok",
@@ -59,6 +60,7 @@ def make_llm_span(
         input_tokens=input_tokens,
         output_tokens=output_tokens,
         cache_tokens=cache_tokens,
+        cache_write_tokens=cache_write_tokens,
         cost_usd=cost_usd,
         conversation_id=conversation_id,
         attributes=attrs,

{tokenjam-0.3.2 → tokenjam-0.3.4}/tests/integration/test_db.py RENAMED

@@ -42,8 +42,8 @@ def _insert_agent(db, agent_id="test-agent"):
 def test_migrations_run_on_empty_db():
     backend = InMemoryBackend()
     rows = backend.conn.execute("SELECT version FROM schema_migrations").fetchall()
-    assert len(rows) == 4
-    assert {r[0] for r in rows} == {1, 2, 3, 4}
+    assert len(rows) == 5
+    assert {r[0] for r in rows} == {1, 2, 3, 4, 5}
     backend.close()
@@ -52,7 +52,7 @@ def test_migrations_are_idempotent():
     # Running migrations again should not raise
     run_migrations(backend.conn)
     rows = backend.conn.execute("SELECT version FROM schema_migrations").fetchall()
-    assert len(rows) == 4
+    assert len(rows) == 5
     backend.close()
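The two tests above assert that a fresh database ends up with one `schema_migrations` row per migration and that re-running the migrations changes nothing. A minimal sketch of a runner with that property follows. It uses the standard-library `sqlite3` as a stand-in for the project's DuckDB backend, and the `MIGRATIONS` table and `run_migrations_sketch` are hypothetical; the content of the project's real migrations is not shown in this diff.

```python
import sqlite3

# Hypothetical migrations for illustration only; the real schema is not shown here.
MIGRATIONS = {
    1: "CREATE TABLE spans (span_id TEXT PRIMARY KEY, cost_usd REAL)",
    2: "CREATE TABLE sessions (session_id TEXT PRIMARY KEY, total_cost_usd REAL)",
}


def run_migrations_sketch(conn: sqlite3.Connection) -> None:
    """Apply each migration at most once, recording it in schema_migrations."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_migrations (version INTEGER PRIMARY KEY)"
    )
    applied = {row[0] for row in conn.execute("SELECT version FROM schema_migrations")}
    for version in sorted(MIGRATIONS):
        if version in applied:
            continue  # already recorded, so a second run skips it
        conn.execute(MIGRATIONS[version])
        conn.execute("INSERT INTO schema_migrations (version) VALUES (?)", (version,))
    conn.commit()
```

Recording the version in the same pass as the schema change is what makes the second call a no-op, so the row count stays equal to the number of migrations, as the updated assertions (now 5) expect.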

{tokenjam-0.3.2 → tokenjam-0.3.4}/tests/integration/test_full_pipeline.py RENAMED

@@ -34,7 +34,7 @@ from tokenjam.core.db import InMemoryBackend
 from tokenjam.core.ingest import IngestPipeline
 from tokenjam.core.models import AgentRecord, NormalizedSpan, SpanKind, SpanStatus
 from tokenjam.core.schema_validator import SchemaValidator
-from tokenjam.otel.provider import TjSpanExporter
+from tokenjam.otel.provider import TjSpanExporter, convert_otel_span
 from tokenjam.otel.semconv import GenAIAttributes
 from tokenjam.sdk.agent import watch, AgentSession, record_llm_call, record_tool_call
 from tokenjam.utils.time_parse import utcnow
@@ -159,6 +159,40 @@ def full_stack():
     db.close()
+# ── OTel ReadableSpan -> NormalizedSpan ──────────────────────────────────
+def test_convert_otel_span_extracts_cache_read_and_write_tokens():
+    """convert_otel_span indexes both cache-read and cache-creation tokens.
+    Regression: provider previously read only CACHE_READ_TOKENS, dropping
+    cache-creation tokens so cache-write cost was never charged on this path.
+    """
+    collected: list[ReadableSpan] = []
+    class _Collector(SpanExporter):
+        def export(self, spans: Sequence[ReadableSpan]) -> SpanExportResult:
+            collected.extend(spans)
+            return SpanExportResult.SUCCESS
+        def shutdown(self) -> None:
+            pass
+    provider = TracerProvider()
+    provider.add_span_processor(SimpleSpanProcessor(_Collector()))
+    tracer = provider.get_tracer("test")
+    with tracer.start_as_current_span("gen_ai.llm.call") as span:
+        span.set_attribute(GenAIAttributes.REQUEST_MODEL, "claude-haiku-4-5")
+        span.set_attribute(GenAIAttributes.CACHE_READ_TOKENS, 1000)
+        span.set_attribute(GenAIAttributes.CACHE_CREATE_TOKENS, 2000)
+    assert len(collected) == 1
+    normalized = convert_otel_span(collected[0])
+    assert normalized.cache_tokens == 1000
+    assert normalized.cache_write_tokens == 2000
 # ── SDK -> Pipeline -> DB ─────────────────────────────────────────────────
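The regression test above checks that both cache-read and cache-creation token counts survive the conversion to a normalized span. A minimal sketch of that extraction step is shown below. It is not `convert_otel_span`: `SpanTokens` and `extract_cache_tokens` are hypothetical, the cache-creation attribute key is the one quoted in the changelog, and the cache-read key is an assumption, since this diff only shows the `GenAIAttributes.CACHE_READ_TOKENS` constant and not its value.

```python
from dataclasses import dataclass
from typing import Any, Mapping

# Key quoted in the changelog entry for this release.
CACHE_CREATE_KEY = "gen_ai.usage.cache_creation_tokens"
# Assumption: the diff shows only the constant name, not the attribute string.
CACHE_READ_KEY = "gen_ai.usage.cache_read_tokens"


@dataclass
class SpanTokens:
    """Hypothetical subset of a normalized span: just the two cache counters."""
    cache_tokens: int = 0
    cache_write_tokens: int = 0


def extract_cache_tokens(attributes: Mapping[str, Any]) -> SpanTokens:
    """Read both counters; a missing or None attribute counts as zero."""
    return SpanTokens(
        cache_tokens=int(attributes.get(CACHE_READ_KEY) or 0),
        cache_write_tokens=int(attributes.get(CACHE_CREATE_KEY) or 0),
    )
```

The bug described in the changelog corresponds to a parser that populated only the first field, so the second counter never reached the cost engine.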

tokenjam-0.3.4/tests/synthetic/test_cost_tracking.py ADDED

@@ -0,0 +1,261 @@
+"""Synthetic tests for CostEngine with an in-memory DuckDB backend."""
+from __future__ import annotations
+import duckdb
+import pytest
+from tokenjam.core.cost import CostEngine
+from tests.factories import make_llm_span, make_session
+class FakeDB:
+    """Minimal DB stub with just enough schema for CostEngine tests."""
+    def __init__(self) -> None:
+        self.conn = duckdb.connect(":memory:")
+        self.conn.execute("""
+            CREATE TABLE spans (
+                span_id TEXT PRIMARY KEY,
+                cost_usd DOUBLE
+            )
+        """)
+        self.conn.execute("""
+            CREATE TABLE sessions (
+                session_id TEXT PRIMARY KEY,
+                total_cost_usd DOUBLE
+            )
+        """)
+    def insert_span_stub(self, span_id: str) -> None:
+        self.conn.execute(
+            "INSERT INTO spans (span_id, cost_usd) VALUES (?, NULL)",
+            [span_id],
+        )
+    def insert_session_stub(self, session_id: str) -> None:
+        self.conn.execute(
+            "INSERT INTO sessions (session_id, total_cost_usd) VALUES (?, NULL)",
+            [session_id],
+        )
+    def get_span_cost(self, span_id: str) -> float | None:
+        row = self.conn.execute(
+            "SELECT cost_usd FROM spans WHERE span_id = ?", [span_id]
+        ).fetchone()
+        return row[0] if row else None
+    def get_session_cost(self, session_id: str) -> float | None:
+        row = self.conn.execute(
+            "SELECT total_cost_usd FROM sessions WHERE session_id = ?", [session_id]
+        ).fetchone()
+        return row[0] if row else None
+@pytest.fixture
+def fake_db() -> FakeDB:
+    return FakeDB()
+@pytest.fixture
+def engine(fake_db: FakeDB) -> CostEngine:
+    return CostEngine(fake_db)
+def test_cost_engine_updates_span_cost_in_db(fake_db: FakeDB, engine: CostEngine) -> None:
+    span = make_llm_span(
+        provider="anthropic", model="claude-haiku-4-5",
+        input_tokens=1000, output_tokens=200,
+    )
+    fake_db.insert_span_stub(span.span_id)
+    engine.process_span(span)
+    db_cost = fake_db.get_span_cost(span.span_id)
+    assert db_cost is not None
+    assert db_cost == pytest.approx(0.0016)
+    assert span.cost_usd == pytest.approx(0.0016)
+def test_cost_engine_updates_session_total_cost(fake_db: FakeDB, engine: CostEngine) -> None:
+    session = make_session()
+    fake_db.insert_session_stub(session.session_id)
+    span = make_llm_span(
+        provider="anthropic", model="claude-haiku-4-5",
+        input_tokens=1000, output_tokens=200,
+        session_id=session.session_id,
+    )
+    fake_db.insert_span_stub(span.span_id)
+    engine.process_span(span)
+    session_cost = fake_db.get_session_cost(session.session_id)
+    assert session_cost is not None
+    assert session_cost == pytest.approx(0.0016)
+def test_cost_engine_accumulates_across_multiple_spans(
+    fake_db: FakeDB, engine: CostEngine,
+) -> None:
+    session = make_session()
+    fake_db.insert_session_stub(session.session_id)
+    for _ in range(3):
+        span = make_llm_span(
+            provider="anthropic", model="claude-haiku-4-5",
+            input_tokens=1000, output_tokens=200,
+            session_id=session.session_id,
+        )
+        fake_db.insert_span_stub(span.span_id)
+        engine.process_span(span)
+    session_cost = fake_db.get_session_cost(session.session_id)
+    assert session_cost is not None
+    # 3 * 0.0016 = 0.0048
+    assert session_cost == pytest.approx(0.0048)
+def test_cost_engine_no_op_when_tokens_missing(fake_db: FakeDB, engine: CostEngine) -> None:
+    span = make_llm_span(
+        provider="anthropic", model="claude-haiku-4-5",
+        input_tokens=0, output_tokens=0,
+    )
+    fake_db.insert_span_stub(span.span_id)
+    engine.process_span(span)
+    db_cost = fake_db.get_span_cost(span.span_id)
+    assert db_cost is None
+    assert span.cost_usd is None
+def test_cost_engine_costs_cache_only_span(fake_db: FakeDB, engine: CostEngine) -> None:
+    # A span with no new input/output but cache-read tokens (a cache hit) still
+    # costs the cache-read rate and must be recorded, not dropped as a no-op.
+    # claude-haiku-4-5: cache_read=0.08 per MTok.
+    span = make_llm_span(
+        provider="anthropic", model="claude-haiku-4-5",
+        input_tokens=0, output_tokens=0, cache_tokens=1_000_000,
+    )
+    fake_db.insert_span_stub(span.span_id)
+    engine.process_span(span)
+    db_cost = fake_db.get_span_cost(span.span_id)
+    assert db_cost == pytest.approx(0.08)
+    assert span.cost_usd == pytest.approx(0.08)
+def test_cost_engine_cache_only_span_updates_session_total(
+    fake_db: FakeDB, engine: CostEngine,
+) -> None:
+    # The cache-only span's cost must also accumulate into the session total,
+    # not just the span row — dropping it previously under-reported the session.
+    session = make_session()
+    fake_db.insert_session_stub(session.session_id)
+    span = make_llm_span(
+        provider="anthropic", model="claude-haiku-4-5",
+        input_tokens=0, output_tokens=0, cache_tokens=1_000_000,
+        session_id=session.session_id,
+    )
+    fake_db.insert_span_stub(span.span_id)
+    engine.process_span(span)
+    session_cost = fake_db.get_session_cost(session.session_id)
+    assert session_cost == pytest.approx(0.08)
+def test_cost_engine_costs_cache_write_span(fake_db: FakeDB, engine: CostEngine) -> None:
+    # A span whose only tokens are cache-CREATION (cache write) must be costed at
+    # the cache-write rate, not dropped as a no-op and not charged the read rate.
+    # claude-haiku-4-5: cache_write=1.00 per MTok.
+    span = make_llm_span(
+        provider="anthropic", model="claude-haiku-4-5",
+        input_tokens=0, output_tokens=0, cache_write_tokens=1_000_000,
+    )
+    fake_db.insert_span_stub(span.span_id)
+    engine.process_span(span)
+    db_cost = fake_db.get_span_cost(span.span_id)
+    assert db_cost == pytest.approx(1.00)
+    assert span.cost_usd == pytest.approx(1.00)
+def test_cost_engine_costs_cache_read_and_write_together(
+    fake_db: FakeDB, engine: CostEngine,
+) -> None:
+    # Read and write cache tokens are priced at different rates and must both be
+    # charged. claude-haiku-4-5: cache_read=0.08, cache_write=1.00 per MTok.
+    span = make_llm_span(
+        provider="anthropic", model="claude-haiku-4-5",
+        input_tokens=0, output_tokens=0,
+        cache_tokens=1_000_000, cache_write_tokens=1_000_000,
+    )
+    fake_db.insert_span_stub(span.span_id)
+    engine.process_span(span)
+    db_cost = fake_db.get_span_cost(span.span_id)
+    assert db_cost == pytest.approx(1.08)
+    assert span.cost_usd == pytest.approx(1.08)
+def test_cost_engine_cache_write_span_updates_session_total(
+    fake_db: FakeDB, engine: CostEngine,
+) -> None:
+    # The cache-write span's cost must also accumulate into the session total.
+    session = make_session()
+    fake_db.insert_session_stub(session.session_id)
+    span = make_llm_span(
+        provider="anthropic", model="claude-haiku-4-5",
+        input_tokens=0, output_tokens=0, cache_write_tokens=1_000_000,
+        session_id=session.session_id,
+    )
+    fake_db.insert_span_stub(span.span_id)
+    engine.process_span(span)
+    session_cost = fake_db.get_session_cost(session.session_id)
+    assert session_cost == pytest.approx(1.00)
+def test_cost_engine_cache_write_pre_priced_does_not_double_count_session(
+    fake_db: FakeDB, engine: CostEngine,
+) -> None:
+    # A pre-priced span (cost_usd already set, e.g. from the parser) has its
+    # session cost handled by ingest's _build_or_update_session. process_span
+    # must still recompute the span cost but must NOT re-add to the session
+    # total, or cache-write spend would be double-counted.
+    session = make_session()
+    fake_db.conn.execute(
+        "INSERT INTO sessions (session_id, total_cost_usd) VALUES (?, ?)",
+        [session.session_id, 5.0],
+    )
+    span = make_llm_span(
+        provider="anthropic", model="claude-haiku-4-5",
+        input_tokens=0, output_tokens=0, cache_write_tokens=1_000_000,
+        session_id=session.session_id, cost_usd=1.00,
+    )
+    fake_db.insert_span_stub(span.span_id)
+    engine.process_span(span)
+    # Span cost recomputed, session total left untouched (no double-count).
+    assert fake_db.get_span_cost(span.span_id) == pytest.approx(1.00)
+    assert fake_db.get_session_cost(session.session_id) == pytest.approx(5.0)
+def test_cost_engine_no_op_when_provider_missing(fake_db: FakeDB, engine: CostEngine) -> None:
+    span = make_llm_span(input_tokens=1000, output_tokens=200)
+    span.provider = None
+    fake_db.insert_span_stub(span.span_id)
+    engine.process_span(span)
+    db_cost = fake_db.get_span_cost(span.span_id)
+    assert db_cost is None
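The session-total tests above pin down two behaviours: a freshly costed span adds its cost to a session total that may start as NULL, and a pre-priced span updates its own row without touching the session total. The sketch below shows one way to get both, using the standard-library `sqlite3` in place of DuckDB. `make_db` and `apply_cost` are hypothetical helpers, not the project's `CostEngine.process_span`.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create the same two-table schema the FakeDB stub in the tests uses."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE spans (span_id TEXT PRIMARY KEY, cost_usd REAL)")
    conn.execute("CREATE TABLE sessions (session_id TEXT PRIMARY KEY, total_cost_usd REAL)")
    return conn


def apply_cost(
    conn: sqlite3.Connection,
    span_id: str,
    session_id: str,
    cost: float,
    pre_priced: bool,
) -> None:
    """Write the span cost; add it to the session only if not pre-priced."""
    conn.execute("UPDATE spans SET cost_usd = ? WHERE span_id = ?", (cost, span_id))
    if pre_priced:
        # Assumption from the test comment: ingest already counted this span
        # in the session total, so adding it again would double-count.
        return
    conn.execute(
        "UPDATE sessions SET total_cost_usd = COALESCE(total_cost_usd, 0) + ? "
        "WHERE session_id = ?",
        (cost, session_id),
    )
```

`COALESCE` handles the NULL starting total that the test stubs insert, so the first span sets the total and later spans accumulate onto it.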
