PyPI - coderouter-cli - Versions diffs - 2.5.4__tar.gz → 2.6.0__tar.gz - Mend

coderouter-cli 2.5.4tar.gz → 2.6.0tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (209) hide show

{coderouter_cli-2.5.4 → coderouter_cli-2.6.0}/CHANGELOG.md RENAMED Viewed

@@ -6,6 +6,118 @@ versioning follows [SemVer](https://semver.org/).
 ---
+## [v2.6.0] — 2026-06-20 (Language Tax: measure, route, visualize)
+Minor release: makes the CJK **"language tax"** — cloud tokenizers bill
+Japanese/Chinese/Korean text ~1.2–1.5× more tokens per character than
+English, while local models are unaffected — measurable, routable, and
+visible. Built entirely on existing infrastructure; **no new core
+dependency** (the accurate tokenizer is the existing optional `accuracy`
+extra), **no network** (local `tokenizer.json` only), and **fully
+backward compatible** — the feature is inert until a provider declares
+`tokenizer_path`.
+### Added
+- **Language-tax measurement (`coderouter/language_tax.py`).** A leaf
+  module exposing `cjk_char_ratio`, `estimate_language_tax`,
+  `LanguageTaxBreakdown`, and `language_tax_usd`. CJK detection is
+  stdlib-only (Unicode range checks); the accurate token count is
+  delegated to the optional `accuracy` (`tokenizers`) backend with a
+  char/4 fallback. The tax multiplier is `tokens_accurate /
+  tokens_heuristic` — ~1.0 for English/code, ~2.0–4.0 for pure CJK.
+- **End-to-end cost integration.** `CostBreakdown` gains
+  `language_tax_multiplier` / `language_tax_usd`;
+  `compute_cost_for_attempt` accepts an optional `language_tax=`. Both
+  `cache-observed` emit sites in `routing/fallback.py` (streaming +
+  non-streaming) build a `LanguageTaxBreakdown` **only when the provider
+  declares `tokenizer_path`**, so the hot path is untouched by default.
+  The `cache-observed` log line now carries `language_tax_usd` /
+  `language_tax_multiplier`, and `MetricsCollector` aggregates per-provider
+  + total language-tax spend (mirroring the cost-savings aggregation).
+- **`ProviderConfig.tokenizer_path`** — optional path to a local
+  `tokenizer.json` for accurate (language-tax) token counting. Local-file
+  only; never contacts the HuggingFace Hub. Inert when unset.
+- **`cjk_ratio_min` auto-route matcher.** A new `RuleMatcher` variant that
+  routes turns whose latest user message CJK ratio ≥ threshold to a
+  (typically local, tax-free) profile, while ASCII/code turns fall through
+  to the cloud chain. Per-turn property mirroring `code_fence_ratio_min`.
+  ```yaml
+  auto_router:
+    rules:
+      - match: { cjk_ratio_min: 0.3 }   # JA-heavy turns → local
+        profile: local
+      - match: { has_tools: true }
+        profile: cloud
+    default_rule_profile: cloud
+  ```
+- **Dashboard "Cost & Language Tax" panel** on `/dashboard`: total spend,
+  cache savings, and CJK language-tax spend (aggregate + per-provider).
+  Also surfaces the previously-hidden cost aggregates.
+- **`token_estimation.extract_text_from_anthropic_request()`** — pulls the
+  concatenated request text for the accurate tokenizer leg.
+### Security
+- **Bump starlette 1.0.1 → 1.3.1**, clearing four advisories
+  (CVE-2026-48817 / CVE-2026-48818 / CVE-2026-54282 / CVE-2026-54283) that
+  failed the `cve-audit` CI job (`pip-audit --strict`).
+### Notes
+- 38 new tests (`test_language_tax`, `test_language_tax_integration`,
+  `test_auto_router_cjk`, extended dashboard contract). Full suite:
+  **1250 passed, 8 skipped**. ruff clean. The 5-deps invariant is intact.
+---
+## [v2.5.5] — 2026-06-06 (Claude Code >= 2.1.154 `system` role normalization)
+Patch release: ingress-side workaround for a Claude Code CLI regression.
+### Fixed
+- **Claude Code CLI >= 2.1.154 requests no longer 422 at ingress.**
+  Claude Code 2.1.154 introduced a regression where it emits messages with
+  `role: "system"` (and reportedly `ctx` / `msg`) inside the Anthropic
+  `messages` array, which the Messages API spec restricts to
+  `user` / `assistant`. CodeRouter's wire validation correctly rejected
+  these with `Input should be 'user' or 'assistant'` — breaking every
+  request from affected Claude Code versions (2.1.150 and earlier are fine).
+  A new `model_validator(mode="before")` on `AnthropicRequest` now
+  normalizes such payloads before validation:
+  - `role: "system"` → text content merged into the top-level `system`
+    field (newline-joined after any existing system prompt; text block
+    appended when `system` is a block list).
+  - Any other non-spec role (`ctx`, `msg`, ...) → coerced to `user`,
+    preserving conversation position (Anthropic merges consecutive
+    same-role turns, so this is safe).
+  - Messages with no salvageable text content are dropped (Anthropic
+    rejects empty turns).
+  - A `normalized-nonspec-message-roles` warning is logged whenever
+    normalization fires.
+  The strict `AnthropicMessage` role enum is **unchanged** — the wire
+  model still matches the Anthropic spec, and the native adapter forwards
+  a normalized (valid) payload to `api.anthropic.com`, avoiding the same
+  400 upstream.
+  Verified with 16 new unit tests (`tests/test_role_normalization.py`);
+  full suite 1191 passed / 0 failed on py3.12.
+  Refs: `anthropics/claude-code#63469`, `anthropics/claude-code#63473`,
+  `vllm-project/vllm#44000`
+---
 ## [v2.5.4] — 2026-06-05 (Gemma `<0xNN>` byte-fallback repair filter)
 Patch release: a new opt-in output filter that repairs Japanese (and other

{coderouter_cli-2.5.4 → coderouter_cli-2.6.0}/PKG-INFO RENAMED Viewed

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: coderouter-cli
-Version: 2.5.4
+Version: 2.6.0
 Summary: Local-first, free-first, fallback-built-in LLM router. Claude Code / OpenAI compatible.
 Project-URL: Homepage, https://github.com/zephel01/CodeRouter
 Project-URL: Repository, https://github.com/zephel01/CodeRouter

{coderouter_cli-2.5.4 → coderouter_cli-2.6.0}/coderouter/config/schemas.py RENAMED Viewed

@@ -185,6 +185,19 @@ class ProviderConfig(BaseModel):
     )
     timeout_s: float = Field(default=30.0, ge=1.0, le=600.0)
+    # v2.6 language-tax track: path to a LOCAL ``tokenizer.json`` for this
+    # provider's model, used to measure the CJK over-count vs the char/4
+    # baseline (see ``coderouter.language_tax``). Loaded local-file-only —
+    # never contacts the HuggingFace Hub. When unset, language-tax falls
+    # back to char/4 (multiplier 1.0) and the feature is silently inert.
+    tokenizer_path: str | None = Field(
+        default=None,
+        description=(
+            "Local tokenizer.json for accurate (language-tax) token "
+            "counting. No network access. Requires the 'accuracy' extra."
+        ),
+    )
     # Provider-specific extras merged into the outbound request body.
     # Use for non-standard fields like Ollama's `think: false`, `keep_alive`,
     # `options.num_ctx`, or any vendor-specific toggle. User-supplied request
@@ -763,6 +776,16 @@ class RuleMatcher(BaseModel):
       ``request.tools`` set). The ``has_tools`` matcher is the
       profile-level lever for steering tool-laden traffic to the right
       chain entirely.
+    Variants (v2.6 / language-tax routing):
+    - ``cjk_ratio_min: 0.3`` — CJK character ratio of the latest user
+      message is ``>=`` this threshold. Routes CJK-heavy turns (which
+      pay the cloud "language tax" of ~1.2-1.5x more tokens) to a local
+      model that bills nothing per token, while ASCII/code turns fall
+      through to the cloud chain. Per-turn property like
+      ``code_fence_ratio_min``; see
+      :func:`coderouter.language_tax.cjk_char_ratio`.
     """
     model_config = ConfigDict(extra="forbid")
@@ -773,6 +796,13 @@ class RuleMatcher(BaseModel):
     content_regex: str | None = None
     model_pattern: str | None = None
     content_token_count_min: int | None = Field(default=None, ge=1)
+    # v2.6 language-tax routing: CJK character ratio of the latest user
+    # message >= this threshold. Lets operators steer CJK-heavy traffic
+    # (which carries the cloud language tax) to a local model that bills
+    # nothing per token. Operates on the latest user message like
+    # ``code_fence_ratio_min`` (a per-turn property), not the whole
+    # request. See ``coderouter.language_tax.cjk_char_ratio``.
+    cjk_ratio_min: float | None = Field(default=None, ge=0.0, le=1.0)
     # [Unreleased]: tool-aware routing (OpenClaw + Raspberry Pi 由来).
     # See class docstring "Variants ([Unreleased] / tool-aware routing)"
     # above for the full rationale. Boolean shape mirrors ``has_image`` —
@@ -789,6 +819,7 @@ class RuleMatcher(BaseModel):
         "model_pattern",
         "content_token_count_min",
         "has_tools",
+        "cjk_ratio_min",
     )
     @model_validator(mode="after")

{coderouter_cli-2.5.4 → coderouter_cli-2.6.0}/coderouter/cost.py RENAMED Viewed

@@ -58,9 +58,13 @@ in the cost calc.
 from __future__ import annotations
 from dataclasses import dataclass
+from typing import TYPE_CHECKING
 from coderouter.config.schemas import CostConfig
+if TYPE_CHECKING:  # avoid an import cycle at runtime; used only for typing
+    from coderouter.language_tax import LanguageTaxBreakdown
 @dataclass(frozen=True)
 class CostBreakdown:
@@ -82,6 +86,12 @@ class CostBreakdown:
             chart. ``input_usd`` is "fresh input only" (does not
             include cache buckets); cache_read_usd / cache_creation_usd
             are the post-discount / post-premium values.
+        language_tax_multiplier: ``tokens_accurate / tokens_heuristic``
+            for the request text (v2.6 language-tax track). 1.0 when no
+            tax is measurable (English/code, or no accurate tokenizer).
+        language_tax_usd: USD share of ``total_usd`` attributable to the
+            CJK over-count vs CodeRouter's char/4 English baseline.
+            0.0 for free / local providers. See :mod:`coderouter.language_tax`.
     """
     total_usd: float = 0.0
@@ -90,6 +100,10 @@ class CostBreakdown:
     output_usd: float = 0.0
     cache_read_usd: float = 0.0
     cache_creation_usd: float = 0.0
+    # v2.6 language-tax track (additive; defaults keep pre-v2.6 behaviour
+    # and equality with a bare ``CostBreakdown()``).
+    language_tax_multiplier: float = 1.0
+    language_tax_usd: float = 0.0
 _PER_MILLION: float = 1_000_000.0
@@ -102,6 +116,7 @@ def compute_cost_for_attempt(
     output_tokens: int,
     cache_read_input_tokens: int,
     cache_creation_input_tokens: int,
+    language_tax: LanguageTaxBreakdown | None = None,
 ) -> CostBreakdown:
     """Translate per-attempt token counts into a USD :class:`CostBreakdown`.
@@ -144,6 +159,21 @@ def compute_cost_for_attempt(
     full_rate_for_cache_read = safe_read * input_rate
     savings_usd = full_rate_for_cache_read - cache_read_usd
+    # v2.6 language tax: the share of fresh-input spend attributable to
+    # the CJK over-count vs the char/4 English baseline. Defaults to a
+    # 1.0 multiplier / $0 when no LanguageTaxBreakdown is supplied, so
+    # the pre-v2.6 call shape is unchanged.
+    lt_multiplier = 1.0
+    lt_usd = 0.0
+    if language_tax is not None:
+        lt_multiplier = language_tax.tax_multiplier
+        from coderouter.language_tax import language_tax_usd
+        lt_usd = language_tax_usd(
+            language_tax.extra_tokens,
+            input_tokens_per_million=cost_config.input_tokens_per_million,
+        )
     return CostBreakdown(
         total_usd=total_usd,
         savings_usd=max(savings_usd, 0.0),
@@ -151,4 +181,6 @@ def compute_cost_for_attempt(
         output_usd=output_usd,
         cache_read_usd=cache_read_usd,
         cache_creation_usd=cache_creation_usd,
+        language_tax_multiplier=lt_multiplier,
+        language_tax_usd=lt_usd,
     )

{coderouter_cli-2.5.4 → coderouter_cli-2.6.0}/coderouter/ingress/dashboard_routes.py RENAMED Viewed

@@ -165,6 +165,26 @@ _DASHBOARD_HTML = r"""<!doctype html>
   </main>
   <footer class="max-w-7xl mx-auto px-4 md:px-6 pb-8">
+    <!-- Panel: Cost & Language Tax (v2.6) -->
+    <section class="bg-slate-900/60 border border-slate-800 rounded-lg p-4 mb-4">
+      <h2 class="text-sm font-semibold uppercase tracking-wider text-slate-400 mb-3">Cost &amp; Language Tax</h2>
+      <div class="grid grid-cols-3 gap-3">
+        <div class="rounded-md bg-slate-800/50 p-3">
+          <div class="text-xs text-slate-400">Total spend</div>
+          <div class="text-2xl font-semibold tabnum" data-bind="cost_total">$0.00</div>
+        </div>
+        <div class="rounded-md bg-slate-800/50 p-3">
+          <div class="text-xs text-slate-400">Cache savings</div>
+          <div class="text-2xl font-semibold tabnum text-green-400" data-bind="cost_savings">$0.00</div>
+        </div>
+        <div class="rounded-md bg-slate-800/50 p-3">
+          <div class="text-xs text-slate-400">Language tax (CJK)</div>
+          <div class="text-2xl font-semibold tabnum text-amber-400" data-bind="language_tax_total">$0.00</div>
+          <div class="text-xs text-slate-500" data-bind="language_tax_hint">no tokenizer configured</div>
+        </div>
+      </div>
+      <div id="language-tax-by-provider" class="text-xs text-slate-400 tabnum mt-3"></div>
+    </section>
     <section class="bg-slate-900/60 border border-slate-800 rounded-lg p-4">
       <h2 class="text-sm font-semibold uppercase tracking-wider text-slate-400 mb-3">Usage Mix</h2>
       <div id="usage-bar" class="flex h-3 rounded-full overflow-hidden bg-slate-800" role="img" aria-label="usage mix"></div>
@@ -435,6 +455,27 @@ _DASHBOARD_HTML = r"""<!doctype html>
     {"&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;"}[c]
   ));
+  // v2.6: cost + language-tax panel. The collector zero-fills these, so
+  // a fresh/local-only deployment shows $0.00 across the board.
+  const renderCostTax = (snap) => {
+    const c = snap.counters || {};
+    const usd = (x) => "$" + (Number(x) || 0).toFixed(4);
+    setBind("cost_total", usd(c.cost_total_usd_aggregate));
+    setBind("cost_savings", usd(c.cost_savings_usd_aggregate));
+    const taxTotal = Number(c.language_tax_usd_aggregate) || 0;
+    setBind("language_tax_total", usd(taxTotal));
+    setBind("language_tax_hint",
+      taxTotal > 0 ? "extra paid for CJK vs char/4 baseline"
+                   : "no tax measured (set provider tokenizer_path)");
+    const byProv = c.language_tax_usd || {};
+    const el = document.getElementById("language-tax-by-provider");
+    const rows = Object.entries(byProv).filter(([, v]) => Number(v) > 0);
+    el.innerHTML = rows.length === 0 ? "" :
+      rows.map(([n, v]) =>
+        '<span class="mr-4"><span class="text-slate-500">' + escapeHTML(n) +
+        '</span> ' + usd(v) + '</span>').join("");
+  };
   const renderSnapshot = (snap) => {
     const startup = snap.startup || {};
     const cfg = snap.config || {};
@@ -451,6 +492,7 @@ _DASHBOARD_HTML = r"""<!doctype html>
     renderSparkline(snap);
     renderRecent(snap);
     renderUsageMix(snap);
+    renderCostTax(snap);
   };
   const renderError = (msg) => {

coderouter_cli-2.6.0/coderouter/language_tax.py ADDED Viewed

@@ -0,0 +1,244 @@
+"""Language-tax measurement (Phase 1 PoC, 5-deps invariant).
+Why this module exists
+======================
+Cloud LLM tokenizers charge CJK text far more tokens-per-character
+than English. CodeRouter's core router uses a ``char/4`` heuristic
+(:mod:`coderouter.token_estimation`) which is *conservative for CJK*
+— i.e. it **under-counts** Japanese/Chinese/Korean text. That gap is
+the "language tax": a Japanese prompt that the heuristic prices at N
+tokens is actually billed at ~1.2-1.5x N by the cloud provider.
+Local models are unaffected (no per-token billing), so the tax only
+matters on the cloud leg. This module quantifies it so the cost
+tracker / dashboard can surface "how much extra am I paying to work
+in Japanese?".
+Design constraints (mirrors token_estimation_accurate.py)
+=========================================================
+* **No new core dependency.** CJK detection is pure ``str`` + Unicode
+  range checks (stdlib only). The *accurate* token count is delegated
+  to :func:`coderouter.token_estimation_accurate.count_tokens`, whose
+  precise backend (HuggingFace ``tokenizers``) is the existing
+  optional ``accuracy`` extra. When that backend is absent every
+  function still returns a sane value — the tax_multiplier simply
+  collapses to 1.0 because both legs use char/4.
+* **Local only / no network.** No tokenizer is ever downloaded; we
+  only pass through a caller-supplied local ``tokenizer.json`` path.
+* **Leaf module.** Imports only ``token_estimation`` /
+  ``token_estimation_accurate`` (both leaves), never the engine or
+  collector — keeps it trivially testable and circular-import-free.
+The tax multiplier, defined
+===========================
+``tax_multiplier = tokens_accurate / tokens_heuristic``
+where ``tokens_heuristic`` is the char/4 estimate (CodeRouter's
+English-calibrated baseline) and ``tokens_accurate`` is the real
+tokenizer count. Reading it:
+* English / code text → real tokenizers land near char/4, so the
+  multiplier is ~1.0 (no tax).
+* Japanese prose → real tokenizers emit ~0.5-1.0 tokens/char vs the
+  0.25 the heuristic assumes, so the multiplier lands ~2.0-4.0 on
+  *pure* CJK and ~1.2-1.5 on realistic mixed coding prompts (CJK
+  comments/instructions + ASCII code/identifiers).
+Confidence: **MODERATE.** char/4 is itself an approximation of
+English, so the multiplier is "tax relative to CodeRouter's own
+English baseline", not a lab-grade JA-vs-EN figure. It is, however,
+fully measurable with zero network and no guessing — which is why we
+prefer it to a translate-and-compare counterfactual.
+"""
+from __future__ import annotations
+from dataclasses import dataclass
+from pathlib import Path
+from typing import Any
+from coderouter.token_estimation import (
+    CHARS_PER_TOKEN_HEURISTIC,
+    extract_text_from_anthropic_request,
+)
+from coderouter.token_estimation_accurate import count_tokens
+# ---------------------------------------------------------------------------
+# CJK Unicode ranges
+# ---------------------------------------------------------------------------
+#
+# We count a character as "CJK" when it falls in one of the blocks that
+# real tokenizers fragment heavily. Latin, digits, punctuation and
+# whitespace are excluded so that an ASCII-only prompt scores 0.0 and a
+# pure-Japanese prompt scores ~1.0. Half-width katakana and full-width
+# forms are included because they tokenize like their full-width kin.
+#
+# Ranges are (low, high) inclusive code points.
+_CJK_RANGES: tuple[tuple[int, int], ...] = (
+    (0x3040, 0x309F),  # Hiragana
+    (0x30A0, 0x30FF),  # Katakana
+    (0x3400, 0x4DBF),  # CJK Unified Ideographs Extension A
+    (0x4E00, 0x9FFF),  # CJK Unified Ideographs (common Kanji/Hanzi)
+    (0xF900, 0xFAFF),  # CJK Compatibility Ideographs
+    (0xFF00, 0xFFEF),  # Half/Full-width forms (full-width punct, half kana)
+    (0x3000, 0x303F),  # CJK symbols & punctuation (、。「」etc.)
+    (0xAC00, 0xD7A3),  # Hangul syllables (Korean)
+    (0x1100, 0x11FF),  # Hangul Jamo
+    (0x20000, 0x2A6DF),  # CJK Ext. B (rare ideographs)
+)
+def _is_cjk(cp: int) -> bool:
+    return any(low <= cp <= high for low, high in _CJK_RANGES)
+# ---------------------------------------------------------------------------
+# Public API
+# ---------------------------------------------------------------------------
+def cjk_char_ratio(text: str) -> float:
+    """Fraction of *non-whitespace* characters in ``text`` that are CJK.
+    Whitespace is excluded from the denominator so that indentation /
+    blank lines in a code block don't dilute the score. Returns ``0.0``
+    for empty or whitespace-only / pure-ASCII text and ``1.0`` for pure
+    CJK. The value feeds the Phase-2 ``cjk_ratio_min`` auto-route
+    matcher and the Phase-1 reporting below.
+    """
+    if not text:
+        return 0.0
+    cjk = 0
+    total = 0
+    for ch in text:
+        if ch.isspace():
+            continue
+        total += 1
+        if _is_cjk(ord(ch)):
+            cjk += 1
+    if total == 0:
+        return 0.0
+    return cjk / total
+@dataclass(frozen=True)
+class LanguageTaxBreakdown:
+    """Per-text language-tax measurement.
+    Fields
+        char_count: non-whitespace-inclusive length of the text.
+        cjk_ratio: see :func:`cjk_char_ratio` (0.0-1.0).
+        tokens_heuristic: char/4 estimate (CodeRouter's English
+            baseline). Always available.
+        tokens_accurate: real tokenizer count when a ``tokenizer_path``
+            was supplied *and* the optional backend is installed;
+            otherwise equals ``tokens_heuristic`` (graceful fallback).
+        accurate_available: whether ``tokens_accurate`` came from the
+            precise backend (True) or fell back to char/4 (False).
+        tax_multiplier: ``tokens_accurate / tokens_heuristic``; 1.0
+            when no tax is measurable. See module docstring for the
+            MODERATE-confidence caveat.
+        extra_tokens: ``tokens_accurate - tokens_heuristic`` (>= 0 for
+            CJK; the visible "tax" in tokens).
+    """
+    char_count: int = 0
+    cjk_ratio: float = 0.0
+    tokens_heuristic: int = 0
+    tokens_accurate: int = 0
+    accurate_available: bool = False
+    tax_multiplier: float = 1.0
+    extra_tokens: int = 0
+def estimate_language_tax(
+    text: str,
+    *,
+    tokenizer_path: str | Path | None = None,
+) -> LanguageTaxBreakdown:
+    """Measure the language tax of ``text``.
+    With ``tokenizer_path`` pointing at a readable local
+    ``tokenizer.json`` (and the ``accuracy`` extra installed), the
+    accurate leg uses the real tokenizer and the multiplier reflects
+    the true char/4 under-count. Without it, both legs use char/4 and
+    the multiplier is 1.0 — the function never raises and never
+    touches the network.
+    """
+    if not text:
+        return LanguageTaxBreakdown()
+    heuristic = len(text) // CHARS_PER_TOKEN_HEURISTIC
+    accurate_raw = count_tokens(text, tokenizer_path=tokenizer_path)
+    # When the precise backend is unavailable, count_tokens returns the
+    # same char/4 value, so accurate == heuristic and we report no tax.
+    accurate_available = tokenizer_path is not None and accurate_raw != heuristic
+    # Guard against a zero-heuristic (text shorter than 4 chars) to keep
+    # the multiplier finite and meaningful.
+    if heuristic <= 0:
+        multiplier = 1.0
+        extra = max(accurate_raw - 0, 0)
+    else:
+        multiplier = accurate_raw / heuristic
+        extra = accurate_raw - heuristic
+    return LanguageTaxBreakdown(
+        char_count=len(text),
+        cjk_ratio=cjk_char_ratio(text),
+        tokens_heuristic=heuristic,
+        tokens_accurate=accurate_raw,
+        accurate_available=accurate_available,
+        tax_multiplier=multiplier,
+        extra_tokens=max(extra, 0),
+    )
+def language_tax_usd(
+    extra_tokens: int,
+    *,
+    input_tokens_per_million: float | None,
+) -> float:
+    """USD attributable to the language tax for one request leg.
+    ``extra_tokens`` is the :attr:`LanguageTaxBreakdown.extra_tokens`
+    delta; pricing is the provider's normal input rate. Returns 0.0 for
+    a free / unpriced (typically local) provider — mirroring
+    :func:`coderouter.cost.compute_cost_for_attempt`'s zero-on-None
+    behaviour so callers never special-case local models.
+    """
+    if not input_tokens_per_million or extra_tokens <= 0:
+        return 0.0
+    return extra_tokens * (input_tokens_per_million / 1_000_000.0)
+def estimate_language_tax_for_request(
+    system: Any,
+    messages: list[Any],
+    *,
+    tokenizer_path: str | Path | None = None,
+) -> LanguageTaxBreakdown:
+    """Measure the language tax of a whole Anthropic-shaped request.
+    Convenience wrapper used by the engine's cost-emit path: pulls the
+    concatenated request text (system + message text blocks) and runs it
+    through :func:`estimate_language_tax`. With no ``tokenizer_path`` the
+    multiplier is 1.0 (inert), so calling this on every request is safe
+    and cheap — the engine only invokes it when a provider declares a
+    local ``tokenizer.json``.
+    """
+    text = extract_text_from_anthropic_request(system=system, messages=messages)
+    return estimate_language_tax(text, tokenizer_path=tokenizer_path)
+__all__ = [
+    "LanguageTaxBreakdown",
+    "cjk_char_ratio",
+    "estimate_language_tax",
+    "estimate_language_tax_for_request",
+    "language_tax_usd",
+]

{coderouter_cli-2.5.4 → coderouter_cli-2.6.0}/coderouter/logging.py RENAMED Viewed

@@ -971,6 +971,10 @@ class CacheObservedPayload(TypedDict):
     streaming: bool
     cost_usd: float
     cost_savings_usd: float
+    # v2.6 language-tax track (optional; default 0.0 / 1.0 at the emit
+    # site keeps pre-v2.6 callers and log consumers working unchanged).
+    language_tax_usd: float
+    language_tax_multiplier: float
 def log_cache_observed(
@@ -986,6 +990,8 @@ def log_cache_observed(
     streaming: bool,
     cost_usd: float = 0.0,
     cost_savings_usd: float = 0.0,
+    language_tax_usd: float = 0.0,
+    language_tax_multiplier: float = 1.0,
 ) -> None:
     """Emit a ``cache-observed`` info record with the unified shape.
@@ -1013,6 +1019,8 @@ def log_cache_observed(
         "streaming": streaming,
         "cost_usd": cost_usd,
         "cost_savings_usd": cost_savings_usd,
+        "language_tax_usd": language_tax_usd,
+        "language_tax_multiplier": language_tax_multiplier,
     }
     logger.info("cache-observed", extra=payload)

{coderouter_cli-2.5.4 → coderouter_cli-2.6.0}/coderouter/metrics/collector.py RENAMED Viewed

@@ -190,6 +190,13 @@ class MetricsCollector(logging.Handler):
         self._cost_total_usd_aggregate: float = 0.0
         self._cost_savings_usd_aggregate: float = 0.0
+        # v2.6: per-provider language-tax spend — the USD share of input
+        # cost attributable to the CJK over-count vs the char/4 baseline.
+        # Zero for English/code workloads and for providers without a
+        # configured tokenizer_path. Surfaced alongside cost_total_usd.
+        self._language_tax_usd: dict[str, float] = {}
+        self._language_tax_usd_aggregate: float = 0.0
         # v2.0-F (L1): context budget guard counters. Per-profile counts
         # of warnings (over warn threshold) and trims (messages removed).
         # The ``latest_usage_ratio`` dict records the most recent ratio
@@ -388,6 +395,22 @@ class MetricsCollector(logging.Handler):
                         self._cost_savings_usd.get(provider, 0.0) + savings_usd
                     )
                     self._cost_savings_usd_aggregate += savings_usd
+                # v2.6: language-tax spend. Same defensive coercion as the
+                # cost fields; defaults to 0.0 for pre-v2.6 log lines and
+                # English/code traffic, so the aggregate only moves on
+                # CJK-heavy requests against a tokenizer-configured provider.
+                lt_usd_raw = extras.get("language_tax_usd", 0.0)
+                lt_usd = (
+                    float(lt_usd_raw)
+                    if isinstance(lt_usd_raw, int | float)
+                    else 0.0
+                )
+                if lt_usd > 0.0:
+                    self._language_tax_usd[provider] = (
+                        self._language_tax_usd.get(provider, 0.0) + lt_usd
+                    )
+                    self._language_tax_usd_aggregate += lt_usd
             elif event == "context-budget-warning":
                 # v2.0-F (L1): context usage exceeded the warn threshold.
                 # Track per-profile and aggregate, plus latest ratio gauge.
@@ -522,6 +545,10 @@ class MetricsCollector(logging.Handler):
                         "savings_usd": round(
                             self._cost_savings_usd.get(name, 0.0), 6
                         ),
+                        # v2.6: per-provider language-tax spend.
+                        "language_tax_usd": round(
+                            self._language_tax_usd.get(name, 0.0), 6
+                        ),
                     },
                 }
                 for name in providers
@@ -589,6 +616,14 @@ class MetricsCollector(logging.Handler):
                     "cost_savings_usd_aggregate": round(
                         self._cost_savings_usd_aggregate, 6
                     ),
+                    # v2.6: per-provider + aggregate language-tax spend.
+                    "language_tax_usd": {
+                        n: round(v, 6)
+                        for n, v in self._language_tax_usd.items()
+                    },
+                    "language_tax_usd_aggregate": round(
+                        self._language_tax_usd_aggregate, 6
+                    ),
                     # v2.0-F (L1): context budget guard aggregate counters.
                     "context_budget_warnings_total": self._context_budget_warnings_total,
                     "context_budget_trims_total": self._context_budget_trims_total,
@@ -682,6 +717,13 @@ class MetricsCollector(logging.Handler):
             self._cost_savings_usd_aggregate += float(
                 state.get("cost_savings_usd_aggregate", 0.0)
             )
+            for k, v in (state.get("language_tax_usd") or {}).items():
+                self._language_tax_usd[k] = (
+                    self._language_tax_usd.get(k, 0.0) + float(v)
+                )
+            self._language_tax_usd_aggregate += float(
+                state.get("language_tax_usd_aggregate", 0.0)
+            )
             self._chain_paid_gate_blocked_total += int(
                 state.get("chain_paid_gate_blocked_total", 0)
             )
@@ -737,6 +779,9 @@ class MetricsCollector(logging.Handler):
             self._cost_savings_usd.clear()
             self._cost_total_usd_aggregate = 0.0
             self._cost_savings_usd_aggregate = 0.0
+            # v2.6
+            self._language_tax_usd.clear()
+            self._language_tax_usd_aggregate = 0.0
             # v2.0-H (L6)
             self._partial_stitch_surfaced_total = 0
             # v2.0-I

coderouter-cli 2.5.4__tar.gz → 2.6.0__tar.gz

coderouter-cli 2.5.4tar.gz → 2.6.0tar.gz