PyPI - coderouter-cli - Versions diffs - 2.0.0__tar.gz → 2.1.0__tar.gz - Mend

coderouter-cli 2.0.0tar.gz → 2.1.0tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (163)

{coderouter_cli-2.0.0 → coderouter_cli-2.1.0}/CHANGELOG.md RENAMED

@@ -6,6 +6,83 @@ versioning follows [SemVer](https://semver.org/).
 ---
+## [v2.1.0] - 2026-05-05 (Long-run Reliability complete: v2.0-G/H/I)
+**Theme: address three failure classes at once (L4 quality degradation, L6 mid-stream failure, L5 idle-time outages) and complete the Long-run Reliability pillar.** Together with v2.0-F (L1 context overflow), CodeRouter now actively guards against four of the six failure classes.
+### v2.0-G: Drift Detection (L4 quality degradation guard)
+**Automatically detects "drift", the gradual degradation of model response quality during long agent sessions, and takes corrective action.** After an Ollama local model has been running for several hours, KV cache pollution or VRAM pressure can cause responses to come back empty, get shorter, or stop returning tool_use (L4); five signals detect this. A three-stage action ladder, warn → promote (chain demotion) → reload (Ollama KV flush), restores quality automatically.
+| Feature | Description |
+|---|---|
+| 5 Signal Detector | Monitors empty_response_rate / length_collapse / tool_silence_rate / stop_anomaly_rate / error_rate in a per-provider rolling window |
+| `detect_drift()` | Pure function that returns severity none/mild/severe (severe×1 or mild×2 → severe) |
+| `drift_detection_action: off/warn/promote/reload` | Enables the guard per profile (default: off) |
+| `drift_detection_sensitivity: low/normal/high` | Selects a threshold preset |
+| promote action | Diverts traffic to another provider via AdaptiveAdjuster rank demotion |
+| reload action | Flushes the KV cache with Ollama `keep_alive=0` and resumes with a fresh context |
+| Cooldown & Recovery | Restores the rank and clears the window after the configured number of seconds |
+| `X-CodeRouter-Drift` header | Reports mild/severe status in a response header (streaming supported) |
+| Prometheus metrics | `coderouter_drift_detected_total`, `coderouter_drift_promoted_total`, `coderouter_drift_reload_total` |
+- Tests: ~930 → **~970** (+40, drift_detection 27 + drift_integration 10 + drift_actions 5)
+- Runtime deps: 5 → 5 (**unchanged for 36 consecutive sub-releases**)
+- Backward compat: fully compatible; `drift_detection_action` defaults to `"off"`, so existing behavior is unchanged until you opt in
+### Example configuration
+```yaml
+profiles:
+  - name: long-session
+    providers: [ollama-qwen3]
+    drift_detection_action: reload      # off | warn | promote | reload
+    drift_detection_sensitivity: normal # low | normal | high
+    drift_detection_window_size: 20     # rolling window size
+    drift_detection_cooldown_s: 300     # seconds to wait before recovery
+```
+### New files
+- `coderouter/guards/drift_detection.py`: detection logic (observation model + detector + window manager)
+- `coderouter/guards/drift_actions.py`: reload action (Ollama KV flush)
+- `tests/test_drift_detection.py`: pure function tests (27 tests)
+- `tests/test_drift_detection_integration.py`: engine integration tests (10 tests)
+- `tests/test_drift_actions.py`: reload action tests (5 tests)
+- `docs/drift-detection.md`: user documentation
+### v2.0-H: Mid-stream Partial Stitching (L6 extension)
+**When a streaming response fails partway through, the text accumulated so far is returned to the client instead of being discarded.**
+| Feature | Description |
+|---|---|
+| `_StreamUsageAccumulator` text accumulation | Tracks content_block_start/delta/stop and accumulates text blocks in memory |
+| `MidStreamError.partial_content` | Carries the accumulated text on the exception (partial tool_use JSON is excluded) |
+| `partial_stitch_action: off/surface` | Enabled per profile (default: off) |
+| `event: coderouter_partial` | Returns accumulated text + provider + reason as SSE metadata |
+| Prometheus metric | `coderouter_partial_stitch_surfaced_total` |
+### v2.0-I: Continuous Probing (L5 active health checks)
+**Actively detects provider failures during idle periods and updates the backend health state machine.**
+| Feature | Description |
+|---|---|
+| `probe_one()` | Confirms that the whole model pipeline is healthy with a 1-token completion |
+| `probe_loop()` | asyncio background task: sequential probe + graceful shutdown |
+| `continuous_probe: off/active` | Enabled in the global config (default: off) |
+| Model drift detection | Compares the model name in the probe response with the config and warns on mismatch |
+| Prometheus metrics | `probe_total`, `probe_outcomes_total`, `probe_rounds_total`, `probe_latency_ms`, `probe_drift_detected_total` |
+### Overall summary
+- Tests: ~930 → **~1005** (+75)
+- Runtime deps: 5 → 5 (**unchanged for 38 consecutive sub-releases**)
+- Backward compat: fully compatible; all features default to off, so existing behavior is unchanged until you opt in
+---
 ## [v2.0.0] - 2026-05-05 (Context Budget Management: L1 overflow prevention)
 **Theme: implement a guard that prevents context overflow in long agent sessions before it happens.** When agentic sessions such as Claude Code / Cline / OpenClaw run in a loop for more than 8 hours, messages approach the context window limit, the backend returns 400 / truncation, and the session dies (L1); this release fixes that at the root. A two-stage guard, warn (80%) → auto trim (90%), brings overflow down to zero.
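
Reviewer note: the changelog above describes `detect_drift()` as a pure function that escalates on "severe×1 or mild×2" and says signals are tracked in a per-provider rolling window. The sketch below shows one plausible reading of that rule, assuming it means "one signal at severe, or two signals at mild". The names `aggregate_severity` and `EmptyResponseWindow` are hypothetical and are not taken from `coderouter/guards/drift_detection.py`, which is not shown in this excerpt; the real thresholds and signal definitions may differ.

```python
from __future__ import annotations

from collections import deque
from typing import Literal

Severity = Literal["none", "mild", "severe"]


def aggregate_severity(signal_levels: list[Severity]) -> Severity:
    """Combine per-signal levels (assumed rule: 1 severe or 2+ mild => severe)."""
    severe = sum(1 for level in signal_levels if level == "severe")
    mild = sum(1 for level in signal_levels if level == "mild")
    if severe >= 1 or mild >= 2:
        return "severe"
    if mild == 1:
        return "mild"
    return "none"


class EmptyResponseWindow:
    """Hypothetical rolling window over the last `size` responses of one provider."""

    def __init__(self, size: int = 20) -> None:
        # deque(maxlen=...) drops the oldest observation automatically.
        self._flags: deque[bool] = deque(maxlen=size)

    def observe(self, text: str) -> None:
        # A response counts as empty when it contains only whitespace.
        self._flags.append(not text.strip())

    def empty_rate(self) -> float:
        return sum(self._flags) / len(self._flags) if self._flags else 0.0
```

Keeping the aggregation free of I/O and state is what makes it easy to unit test in isolation, which matches the "pure function tests" listed among the new files.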

{coderouter_cli-2.0.0 → coderouter_cli-2.1.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: coderouter-cli
-Version: 2.0.0
+Version: 2.1.0
 Summary: Local-first, free-first, fallback-built-in LLM router. Claude Code / OpenAI compatible.
 Project-URL: Homepage, https://github.com/zephel01/CodeRouter
 Project-URL: Repository, https://github.com/zephel01/CodeRouter
@@ -60,7 +60,7 @@ Description-Content-Type: text/markdown
 <p align="center">
   <a href="https://github.com/zephel01/CodeRouter/actions/workflows/ci.yml"><img src="https://github.com/zephel01/CodeRouter/actions/workflows/ci.yml/badge.svg?branch=main" alt="CI"></a>
   <a href=""><img src="https://img.shields.io/badge/status-stable-brightgreen" alt="status"></a>
-  <a href=""><img src="https://img.shields.io/badge/version-2.0.0-blue" alt="version"></a>
+  <a href=""><img src="https://img.shields.io/badge/version-2.1.0-blue" alt="version"></a>
   <a href=""><img src="https://img.shields.io/badge/python-3.12%2B-blue" alt="python"></a>
   <a href=""><img src="https://img.shields.io/badge/runtime%20deps-5-brightgreen" alt="deps"></a>
   <a href=""><img src="https://img.shields.io/badge/license-MIT-yellow" alt="license"></a>
@@ -91,7 +91,8 @@ Description-Content-Type: text/markdown
 - **Long-run reliability pillar completed in v1.10.0**: `cost.monthly_budget_usd` enforces a monthly USD budget per provider, the **L2 memory pressure detector** cools down automatically when Ollama / LM Studio runs out of VRAM and hits OOM, and the **L5 backend health state machine** marks a provider UNHEALTHY after consecutive failures, demotes it to the end of the chain, and restores it immediately after one success
 - **auto-router reaches 6 matchers in v1.10.0**: `has_image` / `code_fence_ratio_min` / `content_contains` / `content_regex` / `model_pattern` (Opus/Sonnet/Haiku branching) / `content_token_count_min` (long input → automatic switch to a 1M-ctx model such as Gemini Flash)
 - **v2.0.0 adds Context Budget Management (L1 overflow prevention)**: fixes at the root the problem where, in long agent sessions, messages approach the context window limit and a backend 400 error kills the session. A two-stage guard, warn (80%) → auto trim (90%), achieves **zero context overflow**. tool_use / tool_result pairs are preserved atomically, the `X-CodeRouter-Context-Budget` header reports status, and Prometheus metrics are included
-- 5 runtime dependencies (`fastapi` / `uvicorn` / `httpx` / `pydantic` / `pyyaml`); pure Python, MIT, 930 tests green
+- **v2.1.0 adds three more Long-run Reliability features**: **Drift Detection** (L4 quality-degradation detection: 5-signal rolling window + three-stage warn/promote/reload actions), **Partial Stitching** (L6: returns the accumulated text on mid-stream failure), **Continuous Probing** (P3: periodic 1-token probes while idle + model drift detection + automatic backend health updates)
+- 5 runtime dependencies (`fastapi` / `uvicorn` / `httpx` / `pydantic` / `pyyaml`); pure Python, MIT, 950 tests green
 → **With Claude Code / gemini-cli / codex + Ollama / llama.cpp / NVIDIA NIM, you can build a local-first agent that does not fall apart**
@@ -198,7 +199,7 @@ CodeRouter and Voice Bridge both evolve in independent repos, HTT
 ## Quickstart (3 commands)
-**v2.0.0 adds Context Budget Management (L1 overflow prevention)**: fixes at the root the problem of long agent sessions exhausting the context window and dying. Runs with a single `uvx` command (Python 3.12 or later required):
+**v2.1.0 completes the Long-run Reliability pillar**: Context Budget (L1) + Drift Detection (L4) + Partial Stitching (L6) + Continuous Probing (P3). Runs with a single `uvx` command (Python 3.12 or later required):
 ```bash
 # 1. Put the sample config in place
@@ -266,9 +267,9 @@ CodeRouter itself is pure Python 3.12+, and the effective OS support range is `min(
 Full matrix, including caveats and recipes for setups without a local GPU: [Usage guide §1](./docs/usage-guide.md#1-os-互換性)
-## Status - v2.0.0 (2026-05)
+## Status - v2.1.0 (2026-05)
-**930 tests passing. 5 runtime dependencies (unchanged for 36 consecutive sub-releases). Runs on macOS / Linux / Windows WSL2.** v2.0.0 adds **Context Budget Management (L1 overflow prevention)**, which fixes at the root the problem of long agent sessions exhausting the context window and dying. As of v1.10.0, **Long-run Reliability** (L2/L3/L5), the **Cost pillar**, and the **Auto-router 6 matchers** were already complete. For the v1.0 retrospective, see [`docs/retrospectives/v1.0.md`](./docs/retrospectives/v1.0.md).
+**950 tests passing. 5 runtime dependencies (unchanged for 39 consecutive sub-releases). Runs on macOS / Linux / Windows WSL2.** v2.1.0 **completes the Long-run Reliability pillar**, shipping four sub-releases together: Context Budget (L1) / Drift Detection (L4) / Partial Stitching (L6) / Continuous Probing (P3). As of v1.10.0, **Long-run Reliability** (L2/L3/L5), the **Cost pillar**, and the **Auto-router 6 matchers** were already complete. For the v1.0 retrospective, see [`docs/retrospectives/v1.0.md`](./docs/retrospectives/v1.0.md).
 What CodeRouter delivers today:

{coderouter_cli-2.0.0 → coderouter_cli-2.1.0}/README.md RENAMED

@@ -19,7 +19,7 @@
 <p align="center">
   <a href="https://github.com/zephel01/CodeRouter/actions/workflows/ci.yml"><img src="https://github.com/zephel01/CodeRouter/actions/workflows/ci.yml/badge.svg?branch=main" alt="CI"></a>
   <a href=""><img src="https://img.shields.io/badge/status-stable-brightgreen" alt="status"></a>
-  <a href=""><img src="https://img.shields.io/badge/version-2.0.0-blue" alt="version"></a>
+  <a href=""><img src="https://img.shields.io/badge/version-2.1.0-blue" alt="version"></a>
   <a href=""><img src="https://img.shields.io/badge/python-3.12%2B-blue" alt="python"></a>
   <a href=""><img src="https://img.shields.io/badge/runtime%20deps-5-brightgreen" alt="deps"></a>
   <a href=""><img src="https://img.shields.io/badge/license-MIT-yellow" alt="license"></a>
@@ -50,7 +50,8 @@
 - **Long-run reliability pillar completed in v1.10.0**: `cost.monthly_budget_usd` enforces a monthly USD budget per provider, the **L2 memory pressure detector** cools down automatically when Ollama / LM Studio runs out of VRAM and hits OOM, and the **L5 backend health state machine** marks a provider UNHEALTHY after consecutive failures, demotes it to the end of the chain, and restores it immediately after one success
 - **auto-router reaches 6 matchers in v1.10.0**: `has_image` / `code_fence_ratio_min` / `content_contains` / `content_regex` / `model_pattern` (Opus/Sonnet/Haiku branching) / `content_token_count_min` (long input → automatic switch to a 1M-ctx model such as Gemini Flash)
 - **v2.0.0 adds Context Budget Management (L1 overflow prevention)**: fixes at the root the problem where, in long agent sessions, messages approach the context window limit and a backend 400 error kills the session. A two-stage guard, warn (80%) → auto trim (90%), achieves **zero context overflow**. tool_use / tool_result pairs are preserved atomically, the `X-CodeRouter-Context-Budget` header reports status, and Prometheus metrics are included
-- 5 runtime dependencies (`fastapi` / `uvicorn` / `httpx` / `pydantic` / `pyyaml`); pure Python, MIT, 930 tests green
+- **v2.1.0 adds three more Long-run Reliability features**: **Drift Detection** (L4 quality-degradation detection: 5-signal rolling window + three-stage warn/promote/reload actions), **Partial Stitching** (L6: returns the accumulated text on mid-stream failure), **Continuous Probing** (P3: periodic 1-token probes while idle + model drift detection + automatic backend health updates)
+- 5 runtime dependencies (`fastapi` / `uvicorn` / `httpx` / `pydantic` / `pyyaml`); pure Python, MIT, 950 tests green
 → **With Claude Code / gemini-cli / codex + Ollama / llama.cpp / NVIDIA NIM, you can build a local-first agent that does not fall apart**
@@ -157,7 +158,7 @@ CodeRouter and Voice Bridge both evolve in independent repos, HTT
 ## Quickstart (3 commands)
-**v2.0.0 adds Context Budget Management (L1 overflow prevention)**: fixes at the root the problem of long agent sessions exhausting the context window and dying. Runs with a single `uvx` command (Python 3.12 or later required):
+**v2.1.0 completes the Long-run Reliability pillar**: Context Budget (L1) + Drift Detection (L4) + Partial Stitching (L6) + Continuous Probing (P3). Runs with a single `uvx` command (Python 3.12 or later required):
 ```bash
 # 1. Put the sample config in place
@@ -225,9 +226,9 @@ CodeRouter itself is pure Python 3.12+, and the effective OS support range is `min(
 Full matrix, including caveats and recipes for setups without a local GPU: [Usage guide §1](./docs/usage-guide.md#1-os-互換性)
-## Status - v2.0.0 (2026-05)
+## Status - v2.1.0 (2026-05)
-**930 tests passing. 5 runtime dependencies (unchanged for 36 consecutive sub-releases). Runs on macOS / Linux / Windows WSL2.** v2.0.0 adds **Context Budget Management (L1 overflow prevention)**, which fixes at the root the problem of long agent sessions exhausting the context window and dying. As of v1.10.0, **Long-run Reliability** (L2/L3/L5), the **Cost pillar**, and the **Auto-router 6 matchers** were already complete. For the v1.0 retrospective, see [`docs/retrospectives/v1.0.md`](./docs/retrospectives/v1.0.md).
+**950 tests passing. 5 runtime dependencies (unchanged for 39 consecutive sub-releases). Runs on macOS / Linux / Windows WSL2.** v2.1.0 **completes the Long-run Reliability pillar**, shipping four sub-releases together: Context Budget (L1) / Drift Detection (L4) / Partial Stitching (L6) / Continuous Probing (P3). As of v1.10.0, **Long-run Reliability** (L2/L3/L5), the **Cost pillar**, and the **Auto-router 6 matchers** were already complete. For the v1.0 retrospective, see [`docs/retrospectives/v1.0.md`](./docs/retrospectives/v1.0.md).
 What CodeRouter delivers today:

{coderouter_cli-2.0.0 → coderouter_cli-2.1.0}/coderouter/config/schemas.py RENAMED

@@ -531,6 +531,73 @@ class FallbackChain(BaseModel):
         ),
     )
+    # ------------------------------------------------------------------
+    # v2.0-G (L4): Drift detection — response quality degradation guard
+    # ------------------------------------------------------------------
+    #
+    # Long-running sessions on local LLMs can suffer gradual quality
+    # decay (KV cache pressure, thermal throttling, VRAM fragmentation)
+    # where the model "succeeds" but produces empty/short/toolless
+    # responses. This guard observes response quality signals in a
+    # rolling window and detects statistical drift.
+    #
+    # Four actions:
+    #   * ``off``     — no detection (default).
+    #   * ``warn``    — emit structured log + response header.
+    #   * ``promote`` — ``warn`` + demote drifted provider in chain.
+    #   * ``reload``  — ``promote`` + attempt KV cache flush (Ollama).
+    drift_detection_action: Literal["off", "warn", "promote", "reload"] = Field(
+        default="off",
+        description=(
+            "v2.0-G (L4): action on response quality drift detection. "
+            "``off`` (default) disables drift detection. ``warn`` emits "
+            "a log and response header. ``promote`` additionally demotes "
+            "the drifted provider in the chain. ``reload`` attempts to "
+            "flush the provider's KV cache (Ollama only) before promoting."
+        ),
+    )
+    drift_detection_window_size: int = Field(
+        default=20,
+        ge=4,
+        le=200,
+        description=(
+            "v2.0-G (L4): number of recent responses to keep in the "
+            "rolling observation window per provider. Larger windows "
+            "are more robust to noise but slower to detect drift."
+        ),
+    )
+    drift_detection_cooldown_s: int = Field(
+        default=300,
+        ge=10,
+        le=3600,
+        description=(
+            "v2.0-G (L4): seconds after a promote/reload action before "
+            "the drifted provider's rank is reset for recovery check. "
+            "Default 300s (5 min) gives the model time to stabilize."
+        ),
+    )
+    drift_detection_sensitivity: Literal["low", "normal", "high"] = Field(
+        default="normal",
+        description=(
+            "v2.0-G (L4): threshold preset for drift signals. "
+            "``low`` tolerates more degradation before triggering, "
+            "``high`` is stricter (fewer bad responses needed)."
+        ),
+    )
+    # --- v2.0-H (L6): Mid-stream partial stitching --------------------------
+    #   * ``off``      — discard partial content on mid-stream failure (legacy).
+    #   * ``surface``  — return partial content as a truncated-but-valid response.
+    partial_stitch_action: Literal["off", "surface"] = Field(
+        default="off",
+        description=(
+            "v2.0-H (L6): action when a streaming response fails mid-stream. "
+            "``off`` discards partial content (legacy error event). "
+            "``surface`` returns accumulated text as a graceful stream "
+            "termination with a ``coderouter_partial`` metadata event."
+        ),
+    )
 # ---------------------------------------------------------------------------
 # v1.6-A: auto_router — declarative request-body classifier
@@ -768,6 +835,42 @@ class CodeRouterConfig(BaseModel):
         ),
     )
+    # v2.0-I: Continuous probing — background health checks for idle periods.
+    continuous_probe: Literal["off", "active"] = Field(
+        default="off",
+        description=(
+            "v2.0-I: enable background health probes. 'active' starts a "
+            "background task that periodically sends 1-token requests to "
+            "each provider, feeding results into the L5 backend health "
+            "state machine. 'off' = no probing (backward-compatible default)."
+        ),
+    )
+    probe_interval_s: float = Field(
+        default=60.0,
+        ge=5.0,
+        le=3600.0,
+        description=(
+            "v2.0-I: seconds between probe rounds. Lower = faster detection "
+            "but more probe traffic. 60s is a good balance for local models."
+        ),
+    )
+    probe_paid: bool = Field(
+        default=False,
+        description=(
+            "v2.0-I: whether to probe providers marked ``paid: true``. "
+            "Default false protects operators from accidental API charges."
+        ),
+    )
+    probe_timeout_s: float = Field(
+        default=10.0,
+        ge=1.0,
+        le=60.0,
+        description=(
+            "v2.0-I: per-provider timeout for probe requests. A provider "
+            "that doesn't respond within this window is recorded as failed."
+        ),
+    )
     @model_validator(mode="after")
     def _check_default_profile_exists(self) -> CodeRouterConfig:
         """v0.6-A: surface a typo'd ``default_profile`` at load time.

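Reviewer note: the `continuous_probe` field description says probe results feed the L5 backend health state machine, which the README describes as "UNHEALTHY after consecutive failures, restored immediately after one success". The sketch below illustrates that behavior under stated assumptions. `BackendHealth` is a hypothetical class written for this note; only the call shape `(provider_name, *, success, threshold)` is taken from the `record_fn` documentation in `continuous_probe.py`, and the real state machine may track more states than this.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class BackendHealth:
    """Hypothetical tracker: N consecutive failures mark a provider unhealthy."""

    # Consecutive-failure count per provider name.
    failures: dict[str, int] = field(default_factory=dict)

    def record_attempt(self, provider: str, *, success: bool, threshold: int = 3) -> bool:
        """Record one outcome; return True while the provider still counts as healthy."""
        if success:
            # A single success resets the streak, so recovery is immediate.
            self.failures[provider] = 0
        else:
            self.failures[provider] = self.failures.get(provider, 0) + 1
        return self.failures[provider] < threshold
```

With the default threshold of 3, two failed probes leave a provider healthy, the third flips it, and the next successful probe restores it.
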
coderouter_cli-2.1.0/coderouter/guards/continuous_probe.py ADDED

@@ -0,0 +1,349 @@
+"""Continuous health probing (v2.0-I).
+Background task that periodically sends minimal 1-token requests to each
+configured provider, feeding the results into the L5 backend health
+state machine. Detects provider crashes during idle periods (no user
+traffic) so the chain resolver knows to skip/demote a dead backend
+before the next real request hits it.
+Architecture
+============
+::
+    lifespan startup
+      └─ asyncio.create_task(probe_loop(...))
+    probe_loop:
+      while not shutdown:
+        sleep(interval_s)
+        for provider in providers:
+          result = await probe_one(provider)
+          backend_health.record_attempt(...)
+          emit log + metrics
+Design choices
+==============
+- **1-token completion** rather than ``/api/version`` or ``/api/tags``
+  because version endpoints are Ollama-only; a 1-token generate confirms
+  the entire model-serving pipeline is operational (model loaded, KV
+  allocated, inference works).
+- **Sequential** probing (not parallel) to avoid hammering backends and
+  to keep the implementation trivially correct without gather/semaphore.
+- **No new dependency** — uses httpx (already a runtime dep) + asyncio
+  (stdlib).
+- **Graceful shutdown** via an ``asyncio.Event`` set by the lifespan
+  exit path. The loop checks the event each iteration and breaks cleanly.
+"""
+from __future__ import annotations
+import asyncio
+import contextlib
+import time
+from dataclasses import dataclass, field
+from typing import Any
+import httpx
+from coderouter.config.schemas import ProviderConfig
+from coderouter.logging import (
+    get_logger,
+    log_probe_capabilities_drift,
+    log_probe_completed,
+    log_probe_round_completed,
+)
+logger = get_logger(__name__)
+# ---------------------------------------------------------------------------
+# ProbeResult
+# ---------------------------------------------------------------------------
+@dataclass(slots=True)
+class ProbeResult:
+    """Outcome of a single provider probe."""
+    provider: str
+    success: bool
+    latency_ms: float
+    error: str | None = None
+    model_name: str | None = None
+    timestamp: float = field(default_factory=time.time)
+# ---------------------------------------------------------------------------
+# probe_one: single-provider 1-token probe
+# ---------------------------------------------------------------------------
+async def probe_one(
+    provider: ProviderConfig,
+    *,
+    timeout_s: float = 10.0,
+) -> ProbeResult:
+    """Send a minimal 1-token completion request and measure response.
+    For ``kind: openai_compat``: POST /v1/chat/completions
+    For ``kind: anthropic``: POST /v1/messages
+    The request asks for ``max_tokens: 1`` so the probe is as cheap as
+    possible (a single output token is generated, exercising the full
+    model pipeline without producing meaningful output).
+    Never raises — all failures are captured in ProbeResult(success=False).
+    """
+    import os
+    start = time.monotonic()
+    provider_name = provider.name
+    base_url = str(provider.base_url).rstrip("/")
+    # Resolve API key from env (same logic as the adapters)
+    headers: dict[str, str] = {}
+    if provider.api_key_env:
+        api_key = os.environ.get(provider.api_key_env, "")
+        if api_key:
+            if provider.kind == "anthropic":
+                headers["x-api-key"] = api_key
+                headers["anthropic-version"] = "2023-06-01"
+            else:
+                headers["Authorization"] = f"Bearer {api_key}"
+    try:
+        async with httpx.AsyncClient(timeout=timeout_s) as client:
+            if provider.kind == "anthropic":
+                url = f"{base_url}/v1/messages"
+                body: dict[str, Any] = {
+                    "model": provider.model,
+                    "max_tokens": 1,
+                    "messages": [{"role": "user", "content": "hi"}],
+                }
+                resp = await client.post(url, json=body, headers=headers)
+            else:
+                # openai_compat: Ollama, LM Studio, OpenRouter, etc.
+                url = f"{base_url}/chat/completions"
+                body = {
+                    "model": provider.model,
+                    "max_tokens": 1,
+                    "messages": [{"role": "user", "content": "hi"}],
+                }
+                resp = await client.post(url, json=body, headers=headers)
+        latency_ms = (time.monotonic() - start) * 1000
+        if resp.status_code >= 400:
+            return ProbeResult(
+                provider=provider_name,
+                success=False,
+                latency_ms=latency_ms,
+                error=f"HTTP {resp.status_code}: {resp.text[:200]}",
+            )
+        # Extract model name from response (for capabilities drift check)
+        model_name: str | None = None
+        try:
+            data = resp.json()
+            model_name = data.get("model")
+        except Exception:
+            pass
+        return ProbeResult(
+            provider=provider_name,
+            success=True,
+            latency_ms=latency_ms,
+            model_name=model_name,
+        )
+    except httpx.TimeoutException:
+        latency_ms = (time.monotonic() - start) * 1000
+        return ProbeResult(
+            provider=provider_name,
+            success=False,
+            latency_ms=latency_ms,
+            error=f"timeout after {timeout_s}s",
+        )
+    except Exception as exc:
+        latency_ms = (time.monotonic() - start) * 1000
+        return ProbeResult(
+            provider=provider_name,
+            success=False,
+            latency_ms=latency_ms,
+            error=str(exc)[:200],
+        )
+# ---------------------------------------------------------------------------
+# capabilities drift detection (Phase 3)
+# ---------------------------------------------------------------------------
+@dataclass(slots=True)
+class DriftReport:
+    """Report of a model-name mismatch between config and probe response."""
+    provider: str
+    configured_model: str
+    observed_model: str
+    in_registry: bool
+def check_probe_drift(
+    provider: ProviderConfig,
+    observed_model: str | None,
+    *,
+    registry: Any = None,
+) -> DriftReport | None:
+    """Compare the probe response model name against the configured model.
+    Returns a :class:`DriftReport` when the observed model differs from
+    ``provider.model``, or ``None`` when they match (or when no model
+    name was returned by the probe). The ``registry`` argument is an
+    optional :class:`CapabilityRegistry` instance used to check whether
+    the observed model has a known entry — when it doesn't, the report
+    sets ``in_registry=False`` as an extra signal for the operator.
+    Never raises — a missing registry or lookup error just defaults to
+    ``in_registry=True`` (conservative, avoids false positives).
+    """
+    if not observed_model:
+        return None
+    configured = provider.model or ""
+    # Normalize: some backends return the model with a prefix or
+    # formatting variation. We compare case-sensitively but strip
+    # whitespace.
+    if observed_model.strip() == configured.strip():
+        return None
+    # Check registry for the observed model
+    in_registry = True
+    if registry is not None:
+        try:
+            resolved = registry.lookup(kind=provider.kind, model=observed_model)
+            # If every resolved field is None, the model is unknown
+            if (
+                resolved.thinking is None
+                and resolved.tools is None
+                and resolved.max_context_tokens is None
+                and resolved.claude_code_suitability is None
+                and resolved.cache_control is None
+            ):
+                in_registry = False
+        except Exception:
+            pass  # defensive — never crash the probe loop
+    return DriftReport(
+        provider=provider.name,
+        configured_model=configured,
+        observed_model=observed_model,
+        in_registry=in_registry,
+    )
+# ---------------------------------------------------------------------------
+# probe_loop: background task
+# ---------------------------------------------------------------------------
+async def probe_loop(
+    providers: list[ProviderConfig],
+    *,
+    record_fn: Any = None,
+    interval_s: float = 60.0,
+    timeout_s: float = 10.0,
+    probe_paid: bool = False,
+    shutdown_event: asyncio.Event | None = None,
+    health_threshold: int = 3,
+    registry: Any = None,
+) -> None:
+    """Run continuous health probes in an infinite loop until shutdown.
+    Args:
+        providers: list of provider configs to probe.
+        record_fn: callable(provider_name, *, success, threshold) that
+            feeds the backend health state machine. When None, results
+            are only logged (useful for testing).
+        interval_s: seconds to sleep between probe rounds.
+        timeout_s: per-provider probe timeout.
+        probe_paid: if False, providers with ``paid=True`` are skipped.
+        shutdown_event: set this event to stop the loop gracefully.
+        health_threshold: consecutive-failure threshold passed to record_fn.
+        registry: optional CapabilityRegistry for model drift detection.
+    """
+    _shutdown = shutdown_event or asyncio.Event()
+    # Initial delay: let the server finish startup before first probe round.
+    try:
+        await asyncio.wait_for(_shutdown.wait(), timeout=interval_s)
+        return  # shutdown during initial delay
+    except TimeoutError:
+        pass  # normal: timeout means the delay elapsed without shutdown
+    while not _shutdown.is_set():
+        probed = 0
+        failures = 0
+        for provider in providers:
+            if _shutdown.is_set():
+                break
+            if provider.paid and not probe_paid:
+                continue
+            result = await probe_one(provider, timeout_s=timeout_s)
+            probed += 1
+            if not result.success:
+                failures += 1
+            # Feed into backend health state machine
+            if record_fn is not None:
+                with contextlib.suppress(Exception):
+                    record_fn(
+                        result.provider,
+                        success=result.success,
+                        threshold=health_threshold,
+                    )
+            # Log individual result
+            log_probe_completed(
+                logger,
+                provider=result.provider,
+                success=result.success,
+                latency_ms=result.latency_ms,
+                error=result.error,
+                model_name=result.model_name,
+            )
+            # Check for model-capabilities drift on success
+            if result.success and result.model_name:
+                drift = check_probe_drift(
+                    provider, result.model_name, registry=registry
+                )
+                if drift is not None:
+                    log_probe_capabilities_drift(
+                        logger,
+                        provider=drift.provider,
+                        configured_model=drift.configured_model,
+                        observed_model=drift.observed_model,
+                        in_registry=drift.in_registry,
+                    )
+        # Log round summary
+        if probed > 0:
+            log_probe_round_completed(
+                logger,
+                providers_probed=probed,
+                failures=failures,
+            )
+        # Wait for next interval or shutdown
+        try:
+            await asyncio.wait_for(_shutdown.wait(), timeout=interval_s)
+            break  # shutdown signaled
+        except TimeoutError:
+            pass  # normal: sleep elapsed, start next round
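
Reviewer note: `probe_loop` sleeps by waiting on the shutdown event with a timeout, so a shutdown request interrupts the sleep instead of waiting out the full interval. The stand-alone sketch below isolates that pattern using only the standard library. `run_rounds` and `demo` are hypothetical names, the list append stands in for a real probe round, and the sketch omits the initial startup delay that `probe_loop` performs before its first round.

```python
import asyncio


async def run_rounds(shutdown: asyncio.Event, interval_s: float, rounds: list[int]) -> None:
    """Do one unit of work per interval until `shutdown` is set."""
    n = 0
    while not shutdown.is_set():
        n += 1
        rounds.append(n)  # stand-in for one probe round
        try:
            # Interruptible sleep: returns early as soon as the event is set.
            await asyncio.wait_for(shutdown.wait(), timeout=interval_s)
            break  # shutdown was signaled during the sleep
        except asyncio.TimeoutError:
            pass  # the interval elapsed normally; start the next round


async def demo() -> list[int]:
    shutdown = asyncio.Event()
    rounds: list[int] = []
    task = asyncio.create_task(run_rounds(shutdown, 0.01, rounds))
    await asyncio.sleep(0.05)  # let a few rounds run
    shutdown.set()  # what a lifespan exit path would do
    await asyncio.wait_for(task, timeout=1.0)  # the task ends promptly
    return rounds


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Compared with a plain `asyncio.sleep(interval_s)`, this keeps shutdown latency independent of the probe interval, which matters when the interval is set near its 3600-second upper bound.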
