PyPI - coderouter-cli - Versions diffs - 1.10.0__tar.gz → 1.10.1__tar.gz - Mend

coderouter-cli 1.10.0tar.gz → 1.10.1tar.gz


Files changed (144)

{coderouter_cli-1.10.0 → coderouter_cli-1.10.1}/CHANGELOG.md RENAMED

@@ -6,6 +6,141 @@ versioning follows [SemVer](https://semver.org/).
 ---
+## [v1.10.1] — 2026-05-04 (Patch — tool-aware auto routing + Raspberry Pi starter)
+**Theme: declaratively solve the use case "small local models cannot do tool calling, so send only tool-laden requests to the cloud" (the OpenClaw + Pi 8GB scenario).** The auto_router, declared feature complete with 6 matchers in v1.10.0, grows to 7 matchers: the new `has_tools` matcher lets routing pick a profile based on whether a request declares `tools[]`. The release also bundles a starter YAML for the Raspberry Pi 8GB (`examples/providers.raspberrypi.yaml`), so users running OpenClaw or a Claude Code compatible agent on an SBC only need to copy one YAML file.
+This release ships two items:
+| # | sub-release | Theme | LOC | tests |
+|---|---|---|---|---|
+| 1 | **has_tools matcher** | Adds `RuleMatcher.has_tools` as the 7th matcher; recognizes OpenAI/Anthropic `tools[]` and OpenAI legacy `functions[]` in one check (from the OpenClaw + Pi use case) | ~80 | +7 |
+| 2 | **Raspberry Pi starter** | New `examples/providers.raspberrypi.yaml`: small Ollama models (≤4B) + OpenRouter free models + `has_tools`-based tool-aware profile routing | YAML only | +0 direct (loader validation rides on the existing parametric test) |
+- Tests: 871 → **878** (+7: six `has_tools` matcher scenarios plus a safety-net test that `has_tools: false` counts as "set" but never matches)
+- Runtime deps: 5 → 5 (**unchanged for 34 consecutive sub-releases**)
+- Backward compat: fully compatible. Existing YAML, API, and log payload schema are unchanged; deployments that do not use the new `has_tools` field behave identically
+- pyproject version: 1.10.0 → 1.10.1
+### Migration
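+None required. **A drop-in upgrade from v1.10.0**:
+- The `coderouter` command name, Python import name, providers.yaml format, environment variables, and ingress URL are all unchanged
+- Existing `auto_router.rules[]` are unaffected; to start using the `has_tools` matcher, add one line to the YAML
+- This addition comes right after v1.10.0 declared the v1.6-series auto_router "feature complete with 6 matchers", but it extends the same declarative framework with no structural change, so it can be read as "feature complete again, with 7 matchers"
+### Out of scope (v1.11 and later)
+- **Provider capability gate for tools**: the idea of making `capabilities.tools=false` act as a skip gate in the fallback chain. This patch routes at the profile level (the router switches chains) via the `has_tools` matcher; a provider-level skip gate is a separate issue. It needs a compatibility review against CodeRouter's chain semantics (sequential fallback + downgrade), so work starts only once the need is confirmed.
+- **Stronger tool-call repair for small local models**: today `tool_repair.py` rescues the `<tool_call>{...}</tool_call>` wrapper format, but inferring tool calls from the free-form text that 1-4B models return is a separate area (`tool_emulation`). Steering the model by rewriting the prompt template is another option; the design work is deferred to v2.0.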
+### Files touched
+```
+A  examples/providers.raspberrypi.yaml
+M  CHANGELOG.md
+M  coderouter/config/schemas.py
+M  coderouter/routing/auto_router.py
+M  pyproject.toml
+M  tests/test_auto_router.py
+```
+---
+### has_tools matcher (from the OpenClaw + Raspberry Pi use case)
+**Theme: route only requests that declare tools[] to the cloud, and let the small local model focus on plain chat that needs no tools.** When running a tool-aware agent such as OpenClaw on a Raspberry Pi 8GB / Jetson Nano class SBC, the Ollama models that are usable under CPU inference (≤4B) are weak at tool calling (they do not return `finish_reason: tool_calls`, the argument JSON is broken, or the call is buried in free-form text), so from the agent's side no tool call ever happens. Adding `auto_router.rules[].match.has_tools` as the 7th matcher lets a deployment switch declaratively at the profile level: "tools present → cloud (Qwen3-Coder / gpt-oss / Gemini-Flash on the OpenRouter free tier)" and "no tools → small local model".
+Example use case (excerpt from the Raspberry Pi 8GB starter `examples/providers.raspberrypi.yaml`):
+```yaml
+auto_router:
+  rules:
+    - id: user:has-tools-go-cloud
+      profile: with-tools         # OpenRouter free models only
+      match:
+        has_tools: true
+    - id: user:image-go-cloud
+      profile: vision              # Gemini Flash 1M ctx
+      match:
+        has_image: true
+    - id: user:longcontext-go-cloud
+      profile: longcontext
+      match:
+        content_token_count_min: 32000
+  default_rule_profile: local-chat # local qwen3.5:2b/4b / gemma3:1b
+```
+Connecting OpenClaw (an agent that declares tools such as Bash/Read/Write on every turn) with `OPENAI_BASE_URL=http://<pi-ip>:8088/v1` sends tool-laden traffic to the cloud automatically, and only light chat is handled locally on the Pi. Only OPENROUTER_API_KEY needs to be set; no paid API is required (`ALLOW_PAID=false` is the default).
+#### Why profile level rather than a provider-level capability gate
+The `ProviderConfig.capabilities.tools=false` flag already exists (since v0.x), but today it is used only for the `coderouter doctor` diagnostic display and for resolving the `model-capabilities.yaml` registry; it is not wired into the fallback chain as a skip gate. `thinking` / `cache_control` have a `will_degrade` gate (`provider_supports_*` in capability.py), but tools have no equivalent skip mechanism. Tools instead rely on the existing v0.3-D "downgrade path" (non-native + tools[] present → non-streaming + tool_repair): when a provider cannot return tools, the adapter raises no error, the upstream sees a success (empty tool_calls), and the chain stops instead of falling through (observed symptom: "no tool call happens").
+Retrofitting a provider-level skip gate would change chain semantics and needs a compatibility review, so this patch limits itself to a **profile-level declarative lever**. It leaves chain semantics untouched, achieves the same effect by adding an auto_router rule, and follows exactly the same conventions as the existing 6 matchers (exactly one + first match wins + fast-fail at load).
+- Tests: 871 → **878** (+7: OpenAI tools[] / Anthropic tools[] / OpenAI legacy functions[] / no-tools fallthrough / empty-list fallthrough / has_tools rule takes priority over a code-fence rule / safety net for `has_tools: false` counting as "set" but never matching)
+- Runtime deps: 5 → 5 (unchanged for 34 consecutive sub-releases)
+- Backward compat: fully compatible. Existing `auto_router` rules are unaffected; deployments that do not use `has_tools` behave identically
+#### Changes
+- `coderouter/config/schemas.py`:
+  - Adds `has_tools: bool | None = None` to `RuleMatcher` and to the `_MATCHER_FIELDS` tuple (the zero/multiple-fields "exactly one" validator applies automatically).
+  - Adds `has_tools` as the 7th entry in the docstring's Variants section. It states why the boolean shape matches `has_image` (only `True` is meaningful; `False` counts as "set" but never matches, a safety net provided by the `is True` check in `_match_rule`) and how it differs from the provider-level `capabilities.tools` flag (the former is profile-level routing; the latter is a diagnostic aid for doctor, not a chain skip gate).
+- `coderouter/routing/auto_router.py`:
+  - New `_has_tools_in_body(body)` helper: recognizes the body's top-level `tools[]` (shared by OpenAI Chat Completions and the Anthropic Messages API) and `functions[]` (OpenAI legacy, deprecated but still emitted by pinned SDKs) in one check; an empty list or None yields False (handles lazy init).
+  - Adds `has_tools: bool` to the `_match_rule(rule, message, text, model, estimated_tokens, has_tools)` signature and implements the `has_tools is True` branch as the 7th matcher.
+  - `classify(...)` calls `_has_tools_in_body(body)` once and passes the result into rule iteration. `has_tools` rules are evaluated even when `user_msg is None` (covers a system-only prompt + tools[] declaration).
+  - Adds `has_tools` to the `signals` payload of `_emit_resolved` / `_emit_fallthrough`, so the `auto-router-resolved` log shows the dashboard / Prometheus exporter whether routing was decided by the tools check.
+- New Group 8 (tool-aware routing) in `tests/test_auto_router.py`, 7 cases:
+  - `test_classify_request_with_openai_tools_routes_to_with_tools`: base case, OpenAI-style `tools[].function` → `with-tools` profile.
+  - `test_classify_request_with_anthropic_tools_routes_to_with_tools`: Anthropic-style `tools[].input_schema` uses the same top-level `tools` key, so a single matcher covers both ingresses.
+  - `test_classify_request_with_legacy_functions_routes_to_with_tools`: OpenAI legacy `functions[]` (deprecated but still emitted by pinned SDKs) is also treated as tool-laden.
+  - `test_classify_request_without_tools_falls_through`: inverse case, plain chat without a tools declaration falls to `default_rule_profile` (`local-chat` on the Pi).
+  - `test_classify_empty_tools_list_treated_as_no_tools`: `tools: []` / `functions: []` (lazy init shape) is treated as False, pinning the no-spurious-match property.
+  - `test_classify_has_tools_first_match_wins_over_later_content_rule`: when the has_tools rule is placed before a code_fence rule, the earlier rule wins even for a body that matches both; the global "first match wins" convention applies to the new matcher too.
+  - `test_has_tools_false_rejected_at_load`: documents the safety net where `has_tools: False` passes `_exactly_one` but never matches because of the `is True` check in `_match_rule`, guaranteeing that a misconfiguration still falls to the default route.
+#### Files touched
+```
+M  CHANGELOG.md
+M  coderouter/config/schemas.py
+M  coderouter/routing/auto_router.py
+M  pyproject.toml
+M  tests/test_auto_router.py
+```
+---
+### Raspberry Pi 8GB starter (`examples/providers.raspberrypi.yaml`)
+**Theme: the minimal configuration for running OpenClaw on an SBC, consolidated into one YAML file.** This starter is built around the `has_tools` matcher added in v1.10.1: a single `coderouter serve` routes tool-aware between local Ollama models on the Pi (qwen3.5:2b/4b, qwen2.5:1.5b, gemma3:1b) and the OpenRouter free models (qwen3-coder:free / gpt-oss-120b:free / gemini-2.5-flash:free). Only OPENROUTER_API_KEY needs to be set; no paid API key is required (`ALLOW_PAID=false` is the default).
+#### Design highlights
+- **All local providers declare `tools: false`**: the ≤4B models that fit on a Pi 8GB cannot return tool_calls reliably, so the capability is set to `false` explicitly. This declaration is for doctor diagnostics; actual routing is done at the profile level by the `has_tools` matcher, which gives two layers of defense.
+- **`num_ctx: 8192` + `num_predict: 1024` limits**: CPU inference on the Pi has a more realistic prefill with a smaller context. Ollama's default of 2048 is too small for OpenClaw's system prompt, and raising it from 2048 to 32K pushes prefill into minutes, so 8K is the practical midpoint.
+- **Images and long inputs (32K+) also go to the cloud**: Gemma 4 E4B is vision capable but at 9.6GB does not fit on an 8GB Pi, so the `has_image` rule sends image requests to OpenRouter Gemini Flash (1M ctx + native vision) instead.
+- **Vendor diversity across three OpenRouter free models**: qwen-coder / gpt-oss / gemini-flash come from three vendors, which provides a rate-limit escape route when the daily cap (~200 req/day per model per account) is hit.
+- **`output_filters: [strip_thinking, strip_stop_markers]` always applied to Qwen models**: Qwen 3.5 models running on the Pi were observed leaking both `<think>...</think>` and `<|im_end|>`, so both are stripped.
+#### Tests
+`tests/test_examples_yaml.py::test_example_yaml_loads` covers `examples/providers*.yaml` parametrically, so `providers.raspberrypi.yaml` is picked up by this test automatically. Further invariants worth pinning (for example: all local providers declare `tools: false`, the `has_tools` rule exists, the auto_router default is `local-chat`) can get dedicated tests in a follow-up patch; this patch secures only the parametric loader-clean property.
+#### Files touched
+```
+A  examples/providers.raspberrypi.yaml
+```
+---
 ## [v1.10.0] — 2026-05-01 (Umbrella tag — Cost enforcement + Long-run reliability completion + Auto-router feature complete)
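 **Theme: complete "observe → understand → act" along three axes, rounding out Vision pillars P2/P3.** The two features landed in v1.9.1 (patch), v1.9-B2 streaming usage aggregation and per-model auto-routing, were effectively the run-up to the v1.10 backlog; v1.10.0 bundles the remaining three features and ships them as a minor release. This completes the v1.x share of CodeRouter's Vision as **"a reliability layer for running agents on local LLMs for long periods"**: the four failure classes other than context overflow (L1) and quality drift (L4), namely L2/L3/L5/L6, are handled systematically, the auto-router's 6 declarative matchers are in place, and the cost path is closed from observation (v1.9-D) to enforcement (v1.10).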
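
The v1.10.1 entry above explains why a provider that cannot call tools still ends the fallback chain: the downgrade path returns a success with empty tool_calls, so nothing triggers a fallthrough. A minimal Python sketch of that behaviour, assuming a simplified sequential chain; `run_chain`, `small_local`, and `cloud` are hypothetical names for illustration and are not CodeRouter APIs:

```python
from typing import Callable, Optional

# Hypothetical provider result: None stands for an adapter error,
# a dict stands for an upstream call that "succeeded".
Result = Optional[dict]
Provider = tuple[str, Callable[[dict], Result]]


def run_chain(providers: list[Provider], request: dict) -> tuple[str, dict]:
    """Sequential fallback: return the first provider that does not fail."""
    for name, call in providers:
        result = call(request)
        if result is not None:
            # Any success ends the chain, even one with no tool_calls.
            return name, result
    raise RuntimeError("all providers failed")


def small_local(request: dict) -> Result:
    # Assumed behaviour of a <=4B model: plain-text answer, no tool_calls,
    # but no error either.
    return {"content": "I would run ls.", "tool_calls": []}


def cloud(request: dict) -> Result:
    return {"content": "", "tool_calls": [{"name": "Bash", "arguments": {"command": "ls"}}]}


request = {
    "messages": [{"role": "user", "content": "list files"}],
    "tools": [{"name": "Bash"}],
}

# Local-first chain: stops at the local model, so the agent sees no tool call.
served_by, result = run_chain([("local", small_local), ("cloud", cloud)], request)

# Profile-level routing: choose a cloud-only chain when the body declares tools.
local_first = [("local", small_local), ("cloud", cloud)]
chain = [("cloud", cloud)] if request.get("tools") else local_first
routed_by, routed = run_chain(chain, request)
```

Selecting the chain before the first provider is tried is the effect the `has_tools` rule has at the profile level; the chain loop itself stays unchanged.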

{coderouter_cli-1.10.0 → coderouter_cli-1.10.1}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: coderouter-cli
-Version: 1.10.0
+Version: 1.10.1
 Summary: Local-first, free-first, fallback-built-in LLM router. Claude Code / OpenAI compatible.
 Project-URL: Homepage, https://github.com/zephel01/CodeRouter
 Project-URL: Repository, https://github.com/zephel01/CodeRouter

{coderouter_cli-1.10.0 → coderouter_cli-1.10.1}/coderouter/config/schemas.py RENAMED

@@ -498,6 +498,23 @@ class RuleMatcher(BaseModel):
       workloads can compensate by tuning the threshold, since the
       char/4 heuristic is conservative for CJK and looser for
       English code.
+    Variants ([Unreleased] / tool-aware routing, from the OpenClaw + Pi use case):
+    - ``has_tools: True`` — the request body declares one or more
+      tools (OpenAI ``tools[]`` / Anthropic ``tools[]`` / OpenAI legacy
+      ``functions[]``). Lets operators send tool-laden requests to a
+      tool-capable cloud profile while keeping plain chat on a small
+      local model (typical Raspberry Pi / low-spec deployment shape:
+      a 1-4B local model that cannot reliably tool-call paired with a
+      free-tier cloud chain that can). Distinct from the
+      ``capabilities.tools`` flag on a provider — that flag is read by
+      ``coderouter doctor`` for diagnostics but does NOT gate the
+      fallback chain (the chain just iterates providers in order and
+      engages the v0.3-D tool-downgrade path on non-native ones with
+      ``request.tools`` set). The ``has_tools`` matcher is the
+      profile-level lever for steering tool-laden traffic to the right
+      chain entirely.
     """
     model_config = ConfigDict(extra="forbid")
@@ -508,6 +525,13 @@ class RuleMatcher(BaseModel):
     content_regex: str | None = None
     model_pattern: str | None = None
     content_token_count_min: int | None = Field(default=None, ge=1)
+    # [Unreleased]: tool-aware routing (from the OpenClaw + Raspberry Pi use case).
+    # See class docstring "Variants ([Unreleased] / tool-aware routing)"
+    # above for the full rationale. Boolean shape mirrors ``has_image`` —
+    # only the ``True`` value is meaningful (matches when the body
+    # declares any tools); ``False`` is rejected by ``_exactly_one``
+    # since a "no-tools" rule would shadow the default fall-through.
+    has_tools: bool | None = None
     _MATCHER_FIELDS: tuple[str, ...] = (
         "has_image",
@@ -516,6 +540,7 @@ class RuleMatcher(BaseModel):
         "content_regex",
         "model_pattern",
         "content_token_count_min",
+        "has_tools",
     )
     @model_validator(mode="after")
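
The "exactly one matcher per rule" convention that `_MATCHER_FIELDS` feeds can be sketched in plain Python. This is a hypothetical stand-in, not the Pydantic validator from `schemas.py`: it assumes "set" means "not None", which is how the CHANGELOG describes `has_tools: false`. The field comment in the hunk above describes `False` as rejected by `_exactly_one`, so the real validator may behave differently.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class MatcherSketch:
    """Hypothetical stand-in for RuleMatcher with three of the seven fields."""

    has_image: Optional[bool] = None
    content_token_count_min: Optional[int] = None
    has_tools: Optional[bool] = None

    def __post_init__(self) -> None:
        # "Exactly one" check: count the fields that were set (assumed: not None).
        set_fields = [f.name for f in fields(self) if getattr(self, f.name) is not None]
        if len(set_fields) != 1:
            raise ValueError(f"expected exactly one matcher, got {set_fields}")


def matches_has_tools(matcher: MatcherSketch, body_has_tools: bool) -> bool:
    # Mirrors the `is True` guard: a rule with has_tools=False never matches.
    return matcher.has_tools is True and body_has_tools
```

Under this assumption a rule written as `has_tools: false` loads without error but can never fire, so the request falls through to the default profile.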

{coderouter_cli-1.10.0 → coderouter_cli-1.10.1}/coderouter/routing/auto_router.py RENAMED

@@ -142,12 +142,40 @@ def _code_fence_ratio(text: str) -> float:
     return fenced / len(text)
+def _has_tools_in_body(body: dict[str, Any]) -> bool:
+    """True iff the request body declares one or more callable tools.
+    Recognized declaration shapes:
+    * ``tools: [...]`` — OpenAI Chat Completions ``tools[]`` AND
+      Anthropic Messages API ``tools[]``. Both wire formats put the
+      array at the same top-level key, so a single membership check
+      covers both ingresses.
+    * ``functions: [...]`` — OpenAI legacy ``functions[]`` (deprecated
+      since 2023-11 but still emitted by some agents that pinned old
+      SDK versions). Treated as equivalent to ``tools[]`` for routing
+      purposes.
+    A non-list value (or a value of ``None`` / empty list) returns
+    False — agents that initialize the field but populate it lazily
+    are still on the no-tools path until a tool actually appears.
+    """
+    tools = body.get("tools")
+    if isinstance(tools, list) and len(tools) > 0:
+        return True
+    functions = body.get("functions")
+    if isinstance(functions, list) and len(functions) > 0:
+        return True
+    return False
 def _match_rule(
     rule: AutoRouteRule,
     message: dict[str, Any] | None,
     text: str,
     model: str | None,
     estimated_tokens: int,
+    has_tools: bool,
 ) -> bool:
     m = rule.match
     if m.has_image is True:
@@ -177,6 +205,14 @@ def _match_rule(
         # for English code; operators tune the threshold to match
         # their input distribution.
         return estimated_tokens >= m.content_token_count_min
+    if m.has_tools is True:
+        # [Unreleased]: tool-aware routing (from the OpenClaw + Pi use case).
+        # Computed once in ``classify`` from ``body.tools`` /
+        # ``body.functions`` so per-rule evaluation is O(1). See
+        # ``_has_tools_in_body`` for the recognized declaration shapes
+        # and ``RuleMatcher`` docstring for why this is profile-level
+        # routing (not a provider capability gate).
+        return has_tools
     return False  # pragma: no cover — _exactly_one guards against this
@@ -259,6 +295,11 @@ def classify(body: dict[str, Any], config: CodeRouterConfig) -> str:
     # contribute 0. See ``_estimate_total_tokens`` for the heuristic
     # rationale and the 5-deps tradeoff.
     estimated_tokens = _estimate_total_tokens(body)
+    # [Unreleased]: tool-aware routing (from the OpenClaw + Pi use case). Computed
+    # once for both the ``has_tools`` matcher and the signals payload.
+    # See ``_has_tools_in_body`` for the recognized declaration shapes
+    # (OpenAI/Anthropic ``tools[]``, OpenAI legacy ``functions[]``).
+    has_tools = _has_tools_in_body(body)
     auto_cfg = config.auto_router
     if auto_cfg is not None and auto_cfg.disabled:
@@ -267,6 +308,7 @@ def classify(body: dict[str, Any], config: CodeRouterConfig) -> str:
             text,
             model,
             estimated_tokens,
+            has_tools,
             disabled=True,
         )
         return auto_cfg.default_rule_profile
@@ -278,17 +320,18 @@ def classify(body: dict[str, Any], config: CodeRouterConfig) -> str:
         else BUNDLED_DEFAULT_RULE_PROFILE
     )
-    # ``model_pattern`` and ``content_token_count_min`` matchers can
-    # fire even without a user message (e.g. system-only prompts or a
-    # request body that carries only a model field). Other matchers
-    # still require ``user_msg`` to be present — they short out via
-    # ``_match_rule``'s message-None handling.
+    # ``model_pattern``, ``content_token_count_min`` and ``has_tools``
+    # matchers can fire even without a user message (e.g. system-only
+    # prompts or a request body that carries only a model field +
+    # tools array). Other matchers still require ``user_msg`` to be
+    # present — they short out via ``_match_rule``'s message-None
+    # handling.
     for rule in rules:
-        if _match_rule(rule, user_msg, text, model, estimated_tokens):
-            _emit_resolved(rule, user_msg, text, model, estimated_tokens)
+        if _match_rule(rule, user_msg, text, model, estimated_tokens, has_tools):
+            _emit_resolved(rule, user_msg, text, model, estimated_tokens, has_tools)
             return rule.profile
-    _emit_fallthrough(default_profile, text, model, estimated_tokens)
+    _emit_fallthrough(default_profile, text, model, estimated_tokens, has_tools)
     return default_profile
@@ -298,6 +341,7 @@ def _emit_resolved(
     text: str,
     model: str | None,
     estimated_tokens: int,
+    has_tools: bool,
 ) -> None:
     logger.info(
         "auto-router-resolved",
@@ -310,6 +354,7 @@ def _emit_resolved(
                 "content_len": len(text),
                 "model": model,
                 "estimated_tokens": estimated_tokens,
+                "has_tools": has_tools,
             },
         },
     )
@@ -320,6 +365,7 @@ def _emit_fallthrough(
     text: str,
     model: str | None,
     estimated_tokens: int,
+    has_tools: bool,
     disabled: bool = False,
 ) -> None:
     logger.info(
@@ -331,6 +377,7 @@ def _emit_fallthrough(
                 "content_len": len(text),
                 "model": model,
                 "estimated_tokens": estimated_tokens,
+                "has_tools": has_tools,
                 "disabled": disabled,
             },
         },
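
To see how the detection and the "first match wins" loop interact, here is a minimal standalone sketch. `has_tools_in_body` restates the check from `_has_tools_in_body` in the hunk above; `classify_sketch`, the predicate-based rule shape, and the `claude-` model prefix rule are simplifications invented for illustration and do not reflect the real `classify` signature.

```python
from typing import Any, Callable

Body = dict[str, Any]
Rule = tuple[str, Callable[[Body, bool], bool]]  # (profile, predicate)


def has_tools_in_body(body: Body) -> bool:
    """Non-empty list under top-level `tools` or `functions`."""
    for key in ("tools", "functions"):
        value = body.get(key)
        if isinstance(value, list) and len(value) > 0:
            return True
    return False


def classify_sketch(body: Body, rules: list[Rule], default: str) -> str:
    has_tools = has_tools_in_body(body)  # computed once, reused for every rule
    for profile, predicate in rules:
        if predicate(body, has_tools):
            return profile  # first match wins
    return default


rules: list[Rule] = [
    ("with-tools", lambda body, has_tools: has_tools),
    # Hypothetical second rule so that one body can match two rules.
    ("by-model", lambda body, has_tools: str(body.get("model", "")).startswith("claude-")),
]

openai_body = {"model": "claude-x", "tools": [{"type": "function", "function": {"name": "read_file"}}]}
anthropic_body = {"tools": [{"name": "read_file", "input_schema": {"type": "object"}}]}
legacy_body = {"functions": [{"name": "read_file"}]}
lazy_body = {"tools": [], "functions": None}
plain_body = {"messages": [{"role": "user", "content": "hello"}]}
```

`openai_body` satisfies both rules, and the earlier `with-tools` rule decides the profile; `lazy_body` and `plain_body` fall through to whatever default is passed in.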

coderouter_cli-1.10.1/examples/providers.raspberrypi.yaml ADDED

@@ -0,0 +1,298 @@
+# ============================================================
+# CodeRouter providers.yaml — Raspberry Pi 8GB starter
+#
+# Use case (from the OpenClaw + Pi scenario):
+#   Run a tool-aware agent such as OpenClaw on a small SBC
+#   (Raspberry Pi 4/5 8GB, Jetson Nano, etc.). The Ollama models that
+#   are usable with CPU-only inference (≤4B) are weak at tool calling
+#   to begin with: below 7B they do not return native tool_calls reliably.
+#
+#   This starter accepts that trade-off: **local means no tools**, and
+#   requests that need tools are forwarded to the free cloud tier:
+#
+#     - light chat / summary / translation / classification → small local model
+#     - tools[] / functions[] declared                      → OpenRouter free tier
+#     - image input                                         → OpenRouter (Gemini Flash 1M ctx)
+#
+#   The v1.6 auto_router's ``has_tools`` matcher (new in this release)
+#   routes at the profile level, so the user only has to start
+#   ``coderouter serve``; the client side (OpenClaw / Claude Code) does
+#   not need to know about profiles.
+#
+# Hardware budget (Raspberry Pi 4/5 8GB):
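+#   - about 6GB of RAM is usable once the OS is accounted for
+#   - no GPU, CPU inference only (roughly 3-15 tok/s)
+#   - tool calling becomes reliable at 7B+, which does not fit in memory
+#   - compromise: 1.5B to 4B local models, with all tool use in the cloud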
+#
+# Setup (4 steps / about 5-10 minutes):
+#
+#   # 1. Ollama on Pi (official install script)
+#   curl -fsSL https://ollama.com/install.sh | sh
+#
+#   # 2. Pull the models this repository expects (~6GB total)
+#   ollama pull qwen2.5:1.5b           # lightweight chat (~1GB)
+#   ollama pull gemma3:1b              # multilingual chat (~0.8GB)
+#   ollama pull qwen3.5:2b             # better chat (~2.7GB)
+#   ollama pull qwen3.5:4b             # solid chat (~3.4GB), if there is room
+#   ollama pull sam860/lfm2.5:350m     # ultra-light, always resident (~0.5GB), optional
+#
+#   # 3. Write the OpenRouter API key to ~/.coderouter/.env
+#   #    Create a free account at https://openrouter.ai/ and get an API key (5min)
+#   echo 'OPENROUTER_API_KEY=sk-or-v1-...' >> ~/.coderouter/.env
+#
+#   # 4. Copy this file and start the server
+#   cp examples/providers.raspberrypi.yaml ~/.coderouter/providers.yaml
+#   coderouter serve
+#
+# Free-tier rate limits (OpenRouter, checked 2026-04):
+#   ~20 req/min, ~200 req/day per model per account.
+#   Tools-heavy agents (OpenClaw / Claude Code declare tools on every
+#   turn) hit the daily cap easily. Three free models (qwen / gpt-oss /
+#   gemini-flash) are chained as rate-limit escape routes, so when one
+#   returns 429 the chain falls back to the next.
+#
+# OpenClaw integration:
+#   OpenClaw can talk to an OpenAI-compatible API (override base_url via env):
+#     export OPENAI_BASE_URL=http://<pi-ip>:8088/v1
+#     export OPENAI_API_KEY=dummy   # CodeRouter does not check the key
+#   All OpenClaw requests then pass through CodeRouter and are routed
+#   by ``has_tools``.
+# ============================================================
+allow_paid: false
+default_profile: auto      # ← v1.6 sentinel: enables auto_router
+display_timezone: Asia/Tokyo
+# ---------------------------------------------------------------------
+# auto_router: tool-aware routing (v[Unreleased] has_tools matcher)
+#
+# Three rules plus the default cover the Pi use cases:
+#
+#   1. tools[] present       → ``with-tools`` (cloud free tier only)
+#   2. image present         → ``vision``     (Gemini Flash 1M ctx)
+#   3. very long prompt      → ``longcontext``(Gemini Flash 1M ctx)
+#   4. everything else       → ``local-chat`` (small local model)
+#
+# Ordering principle: put what must never go to the local model
+# (tools / image / very long input) first, and fall through to local last.
+# ---------------------------------------------------------------------
+auto_router:
+  rules:
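+    # 1) Requests that declare tools / functions go straight to the cloud.
+    #    The 1-4B local models on the Pi cannot return tool_calls reliably
+    #    (buried in text / broken format / never generated at all), so
+    #    skipping local at the router stage is the most dependable option.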
+    - id: user:has-tools-go-cloud
+      profile: with-tools
+      match:
+        has_tools: true
+    # 2) Small local models cannot handle image input (Gemma 4 E4B supports
+    #    vision but at 9.6GB does not fit on a Pi 8GB). The free tier of
+    #    Gemini Flash with 1M ctx is the sensible place to handle it.
+    - id: user:image-go-cloud
+      profile: vision
+      match:
+        has_image: true
+    # 3) Very long requests (over ~32K tokens): small models on the Pi
+    #    often have only 8K-32K of context, and prefill is slow. Send
+    #    them to Gemini Flash with 1M ctx.
+    - id: user:longcontext-go-cloud
+      profile: longcontext
+      match:
+        content_token_count_min: 32000
+  # Anything that matches none of the above (= light chat) goes to the local-first chain.
+  default_rule_profile: local-chat
+# ---------------------------------------------------------------------
+# providers
+# ---------------------------------------------------------------------
+providers:
+  # ============================================================
+  # Tier 1: local (on-host Ollama on a Raspberry Pi 8GB)
+  # ============================================================
+  # All declare ``tools: false``. The "tools needed → cloud" decision is
+  # already made by the auto_router above, so this local group is assumed
+  # never to receive tool-laden requests.
+  # qwen3.5:4b: the largest in this lineup, for more thorough chat.
+  # Inference speed is roughly ~3 tok/s on a Pi 4 and ~6 tok/s on a Pi 5.
+  # Memory is ~3.4GB with Q4_K_M quantization, which fits a Pi 8GB comfortably.
+  - name: ollama-qwen3-5-4b
+    kind: openai_compat
+    base_url: http://localhost:11434/v1
+    model: qwen3.5:4b
+    paid: false
+    timeout_s: 240               # generous, since this is CPU inference
+    extra_body:
+      options:
+        # CPU inference on the Pi has a more realistic prefill with a smaller context.
+        # 8K is meant to fit OpenClaw's system prompt plus a light turn.
+        num_ctx: 8192
+        num_predict: 1024
+    # Qwen models leak <think> and stop markers, so scrubbing is recommended.
+    output_filters: [strip_thinking, strip_stop_markers]
+    capabilities:
+      chat: true
+      streaming: true
+      tools: false               # tool_calls are unreliable in the 4B class
+  # qwen3.5:2b: faster than 4b. The primary when short responses are wanted.
+  # ~10-12 tok/s on a Pi 5, ~2.7GB of memory.
+  - name: ollama-qwen3-5-2b
+    kind: openai_compat
+    base_url: http://localhost:11434/v1
+    model: qwen3.5:2b
+    paid: false
+    timeout_s: 180
+    extra_body:
+      options:
+        num_ctx: 8192
+        num_predict: 1024
+    output_filters: [strip_thinking, strip_stop_markers]
+    capabilities:
+      chat: true
+      streaming: true
+      tools: false               # tool_calls are unreliable in the 2B class
+  # qwen2.5:1.5b: lightweight and fast, for simple responses.
+  # ~15 tok/s on a Pi 5, ~1GB of memory.
+  - name: ollama-qwen2-5-1-5b
+    kind: openai_compat
+    base_url: http://localhost:11434/v1
+    model: qwen2.5:1.5b
+    paid: false
+    timeout_s: 120
+    extra_body:
+      options:
+        num_ctx: 8192
+        num_predict: 512
+    capabilities:
+      chat: true
+      streaming: true
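+      tools: false               # 1.5B cannot do tool_calls
+  # gemma3:1b: a small model from the Google Gemma 3 family with good multilingual performance.
+  # As an alternative to qwen2.5:1.5b, its response quality is steadier for CJK / European languages in some cases.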
+  - name: ollama-gemma3-1b
+    kind: openai_compat
+    base_url: http://localhost:11434/v1
+    model: gemma3:1b
+    paid: false
+    timeout_s: 120
+    extra_body:
+      options:
+        num_ctx: 8192
+        num_predict: 512
+    capabilities:
+      chat: true
+      streaming: true
+      tools: false               # 1B cannot do tool_calls
+  # ============================================================
+  # Tier 2: free cloud (OpenRouter free tier, tool-capable)
+  # ============================================================
+  # All on the free tier. Rate limits are roughly ~20 req/min / ~200 req/day
+  # (per model, per account). Three different vendors are chained to
+  # provide an escape route when a rate limit is hit.
+  # OpenRouter's Qwen3-Coder 480B free: designed for agentic coding,
+  # with very stable tool calling (no v0.3-A repair needed in CodeRouter).
+  - name: openrouter-qwen-coder-free
+    kind: openai_compat
+    base_url: https://openrouter.ai/api/v1
+    model: qwen/qwen3-coder:free
+    api_key_env: OPENROUTER_API_KEY
+    paid: false
+    timeout_s: 60
+    capabilities:
+      chat: true
+      streaming: true
+      tools: true
+  # OpenRouter's OpenAI gpt-oss-120b free: supports tool calling and sits in a
+  # different vendor pool from Qwen-Coder, so it is an escape route when the daily cap is hit.
+  - name: openrouter-gpt-oss-free
+    kind: openai_compat
+    base_url: https://openrouter.ai/api/v1
+    model: openai/gpt-oss-120b:free
+    api_key_env: OPENROUTER_API_KEY
+    paid: false
+    timeout_s: 60
+    capabilities:
+      chat: true
+      streaming: true
+      tools: true
+  # OpenRouter's Gemini 2.5 Flash free: 1M ctx + vision; the primary for very long
+  # inputs and image input. Tool calling is supported too.
+  - name: openrouter-gemini-flash-free
+    kind: openai_compat
+    base_url: https://openrouter.ai/api/v1
+    model: google/gemini-2.5-flash:free
+    api_key_env: OPENROUTER_API_KEY
+    paid: false
+    timeout_s: 60
+    capabilities:
+      chat: true
+      streaming: true
+      vision: true
+      tools: true
+# ---------------------------------------------------------------------
+# profiles: defines the 4 profiles that the auto_router rules point to
+# ---------------------------------------------------------------------
+profiles:
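+  # Local first: a balanced primary, faster fallbacks, then the larger model. The cloud is a last resort only.
+  # For "light chat that should stay on the Pi": if tools are needed the router
+  # sends the request to the cloud, so anything arriving here is known to need no tools.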
+  - name: local-chat
+    append_system_prompt: |
+      Be concise and direct. Aim for 1-3 sentence answers unless the
+      question genuinely needs more. You are running on a small local
+      model — do not pretend to have capabilities you don't have.
+    providers:
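+      - ollama-qwen3-5-2b              # primary: balance of speed and quality
+      - ollama-qwen2-5-1-5b            # speed-first fallback
+      - ollama-gemma3-1b               # multilingual fallback
+      - ollama-qwen3-5-4b              # quality-first fallback (slower)
+      - openrouter-gpt-oss-free        # last resort when every local model fails
+  # Dedicated to tool-laden requests: cloud only, prioritizing stable tool_calls.
+  # No local models are included (they cannot do tools, so there is no point).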
+  - name: with-tools
+    append_system_prompt: |
+      Use tools precisely. Batch independent operations in parallel
+      when possible. Match Claude Sonnet's tool-call style: terse,
+      structured, no preamble before invoking a tool.
+    providers:
+      - openrouter-qwen-coder-free     # primary: specialized for agentic coding
+      - openrouter-gpt-oss-free        # vendor diversity (rate-limit escape)
+      - openrouter-gemini-flash-free   # 1M ctx fallback
+  # Dedicated to image input (vision). Gemini Flash's 1M ctx handles images and long text together.
+  - name: vision
+    append_system_prompt: |
+      Describe images factually with no speculation beyond what is visible.
+      Match Claude Sonnet's vision style: terse, accurate, no fluff.
+    providers:
+      - openrouter-gemini-flash-free   # 1M ctx + vision native
+      - openrouter-qwen-coder-free     # weak at vision tasks, kept as a text fallback
+      - openrouter-gpt-oss-free        # text-only fallback
+  # Long context (over 32K tokens): the 8K-32K local models on the Pi choke, so
+  # send these to the cloud's larger ctx. Gemini Flash 1M is currently the best option.
+  - name: longcontext
+    append_system_prompt: |
+      The user is sending a long-context request. Read the entire input
+      carefully, structure your reasoning, and answer concisely.
+    providers:
+      - openrouter-gemini-flash-free   # 1M ctx
+      - openrouter-qwen-coder-free     # 262K ctx fallback
+      - openrouter-gpt-oss-free        # 131K ctx fallback
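
The CHANGELOG lists invariants of this starter that a follow-up patch could pin (all local providers declare `tools: false`, a `has_tools` rule exists, the auto_router default is `local-chat`). A minimal sketch of such checks, assuming a hand-copied dict excerpt of the YAML rather than CodeRouter's real loader; `check_invariants` and the "localhost means local" heuristic are hypothetical:

```python
from typing import Any

LOCAL = "http://localhost:11434/v1"
CLOUD = "https://openrouter.ai/api/v1"


def provider(name: str, base_url: str, tools: bool) -> dict[str, Any]:
    return {"name": name, "base_url": base_url, "capabilities": {"tools": tools}}


# Hand-copied excerpt of examples/providers.raspberrypi.yaml.
CONFIG: dict[str, Any] = {
    "auto_router": {
        "rules": [
            {"id": "user:has-tools-go-cloud", "profile": "with-tools", "match": {"has_tools": True}},
            {"id": "user:image-go-cloud", "profile": "vision", "match": {"has_image": True}},
            {"id": "user:longcontext-go-cloud", "profile": "longcontext",
             "match": {"content_token_count_min": 32000}},
        ],
        "default_rule_profile": "local-chat",
    },
    "providers": [
        provider("ollama-qwen3-5-4b", LOCAL, False),
        provider("ollama-qwen3-5-2b", LOCAL, False),
        provider("ollama-qwen2-5-1-5b", LOCAL, False),
        provider("ollama-gemma3-1b", LOCAL, False),
        provider("openrouter-qwen-coder-free", CLOUD, True),
        provider("openrouter-gpt-oss-free", CLOUD, True),
        provider("openrouter-gemini-flash-free", CLOUD, True),
    ],
    "profiles": {
        "with-tools": [
            "openrouter-qwen-coder-free",
            "openrouter-gpt-oss-free",
            "openrouter-gemini-flash-free",
        ],
    },
}


def check_invariants(cfg: dict[str, Any]) -> list[str]:
    """Return a list of violated invariants (empty when the config is clean)."""
    problems: list[str] = []
    providers = {p["name"]: p for p in cfg["providers"]}
    for name, p in providers.items():
        # Assumption: a localhost base_url identifies an on-host Ollama provider.
        if "localhost" in p["base_url"] and p["capabilities"].get("tools") is not False:
            problems.append(f"{name}: local provider must declare tools: false")
    router = cfg["auto_router"]
    if not any(r["match"].get("has_tools") is True for r in router["rules"]):
        problems.append("no has_tools rule")
    if router["default_rule_profile"] != "local-chat":
        problems.append("default_rule_profile is not local-chat")
    for name in cfg["profiles"]["with-tools"]:
        if providers[name]["capabilities"].get("tools") is not True:
            problems.append(f"with-tools references non-tool provider {name}")
    return problems
```

Calling `check_invariants(CONFIG)` on the excerpt yields an empty list; changing the default profile or adding a local provider to `with-tools` produces one message per violation.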

{coderouter_cli-1.10.0 → coderouter_cli-1.10.1}/pyproject.toml RENAMED

@@ -11,7 +11,7 @@
 # in plan.md §11.B; once granted, this name will become an alias and
 # `coderouter` will become the canonical distribution name.
 name = "coderouter-cli"
-version = "1.10.0"
+version = "1.10.1"
 description = "Local-first, free-first, fallback-built-in LLM router. Claude Code / OpenAI compatible."
 readme = "README.md"
 requires-python = ">=3.12"
