npm - @oneciel-ai/claude-any - Versions diffs - 0.1.37 → 0.1.38 - Mend

@oneciel-ai/claude-any 0.1.37 → 0.1.38


Files changed (7)

package/README.md CHANGED

@@ -48,7 +48,7 @@ arguments through unchanged.
 Credits: One Ciel LLC
-Current version: `0.1.37`
+Current version: `0.1.38`
 ## Why This Exists
@@ -381,6 +381,14 @@ steps under that larger model's supervision.
 ## Changelog
+### 0.1.38
+- **User-selected context windows**: removes the NVIDIA hosted 32K safety cap.
+  The router now uses the context window selected in LLM options or headless
+  configuration, with model-aware fallback only when no value is configured.
+- **NVIDIA presets updated**: NVIDIA hosted presets now start at 65K and scale
+  up to 256K for large-output/reasoning workflows.
 ### 0.1.37
 - **Pseudo tool-call recovery**: the NVIDIA/OpenAI-compatible stream path now
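
The resolution order described in the 0.1.38 changelog entry can be sketched as follows. This is a minimal sketch under stated assumptions, not the package's code: `resolve_context_window` is a hypothetical name, and `positive_int` is an assumed reimplementation of a helper that the diff calls but does not show. The fallback values are copied from `nvidia_hosted_context_default` in `claude_any.py`.

```python
from typing import Any


def positive_int(value: Any) -> int:
    """Assumed helper: return value as a positive int, or 0 if it is not one."""
    try:
        number = int(value)
    except (TypeError, ValueError, OverflowError):
        return 0
    return number if number > 0 else 0


def nvidia_hosted_context_default(model_id: str) -> int:
    """Model-aware fallback, mirroring the function added in this diff."""
    model = model_id.lower()
    if "kimi-k2.6" in model or "kimi_k2.6" in model:
        return 262144
    if "deepseek" in model:
        return 131072
    return 65536  # glm, qwen, and every other model


def resolve_context_window(provider: str, pcfg: dict[str, Any]) -> int:
    """Hypothetical wrapper: a configured value wins; fallback only when unset."""
    configured = positive_int(pcfg.get("context_window"))
    if configured:
        return configured
    if provider == "nvidia-hosted":
        return nvidia_hosted_context_default(str(pcfg.get("current_model") or ""))
    return 65536
```

Under these assumptions, a configured value such as 200000 is returned unchanged even for `nvidia-hosted`, while an unset value falls back to 131072 for a model ID containing `deepseek` and to 65536 for most others.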

package/claude_any.py CHANGED

@@ -85,7 +85,7 @@ PROVIDER_LABELS = {
     "self-hosted-nim": "Self Hosted NIM",
 }
 APP_NAME = "Claude Any"
-VERSION = "0.1.37"
+VERSION = "0.1.38"
 CREDITS = "Credits: One Ciel LLC"
 LOG_LEVELS = {"SILENT": 0, "ERROR": 1, "WARN": 2, "INFO": 3, "DEBUG": 4, "TRACE": 5}
@@ -144,7 +144,7 @@ LANGUAGES = {
     "zh": "中文",
 }
-MODEL_PRESETS: dict[str, dict[str, Any]] = {
+MODEL_PRESETS: dict[str, dict[str, Any]] = {
     "glm-4.7": {"compat_max_tokens": 64, "thinking": True, "num_ctx_min": 32768, "num_ctx_max": 131072},
     "glm-5.1": {"compat_max_tokens": 64, "thinking": True, "num_ctx_min": 32768, "num_ctx_max": 131072},
     "glm-4.7:cloud": {"compat_max_tokens": 64, "thinking": True, "num_ctx_min": 32768, "num_ctx_max": 131072},
@@ -154,10 +154,21 @@ MODEL_PRESETS: dict[str, dict[str, Any]] = {
     "qwen3.6:27b": {"compat_max_tokens": 16, "thinking": False, "num_ctx_min": 32768, "num_ctx_max": 65536},
     "deepseek-r1": {"compat_max_tokens": 64, "thinking": True, "num_ctx_min": 32768, "num_ctx_max": 131072},
     "llama3.3:70b": {"compat_max_tokens": 16, "thinking": False, "num_ctx_min": 32768, "num_ctx_max": 131072},
-}
-def model_preset(model_id: str) -> dict[str, Any]:
+}
+def nvidia_hosted_context_default(model_id: str) -> int:
+    model = model_id.lower()
+    if "kimi-k2.6" in model or "kimi_k2.6" in model:
+        return 262144
+    if "deepseek" in model:
+        return 131072
+    if "glm" in model or "qwen" in model:
+        return 65536
+    return 65536
+def model_preset(model_id: str) -> dict[str, Any]:
     """Return preset dict for a model ID, checking exact match then prefix match."""
     if model_id in MODEL_PRESETS:
         return MODEL_PRESETS[model_id]
@@ -722,7 +733,7 @@ DEFAULT_CONFIG: dict[str, Any] = {
             "native_compat": False,
             "rate_limit_rpm": 40,
             "rate_limit_status": True,
-            "context_window": 32768,
+            "context_window": 65536,
             "max_output_tokens": 4096,
             "temperature": 0.7,
             "top_p": 0.8,
@@ -788,7 +799,14 @@ def apply_config_migrations(cfg: dict[str, Any]) -> None:
     if not migrations.get(marker):
         pcfg = cfg.get("providers", {}).get("nvidia-hosted", {})
         if isinstance(pcfg, dict) and not positive_int(pcfg.get("context_window")):
-            pcfg["context_window"] = 32768
+            pcfg["context_window"] = nvidia_hosted_context_default(str(pcfg.get("current_model") or ""))
+        migrations[marker] = True
+    marker = "nvidia_context_window_unforce_32k_20260513"
+    if not migrations.get(marker):
+        pcfg = cfg.get("providers", {}).get("nvidia-hosted", {})
+        if isinstance(pcfg, dict) and positive_int(pcfg.get("context_window")) == 32768:
+            pcfg["context_window"] = nvidia_hosted_context_default(str(pcfg.get("current_model") or ""))
         migrations[marker] = True
     marker = "stream_enabled_default_true_20260513"
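
The two NVIDIA migrations in the hunk above follow a run-once pattern: each is guarded by a marker key, and the new one rewrites only a stored value of exactly 32768. The sketch below illustrates that pattern under assumptions. The `_migrations` key, the `migrate_unforce_32k` name, and the simplified `fallback` function are hypothetical; only the marker string is copied from the diff.

```python
from typing import Any

MARKER = "nvidia_context_window_unforce_32k_20260513"  # copied from the diff


def fallback(model_id: str) -> int:
    """Simplified stand-in for nvidia_hosted_context_default."""
    return 131072 if "deepseek" in model_id.lower() else 65536


def migrate_unforce_32k(cfg: dict[str, Any]) -> None:
    """Run-once migration: replace a stored 32768 with the model-aware default."""
    # "_migrations" is an assumed key; the diff does not show where markers live.
    migrations = cfg.setdefault("_migrations", {})
    if migrations.get(MARKER):
        return  # already applied; later user choices are left alone
    pcfg = cfg.get("providers", {}).get("nvidia-hosted", {})
    if isinstance(pcfg, dict) and pcfg.get("context_window") == 32768:
        pcfg["context_window"] = fallback(str(pcfg.get("current_model") or ""))
    migrations[MARKER] = True
```

One consequence of this design, if the sketch matches the real behaviour: a stored 32768 is raised once on upgrade whether it came from the old default or from a deliberate user choice, but a 32768 selected after the marker is set is kept.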
@@ -3620,7 +3638,7 @@ def openai_context_limit_for_budget(provider: str, pcfg: dict[str, Any]) -> int:
     if configured:
         return configured
     if provider == "nvidia-hosted":
-        return 32768
+        return nvidia_hosted_context_default(str(pcfg.get("current_model") or ""))
     return 65536
@@ -6124,14 +6142,72 @@ def apply_llm_preset_to_provider(provider: str, pcfg: dict[str, Any], preset_id:
         }
         for token in tokens_by_preset[preset_id]:
             apply_provider_option(provider, pcfg, token)
-    else:
-        native_default = "false" if provider == "nvidia-hosted" else "true"
-        server_limit = upstream_model_context_limit(provider, pcfg) if provider in ("vllm", "self-hosted-nim") else None
-        tokens_by_preset = {
-            "balanced": [
-                "context_window=32768",
-                "reserve=2048",
-                "max_output_tokens=4096",
+    else:
+        native_default = "false" if provider == "nvidia-hosted" else "true"
+        server_limit = upstream_model_context_limit(provider, pcfg) if provider in ("vllm", "self-hosted-nim") else None
+        if provider == "nvidia-hosted":
+            tokens_by_preset = {
+                "balanced": [
+                    "context_window=65536",
+                    "reserve=4096",
+                    "max_output_tokens=4096",
+                    "timeout=300000",
+                    "temperature=0.3",
+                    "unset:top_p",
+                    "unset:top_k",
+                ],
+                "coding": [
+                    "context_window=65536",
+                    "reserve=4096",
+                    "max_output_tokens=4096",
+                    "timeout=300000",
+                    "temperature=0.2",
+                    "unset:top_p",
+                    "unset:top_k",
+                ],
+                "fast": [
+                    "context_window=65536",
+                    "reserve=2048",
+                    "max_output_tokens=2048",
+                    "timeout=300000",
+                    "temperature=0.2",
+                    "unset:top_p",
+                    "unset:top_k",
+                ],
+                "long-context-65k": [
+                    "context_window=131072",
+                    "reserve=8192",
+                    "max_output_tokens=4096",
+                    "timeout=900000",
+                    "temperature=0.3",
+                    "unset:top_p",
+                    "unset:top_k",
+                ],
+                "large-output": [
+                    "context_window=262144",
+                    "reserve=8192",
+                    "max_output_tokens=8192",
+                    "timeout=1200000",
+                    "temperature=0.3",
+                    "unset:top_p",
+                    "unset:top_k",
+                ],
+                "reasoning": [
+                    "context_window=262144",
+                    "reserve=8192",
+                    "max_output_tokens=4096",
+                    "timeout=1800000",
+                    "temperature=0.6",
+                    "unset:top_p",
+                    "unset:top_k",
+                ],
+            }
+        else:
+            tokens_by_preset = {
+            "balanced": [
+                "context_window=32768",
+                "reserve=2048",
+                "max_output_tokens=4096",
                 "timeout=300000",
                 "temperature=0.3",
                 "unset:top_p",
@@ -6185,10 +6261,10 @@ def apply_llm_preset_to_provider(provider: str, pcfg: dict[str, Any], preset_id:
                 "timeout=1800000",
                 "temperature=0.6",
                 "unset:top_p",
-                "unset:top_k",
-                f"native={native_default}",
-            ],
-        }
+                "unset:top_k",
+                f"native={native_default}",
+            ],
+            }
         for token in tokens_by_preset[preset_id]:
             if provider == "nvidia-hosted" and token.startswith("native="):
                 continue
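
The preset tables above are lists of string tokens, either `key=value` or `unset:key`, passed one at a time to `apply_provider_option`, whose body is not part of this diff. The sketch below shows one plausible way such tokens could be interpreted. `apply_token` and `apply_preset` are hypothetical names, and the value coercion and the use of the token key as the config key are assumptions (the real code may, for example, map `reserve` to a different key).

```python
from typing import Any


def apply_token(pcfg: dict[str, Any], token: str) -> None:
    """Hypothetical interpreter for preset tokens (not apply_provider_option)."""
    if token.startswith("unset:"):
        pcfg.pop(token[len("unset:"):], None)  # drop the option if present
        return
    key, _, raw = token.partition("=")
    # Assumption: numeric-looking values become int or float, others stay str.
    try:
        value: Any = int(raw)
    except ValueError:
        try:
            value = float(raw)
        except ValueError:
            value = raw
    pcfg[key] = value


def apply_preset(provider: str, pcfg: dict[str, Any], tokens: list[str]) -> None:
    """Apply tokens in order, skipping native= for nvidia-hosted as the diff does."""
    for token in tokens:
        if provider == "nvidia-hosted" and token.startswith("native="):
            continue
        apply_token(pcfg, token)
```

Applied to the `balanced` NVIDIA preset, this sketch would overwrite `context_window` with 65536 and remove any stored `top_p` and `top_k`, which matches the changelog's statement that NVIDIA presets now start at 65K.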

package/docs/README.ja.md CHANGED

@@ -47,7 +47,7 @@ vLLM、NVIDIA hosted、self-hosted NIM を選択し、通常の Claude Code 引
 Credits: One Ciel LLC
-現在のバージョン: `0.1.37`
+現在のバージョン: `0.1.38`
 ## 作られた理由
@@ -351,6 +351,14 @@ Windows/Linux 管理、クリーンアップスクリプト、定期的なセキ
 ## 変更履歴
+### 0.1.38
+- **ユーザー選択の context window を優先**: NVIDIA hosted の 32K safety cap を
+  削除しました。router は LLM options または headless 設定で選ばれた
+  context window を使い、未設定の場合のみモデル別 fallback を使います。
+- **NVIDIA preset 更新**: NVIDIA hosted preset は 65K から開始し、
+  large-output/reasoning workflow では 256K まで使います。
 ### 0.1.37
 - **Pseudo tool-call recovery**: NVIDIA/OpenAI-compatible stream 経路で

package/docs/README.ko.md CHANGED

@@ -47,7 +47,7 @@ NVIDIA hosted, self-hosted NIM을 선택하고, Claude Code의 일반 인자는
 Credits: One Ciel LLC
-현재 버전: `0.1.37`
+현재 버전: `0.1.38`
 ## 왜 만들었나
@@ -351,6 +351,14 @@ Windows 이벤트 로그 리뷰, 바이러스/랜섬웨어 침입 시도 정리,
 ## 변경 이력
+### 0.1.38
+- **사용자 선택 context window 우선**: NVIDIA hosted 32K safety cap을 제거했습니다.
+  router는 LLM 옵션 또는 headless 설정에서 선택한 context window를 사용하고,
+  값이 없을 때만 모델별 fallback을 사용합니다.
+- **NVIDIA preset 업데이트**: NVIDIA hosted preset은 65K부터 시작하고,
+  large-output/reasoning 워크플로에서는 256K까지 사용합니다.
 ### 0.1.37
 - **Pseudo tool-call recovery**: NVIDIA/OpenAI-compatible stream 경로에서

package/docs/README.zh.md CHANGED

@@ -47,7 +47,7 @@ NIM，并把普通 Claude Code 参数原样传递。
 Credits: One Ciel LLC
-当前版本: `0.1.37`
+当前版本: `0.1.38`
 ## 为什么存在
@@ -337,6 +337,14 @@ Hermes 格式模型或部分较旧的 Qwen tool template。
 ## 更新日志
+### 0.1.38
+- **优先使用用户选择的 context window**：移除 NVIDIA hosted 的 32K safety cap。
+  router 会使用 LLM options 或 headless 配置中选择的 context window，
+  只有未配置时才使用按模型推断的 fallback。
+- **NVIDIA preset 更新**：NVIDIA hosted preset 从 65K 起步，
+  large-output/reasoning 工作流最高使用 256K。
 ### 0.1.37
 - **Pseudo tool-call recovery**：NVIDIA/OpenAI-compatible stream 路径现在会

package/docs/manual.md CHANGED

@@ -10,7 +10,7 @@ Code starts, while passing normal Claude Code arguments through unchanged.
 Credits: One Ciel LLC
-Current version: `0.1.37`
+Current version: `0.1.38`
 ## Install

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@oneciel-ai/claude-any",
-  "version": "0.1.37",
+  "version": "0.1.38",
   "description": "Claude Code provider selector for Anthropic, Ollama, Ollama Cloud, vLLM, NVIDIA hosted, and self-hosted NIM.",
   "license": "MIT",
   "author": "One Ciel LLC",