npm - @oneciel-ai/claude-any - Versions diffs - 0.1.28 → 0.1.29 - Mend

@oneciel-ai/claude-any 0.1.28 → 0.1.29

Files changed (8)

package/README.md CHANGED

@@ -1,9 +1,9 @@
-# Claude Any
-![Claude Any: full Claude Code experience with free or low-cost LLMs](claude-any-adv.png)
-| English | [한국어](docs/README.ko.md) | [日本語](docs/README.ja.md) | [中文](docs/README.zh.md) |
-| --- | --- | --- | --- |
+# Claude Any
+![Claude Any: full Claude Code experience with free or low-cost LLMs](claude-any-adv.png)
+| English | [한국어](docs/README.ko.md) | [日本語](docs/README.ja.md) | [中文](docs/README.zh.md) |
+| --- | --- | --- | --- |
 [![npm version](https://img.shields.io/npm/v/@oneciel-ai/claude-any?logo=npm&label=npm)](https://www.npmjs.com/package/@oneciel-ai/claude-any)
 [![npm downloads](https://img.shields.io/npm/dm/@oneciel-ai/claude-any?logo=npm&label=downloads)](https://www.npmjs.com/package/@oneciel-ai/claude-any)
@@ -11,23 +11,23 @@
 > ## 🚀 Use the full Claude Code experience with free or low-cost LLMs
 >
-> - **Free** — [NVIDIA hosted NIM](https://build.nvidia.com/) (qwen3-coder-480b, gpt-oss, and friends) through the API Catalog.
-> - **Low-cost** — [Ollama Cloud](https://ollama.com/cloud) for GLM, Qwen, DeepSeek, and other open-weight models at a fraction of frontier-model pricing.
-> - **Free + local** — [Ollama](https://ollama.com/) or [vLLM](https://github.com/vllm-project/vllm) on your own GPU, fully offline.
-> - **Plan Mode + Advisor ready** — Claude Any preserves Claude Code Plan Mode on non-Anthropic providers and adds an optional long-context Advisor model for review.
-> - **Smooth free-model pacing** — Claude Code spends time reading files and running tools, and Claude Any uses that natural gap for RPM pacing so NVIDIA hosted free models feel usable even with strict per-minute limits.
->
-> Provider, model, base URL, API key, streaming behavior, and LLM options are all selected from a console menu **before** Claude Code starts. Claude Code itself runs untouched with all of its native tooling, slash commands, and workflows.
-## Today's Top 3 Benefits
-1. **Plan Mode works on non-Anthropic models** — Claude Any keeps Claude Code's Plan Mode usable even when the upstream provider is NVIDIA hosted, Ollama Cloud, local Ollama, vLLM, or NIM.
-2. **Advisor review with a bigger model** — pick a long-context Advisor Model at launch, then use `/advisor` inside Claude Code to review the current task, blockers, and next concrete action.
-3. **Free-model RPM limits feel smoother** — router-side RPM pacing uses the natural time spent reading files and running tools, so NVIDIA hosted free models can stay within per-minute limits with less visible waiting.
-### Demo
-![NVIDIA hosted NIM driving Claude Code (deepseek-4-flash)](docs/assets/claude-any-nvidia-nim.gif)
+> - **Free** — [NVIDIA hosted NIM](https://build.nvidia.com/) (qwen3-coder-480b, gpt-oss, and friends) through the API Catalog.
+> - **Low-cost** — [Ollama Cloud](https://ollama.com/cloud) for GLM, Qwen, DeepSeek, and other open-weight models at a fraction of frontier-model pricing.
+> - **Free + local** — [Ollama](https://ollama.com/) or [vLLM](https://github.com/vllm-project/vllm) on your own GPU, fully offline.
+> - **Plan Mode + Advisor ready** — Claude Any preserves Claude Code Plan Mode on non-Anthropic providers and adds an optional long-context Advisor model for review.
+> - **Smooth free-model pacing** — Claude Code spends time reading files and running tools, and Claude Any uses that natural gap for RPM pacing so NVIDIA hosted free models feel usable even with strict per-minute limits.
+>
+> Provider, model, base URL, API key, streaming behavior, and LLM options are all selected from a console menu **before** Claude Code starts. Claude Code itself runs untouched with all of its native tooling, slash commands, and workflows.
+## Today's Top 3 Benefits
+1. **Plan Mode works on non-Anthropic models** — Claude Any keeps Claude Code's Plan Mode usable even when the upstream provider is NVIDIA hosted, Ollama Cloud, local Ollama, vLLM, or NIM.
+2. **Advisor review with a bigger model** — pick a long-context Advisor Model at launch, then use `/advisor` inside Claude Code to review the current task, blockers, and next concrete action.
+3. **Free-model RPM limits feel smoother** — router-side RPM pacing uses the natural time spent reading files and running tools, so NVIDIA hosted free models can stay within per-minute limits with less visible waiting.
+### Demo
+![NVIDIA hosted NIM driving Claude Code (deepseek-4-flash)](docs/assets/claude-any-nvidia-nim.gif)
 NVIDIA hosted NIM (deepseek-4-flash) driving Claude Code through the claude-any router. &nbsp;[full mp4 ⤓](https://github.com/OneCielAI/claude-any/raw/main/demo/claude-any-nvidia-nim.mp4)
@@ -44,7 +44,7 @@ arguments through unchanged.
 Credits: One Ciel LLC
-Current version: `0.1.28`
+Current version: `0.1.29`
 ## Why This Exists
@@ -84,23 +84,23 @@ Requirements:
 **Install from the npm registry (recommended):**
-```sh
-npm install -g @oneciel-ai/claude-any
-```
-```sh
-claude-any
-```
+```sh
+npm install -g @oneciel-ai/claude-any
+```
+```sh
+claude-any
+```
 **Upgrade:**
-```sh
-npm update -g @oneciel-ai/claude-any
-```
-```sh
-claude-any version
-```
+```sh
+npm update -g @oneciel-ai/claude-any
+```
+```sh
+claude-any version
+```
 **Uninstall:**
@@ -113,49 +113,49 @@ npm uninstall -g @oneciel-ai/claude-any
 Install directly from the GitHub repository (useful for testing unreleased
 commits between npm publishes):
-```sh
-npm install -g https://github.com/OneCielAI/claude-any.git
-```
-```sh
-claude-any
-```
+```sh
+npm install -g https://github.com/OneCielAI/claude-any.git
+```
+```sh
+claude-any
+```
 POSIX source install:
-```sh
-git clone https://github.com/OneCielAI/claude-any.git
-```
-```sh
-cd claude-any
-```
-```sh
-./install.sh
-```
-```sh
-claude-any
-```
+```sh
+git clone https://github.com/OneCielAI/claude-any.git
+```
+```sh
+cd claude-any
+```
+```sh
+./install.sh
+```
+```sh
+claude-any
+```
 Windows PowerShell source install:
-```powershell
-git clone https://github.com/OneCielAI/claude-any.git
-```
-```powershell
-cd claude-any
-```
-```powershell
-.\install.ps1
-```
-```powershell
-claude-any
-```
+```powershell
+git clone https://github.com/OneCielAI/claude-any.git
+```
+```powershell
+cd claude-any
+```
+```powershell
+.\install.ps1
+```
+```powershell
+claude-any
+```
 ### Releasing (maintainers)
@@ -263,24 +263,24 @@ steps under that larger model's supervision.
 - Native paths where providers expose Claude/Anthropic-compatible endpoints.
 - Router mode for providers that need request/response adaptation.
 - DuckDuckGo and fetch MCP wiring for non-native providers.
-- Headless setup flags such as `--ca-provider`, `--ca-model`, `--ca-base-url`,
-  `--ca-api-key-env`, `--ca-ollama-option`, and `--ca-max-output-tokens`.
-- Claude Code Plan Mode support on router-backed non-Anthropic providers,
-  including local handling for `EnterPlanMode` and plan artifacts.
-- Optional `/advisor` slash command that routes the current task state to a
-  selected Advisor Model, useful for long-context review and next-step checks.
-- Claude Code `statusLine` integration showing router RPM usage and wait time
-  in the bottom status area instead of polluting the chat transcript.
-- Router-side RPM control for NVIDIA hosted, self-hosted NIM, Ollama, and
-  Ollama Cloud. `rate_limit_rpm=0` disables throttling while still showing the
-  last-60-seconds usage rate.
-- Soft pacing subtracts time already spent reading files, running commands, and
-  waiting for tool results. In real coding sessions, those tool-call gaps absorb
-  much of the RPM spacing naturally, so providers such as NVIDIA hosted NIM can
-  stay within free-model limits without making every Claude Code turn feel
-  rate-limited.
-- Streaming proxy for Ollama/Ollama Cloud router path — tokens are delivered
-  to Claude Code as they arrive instead of waiting for the full response.
+- Headless setup flags such as `--ca-provider`, `--ca-model`, `--ca-base-url`,
+  `--ca-api-key-env`, `--ca-ollama-option`, and `--ca-max-output-tokens`.
+- Claude Code Plan Mode support on router-backed non-Anthropic providers,
+  including local handling for `EnterPlanMode` and plan artifacts.
+- Optional `/advisor` slash command that routes the current task state to a
+  selected Advisor Model, useful for long-context review and next-step checks.
+- Claude Code `statusLine` integration showing router RPM usage and wait time
+  in the bottom status area instead of polluting the chat transcript.
+- Router-side RPM control for NVIDIA hosted, self-hosted NIM, Ollama, and
+  Ollama Cloud. `rate_limit_rpm=0` disables throttling while still showing the
+  last-60-seconds usage rate.
+- Soft pacing subtracts time already spent reading files, running commands, and
+  waiting for tool results. In real coding sessions, those tool-call gaps absorb
+  much of the RPM spacing naturally, so providers such as NVIDIA hosted NIM can
+  stay within free-model limits without making every Claude Code turn feel
+  rate-limited.
+- Streaming proxy for Ollama/Ollama Cloud router path — tokens are delivered
+  to Claude Code as they arrive instead of waiting for the full response.
 - Per-provider `stream` on/off toggle and `stream_word_chunking` option to
   batch text deltas at word boundaries, mitigating SSE fragmentation that can
   break tool-call / JSON parsing in long streamed responses.
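
The soft pacing bullet in the hunk above describes spacing requests by RPM while crediting time already spent on file reads, commands, and tool waits. A minimal sketch of that idea follows. It is not the router's actual implementation: the `SoftRpmPacer` class, its method names, and the injectable `clock` / `sleep` parameters are hypothetical, and only the `rate_limit_rpm` name and the "0 disables throttling" behavior come from the README. The usage display for the last 60 seconds is not modeled.

```python
import time


class SoftRpmPacer:
    """Hypothetical pacer: keep requests 60/rpm seconds apart, crediting elapsed time."""

    def __init__(self, rate_limit_rpm: float, clock=time.monotonic, sleep=time.sleep):
        self.rate_limit_rpm = rate_limit_rpm
        self._clock = clock              # injectable so the pacer can be tested without real waits
        self._sleep = sleep
        self._last_request = None        # timestamp of the previous upstream request

    def required_wait(self, now: float) -> float:
        """Return the seconds still owed before the next request may be sent."""
        if self.rate_limit_rpm <= 0 or self._last_request is None:
            return 0.0                   # rpm=0 disables throttling; the first request never waits
        min_interval = 60.0 / self.rate_limit_rpm
        # Time spent reading files or running tools since the last request counts as already waited.
        elapsed = now - self._last_request
        return max(0.0, min_interval - elapsed)

    def acquire(self) -> float:
        """Sleep only for the remaining gap, record the send time, and return the wait."""
        now = self._clock()
        wait = self.required_wait(now)
        if wait > 0:
            self._sleep(wait)
        self._last_request = now + wait
        return wait
```

With a limit of 30 RPM the minimum spacing is 2 seconds, so a request issued 0.5 seconds after the previous one waits 1.5 seconds, while one issued after a 3-second tool call is sent immediately.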
@@ -292,58 +292,66 @@ steps under that larger model's supervision.
   including `WorktreeCreate` / `WorktreeRemove`, so non-git working directories
   no longer fail Agent isolation with
   `Cannot create agent worktree: not in a git repository...`.
-- Config file caching — settings are read from disk once and reused until the
-  file changes, reducing per-request overhead in the router.
-- Router control-plane endpoints for headless agent coordination:
-  `/ca/chat/messages`, `/ca/chat/wait`, `/ca/chat/stream`, `/ca/chat/files`,
-  and `/ca/plan/artifacts`.
+- Config file caching — settings are read from disk once and reused until the
+  file changes, reducing per-request overhead in the router.
+- Router control-plane endpoints for headless agent coordination:
+  `/ca/chat/messages`, `/ca/chat/wait`, `/ca/chat/stream`, `/ca/chat/files`,
+  and `/ca/plan/artifacts`.
 ## Changelog
-### 0.1.28
-- **Plan Mode + Advisor headline**: document Claude Any's Plan Mode support for
-  router-backed non-Anthropic providers and the `/advisor` slash command backed
-  by a selected long-context Advisor Model.
-- **Status-line RPM telemetry**: Claude Any installs a Claude Code `statusLine`
-  command that shows router RPM usage and the latest wait time in the bottom
-  status area, keeping rate-limit telemetry out of the chat transcript.
-- **Soft RPM pacing for free hosted models**: NVIDIA hosted, self-hosted NIM,
-  Ollama, and Ollama Cloud can use router-side RPM pacing. The pacing subtracts
-  time already spent in file reads, command execution, and tool-result waits, so
-  normal coding tool-call gaps naturally absorb much of the RPM spacing.
-- **Unlimited usage display**: `rate_limit_rpm=0` disables throttling while
-  still displaying the last-60-seconds request rate.
-### 0.1.27
-- **Plan mode support for non-Anthropic providers**: the router now keeps
-  `EnterPlanMode` available and supports Claude Code Plan mode even when the
-  upstream model does not reliably choose that internal tool. Forced
-  `tool_choice=EnterPlanMode` is answered locally with a valid Anthropic
-  `tool_use`, and long implementation requests that receive only a short or
-  empty non-actionable text response are promoted to `EnterPlanMode` using
-  language-agnostic structure checks.
-- **Plan-mode self-tool handling**: unsupported Claude Code self-tools are
-  still stripped for non-Anthropic providers, but Plan-mode tools are handled
-  separately so planning can work instead of being disabled.
+### 0.1.29
-### 0.1.25
+- **NVIDIA compatibility test fix**: `claude-any test` now restarts the local
+  router before router-mode tests, so upgraded installs do not accidentally use
+  an old long-running router that still expects `nvd-claude-proxy`.
+- **Clear NVIDIA router wording**: menu status now describes NVIDIA hosted as
+  using the local claude-any router instead of the retired local proxy path.
-- **Plan-mode diagnostics**: set `~/.config/claude-any/log-level` to `TRACE`
-  to capture redacted request and response summaries in `requests.jsonl` /
-  `responses.jsonl`.
-- **Headless agent chat service**: the router exposes a small HTTP control
-  plane for sub coding agents. Agents can post messages, poll updates after
-  the last seen message id, or wait on an SSE stream when they do not have
-  their own loop.
-- **Plan artifact serving**: agents can create plan files through the router
-  and share stable local URLs, matching Claude Code's file/artifact-oriented
-  Plan-mode workflow without copying Anthropic's internal implementation.
-### 0.1.24
+### 0.1.28
-- **First public npm release** under the correct scope: `@oneciel-ai/claude-any`. Earlier 0.1.x versions were never published to the registry; this is the version that is actually installable via `npm install -g @oneciel-ai/claude-any`.
+- **Plan Mode + Advisor headline**: document Claude Any's Plan Mode support for
+  router-backed non-Anthropic providers and the `/advisor` slash command backed
+  by a selected long-context Advisor Model.
+- **Status-line RPM telemetry**: Claude Any installs a Claude Code `statusLine`
+  command that shows router RPM usage and the latest wait time in the bottom
+  status area, keeping rate-limit telemetry out of the chat transcript.
+- **Soft RPM pacing for free hosted models**: NVIDIA hosted, self-hosted NIM,
+  Ollama, and Ollama Cloud can use router-side RPM pacing. The pacing subtracts
+  time already spent in file reads, command execution, and tool-result waits, so
+  normal coding tool-call gaps naturally absorb much of the RPM spacing.
+- **Unlimited usage display**: `rate_limit_rpm=0` disables throttling while
+  still displaying the last-60-seconds request rate.
+### 0.1.27
+- **Plan mode support for non-Anthropic providers**: the router now keeps
+  `EnterPlanMode` available and supports Claude Code Plan mode even when the
+  upstream model does not reliably choose that internal tool. Forced
+  `tool_choice=EnterPlanMode` is answered locally with a valid Anthropic
+  `tool_use`, and long implementation requests that receive only a short or
+  empty non-actionable text response are promoted to `EnterPlanMode` using
+  language-agnostic structure checks.
+- **Plan-mode self-tool handling**: unsupported Claude Code self-tools are
+  still stripped for non-Anthropic providers, but Plan-mode tools are handled
+  separately so planning can work instead of being disabled.
+### 0.1.25
+- **Plan-mode diagnostics**: set `~/.config/claude-any/log-level` to `TRACE`
+  to capture redacted request and response summaries in `requests.jsonl` /
+  `responses.jsonl`.
+- **Headless agent chat service**: the router exposes a small HTTP control
+  plane for sub coding agents. Agents can post messages, poll updates after
+  the last seen message id, or wait on an SSE stream when they do not have
+  their own loop.
+- **Plan artifact serving**: agents can create plan files through the router
+  and share stable local URLs, matching Claude Code's file/artifact-oriented
+  Plan-mode workflow without copying Anthropic's internal implementation.
+### 0.1.24
+- **First public npm release** under the correct scope: `@oneciel-ai/claude-any`. Earlier 0.1.x versions were never published to the registry; this is the version that is actually installable via `npm install -g @oneciel-ai/claude-any`.
 ### 0.1.23
@@ -515,32 +523,32 @@ claude-any --ca-provider vllm --ca-base-url http://127.0.0.1:8000 --ca-model Qwe
 claude-any --ca-no-update-check -p "Reply with OK only." --output-format text
 ```
-All other arguments are passed through to Claude Code.
-## Headless Agent Chat
-When the claude-any router is running, sub agents can coordinate through local
-HTTP endpoints without opening the menu:
-```sh
-# Send a message to a channel.
-curl -s http://127.0.0.1:8799/ca/chat/messages \
-  -H 'content-type: application/json' \
-  -d '{"channel":"agents","sender_id":"codex","recipients":["kimi"],"message":"Need logs after id 42"}'
-# Poll updates after the last message id.
-curl -s 'http://127.0.0.1:8799/ca/chat/messages?channel=agents&recipient=kimi&after=42'
-# Wait on a stream until messages arrive.
-curl -N 'http://127.0.0.1:8799/ca/chat/stream?channel=agents&recipient=kimi&after=42&timeout=300'
-# Publish a plan file and get a served URL.
-curl -s http://127.0.0.1:8799/ca/plan/artifacts \
-  -H 'content-type: application/json' \
-  -d '{"title":"handoff","name":"handoff.md","content":"# Plan\n- step 1"}'
-```
-## Security
+All other arguments are passed through to Claude Code.
+## Headless Agent Chat
+When the claude-any router is running, sub agents can coordinate through local
+HTTP endpoints without opening the menu:
+```sh
+# Send a message to a channel.
+curl -s http://127.0.0.1:8799/ca/chat/messages \
+  -H 'content-type: application/json' \
+  -d '{"channel":"agents","sender_id":"codex","recipients":["kimi"],"message":"Need logs after id 42"}'
+# Poll updates after the last message id.
+curl -s 'http://127.0.0.1:8799/ca/chat/messages?channel=agents&recipient=kimi&after=42'
+# Wait on a stream until messages arrive.
+curl -N 'http://127.0.0.1:8799/ca/chat/stream?channel=agents&recipient=kimi&after=42&timeout=300'
+# Publish a plan file and get a served URL.
+curl -s http://127.0.0.1:8799/ca/plan/artifacts \
+  -H 'content-type: application/json' \
+  -d '{"title":"handoff","name":"handoff.md","content":"# Plan\n- step 1"}'
+```
+## Security
 Do not commit runtime configuration or API keys. Claude Any stores local runtime
 configuration under `~/.config/claude-any/`. NVIDIA hosted credentials used by
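
The `curl` examples in the Headless Agent Chat hunk above can also be prepared from Python. The sketch below only builds the URLs and the JSON body, so it runs without a router. The endpoint paths, query parameters, JSON field names, and the `127.0.0.1:8799` address are taken from the README examples; the helper names `chat_url` and `message_body` are hypothetical and not part of claude-any.

```python
import json
from urllib.parse import urlencode

ROUTER = "http://127.0.0.1:8799"  # local router address used in the README examples


def chat_url(endpoint: str, channel: str, recipient: str, after: int, timeout=None) -> str:
    """Build a /ca/chat/<endpoint> URL with the query parameters shown in the README."""
    params = {"channel": channel, "recipient": recipient, "after": after}
    if timeout is not None:
        params["timeout"] = timeout  # only the stream example passes a timeout
    return f"{ROUTER}/ca/chat/{endpoint}?{urlencode(params)}"


def message_body(channel: str, sender_id: str, recipients: list, message: str) -> bytes:
    """Encode the JSON payload that the README posts to /ca/chat/messages."""
    payload = {
        "channel": channel,
        "sender_id": sender_id,
        "recipients": recipients,
        "message": message,
    }
    return json.dumps(payload).encode("utf-8")


poll_url = chat_url("messages", "agents", "kimi", 42)
stream_url = chat_url("stream", "agents", "kimi", 42, timeout=300)
body = message_body("agents", "codex", ["kimi"], "Need logs after id 42")
```

The resulting strings can be passed to any HTTP client, for example `urllib.request.Request(f"{ROUTER}/ca/chat/messages", data=body, headers={"content-type": "application/json"})`.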

package/claude-any-menu.py CHANGED

@@ -656,7 +656,7 @@ def probe_base_url(provider: str, pcfg: dict) -> str:
     if "your-" in base:
         return f"Base URL: placeholder ({base})"
     if provider == "nvidia-hosted":
-        return f"Base URL: NVIDIA hosted ({base}); proxy starts on launch"
+        return f"Base URL: NVIDIA hosted ({base}); local router http://127.0.0.1:8799 starts on launch"
     path = "/api/tags" if provider in ("ollama", "ollama-cloud") else "/v1/models"
     url = join_url(base, path)
     headers = {}
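
To see the effect of the one-line change in isolation, the reduced sketch below reproduces only the branches visible in this hunk. It is an assumption-laden stand-in, not the real `probe_base_url`: the function names `describe_base_url` and `probe_path` are hypothetical, the network probe and `join_url` are omitted, and the base URLs in the usage lines are placeholders.

```python
def describe_base_url(provider: str, base: str):
    """Return the status text for the two early-return branches shown in the hunk."""
    if "your-" in base:
        return f"Base URL: placeholder ({base})"
    if provider == "nvidia-hosted":
        # 0.1.29 wording: names the local claude-any router instead of a generic "proxy".
        return f"Base URL: NVIDIA hosted ({base}); local router http://127.0.0.1:8799 starts on launch"
    return None  # other providers continue to the probe, which this sketch does not model


def probe_path(provider: str) -> str:
    """Pick the probe path the way the unchanged context line does."""
    return "/api/tags" if provider in ("ollama", "ollama-cloud") else "/v1/models"


status = describe_base_url("nvidia-hosted", "https://example.invalid/v1")  # placeholder URL
```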