npm - @oneciel-ai/claude-any - Versions diffs - 0.1.28 → 0.1.30 - Mend

@oneciel-ai/claude-any 0.1.28 → 0.1.30

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (8) hide show

package/README.md CHANGED Viewed

@@ -1,9 +1,9 @@
-# Claude Any
-![Claude Any: full Claude Code experience with free or low-cost LLMs](claude-any-adv.png)
-| English | [한국어](docs/README.ko.md) | [日本語](docs/README.ja.md) | [中文](docs/README.zh.md) |
-| --- | --- | --- | --- |
+# Claude Any
+![Claude Any: full Claude Code experience with free or low-cost LLMs](claude-any-adv.png)
+| English | [한국어](docs/README.ko.md) | [日本語](docs/README.ja.md) | [中文](docs/README.zh.md) |
+| --- | --- | --- | --- |
 [![npm version](https://img.shields.io/npm/v/@oneciel-ai/claude-any?logo=npm&label=npm)](https://www.npmjs.com/package/@oneciel-ai/claude-any)
 [![npm downloads](https://img.shields.io/npm/dm/@oneciel-ai/claude-any?logo=npm&label=downloads)](https://www.npmjs.com/package/@oneciel-ai/claude-any)
@@ -11,23 +11,23 @@
 > ## 🚀 Use the full Claude Code experience with free or low-cost LLMs
 >
-> - **Free** — [NVIDIA hosted NIM](https://build.nvidia.com/) (qwen3-coder-480b, gpt-oss, and friends) through the API Catalog.
-> - **Low-cost** — [Ollama Cloud](https://ollama.com/cloud) for GLM, Qwen, DeepSeek, and other open-weight models at a fraction of frontier-model pricing.
-> - **Free + local** — [Ollama](https://ollama.com/) or [vLLM](https://github.com/vllm-project/vllm) on your own GPU, fully offline.
-> - **Plan Mode + Advisor ready** — Claude Any preserves Claude Code Plan Mode on non-Anthropic providers and adds an optional long-context Advisor model for review.
-> - **Smooth free-model pacing** — Claude Code spends time reading files and running tools, and Claude Any uses that natural gap for RPM pacing so NVIDIA hosted free models feel usable even with strict per-minute limits.
->
-> Provider, model, base URL, API key, streaming behavior, and LLM options are all selected from a console menu **before** Claude Code starts. Claude Code itself runs untouched with all of its native tooling, slash commands, and workflows.
-## Today's Top 3 Benefits
-1. **Plan Mode works on non-Anthropic models** — Claude Any keeps Claude Code's Plan Mode usable even when the upstream provider is NVIDIA hosted, Ollama Cloud, local Ollama, vLLM, or NIM.
-2. **Advisor review with a bigger model** — pick a long-context Advisor Model at launch, then use `/advisor` inside Claude Code to review the current task, blockers, and next concrete action.
-3. **Free-model RPM limits feel smoother** — router-side RPM pacing uses the natural time spent reading files and running tools, so NVIDIA hosted free models can stay within per-minute limits with less visible waiting.
-### Demo
-![NVIDIA hosted NIM driving Claude Code (deepseek-4-flash)](docs/assets/claude-any-nvidia-nim.gif)
+> - **Free** — [NVIDIA hosted NIM](https://build.nvidia.com/) (qwen3-coder-480b, gpt-oss, and friends) through the API Catalog.
+> - **Low-cost** — [Ollama Cloud](https://ollama.com/cloud) for GLM, Qwen, DeepSeek, and other open-weight models at a fraction of frontier-model pricing.
+> - **Free + local** — [Ollama](https://ollama.com/) or [vLLM](https://github.com/vllm-project/vllm) on your own GPU, fully offline.
+> - **Plan Mode + Advisor ready** — Claude Any preserves Claude Code Plan Mode on non-Anthropic providers and adds an optional long-context Advisor model for review.
+> - **Smooth free-model pacing** — Claude Code spends time reading files and running tools, and Claude Any uses that natural gap for RPM pacing so NVIDIA hosted free models feel usable even with strict per-minute limits.
+>
+> Provider, model, base URL, API key, streaming behavior, and LLM options are all selected from a console menu **before** Claude Code starts. Claude Code itself runs untouched with all of its native tooling, slash commands, and workflows.
+## Today's Top 3 Benefits
+1. **Plan Mode works on non-Anthropic models** — Claude Any keeps Claude Code's Plan Mode usable even when the upstream provider is NVIDIA hosted, Ollama Cloud, local Ollama, vLLM, or NIM.
+2. **Advisor review with a bigger model** — pick a long-context Advisor Model at launch, then use `/advisor` inside Claude Code to review the current task, blockers, and next concrete action.
+3. **Free-model RPM limits feel smoother** — router-side RPM pacing uses the natural time spent reading files and running tools, so NVIDIA hosted free models can stay within per-minute limits with less visible waiting.
+### Demo
+![NVIDIA hosted NIM driving Claude Code (deepseek-4-flash)](docs/assets/claude-any-nvidia-nim.gif)
 NVIDIA hosted NIM (deepseek-4-flash) driving Claude Code through the claude-any router. &nbsp;[full mp4 ⤓](https://github.com/OneCielAI/claude-any/raw/main/demo/claude-any-nvidia-nim.mp4)
@@ -44,7 +44,7 @@ arguments through unchanged.
 Credits: One Ciel LLC
-Current version: `0.1.28`
+Current version: `0.1.30`
 ## Why This Exists
@@ -84,23 +84,57 @@ Requirements:
 **Install from the npm registry (recommended):**
-```sh
-npm install -g @oneciel-ai/claude-any
-```
-```sh
-claude-any
-```
+```sh
+npm install -g @oneciel-ai/claude-any
+```
+```sh
+claude-any
+```
+## Run Claude Code Headlessly
+Use headless mode when a script, SSH session, CI job, or parent agent needs to
+launch Claude Code directly without opening the pre-launch menu. `claude-any`
+consumes `--ca-*` options, starts the required local router, and passes the
+remaining arguments to Claude Code.
+```sh
+claude-any --ca-provider nvidia-hosted --ca-model z-ai/glm-4.7
+```
+```sh
+claude-any --ca-provider ollama-cloud --ca-model glm-5.1
+```
+```sh
+claude-any --ca-provider ollama --ca-base-url http://127.0.0.1:11434 --ca-model qwen3-coder
+```
+Run one non-interactive Claude Code prompt:
+```sh
+claude-any --ca-provider nvidia-hosted --ca-model z-ai/glm-4.7 --ca-no-update-check -p "Reply with OK only." --output-format text
+```
+Use the saved provider/model and only skip the menu:
+```sh
+CLAUDE_ANY_SKIP_MENU=1 claude-any -p "Summarize this repository." --output-format text
+```
+More examples are in [Headless Examples](#headless-examples) and
+[the full manual](docs/manual.md#headless-usage).
 **Upgrade:**
-```sh
-npm update -g @oneciel-ai/claude-any
-```
-```sh
-claude-any version
-```
+```sh
+npm update -g @oneciel-ai/claude-any
+```
+```sh
+claude-any version
+```
 **Uninstall:**
@@ -113,49 +147,49 @@ npm uninstall -g @oneciel-ai/claude-any
 Install directly from the GitHub repository (useful for testing unreleased
 commits between npm publishes):
-```sh
-npm install -g https://github.com/OneCielAI/claude-any.git
-```
-```sh
-claude-any
-```
+```sh
+npm install -g https://github.com/OneCielAI/claude-any.git
+```
+```sh
+claude-any
+```
 POSIX source install:
-```sh
-git clone https://github.com/OneCielAI/claude-any.git
-```
-```sh
-cd claude-any
-```
-```sh
-./install.sh
-```
-```sh
-claude-any
-```
+```sh
+git clone https://github.com/OneCielAI/claude-any.git
+```
+```sh
+cd claude-any
+```
+```sh
+./install.sh
+```
+```sh
+claude-any
+```
 Windows PowerShell source install:
-```powershell
-git clone https://github.com/OneCielAI/claude-any.git
-```
-```powershell
-cd claude-any
-```
-```powershell
-.\install.ps1
-```
-```powershell
-claude-any
-```
+```powershell
+git clone https://github.com/OneCielAI/claude-any.git
+```
+```powershell
+cd claude-any
+```
+```powershell
+.\install.ps1
+```
+```powershell
+claude-any
+```
 ### Releasing (maintainers)
@@ -263,24 +297,24 @@ steps under that larger model's supervision.
 - Native paths where providers expose Claude/Anthropic-compatible endpoints.
 - Router mode for providers that need request/response adaptation.
 - DuckDuckGo and fetch MCP wiring for non-native providers.
-- Headless setup flags such as `--ca-provider`, `--ca-model`, `--ca-base-url`,
-  `--ca-api-key-env`, `--ca-ollama-option`, and `--ca-max-output-tokens`.
-- Claude Code Plan Mode support on router-backed non-Anthropic providers,
-  including local handling for `EnterPlanMode` and plan artifacts.
-- Optional `/advisor` slash command that routes the current task state to a
-  selected Advisor Model, useful for long-context review and next-step checks.
-- Claude Code `statusLine` integration showing router RPM usage and wait time
-  in the bottom status area instead of polluting the chat transcript.
-- Router-side RPM control for NVIDIA hosted, self-hosted NIM, Ollama, and
-  Ollama Cloud. `rate_limit_rpm=0` disables throttling while still showing the
-  last-60-seconds usage rate.
-- Soft pacing subtracts time already spent reading files, running commands, and
-  waiting for tool results. In real coding sessions, those tool-call gaps absorb
-  much of the RPM spacing naturally, so providers such as NVIDIA hosted NIM can
-  stay within free-model limits without making every Claude Code turn feel
-  rate-limited.
-- Streaming proxy for Ollama/Ollama Cloud router path — tokens are delivered
-  to Claude Code as they arrive instead of waiting for the full response.
+- Headless setup flags such as `--ca-provider`, `--ca-model`, `--ca-base-url`,
+  `--ca-api-key-env`, `--ca-ollama-option`, and `--ca-max-output-tokens`.
+- Claude Code Plan Mode support on router-backed non-Anthropic providers,
+  including local handling for `EnterPlanMode` and plan artifacts.
+- Optional `/advisor` slash command that routes the current task state to a
+  selected Advisor Model, useful for long-context review and next-step checks.
+- Claude Code `statusLine` integration showing router RPM usage and wait time
+  in the bottom status area instead of polluting the chat transcript.
+- Router-side RPM control for NVIDIA hosted, self-hosted NIM, Ollama, and
+  Ollama Cloud. `rate_limit_rpm=0` disables throttling while still showing the
+  last-60-seconds usage rate.
+- Soft pacing subtracts time already spent reading files, running commands, and
+  waiting for tool results. In real coding sessions, those tool-call gaps absorb
+  much of the RPM spacing naturally, so providers such as NVIDIA hosted NIM can
+  stay within free-model limits without making every Claude Code turn feel
+  rate-limited.
+- Streaming proxy for Ollama/Ollama Cloud router path — tokens are delivered
+  to Claude Code as they arrive instead of waiting for the full response.
 - Per-provider `stream` on/off toggle and `stream_word_chunking` option to
   batch text deltas at word boundaries, mitigating SSE fragmentation that can
   break tool-call / JSON parsing in long streamed responses.
@@ -292,58 +326,75 @@ steps under that larger model's supervision.
   including `WorktreeCreate` / `WorktreeRemove`, so non-git working directories
   no longer fail Agent isolation with
   `Cannot create agent worktree: not in a git repository...`.
-- Config file caching — settings are read from disk once and reused until the
-  file changes, reducing per-request overhead in the router.
-- Router control-plane endpoints for headless agent coordination:
-  `/ca/chat/messages`, `/ca/chat/wait`, `/ca/chat/stream`, `/ca/chat/files`,
-  and `/ca/plan/artifacts`.
+- Config file caching — settings are read from disk once and reused until the
+  file changes, reducing per-request overhead in the router.
+- Router control-plane endpoints for headless agent coordination:
+  `/ca/chat/messages`, `/ca/chat/wait`, `/ca/chat/stream`, `/ca/chat/files`,
+  and `/ca/plan/artifacts`.
 ## Changelog
-### 0.1.28
-- **Plan Mode + Advisor headline**: document Claude Any's Plan Mode support for
-  router-backed non-Anthropic providers and the `/advisor` slash command backed
-  by a selected long-context Advisor Model.
-- **Status-line RPM telemetry**: Claude Any installs a Claude Code `statusLine`
-  command that shows router RPM usage and the latest wait time in the bottom
-  status area, keeping rate-limit telemetry out of the chat transcript.
-- **Soft RPM pacing for free hosted models**: NVIDIA hosted, self-hosted NIM,
-  Ollama, and Ollama Cloud can use router-side RPM pacing. The pacing subtracts
-  time already spent in file reads, command execution, and tool-result waits, so
-  normal coding tool-call gaps naturally absorb much of the RPM spacing.
-- **Unlimited usage display**: `rate_limit_rpm=0` disables throttling while
-  still displaying the last-60-seconds request rate.
-### 0.1.27
+### 0.1.30
-- **Plan mode support for non-Anthropic providers**: the router now keeps
-  `EnterPlanMode` available and supports Claude Code Plan mode even when the
-  upstream model does not reliably choose that internal tool. Forced
-  `tool_choice=EnterPlanMode` is answered locally with a valid Anthropic
-  `tool_use`, and long implementation requests that receive only a short or
-  empty non-actionable text response are promoted to `EnterPlanMode` using
-  language-agnostic structure checks.
-- **Plan-mode self-tool handling**: unsupported Claude Code self-tools are
-  still stripped for non-Anthropic providers, but Plan-mode tools are handled
-  separately so planning can work instead of being disabled.
+- **Headless launch docs moved up**: README now shows copy-ready examples for
+  launching Claude Code directly with `--ca-provider`, `--ca-model`, `-p`, and
+  `CLAUDE_ANY_SKIP_MENU=1` immediately after install.
+- **NVIDIA hosted wording cleanup**: provider and lifecycle docs now describe
+  NVIDIA hosted as using the Claude Any local router, with no separate hosted
+  API Catalog proxy requirement.
-### 0.1.25
+### 0.1.29
-- **Plan-mode diagnostics**: set `~/.config/claude-any/log-level` to `TRACE`
-  to capture redacted request and response summaries in `requests.jsonl` /
-  `responses.jsonl`.
-- **Headless agent chat service**: the router exposes a small HTTP control
-  plane for sub coding agents. Agents can post messages, poll updates after
-  the last seen message id, or wait on an SSE stream when they do not have
-  their own loop.
-- **Plan artifact serving**: agents can create plan files through the router
-  and share stable local URLs, matching Claude Code's file/artifact-oriented
-  Plan-mode workflow without copying Anthropic's internal implementation.
-### 0.1.24
-- **First public npm release** under the correct scope: `@oneciel-ai/claude-any`. Earlier 0.1.x versions were never published to the registry; this is the version that is actually installable via `npm install -g @oneciel-ai/claude-any`.
+- **NVIDIA compatibility test fix**: `claude-any test` now restarts the local
+  router before router-mode tests, so upgraded installs do not accidentally use
+  an old long-running router that still expects `nvd-claude-proxy`.
+- **Clear NVIDIA router wording**: menu status now describes NVIDIA hosted as
+  using the local claude-any router instead of the retired local proxy path.
+### 0.1.28
+- **Plan Mode + Advisor headline**: document Claude Any's Plan Mode support for
+  router-backed non-Anthropic providers and the `/advisor` slash command backed
+  by a selected long-context Advisor Model.
+- **Status-line RPM telemetry**: Claude Any installs a Claude Code `statusLine`
+  command that shows router RPM usage and the latest wait time in the bottom
+  status area, keeping rate-limit telemetry out of the chat transcript.
+- **Soft RPM pacing for free hosted models**: NVIDIA hosted, self-hosted NIM,
+  Ollama, and Ollama Cloud can use router-side RPM pacing. The pacing subtracts
+  time already spent in file reads, command execution, and tool-result waits, so
+  normal coding tool-call gaps naturally absorb much of the RPM spacing.
+- **Unlimited usage display**: `rate_limit_rpm=0` disables throttling while
+  still displaying the last-60-seconds request rate.
+### 0.1.27
+- **Plan mode support for non-Anthropic providers**: the router now keeps
+  `EnterPlanMode` available and supports Claude Code Plan mode even when the
+  upstream model does not reliably choose that internal tool. Forced
+  `tool_choice=EnterPlanMode` is answered locally with a valid Anthropic
+  `tool_use`, and long implementation requests that receive only a short or
+  empty non-actionable text response are promoted to `EnterPlanMode` using
+  language-agnostic structure checks.
+- **Plan-mode self-tool handling**: unsupported Claude Code self-tools are
+  still stripped for non-Anthropic providers, but Plan-mode tools are handled
+  separately so planning can work instead of being disabled.
+### 0.1.25
+- **Plan-mode diagnostics**: set `~/.config/claude-any/log-level` to `TRACE`
+  to capture redacted request and response summaries in `requests.jsonl` /
+  `responses.jsonl`.
+- **Headless agent chat service**: the router exposes a small HTTP control
+  plane for sub coding agents. Agents can post messages, poll updates after
+  the last seen message id, or wait on an SSE stream when they do not have
+  their own loop.
+- **Plan artifact serving**: agents can create plan files through the router
+  and share stable local URLs, matching Claude Code's file/artifact-oriented
+  Plan-mode workflow without copying Anthropic's internal implementation.
+### 0.1.24
+- **First public npm release** under the correct scope: `@oneciel-ai/claude-any`. Earlier 0.1.x versions were never published to the registry; this is the version that is actually installable via `npm install -g @oneciel-ai/claude-any`.
 ### 0.1.23
@@ -458,7 +509,7 @@ steps under that larger model's supervision.
 | Ollama | Native when available, router otherwise | Local Ollama normally needs no API key. Cloud models through local Ollama require `ollama signin` on the Ollama host. |
 | Ollama Cloud | Router | Calls `https://ollama.com/api`; requires an Ollama API key. |
 | vLLM | Native Anthropic-compatible endpoint | Use a vLLM endpoint that exposes Anthropic-compatible `/v1/messages`; match `--tool-call-parser` to the model family. |
-| NVIDIA hosted | Router/proxy | Uses NVIDIA hosted API through the compatibility path. |
+| NVIDIA hosted | Router | Uses the NVIDIA hosted API Catalog through the Claude Any local router. |
 | self-hosted NIM | Native Anthropic-compatible endpoint | Use the self-hosted NIM Anthropic-compatible endpoint. |
 ## Service Lifecycle
@@ -466,17 +517,16 @@ steps under that larger model's supervision.
 Claude Any does not keep every possible backend helper running all the time. The
 normal lifecycle is:
-- Before launch, managed router/proxy processes can be stopped with
+- Before launch, managed router processes can be stopped with
   `claude-any stop`.
 - When `claude-any` starts Claude Code, it starts only the services required by
   the selected provider.
 - Ollama and Ollama Cloud router mode use the Claude Any router on
   `127.0.0.1:8799`.
-- NVIDIA hosted router mode uses the Claude Any router on `127.0.0.1:8799` and
-  starts `nvd-claude-proxy` on `127.0.0.1:8788` only when that provider needs it.
-- Switching away from NVIDIA hosted does not require keeping the NVIDIA proxy
-  alive; stale sessions should be cleaned with `claude-any stop` before a fresh
-  test or launch.
+- NVIDIA hosted router mode uses the Claude Any router on `127.0.0.1:8799`;
+  hosted API Catalog models do not require a separate NVIDIA proxy.
+- Run `claude-any stop` before a fresh provider-switch test if an old router is
+  still bound to the local port.
 This keeps Claude Code pointed at one stable Claude Any entry point while still
 letting provider-specific helpers start on demand.
@@ -504,7 +554,7 @@ vllm serve Qwen/Qwen3-Coder-30B-A3B-Instruct \
 ## Headless Examples
 Headless commands skip the pre-launch menu and launch Claude Code immediately.
-Claude Any consumes `--ca-*` setup flags, starts the required router/proxy
+Claude Any consumes `--ca-*` setup flags, starts the required router
 services, then passes the remaining arguments to Claude Code.
 ```sh
@@ -515,32 +565,32 @@ claude-any --ca-provider vllm --ca-base-url http://127.0.0.1:8000 --ca-model Qwe
 claude-any --ca-no-update-check -p "Reply with OK only." --output-format text
 ```
-All other arguments are passed through to Claude Code.
-## Headless Agent Chat
-When the claude-any router is running, sub agents can coordinate through local
-HTTP endpoints without opening the menu:
-```sh
-# Send a message to a channel.
-curl -s http://127.0.0.1:8799/ca/chat/messages \
-  -H 'content-type: application/json' \
-  -d '{"channel":"agents","sender_id":"codex","recipients":["kimi"],"message":"Need logs after id 42"}'
-# Poll updates after the last message id.
-curl -s 'http://127.0.0.1:8799/ca/chat/messages?channel=agents&recipient=kimi&after=42'
-# Wait on a stream until messages arrive.
-curl -N 'http://127.0.0.1:8799/ca/chat/stream?channel=agents&recipient=kimi&after=42&timeout=300'
-# Publish a plan file and get a served URL.
-curl -s http://127.0.0.1:8799/ca/plan/artifacts \
-  -H 'content-type: application/json' \
-  -d '{"title":"handoff","name":"handoff.md","content":"# Plan\n- step 1"}'
-```
-## Security
+All other arguments are passed through to Claude Code.
+## Headless Agent Chat
+When the claude-any router is running, sub agents can coordinate through local
+HTTP endpoints without opening the menu:
+```sh
+# Send a message to a channel.
+curl -s http://127.0.0.1:8799/ca/chat/messages \
+  -H 'content-type: application/json' \
+  -d '{"channel":"agents","sender_id":"codex","recipients":["kimi"],"message":"Need logs after id 42"}'
+# Poll updates after the last message id.
+curl -s 'http://127.0.0.1:8799/ca/chat/messages?channel=agents&recipient=kimi&after=42'
+# Wait on a stream until messages arrive.
+curl -N 'http://127.0.0.1:8799/ca/chat/stream?channel=agents&recipient=kimi&after=42&timeout=300'
+# Publish a plan file and get a served URL.
+curl -s http://127.0.0.1:8799/ca/plan/artifacts \
+  -H 'content-type: application/json' \
+  -d '{"title":"handoff","name":"handoff.md","content":"# Plan\n- step 1"}'
+```
+## Security
 Do not commit runtime configuration or API keys. Claude Any stores local runtime
 configuration under `~/.config/claude-any/`. NVIDIA hosted credentials used by

package/claude-any-menu.py CHANGED Viewed

@@ -656,7 +656,7 @@ def probe_base_url(provider: str, pcfg: dict) -> str:
     if "your-" in base:
         return f"Base URL: placeholder ({base})"
     if provider == "nvidia-hosted":
-        return f"Base URL: NVIDIA hosted ({base}); proxy starts on launch"
+        return f"Base URL: NVIDIA hosted ({base}); local router http://127.0.0.1:8799 starts on launch"
     path = "/api/tags" if provider in ("ollama", "ollama-cloud") else "/v1/models"
     url = join_url(base, path)
     headers = {}