@oneciel-ai/claude-any 0.1.28 → 0.1.29

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -1,9 +1,9 @@
1
- # Claude Any
2
-
3
- ![Claude Any: full Claude Code experience with free or low-cost LLMs](claude-any-adv.png)
4
-
5
- | English | [한국어](docs/README.ko.md) | [日本語](docs/README.ja.md) | [中文](docs/README.zh.md) |
6
- | --- | --- | --- | --- |
1
+ # Claude Any
2
+
3
+ ![Claude Any: full Claude Code experience with free or low-cost LLMs](claude-any-adv.png)
4
+
5
+ | English | [한국어](docs/README.ko.md) | [日本語](docs/README.ja.md) | [中文](docs/README.zh.md) |
6
+ | --- | --- | --- | --- |
7
7
 
8
8
  [![npm version](https://img.shields.io/npm/v/@oneciel-ai/claude-any?logo=npm&label=npm)](https://www.npmjs.com/package/@oneciel-ai/claude-any)
9
9
  [![npm downloads](https://img.shields.io/npm/dm/@oneciel-ai/claude-any?logo=npm&label=downloads)](https://www.npmjs.com/package/@oneciel-ai/claude-any)
@@ -11,23 +11,23 @@
11
11
 
12
12
  > ## 🚀 Use the full Claude Code experience with free or low-cost LLMs
13
13
  >
14
- > - **Free** — [NVIDIA hosted NIM](https://build.nvidia.com/) (qwen3-coder-480b, gpt-oss, and friends) through the API Catalog.
15
- > - **Low-cost** — [Ollama Cloud](https://ollama.com/cloud) for GLM, Qwen, DeepSeek, and other open-weight models at a fraction of frontier-model pricing.
16
- > - **Free + local** — [Ollama](https://ollama.com/) or [vLLM](https://github.com/vllm-project/vllm) on your own GPU, fully offline.
17
- > - **Plan Mode + Advisor ready** — Claude Any preserves Claude Code Plan Mode on non-Anthropic providers and adds an optional long-context Advisor model for review.
18
- > - **Smooth free-model pacing** — Claude Code spends time reading files and running tools, and Claude Any uses that natural gap for RPM pacing so NVIDIA hosted free models feel usable even with strict per-minute limits.
19
- >
20
- > Provider, model, base URL, API key, streaming behavior, and LLM options are all selected from a console menu **before** Claude Code starts. Claude Code itself runs untouched with all of its native tooling, slash commands, and workflows.
21
-
22
- ## Today's Top 3 Benefits
23
-
24
- 1. **Plan Mode works on non-Anthropic models** — Claude Any keeps Claude Code's Plan Mode usable even when the upstream provider is NVIDIA hosted, Ollama Cloud, local Ollama, vLLM, or NIM.
25
- 2. **Advisor review with a bigger model** — pick a long-context Advisor Model at launch, then use `/advisor` inside Claude Code to review the current task, blockers, and next concrete action.
26
- 3. **Free-model RPM limits feel smoother** — router-side RPM pacing uses the natural time spent reading files and running tools, so NVIDIA hosted free models can stay within per-minute limits with less visible waiting.
27
-
28
- ### Demo
29
-
30
- ![NVIDIA hosted NIM driving Claude Code (deepseek-4-flash)](docs/assets/claude-any-nvidia-nim.gif)
14
+ > - **Free** — [NVIDIA hosted NIM](https://build.nvidia.com/) (qwen3-coder-480b, gpt-oss, and friends) through the API Catalog.
15
+ > - **Low-cost** — [Ollama Cloud](https://ollama.com/cloud) for GLM, Qwen, DeepSeek, and other open-weight models at a fraction of frontier-model pricing.
16
+ > - **Free + local** — [Ollama](https://ollama.com/) or [vLLM](https://github.com/vllm-project/vllm) on your own GPU, fully offline.
17
+ > - **Plan Mode + Advisor ready** — Claude Any preserves Claude Code Plan Mode on non-Anthropic providers and adds an optional long-context Advisor model for review.
18
+ > - **Smooth free-model pacing** — Claude Code spends time reading files and running tools, and Claude Any uses that natural gap for RPM pacing so NVIDIA hosted free models feel usable even with strict per-minute limits.
19
+ >
20
+ > Provider, model, base URL, API key, streaming behavior, and LLM options are all selected from a console menu **before** Claude Code starts. Claude Code itself runs untouched with all of its native tooling, slash commands, and workflows.
21
+
22
+ ## Today's Top 3 Benefits
23
+
24
+ 1. **Plan Mode works on non-Anthropic models** — Claude Any keeps Claude Code's Plan Mode usable even when the upstream provider is NVIDIA hosted, Ollama Cloud, local Ollama, vLLM, or NIM.
25
+ 2. **Advisor review with a bigger model** — pick a long-context Advisor Model at launch, then use `/advisor` inside Claude Code to review the current task, blockers, and next concrete action.
26
+ 3. **Free-model RPM limits feel smoother** — router-side RPM pacing uses the natural time spent reading files and running tools, so NVIDIA hosted free models can stay within per-minute limits with less visible waiting.
27
+
28
+ ### Demo
29
+
30
+ ![NVIDIA hosted NIM driving Claude Code (deepseek-4-flash)](docs/assets/claude-any-nvidia-nim.gif)
31
31
 
32
32
  NVIDIA hosted NIM (deepseek-4-flash) driving Claude Code through the claude-any router.  [full mp4 ⤓](https://github.com/OneCielAI/claude-any/raw/main/demo/claude-any-nvidia-nim.mp4)
33
33
 
@@ -44,7 +44,7 @@ arguments through unchanged.
44
44
 
45
45
  Credits: One Ciel LLC
46
46
 
47
- Current version: `0.1.28`
47
+ Current version: `0.1.29`
48
48
 
49
49
  ## Why This Exists
50
50
 
@@ -84,23 +84,23 @@ Requirements:
84
84
 
85
85
  **Install from the npm registry (recommended):**
86
86
 
87
- ```sh
88
- npm install -g @oneciel-ai/claude-any
89
- ```
90
-
91
- ```sh
92
- claude-any
93
- ```
87
+ ```sh
88
+ npm install -g @oneciel-ai/claude-any
89
+ ```
90
+
91
+ ```sh
92
+ claude-any
93
+ ```
94
94
 
95
95
  **Upgrade:**
96
96
 
97
- ```sh
98
- npm update -g @oneciel-ai/claude-any
99
- ```
100
-
101
- ```sh
102
- claude-any version
103
- ```
97
+ ```sh
98
+ npm update -g @oneciel-ai/claude-any
99
+ ```
100
+
101
+ ```sh
102
+ claude-any version
103
+ ```
104
104
 
105
105
  **Uninstall:**
106
106
 
@@ -113,49 +113,49 @@ npm uninstall -g @oneciel-ai/claude-any
113
113
  Install directly from the GitHub repository (useful for testing unreleased
114
114
  commits between npm publishes):
115
115
 
116
- ```sh
117
- npm install -g https://github.com/OneCielAI/claude-any.git
118
- ```
119
-
120
- ```sh
121
- claude-any
122
- ```
116
+ ```sh
117
+ npm install -g https://github.com/OneCielAI/claude-any.git
118
+ ```
119
+
120
+ ```sh
121
+ claude-any
122
+ ```
123
123
 
124
124
  POSIX source install:
125
125
 
126
- ```sh
127
- git clone https://github.com/OneCielAI/claude-any.git
128
- ```
129
-
130
- ```sh
131
- cd claude-any
132
- ```
133
-
134
- ```sh
135
- ./install.sh
136
- ```
137
-
138
- ```sh
139
- claude-any
140
- ```
126
+ ```sh
127
+ git clone https://github.com/OneCielAI/claude-any.git
128
+ ```
129
+
130
+ ```sh
131
+ cd claude-any
132
+ ```
133
+
134
+ ```sh
135
+ ./install.sh
136
+ ```
137
+
138
+ ```sh
139
+ claude-any
140
+ ```
141
141
 
142
142
  Windows PowerShell source install:
143
143
 
144
- ```powershell
145
- git clone https://github.com/OneCielAI/claude-any.git
146
- ```
147
-
148
- ```powershell
149
- cd claude-any
150
- ```
151
-
152
- ```powershell
153
- .\install.ps1
154
- ```
155
-
156
- ```powershell
157
- claude-any
158
- ```
144
+ ```powershell
145
+ git clone https://github.com/OneCielAI/claude-any.git
146
+ ```
147
+
148
+ ```powershell
149
+ cd claude-any
150
+ ```
151
+
152
+ ```powershell
153
+ .\install.ps1
154
+ ```
155
+
156
+ ```powershell
157
+ claude-any
158
+ ```
159
159
 
160
160
  ### Releasing (maintainers)
161
161
 
@@ -263,24 +263,24 @@ steps under that larger model's supervision.
263
263
  - Native paths where providers expose Claude/Anthropic-compatible endpoints.
264
264
  - Router mode for providers that need request/response adaptation.
265
265
  - DuckDuckGo and fetch MCP wiring for non-native providers.
266
- - Headless setup flags such as `--ca-provider`, `--ca-model`, `--ca-base-url`,
267
- `--ca-api-key-env`, `--ca-ollama-option`, and `--ca-max-output-tokens`.
268
- - Claude Code Plan Mode support on router-backed non-Anthropic providers,
269
- including local handling for `EnterPlanMode` and plan artifacts.
270
- - Optional `/advisor` slash command that routes the current task state to a
271
- selected Advisor Model, useful for long-context review and next-step checks.
272
- - Claude Code `statusLine` integration showing router RPM usage and wait time
273
- in the bottom status area instead of polluting the chat transcript.
274
- - Router-side RPM control for NVIDIA hosted, self-hosted NIM, Ollama, and
275
- Ollama Cloud. `rate_limit_rpm=0` disables throttling while still showing the
276
- last-60-seconds usage rate.
277
- - Soft pacing subtracts time already spent reading files, running commands, and
278
- waiting for tool results. In real coding sessions, those tool-call gaps absorb
279
- much of the RPM spacing naturally, so providers such as NVIDIA hosted NIM can
280
- stay within free-model limits without making every Claude Code turn feel
281
- rate-limited.
282
- - Streaming proxy for Ollama/Ollama Cloud router path — tokens are delivered
283
- to Claude Code as they arrive instead of waiting for the full response.
266
+ - Headless setup flags such as `--ca-provider`, `--ca-model`, `--ca-base-url`,
267
+ `--ca-api-key-env`, `--ca-ollama-option`, and `--ca-max-output-tokens`.
268
+ - Claude Code Plan Mode support on router-backed non-Anthropic providers,
269
+ including local handling for `EnterPlanMode` and plan artifacts.
270
+ - Optional `/advisor` slash command that routes the current task state to a
271
+ selected Advisor Model, useful for long-context review and next-step checks.
272
+ - Claude Code `statusLine` integration showing router RPM usage and wait time
273
+ in the bottom status area instead of polluting the chat transcript.
274
+ - Router-side RPM control for NVIDIA hosted, self-hosted NIM, Ollama, and
275
+ Ollama Cloud. `rate_limit_rpm=0` disables throttling while still showing the
276
+ last-60-seconds usage rate.
277
+ - Soft pacing subtracts time already spent reading files, running commands, and
278
+ waiting for tool results. In real coding sessions, those tool-call gaps absorb
279
+ much of the RPM spacing naturally, so providers such as NVIDIA hosted NIM can
280
+ stay within free-model limits without making every Claude Code turn feel
281
+ rate-limited.
282
+ - Streaming proxy for Ollama/Ollama Cloud router path — tokens are delivered
283
+ to Claude Code as they arrive instead of waiting for the full response.
284
284
  - Per-provider `stream` on/off toggle and `stream_word_chunking` option to
285
285
  batch text deltas at word boundaries, mitigating SSE fragmentation that can
286
286
  break tool-call / JSON parsing in long streamed responses.
@@ -292,58 +292,66 @@ steps under that larger model's supervision.
292
292
  including `WorktreeCreate` / `WorktreeRemove`, so non-git working directories
293
293
  no longer fail Agent isolation with
294
294
  `Cannot create agent worktree: not in a git repository...`.
295
- - Config file caching — settings are read from disk once and reused until the
296
- file changes, reducing per-request overhead in the router.
297
- - Router control-plane endpoints for headless agent coordination:
298
- `/ca/chat/messages`, `/ca/chat/wait`, `/ca/chat/stream`, `/ca/chat/files`,
299
- and `/ca/plan/artifacts`.
295
+ - Config file caching — settings are read from disk once and reused until the
296
+ file changes, reducing per-request overhead in the router.
297
+ - Router control-plane endpoints for headless agent coordination:
298
+ `/ca/chat/messages`, `/ca/chat/wait`, `/ca/chat/stream`, `/ca/chat/files`,
299
+ and `/ca/plan/artifacts`.
300
300
 
301
301
  ## Changelog
302
302
 
303
- ### 0.1.28
304
-
305
- - **Plan Mode + Advisor headline**: document Claude Any's Plan Mode support for
306
- router-backed non-Anthropic providers and the `/advisor` slash command backed
307
- by a selected long-context Advisor Model.
308
- - **Status-line RPM telemetry**: Claude Any installs a Claude Code `statusLine`
309
- command that shows router RPM usage and the latest wait time in the bottom
310
- status area, keeping rate-limit telemetry out of the chat transcript.
311
- - **Soft RPM pacing for free hosted models**: NVIDIA hosted, self-hosted NIM,
312
- Ollama, and Ollama Cloud can use router-side RPM pacing. The pacing subtracts
313
- time already spent in file reads, command execution, and tool-result waits, so
314
- normal coding tool-call gaps naturally absorb much of the RPM spacing.
315
- - **Unlimited usage display**: `rate_limit_rpm=0` disables throttling while
316
- still displaying the last-60-seconds request rate.
317
-
318
- ### 0.1.27
319
-
320
- - **Plan mode support for non-Anthropic providers**: the router now keeps
321
- `EnterPlanMode` available and supports Claude Code Plan mode even when the
322
- upstream model does not reliably choose that internal tool. Forced
323
- `tool_choice=EnterPlanMode` is answered locally with a valid Anthropic
324
- `tool_use`, and long implementation requests that receive only a short or
325
- empty non-actionable text response are promoted to `EnterPlanMode` using
326
- language-agnostic structure checks.
327
- - **Plan-mode self-tool handling**: unsupported Claude Code self-tools are
328
- still stripped for non-Anthropic providers, but Plan-mode tools are handled
329
- separately so planning can work instead of being disabled.
303
+ ### 0.1.29
330
304
 
331
- ### 0.1.25
305
+ - **NVIDIA compatibility test fix**: `claude-any test` now restarts the local
306
+ router before router-mode tests, so upgraded installs do not accidentally use
307
+ an old long-running router that still expects `nvd-claude-proxy`.
308
+ - **Clear NVIDIA router wording**: menu status now describes NVIDIA hosted as
309
+ using the local claude-any router instead of the retired local proxy path.
332
310
 
333
- - **Plan-mode diagnostics**: set `~/.config/claude-any/log-level` to `TRACE`
334
- to capture redacted request and response summaries in `requests.jsonl` /
335
- `responses.jsonl`.
336
- - **Headless agent chat service**: the router exposes a small HTTP control
337
- plane for sub coding agents. Agents can post messages, poll updates after
338
- the last seen message id, or wait on an SSE stream when they do not have
339
- their own loop.
340
- - **Plan artifact serving**: agents can create plan files through the router
341
- and share stable local URLs, matching Claude Code's file/artifact-oriented
342
- Plan-mode workflow without copying Anthropic's internal implementation.
343
-
344
- ### 0.1.24
311
+ ### 0.1.28
345
312
 
346
- - **First public npm release** under the correct scope: `@oneciel-ai/claude-any`. Earlier 0.1.x versions were never published to the registry; this is the version that is actually installable via `npm install -g @oneciel-ai/claude-any`.
313
+ - **Plan Mode + Advisor headline**: document Claude Any's Plan Mode support for
314
+ router-backed non-Anthropic providers and the `/advisor` slash command backed
315
+ by a selected long-context Advisor Model.
316
+ - **Status-line RPM telemetry**: Claude Any installs a Claude Code `statusLine`
317
+ command that shows router RPM usage and the latest wait time in the bottom
318
+ status area, keeping rate-limit telemetry out of the chat transcript.
319
+ - **Soft RPM pacing for free hosted models**: NVIDIA hosted, self-hosted NIM,
320
+ Ollama, and Ollama Cloud can use router-side RPM pacing. The pacing subtracts
321
+ time already spent in file reads, command execution, and tool-result waits, so
322
+ normal coding tool-call gaps naturally absorb much of the RPM spacing.
323
+ - **Unlimited usage display**: `rate_limit_rpm=0` disables throttling while
324
+ still displaying the last-60-seconds request rate.
325
+
326
+ ### 0.1.27
327
+
328
+ - **Plan mode support for non-Anthropic providers**: the router now keeps
329
+ `EnterPlanMode` available and supports Claude Code Plan mode even when the
330
+ upstream model does not reliably choose that internal tool. Forced
331
+ `tool_choice=EnterPlanMode` is answered locally with a valid Anthropic
332
+ `tool_use`, and long implementation requests that receive only a short or
333
+ empty non-actionable text response are promoted to `EnterPlanMode` using
334
+ language-agnostic structure checks.
335
+ - **Plan-mode self-tool handling**: unsupported Claude Code self-tools are
336
+ still stripped for non-Anthropic providers, but Plan-mode tools are handled
337
+ separately so planning can work instead of being disabled.
338
+
339
+ ### 0.1.25
340
+
341
+ - **Plan-mode diagnostics**: set `~/.config/claude-any/log-level` to `TRACE`
342
+ to capture redacted request and response summaries in `requests.jsonl` /
343
+ `responses.jsonl`.
344
+ - **Headless agent chat service**: the router exposes a small HTTP control
345
+ plane for sub coding agents. Agents can post messages, poll updates after
346
+ the last seen message id, or wait on an SSE stream when they do not have
347
+ their own loop.
348
+ - **Plan artifact serving**: agents can create plan files through the router
349
+ and share stable local URLs, matching Claude Code's file/artifact-oriented
350
+ Plan-mode workflow without copying Anthropic's internal implementation.
351
+
352
+ ### 0.1.24
353
+
354
+ - **First public npm release** under the correct scope: `@oneciel-ai/claude-any`. Earlier 0.1.x versions were never published to the registry; this is the version that is actually installable via `npm install -g @oneciel-ai/claude-any`.
347
355
 
348
356
  ### 0.1.23
349
357
 
@@ -515,32 +523,32 @@ claude-any --ca-provider vllm --ca-base-url http://127.0.0.1:8000 --ca-model Qwe
515
523
  claude-any --ca-no-update-check -p "Reply with OK only." --output-format text
516
524
  ```
517
525
 
518
- All other arguments are passed through to Claude Code.
519
-
520
- ## Headless Agent Chat
521
-
522
- When the claude-any router is running, sub agents can coordinate through local
523
- HTTP endpoints without opening the menu:
524
-
525
- ```sh
526
- # Send a message to a channel.
527
- curl -s http://127.0.0.1:8799/ca/chat/messages \
528
- -H 'content-type: application/json' \
529
- -d '{"channel":"agents","sender_id":"codex","recipients":["kimi"],"message":"Need logs after id 42"}'
530
-
531
- # Poll updates after the last message id.
532
- curl -s 'http://127.0.0.1:8799/ca/chat/messages?channel=agents&recipient=kimi&after=42'
533
-
534
- # Wait on a stream until messages arrive.
535
- curl -N 'http://127.0.0.1:8799/ca/chat/stream?channel=agents&recipient=kimi&after=42&timeout=300'
536
-
537
- # Publish a plan file and get a served URL.
538
- curl -s http://127.0.0.1:8799/ca/plan/artifacts \
539
- -H 'content-type: application/json' \
540
- -d '{"title":"handoff","name":"handoff.md","content":"# Plan\n- step 1"}'
541
- ```
542
-
543
- ## Security
526
+ All other arguments are passed through to Claude Code.
527
+
528
+ ## Headless Agent Chat
529
+
530
+ When the claude-any router is running, sub agents can coordinate through local
531
+ HTTP endpoints without opening the menu:
532
+
533
+ ```sh
534
+ # Send a message to a channel.
535
+ curl -s http://127.0.0.1:8799/ca/chat/messages \
536
+ -H 'content-type: application/json' \
537
+ -d '{"channel":"agents","sender_id":"codex","recipients":["kimi"],"message":"Need logs after id 42"}'
538
+
539
+ # Poll updates after the last message id.
540
+ curl -s 'http://127.0.0.1:8799/ca/chat/messages?channel=agents&recipient=kimi&after=42'
541
+
542
+ # Wait on a stream until messages arrive.
543
+ curl -N 'http://127.0.0.1:8799/ca/chat/stream?channel=agents&recipient=kimi&after=42&timeout=300'
544
+
545
+ # Publish a plan file and get a served URL.
546
+ curl -s http://127.0.0.1:8799/ca/plan/artifacts \
547
+ -H 'content-type: application/json' \
548
+ -d '{"title":"handoff","name":"handoff.md","content":"# Plan\n- step 1"}'
549
+ ```
550
+
551
+ ## Security
544
552
 
545
553
  Do not commit runtime configuration or API keys. Claude Any stores local runtime
546
554
  configuration under `~/.config/claude-any/`. NVIDIA hosted credentials used by
@@ -656,7 +656,7 @@ def probe_base_url(provider: str, pcfg: dict) -> str:
656
656
  if "your-" in base:
657
657
  return f"Base URL: placeholder ({base})"
658
658
  if provider == "nvidia-hosted":
659
- return f"Base URL: NVIDIA hosted ({base}); proxy starts on launch"
659
+ return f"Base URL: NVIDIA hosted ({base}); local router http://127.0.0.1:8799 starts on launch"
660
660
  path = "/api/tags" if provider in ("ollama", "ollama-cloud") else "/v1/models"
661
661
  url = join_url(base, path)
662
662
  headers = {}