@oneciel-ai/claude-any 0.1.64 → 0.1.66

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -5,122 +5,130 @@
5
5
  </p>
6
6
 
7
7
  ![Claude Any: full Claude Code experience with free or low-cost LLMs](claude-any-adv.png)
8
-
9
- | English | [한국어](docs/README.ko.md) | [日本語](docs/README.ja.md) | [中文](docs/README.zh.md) |
10
- | --- | --- | --- | --- |
11
-
12
- [![npm version](https://img.shields.io/npm/v/@oneciel-ai/claude-any?logo=npm&label=npm)](https://www.npmjs.com/package/@oneciel-ai/claude-any)
13
- [![npm downloads](https://img.shields.io/npm/dm/@oneciel-ai/claude-any?logo=npm&label=downloads)](https://www.npmjs.com/package/@oneciel-ai/claude-any)
14
- [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
15
-
16
- > ## 🚀 Use the full Claude Code experience with free or low-cost LLMs
17
- >
18
- > - **Free** — [NVIDIA hosted NIM](https://build.nvidia.com/) (qwen3-coder-480b, gpt-oss, and friends) through the API Catalog.
19
- > - **Low-cost** — [Ollama Cloud](https://ollama.com/cloud) for GLM, Qwen, DeepSeek, and other open-weight models at a fraction of frontier-model pricing.
20
- > - **Free + local** — [Ollama](https://ollama.com/) or [vLLM](https://github.com/vllm-project/vllm) on your own GPU, fully offline.
21
- > - **Plan Mode + Advisor ready** — Claude Any preserves Claude Code Plan Mode on non-Anthropic providers and adds an optional long-context Advisor model for review.
22
- > - **Smooth free-model pacing** — Claude Code spends time reading files and running tools, and Claude Any uses that natural gap for RPM pacing so NVIDIA hosted free models feel usable even with strict per-minute limits.
23
- >
24
- > Provider, model, base URL, API key, streaming behavior, and LLM options are all selected from a console menu **before** Claude Code starts. Claude Code itself runs untouched with all of its native tooling, slash commands, and workflows.
25
-
26
- ## Today's Top 3 Benefits
27
-
28
- 1. **Plan Mode works on non-Anthropic models** — Claude Any keeps Claude Code's Plan Mode usable even when the upstream provider is NVIDIA hosted, Ollama Cloud, local Ollama, vLLM, or NIM.
29
- 2. **Advisor review with a bigger model** — pick a long-context Advisor Model at launch, then use `/advisor` inside Claude Code to review the current task, blockers, and next concrete action.
30
- 3. **Free-model RPM limits feel smoother** — router-side RPM pacing uses the natural time spent reading files and running tools, so NVIDIA hosted free models can stay within per-minute limits with less visible waiting.
31
-
32
- ### Demo
33
-
34
- ![NVIDIA hosted NIM driving Claude Code (deepseek-4-flash)](docs/assets/claude-any-nvidia-nim.gif)
35
-
36
- NVIDIA hosted NIM (deepseek-4-flash) driving Claude Code through the claude-any router. &nbsp;[full mp4 ⤓](https://github.com/OneCielAI/claude-any/raw/main/demo/claude-any-nvidia-nim.mp4)
37
-
38
- ![Ollama Cloud streamed through the claude-any router (glm-5.1)](docs/assets/claude-any-ollama-cloud.gif)
39
-
40
- Ollama Cloud (glm-5.1) streamed through the claude-any router with SSE word-boundary chunking enabled. &nbsp;[full mp4 ⤓](https://github.com/OneCielAI/claude-any/raw/main/demo/claude-any-ollama-cloud.mp4)
41
-
42
- ---
43
-
44
- Claude Any is a provider selector and compatibility launcher for Claude Code.
45
- It lets you choose Anthropic, Ollama, Ollama Cloud, vLLM, NVIDIA hosted models,
46
- or self-hosted NIM before Claude Code starts, then passes normal Claude Code
47
- arguments through unchanged.
48
-
49
- Credits: One Ciel LLC
50
-
51
- Current version: `0.1.64`
52
-
53
- ## Why This Exists
54
-
55
- Claude Any started from a practical need: even on the highest Claude Code plan,
56
- long sessions can run out of available tokens or become blocked while waiting
57
- for the next quota window. The goal is not to replace Claude Code, but to keep
58
- work moving. Slower but usable providers such as NVIDIA NIM, Ollama Cloud,
59
- vLLM, and local Ollama can act as hybrid third-party agents for summaries,
60
- research, journaling, simple coding tasks, and delegated background work.
61
-
62
- Another design goal is to keep as much of Claude Code's native experience as
63
- possible. When a provider exposes an Anthropic-compatible endpoint, Claude Any
64
- prefers that path so Claude Code tooling, permissions, model selection, and
65
- workflow behavior remain close to the original. For capabilities that remote
66
- providers cannot supply directly, such as web search, Claude Any adds separate
67
- MCP-based tooling.
68
-
69
- The pre-launch menu is console-first. Provider, model, base URL, API key, and
70
- options are meant to be easy to review and change before Claude Code starts,
71
- including over SSH.
72
-
73
- macOS has not been fully tested by the maintainer yet, but Claude Any uses
74
- portable Python and shell wrappers. If you hit a macOS issue, please report it.
75
-
76
- - D. Yun
77
-
78
- ## Install
79
-
80
- [![npm version](https://img.shields.io/npm/v/@oneciel-ai/claude-any.svg)](https://www.npmjs.com/package/@oneciel-ai/claude-any)
81
- [![npm downloads](https://img.shields.io/npm/dm/@oneciel-ai/claude-any.svg)](https://www.npmjs.com/package/@oneciel-ai/claude-any)
82
-
83
- Requirements:
84
-
85
- - Python 3.10+
86
- - Claude Code installed and available as `claude`
87
- - Node/npm (used for the install shim and optional MCP web tooling)
88
-
89
- **Install from the npm registry (recommended):**
90
-
91
- ```sh
92
- npm install -g @oneciel-ai/claude-any
93
- ```
94
-
95
- ```sh
96
- claude-any
97
- ```
98
-
99
- ## Run Claude Code Headlessly
100
-
101
- Use headless mode when a script, SSH session, CI job, or parent agent needs to
102
- launch Claude Code directly without opening the pre-launch menu. `claude-any`
103
- consumes `--ca-*` options, starts the required local router, and passes the
104
- remaining arguments to Claude Code.
105
-
106
- ```sh
107
- claude-any --ca-provider nvidia-hosted --ca-model z-ai/glm-4.7
108
- ```
109
-
110
- ```sh
111
- claude-any --ca-provider ollama-cloud --ca-model glm-5.1
112
- ```
113
-
114
- ```sh
115
- claude-any --ca-provider ollama --ca-base-url http://127.0.0.1:11434 --ca-model qwen3-coder
116
- ```
117
-
118
- Run one non-interactive Claude Code prompt:
119
-
120
- ```sh
121
- claude-any --ca-provider nvidia-hosted --ca-model z-ai/glm-4.7 --ca-no-update-check -p "Reply with OK only." --output-format text
122
- ```
123
-
8
+
9
+ | English | [한국어](docs/README.ko.md) | [日本語](docs/README.ja.md) | [中文](docs/README.zh.md) |
10
+ | --- | --- | --- | --- |
11
+
12
+ [![npm version](https://img.shields.io/npm/v/@oneciel-ai/claude-any?logo=npm&label=npm)](https://www.npmjs.com/package/@oneciel-ai/claude-any)
13
+ [![npm downloads](https://img.shields.io/npm/dm/@oneciel-ai/claude-any?logo=npm&label=downloads)](https://www.npmjs.com/package/@oneciel-ai/claude-any)
14
+ [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
15
+
16
+ > ## 🚀 Use the full Claude Code experience with free or low-cost LLMs
17
+ >
18
+ > - **Free** — [NVIDIA hosted NIM](https://build.nvidia.com/) (qwen3-coder-480b, gpt-oss, and friends) through the API Catalog.
19
+ > - **Low-cost** — [Ollama Cloud](https://ollama.com/cloud) for GLM, Qwen, DeepSeek, and other open-weight models at a fraction of frontier-model pricing.
20
+ > - **Free + local** — [Ollama](https://ollama.com/) or [vLLM](https://github.com/vllm-project/vllm) on your own GPU, fully offline.
21
+ > - **Plan Mode + Advisor ready** — Claude Any preserves Claude Code Plan Mode on non-Anthropic providers and adds an optional long-context Advisor model for review.
22
+ > - **Smooth free-model pacing** — Claude Code spends time reading files and running tools, and Claude Any uses that natural gap for RPM pacing so NVIDIA hosted free models feel usable even with strict per-minute limits.
23
+ >
24
+ > Provider, model, base URL, API key, streaming behavior, and LLM options are all selected from a console menu **before** Claude Code starts. Claude Code itself runs untouched with all of its native tooling, slash commands, and workflows.
25
+
26
+ ## Today's Top 3 Benefits
27
+
28
+ ### 2026-05-14
29
+
30
+ 1. **Plan Mode loop recovery is semantic, not hard-coded** — unchanged `Read` results are now converted with the previous authoritative observation and current Plan Mode state, so Claude Code can move to `ExitPlanMode` or the next real step instead of rereading the same slice.
31
+ 2. **Remote test router mode is easier to expose** — set `CLAUDE_ANY_ROUTER_BIND_HOST=0.0.0.0` when you intentionally want to test the router from another machine, while Claude Code still talks to the safe local client base.
32
+ 3. **Cleaner router transcripts for third-party models** — attachment-only metadata, historical no-op tool results, and orphan tool results are normalized before reaching Ollama, Ollama Cloud, NVIDIA hosted, vLLM, or NIM.
33
+
34
+ ### 2026-05-13
35
+
36
+ 1. **Plan Mode works on non-Anthropic models** Claude Any keeps Claude Code's Plan Mode usable even when the upstream provider is NVIDIA hosted, Ollama Cloud, local Ollama, vLLM, or NIM.
37
+ 2. **Advisor review with a bigger model** — pick a long-context Advisor Model at launch, then use `/advisor` inside Claude Code to review the current task, blockers, and next concrete action.
38
+ 3. **Free-model RPM limits feel smoother** — router-side RPM pacing uses the natural time spent reading files and running tools, so NVIDIA hosted free models can stay within per-minute limits with less visible waiting.
39
+
40
+ ### Demo
41
+
42
+ ![NVIDIA hosted NIM driving Claude Code (deepseek-4-flash)](docs/assets/claude-any-nvidia-nim.gif)
43
+
44
+ NVIDIA hosted NIM (deepseek-4-flash) driving Claude Code through the claude-any router. &nbsp;[full mp4 ⤓](https://github.com/OneCielAI/claude-any/raw/main/demo/claude-any-nvidia-nim.mp4)
45
+
46
+ ![Ollama Cloud streamed through the claude-any router (glm-5.1)](docs/assets/claude-any-ollama-cloud.gif)
47
+
48
+ Ollama Cloud (glm-5.1) streamed through the claude-any router with SSE word-boundary chunking enabled. &nbsp;[full mp4 ⤓](https://github.com/OneCielAI/claude-any/raw/main/demo/claude-any-ollama-cloud.mp4)
49
+
50
+ ---
51
+
52
+ Claude Any is a provider selector and compatibility launcher for Claude Code.
53
+ It lets you choose Anthropic, Ollama, Ollama Cloud, vLLM, NVIDIA hosted models,
54
+ or self-hosted NIM before Claude Code starts, then passes normal Claude Code
55
+ arguments through unchanged.
56
+
57
+ Credits: One Ciel LLC
58
+
59
+ Current version: `0.1.66`
60
+
61
+ ## Why This Exists
62
+
63
+ Claude Any started from a practical need: even on the highest Claude Code plan,
64
+ long sessions can run out of available tokens or become blocked while waiting
65
+ for the next quota window. The goal is not to replace Claude Code, but to keep
66
+ work moving. Slower but usable providers such as NVIDIA NIM, Ollama Cloud,
67
+ vLLM, and local Ollama can act as hybrid third-party agents for summaries,
68
+ research, journaling, simple coding tasks, and delegated background work.
69
+
70
+ Another design goal is to keep as much of Claude Code's native experience as
71
+ possible. When a provider exposes an Anthropic-compatible endpoint, Claude Any
72
+ prefers that path so Claude Code tooling, permissions, model selection, and
73
+ workflow behavior remain close to the original. For capabilities that remote
74
+ providers cannot supply directly, such as web search, Claude Any adds separate
75
+ MCP-based tooling.
76
+
77
+ The pre-launch menu is console-first. Provider, model, base URL, API key, and
78
+ options are meant to be easy to review and change before Claude Code starts,
79
+ including over SSH.
80
+
81
+ macOS has not been fully tested by the maintainer yet, but Claude Any uses
82
+ portable Python and shell wrappers. If you hit a macOS issue, please report it.
83
+
84
+ - D. Yun
85
+
86
+ ## Install
87
+
88
+ [![npm version](https://img.shields.io/npm/v/@oneciel-ai/claude-any.svg)](https://www.npmjs.com/package/@oneciel-ai/claude-any)
89
+ [![npm downloads](https://img.shields.io/npm/dm/@oneciel-ai/claude-any.svg)](https://www.npmjs.com/package/@oneciel-ai/claude-any)
90
+
91
+ Requirements:
92
+
93
+ - Python 3.10+
94
+ - Claude Code installed and available as `claude`
95
+ - Node/npm (used for the install shim and optional MCP web tooling)
96
+
97
+ **Install from the npm registry (recommended):**
98
+
99
+ ```sh
100
+ npm install -g @oneciel-ai/claude-any
101
+ ```
102
+
103
+ ```sh
104
+ claude-any
105
+ ```
106
+
107
+ ## Run Claude Code Headlessly
108
+
109
+ Use headless mode when a script, SSH session, CI job, or parent agent needs to
110
+ launch Claude Code directly without opening the pre-launch menu. `claude-any`
111
+ consumes `--ca-*` options, starts the required local router, and passes the
112
+ remaining arguments to Claude Code.
113
+
114
+ ```sh
115
+ claude-any --ca-provider nvidia-hosted --ca-model z-ai/glm-4.7
116
+ ```
117
+
118
+ ```sh
119
+ claude-any --ca-provider ollama-cloud --ca-model glm-5.1
120
+ ```
121
+
122
+ ```sh
123
+ claude-any --ca-provider ollama --ca-base-url http://127.0.0.1:11434 --ca-model qwen3-coder
124
+ ```
125
+
126
+ Run one non-interactive Claude Code prompt:
127
+
128
+ ```sh
129
+ claude-any --ca-provider nvidia-hosted --ca-model z-ai/glm-4.7 --ca-no-update-check -p "Reply with OK only." --output-format text
130
+ ```
131
+
124
132
  Use the saved provider/model and only skip the menu:
125
133
 
126
134
  ```sh
@@ -130,7 +138,7 @@ CLAUDE_ANY_SKIP_MENU=1 claude-any -p "Summarize this repository." --output-forma
130
138
  Configure every launch option with flags:
131
139
 
132
140
  ```sh
133
- claude-any --ca-provider nvidia-hosted --ca-base-url https://integrate.api.nvidia.com/v1 --ca-model z-ai/glm-4.7 --ca-advisor-model deepseek-ai/deepseek-v4-pro --ca-api-key-env NVIDIA_API_KEY --ca-max-output-tokens 4096 --ca-context-window 65536 --ca-request-timeout-ms 300000 --ca-rate-limit-rpm 40 --ca-rate-limit-status on --ca-no-update-check -p "Reply with OK only." --output-format text
141
+ claude-any --ca-provider nvidia-hosted --ca-base-url https://integrate.api.nvidia.com/v1 --ca-model z-ai/glm-4.7 --ca-advisor-model deepseek-ai/deepseek-v4-pro --ca-api-key-env NVIDIA_API_KEY --ca-max-output-tokens 4096 --ca-context-window 65536 --ca-request-timeout-ms 120000 --ca-rate-limit-rpm 40 --ca-rate-limit-status on --ca-no-update-check -p "Reply with OK only." --output-format text
134
142
  ```
135
143
 
136
144
  Or put the same values in environment variables:
@@ -144,7 +152,7 @@ export CLAUDE_ANY_ADVISOR_MODEL=deepseek-ai/deepseek-v4-pro
144
152
  export CLAUDE_ANY_API_KEY_ENV=NVIDIA_API_KEY
145
153
  export CLAUDE_ANY_MAX_OUTPUT_TOKENS=4096
146
154
  export CLAUDE_ANY_CONTEXT_WINDOW=65536
147
- export CLAUDE_ANY_REQUEST_TIMEOUT_MS=300000
155
+ export CLAUDE_ANY_REQUEST_TIMEOUT_MS=120000
148
156
  export CLAUDE_ANY_RATE_LIMIT_RPM=40
149
157
  export CLAUDE_ANY_RATE_LIMIT_STATUS=on
150
158
  claude-any -p "Reply with OK only." --output-format text
@@ -172,172 +180,172 @@ scripts because the secret does not appear in shell history.
172
180
 
173
181
  More examples are in [Headless Examples](#headless-examples) and
174
182
  [the full manual](docs/manual.md#headless-usage).
175
-
176
- **Upgrade:**
177
-
178
- ```sh
179
- npm update -g @oneciel-ai/claude-any
180
- ```
181
-
182
- ```sh
183
- claude-any version
184
- ```
185
-
186
- **Uninstall:**
187
-
188
- ```sh
189
- npm uninstall -g @oneciel-ai/claude-any
190
- ```
191
-
192
- ### Alternative install paths
193
-
194
- Install directly from the GitHub repository (useful for testing unreleased
195
- commits between npm publishes):
196
-
197
- ```sh
198
- npm install -g https://github.com/OneCielAI/claude-any.git
199
- ```
200
-
201
- ```sh
202
- claude-any
203
- ```
204
-
205
- POSIX source install:
206
-
207
- ```sh
208
- git clone https://github.com/OneCielAI/claude-any.git
209
- ```
210
-
211
- ```sh
212
- cd claude-any
213
- ```
214
-
215
- ```sh
216
- ./install.sh
217
- ```
218
-
219
- ```sh
220
- claude-any
221
- ```
222
-
223
- Windows PowerShell source install:
224
-
225
- ```powershell
226
- git clone https://github.com/OneCielAI/claude-any.git
227
- ```
228
-
229
- ```powershell
230
- cd claude-any
231
- ```
232
-
233
- ```powershell
234
- .\install.ps1
235
- ```
236
-
237
- ```powershell
238
- claude-any
239
- ```
240
-
241
- ### Releasing (maintainers)
242
-
243
- The npm registry version is published automatically by the
244
- [`Publish to npm`](.github/workflows/npm-publish.yml) workflow when a new
245
- GitHub Release is published. The workflow needs an `NPM_TOKEN` repository
246
- secret containing a granular access token for `@oneciel-ai/claude-any` with
247
- *Bypass 2FA for publishing* enabled.
248
-
249
- Release flow:
250
-
251
- 1. Bump `version` in `package.json` and `VERSION` in `claude_any.py`.
252
- 2. Add a Changelog entry.
253
- 3. `git tag -a vX.Y.Z -m "..." && git push origin vX.Y.Z`.
254
- 4. `gh release create vX.Y.Z --title "..." --notes "..."` — this triggers
255
- the publish workflow.
256
-
257
-
258
- ![Claude Any menu](docs/assets/claude-any-main.en.png)
259
-
260
- ## Demo
261
-
262
- ![Claude Any demo](docs/assets/claude-any-demo.en.gif)
263
-
264
- The demo sequence now shows provider selection, Base URL entry, model selection,
265
- LLM options, and the compatibility test. The compatibility test checks a plain
266
- text response, a required `tool_use`, and a `tool_result` follow-up before the
267
- launcher recommends starting Claude Code.
268
-
269
- Additional current screenshots:
270
-
271
- | Provider | Base URL | Model | LLM options | Compatibility |
272
- | --- | --- | --- | --- | --- |
273
- | ![Provider](docs/assets/claude-any-provider.en.png) | ![Base URL](docs/assets/claude-any-base-url.en.png) | ![Model](docs/assets/claude-any-model.en.png) | ![Options](docs/assets/claude-any-options.en.png) | ![Test](docs/assets/claude-any-test.en.png) |
274
-
275
- See the [full manual](docs/manual.md) for provider setup, headless flags, and
276
- troubleshooting. A downloadable demo video is available at
277
- [docs/assets/claude-any-demo.mp4](docs/assets/claude-any-demo.mp4).
278
-
279
- ## Development Story
280
-
281
- Claude Any was built through real integration tests: provider switching, model
282
- discovery, API-key entry, compatibility tests, web-search tooling, timeout
283
- handling, and native Claude Code behavior. The main lesson was that
284
- Anthropic-compatible Messages endpoints are the cleanest integration path when a
285
- provider supports them. Ollama, vLLM, and NIM can expose Anthropic-compatible
286
- routes that preserve more of Claude Code's tooling model than a generic
287
- OpenAI-compatible chat route.
288
-
289
- Local inference was also tested with Qwen 3.6 27B Q4 through Ollama and vLLM on
290
- RTX 5090 and MSI GB10-class hardware. It worked, but the speed should not be
291
- judged against native Claude Code or Codex. In practice, some hosted/cloud
292
- choices such as NVIDIA NIM and Ollama Cloud felt more useful for this hybrid
293
- workflow than expected.
294
-
295
- OpenAI-compatible endpoints were deliberately kept out of the primary path for
296
- Claude Code use. In testing, tool-call translation through generic OpenAI chat
297
- compatibility was more brittle around tool parameters, tool results, repeated
298
- calls, retries, and model selection.
299
-
300
- The most recent vLLM finding is that server-side tool-call parsing must match
301
- the model family. For Claude Code, a vLLM server can be reachable and still fail
302
- if `--tool-call-parser` is wrong. In particular, Qwen3-Coder should be served
303
- with `--enable-auto-tool-choice --tool-call-parser qwen3_xml`; Hermes is for
304
- Hermes-style models and some older Qwen tool templates. Claude Any now surfaces
305
- this in the compatibility test instead of treating a simple text response as
306
- enough.
307
-
308
- ## Recommended Uses
309
-
310
- Claude Any is most useful where speed is less important than keeping background
311
- work moving. Good fits include Docker host maintenance, Windows or Linux system
312
- administration, cleanup scripts for unused files, periodic security checklists,
313
- log review, Windows Event Log review, intrusion-attempt triage, and report
314
- drafting.
315
-
316
- It is not a replacement for dedicated security products, but it can help
317
- administrators turn routine checks into repeatable scripts and readable reports.
318
- It is useful for summarizing possible virus, ransomware, brute-force, or
319
- remote-access intrusion attempts. In that sense, Claude Any can help you build a
320
- free or low-cost system security watcher for routine checks, alerts, and
321
- human-readable summaries.
322
-
323
- For example, it can help turn requests such as "install PostgreSQL in a Docker
324
- container" or "analyze today's Docker logs and email me a report" into concrete
325
- commands, scripts, scheduled jobs, and summaries.
326
-
327
- A practical pattern is tiered supervision: use smaller or cheaper models to
328
- watch logs and detect possible issues, use a larger model to review findings,
329
- write policy, and plan the response, then let smaller models execute routine
330
- steps under that larger model's supervision.
331
-
332
- ## Features
333
-
334
- - Pre-launch provider picker with English, Korean, Japanese, and Chinese UI.
335
- - Provider-aware model list and custom model entry.
336
- - API key entry outside the Claude Code chat input.
337
- - LLM option presets for context window, output tokens, timeout, sampling, and
338
- native compatibility.
339
- - Compatibility test before launch, including text response, tool use, and
340
- tool-result round trip checks.
183
+
184
+ **Upgrade:**
185
+
186
+ ```sh
187
+ npm update -g @oneciel-ai/claude-any
188
+ ```
189
+
190
+ ```sh
191
+ claude-any version
192
+ ```
193
+
194
+ **Uninstall:**
195
+
196
+ ```sh
197
+ npm uninstall -g @oneciel-ai/claude-any
198
+ ```
199
+
200
+ ### Alternative install paths
201
+
202
+ Install directly from the GitHub repository (useful for testing unreleased
203
+ commits between npm publishes):
204
+
205
+ ```sh
206
+ npm install -g https://github.com/OneCielAI/claude-any.git
207
+ ```
208
+
209
+ ```sh
210
+ claude-any
211
+ ```
212
+
213
+ POSIX source install:
214
+
215
+ ```sh
216
+ git clone https://github.com/OneCielAI/claude-any.git
217
+ ```
218
+
219
+ ```sh
220
+ cd claude-any
221
+ ```
222
+
223
+ ```sh
224
+ ./install.sh
225
+ ```
226
+
227
+ ```sh
228
+ claude-any
229
+ ```
230
+
231
+ Windows PowerShell source install:
232
+
233
+ ```powershell
234
+ git clone https://github.com/OneCielAI/claude-any.git
235
+ ```
236
+
237
+ ```powershell
238
+ cd claude-any
239
+ ```
240
+
241
+ ```powershell
242
+ .\install.ps1
243
+ ```
244
+
245
+ ```powershell
246
+ claude-any
247
+ ```
248
+
249
+ ### Releasing (maintainers)
250
+
251
+ The npm registry version is published automatically by the
252
+ [`Publish to npm`](.github/workflows/npm-publish.yml) workflow when a new
253
+ GitHub Release is published. The workflow needs an `NPM_TOKEN` repository
254
+ secret containing a granular access token for `@oneciel-ai/claude-any` with
255
+ *Bypass 2FA for publishing* enabled.
256
+
257
+ Release flow:
258
+
259
+ 1. Bump `version` in `package.json` and `VERSION` in `claude_any.py`.
260
+ 2. Add a Changelog entry.
261
+ 3. `git tag -a vX.Y.Z -m "..." && git push origin vX.Y.Z`.
262
+ 4. `gh release create vX.Y.Z --title "..." --notes "..."` — this triggers
263
+ the publish workflow.
264
+
265
+
266
+ ![Claude Any menu](docs/assets/claude-any-main.en.png)
267
+
268
+ ## Demo
269
+
270
+ ![Claude Any demo](docs/assets/claude-any-demo.en.gif)
271
+
272
+ The demo sequence now shows provider selection, Base URL entry, model selection,
273
+ LLM options, and the compatibility test. The compatibility test checks a plain
274
+ text response, a required `tool_use`, and a `tool_result` follow-up before the
275
+ launcher recommends starting Claude Code.
276
+
277
+ Additional current screenshots:
278
+
279
+ | Provider | Base URL | Model | LLM options | Compatibility |
280
+ | --- | --- | --- | --- | --- |
281
+ | ![Provider](docs/assets/claude-any-provider.en.png) | ![Base URL](docs/assets/claude-any-base-url.en.png) | ![Model](docs/assets/claude-any-model.en.png) | ![Options](docs/assets/claude-any-options.en.png) | ![Test](docs/assets/claude-any-test.en.png) |
282
+
283
+ See the [full manual](docs/manual.md) for provider setup, headless flags, and
284
+ troubleshooting. A downloadable demo video is available at
285
+ [docs/assets/claude-any-demo.mp4](docs/assets/claude-any-demo.mp4).
286
+
287
+ ## Development Story
288
+
289
+ Claude Any was built through real integration tests: provider switching, model
290
+ discovery, API-key entry, compatibility tests, web-search tooling, timeout
291
+ handling, and native Claude Code behavior. The main lesson was that
292
+ Anthropic-compatible Messages endpoints are the cleanest integration path when a
293
+ provider supports them. Ollama, vLLM, and NIM can expose Anthropic-compatible
294
+ routes that preserve more of Claude Code's tooling model than a generic
295
+ OpenAI-compatible chat route.
296
+
297
+ Local inference was also tested with Qwen 3.6 27B Q4 through Ollama and vLLM on
298
+ RTX 5090 and MSI GB10-class hardware. It worked, but the speed should not be
299
+ judged against native Claude Code or Codex. In practice, some hosted/cloud
300
+ choices such as NVIDIA NIM and Ollama Cloud felt more useful for this hybrid
301
+ workflow than expected.
302
+
303
+ OpenAI-compatible endpoints were deliberately kept out of the primary path for
304
+ Claude Code use. In testing, tool-call translation through generic OpenAI chat
305
+ compatibility was more brittle around tool parameters, tool results, repeated
306
+ calls, retries, and model selection.
307
+
308
+ The most recent vLLM finding is that server-side tool-call parsing must match
309
+ the model family. For Claude Code, a vLLM server can be reachable and still fail
310
+ if `--tool-call-parser` is wrong. In particular, Qwen3-Coder should be served
311
+ with `--enable-auto-tool-choice --tool-call-parser qwen3_xml`; Hermes is for
312
+ Hermes-style models and some older Qwen tool templates. Claude Any now surfaces
313
+ this in the compatibility test instead of treating a simple text response as
314
+ enough.
315
+
316
+ ## Recommended Uses
317
+
318
+ Claude Any is most useful where speed is less important than keeping background
319
+ work moving. Good fits include Docker host maintenance, Windows or Linux system
320
+ administration, cleanup scripts for unused files, periodic security checklists,
321
+ log review, Windows Event Log review, intrusion-attempt triage, and report
322
+ drafting.
323
+
324
+ It is not a replacement for dedicated security products, but it can help
325
+ administrators turn routine checks into repeatable scripts and readable reports.
326
+ It is useful for summarizing possible virus, ransomware, brute-force, or
327
+ remote-access intrusion attempts. In that sense, Claude Any can help you build a
328
+ free or low-cost system security watcher for routine checks, alerts, and
329
+ human-readable summaries.
330
+
331
+ For example, it can help turn requests such as "install PostgreSQL in a Docker
332
+ container" or "analyze today's Docker logs and email me a report" into concrete
333
+ commands, scripts, scheduled jobs, and summaries.
334
+
335
+ A practical pattern is tiered supervision: use smaller or cheaper models to
336
+ watch logs and detect possible issues, use a larger model to review findings,
337
+ write policy, and plan the response, then let smaller models execute routine
338
+ steps under that larger model's supervision.
339
+
340
+ ## Features
341
+
342
+ - Pre-launch provider picker with English, Korean, Japanese, and Chinese UI.
343
+ - Provider-aware model list and custom model entry.
344
+ - API key entry outside the Claude Code chat input.
345
+ - LLM option presets for context window, output tokens, timeout, sampling, and
346
+ native compatibility.
347
+ - Compatibility test before launch, including text response, tool use, and
348
+ tool-result round trip checks.
341
349
  - Runtime context reporting for vLLM/NIM when `/v1/models` exposes
342
350
  `max_model_len`.
343
351
  - Ollama model-context catalog: `claude-any ollama-catalog` downloads
@@ -345,46 +353,68 @@ steps under that larger model's supervision.
345
353
  context windows such as 256K Kimi and 1M DeepSeek models for preset filtering
346
354
  and status display.
347
355
  - Console-first pre-launch menu for SSH and terminal workflows.
348
- - Native paths where providers expose Claude/Anthropic-compatible endpoints.
349
- - Router mode for providers that need request/response adaptation.
350
- - DuckDuckGo and fetch MCP wiring for non-native providers.
351
- - Headless setup flags such as `--ca-provider`, `--ca-model`, `--ca-base-url`,
352
- `--ca-api-key-env`, `--ca-ollama-option`, and `--ca-max-output-tokens`.
353
- - Claude Code Plan Mode support on router-backed non-Anthropic providers,
354
- including local handling for `EnterPlanMode` and plan artifacts.
355
- - Optional `/advisor` slash command that routes the current task state to a
356
- selected Advisor Model, useful for long-context review and next-step checks.
357
- - Claude Code `statusLine` integration showing router RPM usage and wait time
358
- in the bottom status area instead of polluting the chat transcript.
359
- - Router-side RPM control for NVIDIA hosted, self-hosted NIM, Ollama, and
360
- Ollama Cloud. `rate_limit_rpm=0` disables throttling while still showing the
361
- last-60-seconds usage rate.
362
- - Soft pacing subtracts time already spent reading files, running commands, and
363
- waiting for tool results. In real coding sessions, those tool-call gaps absorb
364
- much of the RPM spacing naturally, so providers such as NVIDIA hosted NIM can
365
- stay within free-model limits without making every Claude Code turn feel
366
- rate-limited.
367
- - Streaming proxy for Ollama/Ollama Cloud router path — tokens are delivered
368
- to Claude Code as they arrive instead of waiting for the full response.
369
- - Per-provider `stream` on/off toggle and `stream_word_chunking` option to
370
- batch text deltas at word boundaries, mitigating SSE fragmentation that can
371
- break tool-call / JSON parsing in long streamed responses.
372
- - LLM options menu shows the meaning of the highlighted row at the bottom of
373
- the panel in the selected language (English, Korean, Japanese, Chinese), and
374
- boolean rows (`Stream`, `Stream word chunking`, `Native compatibility`,
375
- `Think`) toggle in place when you press Enter — no input prompt.
376
- - Tool guard hook coverage extended to the full Claude Code hook surface,
377
- including `WorktreeCreate` / `WorktreeRemove`, so non-git working directories
378
- no longer fail Agent isolation with
379
- `Cannot create agent worktree: not in a git repository...`.
380
- - Config file caching — settings are read from disk once and reused until the
381
- file changes, reducing per-request overhead in the router.
382
- - Router control-plane endpoints for headless agent coordination:
383
- `/ca/chat/messages`, `/ca/chat/wait`, `/ca/chat/stream`, `/ca/chat/files`,
384
- and `/ca/plan/artifacts`.
385
-
356
+ - Native paths where providers expose Claude/Anthropic-compatible endpoints.
357
+ - Router mode for providers that need request/response adaptation.
358
+ - DuckDuckGo and fetch MCP wiring for non-native providers.
359
+ - Headless setup flags such as `--ca-provider`, `--ca-model`, `--ca-base-url`,
360
+ `--ca-api-key-env`, `--ca-ollama-option`, and `--ca-max-output-tokens`.
361
+ - Claude Code Plan Mode support on router-backed non-Anthropic providers,
362
+ including local handling for `EnterPlanMode` and plan artifacts.
363
+ - Optional `/advisor` slash command that routes the current task state to a
364
+ selected Advisor Model, useful for long-context review and next-step checks.
365
+ - Claude Code `statusLine` integration showing router RPM usage and wait time
366
+ in the bottom status area instead of polluting the chat transcript.
367
+ - Router-side RPM control for NVIDIA hosted, self-hosted NIM, Ollama, and
368
+ Ollama Cloud. `rate_limit_rpm=0` disables throttling while still showing the
369
+ last-60-seconds usage rate.
370
+ - Soft pacing subtracts time already spent reading files, running commands, and
371
+ waiting for tool results. In real coding sessions, those tool-call gaps absorb
372
+ much of the RPM spacing naturally, so providers such as NVIDIA hosted NIM can
373
+ stay within free-model limits without making every Claude Code turn feel
374
+ rate-limited.
375
+ - Streaming proxy for Ollama/Ollama Cloud router path — tokens are delivered
376
+ to Claude Code as they arrive instead of waiting for the full response.
377
+ - Per-provider `stream` on/off toggle and `stream_word_chunking` option to
378
+ batch text deltas at word boundaries, mitigating SSE fragmentation that can
379
+ break tool-call / JSON parsing in long streamed responses.
380
+ - LLM options menu shows the meaning of the highlighted row at the bottom of
381
+ the panel in the selected language (English, Korean, Japanese, Chinese), and
382
+ boolean rows (`Stream`, `Stream word chunking`, `Native compatibility`,
383
+ `Think`) toggle in place when you press Enter — no input prompt.
384
+ - Tool guard hook coverage extended to the full Claude Code hook surface,
385
+ including `WorktreeCreate` / `WorktreeRemove`, so non-git working directories
386
+ no longer fail Agent isolation with
387
+ `Cannot create agent worktree: not in a git repository...`.
388
+ - Config file caching — settings are read from disk once and reused until the
389
+ file changes, reducing per-request overhead in the router.
390
+ - Router control-plane endpoints for headless agent coordination:
391
+ `/ca/chat/messages`, `/ca/chat/wait`, `/ca/chat/stream`, `/ca/chat/files`,
392
+ and `/ca/plan/artifacts`.
393
+
386
394
  ## Changelog
387
395
 
396
+ ### 0.1.66
397
+
398
+ - **TUI input fix for tmux/zsh**: portable menu prompts now handle visible text,
399
+ paste, Backspace/Delete, and Ctrl-U directly in raw terminal mode instead of
400
+ depending on fragile terminal echo restoration.
401
+ - **Issue-linked release flow**: npm publishing can now be triggered
402
+ automatically by pushing a `v*` tag, in addition to GitHub releases and manual
403
+ workflow dispatch.
404
+
405
+ ### 0.1.65
406
+
407
+ - **Plan Mode unchanged-Read loop recovery**: router conversion now preserves the
408
+ previous successful `Read` result for unchanged/no-op reads, exposes the
409
+ current Plan Mode state to third-party models, and avoids arbitrary retry
410
+ thresholds.
411
+ - **Cleaner third-party transcripts**: attachment-only metadata, historical
412
+ no-op tool results, and orphan tool results are normalized before reaching
413
+ Ollama, Ollama Cloud, NVIDIA hosted, vLLM, or NIM.
414
+ - **Remote router test binding**: `CLAUDE_ANY_ROUTER_BIND_HOST=0.0.0.0` can be
415
+ used for intentional remote testing while Claude Code keeps using the local
416
+ client base URL.
417
+
388
418
  ### 0.1.64
389
419
 
390
420
  - **Model-aware native auto-compact**: claude-any now injects
@@ -429,7 +459,7 @@ steps under that larger model's supervision.
429
459
 
430
460
  - **Dynamic timeout help**: the LLM options panel now describes
431
461
  `request_timeout_ms` using the currently selected value instead of always
432
- showing the old `300000 ms = 5 minutes` example.
462
+ showing a hard-coded timeout example.
433
463
 
434
464
  ### 0.1.49
435
465
 
@@ -558,283 +588,282 @@ steps under that larger model's supervision.
558
588
 
559
589
  ### 0.1.31
560
590
 
561
- - **5-minute default upstream timeout**: existing saved 10/30-minute defaults
562
- are migrated to 300000 ms so gateway stalls fail faster.
563
- - **Localized gateway retries**: 502/503/504 and socket timeout responses are
564
- retried automatically, with retry progress shown in the selected UI language.
565
-
566
- ### 0.1.30
567
-
568
- - **Headless launch docs moved up**: README now shows copy-ready examples for
569
- launching Claude Code directly with `--ca-provider`, `--ca-model`, `-p`, and
570
- `CLAUDE_ANY_SKIP_MENU=1` immediately after install.
571
- - **NVIDIA hosted wording cleanup**: provider and lifecycle docs now describe
572
- NVIDIA hosted as using the Claude Any local router, with no separate hosted
573
- API Catalog proxy requirement.
574
-
575
- ### 0.1.29
576
-
577
- - **NVIDIA compatibility test fix**: `claude-any test` now restarts the local
578
- router before router-mode tests, so upgraded installs do not accidentally use
579
- an old long-running router that still expects `nvd-claude-proxy`.
580
- - **Clear NVIDIA router wording**: menu status now describes NVIDIA hosted as
581
- using the local claude-any router instead of the retired local proxy path.
582
-
583
- ### 0.1.28
584
-
585
- - **Plan Mode + Advisor headline**: document Claude Any's Plan Mode support for
586
- router-backed non-Anthropic providers and the `/advisor` slash command backed
587
- by a selected long-context Advisor Model.
588
- - **Status-line RPM telemetry**: Claude Any installs a Claude Code `statusLine`
589
- command that shows router RPM usage and the latest wait time in the bottom
590
- status area, keeping rate-limit telemetry out of the chat transcript.
591
- - **Soft RPM pacing for free hosted models**: NVIDIA hosted, self-hosted NIM,
592
- Ollama, and Ollama Cloud can use router-side RPM pacing. The pacing subtracts
593
- time already spent in file reads, command execution, and tool-result waits, so
594
- normal coding tool-call gaps naturally absorb much of the RPM spacing.
595
- - **Unlimited usage display**: `rate_limit_rpm=0` disables throttling while
596
- still displaying the last-60-seconds request rate.
597
-
598
- ### 0.1.27
599
-
600
- - **Plan mode support for non-Anthropic providers**: the router now keeps
601
- `EnterPlanMode` available and supports Claude Code Plan mode even when the
602
- upstream model does not reliably choose that internal tool. Forced
603
- `tool_choice=EnterPlanMode` is answered locally with a valid Anthropic
604
- `tool_use`, and long implementation requests that receive only a short or
605
- empty non-actionable text response are promoted to `EnterPlanMode` using
606
- language-agnostic structure checks.
607
- - **Plan-mode self-tool handling**: unsupported Claude Code self-tools are
608
- still stripped for non-Anthropic providers, but Plan-mode tools are handled
609
- separately so planning can work instead of being disabled.
610
-
611
- ### 0.1.25
612
-
613
- - **Plan-mode diagnostics**: set `~/.config/claude-any/log-level` to `TRACE`
614
- to capture redacted request and response summaries in `requests.jsonl` /
615
- `responses.jsonl`.
616
- - **Headless agent chat service**: the router exposes a small HTTP control
617
- plane for sub coding agents. Agents can post messages, poll updates after
618
- the last seen message id, or wait on an SSE stream when they do not have
619
- their own loop.
620
- - **Plan artifact serving**: agents can create plan files through the router
621
- and share stable local URLs, matching Claude Code's file/artifact-oriented
622
- Plan-mode workflow without copying Anthropic's internal implementation.
623
-
624
- ### 0.1.24
625
-
626
- - **First public npm release** under the correct scope: `@oneciel-ai/claude-any`. Earlier 0.1.x versions were never published to the registry; this is the version that is actually installable via `npm install -g @oneciel-ai/claude-any`.
627
-
628
- ### 0.1.23
629
-
630
- - **Stream toggle**: each non-Anthropic provider now has a `stream_enabled`
631
- knob in the LLM options menu, in `claude-anyctl ollama-options` /
632
- `provider-options`, and in headless flags. When off, the router forces
633
- `stream:false` upstream and returns the full response to Claude Code — a
634
- workaround when streaming fragmentation breaks tool-call or JSON parsing.
635
- - **Word-boundary streaming**: new `stream_word_chunking` option buffers SSE
636
- text deltas to whitespace/word boundaries before flushing. Implemented for
637
- both the Ollama router path and the native pass-through path (vLLM, NVIDIA
638
- hosted, self-hosted NIM). Tool deltas and non-text SSE events pass through
639
- unchanged.
640
- - **All-hooks tool guard**: `install_tool_guard_hooks` now registers the full
641
- set of Claude Code hook events (PreToolUse, PostToolUse, PostToolUseFailure,
642
- PostToolBatch, PermissionRequest, PermissionDenied, SessionStart/End, Setup,
643
- UserPromptSubmit/Expansion, Stop, StopFailure, InstructionsLoaded,
644
- ConfigChange, CwdChanged, Notification, SubagentStart/Stop, TeammateIdle,
645
- TaskCreated, TaskCompleted, PreCompact, PostCompact, WorktreeCreate,
646
- WorktreeRemove, Elicitation, ElicitationResult). The WorktreeCreate handler
647
- emits `worktreePath = base_path` so Agent isolation works in non-git
648
- directories.
649
- - **Windows hook compatibility**: `shell_command_string` now emits forward
650
- slashes and POSIX quoting on Windows so Claude Code's sh-based hook runner
651
- doesn't strip backslashes from paths like `C:\Users\...`.
652
- - **LLM options UX**: per-row description footer rendered in the user's
653
- selected language, and boolean toggles (`Stream`, `Stream word chunking`,
654
- `Native compatibility`, `Think`) flip on Enter without a prompt.
655
-
656
- ### 0.1.22
657
-
658
- - **Headless manual expansion**: expand the manual with practical headless setup, launch, testing, passthrough, and cleanup examples for automation and remote-server use.
659
-
660
- ### 0.1.21
661
-
662
- - **Service lifecycle documentation**: clarify that Claude Any starts only the router/proxy services required for the selected provider at launch time, and `claude-any stop` is the explicit cleanup command.
663
-
664
- ### 0.1.20
665
-
666
- - **NVIDIA hosted quick test**: `auto` mode now uses a text-only quick test for NVIDIA hosted providers, avoiding slow or flaky tool_use requests during menu checks. Use `smoke` for text + tool_use, or `full` for the full text/tool_use/tool_result round trip.
667
- - **Menu test timeout**: the terminal menu now runs `claude-any test 60 auto`, which keeps the pre-launch test responsive for hosted models.
668
-
669
- ### 0.1.19
670
-
671
- - **Faster compatibility tests**: `claude-any test` now supports `auto`, `smoke`, and `full` modes.
672
- - **Menu default speedup**: the terminal menu runs `claude-any test 120 auto`, so NVIDIA hosted compatibility checks finish faster while full validation remains available with `claude-any test 180 full`.
673
-
674
- ### 0.1.18
675
-
676
- - **NVIDIA hosted transient diagnostics**: compatibility tests now identify `RemoteDisconnected`, connection resets, and 502/503/504 responses from NVIDIA hosted backends as transient upstream/API Catalog failures.
677
- - **NVIDIA proxy cleanup**: `claude-any stop` now also matches `nvd-claude-proxy` executable processes so stale proxy sessions are cleaned up reliably.
678
-
679
- ### 0.1.17
680
-
681
- - **Menu compatibility-test timeout**: the terminal menu now runs compatibility tests with an explicit 180 s timeout and stops the child process if it exceeds the menu hard limit, so slow hosted models cannot leave the menu appearing indefinitely pending.
682
-
683
- ### 0.1.16
684
-
685
- - **NVIDIA hosted proxy startup fix**: detect and launch an installed `nvd-claude-proxy`/`ncp` executable before falling back to `python -m nvd_claude_proxy.main`. This supports uv-tool installs where the proxy is available as a command but not importable from Claude Any's Python interpreter.
686
-
687
- ### 0.1.15
688
-
689
- - **Ollama/Ollama Cloud tool-call streaming fix**: emit streamed tool calls using sequential Anthropic SSE content block indexes and `input_json_delta` payloads. This prevents Claude Code from rejecting malformed streamed tool-use blocks with `Invalid tool parameters`.
690
- - **Tool guard auto-install**: non-Anthropic launches now merge the Claude Any tool guard into `~/.claude/settings.json` so generated tool inputs can be normalized before execution.
691
- - **Tool-call diagnostics**: router-side tool calls are logged to `~/.config/claude-any/tool-calls.jsonl`, and Claude Code hook inputs are logged to `~/.claude/claude-any-tool-guard/tool-events.jsonl` for precise debugging.
692
- - **Tool input normalization**: the guard now maps common aliases such as `path` to `file_path`, `cmd` to `command`, and `query` to `pattern`, and returns explicit guidance when required fields are missing.
693
-
694
- ### 0.1.14
695
-
696
- - **SSH/terminal arrow-key compatibility**: rewrote `read_menu_key()` with a proper ANSI escape sequence parser and moved raw terminal setup into `portable_select()` so the terminal stays in raw mode for the entire menu loop. This prevents escape sequences from leaking to the screen when `ECHO` is restored between keystrokes. Arrow keys, Home, and End now work reliably in SSH sessions.
697
- - **Test timeout**: default compatibility test timeout increased from 60 s to 120 s for slower cloud providers.
698
- - **Ollama Cloud compatibility test fix**: added `"stream": false` to compatibility test requests so the router fetches a single JSON response from Ollama Cloud instead of SSE streaming, which was causing `post_json` to timeout while collecting all chunks.
699
-
700
- ### 0.1.13
701
-
702
- - **Ollama streaming proxy**: The router now streams Ollama and Ollama Cloud
703
- responses through to Claude Code in real time using Anthropic SSE format,
704
- instead of buffering the entire response before delivery.
705
- - **Config caching**: `load_config()` now caches the configuration file in
706
- memory and only re-reads from disk when the file modification time changes.
707
- This eliminates repeated file I/O and JSON parsing on every router request.
708
- - **Token estimation caching**: `estimate_tokens()` now accepts an optional
709
- cache dict to avoid redundant `json.dumps()` calls within a single request.
710
- `ollama_chat_request` and `cap_output_tokens_for_context` share the same
711
- cache when computing context window sizing.
712
-
713
- ### 0.1.12
714
-
715
- - Refresh docs and demo assets.
716
-
717
- ### 0.1.11
718
-
719
- - Validate tool call compatibility.
720
-
721
- ### 0.1.10
722
-
723
- - Show runtime context in tests.
724
-
725
- ### 0.1.9
726
-
727
- - Cap presets to server context.
728
-
729
- ### 0.1.8
730
-
731
- - Localize LLM presets.
732
-
733
- ## Provider Notes
734
-
735
- | Provider | Mode | Notes |
736
- | --- | --- | --- |
737
- | Anthropic | Native Claude Code | Uses Claude login or Anthropic API key. |
738
- | Ollama | Native when available, router otherwise | Local Ollama normally needs no API key. Cloud models through local Ollama require `ollama signin` on the Ollama host. |
739
- | Ollama Cloud | Router | Calls `https://ollama.com/api`; requires an Ollama API key. |
740
- | vLLM | Native Anthropic-compatible endpoint | Use a vLLM endpoint that exposes Anthropic-compatible `/v1/messages`; match `--tool-call-parser` to the model family. |
741
- | NVIDIA hosted | Router | Uses the NVIDIA hosted API Catalog through the Claude Any local router. |
742
- | self-hosted NIM | Native Anthropic-compatible endpoint | Use the self-hosted NIM Anthropic-compatible endpoint. |
743
-
744
- ## Service Lifecycle
745
-
746
- Claude Any does not keep every possible backend helper running all the time. The
747
- normal lifecycle is:
748
-
749
- - Before launch, managed router processes can be stopped with
750
- `claude-any stop`.
751
- - When `claude-any` starts Claude Code, it starts only the services required by
752
- the selected provider.
753
- - Ollama and Ollama Cloud router mode use the Claude Any router on
754
- `127.0.0.1:8799`.
755
- - NVIDIA hosted router mode uses the Claude Any router on `127.0.0.1:8799`;
756
- hosted API Catalog models do not require a separate NVIDIA proxy.
757
- - Run `claude-any stop` before a fresh provider-switch test if an old router is
758
- still bound to the local port.
759
-
760
- This keeps Claude Code pointed at one stable Claude Any entry point while still
761
- letting provider-specific helpers start on demand.
762
-
763
- For Qwen3-Coder on vLLM, start the server with a matching tool parser:
764
-
765
- ```sh
766
- vllm serve Qwen/Qwen3-Coder-30B-A3B-Instruct \
767
- --host 0.0.0.0 \
768
- --port 8000 \
769
- --served-model-name qwen3-coder-30b \
770
- --max-model-len 65536 \
771
- --enable-auto-tool-choice \
772
- --tool-call-parser qwen3_xml
773
- ```
774
-
775
- ## Provider Links
776
-
777
- - Ollama Cloud: [cloud overview](https://ollama.com/cloud), [API key settings](https://ollama.com/settings/keys), [authentication docs](https://docs.ollama.com/api/authentication).
778
- - Ollama local Anthropic compatibility: [Ollama Anthropic API docs](https://docs.ollama.com/api/anthropic-compatibility).
779
- - vLLM: [Claude Code integration](https://docs.vllm.ai/en/latest/serving/integrations/claude_code/), [tool calling](https://docs.vllm.ai/en/stable/features/tool_calling/), [project GitHub](https://github.com/vllm-project/vllm).
780
- - NVIDIA hosted NIM: [NVIDIA API Catalog](https://build.nvidia.com/), [API Catalog quickstart](https://docs.api.nvidia.com/nim/docs/api-quickstart).
781
- - Self-hosted NVIDIA NIM: [Claude Code with NIM](https://docs.nvidia.com/nim/large-language-models/latest/ai-assistant-integrations/claude-code.html), [NIM for LLMs getting started](https://docs.nvidia.com/nim/large-language-models/1.14.0/getting-started.html), [NGC personal keys](https://org.ngc.nvidia.com/setup/personal-keys).
782
-
783
- ## Headless Examples
784
-
785
- Headless commands skip the pre-launch menu and launch Claude Code immediately.
786
- Claude Any consumes `--ca-*` setup flags, starts the required router
787
- services, then passes the remaining arguments to Claude Code.
788
-
789
- ```sh
790
- claude-any --ca-provider ollama-cloud --ca-model glm-5.1
791
- claude-any --ca-provider ollama --ca-base-url http://127.0.0.1:11434 --ca-model qwen3-coder
792
- claude-any --ca-provider ollama-cloud --ca-api-key-env OLLAMA_API_KEY --ca-model qwen3-coder:480b:cloud
793
- claude-any --ca-provider vllm --ca-base-url http://127.0.0.1:8000 --ca-model Qwen/Qwen3-Coder
794
- claude-any --ca-no-update-check -p "Reply with OK only." --output-format text
795
- ```
796
-
797
- All other arguments are passed through to Claude Code.
798
-
799
- ## Headless Agent Chat
800
-
801
- When the claude-any router is running, sub agents can coordinate through local
802
- HTTP endpoints without opening the menu:
803
-
804
- ```sh
805
- # Send a message to a channel.
806
- curl -s http://127.0.0.1:8799/ca/chat/messages \
807
- -H 'content-type: application/json' \
808
- -d '{"channel":"agents","sender_id":"codex","recipients":["kimi"],"message":"Need logs after id 42"}'
809
-
810
- # Poll updates after the last message id.
811
- curl -s 'http://127.0.0.1:8799/ca/chat/messages?channel=agents&recipient=kimi&after=42'
812
-
813
- # Wait on a stream until messages arrive.
814
- curl -N 'http://127.0.0.1:8799/ca/chat/stream?channel=agents&recipient=kimi&after=42&timeout=300'
815
-
816
- # Publish a plan file and get a served URL.
817
- curl -s http://127.0.0.1:8799/ca/plan/artifacts \
818
- -H 'content-type: application/json' \
819
- -d '{"title":"handoff","name":"handoff.md","content":"# Plan\n- step 1"}'
820
- ```
821
-
822
- ## Security
823
-
824
- Do not commit runtime configuration or API keys. Claude Any stores local runtime
825
- configuration under `~/.config/claude-any/`. NVIDIA hosted credentials used by
826
- the optional proxy are stored under `~/.config/nvd-claude-proxy/`.
827
-
828
- This repository should contain source, documentation, and demo assets only.
829
-
830
- ## Development
831
-
832
- ```sh
833
- python -m py_compile claude_any.py claude-any-menu.py claude-any-tool-guard.py
834
- python -m ruff check .
835
- python scripts/make_demo_assets.py
836
- ```
837
-
838
- ## License
839
-
840
- MIT. See [LICENSE](LICENSE).
591
+ - **2-minute default upstream timeout**: existing saved longer bundled defaults are migrated to 120000 ms so gateway stalls fail faster.
592
+ - **Localized gateway retries**: 502/503/504 and socket timeout responses are
593
+ retried automatically, with retry progress shown in the selected UI language.
594
+
595
+ ### 0.1.30
596
+
597
+ - **Headless launch docs moved up**: README now shows copy-ready examples for
598
+ launching Claude Code directly with `--ca-provider`, `--ca-model`, `-p`, and
599
+ `CLAUDE_ANY_SKIP_MENU=1` immediately after install.
600
+ - **NVIDIA hosted wording cleanup**: provider and lifecycle docs now describe
601
+ NVIDIA hosted as using the Claude Any local router, with no separate hosted
602
+ API Catalog proxy requirement.
603
+
604
+ ### 0.1.29
605
+
606
+ - **NVIDIA compatibility test fix**: `claude-any test` now restarts the local
607
+ router before router-mode tests, so upgraded installs do not accidentally use
608
+ an old long-running router that still expects `nvd-claude-proxy`.
609
+ - **Clear NVIDIA router wording**: menu status now describes NVIDIA hosted as
610
+ using the local claude-any router instead of the retired local proxy path.
611
+
612
+ ### 0.1.28
613
+
614
+ - **Plan Mode + Advisor headline**: document Claude Any's Plan Mode support for
615
+ router-backed non-Anthropic providers and the `/advisor` slash command backed
616
+ by a selected long-context Advisor Model.
617
+ - **Status-line RPM telemetry**: Claude Any installs a Claude Code `statusLine`
618
+ command that shows router RPM usage and the latest wait time in the bottom
619
+ status area, keeping rate-limit telemetry out of the chat transcript.
620
+ - **Soft RPM pacing for free hosted models**: NVIDIA hosted, self-hosted NIM,
621
+ Ollama, and Ollama Cloud can use router-side RPM pacing. The pacing subtracts
622
+ time already spent in file reads, command execution, and tool-result waits, so
623
+ normal coding tool-call gaps naturally absorb much of the RPM spacing.
624
+ - **Unlimited usage display**: `rate_limit_rpm=0` disables throttling while
625
+ still displaying the last-60-seconds request rate.
626
+
627
+ ### 0.1.27
628
+
629
+ - **Plan mode support for non-Anthropic providers**: the router now keeps
630
+ `EnterPlanMode` available and supports Claude Code Plan mode even when the
631
+ upstream model does not reliably choose that internal tool. Forced
632
+ `tool_choice=EnterPlanMode` is answered locally with a valid Anthropic
633
+ `tool_use`, and long implementation requests that receive only a short or
634
+ empty non-actionable text response are promoted to `EnterPlanMode` using
635
+ language-agnostic structure checks.
636
+ - **Plan-mode self-tool handling**: unsupported Claude Code self-tools are
637
+ still stripped for non-Anthropic providers, but Plan-mode tools are handled
638
+ separately so planning can work instead of being disabled.
639
+
640
+ ### 0.1.25
641
+
642
+ - **Plan-mode diagnostics**: set `~/.config/claude-any/log-level` to `TRACE`
643
+ to capture redacted request and response summaries in `requests.jsonl` /
644
+ `responses.jsonl`.
645
+ - **Headless agent chat service**: the router exposes a small HTTP control
646
+ plane for sub coding agents. Agents can post messages, poll updates after
647
+ the last seen message id, or wait on an SSE stream when they do not have
648
+ their own loop.
649
+ - **Plan artifact serving**: agents can create plan files through the router
650
+ and share stable local URLs, matching Claude Code's file/artifact-oriented
651
+ Plan-mode workflow without copying Anthropic's internal implementation.
652
+
653
+ ### 0.1.24
654
+
655
+ - **First public npm release** under the correct scope: `@oneciel-ai/claude-any`. Earlier 0.1.x versions were never published to the registry; this is the version that is actually installable via `npm install -g @oneciel-ai/claude-any`.
656
+
657
+ ### 0.1.23
658
+
659
+ - **Stream toggle**: each non-Anthropic provider now has a `stream_enabled`
660
+ knob in the LLM options menu, in `claude-anyctl ollama-options` /
661
+ `provider-options`, and in headless flags. When off, the router forces
662
+ `stream:false` upstream and returns the full response to Claude Code — a
663
+ workaround when streaming fragmentation breaks tool-call or JSON parsing.
664
+ - **Word-boundary streaming**: new `stream_word_chunking` option buffers SSE
665
+ text deltas to whitespace/word boundaries before flushing. Implemented for
666
+ both the Ollama router path and the native pass-through path (vLLM, NVIDIA
667
+ hosted, self-hosted NIM). Tool deltas and non-text SSE events pass through
668
+ unchanged.
669
+ - **All-hooks tool guard**: `install_tool_guard_hooks` now registers the full
670
+ set of Claude Code hook events (PreToolUse, PostToolUse, PostToolUseFailure,
671
+ PostToolBatch, PermissionRequest, PermissionDenied, SessionStart/End, Setup,
672
+ UserPromptSubmit/Expansion, Stop, StopFailure, InstructionsLoaded,
673
+ ConfigChange, CwdChanged, Notification, SubagentStart/Stop, TeammateIdle,
674
+ TaskCreated, TaskCompleted, PreCompact, PostCompact, WorktreeCreate,
675
+ WorktreeRemove, Elicitation, ElicitationResult). The WorktreeCreate handler
676
+ emits `worktreePath = base_path` so Agent isolation works in non-git
677
+ directories.
678
+ - **Windows hook compatibility**: `shell_command_string` now emits forward
679
+ slashes and POSIX quoting on Windows so Claude Code's sh-based hook runner
680
+ doesn't strip backslashes from paths like `C:\Users\...`.
681
+ - **LLM options UX**: per-row description footer rendered in the user's
682
+ selected language, and boolean toggles (`Stream`, `Stream word chunking`,
683
+ `Native compatibility`, `Think`) flip on Enter without a prompt.
684
+
685
+ ### 0.1.22
686
+
687
+ - **Headless manual expansion**: expand the manual with practical headless setup, launch, testing, passthrough, and cleanup examples for automation and remote-server use.
688
+
689
+ ### 0.1.21
690
+
691
+ - **Service lifecycle documentation**: clarify that Claude Any starts only the router/proxy services required for the selected provider at launch time, and `claude-any stop` is the explicit cleanup command.
692
+
693
+ ### 0.1.20
694
+
695
+ - **NVIDIA hosted quick test**: `auto` mode now uses a text-only quick test for NVIDIA hosted providers, avoiding slow or flaky tool_use requests during menu checks. Use `smoke` for text + tool_use, or `full` for the full text/tool_use/tool_result round trip.
696
+ - **Menu test timeout**: the terminal menu now runs `claude-any test 60 auto`, which keeps the pre-launch test responsive for hosted models.
697
+
698
+ ### 0.1.19
699
+
700
+ - **Faster compatibility tests**: `claude-any test` now supports `auto`, `smoke`, and `full` modes.
701
+ - **Menu default speedup**: the terminal menu runs `claude-any test 120 auto`, so NVIDIA hosted compatibility checks finish faster while full validation remains available with `claude-any test 180 full`.
702
+
703
+ ### 0.1.18
704
+
705
+ - **NVIDIA hosted transient diagnostics**: compatibility tests now identify `RemoteDisconnected`, connection resets, and 502/503/504 responses from NVIDIA hosted backends as transient upstream/API Catalog failures.
706
+ - **NVIDIA proxy cleanup**: `claude-any stop` now also matches `nvd-claude-proxy` executable processes so stale proxy sessions are cleaned up reliably.
707
+
708
+ ### 0.1.17
709
+
710
+ - **Menu compatibility-test timeout**: the terminal menu now runs compatibility tests with an explicit 180 s timeout and stops the child process if it exceeds the menu hard limit, so slow hosted models cannot leave the menu appearing indefinitely pending.
711
+
712
+ ### 0.1.16
713
+
714
+ - **NVIDIA hosted proxy startup fix**: detect and launch an installed `nvd-claude-proxy`/`ncp` executable before falling back to `python -m nvd_claude_proxy.main`. This supports uv-tool installs where the proxy is available as a command but not importable from Claude Any's Python interpreter.
715
+
716
+ ### 0.1.15
717
+
718
+ - **Ollama/Ollama Cloud tool-call streaming fix**: emit streamed tool calls using sequential Anthropic SSE content block indexes and `input_json_delta` payloads. This prevents Claude Code from rejecting malformed streamed tool-use blocks with `Invalid tool parameters`.
719
+ - **Tool guard auto-install**: non-Anthropic launches now merge the Claude Any tool guard into `~/.claude/settings.json` so generated tool inputs can be normalized before execution.
720
+ - **Tool-call diagnostics**: router-side tool calls are logged to `~/.config/claude-any/tool-calls.jsonl`, and Claude Code hook inputs are logged to `~/.claude/claude-any-tool-guard/tool-events.jsonl` for precise debugging.
721
+ - **Tool input normalization**: the guard now maps common aliases such as `path` to `file_path`, `cmd` to `command`, and `query` to `pattern`, and returns explicit guidance when required fields are missing.
722
+
723
+ ### 0.1.14
724
+
725
+ - **SSH/terminal arrow-key compatibility**: rewrote `read_menu_key()` with a proper ANSI escape sequence parser and moved raw terminal setup into `portable_select()` so the terminal stays in raw mode for the entire menu loop. This prevents escape sequences from leaking to the screen when `ECHO` is restored between keystrokes. Arrow keys, Home, and End now work reliably in SSH sessions.
726
+ - **Test timeout**: default compatibility test timeout increased from 60 s to 120 s for slower cloud providers.
727
+ - **Ollama Cloud compatibility test fix**: added `"stream": false` to compatibility test requests so the router fetches a single JSON response from Ollama Cloud instead of SSE streaming, which was causing `post_json` to timeout while collecting all chunks.
728
+
729
+ ### 0.1.13
730
+
731
+ - **Ollama streaming proxy**: The router now streams Ollama and Ollama Cloud
732
+ responses through to Claude Code in real time using Anthropic SSE format,
733
+ instead of buffering the entire response before delivery.
734
+ - **Config caching**: `load_config()` now caches the configuration file in
735
+ memory and only re-reads from disk when the file modification time changes.
736
+ This eliminates repeated file I/O and JSON parsing on every router request.
737
+ - **Token estimation caching**: `estimate_tokens()` now accepts an optional
738
+ cache dict to avoid redundant `json.dumps()` calls within a single request.
739
+ `ollama_chat_request` and `cap_output_tokens_for_context` share the same
740
+ cache when computing context window sizing.
741
+
742
+ ### 0.1.12
743
+
744
+ - Refresh docs and demo assets.
745
+
746
+ ### 0.1.11
747
+
748
+ - Validate tool call compatibility.
749
+
750
+ ### 0.1.10
751
+
752
+ - Show runtime context in tests.
753
+
754
+ ### 0.1.9
755
+
756
+ - Cap presets to server context.
757
+
758
+ ### 0.1.8
759
+
760
+ - Localize LLM presets.
761
+
762
+ ## Provider Notes
763
+
764
+ | Provider | Mode | Notes |
765
+ | --- | --- | --- |
766
+ | Anthropic | Native Claude Code | Uses Claude login or Anthropic API key. |
767
+ | Ollama | Native when available, router otherwise | Local Ollama normally needs no API key. Cloud models through local Ollama require `ollama signin` on the Ollama host. |
768
+ | Ollama Cloud | Router | Calls `https://ollama.com/api`; requires an Ollama API key. |
769
+ | vLLM | Native Anthropic-compatible endpoint | Use a vLLM endpoint that exposes Anthropic-compatible `/v1/messages`; match `--tool-call-parser` to the model family. |
770
+ | NVIDIA hosted | Router | Uses the NVIDIA hosted API Catalog through the Claude Any local router. |
771
+ | self-hosted NIM | Native Anthropic-compatible endpoint | Use the self-hosted NIM Anthropic-compatible endpoint. |
772
+
773
+ ## Service Lifecycle
774
+
775
+ Claude Any does not keep every possible backend helper running all the time. The
776
+ normal lifecycle is:
777
+
778
+ - Before launch, managed router processes can be stopped with
779
+ `claude-any stop`.
780
+ - When `claude-any` starts Claude Code, it starts only the services required by
781
+ the selected provider.
782
+ - Ollama and Ollama Cloud router mode use the Claude Any router on
783
+ `127.0.0.1:8799`.
784
+ - NVIDIA hosted router mode uses the Claude Any router on `127.0.0.1:8799`;
785
+ hosted API Catalog models do not require a separate NVIDIA proxy.
786
+ - Run `claude-any stop` before a fresh provider-switch test if an old router is
787
+ still bound to the local port.
788
+
789
+ This keeps Claude Code pointed at one stable Claude Any entry point while still
790
+ letting provider-specific helpers start on demand.
791
+
792
+ For Qwen3-Coder on vLLM, start the server with a matching tool parser:
793
+
794
+ ```sh
795
+ vllm serve Qwen/Qwen3-Coder-30B-A3B-Instruct \
796
+ --host 0.0.0.0 \
797
+ --port 8000 \
798
+ --served-model-name qwen3-coder-30b \
799
+ --max-model-len 65536 \
800
+ --enable-auto-tool-choice \
801
+ --tool-call-parser qwen3_xml
802
+ ```
803
+
804
+ ## Provider Links
805
+
806
+ - Ollama Cloud: [cloud overview](https://ollama.com/cloud), [API key settings](https://ollama.com/settings/keys), [authentication docs](https://docs.ollama.com/api/authentication).
807
+ - Ollama local Anthropic compatibility: [Ollama Anthropic API docs](https://docs.ollama.com/api/anthropic-compatibility).
808
+ - vLLM: [Claude Code integration](https://docs.vllm.ai/en/latest/serving/integrations/claude_code/), [tool calling](https://docs.vllm.ai/en/stable/features/tool_calling/), [project GitHub](https://github.com/vllm-project/vllm).
809
+ - NVIDIA hosted NIM: [NVIDIA API Catalog](https://build.nvidia.com/), [API Catalog quickstart](https://docs.api.nvidia.com/nim/docs/api-quickstart).
810
+ - Self-hosted NVIDIA NIM: [Claude Code with NIM](https://docs.nvidia.com/nim/large-language-models/latest/ai-assistant-integrations/claude-code.html), [NIM for LLMs getting started](https://docs.nvidia.com/nim/large-language-models/1.14.0/getting-started.html), [NGC personal keys](https://org.ngc.nvidia.com/setup/personal-keys).
811
+
812
+ ## Headless Examples
813
+
814
+ Headless commands skip the pre-launch menu and launch Claude Code immediately.
815
+ Claude Any consumes `--ca-*` setup flags, starts the required router
816
+ services, then passes the remaining arguments to Claude Code.
817
+
818
+ ```sh
819
+ claude-any --ca-provider ollama-cloud --ca-model glm-5.1
820
+ claude-any --ca-provider ollama --ca-base-url http://127.0.0.1:11434 --ca-model qwen3-coder
821
+ claude-any --ca-provider ollama-cloud --ca-api-key-env OLLAMA_API_KEY --ca-model qwen3-coder:480b:cloud
822
+ claude-any --ca-provider vllm --ca-base-url http://127.0.0.1:8000 --ca-model Qwen/Qwen3-Coder
823
+ claude-any --ca-no-update-check -p "Reply with OK only." --output-format text
824
+ ```
825
+
826
+ All other arguments are passed through to Claude Code.
827
+
828
+ ## Headless Agent Chat
829
+
830
+ When the claude-any router is running, sub agents can coordinate through local
831
+ HTTP endpoints without opening the menu:
832
+
833
+ ```sh
834
+ # Send a message to a channel.
835
+ curl -s http://127.0.0.1:8799/ca/chat/messages \
836
+ -H 'content-type: application/json' \
837
+ -d '{"channel":"agents","sender_id":"codex","recipients":["kimi"],"message":"Need logs after id 42"}'
838
+
839
+ # Poll updates after the last message id.
840
+ curl -s 'http://127.0.0.1:8799/ca/chat/messages?channel=agents&recipient=kimi&after=42'
841
+
842
+ # Wait on a stream until messages arrive.
843
+ curl -N 'http://127.0.0.1:8799/ca/chat/stream?channel=agents&recipient=kimi&after=42&timeout=300'
844
+
845
+ # Publish a plan file and get a served URL.
846
+ curl -s http://127.0.0.1:8799/ca/plan/artifacts \
847
+ -H 'content-type: application/json' \
848
+ -d '{"title":"handoff","name":"handoff.md","content":"# Plan\n- step 1"}'
849
+ ```
850
+
851
+ ## Security
852
+
853
+ Do not commit runtime configuration or API keys. Claude Any stores local runtime
854
+ configuration under `~/.config/claude-any/`. NVIDIA hosted credentials used by
855
+ the optional proxy are stored under `~/.config/nvd-claude-proxy/`.
856
+
857
+ This repository should contain source, documentation, and demo assets only.
858
+
859
+ ## Development
860
+
861
+ ```sh
862
+ python -m py_compile claude_any.py claude-any-menu.py claude-any-tool-guard.py
863
+ python -m ruff check .
864
+ python scripts/make_demo_assets.py
865
+ ```
866
+
867
+ ## License
868
+
869
+ MIT. See [LICENSE](LICENSE).