@oneciel-ai/claude-any 0.1.27 → 0.1.29

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -1,495 +1,569 @@
1
- # Claude Any
2
-
3
- | English | [한국어](docs/README.ko.md) | [日本語](docs/README.ja.md) | [中文](docs/README.zh.md) |
4
- | --- | --- | --- | --- |
5
-
6
- [![npm version](https://img.shields.io/npm/v/@oneciel-ai/claude-any?logo=npm&label=npm)](https://www.npmjs.com/package/@oneciel-ai/claude-any)
7
- [![npm downloads](https://img.shields.io/npm/dm/@oneciel-ai/claude-any?logo=npm&label=downloads)](https://www.npmjs.com/package/@oneciel-ai/claude-any)
8
- [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
9
-
10
- > ## 🚀 Use the full Claude Code experience with free or low-cost LLMs
11
- >
12
- > - **Free** [NVIDIA hosted NIM](https://build.nvidia.com/) (qwen3-coder-480b, gpt-oss, and friends) through the API Catalog.
13
- > - **Low-cost** — [Ollama Cloud](https://ollama.com/cloud) for GLM, Qwen, DeepSeek, and other open-weight models at a fraction of frontier-model pricing.
14
- > - **Free + local** — [Ollama](https://ollama.com/) or [vLLM](https://github.com/vllm-project/vllm) on your own GPU, fully offline.
15
- >
16
- > Provider, model, base URL, API key, streaming behavior, and LLM options are all selected from a console menu **before** Claude Code starts. Claude Code itself runs untouched with all of its native tooling, slash commands, and workflows.
17
-
18
- ### Demo
19
-
20
- ![NVIDIA hosted NIM driving Claude Code (deepseek-4-flash)](docs/assets/claude-any-nvidia-nim.gif)
21
-
22
- NVIDIA hosted NIM (deepseek-4-flash) driving Claude Code through the claude-any router.  [full mp4 ⤓](https://github.com/OneCielAI/claude-any/raw/main/demo/claude-any-nvidia-nim.mp4)
23
-
24
- ![Ollama Cloud streamed through the claude-any router (glm-5.1)](docs/assets/claude-any-ollama-cloud.gif)
25
-
26
- Ollama Cloud (glm-5.1) streamed through the claude-any router with SSE word-boundary chunking enabled.  [full mp4 ⤓](https://github.com/OneCielAI/claude-any/raw/main/demo/claude-any-ollama-cloud.mp4)
27
-
28
- ---
29
-
30
- Claude Any is a provider selector and compatibility launcher for Claude Code.
31
- It lets you choose Anthropic, Ollama, Ollama Cloud, vLLM, NVIDIA hosted models,
32
- or self-hosted NIM before Claude Code starts, then passes normal Claude Code
33
- arguments through unchanged.
34
-
35
- Credits: One Ciel LLC
36
-
37
- Current version: `0.1.27`
38
-
39
- ## Why This Exists
40
-
41
- Claude Any started from a practical need: even on the highest Claude Code plan,
42
- long sessions can run out of available tokens or become blocked while waiting
43
- for the next quota window. The goal is not to replace Claude Code, but to keep
44
- work moving. Slower but usable providers such as NVIDIA NIM, Ollama Cloud,
45
- vLLM, and local Ollama can act as hybrid third-party agents for summaries,
46
- research, journaling, simple coding tasks, and delegated background work.
47
-
48
- Another design goal is to keep as much of Claude Code's native experience as
49
- possible. When a provider exposes an Anthropic-compatible endpoint, Claude Any
50
- prefers that path so Claude Code tooling, permissions, model selection, and
51
- workflow behavior remain close to the original. For capabilities that remote
52
- providers cannot supply directly, such as web search, Claude Any adds separate
53
- MCP-based tooling.
54
-
55
- The pre-launch menu is console-first. Provider, model, base URL, API key, and
56
- options are meant to be easy to review and change before Claude Code starts,
57
- including over SSH.
58
-
59
- macOS has not been fully tested by the maintainer yet, but Claude Any uses
60
- portable Python and shell wrappers. If you hit a macOS issue, please report it.
61
-
62
- - D. Yun
63
-
64
- ## Install
65
-
66
- [![npm version](https://img.shields.io/npm/v/@oneciel-ai/claude-any.svg)](https://www.npmjs.com/package/@oneciel-ai/claude-any)
67
- [![npm downloads](https://img.shields.io/npm/dm/@oneciel-ai/claude-any.svg)](https://www.npmjs.com/package/@oneciel-ai/claude-any)
68
-
69
- Requirements:
70
-
71
- - Python 3.10+
72
- - Claude Code installed and available as `claude`
73
- - Node/npm (used for the install shim and optional MCP web tooling)
74
-
75
- **Install from the npm registry (recommended):**
76
-
77
- ```sh
78
- npm install -g @oneciel-ai/claude-any
79
- claude-any
80
- ```
81
-
82
- **Upgrade:**
83
-
84
- ```sh
85
- npm update -g @oneciel-ai/claude-any
86
- claude-any version
87
- ```
88
-
89
- **Uninstall:**
90
-
91
- ```sh
92
- npm uninstall -g @oneciel-ai/claude-any
93
- ```
94
-
95
- ### Alternative install paths
96
-
97
- Install directly from the GitHub repository (useful for testing unreleased
98
- commits between npm publishes):
99
-
100
- ```sh
101
- npm install -g https://github.com/OneCielAI/claude-any.git
102
- claude-any
103
- ```
104
-
105
- POSIX source install:
106
-
107
- ```sh
108
- git clone https://github.com/OneCielAI/claude-any.git
109
- cd claude-any
110
- ./install.sh
111
- claude-any
112
- ```
113
-
114
- Windows PowerShell source install:
115
-
116
- ```powershell
117
- git clone https://github.com/OneCielAI/claude-any.git
118
- cd claude-any
119
- .\install.ps1
120
- claude-any
121
- ```
122
-
123
- ### Releasing (maintainers)
124
-
125
- The npm registry version is published automatically by the
126
- [`Publish to npm`](.github/workflows/npm-publish.yml) workflow when a new
127
- GitHub Release is published. The workflow needs an `NPM_TOKEN` repository
128
- secret containing a granular access token for `@oneciel-ai/claude-any` with
129
- *Bypass 2FA for publishing* enabled.
130
-
131
- Release flow:
132
-
133
- 1. Bump `version` in `package.json` and `VERSION` in `claude_any.py`.
134
- 2. Add a Changelog entry.
135
- 3. `git tag -a vX.Y.Z -m "..." && git push origin vX.Y.Z`.
136
- 4. `gh release create vX.Y.Z --title "..." --notes "..."` — this triggers
137
- the publish workflow.
138
-
139
-
140
- ![Claude Any menu](docs/assets/claude-any-main.en.png)
141
-
142
- ## Demo
143
-
144
- ![Claude Any demo](docs/assets/claude-any-demo.en.gif)
145
-
146
- The demo sequence now shows provider selection, Base URL entry, model selection,
147
- LLM options, and the compatibility test. The compatibility test checks a plain
148
- text response, a required `tool_use`, and a `tool_result` follow-up before the
149
- launcher recommends starting Claude Code.
150
-
151
- Additional current screenshots:
152
-
153
- | Provider | Base URL | Model | LLM options | Compatibility |
154
- | --- | --- | --- | --- | --- |
155
- | ![Provider](docs/assets/claude-any-provider.en.png) | ![Base URL](docs/assets/claude-any-base-url.en.png) | ![Model](docs/assets/claude-any-model.en.png) | ![Options](docs/assets/claude-any-options.en.png) | ![Test](docs/assets/claude-any-test.en.png) |
156
-
157
- See the [full manual](docs/manual.md) for provider setup, headless flags, and
158
- troubleshooting. A downloadable demo video is available at
159
- [docs/assets/claude-any-demo.mp4](docs/assets/claude-any-demo.mp4).
160
-
161
- ## Development Story
162
-
163
- Claude Any was built through real integration tests: provider switching, model
164
- discovery, API-key entry, compatibility tests, web-search tooling, timeout
165
- handling, and native Claude Code behavior. The main lesson was that
166
- Anthropic-compatible Messages endpoints are the cleanest integration path when a
167
- provider supports them. Ollama, vLLM, and NIM can expose Anthropic-compatible
168
- routes that preserve more of Claude Code's tooling model than a generic
169
- OpenAI-compatible chat route.
170
-
171
- Local inference was also tested with Qwen 3.6 27B Q4 through Ollama and vLLM on
172
- RTX 5090 and MSI GB10-class hardware. It worked, but the speed should not be
173
- judged against native Claude Code or Codex. In practice, some hosted/cloud
174
- choices such as NVIDIA NIM and Ollama Cloud felt more useful for this hybrid
175
- workflow than expected.
176
-
177
- OpenAI-compatible endpoints were deliberately kept out of the primary path for
178
- Claude Code use. In testing, tool-call translation through generic OpenAI chat
179
- compatibility was more brittle around tool parameters, tool results, repeated
180
- calls, retries, and model selection.
181
-
182
- The most recent vLLM finding is that server-side tool-call parsing must match
183
- the model family. For Claude Code, a vLLM server can be reachable and still fail
184
- if `--tool-call-parser` is wrong. In particular, Qwen3-Coder should be served
185
- with `--enable-auto-tool-choice --tool-call-parser qwen3_xml`; Hermes is for
186
- Hermes-style models and some older Qwen tool templates. Claude Any now surfaces
187
- this in the compatibility test instead of treating a simple text response as
188
- enough.
189
-
190
- ## Recommended Uses
191
-
192
- Claude Any is most useful where speed is less important than keeping background
193
- work moving. Good fits include Docker host maintenance, Windows or Linux system
194
- administration, cleanup scripts for unused files, periodic security checklists,
195
- log review, Windows Event Log review, intrusion-attempt triage, and report
196
- drafting.
197
-
198
- It is not a replacement for dedicated security products, but it can help
199
- administrators turn routine checks into repeatable scripts and readable reports.
200
- It is useful for summarizing possible virus, ransomware, brute-force, or
201
- remote-access intrusion attempts. In that sense, Claude Any can help you build a
202
- free or low-cost system security watcher for routine checks, alerts, and
203
- human-readable summaries.
204
-
205
- For example, it can help turn requests such as "install PostgreSQL in a Docker
206
- container" or "analyze today's Docker logs and email me a report" into concrete
207
- commands, scripts, scheduled jobs, and summaries.
208
-
209
- A practical pattern is tiered supervision: use smaller or cheaper models to
210
- watch logs and detect possible issues, use a larger model to review findings,
211
- write policy, and plan the response, then let smaller models execute routine
212
- steps under that larger model's supervision.
213
-
214
- ## Features
215
-
216
- - Pre-launch provider picker with English, Korean, Japanese, and Chinese UI.
217
- - Provider-aware model list and custom model entry.
218
- - API key entry outside the Claude Code chat input.
219
- - LLM option presets for context window, output tokens, timeout, sampling, and
220
- native compatibility.
221
- - Compatibility test before launch, including text response, tool use, and
222
- tool-result round trip checks.
223
- - Runtime context reporting for vLLM/NIM when `/v1/models` exposes
224
- `max_model_len`.
225
- - Console-first pre-launch menu for SSH and terminal workflows.
226
- - Native paths where providers expose Claude/Anthropic-compatible endpoints.
227
- - Router mode for providers that need request/response adaptation.
228
- - DuckDuckGo and fetch MCP wiring for non-native providers.
229
- - Headless setup flags such as `--ca-provider`, `--ca-model`, `--ca-base-url`,
230
- `--ca-api-key-env`, `--ca-ollama-option`, and `--ca-max-output-tokens`.
231
- - Streaming proxy for Ollama/Ollama Cloud router path — tokens are delivered
232
- to Claude Code as they arrive instead of waiting for the full response.
233
- - Per-provider `stream` on/off toggle and `stream_word_chunking` option to
234
- batch text deltas at word boundaries, mitigating SSE fragmentation that can
235
- break tool-call / JSON parsing in long streamed responses.
236
- - LLM options menu shows the meaning of the highlighted row at the bottom of
237
- the panel in the selected language (English, Korean, Japanese, Chinese), and
238
- boolean rows (`Stream`, `Stream word chunking`, `Native compatibility`,
239
- `Think`) toggle in place when you press Enter no input prompt.
240
- - Tool guard hook coverage extended to the full Claude Code hook surface,
241
- including `WorktreeCreate` / `WorktreeRemove`, so non-git working directories
242
- no longer fail Agent isolation with
243
- `Cannot create agent worktree: not in a git repository...`.
244
- - Config file caching — settings are read from disk once and reused until the
245
- file changes, reducing per-request overhead in the router.
246
- - Router control-plane endpoints for headless agent coordination:
247
- `/ca/chat/messages`, `/ca/chat/wait`, `/ca/chat/stream`, `/ca/chat/files`,
248
- and `/ca/plan/artifacts`.
249
-
1
+ # Claude Any
2
+
3
+ ![Claude Any: full Claude Code experience with free or low-cost LLMs](claude-any-adv.png)
4
+
5
+ | English | [한국어](docs/README.ko.md) | [日本語](docs/README.ja.md) | [中文](docs/README.zh.md) |
6
+ | --- | --- | --- | --- |
7
+
8
+ [![npm version](https://img.shields.io/npm/v/@oneciel-ai/claude-any?logo=npm&label=npm)](https://www.npmjs.com/package/@oneciel-ai/claude-any)
9
+ [![npm downloads](https://img.shields.io/npm/dm/@oneciel-ai/claude-any?logo=npm&label=downloads)](https://www.npmjs.com/package/@oneciel-ai/claude-any)
10
+ [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
11
+
12
+ > ## 🚀 Use the full Claude Code experience with free or low-cost LLMs
13
+ >
14
+ > - **Free** — [NVIDIA hosted NIM](https://build.nvidia.com/) (qwen3-coder-480b, gpt-oss, and friends) through the API Catalog.
15
+ > - **Low-cost** — [Ollama Cloud](https://ollama.com/cloud) for GLM, Qwen, DeepSeek, and other open-weight models at a fraction of frontier-model pricing.
16
+ > - **Free + local** [Ollama](https://ollama.com/) or [vLLM](https://github.com/vllm-project/vllm) on your own GPU, fully offline.
17
+ > - **Plan Mode + Advisor ready** — Claude Any preserves Claude Code Plan Mode on non-Anthropic providers and adds an optional long-context Advisor model for review.
18
+ > - **Smooth free-model pacing** — Claude Code spends time reading files and running tools, and Claude Any uses that natural gap for RPM pacing so NVIDIA hosted free models feel usable even with strict per-minute limits.
19
+ >
20
+ > Provider, model, base URL, API key, streaming behavior, and LLM options are all selected from a console menu **before** Claude Code starts. Claude Code itself runs untouched with all of its native tooling, slash commands, and workflows.
21
+
22
+ ## Today's Top 3 Benefits
23
+
24
+ 1. **Plan Mode works on non-Anthropic models** — Claude Any keeps Claude Code's Plan Mode usable even when the upstream provider is NVIDIA hosted, Ollama Cloud, local Ollama, vLLM, or NIM.
25
+ 2. **Advisor review with a bigger model** — pick a long-context Advisor Model at launch, then use `/advisor` inside Claude Code to review the current task, blockers, and next concrete action.
26
+ 3. **Free-model RPM limits feel smoother** — router-side RPM pacing uses the natural time spent reading files and running tools, so NVIDIA hosted free models can stay within per-minute limits with less visible waiting.
27
+
28
+ ### Demo
29
+
30
+ ![NVIDIA hosted NIM driving Claude Code (deepseek-4-flash)](docs/assets/claude-any-nvidia-nim.gif)
31
+
32
+ NVIDIA hosted NIM (deepseek-4-flash) driving Claude Code through the claude-any router.  [full mp4 ⤓](https://github.com/OneCielAI/claude-any/raw/main/demo/claude-any-nvidia-nim.mp4)
33
+
34
+ ![Ollama Cloud streamed through the claude-any router (glm-5.1)](docs/assets/claude-any-ollama-cloud.gif)
35
+
36
+ Ollama Cloud (glm-5.1) streamed through the claude-any router with SSE word-boundary chunking enabled.  [full mp4 ⤓](https://github.com/OneCielAI/claude-any/raw/main/demo/claude-any-ollama-cloud.mp4)
37
+
38
+ ---
39
+
40
+ Claude Any is a provider selector and compatibility launcher for Claude Code.
41
+ It lets you choose Anthropic, Ollama, Ollama Cloud, vLLM, NVIDIA hosted models,
42
+ or self-hosted NIM before Claude Code starts, then passes normal Claude Code
43
+ arguments through unchanged.
44
+
45
+ Credits: One Ciel LLC
46
+
47
+ Current version: `0.1.29`
48
+
49
+ ## Why This Exists
50
+
51
+ Claude Any started from a practical need: even on the highest Claude Code plan,
52
+ long sessions can run out of available tokens or become blocked while waiting
53
+ for the next quota window. The goal is not to replace Claude Code, but to keep
54
+ work moving. Slower but usable providers such as NVIDIA NIM, Ollama Cloud,
55
+ vLLM, and local Ollama can act as hybrid third-party agents for summaries,
56
+ research, journaling, simple coding tasks, and delegated background work.
57
+
58
+ Another design goal is to keep as much of Claude Code's native experience as
59
+ possible. When a provider exposes an Anthropic-compatible endpoint, Claude Any
60
+ prefers that path so Claude Code tooling, permissions, model selection, and
61
+ workflow behavior remain close to the original. For capabilities that remote
62
+ providers cannot supply directly, such as web search, Claude Any adds separate
63
+ MCP-based tooling.
64
+
65
+ The pre-launch menu is console-first. Provider, model, base URL, API key, and
66
+ options are meant to be easy to review and change before Claude Code starts,
67
+ including over SSH.
68
+
69
+ macOS has not been fully tested by the maintainer yet, but Claude Any uses
70
+ portable Python and shell wrappers. If you hit a macOS issue, please report it.
71
+
72
+ - D. Yun
73
+
74
+ ## Install
75
+
76
+ [![npm version](https://img.shields.io/npm/v/@oneciel-ai/claude-any.svg)](https://www.npmjs.com/package/@oneciel-ai/claude-any)
77
+ [![npm downloads](https://img.shields.io/npm/dm/@oneciel-ai/claude-any.svg)](https://www.npmjs.com/package/@oneciel-ai/claude-any)
78
+
79
+ Requirements:
80
+
81
+ - Python 3.10+
82
+ - Claude Code installed and available as `claude`
83
+ - Node/npm (used for the install shim and optional MCP web tooling)
84
+
85
+ **Install from the npm registry (recommended):**
86
+
87
+ ```sh
88
+ npm install -g @oneciel-ai/claude-any
89
+ ```
90
+
91
+ ```sh
92
+ claude-any
93
+ ```
94
+
95
+ **Upgrade:**
96
+
97
+ ```sh
98
+ npm update -g @oneciel-ai/claude-any
99
+ ```
100
+
101
+ ```sh
102
+ claude-any version
103
+ ```
104
+
105
+ **Uninstall:**
106
+
107
+ ```sh
108
+ npm uninstall -g @oneciel-ai/claude-any
109
+ ```
110
+
111
+ ### Alternative install paths
112
+
113
+ Install directly from the GitHub repository (useful for testing unreleased
114
+ commits between npm publishes):
115
+
116
+ ```sh
117
+ npm install -g https://github.com/OneCielAI/claude-any.git
118
+ ```
119
+
120
+ ```sh
121
+ claude-any
122
+ ```
123
+
124
+ POSIX source install:
125
+
126
+ ```sh
127
+ git clone https://github.com/OneCielAI/claude-any.git
128
+ ```
129
+
130
+ ```sh
131
+ cd claude-any
132
+ ```
133
+
134
+ ```sh
135
+ ./install.sh
136
+ ```
137
+
138
+ ```sh
139
+ claude-any
140
+ ```
141
+
142
+ Windows PowerShell source install:
143
+
144
+ ```powershell
145
+ git clone https://github.com/OneCielAI/claude-any.git
146
+ ```
147
+
148
+ ```powershell
149
+ cd claude-any
150
+ ```
151
+
152
+ ```powershell
153
+ .\install.ps1
154
+ ```
155
+
156
+ ```powershell
157
+ claude-any
158
+ ```
159
+
160
+ ### Releasing (maintainers)
161
+
162
+ The npm registry version is published automatically by the
163
+ [`Publish to npm`](.github/workflows/npm-publish.yml) workflow when a new
164
+ GitHub Release is published. The workflow needs an `NPM_TOKEN` repository
165
+ secret containing a granular access token for `@oneciel-ai/claude-any` with
166
+ *Bypass 2FA for publishing* enabled.
167
+
168
+ Release flow:
169
+
170
+ 1. Bump `version` in `package.json` and `VERSION` in `claude_any.py`.
171
+ 2. Add a Changelog entry.
172
+ 3. `git tag -a vX.Y.Z -m "..." && git push origin vX.Y.Z`.
173
+ 4. `gh release create vX.Y.Z --title "..." --notes "..."` this triggers
174
+ the publish workflow.
175
+
176
+
177
+ ![Claude Any menu](docs/assets/claude-any-main.en.png)
178
+
179
+ ## Demo
180
+
181
+ ![Claude Any demo](docs/assets/claude-any-demo.en.gif)
182
+
183
+ The demo sequence now shows provider selection, Base URL entry, model selection,
184
+ LLM options, and the compatibility test. The compatibility test checks a plain
185
+ text response, a required `tool_use`, and a `tool_result` follow-up before the
186
+ launcher recommends starting Claude Code.
187
+
188
+ Additional current screenshots:
189
+
190
+ | Provider | Base URL | Model | LLM options | Compatibility |
191
+ | --- | --- | --- | --- | --- |
192
+ | ![Provider](docs/assets/claude-any-provider.en.png) | ![Base URL](docs/assets/claude-any-base-url.en.png) | ![Model](docs/assets/claude-any-model.en.png) | ![Options](docs/assets/claude-any-options.en.png) | ![Test](docs/assets/claude-any-test.en.png) |
193
+
194
+ See the [full manual](docs/manual.md) for provider setup, headless flags, and
195
+ troubleshooting. A downloadable demo video is available at
196
+ [docs/assets/claude-any-demo.mp4](docs/assets/claude-any-demo.mp4).
197
+
198
+ ## Development Story
199
+
200
+ Claude Any was built through real integration tests: provider switching, model
201
+ discovery, API-key entry, compatibility tests, web-search tooling, timeout
202
+ handling, and native Claude Code behavior. The main lesson was that
203
+ Anthropic-compatible Messages endpoints are the cleanest integration path when a
204
+ provider supports them. Ollama, vLLM, and NIM can expose Anthropic-compatible
205
+ routes that preserve more of Claude Code's tooling model than a generic
206
+ OpenAI-compatible chat route.
207
+
208
+ Local inference was also tested with Qwen 3.6 27B Q4 through Ollama and vLLM on
209
+ RTX 5090 and MSI GB10-class hardware. It worked, but the speed should not be
210
+ judged against native Claude Code or Codex. In practice, some hosted/cloud
211
+ choices such as NVIDIA NIM and Ollama Cloud felt more useful for this hybrid
212
+ workflow than expected.
213
+
214
+ OpenAI-compatible endpoints were deliberately kept out of the primary path for
215
+ Claude Code use. In testing, tool-call translation through generic OpenAI chat
216
+ compatibility was more brittle around tool parameters, tool results, repeated
217
+ calls, retries, and model selection.
218
+
219
+ The most recent vLLM finding is that server-side tool-call parsing must match
220
+ the model family. For Claude Code, a vLLM server can be reachable and still fail
221
+ if `--tool-call-parser` is wrong. In particular, Qwen3-Coder should be served
222
+ with `--enable-auto-tool-choice --tool-call-parser qwen3_xml`; Hermes is for
223
+ Hermes-style models and some older Qwen tool templates. Claude Any now surfaces
224
+ this in the compatibility test instead of treating a simple text response as
225
+ enough.
226
+
227
+ ## Recommended Uses
228
+
229
+ Claude Any is most useful where speed is less important than keeping background
230
+ work moving. Good fits include Docker host maintenance, Windows or Linux system
231
+ administration, cleanup scripts for unused files, periodic security checklists,
232
+ log review, Windows Event Log review, intrusion-attempt triage, and report
233
+ drafting.
234
+
235
+ It is not a replacement for dedicated security products, but it can help
236
+ administrators turn routine checks into repeatable scripts and readable reports.
237
+ It is useful for summarizing possible virus, ransomware, brute-force, or
238
+ remote-access intrusion attempts. In that sense, Claude Any can help you build a
239
+ free or low-cost system security watcher for routine checks, alerts, and
240
+ human-readable summaries.
241
+
242
+ For example, it can help turn requests such as "install PostgreSQL in a Docker
243
+ container" or "analyze today's Docker logs and email me a report" into concrete
244
+ commands, scripts, scheduled jobs, and summaries.
245
+
246
+ A practical pattern is tiered supervision: use smaller or cheaper models to
247
+ watch logs and detect possible issues, use a larger model to review findings,
248
+ write policy, and plan the response, then let smaller models execute routine
249
+ steps under that larger model's supervision.
250
+
251
+ ## Features
252
+
253
+ - Pre-launch provider picker with English, Korean, Japanese, and Chinese UI.
254
+ - Provider-aware model list and custom model entry.
255
+ - API key entry outside the Claude Code chat input.
256
+ - LLM option presets for context window, output tokens, timeout, sampling, and
257
+ native compatibility.
258
+ - Compatibility test before launch, including text response, tool use, and
259
+ tool-result round trip checks.
260
+ - Runtime context reporting for vLLM/NIM when `/v1/models` exposes
261
+ `max_model_len`.
262
+ - Console-first pre-launch menu for SSH and terminal workflows.
263
+ - Native paths where providers expose Claude/Anthropic-compatible endpoints.
264
+ - Router mode for providers that need request/response adaptation.
265
+ - DuckDuckGo and fetch MCP wiring for non-native providers.
266
+ - Headless setup flags such as `--ca-provider`, `--ca-model`, `--ca-base-url`,
267
+ `--ca-api-key-env`, `--ca-ollama-option`, and `--ca-max-output-tokens`.
268
+ - Claude Code Plan Mode support on router-backed non-Anthropic providers,
269
+ including local handling for `EnterPlanMode` and plan artifacts.
270
+ - Optional `/advisor` slash command that routes the current task state to a
271
+ selected Advisor Model, useful for long-context review and next-step checks.
272
+ - Claude Code `statusLine` integration showing router RPM usage and wait time
273
+ in the bottom status area instead of polluting the chat transcript.
274
+ - Router-side RPM control for NVIDIA hosted, self-hosted NIM, Ollama, and
275
+ Ollama Cloud. `rate_limit_rpm=0` disables throttling while still showing the
276
+ last-60-seconds usage rate.
277
+ - Soft pacing subtracts time already spent reading files, running commands, and
278
+ waiting for tool results. In real coding sessions, those tool-call gaps absorb
279
+ much of the RPM spacing naturally, so providers such as NVIDIA hosted NIM can
280
+ stay within free-model limits without making every Claude Code turn feel
281
+ rate-limited.
282
+ - Streaming proxy for Ollama/Ollama Cloud router path — tokens are delivered
283
+ to Claude Code as they arrive instead of waiting for the full response.
284
+ - Per-provider `stream` on/off toggle and `stream_word_chunking` option to
285
+ batch text deltas at word boundaries, mitigating SSE fragmentation that can
286
+ break tool-call / JSON parsing in long streamed responses.
287
+ - LLM options menu shows the meaning of the highlighted row at the bottom of
288
+ the panel in the selected language (English, Korean, Japanese, Chinese), and
289
+ boolean rows (`Stream`, `Stream word chunking`, `Native compatibility`,
290
+ `Think`) toggle in place when you press Enter — no input prompt.
291
+ - Tool guard hook coverage extended to the full Claude Code hook surface,
292
+ including `WorktreeCreate` / `WorktreeRemove`, so non-git working directories
293
+ no longer fail Agent isolation with
294
+ `Cannot create agent worktree: not in a git repository...`.
295
+ - Config file caching — settings are read from disk once and reused until the
296
+ file changes, reducing per-request overhead in the router.
297
+ - Router control-plane endpoints for headless agent coordination:
298
+ `/ca/chat/messages`, `/ca/chat/wait`, `/ca/chat/stream`, `/ca/chat/files`,
299
+ and `/ca/plan/artifacts`.
300
+
250
301
  ## Changelog
251
302
 
252
- ### 0.1.27
253
-
254
- - **Plan mode support for non-Anthropic providers**: the router now keeps
255
- `EnterPlanMode` available and supports Claude Code Plan mode even when the
256
- upstream model does not reliably choose that internal tool. Forced
257
- `tool_choice=EnterPlanMode` is answered locally with a valid Anthropic
258
- `tool_use`, and long implementation requests that receive only a short or
259
- empty non-actionable text response are promoted to `EnterPlanMode` using
260
- language-agnostic structure checks.
261
- - **Plan-mode self-tool handling**: unsupported Claude Code self-tools are
262
- still stripped for non-Anthropic providers, but Plan-mode tools are handled
263
- separately so planning can work instead of being disabled.
264
-
265
- ### 0.1.25
266
-
267
- - **Plan-mode diagnostics**: set `~/.config/claude-any/log-level` to `TRACE`
268
- to capture redacted request and response summaries in `requests.jsonl` /
269
- `responses.jsonl`.
270
- - **Headless agent chat service**: the router exposes a small HTTP control
271
- plane for sub coding agents. Agents can post messages, poll updates after
272
- the last seen message id, or wait on an SSE stream when they do not have
273
- their own loop.
274
- - **Plan artifact serving**: agents can create plan files through the router
275
- and share stable local URLs, matching Claude Code's file/artifact-oriented
276
- Plan-mode workflow without copying Anthropic's internal implementation.
277
-
278
- ### 0.1.24
279
-
280
- - **First public npm release** under the correct scope: `@oneciel-ai/claude-any`. Earlier 0.1.x versions were never published to the registry; this is the version that is actually installable via `npm install -g @oneciel-ai/claude-any`.
281
-
282
- ### 0.1.23
283
-
284
- - **Stream toggle**: each non-Anthropic provider now has a `stream_enabled`
285
- knob in the LLM options menu, in `claude-anyctl ollama-options` /
286
- `provider-options`, and in headless flags. When off, the router forces
287
- `stream:false` upstream and returns the full response to Claude Code — a
288
- workaround when streaming fragmentation breaks tool-call or JSON parsing.
289
- - **Word-boundary streaming**: new `stream_word_chunking` option buffers SSE
290
- text deltas to whitespace/word boundaries before flushing. Implemented for
291
- both the Ollama router path and the native pass-through path (vLLM, NVIDIA
292
- hosted, self-hosted NIM). Tool deltas and non-text SSE events pass through
293
- unchanged.
294
- - **All-hooks tool guard**: `install_tool_guard_hooks` now registers the full
295
- set of Claude Code hook events (PreToolUse, PostToolUse, PostToolUseFailure,
296
- PostToolBatch, PermissionRequest, PermissionDenied, SessionStart/End, Setup,
297
- UserPromptSubmit/Expansion, Stop, StopFailure, InstructionsLoaded,
298
- ConfigChange, CwdChanged, Notification, SubagentStart/Stop, TeammateIdle,
299
- TaskCreated, TaskCompleted, PreCompact, PostCompact, WorktreeCreate,
300
- WorktreeRemove, Elicitation, ElicitationResult). The WorktreeCreate handler
301
- emits `worktreePath = base_path` so Agent isolation works in non-git
302
- directories.
303
- - **Windows hook compatibility**: `shell_command_string` now emits forward
304
- slashes and POSIX quoting on Windows so Claude Code's sh-based hook runner
305
- doesn't strip backslashes from paths like `C:\Users\...`.
306
- - **LLM options UX**: per-row description footer rendered in the user's
307
- selected language, and boolean toggles (`Stream`, `Stream word chunking`,
308
- `Native compatibility`, `Think`) flip on Enter without a prompt.
309
-
310
- ### 0.1.22
311
-
312
- - **Headless manual expansion**: expand the manual with practical headless setup, launch, testing, passthrough, and cleanup examples for automation and remote-server use.
313
-
314
- ### 0.1.21
315
-
316
- - **Service lifecycle documentation**: clarify that Claude Any starts only the router/proxy services required for the selected provider at launch time, and `claude-any stop` is the explicit cleanup command.
317
-
318
- ### 0.1.20
319
-
320
- - **NVIDIA hosted quick test**: `auto` mode now uses a text-only quick test for NVIDIA hosted providers, avoiding slow or flaky tool_use requests during menu checks. Use `smoke` for text + tool_use, or `full` for the full text/tool_use/tool_result round trip.
321
- - **Menu test timeout**: the terminal menu now runs `claude-any test 60 auto`, which keeps the pre-launch test responsive for hosted models.
322
-
323
- ### 0.1.19
324
-
325
- - **Faster compatibility tests**: `claude-any test` now supports `auto`, `smoke`, and `full` modes.
326
- - **Menu default speedup**: the terminal menu runs `claude-any test 120 auto`, so NVIDIA hosted compatibility checks finish faster while full validation remains available with `claude-any test 180 full`.
327
-
328
- ### 0.1.18
329
-
330
- - **NVIDIA hosted transient diagnostics**: compatibility tests now identify `RemoteDisconnected`, connection resets, and 502/503/504 responses from NVIDIA hosted backends as transient upstream/API Catalog failures.
331
- - **NVIDIA proxy cleanup**: `claude-any stop` now also matches `nvd-claude-proxy` executable processes so stale proxy sessions are cleaned up reliably.
332
-
333
- ### 0.1.17
334
-
335
- - **Menu compatibility-test timeout**: the terminal menu now runs compatibility tests with an explicit 180 s timeout and stops the child process if it exceeds the menu hard limit, so slow hosted models cannot leave the menu appearing indefinitely pending.
336
-
337
- ### 0.1.16
338
-
339
- - **NVIDIA hosted proxy startup fix**: detect and launch an installed `nvd-claude-proxy`/`ncp` executable before falling back to `python -m nvd_claude_proxy.main`. This supports uv-tool installs where the proxy is available as a command but not importable from Claude Any's Python interpreter.
340
-
341
- ### 0.1.15
342
-
343
- - **Ollama/Ollama Cloud tool-call streaming fix**: emit streamed tool calls using sequential Anthropic SSE content block indexes and `input_json_delta` payloads. This prevents Claude Code from rejecting malformed streamed tool-use blocks with `Invalid tool parameters`.
344
- - **Tool guard auto-install**: non-Anthropic launches now merge the Claude Any tool guard into `~/.claude/settings.json` so generated tool inputs can be normalized before execution.
345
- - **Tool-call diagnostics**: router-side tool calls are logged to `~/.config/claude-any/tool-calls.jsonl`, and Claude Code hook inputs are logged to `~/.claude/claude-any-tool-guard/tool-events.jsonl` for precise debugging.
346
- - **Tool input normalization**: the guard now maps common aliases such as `path` to `file_path`, `cmd` to `command`, and `query` to `pattern`, and returns explicit guidance when required fields are missing.
347
-
348
- ### 0.1.14
349
-
350
- - **SSH/terminal arrow-key compatibility**: rewrote `read_menu_key()` with a proper ANSI escape sequence parser and moved raw terminal setup into `portable_select()` so the terminal stays in raw mode for the entire menu loop. This prevents escape sequences from leaking to the screen when `ECHO` is restored between keystrokes. Arrow keys, Home, and End now work reliably in SSH sessions.
351
- - **Test timeout**: default compatibility test timeout increased from 60 s to 120 s for slower cloud providers.
352
- - **Ollama Cloud compatibility test fix**: added `"stream": false` to compatibility test requests so the router fetches a single JSON response from Ollama Cloud instead of SSE streaming, which was causing `post_json` to timeout while collecting all chunks.
353
-
354
- ### 0.1.13
355
-
356
- - **Ollama streaming proxy**: The router now streams Ollama and Ollama Cloud
357
- responses through to Claude Code in real time using Anthropic SSE format,
358
- instead of buffering the entire response before delivery.
359
- - **Config caching**: `load_config()` now caches the configuration file in
360
- memory and only re-reads from disk when the file modification time changes.
361
- This eliminates repeated file I/O and JSON parsing on every router request.
362
- - **Token estimation caching**: `estimate_tokens()` now accepts an optional
363
- cache dict to avoid redundant `json.dumps()` calls within a single request.
364
- `ollama_chat_request` and `cap_output_tokens_for_context` share the same
365
- cache when computing context window sizing.
366
-
367
- ### 0.1.12
368
-
369
- - Refresh docs and demo assets.
370
-
371
- ### 0.1.11
372
-
373
- - Validate tool call compatibility.
374
-
375
- ### 0.1.10
376
-
377
- - Show runtime context in tests.
378
-
379
- ### 0.1.9
380
-
381
- - Cap presets to server context.
382
-
383
- ### 0.1.8
384
-
385
- - Localize LLM presets.
386
-
387
- ## Provider Notes
388
-
389
- | Provider | Mode | Notes |
390
- | --- | --- | --- |
391
- | Anthropic | Native Claude Code | Uses Claude login or Anthropic API key. |
392
- | Ollama | Native when available, router otherwise | Local Ollama normally needs no API key. Cloud models through local Ollama require `ollama signin` on the Ollama host. |
393
- | Ollama Cloud | Router | Calls `https://ollama.com/api`; requires an Ollama API key. |
394
- | vLLM | Native Anthropic-compatible endpoint | Use a vLLM endpoint that exposes Anthropic-compatible `/v1/messages`; match `--tool-call-parser` to the model family. |
395
- | NVIDIA hosted | Router/proxy | Uses NVIDIA hosted API through the compatibility path. |
396
- | self-hosted NIM | Native Anthropic-compatible endpoint | Use the self-hosted NIM Anthropic-compatible endpoint. |
397
-
398
- ## Service Lifecycle
399
-
400
- Claude Any does not keep every possible backend helper running all the time. The
401
- normal lifecycle is:
402
-
403
- - Before launch, managed router/proxy processes can be stopped with
404
- `claude-any stop`.
405
- - When `claude-any` starts Claude Code, it starts only the services required by
406
- the selected provider.
407
- - Ollama and Ollama Cloud router mode use the Claude Any router on
408
- `127.0.0.1:8799`.
409
- - NVIDIA hosted router mode uses the Claude Any router on `127.0.0.1:8799` and
410
- starts `nvd-claude-proxy` on `127.0.0.1:8788` only when that provider needs it.
411
- - Switching away from NVIDIA hosted does not require keeping the NVIDIA proxy
412
- alive; stale sessions should be cleaned with `claude-any stop` before a fresh
413
- test or launch.
414
-
415
- This keeps Claude Code pointed at one stable Claude Any entry point while still
416
- letting provider-specific helpers start on demand.
417
-
418
- For Qwen3-Coder on vLLM, start the server with a matching tool parser:
419
-
420
- ```sh
421
- vllm serve Qwen/Qwen3-Coder-30B-A3B-Instruct \
422
- --host 0.0.0.0 \
423
- --port 8000 \
424
- --served-model-name qwen3-coder-30b \
425
- --max-model-len 65536 \
426
- --enable-auto-tool-choice \
427
- --tool-call-parser qwen3_xml
428
- ```
429
-
430
- ## Provider Links
431
-
432
- - Ollama Cloud: [cloud overview](https://ollama.com/cloud), [API key settings](https://ollama.com/settings/keys), [authentication docs](https://docs.ollama.com/api/authentication).
433
- - Ollama local Anthropic compatibility: [Ollama Anthropic API docs](https://docs.ollama.com/api/anthropic-compatibility).
434
- - vLLM: [Claude Code integration](https://docs.vllm.ai/en/latest/serving/integrations/claude_code/), [tool calling](https://docs.vllm.ai/en/stable/features/tool_calling/), [project GitHub](https://github.com/vllm-project/vllm).
435
- - NVIDIA hosted NIM: [NVIDIA API Catalog](https://build.nvidia.com/), [API Catalog quickstart](https://docs.api.nvidia.com/nim/docs/api-quickstart).
436
- - Self-hosted NVIDIA NIM: [Claude Code with NIM](https://docs.nvidia.com/nim/large-language-models/latest/ai-assistant-integrations/claude-code.html), [NIM for LLMs getting started](https://docs.nvidia.com/nim/large-language-models/1.14.0/getting-started.html), [NGC personal keys](https://org.ngc.nvidia.com/setup/personal-keys).
437
-
438
- ## Headless Examples
439
-
440
- Headless commands skip the pre-launch menu and launch Claude Code immediately.
441
- Claude Any consumes `--ca-*` setup flags, starts the required router/proxy
442
- services, then passes the remaining arguments to Claude Code.
443
-
444
- ```sh
445
- claude-any --ca-provider ollama-cloud --ca-model glm-5.1
446
- claude-any --ca-provider ollama --ca-base-url http://127.0.0.1:11434 --ca-model qwen3-coder
447
- claude-any --ca-provider ollama-cloud --ca-api-key-env OLLAMA_API_KEY --ca-model qwen3-coder:480b:cloud
448
- claude-any --ca-provider vllm --ca-base-url http://127.0.0.1:8000 --ca-model Qwen/Qwen3-Coder
449
- claude-any --ca-no-update-check -p "Reply with OK only." --output-format text
450
- ```
451
-
452
- All other arguments are passed through to Claude Code.
453
-
454
- ## Headless Agent Chat
455
-
456
- When the claude-any router is running, sub agents can coordinate through local
457
- HTTP endpoints without opening the menu:
458
-
459
- ```sh
460
- # Send a message to a channel.
461
- curl -s http://127.0.0.1:8799/ca/chat/messages \
462
- -H 'content-type: application/json' \
463
- -d '{"channel":"agents","sender_id":"codex","recipients":["kimi"],"message":"Need logs after id 42"}'
464
-
465
- # Poll updates after the last message id.
466
- curl -s 'http://127.0.0.1:8799/ca/chat/messages?channel=agents&recipient=kimi&after=42'
467
-
468
- # Wait on a stream until messages arrive.
469
- curl -N 'http://127.0.0.1:8799/ca/chat/stream?channel=agents&recipient=kimi&after=42&timeout=300'
470
-
471
- # Publish a plan file and get a served URL.
472
- curl -s http://127.0.0.1:8799/ca/plan/artifacts \
473
- -H 'content-type: application/json' \
474
- -d '{"title":"handoff","name":"handoff.md","content":"# Plan\n- step 1"}'
475
- ```
476
-
477
- ## Security
478
-
479
- Do not commit runtime configuration or API keys. Claude Any stores local runtime
480
- configuration under `~/.config/claude-any/`. NVIDIA hosted credentials used by
481
- the optional proxy are stored under `~/.config/nvd-claude-proxy/`.
482
-
483
- This repository should contain source, documentation, and demo assets only.
484
-
485
- ## Development
486
-
487
- ```sh
488
- python -m py_compile claude_any.py claude-any-menu.py claude-any-tool-guard.py
489
- python -m ruff check .
490
- python scripts/make_demo_assets.py
491
- ```
492
-
493
- ## License
494
-
495
- MIT. See [LICENSE](LICENSE).
303
+ ### 0.1.29
304
+
305
+ - **NVIDIA compatibility test fix**: `claude-any test` now restarts the local
306
+ router before router-mode tests, so upgraded installs do not accidentally use
307
+ an old long-running router that still expects `nvd-claude-proxy`.
308
+ - **Clear NVIDIA router wording**: menu status now describes NVIDIA hosted as
309
+ using the local claude-any router instead of the retired local proxy path.
310
+
311
+ ### 0.1.28
312
+
313
+ - **Plan Mode + Advisor headline**: document Claude Any's Plan Mode support for
314
+ router-backed non-Anthropic providers and the `/advisor` slash command backed
315
+ by a selected long-context Advisor Model.
316
+ - **Status-line RPM telemetry**: Claude Any installs a Claude Code `statusLine`
317
+ command that shows router RPM usage and the latest wait time in the bottom
318
+ status area, keeping rate-limit telemetry out of the chat transcript.
319
+ - **Soft RPM pacing for free hosted models**: NVIDIA hosted, self-hosted NIM,
320
+ Ollama, and Ollama Cloud can use router-side RPM pacing. The pacing subtracts
321
+ time already spent in file reads, command execution, and tool-result waits, so
322
+ normal coding tool-call gaps naturally absorb much of the RPM spacing.
323
+ - **Unlimited usage display**: `rate_limit_rpm=0` disables throttling while
324
+ still displaying the last-60-seconds request rate.
325
+
326
+ ### 0.1.27
327
+
328
+ - **Plan mode support for non-Anthropic providers**: the router now keeps
329
+ `EnterPlanMode` available and supports Claude Code Plan mode even when the
330
+ upstream model does not reliably choose that internal tool. Forced
331
+ `tool_choice=EnterPlanMode` is answered locally with a valid Anthropic
332
+ `tool_use`, and long implementation requests that receive only a short or
333
+ empty non-actionable text response are promoted to `EnterPlanMode` using
334
+ language-agnostic structure checks.
335
+ - **Plan-mode self-tool handling**: unsupported Claude Code self-tools are
336
+ still stripped for non-Anthropic providers, but Plan-mode tools are handled
337
+ separately so planning can work instead of being disabled.
338
+
339
+ ### 0.1.25
340
+
341
+ - **Plan-mode diagnostics**: set `~/.config/claude-any/log-level` to `TRACE`
342
+ to capture redacted request and response summaries in `requests.jsonl` /
343
+ `responses.jsonl`.
344
+ - **Headless agent chat service**: the router exposes a small HTTP control
345
+ plane for sub coding agents. Agents can post messages, poll updates after
346
+ the last seen message id, or wait on an SSE stream when they do not have
347
+ their own loop.
348
+ - **Plan artifact serving**: agents can create plan files through the router
349
+ and share stable local URLs, matching Claude Code's file/artifact-oriented
350
+ Plan-mode workflow without copying Anthropic's internal implementation.
351
+
352
+ ### 0.1.24
353
+
354
+ - **First public npm release** under the correct scope: `@oneciel-ai/claude-any`. Earlier 0.1.x versions were never published to the registry; this is the version that is actually installable via `npm install -g @oneciel-ai/claude-any`.
355
+
356
+ ### 0.1.23
357
+
358
+ - **Stream toggle**: each non-Anthropic provider now has a `stream_enabled`
359
+ knob in the LLM options menu, in `claude-anyctl ollama-options` /
360
+ `provider-options`, and in headless flags. When off, the router forces
361
+ `stream:false` upstream and returns the full response to Claude Code — a
362
+ workaround when streaming fragmentation breaks tool-call or JSON parsing.
363
+ - **Word-boundary streaming**: new `stream_word_chunking` option buffers SSE
364
+ text deltas to whitespace/word boundaries before flushing. Implemented for
365
+ both the Ollama router path and the native pass-through path (vLLM, NVIDIA
366
+ hosted, self-hosted NIM). Tool deltas and non-text SSE events pass through
367
+ unchanged.
368
+ - **All-hooks tool guard**: `install_tool_guard_hooks` now registers the full
369
+ set of Claude Code hook events (PreToolUse, PostToolUse, PostToolUseFailure,
370
+ PostToolBatch, PermissionRequest, PermissionDenied, SessionStart/End, Setup,
371
+ UserPromptSubmit/Expansion, Stop, StopFailure, InstructionsLoaded,
372
+ ConfigChange, CwdChanged, Notification, SubagentStart/Stop, TeammateIdle,
373
+ TaskCreated, TaskCompleted, PreCompact, PostCompact, WorktreeCreate,
374
+ WorktreeRemove, Elicitation, ElicitationResult). The WorktreeCreate handler
375
+ emits `worktreePath = base_path` so Agent isolation works in non-git
376
+ directories.
377
+ - **Windows hook compatibility**: `shell_command_string` now emits forward
378
+ slashes and POSIX quoting on Windows so Claude Code's sh-based hook runner
379
+ doesn't strip backslashes from paths like `C:\Users\...`.
380
+ - **LLM options UX**: per-row description footer rendered in the user's
381
+ selected language, and boolean toggles (`Stream`, `Stream word chunking`,
382
+ `Native compatibility`, `Think`) flip on Enter without a prompt.
383
+
384
+ ### 0.1.22
385
+
386
+ - **Headless manual expansion**: expand the manual with practical headless setup, launch, testing, passthrough, and cleanup examples for automation and remote-server use.
387
+
388
+ ### 0.1.21
389
+
390
+ - **Service lifecycle documentation**: clarify that Claude Any starts only the router/proxy services required for the selected provider at launch time, and `claude-any stop` is the explicit cleanup command.
391
+
392
+ ### 0.1.20
393
+
394
+ - **NVIDIA hosted quick test**: `auto` mode now uses a text-only quick test for NVIDIA hosted providers, avoiding slow or flaky tool_use requests during menu checks. Use `smoke` for text + tool_use, or `full` for the full text/tool_use/tool_result round trip.
395
+ - **Menu test timeout**: the terminal menu now runs `claude-any test 60 auto`, which keeps the pre-launch test responsive for hosted models.
396
+
397
+ ### 0.1.19
398
+
399
+ - **Faster compatibility tests**: `claude-any test` now supports `auto`, `smoke`, and `full` modes.
400
+ - **Menu default speedup**: the terminal menu runs `claude-any test 120 auto`, so NVIDIA hosted compatibility checks finish faster while full validation remains available with `claude-any test 180 full`.
401
+
402
+ ### 0.1.18
403
+
404
+ - **NVIDIA hosted transient diagnostics**: compatibility tests now identify `RemoteDisconnected`, connection resets, and 502/503/504 responses from NVIDIA hosted backends as transient upstream/API Catalog failures.
405
+ - **NVIDIA proxy cleanup**: `claude-any stop` now also matches `nvd-claude-proxy` executable processes so stale proxy sessions are cleaned up reliably.
406
+
407
+ ### 0.1.17
408
+
409
+ - **Menu compatibility-test timeout**: the terminal menu now runs compatibility tests with an explicit 180 s timeout and stops the child process if it exceeds the menu hard limit, so slow hosted models cannot leave the menu appearing indefinitely pending.
410
+
411
+ ### 0.1.16
412
+
413
+ - **NVIDIA hosted proxy startup fix**: detect and launch an installed `nvd-claude-proxy`/`ncp` executable before falling back to `python -m nvd_claude_proxy.main`. This supports uv-tool installs where the proxy is available as a command but not importable from Claude Any's Python interpreter.
414
+
415
+ ### 0.1.15
416
+
417
+ - **Ollama/Ollama Cloud tool-call streaming fix**: emit streamed tool calls using sequential Anthropic SSE content block indexes and `input_json_delta` payloads. This prevents Claude Code from rejecting malformed streamed tool-use blocks with `Invalid tool parameters`.
418
+ - **Tool guard auto-install**: non-Anthropic launches now merge the Claude Any tool guard into `~/.claude/settings.json` so generated tool inputs can be normalized before execution.
419
+ - **Tool-call diagnostics**: router-side tool calls are logged to `~/.config/claude-any/tool-calls.jsonl`, and Claude Code hook inputs are logged to `~/.claude/claude-any-tool-guard/tool-events.jsonl` for precise debugging.
420
+ - **Tool input normalization**: the guard now maps common aliases such as `path` to `file_path`, `cmd` to `command`, and `query` to `pattern`, and returns explicit guidance when required fields are missing.
421
+
422
+ ### 0.1.14
423
+
424
+ - **SSH/terminal arrow-key compatibility**: rewrote `read_menu_key()` with a proper ANSI escape sequence parser and moved raw terminal setup into `portable_select()` so the terminal stays in raw mode for the entire menu loop. This prevents escape sequences from leaking to the screen when `ECHO` is restored between keystrokes. Arrow keys, Home, and End now work reliably in SSH sessions.
425
+ - **Test timeout**: default compatibility test timeout increased from 60 s to 120 s for slower cloud providers.
426
+ - **Ollama Cloud compatibility test fix**: added `"stream": false` to compatibility test requests so the router fetches a single JSON response from Ollama Cloud instead of SSE streaming, which was causing `post_json` to timeout while collecting all chunks.
427
+
428
+ ### 0.1.13
429
+
430
+ - **Ollama streaming proxy**: The router now streams Ollama and Ollama Cloud
431
+ responses through to Claude Code in real time using Anthropic SSE format,
432
+ instead of buffering the entire response before delivery.
433
+ - **Config caching**: `load_config()` now caches the configuration file in
434
+ memory and only re-reads from disk when the file modification time changes.
435
+ This eliminates repeated file I/O and JSON parsing on every router request.
436
+ - **Token estimation caching**: `estimate_tokens()` now accepts an optional
437
+ cache dict to avoid redundant `json.dumps()` calls within a single request.
438
+ `ollama_chat_request` and `cap_output_tokens_for_context` share the same
439
+ cache when computing context window sizing.
440
+
441
+ ### 0.1.12
442
+
443
+ - Refresh docs and demo assets.
444
+
445
+ ### 0.1.11
446
+
447
+ - Validate tool call compatibility.
448
+
449
+ ### 0.1.10
450
+
451
+ - Show runtime context in tests.
452
+
453
+ ### 0.1.9
454
+
455
+ - Cap presets to server context.
456
+
457
+ ### 0.1.8
458
+
459
+ - Localize LLM presets.
460
+
461
+ ## Provider Notes
462
+
463
+ | Provider | Mode | Notes |
464
+ | --- | --- | --- |
465
+ | Anthropic | Native Claude Code | Uses Claude login or Anthropic API key. |
466
+ | Ollama | Native when available, router otherwise | Local Ollama normally needs no API key. Cloud models through local Ollama require `ollama signin` on the Ollama host. |
467
+ | Ollama Cloud | Router | Calls `https://ollama.com/api`; requires an Ollama API key. |
468
+ | vLLM | Native Anthropic-compatible endpoint | Use a vLLM endpoint that exposes Anthropic-compatible `/v1/messages`; match `--tool-call-parser` to the model family. |
469
+ | NVIDIA hosted | Router/proxy | Uses NVIDIA hosted API through the compatibility path. |
470
+ | self-hosted NIM | Native Anthropic-compatible endpoint | Use the self-hosted NIM Anthropic-compatible endpoint. |
471
+
472
+ ## Service Lifecycle
473
+
474
+ Claude Any does not keep every possible backend helper running all the time. The
475
+ normal lifecycle is:
476
+
477
+ - Before launch, managed router/proxy processes can be stopped with
478
+ `claude-any stop`.
479
+ - When `claude-any` starts Claude Code, it starts only the services required by
480
+ the selected provider.
481
+ - Ollama and Ollama Cloud router mode use the Claude Any router on
482
+ `127.0.0.1:8799`.
483
+ - NVIDIA hosted router mode uses the Claude Any router on `127.0.0.1:8799` and
484
+ starts `nvd-claude-proxy` on `127.0.0.1:8788` only when that provider needs it.
485
+ - Switching away from NVIDIA hosted does not require keeping the NVIDIA proxy
486
+ alive; stale sessions should be cleaned with `claude-any stop` before a fresh
487
+ test or launch.
488
+
489
+ This keeps Claude Code pointed at one stable Claude Any entry point while still
490
+ letting provider-specific helpers start on demand.
491
+
492
+ For Qwen3-Coder on vLLM, start the server with a matching tool parser:
493
+
494
+ ```sh
495
+ vllm serve Qwen/Qwen3-Coder-30B-A3B-Instruct \
496
+ --host 0.0.0.0 \
497
+ --port 8000 \
498
+ --served-model-name qwen3-coder-30b \
499
+ --max-model-len 65536 \
500
+ --enable-auto-tool-choice \
501
+ --tool-call-parser qwen3_xml
502
+ ```
503
+
504
+ ## Provider Links
505
+
506
+ - Ollama Cloud: [cloud overview](https://ollama.com/cloud), [API key settings](https://ollama.com/settings/keys), [authentication docs](https://docs.ollama.com/api/authentication).
507
+ - Ollama local Anthropic compatibility: [Ollama Anthropic API docs](https://docs.ollama.com/api/anthropic-compatibility).
508
+ - vLLM: [Claude Code integration](https://docs.vllm.ai/en/latest/serving/integrations/claude_code/), [tool calling](https://docs.vllm.ai/en/stable/features/tool_calling/), [project GitHub](https://github.com/vllm-project/vllm).
509
+ - NVIDIA hosted NIM: [NVIDIA API Catalog](https://build.nvidia.com/), [API Catalog quickstart](https://docs.api.nvidia.com/nim/docs/api-quickstart).
510
+ - Self-hosted NVIDIA NIM: [Claude Code with NIM](https://docs.nvidia.com/nim/large-language-models/latest/ai-assistant-integrations/claude-code.html), [NIM for LLMs getting started](https://docs.nvidia.com/nim/large-language-models/1.14.0/getting-started.html), [NGC personal keys](https://org.ngc.nvidia.com/setup/personal-keys).
511
+
512
+ ## Headless Examples
513
+
514
+ Headless commands skip the pre-launch menu and launch Claude Code immediately.
515
+ Claude Any consumes `--ca-*` setup flags, starts the required router/proxy
516
+ services, then passes the remaining arguments to Claude Code.
517
+
518
+ ```sh
519
+ claude-any --ca-provider ollama-cloud --ca-model glm-5.1
520
+ claude-any --ca-provider ollama --ca-base-url http://127.0.0.1:11434 --ca-model qwen3-coder
521
+ claude-any --ca-provider ollama-cloud --ca-api-key-env OLLAMA_API_KEY --ca-model qwen3-coder:480b:cloud
522
+ claude-any --ca-provider vllm --ca-base-url http://127.0.0.1:8000 --ca-model Qwen/Qwen3-Coder
523
+ claude-any --ca-no-update-check -p "Reply with OK only." --output-format text
524
+ ```
525
+
526
+ All other arguments are passed through to Claude Code.
527
+
528
+ ## Headless Agent Chat
529
+
530
+ When the claude-any router is running, sub agents can coordinate through local
531
+ HTTP endpoints without opening the menu:
532
+
533
+ ```sh
534
+ # Send a message to a channel.
535
+ curl -s http://127.0.0.1:8799/ca/chat/messages \
536
+ -H 'content-type: application/json' \
537
+ -d '{"channel":"agents","sender_id":"codex","recipients":["kimi"],"message":"Need logs after id 42"}'
538
+
539
+ # Poll updates after the last message id.
540
+ curl -s 'http://127.0.0.1:8799/ca/chat/messages?channel=agents&recipient=kimi&after=42'
541
+
542
+ # Wait on a stream until messages arrive.
543
+ curl -N 'http://127.0.0.1:8799/ca/chat/stream?channel=agents&recipient=kimi&after=42&timeout=300'
544
+
545
+ # Publish a plan file and get a served URL.
546
+ curl -s http://127.0.0.1:8799/ca/plan/artifacts \
547
+ -H 'content-type: application/json' \
548
+ -d '{"title":"handoff","name":"handoff.md","content":"# Plan\n- step 1"}'
549
+ ```
550
+
551
+ ## Security
552
+
553
+ Do not commit runtime configuration or API keys. Claude Any stores local runtime
554
+ configuration under `~/.config/claude-any/`. NVIDIA hosted credentials used by
555
+ the optional proxy are stored under `~/.config/nvd-claude-proxy/`.
556
+
557
+ This repository should contain source, documentation, and demo assets only.
558
+
559
+ ## Development
560
+
561
+ ```sh
562
+ python -m py_compile claude_any.py claude-any-menu.py claude-any-tool-guard.py
563
+ python -m ruff check .
564
+ python scripts/make_demo_assets.py
565
+ ```
566
+
567
+ ## License
568
+
569
+ MIT. See [LICENSE](LICENSE).