pi-nvidia-nim 1.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md ADDED
# pi-nvidia-nim

NVIDIA NIM API provider extension for [pi coding agent](https://github.com/badlogic/pi-mono) - access 100+ models from [build.nvidia.com](https://build.nvidia.com), including DeepSeek V3.2, Kimi K2.5, MiniMax M2.1, GLM-4.7, Qwen3, Llama 4, and many more.

## Setup

### 1. Get an NVIDIA NIM API Key

1. Go to [build.nvidia.com](https://build.nvidia.com)
2. Sign in or create an account
3. Navigate to any model page and click "Get API Key"
4. Copy your key (starts with `nvapi-`)

### 2. Set Your API Key

```bash
export NVIDIA_NIM_API_KEY=nvapi-your-key-here
```

Add this to your `~/.bashrc`, `~/.zshrc`, or shell profile to persist it.

### 3. Install the Extension

**As a pi package (recommended):**

```bash
pi install git:github.com/xRyul/pi-nvidia-nim
```

**Or load directly:**

```bash
pi -e /path/to/pi-nvidia-nim
```

**Or copy to your extensions directory:**

```bash
cp -r pi-nvidia-nim ~/.pi/agent/extensions/pi-nvidia-nim
```

## Usage

Once loaded, NVIDIA NIM models appear in the `/model` selector under the `nvidia-nim` provider. You can also:

- Press **Ctrl+L** to open the model selector and search for `nvidia-nim`
- Use `/scoped-models` to pin your favourite NIM models for quick switching

### CLI

```bash
# Use a specific NIM model directly
pi --provider nvidia-nim --model "deepseek-ai/deepseek-v3.2"

# With thinking enabled
pi --provider nvidia-nim --model "deepseek-ai/deepseek-v3.2" --thinking low

# Limit model cycling to NIM models
pi --models "nvidia-nim/*"
```

## Reasoning / Thinking

NVIDIA NIM models enable thinking via a non-standard `chat_template_kwargs` parameter rather than the standard OpenAI `reasoning_effort`. The extension handles this automatically via a custom streaming wrapper that injects the correct per-model parameters.

### How it works

When you change the thinking level in pi (`Shift+Tab` to cycle), the extension:

1. **Maps `"minimal"` → `"low"`** - NIM only accepts `low`, `medium`, `high` (not `minimal`). Selecting "minimal" in pi works fine; it's silently mapped.
2. **Injects `chat_template_kwargs`** per model to actually enable thinking:
   - DeepSeek V3.x, R1 distills: `{ thinking: true }`
   - GLM-4.7: `{ enable_thinking: true, clear_thinking: false }`
   - Kimi K2.5, K2-thinking: `{ thinking: true }`
   - Qwen3, QwQ: `{ enable_thinking: true }`
3. **Explicitly disables thinking** when the level is "off" for models that think by default (e.g., GLM-4.7).
4. **Uses the `system` role** instead of `developer` for all NIM models - the `developer` role combined with `chat_template_kwargs` causes 500 errors on NIM.
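
Putting the steps together, here is a sketch of the request body the extension would send for DeepSeek V3.2 with thinking enabled (the message text is illustrative, not taken from the extension):

```typescript
// Sketch of the chat completions body sent to
// https://integrate.api.nvidia.com/v1/chat/completions for DeepSeek V3.2
// with thinking enabled.
const payload = {
  model: "deepseek-ai/deepseek-v3.2",
  stream: true,
  messages: [
    // `system` role, never `developer` (step 4 above)
    { role: "system", content: "You are a helpful coding agent." },
    { role: "user", content: "Why does this test fail?" },
  ],
  // Injected per step 2; for DeepSeek, `reasoning_effort` is omitted entirely
  // because thinking is driven by this field alone.
  chat_template_kwargs: { thinking: true },
};

console.log(JSON.stringify(payload.chat_template_kwargs)); // {"thinking":true}
```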

### Supported thinking levels

| pi Level | NIM Mapping | Effect |
|----------|-------------|--------|
| off | No kwargs (or explicit disable) | No reasoning output |
| minimal | Mapped to "low" | Thinking enabled |
| low | low | Thinking enabled |
| medium | medium | Thinking enabled |
| high | high | Thinking enabled |

## Available Models

The extension ships with curated metadata for 38 featured models. At startup, it also queries the NVIDIA NIM API to discover additional models automatically.
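
Discovery uses the OpenAI-style model listing endpoint (`GET /v1/models` on `https://integrate.api.nvidia.com`). A sketch of the response handling, with a canned sample standing in for a live call:

```typescript
// Listing response shape from GET https://integrate.api.nvidia.com/v1/models
// (OpenAI-compatible). The sample below stands in for a live request.
interface ModelsResponse {
  data: { id: string; object: string; owned_by: string }[];
}

function modelIds(res: ModelsResponse): string[] {
  return res.data.map((m) => m.id);
}

const sample: ModelsResponse = {
  data: [
    { id: "deepseek-ai/deepseek-v3.2", object: "model", owned_by: "deepseek-ai" },
    { id: "z-ai/glm4.7", object: "model", owned_by: "z-ai" },
  ],
};

console.log(modelIds(sample).length); // 2
```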

### Featured Models

| Model | Reasoning | Vision | Context |
|-------|-----------|--------|---------|
| `deepseek-ai/deepseek-v3.2` | ✅ | | 128K |
| `deepseek-ai/deepseek-v3.1` | ✅ | | 128K |
| `moonshotai/kimi-k2.5` | ✅ | | 256K |
| `moonshotai/kimi-k2-thinking` | ✅ | | 128K |
| `minimaxai/minimax-m2.1` | | | 1M |
| `z-ai/glm4.7` | ✅ | | 128K |
| `openai/gpt-oss-120b` | | | 128K |
| `qwen/qwen3-coder-480b-a35b-instruct` | ✅ | | 256K |
| `qwen/qwen3-235b-a22b` | ✅ | | 128K |
| `meta/llama-4-maverick-17b-128e-instruct` | | | 1M |
| `meta/llama-3.1-405b-instruct` | | | 128K |
| `meta/llama-3.2-90b-vision-instruct` | | ✅ | 128K |
| `mistralai/mistral-large-3-675b-instruct-2512` | | | 128K |
| `mistralai/devstral-2-123b-instruct-2512` | | | 128K |
| `nvidia/llama-3.1-nemotron-ultra-253b-v1` | ✅ | | 128K |
| `nvidia/llama-3.3-nemotron-super-49b-v1.5` | ✅ | | 128K |
| `microsoft/phi-4-mini-flash-reasoning` | ✅ | | 128K |
| `ibm/granite-3.3-8b-instruct` | | | 128K |

...and 20+ more curated models, plus automatic discovery of new models from the API.

### Tool Calling

All major models support OpenAI-compatible tool calling. Tested and confirmed working with DeepSeek V3.2, GLM-4.7, Qwen3, Kimi K2.5, and others.
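
Tool definitions use the standard OpenAI chat-completions schema, sent as the `tools` field of the request body. A minimal sketch - the `read_file` tool and its parameters are hypothetical examples, not part of the extension:

```typescript
// One OpenAI-style tool definition; NIM accepts the same schema via the
// `tools` field of the request body. `read_file` is a hypothetical example.
const tools = [
  {
    type: "function",
    function: {
      name: "read_file",
      description: "Read a file from the workspace and return its contents",
      parameters: {
        type: "object",
        properties: {
          path: { type: "string", description: "Path relative to the workspace root" },
        },
        required: ["path"],
      },
    },
  },
];

console.log(tools[0].function.name); // read_file
```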

## How It Works

This extension registers NVIDIA NIM as a custom provider via `pi.registerProvider()`, with a `streamSimple` wrapper around pi's built-in `openai-completions` streamer.

The custom streamer:
1. Intercepts the request payload via the `onPayload` callback
2. Injects `chat_template_kwargs` for models that need it to enable thinking
3. Maps unsupported thinking levels (`minimal` → `low`)
4. Suppresses `reasoning_effort` for models that don't respond to it (e.g., DeepSeek without kwargs)
5. Uses the standard OpenAI SSE streaming format - pi already parses `reasoning_content` and `reasoning` fields from streaming deltas
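
Steps 2-4 boil down to a small transform over the request body. A simplified sketch, not the extension's exact code - the table here mirrors only a subset of the real `THINKING_CONFIGS` in `index.ts`:

```typescript
type Level = "off" | "minimal" | "low" | "medium" | "high";

// Subset of the per-model kwargs table in index.ts.
const KWARGS: Record<string, Record<string, unknown>> = {
  "deepseek-ai/deepseek-v3.2": { thinking: true },
  "z-ai/glm4.7": { enable_thinking: true, clear_thinking: false },
};

function applyThinking(
  model: string,
  level: Level,
  body: Record<string, unknown>,
): Record<string, unknown> {
  if (level === "off") return body;
  const kwargs = KWARGS[model];
  if (kwargs) {
    // Thinking is driven by chat_template_kwargs; drop reasoning_effort.
    body.chat_template_kwargs = kwargs;
    delete body.reasoning_effort;
  } else {
    // NIM rejects "minimal" with a 400, so map it to "low".
    body.reasoning_effort = level === "minimal" ? "low" : level;
  }
  return body;
}

const shaped = applyThinking("deepseek-ai/deepseek-v3.2", "minimal", {
  reasoning_effort: "minimal",
});
console.log("reasoning_effort" in shaped, "chat_template_kwargs" in shaped); // false true
```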

## Configuration

The only configuration needed is the `NVIDIA_NIM_API_KEY` environment variable. All models on NVIDIA NIM are free during the preview period (with rate limits).

## Notes

- All costs are set to `$0` since NVIDIA NIM preview models are free (rate-limited)
- Context windows and max tokens are best-effort estimates; some may differ from actual API limits
- If a model isn't in the curated list, it gets a conservative 4K context window and 2K max output tokens
- The extension filters out embedding, reward, safety, and other non-chat models automatically
- Rate limits on free preview keys are relatively strict; you may encounter 429 errors during heavy usage
- MiniMax models emit `<think>` tags inline in the content rather than using the `reasoning_content` field
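
If a client wants MiniMax reasoning separated anyway, it can split the inline tags out of the content itself. A rough sketch, assuming at most one complete `<think>` block:

```typescript
// Split inline <think>…</think> reasoning out of MiniMax-style content.
// Handles at most one think block, which is enough for illustration.
function splitThink(content: string): { reasoning: string; answer: string } {
  const match = content.match(/<think>([\s\S]*?)<\/think>/);
  if (!match) return { reasoning: "", answer: content };
  return {
    reasoning: match[1].trim(),
    answer: content.replace(match[0], "").trim(),
  };
}

console.log(splitThink("<think>check the units</think>The answer is 42 km.").answer);
// The answer is 42 km.
```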

## License

MIT
package/index.ts ADDED
/**
 * NVIDIA NIM API Provider Extension for pi
 *
 * Provides access to 100+ models from NVIDIA's NIM platform (build.nvidia.com)
 * via their OpenAI-compatible API endpoint.
 *
 * Setup:
 * 1. Get an API key from https://build.nvidia.com
 * 2. Export it: export NVIDIA_NIM_API_KEY=nvapi-...
 * 3. Load the extension:
 *      pi -e ./path/to/pi-nvidia-nim
 *      # or install as a package:
 *      pi install git:github.com/user/pi-nvidia-nim
 *
 * Then use /model and search for "nvidia-nim/" to see all available models.
 *
 * ## Reasoning / Thinking
 *
 * NVIDIA NIM models use `chat_template_kwargs` to enable thinking, which differs
 * from the standard OpenAI `reasoning_effort` parameter. This extension wraps the
 * standard streaming implementation and injects the correct per-model thinking
 * parameters:
 *
 * - DeepSeek V3.x: `chat_template_kwargs: { thinking: true }`
 * - GLM-4.7: `chat_template_kwargs: { enable_thinking: true, clear_thinking: false }`
 * - Kimi K2.5: `chat_template_kwargs: { thinking: true }` (also accepts reasoning_effort)
 * - Qwen3: `chat_template_kwargs: { enable_thinking: true }`
 *
 * NIM only accepts `reasoning_effort` values of "low", "medium", "high" - NOT
 * "minimal". The extension maps pi's "minimal" level to "low" automatically.
 *
 * Some models (e.g., GLM-4.7) always produce reasoning output regardless of
 * thinking settings.
 */

import type {
  Api,
  AssistantMessageEventStream,
  Context,
  Model,
  SimpleStreamOptions,
} from "@mariozechner/pi-ai";
import { streamSimpleOpenAICompletions } from "@mariozechner/pi-ai";
import type { ExtensionAPI } from "@mariozechner/pi-coding-agent";

// =============================================================================
// Constants
// =============================================================================

const NVIDIA_NIM_BASE_URL = "https://integrate.api.nvidia.com/v1";
const NVIDIA_NIM_API_KEY_ENV = "NVIDIA_NIM_API_KEY";
const PROVIDER_NAME = "nvidia-nim";

// =============================================================================
// Per-model thinking configuration
// =============================================================================

/**
 * Maps exact model IDs to their chat_template_kwargs for thinking.
 * When a user enables thinking in pi (any level > off), we inject these kwargs
 * into the request body. Models not listed here either:
 * - Don't support thinking (non-reasoning models)
 * - Always think regardless (GLM-4.7 without explicit kwargs)
 * - Work with standard reasoning_effort (rare on NIM)
 */
interface ThinkingConfig {
  /** chat_template_kwargs to send when thinking is enabled */
  enableKwargs: Record<string, unknown>;
  /** chat_template_kwargs to send when thinking is explicitly disabled (optional) */
  disableKwargs?: Record<string, unknown>;
  /** If true, also send reasoning_effort alongside chat_template_kwargs */
  sendReasoningEffort?: boolean;
}

const THINKING_CONFIGS: Record<string, ThinkingConfig> = {
  // DeepSeek models need chat_template_kwargs - reasoning_effort alone doesn't trigger thinking
  "deepseek-ai/deepseek-v3.2": {
    enableKwargs: { thinking: true },
    disableKwargs: { thinking: false },
  },
  "deepseek-ai/deepseek-v3.1": {
    enableKwargs: { thinking: true },
    disableKwargs: { thinking: false },
  },
  "deepseek-ai/deepseek-v3.1-terminus": {
    enableKwargs: { thinking: true },
    disableKwargs: { thinking: false },
  },
  "deepseek-ai/deepseek-r1-distill-llama-8b": {
    enableKwargs: { thinking: true },
    disableKwargs: { thinking: false },
  },
  "deepseek-ai/deepseek-r1-distill-qwen-7b": {
    enableKwargs: { thinking: true },
    disableKwargs: { thinking: false },
  },
  "deepseek-ai/deepseek-r1-distill-qwen-14b": {
    enableKwargs: { thinking: true },
    disableKwargs: { thinking: false },
  },
  "deepseek-ai/deepseek-r1-distill-qwen-32b": {
    enableKwargs: { thinking: true },
    disableKwargs: { thinking: false },
  },
  // GLM-4.7 always thinks by default, but can be controlled
  "z-ai/glm4.7": {
    enableKwargs: { enable_thinking: true, clear_thinking: false },
    disableKwargs: { enable_thinking: false },
  },
  // Kimi models: chat_template_kwargs works, reasoning_effort also works
  "moonshotai/kimi-k2.5": {
    enableKwargs: { thinking: true },
    disableKwargs: { thinking: false },
    sendReasoningEffort: true,
  },
  "moonshotai/kimi-k2-thinking": {
    enableKwargs: { thinking: true },
    disableKwargs: { thinking: false },
    sendReasoningEffort: true,
  },
  // Qwen3 reasoning models
  "qwen/qwen3-235b-a22b": {
    enableKwargs: { enable_thinking: true },
    disableKwargs: { enable_thinking: false },
  },
  "qwen/qwen3-coder-480b-a35b-instruct": {
    enableKwargs: { enable_thinking: true },
    disableKwargs: { enable_thinking: false },
  },
  "qwen/qwen3-next-80b-a3b-thinking": {
    enableKwargs: { enable_thinking: true },
    disableKwargs: { enable_thinking: false },
  },
  "qwen/qwq-32b": {
    enableKwargs: { enable_thinking: true },
    disableKwargs: { enable_thinking: false },
  },
  // Microsoft Phi reasoning
  "microsoft/phi-4-mini-flash-reasoning": {
    enableKwargs: { enable_thinking: true },
    disableKwargs: { enable_thinking: false },
  },
  // NVIDIA Nemotron reasoning models
  "nvidia/llama-3.1-nemotron-ultra-253b-v1": {
    enableKwargs: { thinking: true },
    disableKwargs: { thinking: false },
  },
  "nvidia/llama-3.3-nemotron-super-49b-v1": {
    enableKwargs: { thinking: true },
    disableKwargs: { thinking: false },
  },
  "nvidia/llama-3.3-nemotron-super-49b-v1.5": {
    enableKwargs: { thinking: true },
    disableKwargs: { thinking: false },
  },
  // Mistral reasoning
  "mistralai/magistral-small-2506": {
    enableKwargs: { enable_thinking: true },
    disableKwargs: { enable_thinking: false },
  },
};

// =============================================================================
// Reasoning models and their capabilities
// =============================================================================

const REASONING_MODELS = new Set(Object.keys(THINKING_CONFIGS));

// Models known to support image/vision input
const VISION_MODELS = new Set([
  "meta/llama-3.2-11b-vision-instruct",
  "meta/llama-3.2-90b-vision-instruct",
  "microsoft/phi-3-vision-128k-instruct",
  "microsoft/phi-3.5-vision-instruct",
  "microsoft/phi-4-multimodal-instruct",
  "nvidia/llama-3.1-nemotron-nano-vl-8b-v1",
  "nvidia/nemotron-nano-12b-v2-vl",
  "nvidia/cosmos-reason2-8b",
]);

// Embedding / non-chat models to skip
const SKIP_MODELS = new Set([
  "baai/bge-m3",
  "nvidia/embed-qa-4",
  "nvidia/nv-embed-v1",
  "nvidia/nv-embedcode-7b-v1",
  "nvidia/nv-embedqa-e5-v5",
  "nvidia/nv-embedqa-mistral-7b-v2",
  "nvidia/nvclip",
  "nvidia/streampetr",
  "nvidia/vila",
  "nvidia/neva-22b",
  "nvidia/nemoretriever-parse",
  "nvidia/nemotron-parse",
  "nvidia/llama-3.2-nemoretriever-1b-vlm-embed-v1",
  "nvidia/llama-3.2-nemoretriever-300m-embed-v1",
  "nvidia/llama-3.2-nemoretriever-300m-embed-v2",
  "nvidia/llama-3.2-nv-embedqa-1b-v1",
  "nvidia/llama-3.2-nv-embedqa-1b-v2",
  "nvidia/llama-nemotron-embed-vl-1b-v2",
  "nvidia/llama-3.1-nemotron-70b-reward",
  "nvidia/nemotron-4-340b-reward",
  "nvidia/nemotron-content-safety-reasoning-4b",
  "nvidia/llama-3.1-nemoguard-8b-content-safety",
  "nvidia/llama-3.1-nemoguard-8b-topic-control",
  "nvidia/llama-3.1-nemotron-safety-guard-8b-v3",
  "meta/llama-guard-4-12b",
  "nvidia/riva-translate-4b-instruct",
  "nvidia/riva-translate-4b-instruct-v1.1",
  "google/deplot",
  "google/paligemma",
  "google/recurrentgemma-2b",
  "google/shieldgemma-9b",
  "microsoft/kosmos-2",
  "adept/fuyu-8b",
  "bigcode/starcoder2-15b",
  "bigcode/starcoder2-7b",
  "snowflake/arctic-embed-l",
  "mistralai/mamba-codestral-7b-v0.1",
  "mistralai/mathstral-7b-v0.1",
  "mistralai/mixtral-8x22b-v0.1",
  "nvidia/mistral-nemo-minitron-8b-base",
  "google/gemma-2b",
  "google/gemma-7b",
  "google/codegemma-7b",
  "meta/llama2-70b",
]);

// Known context windows (tokens)
const CONTEXT_WINDOWS: Record<string, number> = {
  // DeepSeek
  "deepseek-ai/deepseek-v3.1": 131072,
  "deepseek-ai/deepseek-v3.1-terminus": 131072,
  "deepseek-ai/deepseek-v3.2": 131072,
  "deepseek-ai/deepseek-r1-distill-llama-8b": 131072,
  "deepseek-ai/deepseek-r1-distill-qwen-14b": 131072,
  "deepseek-ai/deepseek-r1-distill-qwen-32b": 131072,
  "deepseek-ai/deepseek-r1-distill-qwen-7b": 131072,
  "deepseek-ai/deepseek-coder-6.7b-instruct": 16384,
  // Kimi / Moonshot
  "moonshotai/kimi-k2-instruct": 131072,
  "moonshotai/kimi-k2-instruct-0905": 131072,
  "moonshotai/kimi-k2-thinking": 131072,
  "moonshotai/kimi-k2.5": 262144,
  // MiniMax
  "minimaxai/minimax-m2": 1048576,
  "minimaxai/minimax-m2.1": 1048576,
  // Meta Llama
  "meta/llama-3.1-405b-instruct": 131072,
  "meta/llama-3.1-70b-instruct": 131072,
  "meta/llama-3.1-8b-instruct": 131072,
  "meta/llama-3.2-11b-vision-instruct": 131072,
  "meta/llama-3.2-1b-instruct": 131072,
  "meta/llama-3.2-3b-instruct": 131072,
  "meta/llama-3.2-90b-vision-instruct": 131072,
  "meta/llama-3.3-70b-instruct": 131072,
  "meta/llama-4-maverick-17b-128e-instruct": 1048576,
  "meta/llama-4-scout-17b-16e-instruct": 524288,
  "meta/llama3-70b-instruct": 8192,
  "meta/llama3-8b-instruct": 8192,
  // Mistral
  "mistralai/mistral-large-3-675b-instruct-2512": 131072,
  "mistralai/mistral-medium-3-instruct": 131072,
  "mistralai/devstral-2-123b-instruct-2512": 131072,
  "mistralai/magistral-small-2506": 131072,
  "mistralai/mistral-large": 131072,
  "mistralai/mistral-large-2-instruct": 131072,
  "mistralai/mistral-small-24b-instruct": 32768,
  "mistralai/mistral-small-3.1-24b-instruct-2503": 131072,
  "mistralai/mistral-nemotron": 131072,
  "mistralai/mixtral-8x22b-instruct-v0.1": 65536,
  "mistralai/mixtral-8x7b-instruct-v0.1": 32768,
  "mistralai/codestral-22b-instruct-v0.1": 32768,
  "mistralai/ministral-14b-instruct-2512": 131072,
  // Microsoft Phi
  "microsoft/phi-3-medium-128k-instruct": 131072,
  "microsoft/phi-3-mini-128k-instruct": 131072,
  "microsoft/phi-3-small-128k-instruct": 131072,
  "microsoft/phi-3-medium-4k-instruct": 4096,
  "microsoft/phi-3-mini-4k-instruct": 4096,
  "microsoft/phi-3-small-8k-instruct": 8192,
  "microsoft/phi-3-vision-128k-instruct": 131072,
  "microsoft/phi-3.5-mini-instruct": 131072,
  "microsoft/phi-3.5-moe-instruct": 131072,
  "microsoft/phi-3.5-vision-instruct": 131072,
  "microsoft/phi-4-mini-instruct": 131072,
  "microsoft/phi-4-mini-flash-reasoning": 131072,
  "microsoft/phi-4-multimodal-instruct": 131072,
  // Qwen
  "qwen/qwen2-7b-instruct": 131072,
  "qwen/qwen2.5-7b-instruct": 131072,
  "qwen/qwen2.5-coder-32b-instruct": 131072,
  "qwen/qwen2.5-coder-7b-instruct": 131072,
  "qwen/qwen3-235b-a22b": 131072,
  "qwen/qwen3-coder-480b-a35b-instruct": 262144,
  "qwen/qwen3-next-80b-a3b-instruct": 131072,
  "qwen/qwen3-next-80b-a3b-thinking": 131072,
  "qwen/qwq-32b": 131072,
  // Google Gemma
  "google/gemma-2-27b-it": 8192,
  "google/gemma-2-2b-it": 8192,
  "google/gemma-2-9b-it": 8192,
  "google/gemma-3-12b-it": 131072,
  "google/gemma-3-1b-it": 32768,
  "google/gemma-3-27b-it": 131072,
  "google/gemma-3-4b-it": 131072,
  "google/gemma-3n-e2b-it": 131072,
  "google/gemma-3n-e4b-it": 131072,
  "google/codegemma-1.1-7b": 8192,
  // NVIDIA
  "nvidia/llama-3.1-nemotron-ultra-253b-v1": 131072,
  "nvidia/llama-3.1-nemotron-70b-instruct": 131072,
  "nvidia/llama-3.1-nemotron-51b-instruct": 131072,
  "nvidia/llama-3.3-nemotron-super-49b-v1": 131072,
  "nvidia/llama-3.3-nemotron-super-49b-v1.5": 131072,
  "nvidia/nemotron-4-340b-instruct": 4096,
  "nvidia/nvidia-nemotron-nano-9b-v2": 131072,
  // OpenAI open-source
  "openai/gpt-oss-120b": 131072,
  "openai/gpt-oss-20b": 131072,
  // Z-AI / GLM
  "z-ai/glm4.7": 131072,
  // StepFun
  "stepfun-ai/step-3.5-flash": 131072,
  // ByteDance
  "bytedance/seed-oss-36b-instruct": 131072,
  // IBM Granite
  "ibm/granite-3.3-8b-instruct": 131072,
  "ibm/granite-3.0-8b-instruct": 8192,
  "ibm/granite-3.0-3b-a800m-instruct": 8192,
  "ibm/granite-34b-code-instruct": 8192,
  "ibm/granite-8b-code-instruct": 8192,
  // Older / smaller models with limited context
  "upstage/solar-10.7b-instruct": 4096,
  "01-ai/yi-large": 32768,
  "databricks/dbrx-instruct": 32768,
  "baichuan-inc/baichuan2-13b-chat": 4096,
  "thudm/chatglm3-6b": 8192,
  "tiiuae/falcon3-7b-instruct": 8192,
  "zyphra/zamba2-7b-instruct": 4096,
  "aisingapore/sea-lion-7b-instruct": 4096,
  "mediatek/breeze-7b-instruct": 4096,
  "meta/codellama-70b": 16384,
  "mistralai/mistral-7b-instruct-v0.2": 32768,
  "mistralai/mistral-7b-instruct-v0.3": 32768,
  "nv-mistralai/mistral-nemo-12b-instruct": 131072,
  "nvidia/nemotron-mini-4b-instruct": 4096,
  "nvidia/nemotron-4-mini-hindi-4b-instruct": 4096,
  "nvidia/usdcode-llama-3.1-70b-instruct": 131072,
  "sarvamai/sarvam-m": 32768,
  "writer/palmyra-creative-122b": 32768,
  "writer/palmyra-fin-70b-32k": 32768,
  "writer/palmyra-med-70b": 8192,
  "writer/palmyra-med-70b-32k": 32768,
  "igenius/colosseum_355b_instruct_16k": 16384,
  "igenius/italia_10b_instruct_16k": 16384,
  "rakuten/rakutenai-7b-chat": 4096,
  "rakuten/rakutenai-7b-instruct": 4096,
};

// Known max output tokens
const MAX_TOKENS: Record<string, number> = {
  "deepseek-ai/deepseek-v3.1": 16384,
  "deepseek-ai/deepseek-v3.1-terminus": 16384,
  "deepseek-ai/deepseek-v3.2": 16384,
  "moonshotai/kimi-k2.5": 16384,
  "moonshotai/kimi-k2-instruct": 8192,
  "moonshotai/kimi-k2-thinking": 16384,
  "minimaxai/minimax-m2": 8192,
  "minimaxai/minimax-m2.1": 8192,
  "meta/llama-4-maverick-17b-128e-instruct": 16384,
  "meta/llama-4-scout-17b-16e-instruct": 16384,
  "z-ai/glm4.7": 16384,
  "qwen/qwen3-coder-480b-a35b-instruct": 65536,
  "nvidia/llama-3.1-nemotron-ultra-253b-v1": 32768,
  "openai/gpt-oss-120b": 16384,
  "openai/gpt-oss-20b": 16384,
  "mistralai/mistral-large-3-675b-instruct-2512": 16384,
  "mistralai/devstral-2-123b-instruct-2512": 32768,
};

// =============================================================================
// Curated "featured" models - listed first in the model selector
// =============================================================================

const FEATURED_MODELS = [
  // Flagship / frontier
  "deepseek-ai/deepseek-v3.2",
  "deepseek-ai/deepseek-v3.1",
  "deepseek-ai/deepseek-v3.1-terminus",
  "moonshotai/kimi-k2.5",
  "moonshotai/kimi-k2-thinking",
  "moonshotai/kimi-k2-instruct",
  "moonshotai/kimi-k2-instruct-0905",
  "minimaxai/minimax-m2.1",
  "minimaxai/minimax-m2",
  "z-ai/glm4.7",
  "openai/gpt-oss-120b",
  "openai/gpt-oss-20b",
  "stepfun-ai/step-3.5-flash",
  "bytedance/seed-oss-36b-instruct",
  // Qwen
  "qwen/qwen3-coder-480b-a35b-instruct",
  "qwen/qwen3-235b-a22b",
  "qwen/qwen3-next-80b-a3b-instruct",
  "qwen/qwen3-next-80b-a3b-thinking",
  "qwen/qwq-32b",
  "qwen/qwen2.5-coder-32b-instruct",
  // Meta Llama
  "meta/llama-4-maverick-17b-128e-instruct",
  "meta/llama-4-scout-17b-16e-instruct",
  "meta/llama-3.3-70b-instruct",
  "meta/llama-3.1-405b-instruct",
  "meta/llama-3.2-90b-vision-instruct",
  // Mistral
  "mistralai/mistral-large-3-675b-instruct-2512",
  "mistralai/mistral-medium-3-instruct",
  "mistralai/devstral-2-123b-instruct-2512",
  "mistralai/magistral-small-2506",
  "mistralai/mistral-nemotron",
  // NVIDIA
  "nvidia/llama-3.1-nemotron-ultra-253b-v1",
  "nvidia/llama-3.3-nemotron-super-49b-v1.5",
  "nvidia/llama-3.3-nemotron-super-49b-v1",
  // DeepSeek R1 distilled
  "deepseek-ai/deepseek-r1-distill-qwen-32b",
  "deepseek-ai/deepseek-r1-distill-qwen-14b",
  // Microsoft Phi
  "microsoft/phi-4-mini-flash-reasoning",
  "microsoft/phi-4-mini-instruct",
  // IBM
  "ibm/granite-3.3-8b-instruct",
];

// =============================================================================
// Custom streaming - wraps standard openai-completions with NIM-specific fixes
// =============================================================================

/**
 * Custom streamSimple that wraps the standard OpenAI completions streamer.
 *
 * Fixes for NVIDIA NIM:
 * 1. Maps pi's "minimal" thinking level → "low" (NIM only accepts low/medium/high)
 * 2. Strips reasoning_effort for models where it doesn't trigger thinking
 * 3. Injects chat_template_kwargs per model to actually enable thinking
 * 4. Uses onPayload callback to mutate request params before they're sent
 */
function nimStreamSimple(
  model: Model<Api>,
  context: Context,
  options?: SimpleStreamOptions,
): AssistantMessageEventStream {
  const thinkingConfig = THINKING_CONFIGS[model.id];
  const reasoning = options?.reasoning;
  const isThinkingEnabled = reasoning && reasoning !== "off";

  // Map "minimal" → "low" since NIM rejects "minimal" with a 400 error.
  // NIM only accepts: "low", "medium", "high"
  let mappedReasoning = reasoning;
  if (reasoning === "minimal") {
    mappedReasoning = "low";
  }

  // For models that have a thinking config: we handle thinking via chat_template_kwargs.
  // Suppress reasoning_effort (set reasoning to undefined) unless the model explicitly
  // supports it alongside chat_template_kwargs (like Kimi).
  let effectiveReasoning = mappedReasoning;
  if (thinkingConfig && isThinkingEnabled && !thinkingConfig.sendReasoningEffort) {
    // Don't send reasoning_effort - we'll use chat_template_kwargs instead.
    // Setting to undefined prevents buildParams from adding reasoning_effort.
    effectiveReasoning = undefined;
  }

  const modifiedOptions: SimpleStreamOptions = {
    ...options,
    reasoning: effectiveReasoning,
    onPayload: (params: unknown) => {
      const p = params as Record<string, unknown>;

      if (thinkingConfig) {
        if (isThinkingEnabled) {
          // Inject chat_template_kwargs to enable thinking
          p.chat_template_kwargs = thinkingConfig.enableKwargs;
        } else if (thinkingConfig.disableKwargs) {
          // Explicitly disable thinking (some models think by default, e.g. GLM-4.7)
          p.chat_template_kwargs = thinkingConfig.disableKwargs;
        }
      }

      // Ensure reasoning_effort is never "minimal" (belt & suspenders)
      if (p.reasoning_effort === "minimal") {
        p.reasoning_effort = "low";
      }

      // Normalize content arrays to plain strings where possible.
      // Many older/smaller NIM models (e.g., solar, baichuan, falcon) reject the
      // array format [{"type":"text","text":"..."}] and require a plain string.
      // This is safe for all models since plain strings are universally accepted.
      const messages = p.messages as Array<Record<string, unknown>> | undefined;
      if (messages) {
        for (const msg of messages) {
          if (Array.isArray(msg.content)) {
            const parts = msg.content as Array<Record<string, unknown>>;
            const allText = parts.every((part) => part.type === "text");
            if (allText) {
              msg.content = parts.map((part) => part.text as string).join("\n");
            }
          }
        }
      }

      // Chain to original onPayload if present
      options?.onPayload?.(params);
    },
  };

  return streamSimpleOpenAICompletions(model, context, modifiedOptions);
}

// =============================================================================
// Model building helpers
// =============================================================================

interface NimModelEntry {
  id: string;
  name: string;
  reasoning: boolean;
  input: ("text" | "image")[];
  contextWindow: number;
  maxTokens: number;
  cost: { input: number; output: number; cacheRead: number; cacheWrite: number };
  compat?: Record<string, unknown>;
}

function makeDisplayName(modelId: string): string {
  const parts = modelId.split("/");
  const name = parts[parts.length - 1];
  return name
    .replace(/-/g, " ")
    .replace(/_/g, " ")
    .replace(/\b\w/g, (c) => c.toUpperCase());
}

function buildModelEntry(modelId: string): NimModelEntry | null {
  if (SKIP_MODELS.has(modelId)) return null;

  const isReasoning = REASONING_MODELS.has(modelId);
  const isVision = VISION_MODELS.has(modelId);
  const contextWindow = CONTEXT_WINDOWS[modelId] ?? 4096;
  const maxTokens = MAX_TOKENS[modelId] ?? Math.min(2048, contextWindow);

  const entry: NimModelEntry = {
    id: modelId,
    name: makeDisplayName(modelId),
    reasoning: isReasoning,
    input: isVision ? ["text", "image"] : ["text"],
    contextWindow,
    maxTokens,
    cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
  };

  // Default compat for all NIM models:
  // - supportsReasoningEffort: false - we handle thinking via streamSimple + chat_template_kwargs
  // - supportsDeveloperRole: false - "developer" role + chat_template_kwargs causes 500 on NIM
  //   (developer role alone works, but combined with thinking kwargs it breaks)
  // - maxTokensField: "max_tokens" - safer default for heterogeneous backends
  entry.compat = {
    supportsReasoningEffort: false,
    supportsDeveloperRole: false,
    maxTokensField: "max_tokens",
  };

  // Mistral models on NIM need extra compat flags
  if (modelId.startsWith("mistralai/")) {
    entry.compat.requiresToolResultName = true;
    entry.compat.requiresThinkingAsText = true;
    entry.compat.requiresMistralToolIds = true;
  }

  return entry;
}

// =============================================================================
// Dynamic model discovery
// =============================================================================

interface NimApiModel {
  id: string;
  object: string;
  owned_by: string;
}

async function fetchNimModels(apiKey: string): Promise<string[]> {
  try {
    const response = await fetch(`${NVIDIA_NIM_BASE_URL}/models`, {
      headers: {
        Authorization: `Bearer ${apiKey}`,
        Accept: "application/json",
      },
      signal: AbortSignal.timeout(10000),
    });

    if (!response.ok) return [];

    const data = (await response.json()) as { data: NimApiModel[] };
    return data.data?.map((m) => m.id) ?? [];
  } catch {
    return [];
  }
}

// =============================================================================
// Extension Entry Point
// =============================================================================

export default function (pi: ExtensionAPI) {
  // Build the curated model list
  const modelMap = new Map<string, NimModelEntry>();

  // Add featured models first (preserves order in selector)
  for (const id of FEATURED_MODELS) {
    const entry = buildModelEntry(id);
    if (entry) modelMap.set(id, entry);
  }

  // Register with curated models immediately
  const curatedModels = Array.from(modelMap.values());

  pi.registerProvider(PROVIDER_NAME, {
    baseUrl: NVIDIA_NIM_BASE_URL,
    apiKey: NVIDIA_NIM_API_KEY_ENV,
    api: "openai-completions",
    authHeader: true,
    models: curatedModels,
    streamSimple: nimStreamSimple,
  });

  // On session start, discover additional models from the API
  pi.on("session_start", async (_event, ctx) => {
    const apiKey = process.env[NVIDIA_NIM_API_KEY_ENV];
    if (!apiKey) {
      ctx.ui.notify(
        `NVIDIA NIM: Set ${NVIDIA_NIM_API_KEY_ENV} env var to enable models. Get a key at https://build.nvidia.com`,
        "warning",
      );
      return;
    }

    // Fetch live model list
    const liveModelIds = await fetchNimModels(apiKey);
    if (liveModelIds.length === 0) return;

    let newModelsAdded = 0;
    for (const id of liveModelIds) {
      if (modelMap.has(id)) continue;
      const entry = buildModelEntry(id);
      if (entry) {
        modelMap.set(id, entry);
        newModelsAdded++;
      }
    }

    // Re-register with the full model list if we found new ones.
    // NOTE: must use ctx.modelRegistry.registerProvider() here, not pi.registerProvider().
    // pi.registerProvider() only queues registrations for the initial extension load.
    // From event handlers/commands, we need to call the registry directly.
    if (newModelsAdded > 0) {
      const allModels = Array.from(modelMap.values());
      ctx.modelRegistry.registerProvider(PROVIDER_NAME, {
        baseUrl: NVIDIA_NIM_BASE_URL,
        apiKey: NVIDIA_NIM_API_KEY_ENV,
        api: "openai-completions",
        authHeader: true,
        models: allModels,
        streamSimple: nimStreamSimple,
      });
    }
  });
}
package/package.json ADDED
{
  "name": "pi-nvidia-nim",
  "version": "1.1.0",
  "description": "NVIDIA NIM API provider extension for pi coding agent — access 100+ models from build.nvidia.com",
  "type": "module",
  "keywords": ["pi-package"],
  "pi": {
    "extensions": ["./index.ts"],
    "image": "https://raw.githubusercontent.com/xRyul/pi-nvidia-nim/main/screenshot.png"
  },
  "scripts": {
    "clean": "echo 'nothing to clean'",
    "build": "echo 'nothing to build'",
    "check": "echo 'nothing to check'"
  },
  "license": "MIT"
}
package/screenshot.png ADDED
Binary file