pi-nvidia-nim 1.1.18 → 1.1.20

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (3)
  1. package/README.md +22 -14
  2. package/index.ts +249 -58
  3. package/package.json +3 -2
package/README.md CHANGED
@@ -1,6 +1,6 @@
 # pi-nvidia-nim
 
-NVIDIA NIM API provider extension for [pi coding agent](https://github.com/badlogic/pi-mono) - access 100+ models from [build.nvidia.com](https://build.nvidia.com) including DeepSeek V3.2, Kimi K2.5, MiniMax M2.1, GLM-5, GLM-4.7, Qwen3, Llama 4, and many more.
+NVIDIA NIM API provider extension for [pi coding agent](https://github.com/badlogic/pi-mono) - access 100+ models from [build.nvidia.com](https://build.nvidia.com) including DeepSeek V4 Flash/Pro, DeepSeek V3.2, Kimi K2.6, MiniMax M2.1, GLM-5, GLM-4.7, Qwen3, Llama 4, and many more.
 
 https://github.com/user-attachments/assets/f44773e4-9bf8-4bb5-a9c0-d5938030701c
 
@@ -17,10 +17,14 @@ https://github.com/user-attachments/assets/f44773e4-9bf8-4bb5-a9c0-d5938030701c
 ### 2. Set Your API Key
 
 ```bash
+# Preferred by this extension
 export NVIDIA_NIM_API_KEY=nvapi-your-key-here
+
+# Also supported, matching NVIDIA's website examples
+export NVIDIA_API_KEY=nvapi-your-key-here
 ```
 
-Add this to your `~/.bashrc`, `~/.zshrc`, or shell profile to persist it.
+Add one of these to your `~/.bashrc`, `~/.zshrc`, or shell profile to persist it.
 
 ### 3. Install the Extension
 
@@ -53,10 +57,10 @@ Once loaded, NVIDIA NIM models appear in the `/model` selector under the `nvidia
 
 ```bash
 # Use a specific NIM model directly
-pi --provider nvidia-nim --model "deepseek-ai/deepseek-v3.2"
+pi --provider nvidia-nim --model "deepseek-ai/deepseek-v4-flash"
 
 # With thinking enabled
-pi --provider nvidia-nim --model "deepseek-ai/deepseek-v3.2" --thinking low
+pi --provider nvidia-nim --model "deepseek-ai/deepseek-v4-flash" --thinking high
 
 # Limit model cycling to NIM models
 pi --models "nvidia-nim/*"
@@ -70,11 +74,12 @@ NVIDIA NIM models use a non-standard `chat_template_kwargs` parameter to enable
 
 When you change the thinking level in pi (`Shift+Tab` to cycle), the extension:
 
-1. **Maps `"minimal"` → `"low"`** - NIM only accepts `low`, `medium`, `high` (not `minimal`). Selecting "minimal" in pi works fine; it's silently mapped.
+1. **Maps thinking levels** to values each NIM model accepts. For DeepSeek V4, `xhigh` maps to `max`; lower enabled levels use `high`.
 2. **Injects `chat_template_kwargs`** per model to actually enable thinking:
+   - DeepSeek V4: `{ thinking: true, reasoning_effort: "high" | "max" }`
    - DeepSeek V3.x, R1 distills: `{ thinking: true }`
    - GLM-5, GLM-4.7: `{ enable_thinking: true, clear_thinking: false }`
-   - Kimi K2.5, K2-thinking: `{ thinking: true }`
+   - Kimi K2.6, K2-thinking: `{ thinking: true }`
    - Qwen3, QwQ: `{ enable_thinking: true }`
 3. **Explicitly disables thinking** when the level is "off" for models that think by default (e.g., GLM-5, GLM-4.7).
 4. **Uses `system` role** instead of `developer` for all NIM models - the `developer` role combined with `chat_template_kwargs` causes 500 errors on NIM.
@@ -84,22 +89,25 @@ When you change the thinking level in pi (`Shift+Tab` to cycle), the extension:
 | pi Level | NIM Mapping | Effect |
 |----------|-------------|--------|
 | off | No kwargs (or explicit disable) | No reasoning output |
-| minimal | Mapped to "low" | Thinking enabled |
-| low | low | Thinking enabled |
-| medium | medium | Thinking enabled |
+| minimal | low, or high for DeepSeek V4 | Thinking enabled |
+| low | low, or high for DeepSeek V4 | Thinking enabled |
+| medium | medium, or high for DeepSeek V4 | Thinking enabled |
 | high | high | Thinking enabled |
+| xhigh | high, or max for DeepSeek V4 | Maximum supported thinking |
 
 ## Available Models
 
-The extension ships with curated metadata for 39 featured models. At startup, it also queries the NVIDIA NIM API to discover additional models automatically.
+The extension ships with curated metadata for 42 featured models. At startup, it also queries the NVIDIA NIM API to discover additional models automatically.
 
 ### Featured Models
 
 | Model | Reasoning | Vision | Context |
 |-------|-----------|--------|---------|
+| `deepseek-ai/deepseek-v4-flash` | ✅ | | 1M |
+| `deepseek-ai/deepseek-v4-pro` | ✅ | | 1M |
 | `deepseek-ai/deepseek-v3.2` | ✅ | | 128K |
 | `deepseek-ai/deepseek-v3.1` | ✅ | | 128K |
-| `moonshotai/kimi-k2.5` | ✅ | | 256K |
+| `moonshotai/kimi-k2.6` | ✅ | | 256K |
 | `moonshotai/kimi-k2-thinking` | ✅ | | 128K |
 | `minimaxai/minimax-m2.1` | | | 1M |
 | `z-ai/glm5` | ✅ | | 128K |
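The mapping table above can be condensed into a small standalone function. This is an illustrative sketch, not the extension's actual code: `PiLevel` and `mapLevel` are hypothetical names, and the DeepSeek V4 branch follows the table's high/max rule.

```typescript
// Illustrative sketch of the thinking-level mapping described in the table above.
// Names are hypothetical; the extension's own implementation lives in index.ts.
type PiLevel = "off" | "minimal" | "low" | "medium" | "high" | "xhigh";

function mapLevel(level: PiLevel, isDeepSeekV4: boolean): string | undefined {
  if (level === "off") return undefined; // no reasoning output
  if (isDeepSeekV4) return level === "xhigh" ? "max" : "high";
  if (level === "minimal") return "low"; // NIM rejects "minimal"
  if (level === "xhigh") return "high"; // highest value most NIM models accept
  return level; // "low" | "medium" | "high" pass through unchanged
}

console.log(mapLevel("minimal", false)); // "low"
console.log(mapLevel("xhigh", true)); // "max"
```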
@@ -121,7 +129,7 @@ The extension ships with curated metadata for 39 featured models. At startup, it
 
 ### Tool Calling
 
-All major models support OpenAI-compatible tool calling. Tested and confirmed working with DeepSeek V3.2, GLM-5, GLM-4.7, Qwen3, Kimi K2.5, and others.
+All major models support OpenAI-compatible tool calling. Tested and confirmed working with DeepSeek V4/V3.2, GLM-5, GLM-4.7, Qwen3, Kimi K2.6, and others.
 
 ## How It Works
 
@@ -130,13 +138,13 @@ This extension uses `pi.registerProvider()` to register NVIDIA NIM as a custom p
 The custom streamer:
 1. Intercepts the request payload via `onPayload` callback
 2. Injects `chat_template_kwargs` for models that need it to enable thinking
-3. Maps unsupported thinking levels (`minimal` → `low`)
+3. Maps unsupported thinking levels to NIM-compatible values (`minimal` → `low`; `xhigh` → `high`, or `max` for DeepSeek V4)
 4. Suppresses `reasoning_effort` for models that don't respond to it (e.g., DeepSeek without kwargs)
 5. Uses the standard OpenAI SSE streaming format - pi already parses `reasoning_content` and `reasoning` fields from streaming deltas
 
 ## Configuration
 
-The only configuration needed is the `NVIDIA_NIM_API_KEY` environment variable. All models on NVIDIA NIM are free during the preview period (with rate limits).
+The only configuration needed is either the `NVIDIA_NIM_API_KEY` or `NVIDIA_API_KEY` environment variable. All models on NVIDIA NIM are free during the preview period (with rate limits).
 
 ## Notes
 
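The streamer steps listed under "How It Works" can be sketched as one payload hook. Everything here is a simplified illustration: `NimPayload` and `prepareNimPayload` are hypothetical names, and only the DeepSeek-style `{ thinking: true }` kwargs are shown.

```typescript
// Hypothetical sketch of the payload mutation the README describes.
// Field names (chat_template_kwargs, reasoning_effort, messages) follow the README.
interface NimPayload {
  model: string;
  messages: { role: string; content: string }[];
  reasoning_effort?: string;
  chat_template_kwargs?: Record<string, unknown>;
}

function prepareNimPayload(payload: NimPayload, thinkingEnabled: boolean): NimPayload {
  // NIM returns 500s for the `developer` role when chat_template_kwargs is set,
  // so downgrade it to `system`.
  for (const message of payload.messages) {
    if (message.role === "developer") message.role = "system";
  }
  if (thinkingEnabled) {
    // Inject kwargs to actually enable thinking (DeepSeek-style shown).
    payload.chat_template_kwargs = { thinking: true };
    // Models driven by kwargs ignore reasoning_effort, so drop it.
    delete payload.reasoning_effort;
  }
  return payload;
}
```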
package/index.ts CHANGED
@@ -6,7 +6,7 @@
  *
  * Setup:
  * 1. Get an API key from https://build.nvidia.com
- * 2. Export it: export NVIDIA_NIM_API_KEY=nvapi-...
+ * 2. Export it: export NVIDIA_NIM_API_KEY=nvapi-... (or NVIDIA_API_KEY=nvapi-...)
  * 3. Load the extension:
  *    pi -e ./path/to/pi-nvidia-nim
  *    # or install as a package:
@@ -22,17 +22,20 @@
  * parameters:
  *
  * - DeepSeek V3.x: `chat_template_kwargs: { thinking: true }`
+ * - DeepSeek V4: `chat_template_kwargs: { thinking: true, reasoning_effort: "high" | "max" }`
  * - GLM-5/4.7: `chat_template_kwargs: { enable_thinking: true, clear_thinking: false }`
  * - Kimi K2.5: `chat_template_kwargs: { thinking: true }` (also accepts reasoning_effort)
  * - Qwen3: `chat_template_kwargs: { enable_thinking: true }`
  *
- * NIM only accepts `reasoning_effort` values of "low", "medium", "high" - NOT
- * "minimal". The extension maps pi's "minimal" level to "low" automatically.
+ * NIM only accepts selected `reasoning_effort` values. The extension maps pi's
+ * provider-agnostic levels to the values each NIM model accepts.
  *
  * Some models (e.g., GLM-5, GLM-4.7) always produce reasoning output regardless of
  * thinking settings.
  */
 
+import { existsSync, readFileSync } from "node:fs";
+import { join } from "node:path";
 import type {
   Api,
   AssistantMessageEventStream,
@@ -41,7 +44,7 @@ import type {
   SimpleStreamOptions,
 } from "@mariozechner/pi-ai";
 import { streamSimpleOpenAICompletions } from "@mariozechner/pi-ai";
-import type { ExtensionAPI } from "@mariozechner/pi-coding-agent";
+import { getAgentDir, type ExtensionAPI } from "@mariozechner/pi-coding-agent";
 
 // =============================================================================
 // Constants
@@ -49,6 +52,8 @@ import type { ExtensionAPI } from "@mariozechner/pi-coding-agent";
 
 const NVIDIA_NIM_BASE_URL = "https://integrate.api.nvidia.com/v1";
 const NVIDIA_NIM_API_KEY_ENV = "NVIDIA_NIM_API_KEY";
+const NVIDIA_API_KEY_ENV = "NVIDIA_API_KEY";
+const NVIDIA_API_KEY_ENV_NAMES = [NVIDIA_NIM_API_KEY_ENV, NVIDIA_API_KEY_ENV] as const;
 const PROVIDER_NAME = "nvidia-nim";
 
 // =============================================================================
@@ -70,10 +75,22 @@ interface ThinkingConfig {
   disableKwargs?: Record<string, unknown>;
   /** If true, also send reasoning_effort alongside chat_template_kwargs */
   sendReasoningEffort?: boolean;
+  /** If true, include a model-specific reasoning_effort inside chat_template_kwargs */
+  includeReasoningEffortInKwargs?: boolean;
 }
 
 const THINKING_CONFIGS: Record<string, ThinkingConfig> = {
   // DeepSeek models need chat_template_kwargs - reasoning_effort alone doesn't trigger thinking
+  "deepseek-ai/deepseek-v4-flash": {
+    enableKwargs: { thinking: true },
+    disableKwargs: { thinking: false },
+    includeReasoningEffortInKwargs: true,
+  },
+  "deepseek-ai/deepseek-v4-pro": {
+    enableKwargs: { thinking: true },
+    disableKwargs: { thinking: false },
+    includeReasoningEffortInKwargs: true,
+  },
   "deepseek-ai/deepseek-v3.2": {
     enableKwargs: { thinking: true },
     disableKwargs: { thinking: false },
@@ -112,7 +129,7 @@ const THINKING_CONFIGS: Record<string, ThinkingConfig> = {
     disableKwargs: { enable_thinking: false },
   },
   // Kimi models: chat_template_kwargs works, reasoning_effort also works
-  "moonshotai/kimi-k2.5": {
+  "moonshotai/kimi-k2.6": {
     enableKwargs: { thinking: true },
     disableKwargs: { thinking: false },
     sendReasoningEffort: true,
@@ -236,6 +253,8 @@ const CONTEXT_WINDOWS: Record<string, number> = {
   "deepseek-ai/deepseek-v3.1": 131072,
   "deepseek-ai/deepseek-v3.1-terminus": 131072,
   "deepseek-ai/deepseek-v3.2": 131072,
+  "deepseek-ai/deepseek-v4-flash": 1048576,
+  "deepseek-ai/deepseek-v4-pro": 1048576,
   "deepseek-ai/deepseek-r1-distill-llama-8b": 131072,
   "deepseek-ai/deepseek-r1-distill-qwen-14b": 131072,
   "deepseek-ai/deepseek-r1-distill-qwen-32b": 131072,
@@ -245,7 +264,7 @@ const CONTEXT_WINDOWS: Record<string, number> = {
   "moonshotai/kimi-k2-instruct": 131072,
   "moonshotai/kimi-k2-instruct-0905": 131072,
   "moonshotai/kimi-k2-thinking": 131072,
-  "moonshotai/kimi-k2.5": 262144,
+  "moonshotai/kimi-k2.6": 262144,
   // MiniMax
   "minimaxai/minimax-m2": 1048576,
   "minimaxai/minimax-m2.1": 1048576,
@@ -369,7 +388,9 @@ const MAX_TOKENS: Record<string, number> = {
   "deepseek-ai/deepseek-v3.1": 16384,
   "deepseek-ai/deepseek-v3.1-terminus": 16384,
   "deepseek-ai/deepseek-v3.2": 16384,
-  "moonshotai/kimi-k2.5": 16384,
+  "deepseek-ai/deepseek-v4-flash": 16384,
+  "deepseek-ai/deepseek-v4-pro": 16384,
+  "moonshotai/kimi-k2.6": 16384,
   "moonshotai/kimi-k2-instruct": 8192,
   "moonshotai/kimi-k2-thinking": 16384,
   "minimaxai/minimax-m2": 8192,
@@ -393,10 +414,12 @@ const MAX_TOKENS: Record<string, number> = {
 
 const FEATURED_MODELS = [
   // Flagship / frontier
+  "deepseek-ai/deepseek-v4-flash",
+  "deepseek-ai/deepseek-v4-pro",
   "deepseek-ai/deepseek-v3.2",
   "deepseek-ai/deepseek-v3.1",
   "deepseek-ai/deepseek-v3.1-terminus",
-  "moonshotai/kimi-k2.5",
+  "moonshotai/kimi-k2.6",
   "moonshotai/kimi-k2-thinking",
   "moonshotai/kimi-k2-instruct",
   "moonshotai/kimi-k2-instruct-0905",
@@ -450,26 +473,168 @@ const FEATURED_MODELS = [
  * Custom streamSimple that wraps the standard OpenAI completions streamer.
  *
  * Fixes for NVIDIA NIM:
- * 1. Maps pi's "minimal" thinking level → "low" (NIM only accepts low/medium/high)
+ * 1. Maps pi's thinking levels to values accepted by NVIDIA NIM
  * 2. Strips reasoning_effort for models where it doesn't trigger thinking
  * 3. Injects chat_template_kwargs per model to actually enable thinking
  * 4. Uses onPayload callback to mutate request params before they're sent
  */
+type NimApiKeyEnvName = (typeof NVIDIA_API_KEY_ENV_NAMES)[number];
+type AuthStorageLike = {
+  get?: (provider: string) => unknown;
+};
+
+interface NimApiKeyCredential {
+  type: "api_key";
+  key: string;
+}
+
+function getNimApiKeyEnv(): NimApiKeyEnvName | undefined {
+  return NVIDIA_API_KEY_ENV_NAMES.find((envName) => !!process.env[envName]);
+}
+
+function getNimApiKey(): string | undefined {
+  const envName = getNimApiKeyEnv();
+  const apiKey = envName ? process.env[envName] : undefined;
+  return apiKey?.trim() || undefined;
+}
+
+function isNimApiKeyEnvName(value: string): value is NimApiKeyEnvName {
+  return NVIDIA_API_KEY_ENV_NAMES.includes(value as NimApiKeyEnvName);
+}
+
+function isNimApiKeyEnvValue(value: string): boolean {
+  return NVIDIA_API_KEY_ENV_NAMES.some((envName) => process.env[envName]?.trim() === value);
+}
+
+function isNimApiKeyCredential(credential: unknown): credential is NimApiKeyCredential {
+  return (
+    typeof credential === "object" &&
+    credential !== null &&
+    (credential as { type?: unknown }).type === "api_key" &&
+    typeof (credential as { key?: unknown }).key === "string"
+  );
+}
+
+function readStoredNimApiKeyConfig(): string | undefined {
+  try {
+    const authPath = join(getAgentDir(), "auth.json");
+    if (!existsSync(authPath)) return undefined;
+
+    const data = JSON.parse(readFileSync(authPath, "utf-8")) as Record<string, unknown>;
+    const credential = data[PROVIDER_NAME];
+    return isNimApiKeyCredential(credential) ? credential.key : undefined;
+  } catch {
+    return undefined;
+  }
+}
+
+function getStoredNimApiKeyConfig(authStorage?: AuthStorageLike): string | undefined {
+  if (authStorage) {
+    try {
+      const credential = authStorage.get?.(PROVIDER_NAME);
+      return isNimApiKeyCredential(credential) ? credential.key : undefined;
+    } catch {
+      return undefined;
+    }
+  }
+
+  return readStoredNimApiKeyConfig();
+}
+
+function hasStoredNimCommandCredential(authStorage?: AuthStorageLike): boolean {
+  return getStoredNimApiKeyConfig(authStorage)?.startsWith("!") ?? false;
+}
+
+function getStoredResolvedNimApiKey(authStorage?: AuthStorageLike): string | undefined {
+  const configuredApiKey = getStoredNimApiKeyConfig(authStorage)?.trim();
+  if (!configuredApiKey || configuredApiKey.startsWith("!")) return undefined;
+
+  const envValue = process.env[configuredApiKey]?.trim();
+  return envValue || configuredApiKey;
+}
+
+function normalizeResolvedNimApiKey(apiKey: string | undefined): string | undefined {
+  if (apiKey === undefined) return undefined;
+
+  const trimmed = apiKey.trim();
+  if (!trimmed) {
+    throw new Error("NVIDIA NIM API key resolved to an empty value.");
+  }
+
+  return trimmed;
+}
+
+function resolveNimApiKey(apiKey: string | undefined, authStorage?: AuthStorageLike): string | undefined {
+  const resolvedApiKey = normalizeResolvedNimApiKey(apiKey);
+  const hasStoredCommandCredential = hasStoredNimCommandCredential(authStorage);
+
+  if (
+    hasStoredCommandCredential &&
+    (!resolvedApiKey || isNimApiKeyEnvName(resolvedApiKey) || isNimApiKeyEnvValue(resolvedApiKey))
+  ) {
+    throw new Error("NVIDIA NIM API key command resolved to an empty value.");
+  }
+
+  const storedApiKey = getStoredResolvedNimApiKey(authStorage);
+  if (storedApiKey && (!resolvedApiKey || isNimApiKeyEnvName(resolvedApiKey))) return storedApiKey;
+
+  if (resolvedApiKey && !isNimApiKeyEnvName(resolvedApiKey)) return resolvedApiKey;
+
+  return getNimApiKey();
+}
+
+function resolveRequiredNimApiKey(apiKey: string | undefined): string {
+  const resolvedApiKey = resolveNimApiKey(apiKey);
+  if (resolvedApiKey) return resolvedApiKey;
+
+  throw new Error(
+    `NVIDIA NIM: no API key configured. Set ${NVIDIA_NIM_API_KEY_ENV} or ${NVIDIA_API_KEY_ENV}. ` +
+      `Get a free API key at https://build.nvidia.com and export it: ` +
+      `export ${NVIDIA_NIM_API_KEY_ENV}=nvapi-...`,
+  );
+}
+
+function mapNimTopLevelReasoning(reasoning: SimpleStreamOptions["reasoning"]): SimpleStreamOptions["reasoning"] {
+  if (reasoning === "minimal") return "low";
+  if (reasoning === "xhigh") return "high";
+  return reasoning;
+}
+
+function mapDeepSeekV4Reasoning(reasoning: SimpleStreamOptions["reasoning"]): "high" | "max" {
+  return reasoning === "xhigh" ? "max" : "high";
+}
+
+function buildThinkingKwargs(
+  thinkingConfig: ThinkingConfig,
+  reasoning: SimpleStreamOptions["reasoning"],
+): Record<string, unknown> {
+  const kwargs = { ...thinkingConfig.enableKwargs };
+  if (thinkingConfig.includeReasoningEffortInKwargs) {
+    kwargs.reasoning_effort = mapDeepSeekV4Reasoning(reasoning);
+  }
+  return kwargs;
+}
+
 function nimStreamSimple(
   model: Model<Api>,
   context: Context,
   options?: SimpleStreamOptions,
 ): AssistantMessageEventStream {
+  // pi-coding-agent registers streamSimple globally per `api` type (not per provider).
+  // This streamer is invoked for ALL openai-completions providers (e.g. openrouter, openai),
+  // not just nvidia-nim. Pass non-NIM calls through unchanged so we don't leak the
+  // NVIDIA_NIM_API_KEY into other providers' Authorization headers.
+  if (model.provider !== PROVIDER_NAME) {
+    return streamSimpleOpenAICompletions(model as Model<"openai-completions">, context, options);
+  }
+
   const thinkingConfig = THINKING_CONFIGS[model.id];
   const reasoning = options?.reasoning;
   const isThinkingEnabled = !!reasoning;
 
-  // Map "minimal" → "low" since NIM rejects "minimal" with a 400 error.
-  // NIM only accepts: "low", "medium", "high"
-  let mappedReasoning = reasoning;
-  if (reasoning === "minimal") {
-    mappedReasoning = "low";
-  }
+  // Map provider-agnostic pi levels to NIM's accepted top-level values.
+  // Model-specific chat_template_kwargs may apply a different mapping below.
+  const mappedReasoning = mapNimTopLevelReasoning(reasoning);
 
   // For models that have a thinking config: we handle thinking via chat_template_kwargs.
   // Suppress reasoning_effort (set reasoning to undefined) unless the model explicitly
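The precedence implemented by `resolveNimApiKey` above can be illustrated with a reduced standalone resolver. This sketch keeps only the explicit-key-versus-environment fallback and omits the auth.json and shell-command cases; `resolveKey` is a hypothetical name.

```typescript
// Reduced illustration of the fallback order in resolveNimApiKey: an explicit key
// wins unless it is merely the *name* of one of the NVIDIA env vars, in which case
// the environment value is used instead.
const ENV_NAMES = ["NVIDIA_NIM_API_KEY", "NVIDIA_API_KEY"] as const;

function resolveKey(
  explicit: string | undefined,
  env: Record<string, string | undefined>,
): string | undefined {
  const trimmed = explicit?.trim();
  // A real key (not an env-var name) takes precedence.
  if (trimmed && !ENV_NAMES.includes(trimmed as (typeof ENV_NAMES)[number])) return trimmed;
  // Otherwise fall back to the two NVIDIA env vars, in order.
  for (const name of ENV_NAMES) {
    const value = env[name]?.trim();
    if (value) return value;
  }
  return undefined;
}
```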
@@ -481,16 +646,9 @@ function nimStreamSimple(
     effectiveReasoning = undefined;
   }
 
-  // Resolve API key at request time - must pass via options.apiKey because
-  // pi-ai's getEnvApiKey() doesn't know about custom providers like "nvidia-nim"
-  const nimApiKey = process.env["NVIDIA_NIM_API_KEY"];
-  if (!nimApiKey) {
-    throw new Error(
-      `NVIDIA NIM: NVIDIA_NIM_API_KEY environment variable is not set. ` +
-        `Get a free API key at https://build.nvidia.com and export it: ` +
-        `export NVIDIA_NIM_API_KEY=nvapi-...`
-    );
-  }
+  // Use pi's already-resolved provider key when available (auth.json, shell command,
+  // CLI override), and fall back to the two NVIDIA environment variable names.
+  const nimApiKey = resolveRequiredNimApiKey(options?.apiKey);
 
   const modifiedOptions: SimpleStreamOptions = {
     ...options,
@@ -502,7 +660,7 @@ function nimStreamSimple(
       if (thinkingConfig) {
         if (isThinkingEnabled) {
           // Inject chat_template_kwargs to enable thinking
-          p.chat_template_kwargs = thinkingConfig.enableKwargs;
+          p.chat_template_kwargs = buildThinkingKwargs(thinkingConfig, reasoning);
         } else if (thinkingConfig.disableKwargs) {
           // Explicitly disable thinking (some models think by default, e.g. GLM-5/4.7)
           p.chat_template_kwargs = thinkingConfig.disableKwargs;
@@ -532,7 +690,7 @@ function nimStreamSimple(
       }
 
       // Chain to original onPayload if present
-      options?.onPayload?.(params);
+      return options?.onPayload?.(params, model);
     },
   };
 
@@ -612,7 +770,34 @@ interface NimApiModel {
   owned_by: string;
 }
 
-async function fetchNimModels(apiKey: string): Promise<string[]> {
+type NimModelFetchResult =
+  | { ok: true; modelIds: string[] }
+  | { ok: false; reason: "auth" | "transient" | "invalid" | "network" | "other" };
+
+const NIM_DISCOVERY_CREDENTIAL_WARNING =
+  "NVIDIA NIM model discovery skipped: check your nvidia-nim credentials.";
+
+function sanitizeNimLogMessage(message: string): string {
+  return message.replace(/nvapi-[A-Za-z0-9._-]+/g, "nvapi-[REDACTED]");
+}
+
+function notifyNimDiscoveryCredentialWarning(ctx: any): void {
+  ctx?.ui?.notify?.(NIM_DISCOVERY_CREDENTIAL_WARNING, "warning");
+}
+
+async function resolveNimDiscoveryApiKey(ctx: any): Promise<string | undefined> {
+  try {
+    const apiKey = await ctx?.modelRegistry?.getApiKeyForProvider?.(PROVIDER_NAME);
+    return resolveNimApiKey(apiKey, ctx?.modelRegistry?.authStorage);
+  } catch (error) {
+    const message = error instanceof Error ? error.message : String(error);
+    console.warn(`pi-nvidia-nim: ${sanitizeNimLogMessage(message)}`);
+    notifyNimDiscoveryCredentialWarning(ctx);
+    return undefined;
+  }
+}
+
+async function fetchNimModels(apiKey: string): Promise<NimModelFetchResult> {
   try {
     const response = await fetch(`${NVIDIA_NIM_BASE_URL}/models`, {
       headers: {
@@ -622,12 +807,25 @@ async function fetchNimModels(apiKey: string): Promise<string[]> {
       signal: AbortSignal.timeout(10000),
     });
 
-    if (!response.ok) return [];
+    if (response.status === 401 || response.status === 403) {
+      return { ok: false, reason: "auth" };
+    }
+
+    if (response.status === 429 || response.status >= 500) {
+      return { ok: false, reason: "transient" };
+    }
+
+    if (!response.ok) return { ok: false, reason: "other" };
+
+    const data = (await response.json()) as { data?: NimApiModel[] };
+    if (!Array.isArray(data.data)) return { ok: false, reason: "invalid" };
 
-    const data = (await response.json()) as { data: NimApiModel[] };
-    return data.data?.map((m) => m.id) ?? [];
+    return {
+      ok: true,
+      modelIds: data.data.map((m) => m.id).filter((id): id is string => typeof id === "string" && id.length > 0),
+    };
   } catch {
-    return [];
+    return { ok: false, reason: "network" };
   }
 }
 
@@ -636,27 +834,12 @@ async function fetchNimModels(apiKey: string): Promise<string[]> {
 // =============================================================================
 // =============================================================================
 export default function (pi: ExtensionAPI) {
-  // Bail out cleanly if the user hasn't set up an API key yet. Without this
-  // guard, pi's flow can still route requests through this provider's
-  // streamSimple (e.g. when probing available providers, or when the
-  // configured default provider can't be resolved for any reason), which
-  // causes a fatal "NVIDIA_NIM_API_KEY environment variable is not set"
-  // error to be emitted on every `pi` invocation even when the user
-  // never intended to use NVIDIA NIM. Skipping registration leaves the
-  // extension dormant: no NIM models appear in the picker, no provider is
-  // available to route to, and other providers (OpenAI, Anthropic, custom
-  // OpenAI-compatible endpoints in models.json, etc.) work normally.
-  //
-  // Re-enable by exporting the key:
-  //   export NVIDIA_NIM_API_KEY=nvapi-...
-  if (!process.env[NVIDIA_NIM_API_KEY_ENV]) {
-    // One-line, stderr, only on first load. Not fatal.
-    console.error(
-      `pi-nvidia-nim: ${NVIDIA_NIM_API_KEY_ENV} not set — extension dormant. ` +
-        `Set the key (https://build.nvidia.com) and reload pi to enable NVIDIA NIM models.`
-    );
-    return;
-  }
+  const providerApiKeyConfig = getNimApiKeyEnv() ?? NVIDIA_NIM_API_KEY_ENV;
+
+  // Always register the curated model list. The request path resolves credentials
+  // through pi first (CLI override, auth.json, shell command), then falls back to
+  // NVIDIA_NIM_API_KEY/NVIDIA_API_KEY. This keeps models available even when pi
+  // was launched by a shell that did not source ~/.bashrc or ~/.zshrc.
 
   // Build the curated model list
   const modelMap = new Map<string, NimModelEntry>();
@@ -672,7 +855,7 @@ export default function (pi: ExtensionAPI) {
 
   pi.registerProvider(PROVIDER_NAME, {
     baseUrl: NVIDIA_NIM_BASE_URL,
-    apiKey: NVIDIA_NIM_API_KEY_ENV,
+    apiKey: providerApiKeyConfig,
     api: "openai-completions",
     authHeader: true,
     models: curatedModels,
@@ -681,11 +864,19 @@ export default function (pi: ExtensionAPI) {
 
   // On session start, discover additional models from the API
   pi.on("session_start", async (_event: any, ctx: any) => {
-    const apiKey = process.env[NVIDIA_NIM_API_KEY_ENV];
-    if (!apiKey) return; // belt-and-suspenders: skip if env was unset after load
+    const apiKey = await resolveNimDiscoveryApiKey(ctx);
+    if (!apiKey) return;
 
     // Fetch live model list
-    const liveModelIds = await fetchNimModels(apiKey);
+    const fetchResult = await fetchNimModels(apiKey);
+    if (!fetchResult.ok) {
+      if (fetchResult.reason === "auth") {
+        notifyNimDiscoveryCredentialWarning(ctx);
+      }
+      return;
+    }
+
+    const liveModelIds = fetchResult.modelIds;
     if (liveModelIds.length === 0) return;
 
     let newModelsAdded = 0;
@@ -706,7 +897,7 @@ export default function (pi: ExtensionAPI) {
     const allModels = Array.from(modelMap.values());
     ctx.modelRegistry.registerProvider(PROVIDER_NAME, {
       baseUrl: NVIDIA_NIM_BASE_URL,
-      apiKey: NVIDIA_NIM_API_KEY_ENV,
+      apiKey: getNimApiKeyEnv() ?? NVIDIA_NIM_API_KEY_ENV,
       api: "openai-completions",
       authHeader: true,
       models: allModels,
package/package.json CHANGED
@@ -1,6 +1,6 @@
 {
   "name": "pi-nvidia-nim",
-  "version": "1.1.18",
+  "version": "1.1.20",
   "description": "NVIDIA NIM API provider extension for pi coding agent — access 100+ models from build.nvidia.com",
   "type": "module",
   "files": [
@@ -19,7 +19,8 @@
   "scripts": {
     "clean": "echo 'nothing to clean'",
     "build": "echo 'nothing to build'",
-    "check": "echo 'nothing to check'"
+    "check": "tsc --noEmit",
+    "test": "node --test test/*.test.mjs"
   },
   "devDependencies": {
     "@mariozechner/pi-coding-agent": "^0.52.9",