pi-llama-cpp 0.5.1 → 0.6.0 (npm package version diff)

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (33)

package/README.md +58 -27
package/package.json +5 -4
package/src/constants.ts +9 -4
package/src/enums/action.ts +3 -2
package/src/enums/mode.ts +1 -0
package/src/enums/status.ts +1 -0
package/src/index.ts +33 -28
package/src/interfaces/auth.ts +1 -5
package/src/interfaces/endpoints/props.ts +1 -0
package/src/managers/command.ts +290 -0
package/src/managers/events.ts +63 -0
package/src/managers/server.ts +71 -0
package/src/models/baseModel.ts +68 -20
package/src/models/legacyModel.ts +45 -0
package/src/models/routerModel.ts +7 -30
package/src/models/singleModel.ts +9 -6
package/src/resolver.ts +123 -0
package/src/server.ts +171 -0
package/tests/commandManager.test.ts +182 -133
package/tests/legacyModel.test.ts +112 -0
package/tests/mocks.ts +97 -0
package/tests/resolver.test.ts +163 -104
package/tests/routerModel.test.ts +46 -68
package/tests/server.test.ts +175 -0
package/tests/serverManager.test.ts +117 -0
package/tests/singleModel.test.ts +21 -29
package/src/commands/models.ts +0 -228
package/src/events.ts +0 -26
package/src/manager.ts +0 -96
package/src/tools/resolver.ts +0 -136
package/src/tools/retriever.ts +0 -71
package/tests/handlers.test.ts +0 -164
package/tests/modelsCommand.test.ts +0 -270

package/README.md CHANGED

@@ -1,6 +1,6 @@
 # pi-llama-cpp
-A [Pi Coding Agent](https://pi.dev/) extension that integrates with a running [llama.cpp server](https://github.com/ggml-org/llama.cpp) to provide live model browsing, loading, and switching directly from Pi.
+A [Pi Coding Agent](https://pi.dev/) extension that integrates with running [llama.cpp servers](https://github.com/ggml-org/llama.cpp) to provide live model browsing, loading, and switching directly from Pi.
 ## Features
@@ -10,20 +10,25 @@ A [Pi Coding Agent](https://pi.dev/) extension that integrates with a running [l
 - **Multi-model router support** — works with both single-model and multi-model llama.cpp server configurations
 - **Image capabilities detection** — detects multimodal models automatically
 - **Flexible URL resolution** — configures the server URL via project config, environment variable, or global settings
+- **Auth support** — lets you log in to a llama.cpp server that is secured with an API key
+- **Multiple server support** — connect to multiple llama.cpp servers simultaneously by separating URLs with semicolons
 ### Status Indicators
-| Icon | Status | Description |
-|------|--------|-------------|
-| 🟢 | Loaded | Model is active and ready to use |
-| 🟡 | Loading | Model is currently being loaded |
-| 🔴 | Failed | Model failed to load |
-| 🔵 | Sleeping | Model is available, but inactive |
-| ⚪ | Unloaded | Model is not loaded on the server |
+| Icon | Status       | Description                            |
+| ---- | ------------ | -------------------------------------- |
+| 🟢   | Loaded       | Model is active and ready to use       |
+| 🟡   | Loading      | Model is currently being loaded        |
+| 🔴   | Failed       | Model failed to load                   |
+| 🔵   | Sleeping     | Model is available, but inactive       |
+| ⚪   | Unloaded     | Model is not loaded on the server      |
+| ⛔   | Unauthorized | Model can't be used (API key required) |
 > **Note**: The `Sleeping` status only shows when you start your server with `llama-server --sleep-idle-seconds <n> ...`.
-This is a **llama.cpp server flag** that tells the server to put idle models to sleep after `n` seconds.
-The model awakens automatically when you send a message.
+> This is a **llama.cpp server flag** that tells the server to put idle models to sleep after `n` seconds.
+> The model awakens automatically when you send a message.
+> **Note:** You can run your server with API authentication with `llama-server --api-key <your key> ...`.
 ## Installation
@@ -41,7 +46,7 @@ pi install https://github.com/gsanhueza/pi-llama-cpp
 ## Configuration
-The extension resolves the llama.cpp server URL using the following priority order:
+The extension resolves the llama.cpp server URL(s) using the following priority order:
 1. **Per-project config** — `.pi/llama-server.json` in your project root:
@@ -63,19 +68,33 @@ The extension resolves the llama.cpp server URL using the following priority ord
 4. **Default** — `http://127.0.0.1:8080`
-### API Key
+### Multiple Servers
+To connect to multiple llama.cpp servers simultaneously, add your URLs as a single string **separated with semicolons** in any of the examples above:
+```bash
+# Example for env, but you can use any of the other methods
+LLAMA_SERVER_URL="http://127.0.0.1:8080;http://127.0.0.1:8081;http://10.0.0.5:8080"
+```
-If your llama.cpp server requires authentication, use `/login` in Pi, select the "API key" option, and choose the `Llama.cpp` provider from the list.
+Each server gets its own provider (e.g., **Llama.cpp (http://127.0.0.1:8080)**) and its own set of models. The `/models` command lists all models from all servers, labeled with their server URL.
+### API Key
-Alternatively, configure the API key in `~/.pi/agent/auth.json` using the provider ID `llama-server`:
+If your llama.cpp server requires authentication, use `/login` in Pi, select the "API key" option, and choose the provider from the list that correlates with the server needing the API key.
-> **Note**: The provider is displayed as **Llama.cpp** in the Pi UI, but its internal identifier is `llama-server` — use this ID when configuring `auth.json` or other programmatic access.
+Alternatively, configure the API key in `~/.pi/agent/auth.json`:
+Use the provider ID `llama-server=<url>`:
 ```json
 {
-  "llama-server": {
+  "llama-server=http://127.0.0.1:8080": {
     "type": "api_key",
-    "key": "<your-api-key-here>"
+    "key": "<key-for-server-1>"
+  },
+  "llama-server=https://some-url-for-llama-cpp": {
+    "type": "api_key",
+    "key": "<key-for-server-2>"
   }
 }
 ```
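Reviewer sketch for the multi-server setting described in this hunk: the README says URLs are joined with semicolons and that each server gets a provider named like **Llama.cpp (http://127.0.0.1:8080)**. The snippet below is only an illustration of that convention. The helper names `splitServerUrls` and `providerLabel` are hypothetical, and trimming whitespace or dropping empty segments are assumptions; the real logic lives in `src/resolver.ts`, which is not shown in this excerpt.

```typescript
// Hypothetical helper names; src/resolver.ts is not shown in this diff excerpt.
const PROVIDER_NAME = "Llama.cpp"; // same value as in src/constants.ts

// Split a semicolon-separated setting into individual server URLs.
// Trimming and dropping empty segments are assumptions, not confirmed behavior.
function splitServerUrls(raw: string): string[] {
  return raw
    .split(";")
    .map((part) => part.trim())
    .filter((part) => part.length > 0);
}

// Build the display name the README describes,
// e.g. "Llama.cpp (http://127.0.0.1:8080)".
function providerLabel(url: string): string {
  return `${PROVIDER_NAME} (${url})`;
}

const sampleUrls = splitServerUrls(
  "http://127.0.0.1:8080;http://127.0.0.1:8081;http://10.0.0.5:8080",
);
const sampleLabels = sampleUrls.map(providerLabel);
```

With the README's example value, `sampleUrls` holds three entries and each label pairs the provider name with one server URL.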
@@ -98,22 +117,32 @@ llama-server --models-preset path/to/presets.ini ...
 llama-server --model path/to/model.gguf ...
 ```
+- For legacy-model mode (e.g., [ik_llama.cpp](https://github.com/ikawrakow/ik_llama.cpp)), the extension auto-detects and handles it transparently.
+> **Note:** This extension is focused on llama.cpp, not on ik_llama.cpp. Nonetheless, since I found a way to make it work with this extension, I added the option.
+> **Note:** The ik_llama.cpp fork is not legacy at all, but it uses an old way of describing models compared to llama.cpp.
 The extension determines the context size as follows:
 - **Router mode**
   - When loaded, reads `meta.n_ctx` from the `/models` endpoint
   - When not loaded, reads `--ctx-size` and/or `--fit-ctx` from the server arguments, or `ctx-size` and/or `fit-ctx` keys from the **presets.ini** file.
 - **Single mode** — reads `meta.n_ctx` from the `/models` endpoint
+- **Legacy mode** — reads `max_model_len` from `/models`, falling back to `n_ctx` from `/props`
 - Falls back to `128000` if not available
 ### Commands
-| Command          | Description                                                                                |
-| ---------------- | ------------------------------------------------------------------------------------------ |
-| `/models`        | Browse your models with live status. Select a model to load, switch, or unload it.         |
-| `/models info`   | Show detailed information for all available models at once.                                |
-| `/models unload` | Unload all loaded models at once (Note: this only makes sense in router mode).             |
+| Command          | Description                                                                        |
+| ---------------- | ---------------------------------------------------------------------------------- |
+| `/models`        | Browse your models with live status. Select a model to load, switch, or unload it. |
+| `/models info`   | Show detailed information for all available models at once.                        |
+| `/models unload` | Unload all loaded models at once.                                                  |
+> **Note:** When a llama.cpp server is unreachable, `/models` displays an error notification with the configured server URL, but healthy servers continue to show their models.
-> **Note:** When the llama.cpp server is unreachable, `/models` displays an error notification with the configured server URL.
+> **Note:** The `/models unload` command only makes sense in router mode.
 ### Model Actions
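Reviewer sketch for the legacy-mode context size rule added in this hunk (`max_model_len` from `/models`, then `n_ctx` from `/props`, then `128000`). The interfaces and the function name below are hypothetical and flatten the endpoint payloads to the single field each one contributes; the actual response shapes may nest these values differently.

```typescript
// Fallback value stated in the README.
const FALLBACK_CONTEXT = 128000;

// Hypothetical, simplified views of the two endpoint payloads.
interface LegacyModelEntrySketch {
  max_model_len?: number;
}
interface LegacyPropsSketch {
  n_ctx?: number;
}

// Apply the documented priority: /models value, then /props value, then default.
function resolveLegacyContext(
  entry: LegacyModelEntrySketch,
  props: LegacyPropsSketch,
): number {
  return entry.max_model_len ?? props.n_ctx ?? FALLBACK_CONTEXT;
}

const fromModels = resolveLegacyContext({ max_model_len: 4096 }, { n_ctx: 8192 });
const fromProps = resolveLegacyContext({}, { n_ctx: 8192 });
const fromDefault = resolveLegacyContext({}, {});
```

The nullish coalescing operator only skips `undefined` and `null`, so a reported value of `0` would be kept as-is in this sketch; whether the extension treats `0` as missing is not stated in the README.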
@@ -126,7 +155,7 @@ When browsing models via the `/models` command, you can:
 - **Info** — View model details (ID, capabilities, context size)
 - **Cancel** — Cancel the current operation
-> **Note:** In single-model mode, only **Info** and **Cancel** are available, since there is only one model loaded on the server.
+> **Note:** In single-model and legacy-model mode, **Unload** is not available, since there is only one model on the server.
 ### Model Selection Event
@@ -137,6 +166,7 @@ This keeps the server in sync with the active model in Pi, regardless of how the
 ### Loading Models
 When you trigger a load, switch, or retry action, the extension polls the server to track progress. If a model takes longer than **60 seconds** to load, the polling times out with an error.
 > **Note:** The timeout is only for the polling. The model might still be loading.
 ### Model Configuration
@@ -149,6 +179,7 @@ Each model exposed to Pi includes the following defaults:
 ## Dependencies
-| Dependency                        | Purpose                               |
-| --------------------------------- | ------------------------------------- |
-| `@earendil-works/pi-coding-agent` | Pi Coding Agent SDK (peer dependency) |
+| Peer dependency                   | Purpose             |
+| --------------------------------- | ------------------- |
+| `@earendil-works/pi-coding-agent` | Pi Coding Agent SDK |
+| `@earendil-works/pi-tui`          | Pi TUI SDK          |

package/package.json CHANGED

@@ -1,7 +1,7 @@
 {
   "name": "pi-llama-cpp",
-  "version": "0.5.1",
-  "description": "Pi extension for llama.cpp integration. Supports both router and single modes.",
+  "version": "0.6.0",
+  "description": "Pi extension for llama.cpp integration. Supports router, single and legacy models. Supports multiple servers.",
   "keywords": [
     "pi",
     "pi-package",
@@ -32,11 +32,12 @@
     ]
   },
   "peerDependencies": {
-    "@earendil-works/pi-coding-agent": "*"
+    "@earendil-works/pi-coding-agent": "*",
+    "@earendil-works/pi-tui": "*"
   },
   "devDependencies": {
     "@types/node": "^25.9.1",
     "prettier-plugin-organize-imports": "^4.3.0",
-    "vitest": "^4.1.7"
+    "vitest": "^4.1.8"
   }
 }

package/src/constants.ts CHANGED

@@ -1,7 +1,7 @@
 /**
- * This provider's id
+ * This provider's base ID
  */
-export const PROVIDER_ID = "llama-server";
+export const PROVIDER_PREFIX = "llama-server";
 /**
  * This provider's name
@@ -9,15 +9,20 @@ export const PROVIDER_ID = "llama-server";
 export const PROVIDER_NAME = "Llama.cpp";
 /**
- * The default URL if the resolver couldn't find it
+ * The default API type used in Pi
  */
-export const DEFAULT_LLAMA_SERVER_URL = "http://127.0.0.1:8080";
+export const API_TYPE = "openai-completions";
 /**
  * The placeholder api-key if it couldn't be resolved
  */
 export const API_KEY_PLACEHOLDER = "sk-placeholder";
+/**
+ * The default URL if the resolver couldn't find it
+ */
+export const DEFAULT_LLAMA_SERVER_URL = "http://127.0.0.1:8080";
 /**
  * The default context if the server didn't expose it
  */

package/src/enums/action.ts CHANGED

@@ -1,9 +1,10 @@
 /** The possible actions for the /models command */
 export enum Action {
+  LOAD_AND_SWITCH = "Load & switch",
   SWITCH = "Switch model",
-  RETRY = "Retry",
-  LOAD = "Load & switch",
+  LOAD = "Load only",
   UNLOAD = "Unload",
+  RETRY = "Retry",
   INFO = "Info",
   CANCEL = "Cancel",
 }

package/src/enums/mode.ts CHANGED

@@ -2,4 +2,5 @@
 export enum Mode {
   ROUTER = "router",
   SINGLE = "single",
+  LEGACY = "legacy",
 }

package/src/enums/status.ts CHANGED

@@ -5,4 +5,5 @@ export enum Status {
   FAILED = "failed",
   SLEEPING = "sleeping",
   UNLOADED = "unloaded",
+  UNAUTHORIZED = "unauthorized",
 }

package/src/index.ts CHANGED

@@ -1,42 +1,47 @@
 import type {
   ExtensionAPI,
   ExtensionCommandContext,
+  ExtensionContext,
+  SessionBeforeSwitchEvent,
 } from "@earendil-works/pi-coding-agent";
-import type { AutocompleteItem } from "@earendil-works/pi-tui";
-import { onSessionBeforeSwitch } from "./commands/models";
 import { PROVIDER_NAME } from "./constants";
-import { onModelSelect } from "./events";
-import { CommandManager } from "./manager";
+import { ModelSelectEvent } from "./interfaces/events";
+import { CommandManager } from "./managers/command";
+import { EventManager } from "./managers/events";
+import { ServerManager } from "./managers/server";
+import { ConfigResolver } from "./resolver";
+import { Server } from "./server";
 export default async function (pi: ExtensionAPI) {
-  const manager = new CommandManager(pi);
-  await manager.initialize();
+  const resolver = new ConfigResolver();
+  const urls = await resolver.resolveUrls(process.cwd());
+  const servers = urls.map((url) => new Server(url));
-  // Command: /models
+  const eventManager = new EventManager(servers);
+  const serverManager = new ServerManager(servers);
+  const commandManager = new CommandManager(serverManager);
+  // Register providers once at startup
+  await serverManager.registerAllProviders(pi);
+  // Single global /models command
   pi.registerCommand("models", {
     description: `Browse ${PROVIDER_NAME} models`,
-    getArgumentCompletions: (prefix: string): AutocompleteItem[] | null => {
-      const available = [
-        {
-          value: "info",
-          label: "info",
-          description: "Show information of all models",
-        },
-        {
-          value: "unload",
-          label: "unload",
-          description: "Unload all models",
-        },
-      ];
-      const filtered = available.filter((a) => a.value.startsWith(prefix));
-      return filtered.length > 0 ? filtered : null;
+    getArgumentCompletions: commandManager.getArgumentCompletions,
+    handler: async (args: string, ctx: ExtensionCommandContext) => {
+      await commandManager.handleCommand(args, ctx, pi);
     },
-    handler: async (args: string, ctx: ExtensionCommandContext) =>
-      await manager.run(args, ctx),
   });
-  // Events registration
-  pi.on("model_select", onModelSelect);
-  pi.on("session_before_switch", onSessionBeforeSwitch);
+  // Events
+  pi.on(
+    "model_select",
+    async (event: ModelSelectEvent, ctx: ExtensionContext) =>
+      await eventManager.onModelSelect(event, ctx),
+  );
+  pi.on(
+    "session_before_switch",
+    async (_: SessionBeforeSwitchEvent, ctx: ExtensionContext) =>
+      await eventManager.onSessionBeforeSwitch(ctx),
+  );
 }
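
Reviewer note on `getArgumentCompletions: commandManager.getArgumentCompletions`: the method is handed over as a bare reference, so it is called without its instance. That is harmless for the implementation in `managers/command.ts` as published, because it never reads `this`, but it would fail if the method later used instance state. The class below is a hypothetical stand-in, not code from the package, showing the difference between an unbound reference and two safe alternatives.

```typescript
// Hypothetical stand-in class; not part of pi-llama-cpp.
class CompleterSketch {
  private readonly options = ["info", "unload"];

  // Reads `this`, so it depends on how it is called.
  complete(prefix: string): string[] {
    return this.options.filter((option) => option.startsWith(prefix));
  }
}

const completer = new CompleterSketch();

// Unbound reference: `this` is lost when the function is called later.
const unboundComplete = completer.complete;

// Two ways to keep the instance attached.
const boundComplete = completer.complete.bind(completer);
const wrappedComplete = (prefix: string) => completer.complete(prefix);

let unboundThrew = false;
try {
  unboundComplete("in");
} catch {
  unboundThrew = true;
}
const boundResult = boundComplete("in");
const wrappedResult = wrappedComplete("un");
```

The `handler` entry in the same registration already uses the wrapper style, which stays correct regardless of what the method does internally.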

package/src/interfaces/auth.ts CHANGED

@@ -1,10 +1,6 @@
-import { PROVIDER_ID } from "../constants";
 interface Auth {
   type: string;
   key: string;
 }
-export interface AuthFile {
-  [PROVIDER_ID]: Auth;
-}
+export type AuthFile = Record<string, Auth>;
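
Reviewer sketch of why `AuthFile` became a `Record<string, Auth>`: with one provider per server, the keys in `auth.json` are no longer a single fixed ID but follow the `llama-server=<url>` pattern from the README. The lookup function below is hypothetical; falling back to the placeholder key mirrors the comment on `API_KEY_PLACEHOLDER` in `constants.ts`, but how the extension actually reads the file is not shown in this excerpt.

```typescript
// Local copies of the shapes and constants shown in the diff.
interface AuthEntrySketch {
  type: string;
  key: string;
}
type AuthFileSketch = Record<string, AuthEntrySketch>;

const PROVIDER_PREFIX = "llama-server";
const API_KEY_PLACEHOLDER = "sk-placeholder";

// Hypothetical lookup: find the key stored for one server URL.
function apiKeyFor(auth: AuthFileSketch, url: string): string {
  const entry: AuthEntrySketch | undefined = auth[`${PROVIDER_PREFIX}=${url}`];
  return entry !== undefined && entry.type === "api_key"
    ? entry.key
    : API_KEY_PLACEHOLDER;
}

const sampleAuth: AuthFileSketch = {
  "llama-server=http://127.0.0.1:8080": { type: "api_key", key: "secret-1" },
};

const knownKey = apiKeyFor(sampleAuth, "http://127.0.0.1:8080");
const missingKey = apiKeyFor(sampleAuth, "http://127.0.0.1:8081");
```

The explicit `| undefined` annotation is there because an index lookup on a `Record<string, …>` is typed as always present unless `noUncheckedIndexedAccess` is enabled.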

package/src/interfaces/endpoints/props.ts CHANGED

@@ -2,6 +2,7 @@
  * The structure of llama-server's /props endpoint
  */
 export interface PropsEndpoint {
+  role?: "router";
   error?: PropsError;
   default_generation_settings: Record<string, any>;
   total_slots: number;

package/src/managers/command.ts ADDED

@@ -0,0 +1,290 @@
+import type {
+  ExtensionAPI,
+  ExtensionCommandContext,
+} from "@earendil-works/pi-coding-agent";
+import { AutocompleteItem } from "@earendil-works/pi-tui";
+import { PROVIDER_NAME } from "../constants";
+import { Action } from "../enums/action";
+import { Mode } from "../enums/mode";
+import { Status } from "../enums/status";
+import { BaseModel } from "../models/baseModel";
+import { EventManager } from "./events";
+import { ServerManager } from "./server";
+export class CommandManager {
+  constructor(private readonly serverManager: ServerManager) {}
+  /**
+   * Sets up the argument completions for the `/models` command
+   *
+   * @param prefix Prefix written by the user
+   * @returns Completions with that prefix
+   */
+  getArgumentCompletions(prefix: string): AutocompleteItem[] | null {
+    const available = [
+      {
+        value: "info",
+        label: "info",
+        description: "Show information of all models",
+      },
+      {
+        value: "unload",
+        label: "unload",
+        description: "Unload all models",
+      },
+    ];
+    const filtered = available.filter((a) => a.value.startsWith(prefix));
+    return filtered.length > 0 ? filtered : null;
+  }
+  /**
+   * Executes the action for the `/models` command
+   *
+   * @param args Arguments of the command
+   * @param ctx The context used by Pi
+   * @param pi The Pi extension
+   */
+  async handleCommand(
+    args: string,
+    ctx: ExtensionCommandContext,
+    pi: ExtensionAPI,
+  ) {
+    // Re-register providers so Pi sees updated model states
+    await this.serverManager.registerAllProviders(pi);
+    // Notify about unreachable servers
+    for (const url of this.serverManager.failedUrls) {
+      this.notifyNotFound(ctx, url);
+    }
+    if (args === "unload") {
+      await Promise.all(
+        this.serverManager.getAllModels().map((model) => model.unload()),
+      );
+      ctx.ui.notify(`Unloaded all ${PROVIDER_NAME} models`, "info");
+      return;
+    }
+    if (args === "info") {
+      const infos = await Promise.all(
+        this.serverManager.getAllModels().map((model) => model.getInfo()),
+      );
+      ctx.ui.notify(ctx.ui.theme.fg("accent", infos.join("\n")), "info");
+      return;
+    }
+    // Interactive menu: show <name> (<server_url>)
+    await this.runModelsMenu(ctx, pi);
+  }
+  /**
+   * Notifies the user that a server is unreachable.
+   */
+  private notifyNotFound(ctx: ExtensionCommandContext, url: string): void {
+    ctx.ui.notify(`${PROVIDER_NAME} unreachable at ${url}`, "error");
+  }
+  /**
+   * Runs the interactive model selection menu.
+   */
+  private async runModelsMenu(
+    ctx: ExtensionCommandContext,
+    pi: ExtensionAPI,
+  ): Promise<void> {
+    const event = await this.modelSelectionHandler(
+      ctx,
+      this.serverManager.getAllModels(),
+    );
+    if (!event) return;
+    const { action, model } = event;
+    // Action: Cancel
+    if (!action || action === Action.CANCEL) return;
+    // Action: Info
+    if (action === Action.INFO) {
+      const info = await model.getInfo();
+      ctx.ui.notify(`${info}`, "info");
+      return;
+    }
+    // Action: Unload
+    if (action === Action.UNLOAD) {
+      await model.unload();
+      ctx.ui.notify(`Unloaded ${model.name}`, "info");
+      return;
+    }
+    // Action: Switch
+    if (action === Action.SWITCH) {
+      const { serverId } = model;
+      const piModel = ctx.modelRegistry.find(serverId, model.id);
+      if (!piModel)
+        throw new Error(`Cannot find model ${model.name} in pi registry`);
+      await pi.setModel(piModel);
+      ctx.ui.notify(`Model ${model.name} ready`, "info");
+      return;
+    }
+    // Actions: Load / Load & Switch / Retry
+    const loadActions = [Action.LOAD, Action.LOAD_AND_SWITCH, Action.RETRY];
+    if (loadActions.includes(action)) {
+      ctx.ui.notify(`Loading ${model.name}...`, "info");
+      EventManager.inflightModel = model;
+      const onSuccess = async () => {
+        const { serverId } = model;
+        const piModel = ctx.modelRegistry.find(serverId, model.id);
+        if (!piModel)
+          throw new Error(`Cannot find model ${model.name} in pi registry`);
+        // Verify auth
+        if ((await model.getStatus()) === Status.UNAUTHORIZED)
+          throw new Error(
+            `Unauthorized for ${model.name}. Use /login and add your API key.`,
+          );
+        // Verify failure
+        if ((await model.getStatus()) === Status.FAILED)
+          throw new Error(`Failed to load model ${model.name}`);
+        // Select the model if asked
+        if (action === Action.LOAD_AND_SWITCH) await pi.setModel(piModel);
+        ctx.ui.notify(`Model ${model.name} ready`, "info");
+      };
+      const onFailure = (err: any) => {
+        const message = err instanceof Error ? err.message : String(err);
+        try {
+          ctx.ui.notify(message, "error");
+        } catch {
+          // ctx went stale between error and notification
+        }
+      };
+      // Load the model without blocking the UI
+      model
+        .load()
+        .then(onSuccess)
+        .catch(onFailure)
+        .finally(EventManager.resetInflightModel);
+    }
+  }
+  /**
+   * Handles the menu for model selection.
+   * Loops: select model → select action → handle action.
+   *
+   * Escape on actions menu goes back to model selection.
+   * Escape on model selection exits.
+   *
+   * @returns The selected action and model
+   */
+  private async modelSelectionHandler(
+    ctx: ExtensionCommandContext,
+    models: BaseModel[],
+  ): Promise<{ action: Action; model: BaseModel } | null> {
+    while (true) {
+      // Select the model
+      const model = await this.selectModel(ctx, models);
+      if (!model) return null;
+      // Select the action
+      const actions = await this.getActionsForModel(model);
+      const action = await this.selectAction(ctx, model, actions);
+      if (action === null) {
+        // Escape key pressed => back to model selection
+        continue;
+      }
+      // Return the selected action and model
+      return { action, model };
+    }
+  }
+  /**
+   * Select a model from the list. Returns null if user cancels.
+   *
+   * @returns The model selected by the user
+   */
+  private async selectModel(
+    ctx: ExtensionCommandContext,
+    models: BaseModel[],
+  ): Promise<BaseModel | null> {
+    const labels = await Promise.all(
+      models.map(async (model) => ({
+        label: (await model.getLabel()).trim(),
+        serverUrl: model.serverUrl,
+      })),
+    );
+    // Count grapheme clusters (not UTF-16 code units) so emoji padding aligns visually
+    const graphemeLength = (str: string) =>
+      [...new Intl.Segmenter().segment(str)].length;
+    // Decorate the label so the spacing makes it seem more like a table
+    const maxLength = Math.max(
+      ...labels.map(({ label }) => graphemeLength(label)),
+    );
+    const choices = labels.map(({ label, serverUrl }) => {
+      const extraPadding = 2;
+      const padLen = maxLength - graphemeLength(label) + extraPadding;
+      return `${label}${" ".repeat(padLen)} [Server: ${serverUrl}]`;
+    });
+    const choice = await ctx.ui.select(`${PROVIDER_NAME} models:`, choices);
+    if (!choice) return null;
+    const idx = choices.indexOf(choice);
+    return models[idx];
+  }
+  /**
+   * Get available actions for a model based on its mode and status.
+   *
+   * @returns A mapping of actions for each status
+   */
+  private async getActionsForModel(model: BaseModel): Promise<Array<Action>> {
+    const allActions: Record<Status, Array<Action>> = {
+      [Status.LOADED]:
+        model.mode === Mode.ROUTER
+          ? [Action.SWITCH, Action.UNLOAD, Action.INFO, Action.CANCEL]
+          : [Action.SWITCH, Action.INFO, Action.CANCEL],
+      [Status.LOADING]: [Action.INFO, Action.CANCEL],
+      [Status.FAILED]: [Action.RETRY, Action.CANCEL],
+      [Status.SLEEPING]: [
+        Action.SWITCH,
+        Action.UNLOAD,
+        Action.INFO,
+        Action.CANCEL,
+      ],
+      [Status.UNLOADED]: [Action.LOAD_AND_SWITCH, Action.LOAD, Action.CANCEL],
+      [Status.UNAUTHORIZED]: [Action.INFO, Action.CANCEL],
+    };
+    const status = await model.getStatus();
+    return allActions[status];
+  }
+  /**
+   * Selects an action for a model.
+   *
+   * @returns The selected action
+   */
+  private async selectAction(
+    ctx: ExtensionCommandContext,
+    model: BaseModel,
+    actions: Array<Action>,
+  ): Promise<Action | null> {
+    const labels = actions.map((a) => String(a));
+    const choice = await ctx.ui.select(`${model.name}`, labels);
+    if (!choice) return null;
+    const idx = labels.indexOf(choice);
+    return actions[idx];
+  }
+}
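
Reviewer sketch of the padding technique used in `selectModel`: status icons such as 🟢 occupy two UTF-16 code units, so padding by `String.prototype.length` would misalign the `[Server: …]` column. Counting grapheme clusters with `Intl.Segmenter` avoids that. The function names below are illustrative and the snippet is a standalone reduction of the idea, not the package's code.

```typescript
// Count user-perceived characters rather than UTF-16 code units.
const graphemeSegmenter = new Intl.Segmenter();

function graphemeLength(text: string): number {
  return Array.from(graphemeSegmenter.segment(text)).length;
}

// Right-pad every label to the widest one, plus a fixed gap.
function padLabels(labels: string[], gap = 2): string[] {
  const widest = Math.max(0, ...labels.map(graphemeLength));
  return labels.map(
    (label) => label + " ".repeat(widest - graphemeLength(label) + gap),
  );
}

const rawLabels = ["🟢 a", "⚪ abc"];
const paddedLabels = padLabels(rawLabels);
const paddedWidths = paddedLabels.map(graphemeLength);
```

Here `"🟢 a".length` is 4 while `graphemeLength("🟢 a")` is 3, and both padded labels end up with the same grapheme count. Grapheme count still only approximates on-screen width, since terminals may render some emoji as two cells wide.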