pi-llama-cpp 0.2.1 → 0.2.3

package/README.md CHANGED
@@ -18,9 +18,13 @@ A [Pi Coding Agent](https://pi.dev/) extension that integrates with a running [l
18
18
  | 🟢 | Loaded | Model is active and ready to use |
19
19
  | 🟡 | Loading | Model is currently being loaded |
20
20
  | 🔴 | Failed | Model failed to load |
21
- | 🔵 | Sleeping | Model is loaded but inactive (router mode) |
21
+ | 🔵 | Sleeping | Model is available, but inactive |
22
22
  | ⚪ | Unloaded | Model is not loaded on the server |
23
23
 
24
+ > **Note**: The `Sleeping` status only shows when you start your server with `llama-server --sleep-idle-seconds <n> ...`.
25
+ This is a **llama.cpp server flag** that tells the server to put idle models to sleep after `n` seconds.
26
+ The model awakens automatically when you send a message.
27
+
24
28
  ## Installation
25
29
 
26
30
  This package is a Pi extension. Install it with
@@ -65,6 +69,8 @@ If your llama.cpp server requires authentication, use `/login` in Pi, select the
65
69
 
66
70
  Alternatively, configure the API key in `~/.pi/agent/auth.json` using the provider ID `llama-server`:
67
71
 
72
+ > **Note**: The provider is displayed as **Llama.cpp** in the Pi UI, but its internal identifier is `llama-server` — use this ID when configuring `auth.json` or other programmatic access.
73
+
68
74
  ```json
69
75
  {
70
76
  "llama-server": {
@@ -86,21 +92,24 @@ Make sure your llama.cpp server is running with the appropriate flags.
86
92
  llama-server --models-preset path/to/presets.ini ...
87
93
  ```
88
94
 
89
- The extension reads the context size from the preset file using the `ctx-size` and/or `fit-ctx` keys.
90
-
91
95
  - For single-model mode, start the server with:
92
96
 
93
97
  ```bash
94
- llama-server --model path/to/model.gguf --ctx-size 128000 ...
98
+ llama-server --model path/to/model.gguf ...
95
99
  ```
96
100
 
101
+ The extension determines the context size as follows:
102
+ - **Router mode** — reads from the preset file's `ctx-size` and/or `fit-ctx` keys
103
+ - **Single mode** — reads from the `/slots` endpoint (stores it in cache afterwards)
104
+ - Falls back to `128000` if not available
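The resolution order described in the list above can be sketched as a small pure function. This is only an illustration, not the extension's code: `ContextInputs` and `pickContextSize` are hypothetical names, and preferring `ctx-size` over `fit-ctx` is an assumption, since the README only says "and/or".

```typescript
// Fallback value taken from the README text; everything else is hypothetical.
const DEFAULT_CTX = 128000;

interface ContextInputs {
  mode: "router" | "single";
  ctxSize?: number | null; // router mode: value of the preset's ctx-size key
  fitCtx?: number | null; // router mode: value of the preset's fit-ctx key
  slotCtx?: number | null; // single mode: n_ctx reported by /slots
}

// Returns the first available value for the given mode, else the default.
// Assumption: ctx-size takes precedence over fit-ctx when both are set.
const pickContextSize = (inputs: ContextInputs): number =>
  inputs.mode === "router"
    ? (inputs.ctxSize ?? inputs.fitCtx ?? DEFAULT_CTX)
    : (inputs.slotCtx ?? DEFAULT_CTX);

console.log(pickContextSize({ mode: "router", fitCtx: 8192 })); // 8192
console.log(pickContextSize({ mode: "single", slotCtx: null })); // 128000
```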
105
+
97
106
  ### Commands
98
107
 
99
108
  | Command | Description |
100
109
  | --------- | ------------------------------------------------------------------------------------------ |
101
110
  | `/models` | Browse your models with live status. Select a model to load, switch, or unload it. |
102
111
 
103
- > **Note:** When the llama.cpp server is unreachable, `/models` is still available but displays an error notification with the configured server URL.
112
+ > **Note:** When the llama.cpp server is unreachable, `/models` is still available but shows the description `Llama.cpp models (offline)` and displays an error notification with the configured server URL.
104
113
 
105
114
  ### Model Actions
106
115
 
@@ -117,13 +126,20 @@ When browsing models via the `/models` command, you can:
117
126
 
118
127
  ### Model Selection Event
119
128
 
120
- When Pi switches models (via `model_select`), the extension automatically loads the selected model on the llama.cpp server. This keeps the server in sync with the active model in Pi.
129
+ When you switch models via Pi's model picker (instead of using the `/models` command), the extension listens for the `model_select` event, which also loads the requested model before the conversation begins.
130
+
131
+ This keeps the server in sync with the active model in Pi, regardless of how the switch was initiated — you don't need to manually load models before using them.
132
+
133
+ ### Loading Models
134
+
135
+ When you trigger a load, switch, or retry action, the extension polls the server to track progress. If a model takes longer than **60 seconds** to load, the polling times out with an error.
136
+ > **Note:** The timeout is only for the polling. The model might still be loading.
121
137
 
122
138
  ### Model Configuration
123
139
 
124
140
  Each model exposed to Pi includes the following defaults:
125
141
 
126
- - **`maxTokens`** — `16384` (maximum tokens per response)
142
+ - **`maxTokens`** — `32000` (maximum possible tokens per response according to Pi's source code)
127
143
  - **`reasoning`** — `true` (assumed, as llama.cpp's `/models` endpoint does not expose it)
128
144
  - **`cost`** — all zero (local model)
129
145
 
package/package.json CHANGED
@@ -1,7 +1,7 @@
1
1
  {
2
2
  "name": "pi-llama-cpp",
3
- "version": "0.2.1",
4
- "description": "Pi extension for llama.cpp integration. Supports both router and single modes",
3
+ "version": "0.2.3",
4
+ "description": "Pi extension for llama.cpp integration. Supports both router and single modes.",
5
5
  "keywords": [
6
6
  "pi",
7
7
  "pi-package",
@@ -23,6 +23,10 @@
23
23
  "./src/index.ts"
24
24
  ]
25
25
  },
26
+ "scripts": {
27
+ "test": "vitest",
28
+ "test:run": "vitest run"
29
+ },
26
30
  "prettier": {
27
31
  "plugins": [
28
32
  "prettier-plugin-organize-imports"
@@ -33,6 +37,7 @@
33
37
  },
34
38
  "devDependencies": {
35
39
  "@types/node": "^25.6.0",
36
- "prettier-plugin-organize-imports": "^4.3.0"
40
+ "prettier-plugin-organize-imports": "^4.3.0",
41
+ "vitest": "^4.1.5"
37
42
  }
38
43
  }
package/src/constants.ts CHANGED
@@ -13,6 +13,11 @@ export const PROVIDER_NAME = "Llama.cpp";
13
13
  */
14
14
  export const DEFAULT_LLAMA_SERVER_URL = "http://127.0.0.1:8080";
15
15
 
16
+ /**
17
+ * The placeholder api-key if it couldn't be resolved
18
+ */
19
+ export const API_KEY_PLACEHOLDER = "sk-placeholder";
20
+
16
21
  /**
17
22
  * The default context if the server didn't expose it
18
23
  */
@@ -21,7 +26,7 @@ export const DEFAULT_CTX = 128000;
21
26
  /**
22
27
  * Maximum number of tokens a model can generate in a single response
23
28
  */
24
- export const MAX_TOKENS = 16384;
29
+ export const MAX_TOKENS = 32000;
25
30
 
26
31
  /**
27
32
  * Polling interval (ms) for checking model load status
package/src/events.ts CHANGED
@@ -1,6 +1,6 @@
1
1
  import { ExtensionContext } from "@mariozechner/pi-coding-agent";
2
- import { PROVIDER_NAME } from "./constants";
3
- import { ModelSelectEvent } from "./interfaces/IModelSelectEvent";
2
+ import { PROVIDER_ID } from "./constants";
3
+ import { ModelSelectEvent } from "./interfaces/events";
4
4
  import { listModels } from "./tools/retriever";
5
5
 
6
6
  /**
@@ -12,7 +12,7 @@ export const onModelSelect = async (
12
12
  event: ModelSelectEvent,
13
13
  ctx: ExtensionContext,
14
14
  ) => {
15
- if (event.model.provider !== PROVIDER_NAME) return;
15
+ if (event.model.provider !== PROVIDER_ID) return;
16
16
 
17
17
  const models = await listModels();
18
18
  const model = models.find((m) => m.id === event.model.id);
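The guard change above (comparing against `PROVIDER_ID` instead of `PROVIDER_NAME`) can be illustrated in isolation. This is a sketch under assumptions: the value `"llama-server"` comes from the README note about the internal identifier, `ModelSelectLike` is a hypothetical stand-in for the real `ModelSelectEvent`, and the model id is made up.

```typescript
// Assumed values: the display name appears in constants.ts, the identifier
// in the README note; the real constants may be defined differently.
const PROVIDER_NAME = "Llama.cpp";
const PROVIDER_ID = "llama-server";

// Hypothetical, reduced event shape for illustration only.
interface ModelSelectLike {
  model: { provider: string; id: string };
}

// True when the selected model belongs to this extension's provider.
const isOwnModel = (selectEvent: ModelSelectLike): boolean =>
  selectEvent.model.provider === PROVIDER_ID;

const selectEvent: ModelSelectLike = {
  model: { provider: "llama-server", id: "example-model" },
};

console.log(isOwnModel(selectEvent)); // true
// The old comparison against the display name would not match this event:
console.log(selectEvent.model.provider === PROVIDER_NAME); // false
```

If events carry the internal identifier, as the fix implies, a check against the display name would return early for every selection, so the model would never be loaded.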
package/src/handlers.ts CHANGED
@@ -35,7 +35,7 @@ const selectModel = async (
35
35
  const getActionsForModel = async (model: BaseModel): Promise<Array<Action>> => {
36
36
  const routerModeActions: Record<Status, Array<Action>> = {
37
37
  [Status.LOADED]: [Action.SWITCH, Action.UNLOAD, Action.INFO, Action.CANCEL],
38
- [Status.LOADING]: [Action.CANCEL],
38
+ [Status.LOADING]: [Action.INFO, Action.CANCEL],
39
39
  [Status.FAILED]: [Action.RETRY, Action.CANCEL],
40
40
  [Status.SLEEPING]: [Action.UNLOAD, Action.INFO, Action.CANCEL],
41
41
  [Status.UNLOADED]: [Action.LOAD, Action.CANCEL],
@@ -45,7 +45,7 @@ const getActionsForModel = async (model: BaseModel): Promise<Array<Action>> => {
45
45
  [Status.LOADED]: [Action.INFO, Action.CANCEL],
46
46
  [Status.LOADING]: [Action.CANCEL],
47
47
  [Status.FAILED]: [Action.CANCEL],
48
- [Status.SLEEPING]: [Action.CANCEL],
48
+ [Status.SLEEPING]: [Action.INFO, Action.CANCEL],
49
49
  [Status.UNLOADED]: [Action.CANCEL],
50
50
  };
51
51
 
@@ -0,0 +1,10 @@
1
+ import { PROVIDER_ID } from "../constants";
2
+
3
+ interface Auth {
4
+ type: string;
5
+ key: string;
6
+ }
7
+
8
+ export interface AuthFile {
9
+ [PROVIDER_ID]: Auth;
10
+ }
@@ -0,0 +1,6 @@
1
+ /**
2
+ * The structure of llama-server's /health endpoint
3
+ */
4
+ export interface HealthEndpoint {
5
+ status: "ok";
6
+ }
@@ -0,0 +1,60 @@
1
+ /**
2
+ * The structure of llama-server's /models endpoint
3
+ *
4
+ * In single mode, the `models` property is not returned
5
+ * In router mode, everything is used
6
+ */
7
+ export interface ModelsEndpoint {
8
+ models?: ModelProperty[];
9
+ object: string;
10
+ data: DataProperty[];
11
+ }
12
+
13
+ export interface ModelProperty {
14
+ name: string;
15
+ model: string;
16
+ modified_at: string;
17
+ size: string;
18
+ digest: string;
19
+ type: string;
20
+ description: string;
21
+ tags: string[];
22
+ capabilities: string[];
23
+ parameters: string;
24
+ details: {
25
+ parent_model: string;
26
+ format: string;
27
+ family: string;
28
+ families: string[];
29
+ parameter_size: string;
30
+ quantization_level: string;
31
+ };
32
+ }
33
+
34
+ export interface DataProperty {
35
+ id: string;
36
+ aliases?: string[];
37
+ tags: string[];
38
+ object: string;
39
+ owned_by: string;
40
+ created: number;
41
+ status?: StatusProperty;
42
+ meta?: MetaProperty;
43
+ }
44
+
45
+ interface StatusProperty {
46
+ value: string;
47
+ args: string[];
48
+ preset: string;
49
+ exit_code?: number;
50
+ failed?: boolean;
51
+ }
52
+
53
+ interface MetaProperty {
54
+ vocab_type: number;
55
+ n_vocab: number;
56
+ n_ctx_train: number;
57
+ n_embd: number;
58
+ n_params: number;
59
+ size: number;
60
+ }
@@ -0,0 +1,29 @@
1
+
2
+ /**
3
+ * The structure of llama-server's /props endpoint
4
+ *
5
+ * In single mode, applies to /props
6
+ * In router mode, applies to /props?model=<id>
7
+ */
8
+ export interface PropsEndpoint {
9
+ default_generation_settings: Record<string, any>;
10
+ total_slots: number;
11
+ model_alias: string;
12
+ model_path: string;
13
+ modalities: {
14
+ vision: boolean;
15
+ audio: boolean;
16
+ };
17
+ media_marker: string;
18
+ endpoint_slots: boolean;
19
+ endpoint_props: boolean;
20
+ endpoint_metrics: boolean;
21
+ webui: boolean;
22
+ webui_settings: Record<string, any>;
23
+ chat_template: string;
24
+ chat_template_caps: Record<string, boolean>;
25
+ bos_token: string;
26
+ eos_token: string;
27
+ build_info: string;
28
+ is_sleeping: boolean;
29
+ }
@@ -0,0 +1,15 @@
1
+ /**
2
+ * The structure of llama-server's /slots endpoint
3
+ *
4
+ * In single mode, applies to /slots
5
+ * In router mode, applies to /slots?model=<id>
6
+ */
7
+ export interface SlotsEndpoint {
8
+ id: number;
9
+ n_ctx: number;
10
+ speculative: boolean;
11
+ is_processing: boolean;
12
+ id_task?: number;
13
+ params?: Array<Record<string, any>>;
14
+ next_token?: Array<Record<string, any>>;
15
+ }
@@ -2,9 +2,12 @@ import type { ProviderModelConfig } from "@mariozechner/pi-coding-agent";
2
2
  import { MAX_TOKENS, POLLING_INTERVAL, POLLING_TIMEOUT } from "../constants";
3
3
  import { Mode } from "../enums/mode";
4
4
  import { Status } from "../enums/status";
5
+ import { DataProperty } from "../interfaces/endpoints/models";
5
6
  import { rpc } from "../tools/retriever";
6
7
 
7
8
  export abstract class BaseModel {
9
+ constructor(protected readonly model: DataProperty) {}
10
+
8
11
  protected readonly statusMapper: Record<string, Status> = {
9
12
  loaded: Status.LOADED,
10
13
  loading: Status.LOADING,
@@ -23,9 +26,13 @@ export abstract class BaseModel {
23
26
 
24
27
  abstract get mode(): Mode;
25
28
 
26
- abstract get id(): string;
29
+ get id(): string {
30
+ return this.model.id;
31
+ }
27
32
 
28
- abstract get name(): string;
33
+ get name(): string {
34
+ return this.model.aliases?.[0] || this.model.id;
35
+ }
29
36
 
30
37
  get reasoning(): boolean {
31
38
  // We don't have a way to detect this, so we'll fallback to true
@@ -67,6 +74,7 @@ export abstract class BaseModel {
67
74
  `Reasoning : ${this.reasoning}`,
68
75
  `Capabilities : ${this.capabilities.join(", ")}`,
69
76
  `Context size : ${await this.getContextSize()}`,
77
+ `Status : ${await this.getStatus()}`,
70
78
  ];
71
79
 
72
80
  const response = `${messages.join("\n")}\n`;
@@ -104,31 +112,26 @@ export abstract class BaseModel {
104
112
  /**
105
113
  * Unloads the model from llama-server
106
114
  */
107
-
108
115
  async unload(): Promise<void> {
109
116
  await rpc("/models/unload", { model: this.id });
110
117
  }
111
118
 
112
119
  /**
113
120
  * Polls llama-server to check when the model is loaded
121
+ *
122
+ * @param startTime The initial polling timestamp
114
123
  */
115
- async pollStatus(): Promise<void> {
116
- const startTime = Date.now();
117
-
118
- // Check loading status
119
- try {
120
- while ((await this.getStatus()) === Status.LOADING) {
121
- // Force a timeout if we wasted too much time polling
122
- if (Date.now() - startTime > POLLING_TIMEOUT) {
123
- const message = `Model loading timed out after ${POLLING_TIMEOUT} ms: ${this.id}`;
124
- throw new Error(message);
125
- }
126
-
127
- await new Promise((r) => setTimeout(r, POLLING_INTERVAL));
128
- }
129
- } catch (err) {
130
- const message = err instanceof Error ? err.message : String(err);
124
+ async pollStatus(startTime = Date.now()): Promise<void> {
125
+ const status = await this.getStatus();
126
+ if (status !== Status.LOADING) return;
127
+
128
+ // Force a timeout if we wasted too much time polling
129
+ if (Date.now() - startTime > POLLING_TIMEOUT) {
130
+ const message = `Model loading timed out after ${POLLING_TIMEOUT} ms: ${this.id}`;
131
131
  throw new Error(message);
132
132
  }
133
+
134
+ await new Promise((r) => setTimeout(r, POLLING_INTERVAL));
135
+ await this.pollStatus(startTime);
133
136
  }
134
137
  }
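For comparison with the recursive `pollStatus` above, the same polling-with-timeout idea can be written as a loop. This is a sketch, not the package's implementation: `pollUntilSettled`, `PollOptions`, and the injectable `now`/`sleep` hooks are hypothetical, added so the timing can be controlled in a test.

```typescript
type LoadStatus = "loading" | "loaded" | "failed";

interface PollOptions {
  timeoutMs: number;
  intervalMs: number;
  now?: () => number; // clock, replaceable in tests
  sleep?: (ms: number) => Promise<void>; // delay, replaceable in tests
}

const realSleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Polls until the status is no longer "loading", or throws once the elapsed
// time exceeds timeoutMs. Only the polling stops on timeout; the server may
// still be loading the model.
const pollUntilSettled = async (
  getStatus: () => Promise<LoadStatus>,
  options: PollOptions,
): Promise<LoadStatus> => {
  const now = options.now ?? Date.now;
  const sleep = options.sleep ?? realSleep;
  const startTime = now();

  for (;;) {
    const current = await getStatus();
    if (current !== "loading") return current;

    if (now() - startTime > options.timeoutMs) {
      throw new Error(`Model loading timed out after ${options.timeoutMs} ms`);
    }

    await sleep(options.intervalMs);
  }
};
```

Both forms check the status before the timeout, so a model that finishes right at the deadline is still reported as loaded. The loop simply avoids nesting one pending promise per poll.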
@@ -1,40 +1,32 @@
1
1
  import { DEFAULT_CTX } from "../constants";
2
2
  import { Mode } from "../enums/mode";
3
3
  import { Status } from "../enums/status";
4
- import { IRouterModel } from "../interfaces/IRouterModel";
4
+ import { DataProperty, ModelsEndpoint } from "../interfaces/endpoints/models";
5
5
  import { rpc } from "../tools/retriever";
6
6
  import { BaseModel } from "./baseModel";
7
7
 
8
8
  export class RouterModel extends BaseModel {
9
- constructor(private readonly model: IRouterModel) {
10
- super();
9
+ constructor(protected readonly model: DataProperty) {
10
+ super(model);
11
11
  }
12
12
 
13
13
  get mode(): Mode {
14
14
  return Mode.ROUTER;
15
15
  }
16
16
 
17
- get id(): string {
18
- return this.model.id;
19
- }
20
-
21
- get name(): string {
22
- return this.model.aliases?.[0] || this.model.id;
23
- }
24
-
25
17
  get capabilities(): ["text"] | ["image"] {
26
- const hasImage = this.model.status.args?.includes("--mmproj") ?? false;
18
+ const hasImage = this.model.status?.args?.includes("--mmproj") ?? false;
27
19
  return hasImage ? ["image"] : ["text"];
28
20
  }
29
21
 
30
22
  async getStatus(): Promise<Status> {
31
- const { data } = await rpc<{ data: IRouterModel[] }>("/models");
23
+ const { data } = await rpc<ModelsEndpoint>("/models");
32
24
  const model = data.find((m) => m.id === this.id);
33
25
  if (!model) return Status.FAILED;
34
26
 
35
- const status = this.statusMapper[model.status.value];
27
+ const status = this.statusMapper[model.status!.value];
36
28
  if (status === Status.UNLOADED) {
37
- if (this.model.status.failed) return Status.FAILED;
29
+ if (this.model.status!.failed) return Status.FAILED;
38
30
 
39
31
  return Status.UNLOADED;
40
32
  }
@@ -58,7 +50,7 @@ export class RouterModel extends BaseModel {
58
50
  * @returns The value
59
51
  */
60
52
  private extractFrom(arg: string): number | null {
61
- const args = this.model.status.args;
53
+ const args = this.model.status!.args;
62
54
  if (!args) return null;
63
55
 
64
56
  const ctxIdx = args.indexOf(arg);
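The remainder of `extractFrom` falls outside this hunk. As a rough guess at the technique, and consistent with the tests added later in this diff, reading a numeric flag value from an argument list could look like the following. `extractNumericArg` is a hypothetical standalone function, not the method itself.

```typescript
// Returns the integer that follows `flag` in `args`, or null when the flag
// is missing, is the last element, or is not followed by a number.
const extractNumericArg = (
  args: string[] | undefined,
  flag: string,
): number | null => {
  if (!args) return null;

  const flagIdx = args.indexOf(flag);
  if (flagIdx === -1 || flagIdx + 1 >= args.length) return null;

  const value = Number.parseInt(args[flagIdx + 1], 10);
  return Number.isNaN(value) ? null : value;
};

const presetArgs = ["--model", "gguf", "--ctx-size", "4096", "--batch-size", "512"];
console.log(extractNumericArg(presetArgs, "--ctx-size")); // 4096
console.log(extractNumericArg(presetArgs, "--fit-ctx")); // null
```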
@@ -1,41 +1,50 @@
1
1
  import { DEFAULT_CTX } from "../constants";
2
2
  import { Mode } from "../enums/mode";
3
3
  import { Status } from "../enums/status";
4
- import { ISingleModel } from "../interfaces/ISingleModel";
4
+ import { DataProperty, ModelProperty } from "../interfaces/endpoints/models";
5
+ import { PropsEndpoint } from "../interfaces/endpoints/props";
6
+ import { SlotsEndpoint } from "../interfaces/endpoints/slots";
5
7
  import { rpc } from "../tools/retriever";
6
8
  import { BaseModel } from "./baseModel";
7
9
 
8
10
  export class SingleModel extends BaseModel {
9
- constructor(private readonly model: ISingleModel) {
10
- super();
11
+ private contextSize?: number;
12
+
13
+ constructor(
14
+ protected readonly model: DataProperty,
15
+ private readonly extra: ModelProperty,
16
+ ) {
17
+ super(model);
11
18
  }
12
19
 
13
20
  get mode(): Mode {
14
21
  return Mode.SINGLE;
15
22
  }
16
23
 
17
- get id(): string {
18
- return this.model.name;
19
- }
20
-
21
- get name(): string {
22
- return this.model.name;
23
- }
24
-
25
24
  get capabilities(): ["text"] | ["image"] {
26
- const hasImage = this.model.capabilities.includes("multimodal");
25
+ const hasImage = this.extra.capabilities.includes("multimodal");
27
26
  return hasImage ? ["image"] : ["text"];
28
27
  }
29
28
 
30
29
  async getStatus(): Promise<Status> {
31
30
  // In single-mode, the extension will only work when the model is fully loaded
31
+ const { is_sleeping } = await rpc<PropsEndpoint>("/props");
32
+ if (is_sleeping) return Status.SLEEPING;
33
+
32
34
  return Status.LOADED;
33
35
  }
34
36
 
35
37
  async getContextSize(): Promise<number> {
36
- const slots = await rpc<{ n_ctx: number }[]>("/slots");
37
- const [{ n_ctx }] = slots;
38
+ // Avoid calling the endpoint if we already have the value
39
+ if (this.contextSize) return this.contextSize;
40
+
41
+ try {
42
+ const [{ n_ctx }] = await rpc<SlotsEndpoint[]>("/slots");
43
+ this.contextSize = n_ctx;
38
44
 
39
- return n_ctx ?? DEFAULT_CTX;
45
+ return this.contextSize;
46
+ } catch {
47
+ return DEFAULT_CTX;
48
+ }
40
49
  }
41
50
  }
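The caching behavior introduced in `getContextSize` can be shown in isolation. The sketch below assumes a hypothetical `ContextSizeCache` class with an injected fetcher in place of the real `rpc("/slots")` call. As in the diff, only a successful lookup is stored, so a failed request returns the default and is retried on the next call.

```typescript
const DEFAULT_CTX = 128000;

// Hypothetical helper; the fetcher stands in for the /slots request.
class ContextSizeCache {
  private cached?: number;
  private readonly fetchCtx: () => Promise<number>;

  constructor(fetchCtx: () => Promise<number>) {
    this.fetchCtx = fetchCtx;
  }

  async get(): Promise<number> {
    // Skip the request when a previous call already succeeded
    if (this.cached) return this.cached;

    try {
      this.cached = await this.fetchCtx();
      return this.cached;
    } catch {
      // The fallback is returned but not cached
      return DEFAULT_CTX;
    }
  }
}
```

One consequence of the truthiness check is that a reported size of `0` would never be treated as cached. That is probably harmless here, but it is worth keeping in mind if the pattern is reused.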
@@ -1,7 +1,11 @@
1
1
  import { access, constants, readFile } from "node:fs/promises";
2
2
  import { join } from "node:path";
3
- import { DEFAULT_LLAMA_SERVER_URL, PROVIDER_ID } from "../constants";
4
- import { IAuth, IAuthFile } from "../interfaces/IAuthFile";
3
+ import {
4
+ API_KEY_PLACEHOLDER,
5
+ DEFAULT_LLAMA_SERVER_URL,
6
+ PROVIDER_ID,
7
+ } from "../constants";
8
+ import { AuthFile } from "../interfaces/auth";
5
9
 
6
10
  // The URL is detected once, to reuse forever
7
11
  let resolvedUrl: string | undefined;
@@ -42,12 +46,12 @@ const readContents = async <T>(filePath: string): Promise<T | null> => {
42
46
  * @param key Key to extract from the parsed JSON
43
47
  * @returns The string value, or null if file/key missing or invalid
44
48
  */
45
- const readConfigValue = async <T, U>(
49
+ const readConfigValue = async <T>(
46
50
  filePath: string,
47
- key: string,
48
- ): Promise<U> => {
51
+ key: keyof T,
52
+ ): Promise<T[keyof T] | null> => {
49
53
  const cfg = await readContents<T>(filePath);
50
- return (cfg as Record<string, any>)?.[key] || null;
54
+ return cfg?.[key] ?? null;
51
55
  };
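The switch from `|| null` to `?? null` in `readConfigValue` also changes behavior for falsy values, not just the typing. The sketch below illustrates the difference on an already-parsed object. `readKey` and `Settings` are hypothetical names, and file reading is left out.

```typescript
// Typed lookup in the style of the new signature: the key must exist on T.
const readKey = <T>(cfg: T | null, key: keyof T): T[keyof T] | null =>
  cfg?.[key] ?? null;

// Hypothetical config shape for illustration.
interface Settings {
  url: string;
  retries: number;
}

const parsed: Settings = { url: "", retries: 0 };

// Old style: any falsy value (0, "") collapses to null
const legacy = (parsed as Record<string, any>)["retries"] || null;
console.log(legacy); // null

// New style: only null/undefined collapse to null
console.log(readKey(parsed, "retries")); // 0
console.log(readKey<Settings>(null, "url")); // null
```

For the string values read by this package, the practical effect would be that an empty string in a config file is now returned as-is rather than treated as missing.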
52
56
 
53
57
  /**
@@ -55,16 +59,11 @@ const readConfigValue = async <T, U>(
55
59
  * @returns The API key, as defined by the auth.json file
56
60
  */
57
61
  export const resolveApiKey = async (): Promise<string> => {
58
- const placeholder = "sk-placeholder";
59
-
60
62
  const authPath = join(process.env.HOME || ".", ".pi", "agent", "auth.json");
61
- if (!(await fileExists(authPath))) return placeholder;
63
+ if (!(await fileExists(authPath))) return API_KEY_PLACEHOLDER;
62
64
 
63
- const cfg = await readConfigValue<IAuthFile, IAuth | null>(
64
- authPath,
65
- PROVIDER_ID,
66
- );
67
- return cfg?.key ?? placeholder;
65
+ const cfg = await readConfigValue<AuthFile>(authPath, PROVIDER_ID);
66
+ return cfg?.key ?? API_KEY_PLACEHOLDER;
68
67
  };
69
68
 
70
69
  /**
@@ -81,10 +80,7 @@ const resolveGlobalUrl = async (): Promise<string | null> => {
81
80
 
82
81
  if (!(await fileExists(globalPath))) return null;
83
82
 
84
- return readConfigValue<Record<string, string>, string>(
85
- globalPath,
86
- "llamaServerUrl",
87
- );
83
+ return readConfigValue<Record<string, string>>(globalPath, "llamaServerUrl");
88
84
  };
89
85
 
90
86
  /**
@@ -96,7 +92,7 @@ const resolveProjectUrl = async (cwd: string): Promise<string | null> => {
96
92
  const projectPath = join(cwd, ".pi", "llama-server.json");
97
93
 
98
94
  if (!(await fileExists(projectPath))) return null;
99
- return readConfigValue<Record<string, string>, string>(projectPath, "url");
95
+ return readConfigValue<Record<string, string>>(projectPath, "url");
100
96
  };
101
97
 
102
98
  /**
@@ -1,5 +1,5 @@
1
- import { IRouterModel } from "../interfaces/IRouterModel";
2
- import { ISingleModel } from "../interfaces/ISingleModel";
1
+ import { HealthEndpoint } from "../interfaces/endpoints/health";
2
+ import { ModelsEndpoint } from "../interfaces/endpoints/models";
3
3
  import { BaseModel } from "../models/baseModel";
4
4
  import { RouterModel } from "../models/routerModel";
5
5
  import { SingleModel } from "../models/singleModel";
@@ -11,7 +11,7 @@ import { resolveApiKey, resolveUrl } from "./resolver";
11
11
  */
12
12
  export const isServerReady = async (): Promise<boolean> => {
13
13
  try {
14
- const { status } = await rpc<{ status: string }>("/health");
14
+ const { status } = await rpc<HealthEndpoint>("/health");
15
15
  return status === "ok";
16
16
  } catch {
17
17
  return false;
@@ -59,13 +59,11 @@ export const rpc = async <T>(
59
59
  * @returns The list of models
60
60
  */
61
61
  export const listModels = async (): Promise<BaseModel[]> => {
62
- const { models, data } = await rpc<{
63
- models?: ISingleModel[];
64
- data: IRouterModel[];
65
- }>("/models");
62
+ const { models, data } = await rpc<ModelsEndpoint>("/models");
66
63
 
67
64
  if (models) {
68
- return models.map((m) => new SingleModel(m));
65
+ const [extra] = models;
66
+ return data.map((m) => new SingleModel(m, extra));
69
67
  }
70
68
 
71
69
  const response = data
@@ -0,0 +1,159 @@
1
+ import { describe, expect, it, vi } from "vitest";
2
+ import { Action } from "../src/enums/action";
3
+ import { Mode } from "../src/enums/mode";
4
+ import { Status } from "../src/enums/status";
5
+ import { DataProperty } from "../src/interfaces/endpoints/models";
6
+
7
+ // Mock the retriever module before importing anything that depends on it
8
+ vi.mock("../src/tools/retriever", () => ({
9
+ rpc: vi.fn(),
10
+ isServerReady: vi.fn(),
11
+ listModels: vi.fn(),
12
+ }));
13
+
14
+ class TestModel {
15
+ constructor(
16
+ private readonly model: DataProperty,
17
+ private readonly _mode: Mode,
18
+ private readonly _status: Status,
19
+ ) {}
20
+
21
+ get mode(): Mode {
22
+ return this._mode;
23
+ }
24
+
25
+ get capabilities(): ["text"] | ["image"] {
26
+ return ["text"];
27
+ }
28
+
29
+ async getStatus(): Promise<Status> {
30
+ return this._status;
31
+ }
32
+
33
+ async getContextSize(): Promise<number> {
34
+ return 4096;
35
+ }
36
+ }
37
+
38
+ const createModel = (
39
+ mode: Mode,
40
+ status: Status,
41
+ overrides: Partial<DataProperty> = {},
42
+ ) =>
43
+ new TestModel(
44
+ {
45
+ id: "test",
46
+ tags: [],
47
+ object: "model",
48
+ owned_by: "test",
49
+ created: Date.now(),
50
+ ...overrides,
51
+ },
52
+ mode,
53
+ status,
54
+ );
55
+
56
+ /**
57
+ * Replicates the getActionsForModel logic from handlers.ts for testing
58
+ * without needing the full Pi extension context.
59
+ */
60
+ const getActionsForModel = async (model: TestModel): Promise<Array<Action>> => {
61
+ const routerModeActions: Record<Status, Array<Action>> = {
62
+ [Status.LOADED]: [Action.SWITCH, Action.UNLOAD, Action.INFO, Action.CANCEL],
63
+ [Status.LOADING]: [Action.INFO, Action.CANCEL],
64
+ [Status.FAILED]: [Action.RETRY, Action.CANCEL],
65
+ [Status.SLEEPING]: [Action.UNLOAD, Action.INFO, Action.CANCEL],
66
+ [Status.UNLOADED]: [Action.LOAD, Action.CANCEL],
67
+ };
68
+
69
+ const singleModeActions: Record<Status, Array<Action>> = {
70
+ [Status.LOADED]: [Action.INFO, Action.CANCEL],
71
+ [Status.LOADING]: [Action.CANCEL],
72
+ [Status.FAILED]: [Action.CANCEL],
73
+ [Status.SLEEPING]: [Action.INFO, Action.CANCEL],
74
+ [Status.UNLOADED]: [Action.CANCEL],
75
+ };
76
+
77
+ const allActions =
78
+ model.mode === Mode.ROUTER ? routerModeActions : singleModeActions;
79
+
80
+ const status = await model.getStatus();
81
+ return allActions[status];
82
+ };
83
+
84
+ describe("Action availability", () => {
85
+ const actionMatrix: Array<{
86
+ mode: Mode;
87
+ status: Status;
88
+ expected: Action[];
89
+ }> = [
90
+ // Router mode
91
+ {
92
+ mode: Mode.ROUTER,
93
+ status: Status.LOADED,
94
+ expected: [Action.SWITCH, Action.UNLOAD, Action.INFO, Action.CANCEL],
95
+ },
96
+ {
97
+ mode: Mode.ROUTER,
98
+ status: Status.LOADING,
99
+ expected: [Action.INFO, Action.CANCEL],
100
+ },
101
+ {
102
+ mode: Mode.ROUTER,
103
+ status: Status.FAILED,
104
+ expected: [Action.RETRY, Action.CANCEL],
105
+ },
106
+ {
107
+ mode: Mode.ROUTER,
108
+ status: Status.SLEEPING,
109
+ expected: [Action.UNLOAD, Action.INFO, Action.CANCEL],
110
+ },
111
+ {
112
+ mode: Mode.ROUTER,
113
+ status: Status.UNLOADED,
114
+ expected: [Action.LOAD, Action.CANCEL],
115
+ },
116
+ // Single mode
117
+ {
118
+ mode: Mode.SINGLE,
119
+ status: Status.LOADED,
120
+ expected: [Action.INFO, Action.CANCEL],
121
+ },
122
+ { mode: Mode.SINGLE, status: Status.LOADING, expected: [Action.CANCEL] },
123
+ { mode: Mode.SINGLE, status: Status.FAILED, expected: [Action.CANCEL] },
124
+ {
125
+ mode: Mode.SINGLE,
126
+ status: Status.SLEEPING,
127
+ expected: [Action.INFO, Action.CANCEL],
128
+ },
129
+ { mode: Mode.SINGLE, status: Status.UNLOADED, expected: [Action.CANCEL] },
130
+ ];
131
+
132
+ it.each(actionMatrix)(
133
+ "should return correct actions for $mode/$status",
134
+ async ({ mode, status, expected }) => {
135
+ const model = createModel(mode, status);
136
+ const actions = await getActionsForModel(model);
137
+ expect(actions).toEqual(expected);
138
+ },
139
+ );
140
+
141
+ it("should always include CANCEL regardless of mode or status", async () => {
142
+ for (const mode of [Mode.ROUTER, Mode.SINGLE]) {
143
+ for (const status of Object.values(Status)) {
144
+ const model = createModel(mode, status);
145
+ const actions = await getActionsForModel(model);
146
+ expect(actions).toContain(Action.CANCEL);
147
+ }
148
+ }
149
+ });
150
+
151
+ it("should not include mode-exclusive actions", async () => {
152
+ const singleLoaded = createModel(Mode.SINGLE, Status.LOADED);
153
+ expect(await getActionsForModel(singleLoaded)).not.toContain(Action.SWITCH);
154
+ expect(await getActionsForModel(singleLoaded)).not.toContain(Action.LOAD);
155
+
156
+ const singleFailed = createModel(Mode.SINGLE, Status.FAILED);
157
+ expect(await getActionsForModel(singleFailed)).not.toContain(Action.RETRY);
158
+ });
159
+ });
@@ -0,0 +1,157 @@
1
+ import { afterEach, beforeEach, describe, expect, it, vi } from "vitest";
2
+ import {
3
+ API_KEY_PLACEHOLDER,
4
+ DEFAULT_LLAMA_SERVER_URL,
5
+ PROVIDER_ID,
6
+ } from "../src/constants";
7
+
8
+ describe("URL resolution fallback chain", () => {
9
+ beforeEach(() => {
10
+ vi.clearAllMocks();
11
+ vi.resetModules();
12
+ });
13
+
14
+ afterEach(() => {
15
+ delete process.env.LLAMA_SERVER_URL;
16
+ });
17
+
18
+ it("should return default URL when no config is found", async () => {
19
+ vi.doMock("node:fs/promises", () => ({
20
+ access: vi.fn().mockRejectedValue(new Error("ENOENT")),
21
+ constants: { F_OK: 0 },
22
+ readFile: vi.fn().mockResolvedValue(""),
23
+ }));
24
+
25
+ const { resolveUrl } = await import("../src/tools/resolver");
26
+ const result = await resolveUrl("/tmp/test-project");
27
+
28
+ expect(result).toBe(DEFAULT_LLAMA_SERVER_URL);
29
+ });
30
+
31
+ it("should prioritize project config over env variable", async () => {
32
+ vi.doMock("node:fs/promises", () => ({
33
+ access: vi.fn().mockImplementation(async (path: string) => {
34
+ if (path.includes("llama-server.json")) return undefined;
35
+ throw new Error("ENOENT");
36
+ }),
37
+ constants: { F_OK: 0 },
38
+ readFile: vi
39
+ .fn()
40
+ .mockResolvedValue(JSON.stringify({ url: "http://localhost:9999" })),
41
+ }));
42
+
43
+ process.env.LLAMA_SERVER_URL = "http://env-url:8080";
44
+
45
+ const { resolveUrl } = await import("../src/tools/resolver");
46
+ const result = await resolveUrl("/tmp/test-project");
47
+
48
+ expect(result).toBe("http://localhost:9999");
49
+ });
50
+
51
+ it("should use env variable when no project config exists", async () => {
52
+ vi.doMock("node:fs/promises", () => ({
53
+ access: vi.fn().mockRejectedValue(new Error("ENOENT")),
54
+ constants: { F_OK: 0 },
55
+ readFile: vi.fn().mockResolvedValue(""),
56
+ }));
57
+
58
+ process.env.LLAMA_SERVER_URL = "http://env-url:8080";
59
+
60
+ const { resolveUrl } = await import("../src/tools/resolver");
61
+ const result = await resolveUrl("/tmp/test-project");
62
+
63
+ expect(result).toBe("http://env-url:8080");
64
+ });
65
+
66
+ it("should use global settings when no project config or env exists", async () => {
67
+ vi.doMock("node:fs/promises", () => ({
68
+ access: vi.fn().mockImplementation(async (path: string) => {
69
+ if (path.includes("settings.json")) return undefined;
70
+ throw new Error("ENOENT");
71
+ }),
72
+ constants: { F_OK: 0 },
73
+ readFile: vi
74
+ .fn()
75
+ .mockResolvedValue(
76
+ JSON.stringify({ llamaServerUrl: "http://global:8080" }),
77
+ ),
78
+ }));
79
+
80
+ const { resolveUrl } = await import("../src/tools/resolver");
81
+ const result = await resolveUrl("/tmp/test-project");
82
+
83
+ expect(result).toBe("http://global:8080");
84
+ });
85
+
86
+ it("should strip trailing slashes from resolved URL", async () => {
87
+ vi.doMock("node:fs/promises", () => ({
88
+ access: vi.fn().mockImplementation(async (path: string) => {
89
+ if (path.includes("llama-server.json")) return undefined;
90
+ throw new Error("ENOENT");
91
+ }),
92
+ constants: { F_OK: 0 },
93
+ readFile: vi
94
+ .fn()
95
+ .mockResolvedValue(JSON.stringify({ url: "http://localhost:8080/" })),
96
+ }));
97
+
98
+ const { resolveUrl } = await import("../src/tools/resolver");
99
+ const result = await resolveUrl("/tmp/test-project");
100
+
101
+ expect(result).toBe("http://localhost:8080");
102
+ });
103
+ });
104
+
105
+ describe("API key resolution", () => {
106
+ beforeEach(() => {
107
+ vi.clearAllMocks();
108
+ vi.resetModules();
109
+ });
110
+
111
+ it("should return placeholder when auth file does not exist", async () => {
112
+ vi.doMock("node:fs/promises", () => ({
113
+ access: vi.fn().mockRejectedValue(new Error("ENOENT")),
114
+ constants: { F_OK: 0 },
115
+ readFile: vi.fn().mockResolvedValue(""),
116
+ }));
117
+
118
+ const { resolveApiKey } = await import("../src/tools/resolver");
119
+ const result = await resolveApiKey();
120
+
121
+ expect(result).toBe(API_KEY_PLACEHOLDER);
122
+ });
123
+
124
+ it("should return placeholder when provider key is missing", async () => {
125
+ vi.doMock("node:fs/promises", () => ({
126
+ access: vi.fn().mockResolvedValue(undefined),
127
+ constants: { F_OK: 0 },
128
+ readFile: vi
129
+ .fn()
130
+ .mockResolvedValue(
131
+ JSON.stringify({ "other-provider": { key: "other-key" } }),
132
+ ),
133
+ }));
134
+
135
+ const { resolveApiKey } = await import("../src/tools/resolver");
136
+ const result = await resolveApiKey();
137
+
138
+ expect(result).toBe(API_KEY_PLACEHOLDER);
139
+ });
140
+
141
+ it("should return the provider key when present", async () => {
142
+ vi.doMock("node:fs/promises", () => ({
143
+ access: vi.fn().mockResolvedValue(undefined),
144
+ constants: { F_OK: 0 },
145
+ readFile: vi
146
+ .fn()
147
+ .mockResolvedValue(
148
+ JSON.stringify({ [PROVIDER_ID]: { key: "test-api-key" } }),
149
+ ),
150
+ }));
151
+
152
+ const { resolveApiKey } = await import("../src/tools/resolver");
153
+ const result = await resolveApiKey();
154
+
155
+ expect(result).toBe("test-api-key");
156
+ });
157
+ });
@@ -0,0 +1,176 @@
1
+ import { describe, expect, it } from "vitest";
2
+ import { Mode } from "../src/enums/mode";
3
+ import { DataProperty } from "../src/interfaces/endpoints/models";
4
+ import { RouterModel } from "../src/models/routerModel";
5
+
6
+ // Helper to create a mock DataProperty
7
+ const createModel = (overrides: Partial<DataProperty> = {}): DataProperty => ({
8
+ id: "test-model",
9
+ aliases: ["test-alias"],
10
+ tags: [],
11
+ object: "model",
12
+ owned_by: "test",
13
+ created: Date.now(),
14
+ ...overrides,
15
+ });
16
+
17
+ describe("RouterModel context size extraction", () => {
18
+ it("should extract --ctx-size value", () => {
19
+ const model = new RouterModel(
20
+ createModel({
21
+ status: {
22
+ value: "loaded",
23
+ args: [
24
+ "--model",
25
+ "gguf",
26
+ "--ctx-size",
27
+ "4096",
28
+ "--batch-size",
29
+ "512",
30
+ ],
31
+ preset: "default",
32
+ },
33
+ }),
34
+ );
35
+
36
+ // Access the private method via any
37
+ const extractFrom = (model as any).extractFrom.bind(model);
38
+ expect(extractFrom("--ctx-size")).toBe(4096);
39
+ });
40
+
41
+ it("should extract --fit-ctx value when --ctx-size is not present", () => {
42
+ const model = new RouterModel(
43
+ createModel({
44
+ status: {
45
+ value: "loaded",
46
+ args: ["--model", "gguf", "--fit-ctx", "8192"],
47
+ preset: "default",
48
+ },
49
+ }),
50
+ );
51
+
52
+ const extractFrom = (model as any).extractFrom.bind(model);
53
+ expect(extractFrom("--fit-ctx")).toBe(8192);
54
+ });
55
+
56
+ it("should return null when argument is not found", () => {
57
+ const model = new RouterModel(
58
+ createModel({
59
+ status: {
60
+ value: "loaded",
61
+ args: ["--model", "gguf", "--batch-size", "512"],
62
+ preset: "default",
63
+ },
64
+ }),
65
+ );
66
+
67
+ const extractFrom = (model as any).extractFrom.bind(model);
68
+ expect(extractFrom("--ctx-size")).toBeNull();
69
+ expect(extractFrom("--fit-ctx")).toBeNull();
70
+ });
71
+
72
+ it("should return null when argument has no following value", () => {
73
+ const model = new RouterModel(
74
+ createModel({
75
+ status: {
76
+ value: "loaded",
77
+ args: ["--model", "gguf", "--ctx-size"],
78
+ preset: "default",
79
+ },
80
+ }),
81
+ );
82
+
83
+ const extractFrom = (model as any).extractFrom.bind(model);
84
+ expect(extractFrom("--ctx-size")).toBeNull();
85
+ });
86
+
87
+ it("should return null when argument value is not a valid number", () => {
88
+ const model = new RouterModel(
89
+ createModel({
90
+ status: {
91
+ value: "loaded",
92
+ args: ["--model", "gguf", "--ctx-size", "not-a-number"],
93
+ preset: "default",
94
+ },
95
+ }),
96
+ );
97
+
98
+ const extractFrom = (model as any).extractFrom.bind(model);
99
+ expect(extractFrom("--ctx-size")).toBeNull();
100
+ });
101
+
102
+ it("should prefer --ctx-size over --fit-ctx", async () => {
103
+ const model = new RouterModel(
104
+ createModel({
105
+ status: {
106
+ value: "loaded",
107
+ args: ["--model", "gguf", "--ctx-size", "4096", "--fit-ctx", "8192"],
108
+ preset: "default",
109
+ },
110
+ }),
111
+ );
112
+
113
+ const ctxSize = await model.getContextSize();
114
+ expect(ctxSize).toBe(4096);
115
+ });
116
+
117
+ it("should return DEFAULT_CTX when no context size args are present", async () => {
118
+ const { DEFAULT_CTX } = await import("../src/constants");
119
+
120
+ const model = new RouterModel(
121
+ createModel({
122
+ status: {
123
+ value: "loaded",
124
+ args: ["--model", "gguf"],
125
+ preset: "default",
126
+ },
127
+ }),
128
+ );
129
+
130
+ const ctxSize = await model.getContextSize();
131
+ expect(ctxSize).toBe(DEFAULT_CTX);
132
+ });
133
+ });
134
+
135
+ describe("RouterModel capabilities detection", () => {
136
+ it("should detect image capability when --mmproj is present", () => {
137
+ const model = new RouterModel(
138
+ createModel({
139
+ status: {
140
+ value: "loaded",
141
+ args: ["--model", "gguf", "--mmproj", "mmproj.gguf"],
142
+ preset: "default",
143
+ },
144
+ }),
145
+ );
146
+
147
+ expect(model.capabilities).toEqual(["image"]);
148
+ });
149
+
150
+ it("should detect text-only capability when --mmproj is absent", () => {
151
+ const model = new RouterModel(
152
+ createModel({
153
+ status: {
154
+ value: "loaded",
155
+ args: ["--model", "gguf"],
156
+ preset: "default",
157
+ },
158
+ }),
159
+ );
160
+
161
+ expect(model.capabilities).toEqual(["text"]);
162
+ });
163
+
164
+ it("should default to text when status is undefined", () => {
165
+ const model = new RouterModel(createModel({ status: undefined }));
166
+
167
+ expect(model.capabilities).toEqual(["text"]);
168
+ });
169
+ });
170
+
171
+ describe("RouterModel mode", () => {
172
+ it("should always return ROUTER mode", () => {
173
+ const model = new RouterModel(createModel());
174
+ expect(model.mode).toBe(Mode.ROUTER);
175
+ });
176
+ });
@@ -0,0 +1,123 @@
1
+ import { beforeEach, describe, expect, it, vi } from "vitest";
2
+ import { DEFAULT_CTX } from "../src/constants";
3
+ import { Mode } from "../src/enums/mode";
4
+ import { Status } from "../src/enums/status";
5
+ import { ModelProperty } from "../src/interfaces/endpoints/models";
6
+ import { SingleModel } from "../src/models/singleModel";
7
+
8
+ const mockRpc = vi.fn();
9
+
10
+ vi.mock("../src/tools/retriever", () => ({
11
+ rpc: (...args: unknown[]) => mockRpc(...args),
12
+ isServerReady: vi.fn(),
13
+ listModels: vi.fn(),
14
+ }));
15
+
16
+ beforeEach(() => {
17
+ mockRpc.mockClear();
18
+ });
19
+
20
+ const createModel = (extra: Partial<ModelProperty> = {}): SingleModel =>
21
+ new SingleModel(
22
+ {
23
+ id: "test",
24
+ tags: [],
25
+ object: "model",
26
+ owned_by: "test",
27
+ created: Date.now(),
28
+ },
29
+ {
30
+ name: "test",
31
+ model: "test.gguf",
32
+ modified_at: new Date().toISOString(),
33
+ size: "1B",
34
+ digest: "abc123",
35
+ type: "model",
36
+ description: "test",
37
+ tags: [],
38
+ capabilities: [],
39
+ parameters: "",
40
+ details: {
41
+ parent_model: "",
42
+ format: "",
43
+ family: "",
44
+ families: [],
45
+ parameter_size: "",
46
+ quantization_level: "",
47
+ },
48
+ ...extra,
49
+ },
50
+ );
51
+
52
+ describe("SingleModel mode", () => {
53
+ it("should always return SINGLE mode", () => {
54
+ const model = createModel();
55
+ expect(model.mode).toBe(Mode.SINGLE);
56
+ });
57
+ });
58
+
59
+ describe("SingleModel capabilities", () => {
60
+ it("should detect image capability when multimodal", () => {
61
+ const model = createModel({ capabilities: ["multimodal"] });
62
+ expect(model.capabilities).toEqual(["image"]);
63
+ });
64
+
65
+ it("should detect text-only capability when not multimodal", () => {
66
+ const model = createModel({ capabilities: [] });
67
+ expect(model.capabilities).toEqual(["text"]);
68
+ });
69
+ });
70
+
71
+ describe("SingleModel getStatus", () => {
72
+ it("should return LOADED when not sleeping", async () => {
73
+ mockRpc.mockResolvedValueOnce({ is_sleeping: false });
74
+
75
+ const model = createModel();
76
+ const status = await model.getStatus();
77
+
78
+ expect(status).toBe(Status.LOADED);
79
+ expect(mockRpc).toHaveBeenCalledWith("/props");
80
+ });
81
+
82
+ it("should return SLEEPING when is_sleeping is true", async () => {
83
+ mockRpc.mockResolvedValueOnce({ is_sleeping: true });
84
+
85
+ const model = createModel();
86
+ const status = await model.getStatus();
87
+
88
+ expect(status).toBe(Status.SLEEPING);
89
+ });
90
+ });
91
+
92
+ describe("SingleModel getContextSize", () => {
93
+ it("should return n_ctx from /slots endpoint", async () => {
94
+ mockRpc.mockResolvedValueOnce([{ n_ctx: 8192 }]);
95
+
96
+ const model = createModel();
97
+ const ctxSize = await model.getContextSize();
98
+
99
+ expect(ctxSize).toBe(8192);
100
+ expect(mockRpc).toHaveBeenCalledWith("/slots");
101
+ });
102
+
103
+ it("should cache the context size on first call", async () => {
104
+ mockRpc.mockResolvedValueOnce([{ n_ctx: 4096 }]);
105
+
106
+ const model = createModel();
107
+ const first = await model.getContextSize();
108
+ const second = await model.getContextSize();
109
+
110
+ expect(first).toBe(4096);
111
+ expect(second).toBe(4096);
112
+ expect(mockRpc).toHaveBeenCalledTimes(1);
113
+ });
114
+
115
+ it("should return DEFAULT_CTX when /slots fails", async () => {
116
+ mockRpc.mockRejectedValueOnce(new Error("Connection refused"));
117
+
118
+ const model = createModel();
119
+ const ctxSize = await model.getContextSize();
120
+
121
+ expect(ctxSize).toBe(DEFAULT_CTX);
122
+ });
123
+ });
@@ -0,0 +1,8 @@
1
+ import { defineConfig } from "vitest/config";
2
+
3
+ export default defineConfig({
4
+ test: {
5
+ globals: true,
6
+ environment: "node",
7
+ },
8
+ });
@@ -1,10 +0,0 @@
1
- import { PROVIDER_NAME } from "../constants";
2
-
3
- export interface IAuth {
4
- type: string;
5
- key: string;
6
- }
7
-
8
- export interface IAuthFile {
9
- [PROVIDER_NAME]: IAuth;
10
- }
@@ -1,17 +0,0 @@
1
- interface IRouterModelStatus {
2
- value: string;
3
- args: string[];
4
- preset: string;
5
- exit_code?: number;
6
- failed?: boolean;
7
- }
8
-
9
- export interface IRouterModel {
10
- id: string;
11
- aliases?: string[];
12
- tags: string[];
13
- object: string;
14
- owned_by: string;
15
- created: number;
16
- status: IRouterModelStatus;
17
- }
@@ -1,20 +0,0 @@
1
- export interface ISingleModel {
2
- name: string;
3
- model: string;
4
- modified_at: string;
5
- size: string;
6
- digest: string;
7
- type: string;
8
- description: string;
9
- tags: string[];
10
- capabilities: string[];
11
- parameters: string;
12
- details: {
13
- parent_model: string;
14
- format: string;
15
- family: string;
16
- families: string[];
17
- parameter_size: string;
18
- quantization_level: string;
19
- };
20
- }