offgrid-ai 0.8.15 → 0.9.2
- package/README.md +26 -25
- package/package.json +3 -2
- package/src/autodetect.mjs +6 -3
- package/src/backends.mjs +36 -45
- package/src/benchmark/finalize.mjs +198 -0
- package/src/benchmark/flow.mjs +237 -0
- package/src/benchmark/metrics.mjs +152 -0
- package/src/benchmark/pi-runner.mjs +252 -0
- package/src/benchmark/prepare.mjs +121 -0
- package/src/benchmark/repo.mjs +77 -0
- package/src/benchmark/shared.mjs +54 -0
- package/src/benchmark/stream-renderer.mjs +274 -0
- package/src/benchmark.mjs +10 -1330
- package/src/cli.mjs +2 -2
- package/src/commands/main.mjs +2 -2
- package/src/commands/onboard.mjs +6 -2
- package/src/config.mjs +8 -2
- package/src/harness-pi.mjs +1 -1
- package/src/managed.mjs +3 -3
- package/src/model-catalog.mjs +2 -1
- package/src/model-name.mjs +220 -0
- package/src/process.mjs +29 -21
- package/src/runtime.mjs +11 -0
- package/src/scan.mjs +9 -20
package/README.md
CHANGED
|
@@ -2,28 +2,29 @@
|
|
|
2
2
|
|
|
3
3
|
# offgrid-ai
|
|
4
4
|
|
|
5
|
-
**
|
|
5
|
+
**Helper CLI for running local AI models on Mac with llama.cpp, ollama, and oMLX.**
|
|
6
6
|
|
|
7
7
|
[](package.json)
|
|
8
8
|
[]()
|
|
9
9
|
|
|
10
|
-
Install • Pick a model • Start chatting
|
|
11
|
-
```bash
|
|
12
|
-
curl -fsSL https://raw.githubusercontent.com/eeshansrivastava89/offgrid-ai/main/install.sh | bash
|
|
13
|
-
```
|
|
14
10
|
|
|
15
11
|
</div>
|
|
16
12
|
|
|
17
13
|
## What is offgrid-ai?
|
|
18
14
|
|
|
19
|
-
offgrid-ai is a command-line tool that lets you run AI models locally.
|
|
15
|
+
offgrid-ai is a command-line tool that lets you run AI models locally. Running local models with llama.cpp, ollama, or oMLX has a steeper learning curve than using cloud-based models, so offgrid-ai is designed to abstract away the complexity while still providing a powerful and flexible way to run local models.
|
|
16
|
+
|
|
17
|
+
This is the recommended workflow:
|
|
20
18
|
|
|
21
|
-
|
|
19
|
+
1. Download models from **LM Studio**, **Ollama**, or **oMLX**
|
|
20
|
+
2. Do minimal configuration using the `offgrid-ai` command
|
|
21
|
+
3. Run the model through `offgrid-ai`, with Pi in interactive mode
|
|
22
22
|
|
|
23
|
-
|
|
24
|
-
-
|
|
25
|
-
-
|
|
26
|
-
-
|
|
23
|
+
## Core Features
|
|
24
|
+
- Auto-detects available models from LM Studio, Ollama, and oMLX
|
|
25
|
+
- Auto-detects MTP (multi-token prediction) or QAT (quantization aware training) models, and applies the correct flags for llama.cpp
|
|
26
|
+
- Auto-applies the optimal flags for the model type in llama.cpp
|
|
27
|
+
- Start / stop llama.cpp server automatically for chat sessions
|
|
27
28
|
|
|
28
29
|
## Quick start
|
|
29
30
|
|
|
@@ -35,7 +36,7 @@ Open your terminal and run:
|
|
|
35
36
|
curl -fsSL https://raw.githubusercontent.com/eeshansrivastava89/offgrid-ai/main/install.sh | bash
|
|
36
37
|
```
|
|
37
38
|
|
|
38
|
-
This installs offgrid-ai and
|
|
39
|
+
This installs offgrid-ai and dependencies (node, npm, and llama.cpp). Then open a new terminal window and run:
|
|
39
40
|
|
|
40
41
|
```bash
|
|
41
42
|
offgrid-ai
|
|
@@ -53,14 +54,8 @@ The curl installer is recommended for first-time setup because it also verifies
|
|
|
53
54
|
|
|
54
55
|
The first time you run offgrid-ai, it looks for models already on your machine. If it does not find any, it tells you how to get one.
|
|
55
56
|
|
|
56
|
-
|
|
57
|
+
<img width="808" height="274" alt="image" src="https://github.com/user-attachments/assets/6e1583ab-65db-423c-b0eb-b627586fbf86" />
|
|
57
58
|
|
|
58
|
-
| Source | Example command |
|
|
59
|
-
|---|---|
|
|
60
|
-
| LM Studio | `lms get qwen/qwen3.5-9b` |
|
|
61
|
-
| Ollama | `ollama pull gemma3:4b` |
|
|
62
|
-
| oMLX | Use `omlx start` |
|
|
63
|
-
| Hugging Face | Download a GGUF file |
|
|
64
59
|
|
|
65
60
|
### 3. Start chatting
|
|
66
61
|
|
|
@@ -68,23 +63,29 @@ Supported ways to get models:
|
|
|
68
63
|
offgrid-ai
|
|
69
64
|
```
|
|
70
65
|
|
|
66
|
+
<img width="786" height="281" alt="image" src="https://github.com/user-attachments/assets/03cb1e06-d461-4bdf-ad82-f0692e5ba5c6" />
|
|
67
|
+
|
|
68
|
+
|
|
71
69
|
Pick a model from the list and press Enter. offgrid-ai configures the rest and opens the Pi coding agent.
|
|
72
70
|
|
|
71
|
+
<img width="786" height="499" alt="image" src="https://github.com/user-attachments/assets/223e1455-c69c-4405-a91c-5bac1b9fc9bd" />
|
|
72
|
+
|
|
73
|
+
|
|
73
74
|
## Everyday commands
|
|
74
75
|
|
|
75
76
|
```bash
|
|
76
|
-
offgrid-ai #
|
|
77
|
-
offgrid-ai status # see
|
|
77
|
+
offgrid-ai # primary entry-point for the CLI
|
|
78
|
+
offgrid-ai status # see if any model is running
|
|
78
79
|
offgrid-ai stop # stop the running model
|
|
79
|
-
offgrid-ai benchmark # run a benchmark
|
|
80
|
+
offgrid-ai benchmark # run a benchmark paired with my local llm benchmark runner
|
|
80
81
|
offgrid-ai uninstall # remove offgrid-ai
|
|
81
82
|
```
|
|
82
83
|
|
|
83
84
|
## What can I do with it?
|
|
84
85
|
|
|
85
|
-
- **Chat with local models** —
|
|
86
|
-
- **Run benchmarks** — compare how different models perform on creative or data-science tasks.
|
|
87
|
-
- **Keep data private** — everything
|
|
86
|
+
- **Chat with local models** — you download the models yourself, and then offgrid-ai helps configure and run them
|
|
87
|
+
- **Run benchmarks** — compare how different models perform on creative or data-science tasks. Pairs with my other [local llm benchmark runner](https://github.com/eeshansrivastava89/local-llm-visual-benchmark)
|
|
88
|
+
- **Keep data private** — everything runs on your machine without any cloud connections
|
|
88
89
|
|
|
89
90
|
## Need help?
|
|
90
91
|
|
package/package.json
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "offgrid-ai",
|
|
3
|
-
"version": "0.
|
|
3
|
+
"version": "0.9.2",
|
|
4
4
|
"description": "Privacy-first CLI for running local LLMs — discover, configure, run, benchmark",
|
|
5
5
|
"author": "Eeshan Srivastava (https://eeshans.com)",
|
|
6
6
|
"type": "module",
|
|
@@ -11,6 +11,7 @@
|
|
|
11
11
|
"bin/*.mjs",
|
|
12
12
|
"src/*.mjs",
|
|
13
13
|
"src/commands/*.mjs",
|
|
14
|
+
"src/benchmark/*.mjs",
|
|
14
15
|
"install.sh"
|
|
15
16
|
],
|
|
16
17
|
"publishConfig": {
|
|
@@ -31,7 +32,7 @@
|
|
|
31
32
|
"start": "node bin/offgrid-ai.mjs",
|
|
32
33
|
"test": "node --test test/*.mjs",
|
|
33
34
|
"test:integration": "OFFGRID_INTEGRATION=1 node --test test/integration/*.mjs",
|
|
34
|
-
"lint": "eslint src/*.mjs src/commands/*.mjs bin/*.mjs",
|
|
35
|
+
"lint": "eslint src/*.mjs src/commands/*.mjs src/benchmark/*.mjs bin/*.mjs",
|
|
35
36
|
"check:privacy": "node scripts/privacy-gate.mjs",
|
|
36
37
|
"release:check": "bash scripts/release-check.sh",
|
|
37
38
|
"release:check:fast": "bash scripts/release-check.sh --skip-install --skip-manual",
|
package/src/autodetect.mjs
CHANGED
|
@@ -2,13 +2,13 @@ import { basename } from "node:path";
|
|
|
2
2
|
import { existsSync } from "node:fs";
|
|
3
3
|
import { readGgufMetadata } from "./gguf.mjs";
|
|
4
4
|
import { defaultFlagsForBackend } from "./backends.mjs";
|
|
5
|
+
import { parseModelName } from "./model-name.mjs";
|
|
5
6
|
|
|
6
7
|
// ── Detect model capabilities from GGUF metadata ──────────────────────────
|
|
7
8
|
|
|
8
9
|
export function detectCapabilities(modelPath, mmprojPath) {
|
|
9
10
|
const meta = safeReadGgufMetadata(modelPath);
|
|
10
11
|
const mmprojMeta = mmprojPath ? safeReadGgufMetadata(mmprojPath) : {};
|
|
11
|
-
const name = basename(modelPath).toLowerCase();
|
|
12
12
|
const pathHints = String(modelPath).toLowerCase();
|
|
13
13
|
|
|
14
14
|
// Architecture
|
|
@@ -33,8 +33,11 @@ export function detectCapabilities(modelPath, mmprojPath) {
|
|
|
33
33
|
// Do not treat all Qwen models as MTP; require an explicit filename or metadata hint.
|
|
34
34
|
const mtp = /\bmtp\b|draft-mtp|multi-token/i.test(pathHints) || Object.keys(meta).some((key) => /mtp|draft|speculative/i.test(key));
|
|
35
35
|
|
|
36
|
-
// Quantization
|
|
37
|
-
|
|
36
|
+
// Quantization — use parseModelName (single path) for filename-based extraction.
|
|
37
|
+
// GGUF metadata does not store a standardized quant field, so the filename
|
|
38
|
+
// is the authoritative source for quant identification.
|
|
39
|
+
const parsed = parseModelName(basename(modelPath).replace(/\.gguf$/i, ""), "local-gguf");
|
|
40
|
+
const quant = parsed.quant;
|
|
38
41
|
|
|
39
42
|
// Context size from metadata, fallback to name hints
|
|
40
43
|
const metaCtx = architecture
|
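The MTP check visible in the context lines of the hunk above combines a filename/path hint with a scan of the GGUF metadata keys. The following is a minimal sketch of that logic as a standalone function. The function name `looksLikeMtp`, the sample paths, and the sample metadata key are hypothetical and exist only for illustration; both regular expressions are copied from the diff.

```javascript
// Hypothetical standalone version of the MTP check in detectCapabilities.
// Both regular expressions are copied from the diff; the function name is not.
function looksLikeMtp(modelPath, metadataKeys = []) {
  const pathHints = String(modelPath).toLowerCase();
  // Explicit hint in the file name or any parent directory
  const pathSaysMtp = /\bmtp\b|draft-mtp|multi-token/i.test(pathHints);
  // Any metadata key that mentions MTP, draft, or speculative decoding
  const metadataSaysMtp = metadataKeys.some((key) => /mtp|draft|speculative/i.test(key));
  return pathSaysMtp || metadataSaysMtp;
}

// Sample paths and keys below are made up for illustration.
console.log(looksLikeMtp("/models/Qwen3-8B-MTP-Q4_K_M.gguf"));           // path hint matches
console.log(looksLikeMtp("/models/Qwen3-8B-Q4_K_M.gguf"));               // no hint anywhere
console.log(looksLikeMtp("/models/model.gguf", ["example.draft.size"])); // metadata key matches
```

Because the path pattern relies on word boundaries, a hint separated by hyphens or dots is recognized, which fits the comment in the source about requiring an explicit hint rather than treating every Qwen model as MTP.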
package/src/backends.mjs
CHANGED
|
@@ -1,5 +1,6 @@
|
|
|
1
1
|
import { findLlamaServer } from "./config.mjs";
|
|
2
2
|
import { scanGgufModels } from "./scan.mjs";
|
|
3
|
+
import { parseModelName } from "./model-name.mjs";
|
|
3
4
|
|
|
4
5
|
// ── Backend definitions ────────────────────────────────────────────────────
|
|
5
6
|
|
|
@@ -87,51 +88,47 @@ export function defaultFlagsForBackend(backendId) {
|
|
|
87
88
|
// ── Ollama model discovery ──────────────────────────────────────────────
|
|
88
89
|
|
|
89
90
|
async function scanOllamaModels() {
|
|
90
|
-
|
|
91
|
-
|
|
92
|
-
|
|
93
|
-
const body = await response.json();
|
|
94
|
-
if (!Array.isArray(body?.models)) return [];
|
|
95
|
-
return body.models
|
|
96
|
-
.filter((model) => isLocalOllamaModel(model))
|
|
97
|
-
.map((model) => ({
|
|
98
|
-
id: model.name,
|
|
99
|
-
label: ollamaLabel(model.name),
|
|
100
|
-
aliasSuggestion: model.name,
|
|
101
|
-
sizeBytes: model.size ?? 0,
|
|
102
|
-
quant: model.details?.quantization_level,
|
|
103
|
-
family: model.details?.family,
|
|
104
|
-
backend: "ollama",
|
|
105
|
-
source: "ollama",
|
|
106
|
-
})).sort((a, b) => a.label.localeCompare(b.label));
|
|
107
|
-
} catch {
|
|
108
|
-
return [];
|
|
91
|
+
const response = await fetch(`${BACKENDS.ollama.apiBaseUrl}/api/tags`, { signal: AbortSignal.timeout(3000) });
|
|
92
|
+
if (!response.ok) {
|
|
93
|
+
throw new Error(`Ollama /api/tags returned ${response.status} ${response.statusText}`);
|
|
109
94
|
}
|
|
95
|
+
const body = await response.json();
|
|
96
|
+
if (!Array.isArray(body?.models)) return [];
|
|
97
|
+
return body.models
|
|
98
|
+
.filter((model) => isLocalOllamaModel(model))
|
|
99
|
+
.map((model) => ({
|
|
100
|
+
id: model.name,
|
|
101
|
+
label: parseModelName(model.name, "ollama").display,
|
|
102
|
+
aliasSuggestion: model.name,
|
|
103
|
+
sizeBytes: model.size ?? 0,
|
|
104
|
+
quant: model.details?.quantization_level,
|
|
105
|
+
family: model.details?.family,
|
|
106
|
+
backend: "ollama",
|
|
107
|
+
source: "ollama",
|
|
108
|
+
})).sort((a, b) => a.label.localeCompare(b.label));
|
|
110
109
|
}
|
|
111
110
|
|
|
112
111
|
// ── oMLX model discovery ───────────────────────────────────────────────
|
|
113
112
|
|
|
114
113
|
async function scanOmlxModels() {
|
|
115
|
-
|
|
116
|
-
|
|
117
|
-
|
|
118
|
-
const body = await response.json();
|
|
119
|
-
if (!Array.isArray(body?.data)) return [];
|
|
120
|
-
return body.data
|
|
121
|
-
.filter((model) => isChatOmlxModel(model))
|
|
122
|
-
.map((model) => ({
|
|
123
|
-
id: model.id,
|
|
124
|
-
label: omlxLabel(model.id),
|
|
125
|
-
aliasSuggestion: model.id,
|
|
126
|
-
sizeBytes: 0,
|
|
127
|
-
quant: null,
|
|
128
|
-
family: null,
|
|
129
|
-
backend: "omlx",
|
|
130
|
-
source: "omlx",
|
|
131
|
-
})).sort((a, b) => a.label.localeCompare(b.label));
|
|
132
|
-
} catch {
|
|
133
|
-
return [];
|
|
114
|
+
const response = await fetch(`${BACKENDS.omlx.defaultBaseUrl}/models`, { signal: AbortSignal.timeout(3000) });
|
|
115
|
+
if (!response.ok) {
|
|
116
|
+
throw new Error(`oMLX /models returned ${response.status} ${response.statusText}`);
|
|
134
117
|
}
|
|
118
|
+
const body = await response.json();
|
|
119
|
+
if (!Array.isArray(body?.data)) return [];
|
|
120
|
+
return body.data
|
|
121
|
+
.filter((model) => isChatOmlxModel(model))
|
|
122
|
+
.map((model) => ({
|
|
123
|
+
id: model.id,
|
|
124
|
+
label: parseModelName(model.id, "omlx").display,
|
|
125
|
+
aliasSuggestion: model.id,
|
|
126
|
+
sizeBytes: 0,
|
|
127
|
+
quant: null,
|
|
128
|
+
family: null,
|
|
129
|
+
backend: "omlx",
|
|
130
|
+
source: "omlx",
|
|
131
|
+
})).sort((a, b) => a.label.localeCompare(b.label));
|
|
135
132
|
}
|
|
136
133
|
|
|
137
134
|
// ── Labels ──────────────────────────────────────────────────────────────
|
|
@@ -151,10 +148,4 @@ function isChatOmlxModel(model) {
|
|
|
151
148
|
return true;
|
|
152
149
|
}
|
|
153
150
|
|
|
154
|
-
|
|
155
|
-
return name.replace(/[-_]/g, " ").replace(/^gemma\b/i, "Gemma").replace(/^qwen/i, "Qwen");
|
|
156
|
-
}
|
|
157
|
-
|
|
158
|
-
function omlxLabel(id) {
|
|
159
|
-
return id.replace(/[-_]/g, " ").replace(/^gemma-4/i, "Gemma 4").replace(/^qwen/i, "Qwen");
|
|
160
|
-
}
|
|
151
|
+
// (ollamaLabel and omlxLabel removed — parseModelName in model-name.mjs is the single path)
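One behavioral change in this file is easy to miss: the removed lines ended in `catch { return []; }`, while the new `scanOllamaModels` and `scanOmlxModels` throw when the response is not OK and no longer catch fetch errors themselves. Whoever calls them now has to handle the rejection. The sketch below illustrates one way a caller might do that. `listOllamaModelNames`, `scanOrEmpty`, and `failingFetch` are hypothetical names, the injected `fetchImpl` parameter is an assumption made so the snippet runs without a server, and the URL uses Ollama's default port; how the package's own callers handle the error is not shown in this diff.

```javascript
// Hypothetical helper (not in the package): mirrors the shape of the new scan code.
async function listOllamaModelNames(baseUrl, fetchImpl = globalThis.fetch) {
  const response = await fetchImpl(`${baseUrl}/api/tags`, { signal: AbortSignal.timeout(3000) });
  if (!response.ok) {
    throw new Error(`Ollama /api/tags returned ${response.status} ${response.statusText}`);
  }
  const body = await response.json();
  if (!Array.isArray(body?.models)) return [];
  return body.models.map((model) => model.name);
}

// Hypothetical caller: turn a thrown scan error into an empty list plus a warning.
async function scanOrEmpty(scan) {
  try {
    return { models: await scan(), warning: null };
  } catch (err) {
    return { models: [], warning: err.message };
  }
}

// Stub standing in for a server that answers 503, so the sketch runs offline.
const failingFetch = async () => ({ ok: false, status: 503, statusText: "Service Unavailable" });

scanOrEmpty(() => listOllamaModelNames("http://127.0.0.1:11434", failingFetch))
  .then((result) => console.log(result));
```

Returning the warning alongside the empty list keeps the old "no models found" behavior available to the UI while still surfacing why the scan failed.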
|
package/src/benchmark/finalize.mjs
|
@@ -0,0 +1,198 @@
|
|
|
1
|
+
// ── Unload model from server memory after benchmark ────────────────────────────
|
|
2
|
+
|
|
3
|
+
import { backendFor } from "../backends.mjs";
|
|
4
|
+
import { apiRootUrl } from "../process.mjs";
|
|
5
|
+
import { existsSync } from "node:fs";
|
|
6
|
+
import { readFile, writeFile } from "node:fs/promises";
|
|
7
|
+
import { join } from "node:path";
|
|
8
|
+
import { pc, renderRows, renderSection } from "../ui.mjs";
|
|
9
|
+
|
|
10
|
+
export async function unloadModelFromServer(profile) {
|
|
11
|
+
const backend = backendFor(profile.backend);
|
|
12
|
+
|
|
13
|
+
if (backend.id === "ollama") {
|
|
14
|
+
const apiBaseUrl = apiRootUrl(profile.baseUrl || backend.apiBaseUrl || "");
|
|
15
|
+
|
|
16
|
+
try {
|
|
17
|
+
await fetch(`${apiBaseUrl}/api/generate`, {
|
|
18
|
+
method: "POST",
|
|
19
|
+
headers: { "Content-Type": "application/json" },
|
|
20
|
+
body: JSON.stringify({ model: profile.modelAlias, prompt: "", stream: false, keep_alive: 0 }),
|
|
21
|
+
signal: AbortSignal.timeout(10000),
|
|
22
|
+
});
|
|
23
|
+
return { unloaded: true, backend: backend.id };
|
|
24
|
+
} catch (err) {
|
|
25
|
+
return { unloaded: false, backend: backend.id, error: err.message };
|
|
26
|
+
}
|
|
27
|
+
}
|
|
28
|
+
|
|
29
|
+
if (backend.id === "llama-cpp" || backend.id === "llama-cpp-mtp") {
|
|
30
|
+
// llama.cpp unloads when the server process exits; no HTTP unload API exists.
|
|
31
|
+
// If offgrid-ai started the server, stopProfile already handled it.
|
|
32
|
+
return { unloaded: false, backend: backend.id, reason: "stop server to unload" };
|
|
33
|
+
}
|
|
34
|
+
|
|
35
|
+
if (backend.id === "omlx") {
|
|
36
|
+
// oMLX does not expose a model-unload endpoint. The model stays resident
|
|
37
|
+
// until the oMLX server process is stopped.
|
|
38
|
+
return { unloaded: false, backend: backend.id, reason: "no unload API available" };
|
|
39
|
+
}
|
|
40
|
+
|
|
41
|
+
return { unloaded: false, backend: backend.id, reason: "unsupported backend" };
|
|
42
|
+
}
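The Ollama branch above unloads a model by sending an empty-prompt request to `/api/generate` with `keep_alive: 0`. The sketch below isolates that request so its shape is easier to see. `buildOllamaUnloadRequest` and `unloadResultFromResponse` are hypothetical helpers, and the base URL and model name are sample values. Note that the function in the diff reports `unloaded: true` whenever `fetch` resolves; the second helper shows a stricter variant that also looks at `response.ok`, which is a design suggestion and not something the package does.

```javascript
// Hypothetical helper: builds the same request the Ollama branch sends.
function buildOllamaUnloadRequest(apiBaseUrl, modelAlias) {
  return {
    url: `${apiBaseUrl}/api/generate`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      // keep_alive: 0 asks Ollama to drop the model from memory right away
      body: JSON.stringify({ model: modelAlias, prompt: "", stream: false, keep_alive: 0 }),
    },
  };
}

// Hypothetical stricter result mapping: treat a non-2xx reply as "not unloaded".
function unloadResultFromResponse(backendId, response) {
  return response.ok
    ? { unloaded: true, backend: backendId }
    : { unloaded: false, backend: backendId, error: `HTTP ${response.status}` };
}

// Sample values; 11434 is Ollama's default port.
const request = buildOllamaUnloadRequest("http://127.0.0.1:11434", "gemma3:4b");
console.log(request.url);
console.log(request.init.body);
```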
|
|
43
|
+
|
|
44
|
+
export async function finalizeBenchmarkRun(runDirectory, runResult, speedMetrics) {
|
|
45
|
+
const metadataPath = join(runDirectory, "metadata.json");
|
|
46
|
+
const metadata = JSON.parse(await readFile(metadataPath, "utf8"));
|
|
47
|
+
const now = new Date();
|
|
48
|
+
const timestamp = now.toISOString();
|
|
49
|
+
|
|
50
|
+
const kind = metadata.kind ?? "visual";
|
|
51
|
+
const isDs = kind === "data-science";
|
|
52
|
+
const requiredFile = isDs ? "analysis.ipynb" : "index.html";
|
|
53
|
+
const requiredPath = join(runDirectory, requiredFile);
|
|
54
|
+
|
|
55
|
+
const outputFiles = [];
|
|
56
|
+
for (const candidate of [requiredFile, isDs ? "summary.json" : "preview.png", isDs ? "chart-distribution.png" : "preview.webm", "preview.mp4"]) {
|
|
57
|
+
if (existsSync(join(runDirectory, candidate))) {
|
|
58
|
+
outputFiles.push(candidate);
|
|
59
|
+
}
|
|
60
|
+
}
|
|
61
|
+
|
|
62
|
+
const success = existsSync(requiredPath) && (await readFile(requiredPath, "utf8")).trim().length > 0;
|
|
63
|
+
const hasTurns = runResult.agentTurns > 0;
|
|
64
|
+
|
|
65
|
+
let failureReason = null;
|
|
66
|
+
if (runResult.error) {
|
|
67
|
+
failureReason = typeof runResult.error === "string" ? runResult.error : (runResult.error.message ?? "Unknown error");
|
|
68
|
+
} else if (!hasTurns) {
|
|
69
|
+
failureReason = "The model did not produce any response turns.";
|
|
70
|
+
} else if (!success) {
|
|
71
|
+
if (runResult.toolCalls === 0) {
|
|
72
|
+
failureReason = `The model finished without writing the required output file (${requiredFile}). It may have returned the response as chat text instead of using the write tool.`;
|
|
73
|
+
} else {
|
|
74
|
+
failureReason = `The required output file (${requiredFile}) was missing or empty after the run.`;
|
|
75
|
+
}
|
|
76
|
+
}
|
|
77
|
+
|
|
78
|
+
const failed = failureReason !== null;
|
|
79
|
+
|
|
80
|
+
metadata.status = failed ? "failed" : "completed";
|
|
81
|
+
metadata.updatedAt = timestamp;
|
|
82
|
+
if (failed) {
|
|
83
|
+
metadata.failedAt = timestamp;
|
|
84
|
+
} else {
|
|
85
|
+
metadata.completedAt = timestamp;
|
|
86
|
+
}
|
|
87
|
+
|
|
88
|
+
const totalTokens = runResult.promptTokens + runResult.completionTokens;
|
|
89
|
+
|
|
90
|
+
metadata.runner.tokenMetrics = {
|
|
91
|
+
reported: hasTurns,
|
|
92
|
+
promptTokens: runResult.promptTokens,
|
|
93
|
+
completionTokens: runResult.completionTokens,
|
|
94
|
+
totalTokens,
|
|
95
|
+
};
|
|
96
|
+
|
|
97
|
+
metadata.runner.speedMetrics = speedMetrics;
|
|
98
|
+
metadata.runner.metricSource = speedMetrics?.metricSource ?? null;
|
|
99
|
+
|
|
100
|
+
metadata.results = {
|
|
101
|
+
wallClockMs: runResult.wallClockMs,
|
|
102
|
+
agentTurns: runResult.agentTurns,
|
|
103
|
+
toolCalls: runResult.toolCalls,
|
|
104
|
+
toolResults: runResult.toolResults,
|
|
105
|
+
success,
|
|
106
|
+
outputFiles,
|
|
107
|
+
perTurn: runResult.perTurn,
|
|
108
|
+
};
|
|
109
|
+
|
|
110
|
+
if (failureReason) {
|
|
111
|
+
metadata.error = { message: failureReason, ...(typeof runResult.error === "object" && runResult.error?.stack ? { stack: runResult.error.stack } : {}) };
|
|
112
|
+
} else if (runResult.error) {
|
|
113
|
+
metadata.error = typeof runResult.error === "string"
|
|
114
|
+
? { message: runResult.error }
|
|
115
|
+
: { message: runResult.error.message ?? "Unknown error", ...(runResult.error.stack ? { stack: runResult.error.stack } : {}) };
|
|
116
|
+
}
|
|
117
|
+
|
|
118
|
+
await writeFile(metadataPath, JSON.stringify(metadata, null, 2) + "\n", "utf8");
|
|
119
|
+
return metadata;
|
|
120
|
+
}
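`finalizeBenchmarkRun` resolves the failure reason in a fixed order: an explicit runner error wins, then a run with no agent turns, then a missing or empty output file (with a different message when no tool was ever called). The sketch below restates that cascade as a pure function so the precedence can be checked in isolation. `classifyFailure` and the `outputOk` parameter are hypothetical names, and the no-tool-call message is abbreviated compared with the source.

```javascript
// Hypothetical pure version of the failure-reason cascade in finalizeBenchmarkRun.
// `outputOk` stands for "the required file exists and is non-empty".
function classifyFailure(runResult, outputOk, requiredFile) {
  if (runResult.error) {
    return typeof runResult.error === "string"
      ? runResult.error
      : (runResult.error.message ?? "Unknown error");
  }
  if (!(runResult.agentTurns > 0)) {
    return "The model did not produce any response turns.";
  }
  if (!outputOk) {
    return runResult.toolCalls === 0
      ? `The model finished without writing the required output file (${requiredFile}).`
      : `The required output file (${requiredFile}) was missing or empty after the run.`;
  }
  return null; // null means the run counts as completed
}

// Sample run results, made up for illustration.
console.log(classifyFailure({ error: "timeout", agentTurns: 3, toolCalls: 2 }, true, "index.html"));
console.log(classifyFailure({ agentTurns: 0, toolCalls: 0 }, false, "index.html"));
console.log(classifyFailure({ agentTurns: 2, toolCalls: 0 }, false, "analysis.ipynb"));
console.log(classifyFailure({ agentTurns: 2, toolCalls: 1 }, true, "index.html"));
```

One consequence of this ordering is that a run with a reported error is marked failed even if the output file was written successfully.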
|
|
121
|
+
|
|
122
|
+
function formatMetric(value, formatter) {
|
|
123
|
+
if (value === null || value === undefined || !Number.isFinite(value)) return pc.dim("—");
|
|
124
|
+
return formatter(value);
|
|
125
|
+
}
|
|
126
|
+
|
|
127
|
+
function formatMs(ms) {
|
|
128
|
+
return formatMetric(ms, (n) => (n < 1000 ? `${Math.round(n)} ms` : `${(n / 1000).toFixed(1)} s`));
|
|
129
|
+
}
|
|
130
|
+
|
|
131
|
+
function formatNumber(n) {
|
|
132
|
+
return formatMetric(n, (v) => v.toLocaleString());
|
|
133
|
+
}
|
|
134
|
+
|
|
135
|
+
function formatTokPerSec(n) {
|
|
136
|
+
return formatMetric(n, (v) => `${v.toFixed(1)} tok/s`);
|
|
137
|
+
}
|
|
138
|
+
|
|
139
|
+
function formatPercent(n) {
|
|
140
|
+
return formatMetric(n, (v) => `${(v * 100).toFixed(0)} %`);
|
|
141
|
+
}
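All four formatters above funnel through `formatMetric`, which returns a placeholder for `null`, `undefined`, and non-finite values before any unit is applied. The sketch below reproduces that pattern with a plain dash in place of `pc.dim("—")`, since `pc` comes from `../ui.mjs` and is not part of this diff; the `PLACEHOLDER` constant and the sample inputs are assumptions for illustration.

```javascript
// Sketch of the formatter pattern above; the dim styling from `pc` is replaced
// by a plain dash so the snippet has no dependencies.
const PLACEHOLDER = "—";

function formatMetric(value, formatter) {
  if (value === null || value === undefined || !Number.isFinite(value)) return PLACEHOLDER;
  return formatter(value);
}

// Durations switch from milliseconds to seconds at the 1000 ms mark
const formatMs = (ms) =>
  formatMetric(ms, (n) => (n < 1000 ? `${Math.round(n)} ms` : `${(n / 1000).toFixed(1)} s`));
const formatTokPerSec = (n) => formatMetric(n, (v) => `${v.toFixed(1)} tok/s`);
// Expects a fraction between 0 and 1, not a percentage
const formatPercent = (n) => formatMetric(n, (v) => `${(v * 100).toFixed(0)} %`);

console.log(formatMs(850));          // "850 ms"
console.log(formatMs(12500));        // "12.5 s"
console.log(formatTokPerSec(41.27)); // "41.3 tok/s"
console.log(formatPercent(0.5));     // "50 %"
console.log(formatMs(undefined));    // "—"
```

Because `Number.isFinite` does not coerce its argument, a numeric string such as `"12"` also falls through to the placeholder, so metrics need to be parsed into numbers before they reach these helpers.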
|
|
142
|
+
|
|
143
|
+
export function renderBenchmarkSummary(metadata) {
|
|
144
|
+
const { status, results, runner, error } = metadata;
|
|
145
|
+
|
|
146
|
+
const agentRows = [
|
|
147
|
+
["Status", status === "completed" ? pc.green("completed") : pc.red(status ?? "failed")],
|
|
148
|
+
["Duration", formatMs(results?.wallClockMs)],
|
|
149
|
+
["Agent turns", formatNumber(results?.agentTurns)],
|
|
150
|
+
["Input tokens", formatNumber(runner?.tokenMetrics?.promptTokens)],
|
|
151
|
+
["Output tokens", formatNumber(runner?.tokenMetrics?.completionTokens)],
|
|
152
|
+
["Total tokens", formatNumber(runner?.tokenMetrics?.totalTokens)],
|
|
153
|
+
["Tool calls", formatNumber(results?.toolCalls)],
|
|
154
|
+
["Tool results", formatNumber(results?.toolResults)],
|
|
155
|
+
["Output files", (results?.outputFiles?.length ?? 0) > 0 ? results.outputFiles.join(", ") : pc.dim("—")],
|
|
156
|
+
];
|
|
157
|
+
|
|
158
|
+
console.log("");
|
|
159
|
+
console.log(renderSection("Benchmark Result", renderRows(agentRows)));
|
|
160
|
+
|
|
161
|
+
if (status === "completed" && runner?.speedMetrics) {
|
|
162
|
+
const speed = runner.speedMetrics;
|
|
163
|
+
const speedRows = [
|
|
164
|
+
["Prefill tok/s", formatTokPerSec(speed.prefillTokensPerSecond)],
|
|
165
|
+
["Generation tok/s", formatTokPerSec(speed.generationTokensPerSecond)],
|
|
166
|
+
["TTFT", formatMs(speed.ttftMs)],
|
|
167
|
+
["Speculative decode", formatPercent(speed.speculativeDecodeAcceptance)],
|
|
168
|
+
["KV cache tokens", formatNumber(speed.kvCacheTokens)],
|
|
169
|
+
["Model load time", formatMs(speed.modelLoadMs)],
|
|
170
|
+
["Metric source", speed.metricSource ?? pc.dim("—")],
|
|
171
|
+
];
|
|
172
|
+
console.log(renderSection("Speed Metrics", renderRows(speedRows)));
|
|
173
|
+
} else if (error) {
|
|
174
|
+
const wrappedError = wrapText(error.message ?? "Unknown error");
|
|
175
|
+
console.log(renderSection("Error", pc.red(wrappedError)));
|
|
176
|
+
if (error.message?.includes("write tool") || error.message?.includes("required output file")) {
|
|
177
|
+
const tip = wrapText("Tip: This usually means the model returned the answer as chat text instead of writing the file. Try a model with stronger tool-use support, or run the prompt manually.", 64);
|
|
178
|
+
console.log(pc.dim("\n" + tip));
|
|
179
|
+
}
|
|
180
|
+
}
|
|
181
|
+
}
|
|
182
|
+
|
|
183
|
+
function wrapText(text, width = 64) {
|
|
184
|
+
if (!text) return "";
|
|
185
|
+
const words = text.split(/\s+/);
|
|
186
|
+
const lines = [];
|
|
187
|
+
let current = "";
|
|
188
|
+
for (const word of words) {
|
|
189
|
+
if ((current + " " + word).trim().length > width) {
|
|
190
|
+
if (current) lines.push(current.trim());
|
|
191
|
+
current = word;
|
|
192
|
+
} else {
|
|
193
|
+
current = current ? `${current} ${word}` : word;
|
|
194
|
+
}
|
|
195
|
+
}
|
|
196
|
+
if (current) lines.push(current.trim());
|
|
197
|
+
return lines.join("\n");
|
|
198
|
+
}
|