offgrid-ai 0.8.15 → 0.9.2

package/README.md CHANGED
@@ -2,28 +2,29 @@
2
2
 
3
3
  # offgrid-ai
4
4
 
5
- **Privacy-first CLI for running local AI models on your own machine.**
5
+ **Helper CLI for running local AI models on Mac with llama.cpp, ollama, and oMLX.**
6
6
 
7
7
  [![node](https://img.shields.io/badge/node-20%2B-3c873a)](package.json)
8
8
  [![platform](https://img.shields.io/badge/platform-macOS%20%7C%20Linux-blue)]()
9
9
 
10
- Install • Pick a model • Start chatting
11
- ```bash
12
- curl -fsSL https://raw.githubusercontent.com/eeshansrivastava89/offgrid-ai/main/install.sh | bash
13
- ```
14
10
 
15
11
  </div>
16
12
 
17
13
  ## What is offgrid-ai?
18
14
 
19
- offgrid-ai is a command-line tool that lets you run AI models locally. Everything stays on your computer. No API keys, no remote servers, no data leaving your machine.
15
+ offgrid-ai is a command-line tool that lets you run AI models locally. Running local models with llama.cpp, ollama, or oMLX has a steeper learning curve than using cloud-based models, so offgrid-ai is designed to abstract away that complexity while still providing a powerful and flexible way to run local models.
16
+
17
+ This is the recommended workflow:
20
18
 
21
- It works with:
19
+ 1. Download models from **LM Studio**, **Ollama**, or **oMLX**
20
+ 2. Do minimal configuration using the `offgrid-ai` command
21
+ 3. Run the model through `offgrid-ai`, which opens Pi in interactive mode
22
22
 
23
- - Models from **LM Studio**
24
- - **Ollama** models
25
- - **oMLX** models on Apple Silicon
26
- - GGUF models from **Hugging Face** or other sources
23
+ ## Core Features
24
+ - Auto-detects available models from LM Studio, Ollama, and oMLX
25
+ - Auto-detects MTP (multi-token prediction) or QAT (quantization-aware training) models, and applies the correct flags for llama.cpp
26
+ - Auto-applies the optimal flags for the model type in llama.cpp
27
+ - Start / stop llama.cpp server automatically for chat sessions
27
28
 
28
29
  ## Quick start
29
30
 
@@ -35,7 +36,7 @@ Open your terminal and run:
35
36
  curl -fsSL https://raw.githubusercontent.com/eeshansrivastava89/offgrid-ai/main/install.sh | bash
36
37
  ```
37
38
 
38
- This installs offgrid-ai and anything else it needs. Then open a new terminal window and run:
39
+ This installs offgrid-ai and dependencies (node, npm, and llama.cpp). Then open a new terminal window and run:
39
40
 
40
41
  ```bash
41
42
  offgrid-ai
@@ -53,14 +54,8 @@ The curl installer is recommended for first-time setup because it also verifies
53
54
 
54
55
  The first time you run offgrid-ai, it looks for models already on your machine. If it does not find any, it tells you how to get one.
55
56
 
56
- Supported ways to get models:
57
+ <img width="808" height="274" alt="image" src="https://github.com/user-attachments/assets/6e1583ab-65db-423c-b0eb-b627586fbf86" />
57
58
 
58
- | Source | Example command |
59
- |---|---|
60
- | LM Studio | `lms get qwen/qwen3.5-9b` |
61
- | Ollama | `ollama pull gemma3:4b` |
62
- | oMLX | Use `omlx start` |
63
- | Hugging Face | Download a GGUF file |
64
59
 
65
60
  ### 3. Start chatting
66
61
 
@@ -68,23 +63,29 @@ Supported ways to get models:
68
63
  offgrid-ai
69
64
  ```
70
65
 
66
+ <img width="786" height="281" alt="image" src="https://github.com/user-attachments/assets/03cb1e06-d461-4bdf-ad82-f0692e5ba5c6" />
67
+
68
+
71
69
  Pick a model from the list and press Enter. offgrid-ai configures the rest and opens the Pi coding agent.
72
70
 
71
+ <img width="786" height="499" alt="image" src="https://github.com/user-attachments/assets/223e1455-c69c-4405-a91c-5bac1b9fc9bd" />
72
+
73
+
73
74
  ## Everyday commands
74
75
 
75
76
  ```bash
76
- offgrid-ai # start a model
77
- offgrid-ai status # see what's running
77
+ offgrid-ai # primary entry-point for the CLI
78
+ offgrid-ai status # see if any model is running
78
79
  offgrid-ai stop # stop the running model
79
- offgrid-ai benchmark # run a benchmark
80
+ offgrid-ai benchmark # run a benchmark paired with my local llm benchmark runner
80
81
  offgrid-ai uninstall # remove offgrid-ai
81
82
  ```
82
83
 
83
84
  ## What can I do with it?
84
85
 
85
- - **Chat with local models** — no internet required after setup.
86
- - **Run benchmarks** — compare how different models perform on creative or data-science tasks.
87
- - **Keep data private** — everything happens on your machine.
86
+ - **Chat with local models** — you download the models yourself, and then offgrid-ai helps configure and run them
87
+ - **Run benchmarks** — compare how different models perform on creative or data-science tasks. Pairs with my other [local llm benchmark runner](https://github.com/eeshansrivastava89/local-llm-visual-benchmark)
88
+ - **Keep data private** — everything runs on your machine without any cloud connections
88
89
 
89
90
  ## Need help?
90
91
 
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "offgrid-ai",
3
- "version": "0.8.15",
3
+ "version": "0.9.2",
4
4
  "description": "Privacy-first CLI for running local LLMs — discover, configure, run, benchmark",
5
5
  "author": "Eeshan Srivastava (https://eeshans.com)",
6
6
  "type": "module",
@@ -11,6 +11,7 @@
11
11
  "bin/*.mjs",
12
12
  "src/*.mjs",
13
13
  "src/commands/*.mjs",
14
+ "src/benchmark/*.mjs",
14
15
  "install.sh"
15
16
  ],
16
17
  "publishConfig": {
@@ -31,7 +32,7 @@
31
32
  "start": "node bin/offgrid-ai.mjs",
32
33
  "test": "node --test test/*.mjs",
33
34
  "test:integration": "OFFGRID_INTEGRATION=1 node --test test/integration/*.mjs",
34
- "lint": "eslint src/*.mjs src/commands/*.mjs bin/*.mjs",
35
+ "lint": "eslint src/*.mjs src/commands/*.mjs src/benchmark/*.mjs bin/*.mjs",
35
36
  "check:privacy": "node scripts/privacy-gate.mjs",
36
37
  "release:check": "bash scripts/release-check.sh",
37
38
  "release:check:fast": "bash scripts/release-check.sh --skip-install --skip-manual",
@@ -2,13 +2,13 @@ import { basename } from "node:path";
2
2
  import { existsSync } from "node:fs";
3
3
  import { readGgufMetadata } from "./gguf.mjs";
4
4
  import { defaultFlagsForBackend } from "./backends.mjs";
5
+ import { parseModelName } from "./model-name.mjs";
5
6
 
6
7
  // ── Detect model capabilities from GGUF metadata ──────────────────────────
7
8
 
8
9
  export function detectCapabilities(modelPath, mmprojPath) {
9
10
  const meta = safeReadGgufMetadata(modelPath);
10
11
  const mmprojMeta = mmprojPath ? safeReadGgufMetadata(mmprojPath) : {};
11
- const name = basename(modelPath).toLowerCase();
12
12
  const pathHints = String(modelPath).toLowerCase();
13
13
 
14
14
  // Architecture
@@ -33,8 +33,11 @@ export function detectCapabilities(modelPath, mmprojPath) {
33
33
  // Do not treat all Qwen models as MTP; require an explicit filename or metadata hint.
34
34
  const mtp = /\bmtp\b|draft-mtp|multi-token/i.test(pathHints) || Object.keys(meta).some((key) => /mtp|draft|speculative/i.test(key));
35
35
 
36
- // Quantization
37
- const quant = name.match(/(Q\d_K_[A-Z]+|Q\d_[01]|UD-[A-Z0-9_]+)/i)?.[1] ?? null;
36
+ // Quantization — use parseModelName (single path) for filename-based extraction.
37
+ // GGUF metadata does not store a standardized quant field, so the filename
38
+ // is the authoritative source for quant identification.
39
+ const parsed = parseModelName(basename(modelPath).replace(/\.gguf$/i, ""), "local-gguf");
40
+ const quant = parsed.quant;
38
41
 
39
42
  // Context size from metadata, fallback to name hints
40
43
  const metaCtx = architecture
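The hunk above replaces the inline quant regex with a call to `parseModelName`, whose implementation is not part of this diff. As a rough illustration of filename-based quant extraction, the sketch below reuses the pattern from the removed line. `extractQuantFromFilename` is a hypothetical name, and the real `parseModelName` may behave differently.

```javascript
// Hypothetical helper, NOT the package's parseModelName. It only illustrates
// filename-based quant extraction using the pattern from the removed line.
const QUANT_PATTERN = /(Q\d_K_[A-Z]+|Q\d_[01]|UD-[A-Z0-9_]+)/i;

function extractQuantFromFilename(filename) {
  // Drop any directory part and the .gguf extension, similar to the
  // basename(...).replace(/\.gguf$/i, "") step in the new code.
  const stem = filename.split("/").pop().replace(/\.gguf$/i, "");
  // Return the first quant-looking token, or null when none is present.
  return stem.match(QUANT_PATTERN)?.[1] ?? null;
}

console.log(extractQuantFromFilename("models/Qwen3-8B-Q4_K_M.gguf"));
console.log(extractQuantFromFilename("model-f16.gguf"));
```

Because the pattern is case-insensitive and unanchored, the `UD-` alternative can also match inside unrelated words (for example a name containing "cloud-"), which is one plausible reason to centralize parsing in a single function.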
package/src/backends.mjs CHANGED
@@ -1,5 +1,6 @@
1
1
  import { findLlamaServer } from "./config.mjs";
2
2
  import { scanGgufModels } from "./scan.mjs";
3
+ import { parseModelName } from "./model-name.mjs";
3
4
 
4
5
  // ── Backend definitions ────────────────────────────────────────────────────
5
6
 
@@ -87,51 +88,47 @@ export function defaultFlagsForBackend(backendId) {
87
88
  // ── Ollama model discovery ──────────────────────────────────────────────
88
89
 
89
90
  async function scanOllamaModels() {
90
- try {
91
- const response = await fetch(`${BACKENDS.ollama.apiBaseUrl}/api/tags`, { signal: AbortSignal.timeout(3000) });
92
- if (!response.ok) return [];
93
- const body = await response.json();
94
- if (!Array.isArray(body?.models)) return [];
95
- return body.models
96
- .filter((model) => isLocalOllamaModel(model))
97
- .map((model) => ({
98
- id: model.name,
99
- label: ollamaLabel(model.name),
100
- aliasSuggestion: model.name,
101
- sizeBytes: model.size ?? 0,
102
- quant: model.details?.quantization_level,
103
- family: model.details?.family,
104
- backend: "ollama",
105
- source: "ollama",
106
- })).sort((a, b) => a.label.localeCompare(b.label));
107
- } catch {
108
- return [];
91
+ const response = await fetch(`${BACKENDS.ollama.apiBaseUrl}/api/tags`, { signal: AbortSignal.timeout(3000) });
92
+ if (!response.ok) {
93
+ throw new Error(`Ollama /api/tags returned ${response.status} ${response.statusText}`);
109
94
  }
95
+ const body = await response.json();
96
+ if (!Array.isArray(body?.models)) return [];
97
+ return body.models
98
+ .filter((model) => isLocalOllamaModel(model))
99
+ .map((model) => ({
100
+ id: model.name,
101
+ label: parseModelName(model.name, "ollama").display,
102
+ aliasSuggestion: model.name,
103
+ sizeBytes: model.size ?? 0,
104
+ quant: model.details?.quantization_level,
105
+ family: model.details?.family,
106
+ backend: "ollama",
107
+ source: "ollama",
108
+ })).sort((a, b) => a.label.localeCompare(b.label));
110
109
  }
111
110
 
112
111
  // ── oMLX model discovery ───────────────────────────────────────────────
113
112
 
114
113
  async function scanOmlxModels() {
115
- try {
116
- const response = await fetch(`${BACKENDS.omlx.defaultBaseUrl}/models`, { signal: AbortSignal.timeout(3000) });
117
- if (!response.ok) return [];
118
- const body = await response.json();
119
- if (!Array.isArray(body?.data)) return [];
120
- return body.data
121
- .filter((model) => isChatOmlxModel(model))
122
- .map((model) => ({
123
- id: model.id,
124
- label: omlxLabel(model.id),
125
- aliasSuggestion: model.id,
126
- sizeBytes: 0,
127
- quant: null,
128
- family: null,
129
- backend: "omlx",
130
- source: "omlx",
131
- })).sort((a, b) => a.label.localeCompare(b.label));
132
- } catch {
133
- return [];
114
+ const response = await fetch(`${BACKENDS.omlx.defaultBaseUrl}/models`, { signal: AbortSignal.timeout(3000) });
115
+ if (!response.ok) {
116
+ throw new Error(`oMLX /models returned ${response.status} ${response.statusText}`);
134
117
  }
118
+ const body = await response.json();
119
+ if (!Array.isArray(body?.data)) return [];
120
+ return body.data
121
+ .filter((model) => isChatOmlxModel(model))
122
+ .map((model) => ({
123
+ id: model.id,
124
+ label: parseModelName(model.id, "omlx").display,
125
+ aliasSuggestion: model.id,
126
+ sizeBytes: 0,
127
+ quant: null,
128
+ family: null,
129
+ backend: "omlx",
130
+ source: "omlx",
131
+ })).sort((a, b) => a.label.localeCompare(b.label));
135
132
  }
136
133
 
137
134
  // ── Labels ──────────────────────────────────────────────────────────────
@@ -151,10 +148,4 @@ function isChatOmlxModel(model) {
151
148
  return true;
152
149
  }
153
150
 
154
- function ollamaLabel(name) {
155
- return name.replace(/[-_]/g, " ").replace(/^gemma\b/i, "Gemma").replace(/^qwen/i, "Qwen");
156
- }
157
-
158
- function omlxLabel(id) {
159
- return id.replace(/[-_]/g, " ").replace(/^gemma-4/i, "Gemma 4").replace(/^qwen/i, "Qwen");
160
- }
151
+ // (ollamaLabel and omlxLabel removed — parseModelName in model-name.mjs is the single path)
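With the `try`/`catch` wrappers removed, `scanOllamaModels` and `scanOmlxModels` now reject on a non-OK response, and a failed or timed-out `fetch` also propagates instead of yielding an empty list. The caller is not shown in this diff, so the following is only a sketch of one way a caller could keep working backends usable while surfacing the failures. `mergeSettledScans` and `scanAllBackends` are hypothetical names, and the scanners are assumed to be async functions.

```javascript
// Hypothetical caller-side helpers; the names are illustrative and do not
// come from the package.
function mergeSettledScans(names, settled) {
  const models = [];
  const warnings = [];
  settled.forEach((result, index) => {
    if (result.status === "fulfilled") {
      models.push(...result.value);
    } else {
      // Keep the backend name so the user can see which scan failed.
      const reason = result.reason?.message ?? String(result.reason);
      warnings.push(`${names[index]}: ${reason}`);
    }
  });
  return { models, warnings };
}

// `scanners` is assumed to map a backend name to an async scan function.
async function scanAllBackends(scanners) {
  const names = Object.keys(scanners);
  const settled = await Promise.allSettled(names.map((name) => scanners[name]()));
  return mergeSettledScans(names, settled);
}
```

`Promise.allSettled` waits for every scan, so one unreachable backend does not hide the models found by the others.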
@@ -0,0 +1,198 @@
1
+ // ── Unload model from server memory after benchmark ────────────────────────────
2
+
3
+ import { backendFor } from "../backends.mjs";
4
+ import { apiRootUrl } from "../process.mjs";
5
+ import { existsSync } from "node:fs";
6
+ import { readFile, writeFile } from "node:fs/promises";
7
+ import { join } from "node:path";
8
+ import { pc, renderRows, renderSection } from "../ui.mjs";
9
+
10
+ export async function unloadModelFromServer(profile) {
11
+ const backend = backendFor(profile.backend);
12
+
13
+ if (backend.id === "ollama") {
14
+ const apiBaseUrl = apiRootUrl(profile.baseUrl || backend.apiBaseUrl || "");
15
+
16
+ try {
17
+ await fetch(`${apiBaseUrl}/api/generate`, {
18
+ method: "POST",
19
+ headers: { "Content-Type": "application/json" },
20
+ body: JSON.stringify({ model: profile.modelAlias, prompt: "", stream: false, keep_alive: 0 }),
21
+ signal: AbortSignal.timeout(10000),
22
+ });
23
+ return { unloaded: true, backend: backend.id };
24
+ } catch (err) {
25
+ return { unloaded: false, backend: backend.id, error: err.message };
26
+ }
27
+ }
28
+
29
+ if (backend.id === "llama-cpp" || backend.id === "llama-cpp-mtp") {
30
+ // llama.cpp unloads when the server process exits; no HTTP unload API exists.
31
+ // If offgrid-ai started the server, stopProfile already handled it.
32
+ return { unloaded: false, backend: backend.id, reason: "stop server to unload" };
33
+ }
34
+
35
+ if (backend.id === "omlx") {
36
+ // oMLX does not expose a model-unload endpoint. The model stays resident
37
+ // until the oMLX server process is stopped.
38
+ return { unloaded: false, backend: backend.id, reason: "no unload API available" };
39
+ }
40
+
41
+ return { unloaded: false, backend: backend.id, reason: "unsupported backend" };
42
+ }
43
+
44
+ export async function finalizeBenchmarkRun(runDirectory, runResult, speedMetrics) {
45
+ const metadataPath = join(runDirectory, "metadata.json");
46
+ const metadata = JSON.parse(await readFile(metadataPath, "utf8"));
47
+ const now = new Date();
48
+ const timestamp = now.toISOString();
49
+
50
+ const kind = metadata.kind ?? "visual";
51
+ const isDs = kind === "data-science";
52
+ const requiredFile = isDs ? "analysis.ipynb" : "index.html";
53
+ const requiredPath = join(runDirectory, requiredFile);
54
+
55
+ const outputFiles = [];
56
+ for (const candidate of [requiredFile, isDs ? "summary.json" : "preview.png", isDs ? "chart-distribution.png" : "preview.webm", "preview.mp4"]) {
57
+ if (existsSync(join(runDirectory, candidate))) {
58
+ outputFiles.push(candidate);
59
+ }
60
+ }
61
+
62
+ const success = existsSync(requiredPath) && (await readFile(requiredPath, "utf8")).trim().length > 0;
63
+ const hasTurns = runResult.agentTurns > 0;
64
+
65
+ let failureReason = null;
66
+ if (runResult.error) {
67
+ failureReason = typeof runResult.error === "string" ? runResult.error : (runResult.error.message ?? "Unknown error");
68
+ } else if (!hasTurns) {
69
+ failureReason = "The model did not produce any response turns.";
70
+ } else if (!success) {
71
+ if (runResult.toolCalls === 0) {
72
+ failureReason = `The model finished without writing the required output file (${requiredFile}). It may have returned the response as chat text instead of using the write tool.`;
73
+ } else {
74
+ failureReason = `The required output file (${requiredFile}) was missing or empty after the run.`;
75
+ }
76
+ }
77
+
78
+ const failed = failureReason !== null;
79
+
80
+ metadata.status = failed ? "failed" : "completed";
81
+ metadata.updatedAt = timestamp;
82
+ if (failed) {
83
+ metadata.failedAt = timestamp;
84
+ } else {
85
+ metadata.completedAt = timestamp;
86
+ }
87
+
88
+ const totalTokens = runResult.promptTokens + runResult.completionTokens;
89
+
90
+ metadata.runner.tokenMetrics = {
91
+ reported: hasTurns,
92
+ promptTokens: runResult.promptTokens,
93
+ completionTokens: runResult.completionTokens,
94
+ totalTokens,
95
+ };
96
+
97
+ metadata.runner.speedMetrics = speedMetrics;
98
+ metadata.runner.metricSource = speedMetrics?.metricSource ?? null;
99
+
100
+ metadata.results = {
101
+ wallClockMs: runResult.wallClockMs,
102
+ agentTurns: runResult.agentTurns,
103
+ toolCalls: runResult.toolCalls,
104
+ toolResults: runResult.toolResults,
105
+ success,
106
+ outputFiles,
107
+ perTurn: runResult.perTurn,
108
+ };
109
+
110
+ if (failureReason) {
111
+ metadata.error = { message: failureReason, ...(typeof runResult.error === "object" && runResult.error?.stack ? { stack: runResult.error.stack } : {}) };
112
+ } else if (runResult.error) {
113
+ metadata.error = typeof runResult.error === "string"
114
+ ? { message: runResult.error }
115
+ : { message: runResult.error.message ?? "Unknown error", ...(runResult.error.stack ? { stack: runResult.error.stack } : {}) };
116
+ }
117
+
118
+ await writeFile(metadataPath, JSON.stringify(metadata, null, 2) + "\n", "utf8");
119
+ return metadata;
120
+ }
121
+
122
+ function formatMetric(value, formatter) {
123
+ if (value === null || value === undefined || !Number.isFinite(value)) return pc.dim("—");
124
+ return formatter(value);
125
+ }
126
+
127
+ function formatMs(ms) {
128
+ return formatMetric(ms, (n) => (n < 1000 ? `${Math.round(n)} ms` : `${(n / 1000).toFixed(1)} s`));
129
+ }
130
+
131
+ function formatNumber(n) {
132
+ return formatMetric(n, (v) => v.toLocaleString());
133
+ }
134
+
135
+ function formatTokPerSec(n) {
136
+ return formatMetric(n, (v) => `${v.toFixed(1)} tok/s`);
137
+ }
138
+
139
+ function formatPercent(n) {
140
+ return formatMetric(n, (v) => `${(v * 100).toFixed(0)} %`);
141
+ }
142
+
143
+ export function renderBenchmarkSummary(metadata) {
144
+ const { status, results, runner, error } = metadata;
145
+
146
+ const agentRows = [
147
+ ["Status", status === "completed" ? pc.green("completed") : pc.red(status ?? "failed")],
148
+ ["Duration", formatMs(results?.wallClockMs)],
149
+ ["Agent turns", formatNumber(results?.agentTurns)],
150
+ ["Input tokens", formatNumber(runner?.tokenMetrics?.promptTokens)],
151
+ ["Output tokens", formatNumber(runner?.tokenMetrics?.completionTokens)],
152
+ ["Total tokens", formatNumber(runner?.tokenMetrics?.totalTokens)],
153
+ ["Tool calls", formatNumber(results?.toolCalls)],
154
+ ["Tool results", formatNumber(results?.toolResults)],
155
+ ["Output files", (results?.outputFiles?.length ?? 0) > 0 ? results.outputFiles.join(", ") : pc.dim("—")],
156
+ ];
157
+
158
+ console.log("");
159
+ console.log(renderSection("Benchmark Result", renderRows(agentRows)));
160
+
161
+ if (status === "completed" && runner?.speedMetrics) {
162
+ const speed = runner.speedMetrics;
163
+ const speedRows = [
164
+ ["Prefill tok/s", formatTokPerSec(speed.prefillTokensPerSecond)],
165
+ ["Generation tok/s", formatTokPerSec(speed.generationTokensPerSecond)],
166
+ ["TTFT", formatMs(speed.ttftMs)],
167
+ ["Speculative decode", formatPercent(speed.speculativeDecodeAcceptance)],
168
+ ["KV cache tokens", formatNumber(speed.kvCacheTokens)],
169
+ ["Model load time", formatMs(speed.modelLoadMs)],
170
+ ["Metric source", speed.metricSource ?? pc.dim("—")],
171
+ ];
172
+ console.log(renderSection("Speed Metrics", renderRows(speedRows)));
173
+ } else if (error) {
174
+ const wrappedError = wrapText(error.message ?? "Unknown error");
175
+ console.log(renderSection("Error", pc.red(wrappedError)));
176
+ if (error.message?.includes("write tool") || error.message?.includes("required output file")) {
177
+ const tip = wrapText("Tip: This usually means the model returned the answer as chat text instead of writing the file. Try a model with stronger tool-use support, or run the prompt manually.", 64);
178
+ console.log(pc.dim("\n" + tip));
179
+ }
180
+ }
181
+ }
182
+
183
+ function wrapText(text, width = 64) {
184
+ if (!text) return "";
185
+ const words = text.split(/\s+/);
186
+ const lines = [];
187
+ let current = "";
188
+ for (const word of words) {
189
+ if ((current + " " + word).trim().length > width) {
190
+ if (current) lines.push(current.trim());
191
+ current = word;
192
+ } else {
193
+ current = current ? `${current} ${word}` : word;
194
+ }
195
+ }
196
+ if (current) lines.push(current.trim());
197
+ return lines.join("\n");
198
+ }
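The failure reason in `finalizeBenchmarkRun` follows a fixed precedence: a reported error first, then a run with no agent turns, then a missing or empty output file. The sketch below restates that precedence as a standalone function so it can be read in isolation. `classifyFailure` and its signature are hypothetical, and the first output-file message is shortened from the one in the diff.

```javascript
// Hypothetical standalone version of the failure-reason precedence in
// finalizeBenchmarkRun; the name and signature are illustrative only.
function classifyFailure(runResult, outputOk, requiredFile) {
  // 1. A reported error wins over every other signal.
  if (runResult.error) {
    return typeof runResult.error === "string"
      ? runResult.error
      : (runResult.error.message ?? "Unknown error");
  }
  // 2. No agent turns means the model never responded.
  if (!(runResult.agentTurns > 0)) {
    return "The model did not produce any response turns.";
  }
  // 3. Turns happened, but the required file is missing or empty.
  if (!outputOk) {
    return runResult.toolCalls === 0
      ? `The model finished without writing the required output file (${requiredFile}).`
      : `The required output file (${requiredFile}) was missing or empty after the run.`;
  }
  return null; // no failure detected
}

console.log(classifyFailure({ error: null, agentTurns: 3, toolCalls: 0 }, false, "index.html"));
```

One consequence of this ordering, as far as the diff shows, is that a run which wrote its output file but also reported an error is recorded with `status: "failed"` while `results.success` remains `true`, since the two values are computed independently.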