npm - offgrid-ai - Versions diffs - 0.8.15 → 0.9.2 - Mend

offgrid-ai 0.8.15 → 0.9.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (24)

package/README.md +26 -25
package/package.json +3 -2
package/src/autodetect.mjs +6 -3
package/src/backends.mjs +36 -45
package/src/benchmark/finalize.mjs +198 -0
package/src/benchmark/flow.mjs +237 -0
package/src/benchmark/metrics.mjs +152 -0
package/src/benchmark/pi-runner.mjs +252 -0
package/src/benchmark/prepare.mjs +121 -0
package/src/benchmark/repo.mjs +77 -0
package/src/benchmark/shared.mjs +54 -0
package/src/benchmark/stream-renderer.mjs +274 -0
package/src/benchmark.mjs +10 -1330
package/src/cli.mjs +2 -2
package/src/commands/main.mjs +2 -2
package/src/commands/onboard.mjs +6 -2
package/src/config.mjs +8 -2
package/src/harness-pi.mjs +1 -1
package/src/managed.mjs +3 -3
package/src/model-catalog.mjs +2 -1
package/src/model-name.mjs +220 -0
package/src/process.mjs +29 -21
package/src/runtime.mjs +11 -0
package/src/scan.mjs +9 -20

package/README.md CHANGED

@@ -2,28 +2,29 @@
 # offgrid-ai
-**Privacy-first CLI for running local AI models on your own machine.**
+**Helper CLI for running local AI models on Mac with llama.cpp, ollama, and oMLX.**
 [![node](https://img.shields.io/badge/node-20%2B-3c873a)](package.json)
 [![platform](https://img.shields.io/badge/platform-macOS%20%7C%20Linux-blue)]()
-Install • Pick a model • Start chatting
-```bash
-curl -fsSL https://raw.githubusercontent.com/eeshansrivastava89/offgrid-ai/main/install.sh | bash
-```
 </div>
 ## What is offgrid-ai?
-offgrid-ai is a command-line tool that lets you run AI models locally. Everything stays on your computer. No API keys, no remote servers, no data leaving your machine.
+offgrid-ai is a command-line tool that lets you run AI models locally. Running local models with llama.cpp, ollama, or oMLX has a steep learning curve compared to cloud-based models, so offgrid-ai is designed to abstract away the complexity while still providing a powerful and flexible way to run local models.
+This is the recommended workflow:
-It works with:
+1. Download models from **LM Studio**, **Ollama**, or **oMLX**
+2. Do minimal configuration using the `offgrid-ai` command
+3. Run the model with `offgrid-ai` with Pi in interactive mode
-- Models from **LM Studio**
-- **Ollama** models
-- **oMLX** models on Apple Silicon
-- GGUF models from **Hugging Face** or other sources
+## Core Features
+- Auto-detects available models from LM Studio, Ollama, and oMLX
+- Auto-detects MTP (multi-token prediction) or QAT (quantization aware training) models, and applies the correct flags for llama.cpp
+- Auto-applies the optimal flags for the model type in llama.cpp
+- Start / stop llama.cpp server automatically for chat sessions
 ## Quick start
@@ -35,7 +36,7 @@ Open your terminal and run:
 curl -fsSL https://raw.githubusercontent.com/eeshansrivastava89/offgrid-ai/main/install.sh | bash
 ```
-This installs offgrid-ai and anything else it needs. Then open a new terminal window and run:
+This installs offgrid-ai and dependencies (node, npm, and llama.cpp). Then open a new terminal window and run:
 ```bash
 offgrid-ai
@@ -53,14 +54,8 @@ The curl installer is recommended for first-time setup because it also verifies
 The first time you run offgrid-ai, it looks for models already on your machine. If it does not find any, it tells you how to get one.
-Supported ways to get models:
+<img width="808" height="274" alt="image" src="https://github.com/user-attachments/assets/6e1583ab-65db-423c-b0eb-b627586fbf86" />
-| Source | Example command |
-|---|---|
-| LM Studio | `lms get qwen/qwen3.5-9b` |
-| Ollama | `ollama pull gemma3:4b` |
-| oMLX | Use `omlx start` |
-| Hugging Face | Download a GGUF file |
 ### 3. Start chatting
@@ -68,23 +63,29 @@ Supported ways to get models:
 offgrid-ai
 ```
+<img width="786" height="281" alt="image" src="https://github.com/user-attachments/assets/03cb1e06-d461-4bdf-ad82-f0692e5ba5c6" />
 Pick a model from the list and press Enter. offgrid-ai configures the rest and opens the Pi coding agent.
+<img width="786" height="499" alt="image" src="https://github.com/user-attachments/assets/223e1455-c69c-4405-a91c-5bac1b9fc9bd" />
 ## Everyday commands
 ```bash
-offgrid-ai              # start a model
-offgrid-ai status       # see what's running
+offgrid-ai              # primary entry-point for the CLI
+offgrid-ai status       # see if any model is running
 offgrid-ai stop         # stop the running model
-offgrid-ai benchmark    # run a benchmark
+offgrid-ai benchmark    # run a benchmark paired with my local llm benchmark runner
 offgrid-ai uninstall    # remove offgrid-ai
 ```
 ## What can I do with it?
-- **Chat with local models** — no internet required after setup.
-- **Run benchmarks** — compare how different models perform on creative or data-science tasks.
-- **Keep data private** — everything happens on your machine.
+- **Chat with local models** — you download the models yourself, and then offgrid-ai helps configure and run them
+- **Run benchmarks** — compare how different models perform on creative or data-science tasks. Pairs with my other [local llm benchmark runner](https://github.com/eeshansrivastava89/local-llm-visual-benchmark)
+- **Keep data private** — everything runs on your machine without any cloud connections
 ## Need help?

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "offgrid-ai",
-  "version": "0.8.15",
+  "version": "0.9.2",
   "description": "Privacy-first CLI for running local LLMs — discover, configure, run, benchmark",
   "author": "Eeshan Srivastava (https://eeshans.com)",
   "type": "module",
@@ -11,6 +11,7 @@
     "bin/*.mjs",
     "src/*.mjs",
     "src/commands/*.mjs",
+    "src/benchmark/*.mjs",
     "install.sh"
   ],
   "publishConfig": {
@@ -31,7 +32,7 @@
     "start": "node bin/offgrid-ai.mjs",
     "test": "node --test test/*.mjs",
     "test:integration": "OFFGRID_INTEGRATION=1 node --test test/integration/*.mjs",
-    "lint": "eslint src/*.mjs src/commands/*.mjs bin/*.mjs",
+    "lint": "eslint src/*.mjs src/commands/*.mjs src/benchmark/*.mjs bin/*.mjs",
     "check:privacy": "node scripts/privacy-gate.mjs",
     "release:check": "bash scripts/release-check.sh",
     "release:check:fast": "bash scripts/release-check.sh --skip-install --skip-manual",

package/src/autodetect.mjs CHANGED

@@ -2,13 +2,13 @@ import { basename } from "node:path";
 import { existsSync } from "node:fs";
 import { readGgufMetadata } from "./gguf.mjs";
 import { defaultFlagsForBackend } from "./backends.mjs";
+import { parseModelName } from "./model-name.mjs";
 // ── Detect model capabilities from GGUF metadata ──────────────────────────
 export function detectCapabilities(modelPath, mmprojPath) {
   const meta = safeReadGgufMetadata(modelPath);
   const mmprojMeta = mmprojPath ? safeReadGgufMetadata(mmprojPath) : {};
-  const name = basename(modelPath).toLowerCase();
   const pathHints = String(modelPath).toLowerCase();
   // Architecture
@@ -33,8 +33,11 @@ export function detectCapabilities(modelPath, mmprojPath) {
   // Do not treat all Qwen models as MTP; require an explicit filename or metadata hint.
   const mtp = /\bmtp\b|draft-mtp|multi-token/i.test(pathHints) || Object.keys(meta).some((key) => /mtp|draft|speculative/i.test(key));
-  // Quantization
-  const quant = name.match(/(Q\d_K_[A-Z]+|Q\d_[01]|UD-[A-Z0-9_]+)/i)?.[1] ?? null;
+  // Quantization — use parseModelName (single path) for filename-based extraction.
+  // GGUF metadata does not store a standardized quant field, so the filename
+  // is the authoritative source for quant identification.
+  const parsed = parseModelName(basename(modelPath).replace(/\.gguf$/i, ""), "local-gguf");
+  const quant = parsed.quant;
   // Context size from metadata, fallback to name hints
   const metaCtx = architecture

package/src/backends.mjs CHANGED

@@ -1,5 +1,6 @@
 import { findLlamaServer } from "./config.mjs";
 import { scanGgufModels } from "./scan.mjs";
+import { parseModelName } from "./model-name.mjs";
 // ── Backend definitions ────────────────────────────────────────────────────
@@ -87,51 +88,47 @@ export function defaultFlagsForBackend(backendId) {
 // ── Ollama model discovery ──────────────────────────────────────────────
 async function scanOllamaModels() {
-  try {
-    const response = await fetch(`${BACKENDS.ollama.apiBaseUrl}/api/tags`, { signal: AbortSignal.timeout(3000) });
-    if (!response.ok) return [];
-    const body = await response.json();
-    if (!Array.isArray(body?.models)) return [];
-    return body.models
-      .filter((model) => isLocalOllamaModel(model))
-      .map((model) => ({
-        id: model.name,
-        label: ollamaLabel(model.name),
-        aliasSuggestion: model.name,
-        sizeBytes: model.size ?? 0,
-        quant: model.details?.quantization_level,
-        family: model.details?.family,
-        backend: "ollama",
-        source: "ollama",
-      })).sort((a, b) => a.label.localeCompare(b.label));
-  } catch {
-    return [];
+  const response = await fetch(`${BACKENDS.ollama.apiBaseUrl}/api/tags`, { signal: AbortSignal.timeout(3000) });
+  if (!response.ok) {
+    throw new Error(`Ollama /api/tags returned ${response.status} ${response.statusText}`);
   }
+  const body = await response.json();
+  if (!Array.isArray(body?.models)) return [];
+  return body.models
+    .filter((model) => isLocalOllamaModel(model))
+    .map((model) => ({
+      id: model.name,
+      label: parseModelName(model.name, "ollama").display,
+      aliasSuggestion: model.name,
+      sizeBytes: model.size ?? 0,
+      quant: model.details?.quantization_level,
+      family: model.details?.family,
+      backend: "ollama",
+      source: "ollama",
+    })).sort((a, b) => a.label.localeCompare(b.label));
 }
 // ── oMLX model discovery ───────────────────────────────────────────────
 async function scanOmlxModels() {
-  try {
-    const response = await fetch(`${BACKENDS.omlx.defaultBaseUrl}/models`, { signal: AbortSignal.timeout(3000) });
-    if (!response.ok) return [];
-    const body = await response.json();
-    if (!Array.isArray(body?.data)) return [];
-    return body.data
-      .filter((model) => isChatOmlxModel(model))
-      .map((model) => ({
-        id: model.id,
-        label: omlxLabel(model.id),
-        aliasSuggestion: model.id,
-        sizeBytes: 0,
-        quant: null,
-        family: null,
-        backend: "omlx",
-        source: "omlx",
-      })).sort((a, b) => a.label.localeCompare(b.label));
-  } catch {
-    return [];
+  const response = await fetch(`${BACKENDS.omlx.defaultBaseUrl}/models`, { signal: AbortSignal.timeout(3000) });
+  if (!response.ok) {
+    throw new Error(`oMLX /models returned ${response.status} ${response.statusText}`);
   }
+  const body = await response.json();
+  if (!Array.isArray(body?.data)) return [];
+  return body.data
+    .filter((model) => isChatOmlxModel(model))
+    .map((model) => ({
+      id: model.id,
+      label: parseModelName(model.id, "omlx").display,
+      aliasSuggestion: model.id,
+      sizeBytes: 0,
+      quant: null,
+      family: null,
+      backend: "omlx",
+      source: "omlx",
+    })).sort((a, b) => a.label.localeCompare(b.label));
 }
 // ── Labels ──────────────────────────────────────────────────────────────
@@ -151,10 +148,4 @@ function isChatOmlxModel(model) {
   return true;
 }
-function ollamaLabel(name) {
-  return name.replace(/[-_]/g, " ").replace(/^gemma\b/i, "Gemma").replace(/^qwen/i, "Qwen");
-}
-function omlxLabel(id) {
-  return id.replace(/[-_]/g, " ").replace(/^gemma-4/i, "Gemma 4").replace(/^qwen/i, "Qwen");
-}
+// (ollamaLabel and omlxLabel removed — parseModelName in model-name.mjs is the single path)
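
Note on the error-handling change: `scanOllamaModels` and `scanOmlxModels` no longer swallow failures. A non-OK response now throws, and network errors or timeouts from `fetch` propagate to the caller. How the package's callers deal with this is not visible in this hunk. One possible approach, sketched below under that caveat, is to run the scanners through `Promise.allSettled` so that one unreachable backend does not hide the others. `scanAllBackends` and the stub scanners are hypothetical names used only for this example.

```javascript
// Hypothetical aggregator, not part of offgrid-ai: runs every scanner and
// keeps the failures alongside the successes instead of discarding them.
async function scanAllBackends(scanners) {
  const names = Object.keys(scanners);
  const settled = await Promise.allSettled(
    // The async arrow turns a synchronous throw into a rejection.
    names.map(async (name) => scanners[name]()),
  );
  const models = [];
  const errors = [];
  settled.forEach((outcome, index) => {
    if (outcome.status === "fulfilled") {
      models.push(...outcome.value);
    } else {
      errors.push({
        backend: names[index],
        message: outcome.reason?.message ?? String(outcome.reason),
      });
    }
  });
  return { models, errors };
}

// Stub scanners standing in for scanOllamaModels / scanOmlxModels.
const stubScanners = {
  ollama: async () => [{ id: "gemma3:4b", backend: "ollama" }],
  omlx: async () => {
    throw new Error("oMLX /models returned 503 Service Unavailable");
  },
};

scanAllBackends(stubScanners).then(({ models, errors }) => {
  console.log(models.length, errors[0].backend); // prints: 1 omlx
});
```

Collecting the errors rather than rethrowing lets a caller list the models it did find and still report which backend was unavailable.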

package/src/benchmark/finalize.mjs ADDED

@@ -0,0 +1,198 @@
+// ── Unload model from server memory after benchmark ────────────────────────────
+import { backendFor } from "../backends.mjs";
+import { apiRootUrl } from "../process.mjs";
+import { existsSync } from "node:fs";
+import { readFile, writeFile } from "node:fs/promises";
+import { join } from "node:path";
+import { pc, renderRows, renderSection } from "../ui.mjs";
+export async function unloadModelFromServer(profile) {
+  const backend = backendFor(profile.backend);
+  if (backend.id === "ollama") {
+    const apiBaseUrl = apiRootUrl(profile.baseUrl || backend.apiBaseUrl || "");
+    try {
+      await fetch(`${apiBaseUrl}/api/generate`, {
+        method: "POST",
+        headers: { "Content-Type": "application/json" },
+        body: JSON.stringify({ model: profile.modelAlias, prompt: "", stream: false, keep_alive: 0 }),
+        signal: AbortSignal.timeout(10000),
+      });
+      return { unloaded: true, backend: backend.id };
+    } catch (err) {
+      return { unloaded: false, backend: backend.id, error: err.message };
+    }
+  }
+  if (backend.id === "llama-cpp" || backend.id === "llama-cpp-mtp") {
+    // llama.cpp unloads when the server process exits; no HTTP unload API exists.
+    // If offgrid-ai started the server, stopProfile already handled it.
+    return { unloaded: false, backend: backend.id, reason: "stop server to unload" };
+  }
+  if (backend.id === "omlx") {
+    // oMLX does not expose a model-unload endpoint. The model stays resident
+    // until the oMLX server process is stopped.
+    return { unloaded: false, backend: backend.id, reason: "no unload API available" };
+  }
+  return { unloaded: false, backend: backend.id, reason: "unsupported backend" };
+}
+export async function finalizeBenchmarkRun(runDirectory, runResult, speedMetrics) {
+  const metadataPath = join(runDirectory, "metadata.json");
+  const metadata = JSON.parse(await readFile(metadataPath, "utf8"));
+  const now = new Date();
+  const timestamp = now.toISOString();
+  const kind = metadata.kind ?? "visual";
+  const isDs = kind === "data-science";
+  const requiredFile = isDs ? "analysis.ipynb" : "index.html";
+  const requiredPath = join(runDirectory, requiredFile);
+  const outputFiles = [];
+  for (const candidate of [requiredFile, isDs ? "summary.json" : "preview.png", isDs ? "chart-distribution.png" : "preview.webm", "preview.mp4"]) {
+    if (existsSync(join(runDirectory, candidate))) {
+      outputFiles.push(candidate);
+    }
+  }
+  const success = existsSync(requiredPath) && (await readFile(requiredPath, "utf8")).trim().length > 0;
+  const hasTurns = runResult.agentTurns > 0;
+  let failureReason = null;
+  if (runResult.error) {
+    failureReason = typeof runResult.error === "string" ? runResult.error : (runResult.error.message ?? "Unknown error");
+  } else if (!hasTurns) {
+    failureReason = "The model did not produce any response turns.";
+  } else if (!success) {
+    if (runResult.toolCalls === 0) {
+      failureReason = `The model finished without writing the required output file (${requiredFile}). It may have returned the response as chat text instead of using the write tool.`;
+    } else {
+      failureReason = `The required output file (${requiredFile}) was missing or empty after the run.`;
+    }
+  }
+  const failed = failureReason !== null;
+  metadata.status = failed ? "failed" : "completed";
+  metadata.updatedAt = timestamp;
+  if (failed) {
+    metadata.failedAt = timestamp;
+  } else {
+    metadata.completedAt = timestamp;
+  }
+  const totalTokens = runResult.promptTokens + runResult.completionTokens;
+  metadata.runner.tokenMetrics = {
+    reported: hasTurns,
+    promptTokens: runResult.promptTokens,
+    completionTokens: runResult.completionTokens,
+    totalTokens,
+  };
+  metadata.runner.speedMetrics = speedMetrics;
+  metadata.runner.metricSource = speedMetrics?.metricSource ?? null;
+  metadata.results = {
+    wallClockMs: runResult.wallClockMs,
+    agentTurns: runResult.agentTurns,
+    toolCalls: runResult.toolCalls,
+    toolResults: runResult.toolResults,
+    success,
+    outputFiles,
+    perTurn: runResult.perTurn,
+  };
+  if (failureReason) {
+    metadata.error = { message: failureReason, ...(typeof runResult.error === "object" && runResult.error?.stack ? { stack: runResult.error.stack } : {}) };
+  } else if (runResult.error) {
+    metadata.error = typeof runResult.error === "string"
+      ? { message: runResult.error }
+      : { message: runResult.error.message ?? "Unknown error", ...(runResult.error.stack ? { stack: runResult.error.stack } : {}) };
+  }
+  await writeFile(metadataPath, JSON.stringify(metadata, null, 2) + "\n", "utf8");
+  return metadata;
+}
+function formatMetric(value, formatter) {
+  if (value === null || value === undefined || !Number.isFinite(value)) return pc.dim("—");
+  return formatter(value);
+}
+function formatMs(ms) {
+  return formatMetric(ms, (n) => (n < 1000 ? `${Math.round(n)} ms` : `${(n / 1000).toFixed(1)} s`));
+}
+function formatNumber(n) {
+  return formatMetric(n, (v) => v.toLocaleString());
+}
+function formatTokPerSec(n) {
+  return formatMetric(n, (v) => `${v.toFixed(1)} tok/s`);
+}
+function formatPercent(n) {
+  return formatMetric(n, (v) => `${(v * 100).toFixed(0)} %`);
+}
+export function renderBenchmarkSummary(metadata) {
+  const { status, results, runner, error } = metadata;
+  const agentRows = [
+    ["Status", status === "completed" ? pc.green("completed") : pc.red(status ?? "failed")],
+    ["Duration", formatMs(results?.wallClockMs)],
+    ["Agent turns", formatNumber(results?.agentTurns)],
+    ["Input tokens", formatNumber(runner?.tokenMetrics?.promptTokens)],
+    ["Output tokens", formatNumber(runner?.tokenMetrics?.completionTokens)],
+    ["Total tokens", formatNumber(runner?.tokenMetrics?.totalTokens)],
+    ["Tool calls", formatNumber(results?.toolCalls)],
+    ["Tool results", formatNumber(results?.toolResults)],
+    ["Output files", (results?.outputFiles?.length ?? 0) > 0 ? results.outputFiles.join(", ") : pc.dim("—")],
+  ];
+  console.log("");
+  console.log(renderSection("Benchmark Result", renderRows(agentRows)));
+  if (status === "completed" && runner?.speedMetrics) {
+    const speed = runner.speedMetrics;
+    const speedRows = [
+      ["Prefill tok/s", formatTokPerSec(speed.prefillTokensPerSecond)],
+      ["Generation tok/s", formatTokPerSec(speed.generationTokensPerSecond)],
+      ["TTFT", formatMs(speed.ttftMs)],
+      ["Speculative decode", formatPercent(speed.speculativeDecodeAcceptance)],
+      ["KV cache tokens", formatNumber(speed.kvCacheTokens)],
+      ["Model load time", formatMs(speed.modelLoadMs)],
+      ["Metric source", speed.metricSource ?? pc.dim("—")],
+    ];
+    console.log(renderSection("Speed Metrics", renderRows(speedRows)));
+  } else if (error) {
+    const wrappedError = wrapText(error.message ?? "Unknown error");
+    console.log(renderSection("Error", pc.red(wrappedError)));
+    if (error.message?.includes("write tool") || error.message?.includes("required output file")) {
+      const tip = wrapText("Tip: This usually means the model returned the answer as chat text instead of writing the file. Try a model with stronger tool-use support, or run the prompt manually.", 64);
+      console.log(pc.dim("\n" + tip));
+    }
+  }
+}
+function wrapText(text, width = 64) {
+  if (!text) return "";
+  const words = text.split(/\s+/);
+  const lines = [];
+  let current = "";
+  for (const word of words) {
+    if ((current + " " + word).trim().length > width) {
+      if (current) lines.push(current.trim());
+      current = word;
+    } else {
+      current = current ? `${current} ${word}` : word;
+    }
+  }
+  if (current) lines.push(current.trim());
+  return lines.join("\n");
+}
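
Note on the failure classification in `finalizeBenchmarkRun`: the function checks for a runner error first, then for zero agent turns, and only then for a missing or empty output file, using the tool-call count to pick between two messages. The sketch below restates that branch order as a pure function so it can be read in isolation. The function name `classifyRun`, the `outputOk` flag, and the short labels are assumptions for illustration, since the real code stores full sentences in `metadata.error.message`.

```javascript
// Simplified restatement of the branch order in finalizeBenchmarkRun.
// The short labels are hypothetical; the real code stores full sentences.
function classifyRun({ error, agentTurns, toolCalls, outputOk }) {
  if (error) return "runner-error";          // an explicit runner error wins
  if (!(agentTurns > 0)) return "no-turns";  // the model never responded
  if (!outputOk) {
    // No tool calls suggests the answer came back as chat text.
    return toolCalls === 0 ? "no-write-tool-call" : "output-missing-or-empty";
  }
  return null; // null means the run counts as completed
}

console.log(classifyRun({ error: null, agentTurns: 3, toolCalls: 0, outputOk: false }));
// prints: no-write-tool-call
console.log(classifyRun({ error: null, agentTurns: 3, toolCalls: 2, outputOk: true }));
// prints: null
```

Because the error check comes first, a run that fails in the runner is reported with the runner's message even when the output file happens to exist.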