preppergpt 0.1.1 → 0.1.3
- package/README.md +31 -11
- package/compose/preppergpt.yaml +8 -11
- package/docs/hardware.md +27 -2
- package/docs/model-sources.md +4 -2
- package/docs/preppergpt-local-parity-map.md +8 -2
- package/installer/cli.mjs +42 -9
- package/installer/lib/detect.mjs +241 -5
- package/installer/lib/planner.mjs +82 -12
- package/installer/lib/render.mjs +70 -4
- package/package.json +5 -3
- package/profiles/models.json +41 -13
- package/services/local-scheduler/app.py +18 -16
package/README.md
CHANGED
@@ -1,13 +1,20 @@
 # PrepperGPT
 
-PrepperGPT packages a local-first ChatGPT-like experience for
-
-
-
-
-
-
-
+PrepperGPT packages a local-first ChatGPT-like experience for post-apocalyptic
+or long-duration outage scenarios where hosted AI services are unavailable. It
+uses upstream OpenWebUI for the app shell and adds a hardware detector, model
+planner, Docker Compose runtime, local sidecars, and a practical PrepperGPT
+field-kit theme.
+
+PrepperGPT supports Linux first, including NVIDIA CUDA GPUs, Linux AMD ROCm
+GPUs, and CPU fallback where possible. Windows users should install and run it
+inside WSL2; native Windows installs are intentionally rejected until the native
+runtime path is reliable. It is an online installer: model and container
+downloads require a working network during setup.
+
+PrepperGPT optimizes for survivability over cloud-like latency. On very large
+local models, very low tokens/sec is acceptable because the alternative in the
+target scenario is no assistant at all.
 
 ## Install
 
@@ -35,6 +42,9 @@ node bin/preppergpt.js install --profile balanced
 node bin/preppergpt.js start
 ```
 
+Windows users should install Ubuntu in WSL2, enable Docker Desktop's WSL
+integration, and run the npm or GitHub install commands inside the WSL2 shell.
+
 Other profiles:
 
 ```bash
@@ -68,9 +78,11 @@ preppergpt bundle whisper
 ## Profiles
 
 - `intelligence`: chooses the strongest local reasoning route that fits the
-machine, preferring GLM 5.2
+machine, preferring GLM 5.2 Q8 on enterprise hardware, then GLM 5.2 Q4, then
+long-context coding routes when available.
 - `speed`: chooses smaller GPU-friendly routes and makes low-latency chat the
-default.
+default. NVIDIA hosts use CUDA container access; Linux AMD hosts use the
+Ollama ROCm image and ROCm device mounts when ROCm is detected.
 - `balanced`: uses the local auto-router as the default and keeps reasoning,
 coding, research, vision, image, and STT routes additive.
 
@@ -85,10 +97,18 @@ by default and mounted into OpenWebUI, so speech-to-text works from local files
 after setup.
 
 Some other routes can be pulled by the runtime, while very large routes such as
-GLM 5.2 Q4 and Flux weights are marked as manual or external in
+GLM 5.2 Q8/Q4 and Flux weights are marked as manual or external in
 `profiles/models.json`. `preppergpt doctor` reports which selected routes still
 need local files or endpoints.
 
+AMD acceleration requires a Linux ROCm host with `rocm-smi` or `rocminfo`
+available. AMD cards detected without ROCm stay on CPU-compatible routes and
+receive a doctor warning instead of a broken GPU configuration.
+
+The GLM 5.2 Q8 route is intended for an enterprise/off-grid bunker-class host:
+large RAM, fast NVMe, and patience for slow local generation when no hosted
+service remains available.
+
 ## Publishing
 
 The package is designed to be published as:
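The README's AMD paragraph describes a small decision: ROCm tooling present means GPU acceleration, an AMD card without ROCm means CPU routes plus a warning. The sketch below illustrates that rule only. `accelerationFor`, its return shape, and the choice to check NVIDIA first are assumptions made for illustration; the field names (`gpus[].vendor`, `tools.rocmSmi`, `tools.rocminfo`, `tools.nvidiaSmi`) follow the detection object that appears elsewhere in this diff.

```javascript
// Hypothetical helper, not part of the package: picks an acceleration mode
// from a detection object shaped like the one detectMachine() returns.
function accelerationFor(detection) {
  const gpus = detection.gpus || [];
  const tools = detection.tools || {};
  const hasNvidia = gpus.some((gpu) => gpu.vendor === "nvidia");
  const hasAmd = gpus.some((gpu) => gpu.vendor === "amd");
  const rocmReady = Boolean(tools.rocmSmi || tools.rocminfo);

  // Assumption: NVIDIA wins when both vendors are present.
  if (hasNvidia && tools.nvidiaSmi) {
    return { mode: "cuda", warnings: [] };
  }
  if (hasAmd && rocmReady) {
    return { mode: "rocm", warnings: [] };
  }
  if (hasAmd) {
    // AMD card seen (for example via lspci) but no ROCm tooling: stay on CPU.
    return {
      mode: "cpu",
      warnings: ["AMD GPU detected without ROCm; install rocm-smi or rocminfo for acceleration"]
    };
  }
  return { mode: "cpu", warnings: [] };
}

console.log(accelerationFor({ gpus: [{ vendor: "amd" }], tools: {} }));
```

Returning warnings alongside the mode keeps the "doctor warning instead of a broken GPU configuration" behaviour in one place, which is one plausible way to structure it.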
package/compose/preppergpt.yaml
CHANGED
@@ -27,8 +27,8 @@ services:
       ENABLE_OLLAMA_API: "True"
       OLLAMA_BASE_URLS: "${OLLAMA_BASE_URL:-http://127.0.0.1:11434}"
       ENABLE_OPENAI_API: "True"
-      OPENAI_API_BASE_URLS: "${SLOCODE_BASE_URL:-http://127.0.0.1:11438/v1};${GLM52_BASE_URL:-http://127.0.0.1:11441/v1};http://127.0.0.1:18041/v1;http://127.0.0.1:18043/v1;http://127.0.0.1:18044/v1"
-      OPENAI_API_KEYS: "slopcode;glm52;deep-research;local-agent;local-vision"
+      OPENAI_API_BASE_URLS: "${GLM52_Q8_BASE_URL:-http://127.0.0.1:11446/v1};${SLOCODE_BASE_URL:-http://127.0.0.1:11438/v1};${GLM52_BASE_URL:-http://127.0.0.1:11441/v1};http://127.0.0.1:18041/v1;http://127.0.0.1:18043/v1;http://127.0.0.1:18044/v1"
+      OPENAI_API_KEYS: "glm52-q8;slopcode;glm52;deep-research;local-agent;local-vision"
       ENABLE_DIRECT_CONNECTIONS: "True"
       DEFAULT_MODELS: "${PREPPERGPT_DEFAULT_MODEL:-local-chatgpt-auto}"
       MODEL_ORDER_LIST: "${PREPPERGPT_MODEL_ORDER_LIST:-[\"local-chatgpt-auto\"]}"
@@ -88,7 +88,7 @@ services:
       PORT: "${PREPPERGPT_PORT:-8080}"
 
   ollama:
-    image: ollama/ollama:latest
+    image: ${OLLAMA_IMAGE:-ollama/ollama:latest}
     container_name: preppergpt-ollama
     restart: unless-stopped
     network_mode: host
@@ -149,8 +149,8 @@ services:
       DEEP_RESEARCH_PORT: "18041"
       DEEP_RESEARCH_PUBLIC_BASE_URL: "http://127.0.0.1:18041"
       DEEP_RESEARCH_MODEL_ID: "deep-research-glm52"
-      DEEP_RESEARCH_MODEL: "${
-      DEEP_RESEARCH_GLM_BASE_URL: "${
+      DEEP_RESEARCH_MODEL: "${PREPPERGPT_GLM_MODEL:-glm52-q4-local}"
+      DEEP_RESEARCH_GLM_BASE_URL: "${PREPPERGPT_GLM_BASE_URL:-http://127.0.0.1:11441/v1}"
       DEEP_RESEARCH_SEARXNG_URL: "http://127.0.0.1:18080/search"
       DEEP_RESEARCH_TIKA_URL: "http://127.0.0.1:9998/tika"
       DEEP_RESEARCH_LOCAL_APP_CONNECTOR_URL: "http://127.0.0.1:18042"
@@ -187,16 +187,13 @@ services:
     network_mode: host
     volumes:
       - ${PREPPERGPT_DATA_DIR:?set PREPPERGPT_DATA_DIR}/local-agent:/data
-      - /tmp/.X11-unix:/tmp/.X11-unix:rw
-      - ${XDG_RUNTIME_DIR:-/run/user/1000}:${XDG_RUNTIME_DIR:-/run/user/1000}:rw
-      - ${XAUTHORITY:-/tmp/.preppergpt-missing-xauthority}:/tmp/.Xauthority:ro
     environment:
       LOCAL_AGENT_HOST: "127.0.0.1"
       LOCAL_AGENT_PORT: "18043"
       LOCAL_AGENT_PUBLIC_BASE_URL: "http://127.0.0.1:18043"
       LOCAL_AGENT_MODEL_ID: "local-agent-glm52"
-      LOCAL_AGENT_GLM_MODEL: "glm52-q4-local"
-      LOCAL_AGENT_GLM_BASE_URL: "${
+      LOCAL_AGENT_GLM_MODEL: "${PREPPERGPT_GLM_MODEL:-glm52-q4-local}"
+      LOCAL_AGENT_GLM_BASE_URL: "${PREPPERGPT_GLM_BASE_URL:-http://127.0.0.1:11441/v1}"
       LOCAL_AGENT_AUTO_ROUTER_MODEL_ID: "local-auto-router"
       LOCAL_AGENT_AUTO_ROUTER_FAST_MODEL: "gemma4:12b-256k-gpu"
       LOCAL_AGENT_AUTO_ROUTER_FAST_BASE_URL: "http://127.0.0.1:11434/v1"
@@ -210,7 +207,7 @@ services:
       LOCAL_AGENT_TIKA_URL: "http://127.0.0.1:9998/tika"
       LOCAL_AGENT_SCHEDULER_URL: "http://127.0.0.1:18042"
       LOCAL_AGENT_PLAYWRIGHT_WS_URL: "ws://127.0.0.1:18045"
-      LOCAL_AGENT_DESKTOP_ENABLED: "${LOCAL_AGENT_DESKTOP_ENABLED:-
+      LOCAL_AGENT_DESKTOP_ENABLED: "${LOCAL_AGENT_DESKTOP_ENABLED:-0}"
       DISPLAY: "${DISPLAY:-}"
       WAYLAND_DISPLAY: "${WAYLAND_DISPLAY:-}"
       XDG_RUNTIME_DIR: "${XDG_RUNTIME_DIR:-/run/user/1000}"
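In this compose file `OPENAI_API_BASE_URLS` and `OPENAI_API_KEYS` are both semicolon-separated lists, and the change prepends the Q8 entry to each, so the two lists appear to be matched by position. The sketch below makes that pairing explicit under that assumption. `pairEndpoints` is a hypothetical helper, and the URLs are the compose defaults with the `${VAR:-default}` substitutions already resolved.

```javascript
// Hypothetical helper: pair semicolon-separated base URLs with API keys by
// position, failing loudly if the two lists drift out of step.
function pairEndpoints(baseUrls, apiKeys) {
  const urls = baseUrls.split(";").map((item) => item.trim()).filter(Boolean);
  const keys = apiKeys.split(";").map((item) => item.trim());
  if (urls.length !== keys.length) {
    throw new Error(`expected ${urls.length} keys, got ${keys.length}`);
  }
  return urls.map((url, index) => ({ url, key: keys[index] }));
}

// Default values from the 0.1.3 compose file.
const pairs = pairEndpoints(
  "http://127.0.0.1:11446/v1;http://127.0.0.1:11438/v1;http://127.0.0.1:11441/v1;http://127.0.0.1:18041/v1;http://127.0.0.1:18043/v1;http://127.0.0.1:18044/v1",
  "glm52-q8;slopcode;glm52;deep-research;local-agent;local-vision"
);
for (const { url, key } of pairs) {
  console.log(`${key} -> ${url}`);
}
```

A check like this is useful when adding a route: inserting a URL without inserting a key at the same index would shift every later pairing.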
package/docs/hardware.md
CHANGED
@@ -1,15 +1,40 @@
 # Hardware Guide
 
-PrepperGPT works best on Linux with
-model weights.
+PrepperGPT works best on Linux with a supported GPU and enough NVMe space for
+model weights. NVIDIA CUDA is supported, Linux AMD ROCm is supported when
+`rocm-smi` or `rocminfo` is available, and CPU fallback remains available for
+smaller routes. Windows users should run PrepperGPT inside WSL2; native Windows
+installs are rejected with WSL2 guidance. It is designed for post-apocalyptic or
+long-duration outage scenarios, so the high-end GLM tiers deliberately favor
+local availability and answer quality over hosted-service latency.
 
 Recommended starting points:
 
 - Speed profile: 16 GB RAM, 8-12 GB VRAM, 40 GB free disk.
 - Balanced profile: 32-64 GB RAM, 12-24 GB VRAM, 120 GB free disk.
+- Linux AMD ROCm profile: 32-128 GB RAM, 12-24+ GB AMD VRAM, ROCm tools
+installed, and Docker access to `/dev/kfd` and `/dev/dri`.
 - Intelligence profile: 96 GB RAM or more, fast NVMe, and hundreds of GB free
 for GLM 5.2 Q4 or similar large weights.
+- Enterprise 8-bit GLM tier: 256 GB RAM or more, 48-80 GB VRAM preferred,
+and 1.5-2 TB of fast NVMe for GLM 5.2 Q8 plus working/cache room.
 
 The installer reserves about 15-20% VRAM headroom when deciding whether a model
 fits. If a large manual model is selected, `preppergpt doctor` explains the
 endpoint or file path that must be provided.
+
+Very low tokens/sec is acceptable for the GLM 5.2 Q8 tier because that tier is
+for situations where there is no cloud model to fall back to.
+
+## Hardware Matrix
+
+| Tier | Typical specs | PrepperGPT routes |
+| --- | --- | --- |
+| Basic CPU laptop | 16 GB RAM, no GPU, 80 GB disk | `local-chatgpt-auto`, `llama3.1:8b`, `local-vision-moondream2`, bundled Whisper |
+| Mid NVIDIA | 64 GB RAM, 12 GB usable VRAM, 250 GB disk | Gemma fast lane, Qwen coder fallback, local vision, bundled Whisper |
+| Mid AMD ROCm | 64 GB RAM, 12-24 GB usable AMD VRAM, ROCm on Linux, 250 GB disk | Ollama ROCm Gemma fast lane, Qwen coder fallback, local vision through Ollama, bundled Whisper |
+| AMD without ROCm | 32-64 GB RAM, AMD GPU detected, no ROCm tools | CPU-compatible routes plus a doctor warning to install ROCm for acceleration |
+| Windows WSL2 | 32+ GB RAM, Docker Desktop WSL integration, WSL2 Ubuntu | Linux-style install inside WSL2; native Windows install is rejected |
+| High NVIDIA | 128 GB RAM, 24 GB VRAM, 750 GB NVMe | GLM 5.2 Q4 configured, Slopcode/Qwen configured, Gemma fast lane, Flux configured |
+| Full PrepperGPT rig | 128+ GB RAM, 24+ GB VRAM, 1 TB NVMe, GLM/Slopcode/Flux files present | GLM 5.2 Q4 primary, Slopcode coding, Gemma fast lane, Deep Research, Agent, Vision, Flux, Whisper |
+| Enterprise 8-bit GLM rig | 256+ GB RAM, 48-80+ GB VRAM preferred, 1.5-2 TB fast NVMe | `glm52-q8-local` primary for Max Intelligence, `glm52-q4-local` fallback, Slopcode/Qwen coding, Gemma fast lane, full sidecar stack |
package/docs/model-sources.md
CHANGED
@@ -2,11 +2,13 @@
 
 PrepperGPT separates routing from model licensing and distribution.
 
-- Ollama models are pulled by the local Ollama runtime when available.
+- Ollama models are pulled by the local Ollama runtime when available. NVIDIA
+hosts use the standard Ollama image; Linux AMD ROCm hosts use
+`ollama/ollama:rocm` with `/dev/kfd` and `/dev/dri` exposed.
 - Whisper Base STT is installer-cached from `Systran/faster-whisper-base`
 under the local PrepperGPT model directory and mounted into OpenWebUI.
 - Hugging Face vision models are downloaded by the local vision sidecar.
-- Very large GLM, Slopcode, and Flux assets are marked as manual or external
+- Very large GLM Q8/Q4, Slopcode, and Flux assets are marked as manual or external
 until a license-compatible public download source is configured.
 
 Manual routes are still added to OpenWebUI. They become live when their local
package/docs/preppergpt-local-parity-map.md
CHANGED
@@ -1,10 +1,12 @@
 # PrepperGPT Local Parity Map
 
-PrepperGPT packages the local ChatGPT-like stack around OpenWebUI
+PrepperGPT packages the local ChatGPT-like stack around OpenWebUI for resilient
+local use when hosted AI services are unavailable:
 
 - OpenWebUI UI at `http://127.0.0.1:8080`
 - Ollama fast local models at `http://127.0.0.1:11434`
-- Optional GLM 5.2 route at `http://127.0.0.1:
+- Optional GLM 5.2 Q8 route at `http://127.0.0.1:11446/v1`
+- Optional GLM 5.2 Q4 route at `http://127.0.0.1:11441/v1`
 - Optional Slopcode/Qwen route at `http://127.0.0.1:11438/v1`
 - Deep research sidecar at `http://127.0.0.1:18041/v1`
 - Local scheduler connector at `http://127.0.0.1:18042`
@@ -12,5 +14,9 @@ PrepperGPT packages the local ChatGPT-like stack around OpenWebUI:
 - Local vision sidecar at `http://127.0.0.1:18044/v1`
 - SearXNG, Tika, Jupyter, and ComfyUI support services
 
+Hardware support is additive: Linux NVIDIA uses CUDA container access, Linux AMD
+uses ROCm when available, CPU fallback remains available, and Windows support is
+through WSL2 rather than native Windows.
+
 The local goal is functional local parity for common ChatGPT workflows, not
 hosted frontier-model quality or cloud account continuity.
package/installer/cli.mjs
CHANGED
@@ -2,12 +2,12 @@ import fs from "node:fs";
 import http from "node:http";
 import { ensureWhisperBundle, modelDirs, whisperBundleStatus } from "./lib/bundles.mjs";
 import { detectMachine } from "./lib/detect.mjs";
-import { buildPlan, normalizeProfile } from "./lib/planner.mjs";
+import { buildPlan, installSupportError, normalizeProfile } from "./lib/planner.mjs";
 import { packageRoot, runtimePaths } from "./lib/paths.mjs";
 import { renderInstall } from "./lib/render.mjs";
 import { commandResult, parseArgs, readJson, shellQuote } from "./lib/util.mjs";
 
-const VERSION = "0.1.
+const VERSION = "0.1.3";
 
 function usage() {
   return `PrepperGPT ${VERSION}
@@ -34,6 +34,22 @@ function profileFrom(flags) {
   return normalizeProfile(flags.profile || flags.mode || "balanced");
 }
 
+function requiredToolStatuses(detection) {
+  return {
+    docker: Boolean(detection.tools?.docker),
+    dockerCompose: Boolean(detection.tools?.dockerCompose),
+    curl: Boolean(detection.tools?.curl),
+    "python3 or python": Boolean(detection.tools?.python3 || detection.tools?.python)
+  };
+}
+
+function assertSupportedInstall(detection) {
+  const message = installSupportError(detection);
+  if (message) {
+    throw new Error(message);
+  }
+}
+
 function composeArgs(paths) {
   return ["compose", "--env-file", paths.envFile, "-f", `${packageRoot}/compose/preppergpt.yaml`, "-f", paths.generatedCompose];
 }
@@ -88,19 +104,20 @@ async function commandDetect(flags) {
     printJson(detection);
     return;
   }
-  console.log(`Host: ${detection.hostname} (${detection.platform}/${detection.arch})`);
+  console.log(`Host: ${detection.hostname} (${detection.platformKind || detection.platform}/${detection.arch})`);
   console.log(`CPU: ${detection.cpu.cores} cores, ${detection.cpu.model}`);
   console.log(`RAM: ${detection.memory.totalGb} GB total, ${detection.memory.freeGb} GB free`);
   const bestDisk = detection.disks[0];
   console.log(`Disk: ${bestDisk ? `${bestDisk.freeGb.toFixed(1)} GB free at ${bestDisk.mount}` : "not detected"}`);
   if (detection.gpus.length) {
     for (const gpu of detection.gpus) {
-
+      const memory = gpu.totalVramGb ? `${gpu.totalVramGb} GB VRAM, ${gpu.freeVramGb ?? "unknown"} GB free` : "VRAM unknown";
+      console.log(`GPU ${gpu.index}: ${gpu.vendor}/${gpu.runtime || "unknown"} ${gpu.name}, ${memory}`);
     }
   } else {
-    console.log("GPU: no
+    console.log("GPU: no supported GPU detected");
   }
-  const missing = Object.entries(detection
+  const missing = Object.entries(requiredToolStatuses(detection)).filter(([, present]) => !present).map(([tool]) => tool);
   console.log(`Tools: ${missing.length ? `missing ${missing.join(", ")}` : "all required tools present"}`);
 }
 
@@ -117,6 +134,7 @@ async function commandPlan(flags) {
 async function commandInstall(flags) {
   const home = flags.home;
   const detection = await detectMachine();
+  assertSupportedInstall(detection);
   const plan = buildPlan(detection, profileFrom(flags));
   if (flags.dry_run) {
     printPlan(plan);
@@ -140,6 +158,7 @@ async function commandInstall(flags) {
 async function commandSwitchProfile(flags) {
   const paths = runtimePaths(flags.home);
   const detection = await detectMachine();
+  assertSupportedInstall(detection);
   const plan = buildPlan(detection, profileFrom(flags));
   renderInstall(plan, detection, { home: paths.root });
   console.log(`Switched PrepperGPT to ${plan.profile}.`);
@@ -199,9 +218,23 @@ async function commandDoctor(flags) {
   const plan = buildPlan(detection, profileFrom(flags));
   printPlan(plan);
   console.log("\nDoctor:");
-  const
-
-  console.log(`
+  const supportError = installSupportError(detection);
+  if (supportError) {
+    console.log(` platform: unsupported (${supportError})`);
+  } else {
+    console.log(` platform: ok (${detection.platformKind || detection.platform})`);
+  }
+  for (const [tool, present] of Object.entries(requiredToolStatuses(detection))) {
+    console.log(` ${tool}: ${present ? "ok" : "missing"}`);
+  }
+  const amdGpu = detection.gpus.some((gpu) => gpu.vendor === "amd");
+  const nvidiaGpu = detection.gpus.some((gpu) => gpu.vendor === "nvidia");
+  if (nvidiaGpu) {
+    console.log(` nvidia cuda: ${detection.tools.nvidiaSmi ? "ok" : "missing nvidia-smi"}`);
+  }
+  if (amdGpu) {
+    const rocmReady = detection.tools.rocmSmi || detection.tools.rocminfo;
+    console.log(` amd rocm: ${rocmReady ? "ok" : "missing rocm-smi or rocminfo"}`);
   }
   for (const [port, entry] of Object.entries(detection.ports)) {
     if (!entry.free) {
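Both `commandDetect` and `commandDoctor` now derive their tool report from `requiredToolStatuses()`, where either `python3` or `python` satisfies the Python requirement. The sketch below restates that function from the diff and adds a hypothetical `missingTools` wrapper to show what the "missing" list looks like for a sample detection object; the sample data is invented.

```javascript
// Restated from the cli.mjs change: one boolean per required tool.
function requiredToolStatuses(detection) {
  return {
    docker: Boolean(detection.tools?.docker),
    dockerCompose: Boolean(detection.tools?.dockerCompose),
    curl: Boolean(detection.tools?.curl),
    "python3 or python": Boolean(detection.tools?.python3 || detection.tools?.python)
  };
}

// Hypothetical wrapper: names of the tools that are not present.
function missingTools(detection) {
  return Object.entries(requiredToolStatuses(detection))
    .filter(([, present]) => !present)
    .map(([tool]) => tool);
}

// Invented sample: only `python` (not `python3`) is on PATH, Compose is absent.
const sample = { tools: { docker: true, dockerCompose: false, curl: true, python: true } };
console.log(missingTools(sample));
```

Because the statuses are keyed by display name, optional tools such as `tmux` or `git` in `detection.tools` never show up as missing.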
package/installer/lib/detect.mjs
CHANGED
@@ -6,6 +6,31 @@ import { commandExists, commandResult, gb } from "./util.mjs";
 
 const DEFAULT_PORTS = [8080, 11434, 11438, 11441, 18041, 18042, 18043, 18044, 18045, 18080, 8188, 8888, 9998];
 
+function readFileMaybe(file) {
+  try {
+    return fs.readFileSync(file, "utf8");
+  } catch {
+    return "";
+  }
+}
+
+export function detectPlatformKind() {
+  if (process.platform === "win32") {
+    return "windows-native";
+  }
+  if (process.platform === "darwin") {
+    return "macos";
+  }
+  if (process.platform !== "linux") {
+    return "unknown";
+  }
+  const release = `${readFileMaybe("/proc/sys/kernel/osrelease")}\n${readFileMaybe("/proc/version")}`.toLowerCase();
+  if (process.env.WSL_DISTRO_NAME || process.env.WSL_INTEROP || release.includes("microsoft") || release.includes("wsl")) {
+    return "wsl2";
+  }
+  return "linux";
+}
+
 function parseDf(target) {
   const result = commandResult("df", ["-Pk", target], { timeoutMs: 5000 });
   if (!result.ok) {
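`detectPlatformKind()` reads `process.platform`, `process.env`, and two `/proc` files directly, which makes it awkward to exercise off-host. The sketch below is a pure variant with the same branch order, taking those inputs as parameters. `classifyPlatform` and its parameter names are hypothetical; the release strings in the usage lines are illustrative examples, not captured output.

```javascript
// Pure variant of the platform check: same branches as detectPlatformKind(),
// but inputs are passed in instead of being read from the running process.
function classifyPlatform(platform, env = {}, release = "") {
  if (platform === "win32") {
    return "windows-native";
  }
  if (platform === "darwin") {
    return "macos";
  }
  if (platform !== "linux") {
    return "unknown";
  }
  const text = release.toLowerCase();
  if (env.WSL_DISTRO_NAME || env.WSL_INTEROP || text.includes("microsoft") || text.includes("wsl")) {
    return "wsl2";
  }
  return "linux";
}

console.log(classifyPlatform("linux", {}, "5.15.153.1-microsoft-standard-WSL2"));
console.log(classifyPlatform("linux", {}, "6.8.0-45-generic"));
console.log(classifyPlatform("win32"));
```

One caveat worth keeping in mind: a kernel string containing "microsoft" does not by itself distinguish WSL1 from WSL2, so this check would likely label a WSL1 shell as `wsl2` as well.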
@@ -47,7 +72,57 @@ function candidateDiskPaths() {
   });
 }
 
-function
+function usableVram(totalVramGb) {
+  return Math.round(totalVramGb * 0.82 * 10) / 10;
+}
+
+function normalizeGb(value) {
+  const number = Number(value);
+  return Number.isFinite(number) ? Math.round(number * 10) / 10 : null;
+}
+
+function bytesToGb(value) {
+  const number = Number(value);
+  return Number.isFinite(number) ? normalizeGb(number / 1024 ** 3) : null;
+}
+
+function memoryToGb(value, fallbackUnit = "bytes") {
+  if (value === null || value === undefined) {
+    return null;
+  }
+  const text = String(value).trim();
+  const matches = [...text.matchAll(/(\d+(?:\.\d+)?)/g)];
+  const number = Number(matches.at(-1)?.[1]);
+  if (!Number.isFinite(number)) {
+    return null;
+  }
+  const lower = text.toLowerCase();
+  if (/(gib|gb)/.test(lower)) {
+    return normalizeGb(number);
+  }
+  if (/(mib|mb)/.test(lower)) {
+    return normalizeGb(number / 1024);
+  }
+  if (/(kib|kb)/.test(lower)) {
+    return normalizeGb(number / 1024 / 1024);
+  }
+  if (/\(b\)|bytes?/.test(lower) || number > 1024 ** 3) {
+    return bytesToGb(number);
+  }
+  if (fallbackUnit === "mib" || number > 1024) {
+    return normalizeGb(number / 1024);
+  }
+  if (fallbackUnit === "bytes") {
+    return bytesToGb(number);
+  }
+  return normalizeGb(number);
+}
+
+function bestMatchingKey(record, patterns) {
+  return Object.keys(record).find((key) => patterns.some((pattern) => pattern.test(key)));
+}
+
+function detectNvidiaGpus() {
   if (!commandExists("nvidia-smi")) {
     return [];
   }
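`memoryToGb()` takes the last number in a string and converts it to GB, using a unit suffix when one is present and magnitude heuristics otherwise. The sketch below is a deliberately simplified variant that keeps only the explicit-unit branches and returns `null` rather than guessing. `parseMemoryGb` is a hypothetical name, and the sample strings are illustrative, not captured `rocm-smi` output.

```javascript
// Simplified unit parser: converts the last number in the text to GB based on
// an explicit unit. Unlike memoryToGb() in the diff, it never guesses a unit.
function parseMemoryGb(value) {
  const text = String(value).trim();
  const matches = [...text.matchAll(/(\d+(?:\.\d+)?)/g)];
  const number = Number(matches.at(-1)?.[1]);
  if (!Number.isFinite(number)) {
    return null;
  }
  const round1 = (n) => Math.round(n * 10) / 10;
  const lower = text.toLowerCase();
  if (/(gib|gb)/.test(lower)) {
    return round1(number);
  }
  if (/(mib|mb)/.test(lower)) {
    return round1(number / 1024);
  }
  if (/(kib|kb)/.test(lower)) {
    return round1(number / 1024 / 1024);
  }
  if (/\(b\)|bytes?/.test(lower)) {
    return round1(number / 1024 ** 3);
  }
  return null; // no recognizable unit
}

console.log(parseMemoryGb("24564 MiB"));
console.log(parseMemoryGb("VRAM Total Memory (B): 17163091968"));
console.log(parseMemoryGb("n/a"));
```

Taking the last number matters for labelled lines: in a string such as `card0: 16 GB` the digit in `card0` is skipped and `16` is used.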
@@ -66,16 +141,171 @@ function detectGpus() {
       return {
         index,
         vendor: "nvidia",
+        runtime: "cuda",
         name,
-        totalVramGb:
-        freeVramGb:
-        usableVramGb:
+        totalVramGb: normalizeGb(Number(totalMiB) / 1024),
+        freeVramGb: normalizeGb(Number(freeMiB) / 1024),
+        usableVramGb: usableVram(Number(totalMiB) / 1024),
         driver
       };
     })
     .filter((gpu) => gpu.name && Number.isFinite(gpu.totalVramGb));
 }
 
+function detectAmdRocmSmiGpus() {
+  if (!commandExists("rocm-smi")) {
+    return [];
+  }
+  const jsonResult = commandResult("rocm-smi", [
+    "--showproductname",
+    "--showmeminfo",
+    "vram",
+    "--showdriverversion",
+    "--json"
+  ]);
+  if (jsonResult.ok) {
+    try {
+      const parsed = JSON.parse(jsonResult.stdout);
+      return Object.entries(parsed)
+        .filter(([id, record]) => /^card\d+/i.test(id) && record && typeof record === "object")
+        .map(([id, record], index) => {
+          const totalKey = bestMatchingKey(record, [/vram.*total/i, /total.*memory/i]);
+          const usedKey = bestMatchingKey(record, [/vram.*used/i, /used.*memory/i]);
+          const nameKey = bestMatchingKey(record, [/product.*name/i, /card.*series/i, /marketing.*name/i]);
+          const driverKey = bestMatchingKey(record, [/driver/i]);
+          const totalVramGb = memoryToGb(record[totalKey]);
+          const usedVramGb = memoryToGb(record[usedKey]) || 0;
+          const freeVramGb = totalVramGb === null ? null : normalizeGb(Math.max(totalVramGb - usedVramGb, 0));
+          return {
+            index,
+            vendor: "amd",
+            runtime: "rocm",
+            name: String(record[nameKey] || id),
+            totalVramGb,
+            freeVramGb,
+            usableVramGb: totalVramGb === null ? null : usableVram(totalVramGb),
+            driver: String(record[driverKey] || "")
+          };
+        })
+        .filter((gpu) => gpu.name);
+    } catch {
+      // Fall through to text parsing.
+    }
+  }
+
+  const textResult = commandResult("rocm-smi", ["--showproductname", "--showmeminfo", "vram", "--showdriverversion"]);
+  if (!textResult.ok) {
+    return [];
+  }
+  const cards = new Map();
+  for (const line of textResult.stdout.split(/\n/)) {
+    const match = line.match(/(card\d+)\s*[:\t ]+(.*)$/i);
+    if (!match) {
+      continue;
+    }
+    const [, id, value] = match;
+    const record = cards.get(id) || {};
+    if (/product|series|marketing/i.test(value)) {
+      record.name = value.split(/[:=]/).at(-1)?.trim() || record.name;
+    }
+    if (/total.*vram|vram.*total/i.test(value)) {
+      record.totalVramGb = memoryToGb(value);
+    }
+    if (/used.*vram|vram.*used/i.test(value)) {
+      record.usedVramGb = memoryToGb(value);
+    }
+    if (/driver/i.test(value)) {
+      record.driver = value.split(/[:=]/).at(-1)?.trim();
+    }
+    cards.set(id, record);
+  }
+  return [...cards.entries()].map(([id, record], index) => {
+    const totalVramGb = record.totalVramGb ?? null;
+    const freeVramGb =
+      totalVramGb === null ? null : normalizeGb(Math.max(totalVramGb - (record.usedVramGb || 0), 0));
+    return {
+      index,
+      vendor: "amd",
+      runtime: "rocm",
+      name: record.name || id,
+      totalVramGb,
+      freeVramGb,
+      usableVramGb: totalVramGb === null ? null : usableVram(totalVramGb),
+      driver: record.driver || ""
+    };
+  });
+}
+
+function detectAmdRocinfoGpus() {
+  if (!commandExists("rocminfo")) {
+    return [];
+  }
+  const result = commandResult("rocminfo", [], { timeoutMs: 8000 });
+  if (!result.ok) {
+    return [];
+  }
+  const names = [];
+  for (const line of result.stdout.split(/\n/)) {
+    const match = line.match(/^\s*(?:Marketing Name|Name):\s*(.+)$/);
+    if (match && /amd|radeon|instinct|gfx/i.test(match[1])) {
+      names.push(match[1].trim());
+    }
+  }
+  return [...new Set(names)].map((name, index) => ({
+    index,
+    vendor: "amd",
+    runtime: "rocm",
+    name,
+    totalVramGb: null,
+    freeVramGb: null,
+    usableVramGb: null,
+    driver: ""
+  }));
+}
+
+function detectAmdPciGpus() {
+  if (!commandExists("lspci")) {
+    return [];
+  }
+  const result = commandResult("lspci", ["-mm"], { timeoutMs: 5000 });
+  if (!result.ok) {
+    return [];
+  }
+  return result.stdout
+    .split(/\n/)
+    .filter((line) => /(VGA compatible controller|3D controller|Display controller)/i.test(line) && /AMD|ATI|Radeon/i.test(line))
+    .map((line, index) => ({
+      index,
+      vendor: "amd",
+      runtime: "none",
+      name: line.replace(/^\S+\s+/, "").replaceAll('"', "").trim() || "AMD GPU",
+      totalVramGb: null,
+      freeVramGb: null,
+      usableVramGb: null,
+      driver: ""
+    }));
+}
+
+function dedupeGpus(gpus) {
+  const seen = new Set();
+  return gpus.filter((gpu) => {
+    const key = `${gpu.vendor}:${gpu.name}:${gpu.totalVramGb ?? "unknown"}`;
+    if (seen.has(key)) {
+      return false;
+    }
+    seen.add(key);
+    return true;
+  });
+}
+
+function detectGpus() {
+  const nvidia = detectNvidiaGpus();
+  const amdRocm = detectAmdRocmSmiGpus();
+  const amdFallback = amdRocm.length ? [] : detectAmdRocinfoGpus();
+  const amdPci = amdRocm.length || amdFallback.length ? [] : detectAmdPciGpus();
+  return dedupeGpus([...nvidia, ...amdRocm, ...amdFallback, ...amdPci]);
+}
+
 async function portFree(port) {
   return new Promise((resolve) => {
     const server = net.createServer();
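The new `detectGpus()` tries AMD probes in order (`rocm-smi`, then `rocminfo`, then `lspci`) and only falls through when the previous probe found nothing, then deduplicates by vendor, name, and total VRAM. The sketch below shows that pattern with stub probes. `dedupeGpus` is restated from the diff; `firstNonEmpty` and the stub data are hypothetical.

```javascript
// Restated from the diff: key on vendor, name, and total VRAM.
function dedupeGpus(gpus) {
  const seen = new Set();
  return gpus.filter((gpu) => {
    const key = `${gpu.vendor}:${gpu.name}:${gpu.totalVramGb ?? "unknown"}`;
    if (seen.has(key)) {
      return false;
    }
    seen.add(key);
    return true;
  });
}

// Hypothetical generic form of the fallback chain: run probes in order and
// stop at the first one that reports at least one GPU.
function firstNonEmpty(probes) {
  for (const probe of probes) {
    const found = probe();
    if (found.length) {
      return found;
    }
  }
  return [];
}

// Stub probes standing in for rocm-smi, rocminfo, and lspci.
const amd = firstNonEmpty([
  () => [],
  () => [{ vendor: "amd", runtime: "rocm", name: "Radeon RX 7900 XTX", totalVramGb: null }],
  () => [{ vendor: "amd", runtime: "none", name: "AMD GPU", totalVramGb: null }]
]);
console.log(dedupeGpus(amd).map((gpu) => `${gpu.name} (${gpu.runtime})`));
```

A side effect of this key scheme is that two identical cards (same vendor, name, and VRAM) collapse into a single entry, which may matter on multi-GPU hosts.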
@@ -93,6 +323,7 @@ async function detectPorts(ports = DEFAULT_PORTS) {
 }
 
 export async function detectMachine(options = {}) {
+  const platformKind = detectPlatformKind();
   const disks = candidateDiskPaths()
     .map(parseDf)
     .filter(Boolean)
@@ -104,12 +335,17 @@ export async function detectMachine(options = {}) {
     tmux: commandExists("tmux"),
     curl: commandExists("curl"),
     python3: commandExists("python3"),
+    python: commandExists("python"),
     git: commandExists("git"),
-    nvidiaSmi: commandExists("nvidia-smi")
+    nvidiaSmi: commandExists("nvidia-smi"),
+    rocmSmi: commandExists("rocm-smi"),
+    rocminfo: commandExists("rocminfo")
   };
   return {
     generatedAt: new Date().toISOString(),
     platform: process.platform,
+    platformKind,
+    isWsl2: platformKind === "wsl2",
     arch: process.arch,
     hostname: os.hostname(),
     cpu: {
package/installer/lib/planner.mjs
CHANGED
@@ -23,18 +23,57 @@ function bestDisk(detection) {
   return detection.disks?.[0] || { freeGb: 0, isNvme: false, path: "" };
 }
 
-function
-
+function detectedPlatformKind(detection) {
+  if (detection.platformKind) {
+    return detection.platformKind;
+  }
+  if (detection.platform === "win32") {
+    return "windows-native";
+  }
+  if (detection.platform === "darwin") {
+    return "macos";
+  }
+  return detection.platform || "unknown";
+}
+
+function bestGpu(detection, vendors = null) {
+  const allowed = vendors ? new Set(vendors) : null;
+  return [...(detection.gpus || [])]
+    .filter((gpu) => !allowed || allowed.has(gpu.vendor))
+    .sort((a, b) => (b.usableVramGb || 0) - (a.usableVramGb || 0))[0] || null;
+}
+
+function gpuVendorLabel(vendors) {
+  if (!vendors?.length) {
+    return "supported";
+  }
+  return vendors.map((vendor) => (vendor === "nvidia" ? "NVIDIA" : vendor === "amd" ? "AMD" : vendor)).join("/");
+}
+
+function hasRocmRuntime(detection) {
+  return Boolean(detection.tools?.rocmSmi || detection.tools?.rocminfo);
+}
+
+export function installSupportError(detection) {
+  if (detectedPlatformKind(detection) === "windows-native") {
+    return "Native Windows install is not supported yet. Install PrepperGPT inside WSL2 so Docker, Linux paths, and local model services use the supported Linux runtime.";
+  }
+  return null;
 }
 
 function requirementFailures(model, detection) {
   const requires = model.requires || {};
   const disk = bestDisk(detection);
-  const
+  const platformKind = detectedPlatformKind(detection);
+  const gpuVendors = requires.gpuVendors || null;
+  const gpu = bestGpu(detection, gpuVendors);
   const failures = [];
   if (requires.platforms && !requires.platforms.includes(detection.platform)) {
     failures.push(`requires platform ${requires.platforms.join(", ")}`);
   }
+  if (requires.platformKinds && !requires.platformKinds.includes(platformKind)) {
+    failures.push(`requires platform kind ${requires.platformKinds.join(", ")}`);
+  }
   if (requires.minRamGb && detection.memory.totalGb < requires.minRamGb) {
     failures.push(`requires ${requires.minRamGb} GB RAM`);
   }
@@ -44,11 +83,22 @@ function requirementFailures(model, detection) {
|
|
|
44
83
|
if (requires.nvme && disk.freeGb >= (requires.diskGb || 0) && !disk.isNvme) {
|
|
45
84
|
failures.push("strongly prefers NVMe for acceptable load time");
|
|
46
85
|
}
|
|
86
|
+
if (requires.gpuVendors && detection.gpus?.length && !gpu) {
|
|
87
|
+
failures.push(`requires ${gpuVendorLabel(requires.gpuVendors)} GPU`);
|
|
88
|
+
}
|
|
47
89
|
if (requires.gpu && !gpu) {
|
|
48
|
-
failures.push(
|
|
90
|
+
failures.push(`requires ${gpuVendorLabel(requires.gpuVendors)} GPU`);
|
|
49
91
|
}
|
|
50
92
|
if (requires.minVramGb && (!gpu || gpu.usableVramGb < requires.minVramGb)) {
|
|
51
|
-
failures.push(`requires about ${requires.minVramGb} GB usable VRAM`);
|
|
93
|
+
failures.push(`requires ${gpuVendorLabel(requires.gpuVendors)} GPU with about ${requires.minVramGb} GB usable VRAM`);
|
|
94
|
+
}
|
|
95
|
+
if (requires.requiresRocm && gpu?.vendor === "amd") {
|
|
96
|
+
if (platformKind !== "linux") {
|
|
97
|
+
failures.push("requires a Linux ROCm host for AMD GPU acceleration");
|
|
98
|
+
}
|
|
99
|
+
if (gpu.runtime !== "rocm" || !hasRocmRuntime(detection)) {
|
|
100
|
+
failures.push("requires ROCm runtime tools for AMD GPU acceleration");
|
|
101
|
+
}
|
|
52
102
|
}
|
|
53
103
|
return failures;
|
|
54
104
|
}
|
|
@@ -62,7 +112,9 @@ function chooseFirst(candidates, models, detection) {
|
|
|
62
112
|
continue;
|
|
63
113
|
}
|
|
64
114
|
const failures = requirementFailures(model, detection);
|
|
65
|
-
|
|
115
|
+
const canUseExternalFallback =
|
|
116
|
+
["manual", "external"].includes(model.source?.type) && !model.source?.requiresHardwareFit;
|
|
117
|
+
if (failures.length === 0 || canUseExternalFallback) {
|
|
66
118
|
return { model, skipped };
|
|
67
119
|
}
|
|
68
120
|
skipped.push({ id, reasons: failures });
|
|
@@ -91,8 +143,9 @@ export function buildPlan(detection, requestedProfile = "balanced", catalog = lo
|
|
|
91
143
|
}
|
|
92
144
|
}
|
|
93
145
|
|
|
146
|
+
const defaultModel = selected.chat?.id || priorities.defaultModel;
|
|
94
147
|
const routeIds = unique([
|
|
95
|
-
|
|
148
|
+
defaultModel,
|
|
96
149
|
selected.chat?.id,
|
|
97
150
|
selected.fast?.id,
|
|
98
151
|
selected.reasoning?.id,
|
|
@@ -113,9 +166,16 @@ export function buildPlan(detection, requestedProfile = "balanced", catalog = lo
|
|
|
113
166
|
}));
|
|
114
167
|
|
|
115
168
|
const warnings = [];
|
|
169
|
+
const installError = installSupportError(detection);
|
|
170
|
+
if (installError) {
|
|
171
|
+
warnings.push(installError);
|
|
172
|
+
}
|
|
116
173
|
const missingTools = Object.entries(detection.tools || {})
|
|
117
|
-
.filter(([tool, present]) => ["docker", "dockerCompose", "curl"
|
|
174
|
+
.filter(([tool, present]) => ["docker", "dockerCompose", "curl"].includes(tool) && !present)
|
|
118
175
|
.map(([tool]) => tool);
|
|
176
|
+
if (!detection.tools?.python3 && !detection.tools?.python) {
|
|
177
|
+
missingTools.push("python3 or python");
|
|
178
|
+
}
|
|
119
179
|
if (missingTools.length) {
|
|
120
180
|
warnings.push(`Missing required tools: ${missingTools.join(", ")}`);
|
|
121
181
|
}
|
|
@@ -125,8 +185,18 @@ export function buildPlan(detection, requestedProfile = "balanced", catalog = lo
|
|
|
125
185
|
if (occupiedPorts.length) {
|
|
126
186
|
warnings.push(`Ports already in use: ${occupiedPorts.join(", ")}`);
|
|
127
187
|
}
|
|
128
|
-
|
|
129
|
-
|
|
188
|
+
const acceleratedGpu = (detection.gpus || []).find(
|
|
189
|
+
(gpu) => gpu.vendor === "nvidia" || (gpu.vendor === "amd" && gpu.runtime === "rocm" && detectedPlatformKind(detection) === "linux")
|
|
190
|
+
);
|
|
191
|
+
if (!acceleratedGpu) {
|
|
192
|
+
warnings.push("No supported GPU acceleration detected; CPU fallback will be much slower.");
|
|
193
|
+
}
|
|
194
|
+
const amdWithoutRocm = (detection.gpus || []).some((gpu) => gpu.vendor === "amd" && gpu.runtime !== "rocm");
|
|
195
|
+
if (amdWithoutRocm) {
|
|
196
|
+
warnings.push("AMD GPU detected without ROCm; install ROCm on Linux to enable AMD acceleration.");
|
|
197
|
+
}
|
|
198
|
+
if ((detection.gpus || []).some((gpu) => gpu.vendor === "amd") && detectedPlatformKind(detection) === "wsl2") {
|
|
199
|
+
warnings.push("AMD GPU acceleration is supported on Linux ROCm hosts; WSL2 installs will use CPU fallback unless an external AMD endpoint is provided.");
|
|
130
200
|
}
|
|
131
201
|
if (manualAssets.length) {
|
|
132
202
|
warnings.push("Some selected high-quality routes need manual model files or already-running external endpoints.");
|
|
@@ -136,7 +206,7 @@ export function buildPlan(detection, requestedProfile = "balanced", catalog = lo
|
|
|
136
206
|
generatedAt: new Date().toISOString(),
|
|
137
207
|
profile,
|
|
138
208
|
profileLabel: priorities.label,
|
|
139
|
-
defaultModel
|
|
209
|
+
defaultModel,
|
|
140
210
|
routeIds,
|
|
141
211
|
selected,
|
|
142
212
|
skipped,
|
|
@@ -144,7 +214,7 @@ export function buildPlan(detection, requestedProfile = "balanced", catalog = lo
|
|
|
144
214
|
estimates: estimatePlan(profile, selected),
|
|
145
215
|
env: {
|
|
146
216
|
PREPPERGPT_PROFILE: profile,
|
|
147
|
-
PREPPERGPT_DEFAULT_MODEL:
|
|
217
|
+
PREPPERGPT_DEFAULT_MODEL: defaultModel,
|
|
148
218
|
PREPPERGPT_MODEL_ORDER_LIST: JSON.stringify(routeIds)
|
|
149
219
|
},
|
|
150
220
|
warnings
|
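Note on the planner.mjs hunks above: the new `bestGpu(detection, vendors)` filters detected GPUs by vendor before picking the one with the most usable VRAM, and the VRAM failure message now names the allowed vendors. The sketch below restates those two functions so they can be run on their own. `gpuFailures` and the sample `detection` object are hypothetical and only illustrate the behavior; they are not part of the package.

```javascript
// Restated from the planner.mjs hunk: pick the largest-VRAM GPU among allowed vendors.
function bestGpu(detection, vendors = null) {
  const allowed = vendors ? new Set(vendors) : null;
  return [...(detection.gpus || [])]
    .filter((gpu) => !allowed || allowed.has(gpu.vendor))
    .sort((a, b) => (b.usableVramGb || 0) - (a.usableVramGb || 0))[0] || null;
}

// Restated from the planner.mjs hunk: human-readable vendor list for messages.
function gpuVendorLabel(vendors) {
  if (!vendors?.length) {
    return "supported";
  }
  return vendors.map((vendor) => (vendor === "nvidia" ? "NVIDIA" : vendor === "amd" ? "AMD" : vendor)).join("/");
}

// Hypothetical helper: only the VRAM check from requirementFailures().
function gpuFailures(requires, detection) {
  const gpu = bestGpu(detection, requires.gpuVendors || null);
  const failures = [];
  if (requires.minVramGb && (!gpu || gpu.usableVramGb < requires.minVramGb)) {
    failures.push(`requires ${gpuVendorLabel(requires.gpuVendors)} GPU with about ${requires.minVramGb} GB usable VRAM`);
  }
  return failures;
}

// Hypothetical detection result with three GPUs from different vendors.
const detection = {
  gpus: [
    { vendor: "intel", usableVramGb: 8 },
    { vendor: "amd", runtime: "rocm", usableVramGb: 20 },
    { vendor: "nvidia", runtime: "cuda", usableVramGb: 12 }
  ]
};

const picked = bestGpu(detection, ["nvidia", "amd"]);
const failures = gpuFailures({ minVramGb: 16, gpuVendors: ["nvidia"] }, detection);
console.log(picked.vendor, failures);
```

With both vendors allowed, the AMD card wins on VRAM. Restricting the requirement to NVIDIA leaves only the 12 GB card, so the 16 GB requirement fails with a vendor-specific message.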
package/installer/lib/render.mjs
CHANGED
|
@@ -9,10 +9,44 @@ function secret(bytes = 24) {
|
|
|
9
9
|
return crypto.randomBytes(bytes).toString("hex");
|
|
10
10
|
}
|
|
11
11
|
|
|
12
|
+
function platformKind(detection) {
|
|
13
|
+
return detection.platformKind || (detection.platform === "win32" ? "windows-native" : detection.platform || "unknown");
|
|
14
|
+
}
|
|
15
|
+
|
|
16
|
+
function primaryAccelerator(detection) {
|
|
17
|
+
const gpus = detection.gpus || [];
|
|
18
|
+
const nvidia = gpus.find((gpu) => gpu.vendor === "nvidia");
|
|
19
|
+
if (nvidia) {
|
|
20
|
+
return { vendor: "nvidia", runtime: "cuda" };
|
|
21
|
+
}
|
|
22
|
+
const amdRocm = gpus.find((gpu) => gpu.vendor === "amd" && gpu.runtime === "rocm");
|
|
23
|
+
if (amdRocm && platformKind(detection) === "linux") {
|
|
24
|
+
return { vendor: "amd", runtime: "rocm" };
|
|
25
|
+
}
|
|
26
|
+
return { vendor: "cpu", runtime: "cpu" };
|
|
27
|
+
}
|
|
28
|
+
|
|
29
|
+
function desktopIntegrationEnabled(detection) {
|
|
30
|
+
const explicit = String(process.env.LOCAL_AGENT_DESKTOP_ENABLED || "").toLowerCase();
|
|
31
|
+
if (["0", "false", "no", "off"].includes(explicit)) {
|
|
32
|
+
return false;
|
|
33
|
+
}
|
|
34
|
+
const kind = platformKind(detection);
|
|
35
|
+
const platformSupportsDesktop = kind === "linux" || kind === "wsl2";
|
|
36
|
+
const hasDisplay = Boolean(process.env.DISPLAY || process.env.WAYLAND_DISPLAY);
|
|
37
|
+
return platformSupportsDesktop && hasDisplay && explicit !== "0";
|
|
38
|
+
}
|
|
39
|
+
|
|
12
40
|
function envFile(plan, paths, detection) {
|
|
13
41
|
const dataDir = process.env.PREPPERGPT_DATA_DIR || paths.dataDir;
|
|
14
42
|
const modelsDir = process.env.PREPPERGPT_MODELS_DIR || `${dataDir}/models`;
|
|
15
43
|
const whisperHostDir = path.join(modelsDir, "whisper", "base");
|
|
44
|
+
const accelerator = primaryAccelerator(detection);
|
|
45
|
+
const selectedReasoningModel = plan.selected?.reasoning?.id || "glm52-q4-local";
|
|
46
|
+
const selectedGlmBaseUrl =
|
|
47
|
+
selectedReasoningModel === "glm52-q8-local"
|
|
48
|
+
? process.env.GLM52_Q8_BASE_URL || "http://127.0.0.1:11446/v1"
|
|
49
|
+
: process.env.GLM52_BASE_URL || "http://127.0.0.1:11441/v1";
|
|
16
50
|
const adminPassword = process.env.PREPPERGPT_ADMIN_PASSWORD || secret(18);
|
|
17
51
|
const jupyterToken = process.env.JUPYTER_TOKEN || secret(18);
|
|
18
52
|
const searxngSecret = process.env.SEARXNG_SECRET_KEY || secret(24);
|
|
@@ -26,7 +60,12 @@ function envFile(plan, paths, detection) {
|
|
|
26
60
|
PREPPERGPT_PORT: process.env.PREPPERGPT_PORT || "8080",
|
|
27
61
|
PREPPERGPT_DEFAULT_MODEL: plan.defaultModel,
|
|
28
62
|
PREPPERGPT_MODEL_ORDER_LIST: JSON.stringify(plan.routeIds),
|
|
29
|
-
|
|
63
|
+
PREPPERGPT_GLM_MODEL: selectedReasoningModel,
|
|
64
|
+
PREPPERGPT_GLM_BASE_URL: selectedGlmBaseUrl,
|
|
65
|
+
PREPPERGPT_GPU_VENDOR: accelerator.vendor,
|
|
66
|
+
PREPPERGPT_ACCELERATOR: accelerator.runtime,
|
|
67
|
+
PREPPERGPT_DOCKER_GPUS: accelerator.vendor === "nvidia" ? "all" : "",
|
|
68
|
+
OLLAMA_IMAGE: accelerator.vendor === "amd" ? "ollama/ollama:rocm" : "ollama/ollama:latest",
|
|
30
69
|
WEBUI_NAME: "PrepperGPT",
|
|
31
70
|
WEBUI_ADMIN_EMAIL: process.env.WEBUI_ADMIN_EMAIL || "admin@preppergpt.local",
|
|
32
71
|
WEBUI_ADMIN_PASSWORD: adminPassword,
|
|
@@ -35,6 +74,7 @@ function envFile(plan, paths, detection) {
|
|
|
35
74
|
JUPYTER_TOKEN: jupyterToken,
|
|
36
75
|
SEARXNG_SECRET_KEY: searxngSecret,
|
|
37
76
|
GLM52_BASE_URL: process.env.GLM52_BASE_URL || "http://127.0.0.1:11441/v1",
|
|
77
|
+
GLM52_Q8_BASE_URL: process.env.GLM52_Q8_BASE_URL || "http://127.0.0.1:11446/v1",
|
|
38
78
|
SLOCODE_BASE_URL: process.env.SLOCODE_BASE_URL || "http://127.0.0.1:11438/v1",
|
|
39
79
|
OLLAMA_BASE_URL: process.env.OLLAMA_BASE_URL || "http://127.0.0.1:11434"
|
|
40
80
|
};
|
|
@@ -45,13 +85,38 @@ function envFile(plan, paths, detection) {
|
|
|
45
85
|
|
|
46
86
|
function generatedCompose(plan, detection) {
|
|
47
87
|
const modelOrder = JSON.stringify(plan.routeIds);
|
|
48
|
-
const
|
|
49
|
-
|
|
88
|
+
const accelerator = primaryAccelerator(detection);
|
|
89
|
+
const gpuBlock =
|
|
90
|
+
accelerator.vendor === "nvidia"
|
|
91
|
+
? [
|
|
50
92
|
" ollama:",
|
|
51
93
|
" gpus: all",
|
|
52
94
|
" local-vision:",
|
|
53
95
|
" gpus: all"
|
|
54
96
|
]
|
|
97
|
+
: accelerator.vendor === "amd"
|
|
98
|
+
? [
|
|
99
|
+
" ollama:",
|
|
100
|
+
" devices:",
|
|
101
|
+
" - /dev/kfd:/dev/kfd",
|
|
102
|
+
" - /dev/dri:/dev/dri",
|
|
103
|
+
" group_add:",
|
|
104
|
+
" - video",
|
|
105
|
+
" - render",
|
|
106
|
+
" security_opt:",
|
|
107
|
+
" - seccomp=unconfined"
|
|
108
|
+
]
|
|
109
|
+
: [];
|
|
110
|
+
const desktopBlock = desktopIntegrationEnabled(detection)
|
|
111
|
+
? [
|
|
112
|
+
" local-agent:",
|
|
113
|
+
" environment:",
|
|
114
|
+
" LOCAL_AGENT_DESKTOP_ENABLED: \"1\"",
|
|
115
|
+
" volumes:",
|
|
116
|
+
" - /tmp/.X11-unix:/tmp/.X11-unix:rw",
|
|
117
|
+
" - ${XDG_RUNTIME_DIR:-/run/user/1000}:${XDG_RUNTIME_DIR:-/run/user/1000}:rw",
|
|
118
|
+
" - ${XAUTHORITY:-/tmp/.preppergpt-missing-xauthority}:/tmp/.Xauthority:ro"
|
|
119
|
+
]
|
|
55
120
|
: [];
|
|
56
121
|
return [
|
|
57
122
|
"services:",
|
|
@@ -60,7 +125,8 @@ function generatedCompose(plan, detection) {
|
|
|
60
125
|
` DEFAULT_MODELS: "${plan.defaultModel}"`,
|
|
61
126
|
` MODEL_ORDER_LIST: '${modelOrder.replaceAll("'", "''")}'`,
|
|
62
127
|
` TASK_MODEL: "${plan.selected.fast?.id || plan.defaultModel}"`,
|
|
63
|
-
...gpuBlock
|
|
128
|
+
...gpuBlock,
|
|
129
|
+
...desktopBlock
|
|
64
130
|
].join("\n") + "\n";
|
|
65
131
|
}
|
|
66
132
|
|
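Note on the render.mjs hunks above: `primaryAccelerator` prefers any NVIDIA GPU, accepts AMD only when the GPU reports a ROCm runtime on a `linux` platform kind, and otherwise falls back to CPU. The sketch below restates that function and shows how the three accelerator-dependent values from `envFile` follow from it. `ollamaSettings` and the sample detection objects are hypothetical and exist only for illustration.

```javascript
// Restated from the render.mjs hunk.
function platformKind(detection) {
  return detection.platformKind || (detection.platform === "win32" ? "windows-native" : detection.platform || "unknown");
}

// Restated from the render.mjs hunk: NVIDIA first, then AMD ROCm on Linux, else CPU.
function primaryAccelerator(detection) {
  const gpus = detection.gpus || [];
  const nvidia = gpus.find((gpu) => gpu.vendor === "nvidia");
  if (nvidia) {
    return { vendor: "nvidia", runtime: "cuda" };
  }
  const amdRocm = gpus.find((gpu) => gpu.vendor === "amd" && gpu.runtime === "rocm");
  if (amdRocm && platformKind(detection) === "linux") {
    return { vendor: "amd", runtime: "rocm" };
  }
  return { vendor: "cpu", runtime: "cpu" };
}

// Hypothetical helper: the accelerator-dependent values written by envFile().
function ollamaSettings(detection) {
  const accelerator = primaryAccelerator(detection);
  return {
    accelerator: accelerator.runtime,
    dockerGpus: accelerator.vendor === "nvidia" ? "all" : "",
    image: accelerator.vendor === "amd" ? "ollama/ollama:rocm" : "ollama/ollama:latest"
  };
}

// Hypothetical hosts.
const amdGpu = { vendor: "amd", runtime: "rocm" };
const linuxAmd = ollamaSettings({ platformKind: "linux", gpus: [amdGpu] });
const wslAmd = ollamaSettings({ platformKind: "wsl2", gpus: [amdGpu] });
const mixed = ollamaSettings({ platformKind: "linux", gpus: [amdGpu, { vendor: "nvidia" }] });
console.log(linuxAmd, wslAmd, mixed);
```

The same AMD card yields the ROCm image on a Linux host but the CPU path under WSL2, which matches the WSL2 warning added in planner.mjs. A host with both vendors resolves to CUDA because the NVIDIA check runs first.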
package/package.json
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "preppergpt",
|
|
3
|
-
"version": "0.1.
|
|
4
|
-
"description": "A
|
|
3
|
+
"version": "0.1.3",
|
|
4
|
+
"description": "A post-apocalyptic local AI field kit for running a ChatGPT-like experience when hosted services are unavailable.",
|
|
5
5
|
"type": "module",
|
|
6
6
|
"bin": {
|
|
7
7
|
"preppergpt": "bin/preppergpt.js"
|
|
@@ -28,7 +28,9 @@
|
|
|
28
28
|
"llm",
|
|
29
29
|
"ollama",
|
|
30
30
|
"preppergpt",
|
|
31
|
-
"offline-ai"
|
|
31
|
+
"offline-ai",
|
|
32
|
+
"survival",
|
|
33
|
+
"post-apocalyptic"
|
|
32
34
|
],
|
|
33
35
|
"homepage": "https://github.com/teamslop/preppergpt#readme",
|
|
34
36
|
"bugs": {
|
package/profiles/models.json
CHANGED
|
@@ -5,8 +5,8 @@
|
|
|
5
5
|
"label": "Max intelligence",
|
|
6
6
|
"defaultModel": "glm52-q4-local",
|
|
7
7
|
"roles": {
|
|
8
|
-
"chat": ["glm52-q4-local", "qwen3.6-35b-a3b:slopcode-cpu-64k", "qwen2.5-coder:14b", "llama3.1:8b"],
|
|
9
|
-
"reasoning": ["glm52-q4-local", "qwen3.6-35b-a3b:slopcode-cpu-64k", "qwen2.5-coder:14b"],
|
|
8
|
+
"chat": ["glm52-q8-local", "glm52-q4-local", "qwen3.6-35b-a3b:slopcode-cpu-64k", "qwen2.5-coder:14b", "llama3.1:8b"],
|
|
9
|
+
"reasoning": ["glm52-q8-local", "glm52-q4-local", "qwen3.6-35b-a3b:slopcode-cpu-64k", "qwen2.5-coder:14b"],
|
|
10
10
|
"fast": ["gemma4:12b-256k-gpu", "llama3.1:8b"],
|
|
11
11
|
"coding": ["qwen3.6-35b-a3b:slopcode-cpu-64k", "qwen2.5-coder:14b"],
|
|
12
12
|
"research": ["deep-research-glm52"],
|
|
@@ -21,7 +21,7 @@
|
|
|
21
21
|
"defaultModel": "local-chatgpt-auto",
|
|
22
22
|
"roles": {
|
|
23
23
|
"chat": ["local-chatgpt-auto", "gemma4:12b-256k-gpu", "glm52-q4-local", "llama3.1:8b"],
|
|
24
|
-
"reasoning": ["glm52-q4-local", "qwen3.6-35b-a3b:slopcode-cpu-64k", "qwen2.5-coder:14b"],
|
|
24
|
+
"reasoning": ["glm52-q8-local", "glm52-q4-local", "qwen3.6-35b-a3b:slopcode-cpu-64k", "qwen2.5-coder:14b"],
|
|
25
25
|
"fast": ["gemma4:12b-256k-gpu", "llama3.1:8b"],
|
|
26
26
|
"coding": ["qwen3.6-35b-a3b:slopcode-cpu-64k", "qwen2.5-coder:14b"],
|
|
27
27
|
"research": ["deep-research-glm52"],
|
|
@@ -66,6 +66,26 @@
|
|
|
66
66
|
"description": "Virtual OpenAI-compatible route exposed by the local-agent sidecar."
|
|
67
67
|
}
|
|
68
68
|
},
|
|
69
|
+
{
|
|
70
|
+
"id": "glm52-q8-local",
|
|
71
|
+
"name": "GLM 5.2 Q8 Local",
|
|
72
|
+
"roles": ["chat", "reasoning"],
|
|
73
|
+
"backend": "llama.cpp",
|
|
74
|
+
"contextTokens": 65536,
|
|
75
|
+
"qualityScore": 104,
|
|
76
|
+
"speedScore": 15,
|
|
77
|
+
"tpsEstimate": "very low tokens/sec may be acceptable in disaster/off-grid use because no hosted model is available; benchmark locally",
|
|
78
|
+
"requires": {
|
|
79
|
+
"minRamGb": 192,
|
|
80
|
+
"diskGb": 1000,
|
|
81
|
+
"nvme": true
|
|
82
|
+
},
|
|
83
|
+
"source": {
|
|
84
|
+
"type": "external",
|
|
85
|
+
"requiresHardwareFit": true,
|
|
86
|
+
"description": "Run a GLM 5.2 Q8 OpenAI-compatible llama.cpp server at http://127.0.0.1:11446/v1 with weights on fast NVMe for maximum local quality when hosted services are unavailable."
|
|
87
|
+
}
|
|
88
|
+
},
|
|
69
89
|
{
|
|
70
90
|
"id": "glm52-q4-local",
|
|
71
91
|
"name": "GLM 5.2 Q4 Local",
|
|
@@ -74,7 +94,7 @@
|
|
|
74
94
|
"contextTokens": 65536,
|
|
75
95
|
"qualityScore": 100,
|
|
76
96
|
"speedScore": 25,
|
|
77
|
-
"tpsEstimate": "0.4-3 completion tokens/sec on large CPU/NVMe builds; benchmark locally",
|
|
97
|
+
"tpsEstimate": "0.4-3 completion tokens/sec on large CPU/NVMe builds; acceptable for disaster/off-grid use when no hosted service is available; benchmark locally",
|
|
78
98
|
"requires": {
|
|
79
99
|
"minRamGb": 96,
|
|
80
100
|
"diskGb": 520,
|
|
@@ -111,16 +131,18 @@
|
|
|
111
131
|
"contextTokens": 262144,
|
|
112
132
|
"qualityScore": 78,
|
|
113
133
|
"speedScore": 96,
|
|
114
|
-
"tpsEstimate": "35-90 completion tokens/sec on a modern 16-24 GB NVIDIA GPU",
|
|
134
|
+
"tpsEstimate": "35-90 completion tokens/sec on a modern 16-24 GB NVIDIA GPU or supported Linux AMD ROCm GPU",
|
|
115
135
|
"requires": {
|
|
116
136
|
"minRamGb": 24,
|
|
117
137
|
"minVramGb": 11,
|
|
118
|
-
"diskGb": 20
|
|
138
|
+
"diskGb": 20,
|
|
139
|
+
"gpuVendors": ["nvidia", "amd"],
|
|
140
|
+
"requiresRocm": true
|
|
119
141
|
},
|
|
120
142
|
"source": {
|
|
121
143
|
"type": "ollama",
|
|
122
144
|
"model": "gemma4:12b",
|
|
123
|
-
"description": "Pulled or provided through the local Ollama server."
|
|
145
|
+
"description": "Pulled or provided through the local Ollama server. AMD hosts use the Ollama ROCm container when ROCm is detected on Linux."
|
|
124
146
|
}
|
|
125
147
|
},
|
|
126
148
|
{
|
|
@@ -131,16 +153,18 @@
|
|
|
131
153
|
"contextTokens": 32768,
|
|
132
154
|
"qualityScore": 76,
|
|
133
155
|
"speedScore": 75,
|
|
134
|
-
"tpsEstimate": "12-45 completion tokens/sec depending on GPU and quantization",
|
|
156
|
+
"tpsEstimate": "12-45 completion tokens/sec depending on GPU vendor, ROCm/CUDA readiness, and quantization",
|
|
135
157
|
"requires": {
|
|
136
158
|
"minRamGb": 24,
|
|
137
159
|
"minVramGb": 10,
|
|
138
|
-
"diskGb": 16
|
|
160
|
+
"diskGb": 16,
|
|
161
|
+
"gpuVendors": ["nvidia", "amd"],
|
|
162
|
+
"requiresRocm": true
|
|
139
163
|
},
|
|
140
164
|
"source": {
|
|
141
165
|
"type": "ollama",
|
|
142
166
|
"model": "qwen2.5-coder:14b",
|
|
143
|
-
"description": "Ollama coding fallback."
|
|
167
|
+
"description": "Ollama coding fallback. AMD hosts use the Ollama ROCm container when ROCm is detected on Linux."
|
|
144
168
|
}
|
|
145
169
|
},
|
|
146
170
|
{
|
|
@@ -210,7 +234,9 @@
|
|
|
210
234
|
"requires": {
|
|
211
235
|
"minRamGb": 24,
|
|
212
236
|
"minVramGb": 11,
|
|
213
|
-
"diskGb": 20
|
|
237
|
+
"diskGb": 20,
|
|
238
|
+
"gpuVendors": ["nvidia", "amd"],
|
|
239
|
+
"requiresRocm": true
|
|
214
240
|
},
|
|
215
241
|
"source": {
|
|
216
242
|
"type": "virtual",
|
|
@@ -248,11 +274,13 @@
|
|
|
248
274
|
"requires": {
|
|
249
275
|
"minRamGb": 32,
|
|
250
276
|
"minVramGb": 16,
|
|
251
|
-
"diskGb": 60
|
|
277
|
+
"diskGb": 60,
|
|
278
|
+
"gpuVendors": ["nvidia", "amd"],
|
|
279
|
+
"requiresRocm": true
|
|
252
280
|
},
|
|
253
281
|
"source": {
|
|
254
282
|
"type": "manual",
|
|
255
|
-
"description": "Place Flux model, text encoder, and VAE files in the configured ComfyUI models directory."
|
|
283
|
+
"description": "Place Flux model, text encoder, and VAE files in the configured ComfyUI models directory. Use a CUDA or Linux ROCm ComfyUI runtime that matches the host GPU."
|
|
256
284
|
}
|
|
257
285
|
},
|
|
258
286
|
{
|
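Note on the models.json hunks above: `glm52-q8-local` is now listed first for the `chat` and `reasoning` roles, and its source sets `requiresHardwareFit: true`. Combined with the `canUseExternalFallback` rule added to `chooseFirst` in planner.mjs, that flag means the Q8 route is skipped on hosts that miss its requirements instead of being accepted as an external endpoint. The sketch below is a simplified reading of that rule. It assumes a RAM-only requirement check (`ramFailures`), a trimmed catalog, and an invented `example-external-endpoint` entry; the handling of unknown ids is also an assumption, since the diff does not show that part of the loop.

```javascript
// Trimmed catalog. The Q8 and Q4 RAM figures come from models.json;
// "example-external-endpoint" is a hypothetical entry for illustration.
const models = [
  { id: "glm52-q8-local", requires: { minRamGb: 192 }, source: { type: "external", requiresHardwareFit: true } },
  { id: "glm52-q4-local", requires: { minRamGb: 96 } }, // source omitted in this sketch
  { id: "example-external-endpoint", requires: { minRamGb: 64 }, source: { type: "external" } }
];

// Assumption: only the RAM check, standing in for the full requirementFailures().
function ramFailures(model, detection) {
  const minRamGb = model.requires?.minRamGb;
  return minRamGb && detection.memory.totalGb < minRamGb ? [`requires ${minRamGb} GB RAM`] : [];
}

// Simplified chooseFirst(): the first candidate that fits, or that may bypass the fit check.
function chooseFirst(candidates, catalog, detection) {
  const skipped = [];
  for (const id of candidates) {
    const model = catalog.find((entry) => entry.id === id);
    if (!model) {
      continue; // assumption: unknown ids are ignored
    }
    const failures = ramFailures(model, detection);
    const canUseExternalFallback =
      ["manual", "external"].includes(model.source?.type) && !model.source?.requiresHardwareFit;
    if (failures.length === 0 || canUseExternalFallback) {
      return { model, skipped };
    }
    skipped.push({ id, reasons: failures });
  }
  return { model: null, skipped };
}

// Hypothetical hosts: 128 GB and 16 GB of RAM.
const big = chooseFirst(["glm52-q8-local", "glm52-q4-local"], models, { memory: { totalGb: 128 } });
const small = chooseFirst(["glm52-q8-local", "example-external-endpoint"], models, { memory: { totalGb: 16 } });
console.log(big.model.id, big.skipped, small.model.id, small.skipped);
```

On the 128 GB host the Q8 entry is skipped for RAM and the Q4 entry is selected. On the 16 GB host the hypothetical external entry is still selected despite failing its RAM requirement, because it does not set `requiresHardwareFit`.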
|
package/services/local-scheduler/app.py
CHANGED
|
@@ -3780,11 +3780,11 @@ def local_parity_recommended_model(feature_family: str, primary_models: list[str
|
|
|
3780
3780
|
return models[0] if models else None
|
|
3781
3781
|
|
|
3782
3782
|
if "codex" in family or "software" in family or "code" in family:
|
|
3783
|
-
return first_available(["slopcode-qwen-coder-local", "local-agent-glm52", "glm52-q4-local"])
|
|
3783
|
+
return first_available(["slopcode-qwen-coder-local", "local-agent-glm52", "glm52-q8-local", "glm52-q4-local"])
|
|
3784
3784
|
if "deep research" in family:
|
|
3785
|
-
return first_available(["deep-research-glm52", "glm52-q4-local"])
|
|
3785
|
+
return first_available(["deep-research-glm52", "glm52-q8-local", "glm52-q4-local"])
|
|
3786
3786
|
if "developer mode" in family or "mcp" in family:
|
|
3787
|
-
return first_available(["local-agent-glm52", "local-chatgpt-auto", "glm52-q4-local"])
|
|
3787
|
+
return first_available(["local-agent-glm52", "local-chatgpt-auto", "glm52-q8-local", "glm52-q4-local"])
|
|
3788
3788
|
if "image generation" in family:
|
|
3789
3789
|
return first_available(["flux-2-klein-9b-fp8"])
|
|
3790
3790
|
if "image editing" in family:
|
|
@@ -3794,16 +3794,16 @@ def local_parity_recommended_model(feature_family: str, primary_models: list[str
|
|
|
3794
3794
|
if "voice" in family or "record mode" in family:
|
|
3795
3795
|
return first_available(["whisper-base-bundled", "whisper-large-v3", "local-agent-glm52"])
|
|
3796
3796
|
if "shopping" in family:
|
|
3797
|
-
return first_available(["glm52-shopping-research-local", "glm52-q4-local"])
|
|
3797
|
+
return first_available(["glm52-shopping-research-local", "glm52-q8-local", "glm52-q4-local"])
|
|
3798
3798
|
if "job search" in family or "resume" in family or "finance" in family:
|
|
3799
|
-
return first_available(["local-agent-glm52", "local-chatgpt-auto", "glm52-q4-local"])
|
|
3799
|
+
return first_available(["local-agent-glm52", "local-chatgpt-auto", "glm52-q8-local", "glm52-q4-local"])
|
|
3800
3800
|
if "study" in family:
|
|
3801
|
-
return first_available(["glm52-study-coach-local", "glm52-q4-local"])
|
|
3801
|
+
return first_available(["glm52-study-coach-local", "glm52-q8-local", "glm52-q4-local"])
|
|
3802
3802
|
if "advanced reasoning" in family or "long context" in family:
|
|
3803
|
-
return first_available(["glm52-q4-local"])
|
|
3803
|
+
return first_available(["glm52-q8-local", "glm52-q4-local"])
|
|
3804
3804
|
if "data analysis" in family or "canvas" in family or "memory" in family or "agent mode" in family:
|
|
3805
|
-
return first_available(["local-agent-glm52", "glm52-q4-local"])
|
|
3806
|
-
return first_available(["local-chatgpt-auto", "local-auto-router", "local-instant-gemma4-12b", "glm52-q4-local"])
|
|
3805
|
+
return first_available(["local-agent-glm52", "glm52-q8-local", "glm52-q4-local"])
|
|
3806
|
+
return first_available(["local-chatgpt-auto", "local-auto-router", "local-instant-gemma4-12b", "glm52-q8-local", "glm52-q4-local"])
|
|
3807
3807
|
|
|
3808
3808
|
|
|
3809
3809
|
def local_parity_route_for_model(feature_family: str, model: str | None, profiles: dict) -> dict:
|
|
@@ -3833,10 +3833,10 @@ def local_parity_route_for_model(feature_family: str, model: str | None, profile
|
|
|
3833
3833
|
route_id = "slopcode_tiny"
|
|
3834
3834
|
route_type = "benchmarked_chat_route"
|
|
3835
3835
|
action = "Select the Slopcode/Qwen coding model in OpenWebUI for local software work."
|
|
3836
|
-
elif model_text
|
|
3836
|
+
elif model_text in {"glm52-q8-local", "glm52-q4-local"} or "advanced reasoning" in family or "long context" in family:
|
|
3837
3837
|
route_id = "glm_tiny"
|
|
3838
3838
|
route_type = "benchmarked_chat_route"
|
|
3839
|
-
action = "Select GLM 5.2
|
|
3839
|
+
action = "Select the best available local GLM 5.2 route in OpenWebUI for private long-context reasoning."
|
|
3840
3840
|
elif "shopping" in model_text:
|
|
3841
3841
|
route_id = "glm52_shopping_research_preset"
|
|
3842
3842
|
route_type = "chat_preset"
|
|
@@ -4462,9 +4462,9 @@ WORKFLOW_RECIPE_BLUEPRINTS = [
|
|
|
4462
4462
|
"id": "private-long-context-workflow",
|
|
4463
4463
|
"task_id": "private-long-context-reasoning",
|
|
4464
4464
|
"title": "Private long-context reasoning with GLM 5.2",
|
|
4465
|
-
"openwebui_entrypoint": "Model picker -> glm52-q4-local",
|
|
4465
|
+
"openwebui_entrypoint": "Model picker -> glm52-q8-local on enterprise rigs, otherwise glm52-q4-local",
|
|
4466
4466
|
"steps": [
|
|
4467
|
-
"Select glm52-
|
|
4467
|
+
"Select glm52-q8-local on enterprise hardware when maximum local quality matters; otherwise select glm52-q4-local.",
|
|
4468
4468
|
"Keep the prompt bounded when possible; use files/projects for reusable context.",
|
|
4469
4469
|
"Use fast local routes for quick follow-ups when GLM latency is not needed.",
|
|
4470
4470
|
],
|
|
@@ -5293,6 +5293,7 @@ def local_parity_dashboard() -> dict:
|
|
|
5293
5293
|
},
|
|
5294
5294
|
"urls": {
|
|
5295
5295
|
"openwebui": "http://127.0.0.1:8080",
|
|
5296
|
+
"glm52_q8_openai": "http://127.0.0.1:11446/v1",
|
|
5296
5297
|
"glm52_openai": "http://127.0.0.1:11441/v1",
|
|
5297
5298
|
"slopcode_openai": "http://127.0.0.1:11438/v1",
|
|
5298
5299
|
"deep_research_openai": "http://127.0.0.1:18041/v1",
|
|
@@ -5306,6 +5307,7 @@ def local_parity_dashboard() -> dict:
|
|
|
5306
5307
|
},
|
|
5307
5308
|
"primary_models": [
|
|
5308
5309
|
{"id": "local-chatgpt-auto", "route": "fast_router", "best_for": "default local ChatGPT-like routing"},
|
|
5310
|
+
{"id": "glm52-q8-local", "route": "glm_tiny", "context_tokens": 65536, "best_for": "enterprise 8-bit private long-context reasoning"},
|
|
5309
5311
|
{"id": "glm52-q4-local", "route": "glm_tiny", "context_tokens": 65536, "best_for": "private long-context reasoning"},
|
|
5310
5312
|
{
|
|
5311
5313
|
"id": "qwen3.6-35b-a3b:slopcode-cpu-64k",
|
|
@@ -6660,7 +6662,7 @@ def local_parity_audit_html() -> str:
|
|
|
6660
6662
|
{metric("Starter prompts", f"{starter_summary.get('ready_starter_prompts')}/{starter_summary.get('starter_prompts')}", "prompt-library items")}
|
|
6661
6663
|
{metric("Current release", f"{source_freshness_summary.get('current_release_covered_families')}/{source_freshness_summary.get('current_release_expected_families')}", "families")}
|
|
6662
6664
|
{metric("Release evidence", f"{source_freshness_summary.get('current_release_covered_evidence_terms')}/{source_freshness_summary.get('current_release_expected_evidence_terms')}", "terms")}
|
|
6663
|
-
{metric("Primary GLM route", "glm52-q4-local", "local long-context model")}
|
|
6665
|
+
{metric("Primary GLM route", "glm52-q8-local / glm52-q4-local", "local long-context model")}
|
|
6664
6666
|
{metric("Scope exclusions", f"{frontier_summary.get('excluded_from_local_goal_items')}/{frontier_summary.get('boundary_items')}", "hosted capabilities")}
|
|
6665
6667
|
{metric("Evidence artifacts", f"{evidence_summary.get('ready_artifacts')}/{evidence_summary.get('artifacts')}", "privacy-safe proof")}
|
|
6666
6668
|
{metric("Quality evals", scorecard_summary.get('quality_evals'), "executable")}
|
|
@@ -9653,7 +9655,7 @@ def local_parity_gap_report_html() -> str:
|
|
|
9653
9655
|
{metric("Quality evals", summary.get('quality_evals'), "executable catalog")}
|
|
9654
9656
|
{metric("Continuity", summary.get('continuity_status'), "fallback status")}
|
|
9655
9657
|
{metric("Sources", summary.get('source_entries'), "source snapshot")}
|
|
9656
|
-
{metric("Primary GLM route", "glm52-q4-local", "local long-context model")}
|
|
9658
|
+
{metric("Primary GLM route", "glm52-q8-local / glm52-q4-local", "local long-context model")}
|
|
9657
9659
|
{metric("GLM context", "65,536", "tokens")}
|
|
9658
9660
|
{metric("Scope exclusions", f"{frontier_summary.get('excluded_from_local_goal_items')}/{frontier_summary.get('boundary_items')}", "hosted capabilities")}
|
|
9659
9661
|
</section>
|
|
@@ -12797,7 +12799,7 @@ def local_model_route_recommendations() -> dict:
|
|
|
12797
12799
|
"glm_tiny": {
|
|
12798
12800
|
"title": "Private GLM 5.2 reasoning route",
|
|
12799
12801
|
"benchmark_suite": "glm_tiny",
|
|
12800
|
-
"default_model": "glm52-q4-local",
|
|
12802
|
+
"default_model": "glm52-q8-local or glm52-q4-local",
|
|
12801
12803
|
"target_tps": 0.1,
|
|
12802
12804
|
"best_for": [
|
|
12803
12805
|
"private long-context reasoning",
|