gpu-container 0.1.0__tar.gz → 0.1.2__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- {gpu_container-0.1.0 → gpu_container-0.1.2}/CHANGELOG.md +14 -2
- {gpu_container-0.1.0 → gpu_container-0.1.2}/PKG-INFO +1 -1
- {gpu_container-0.1.0 → gpu_container-0.1.2}/gpu_container/__init__.py +1 -1
- {gpu_container-0.1.0 → gpu_container-0.1.2}/gpu_container/__main__.py +4 -1
- gpu_container-0.1.2/npm/README.es.md +67 -0
- gpu_container-0.1.2/npm/README.fr.md +67 -0
- gpu_container-0.1.2/npm/README.hi.md +67 -0
- gpu_container-0.1.2/npm/README.it.md +67 -0
- gpu_container-0.1.2/npm/README.ja.md +67 -0
- gpu_container-0.1.2/npm/README.md +67 -0
- gpu_container-0.1.2/npm/README.pt-BR.md +67 -0
- gpu_container-0.1.2/npm/README.zh.md +67 -0
- {gpu_container-0.1.0 → gpu_container-0.1.2}/npm/bin/gpu-container.js +4 -4
- {gpu_container-0.1.0 → gpu_container-0.1.2}/npm/package.json +1 -1
- {gpu_container-0.1.0 → gpu_container-0.1.2}/pyproject.toml +1 -1
- {gpu_container-0.1.0 → gpu_container-0.1.2}/tests/test_dispatch.py +2 -1
- gpu_container-0.1.0/npm/README.md +0 -16
- {gpu_container-0.1.0 → gpu_container-0.1.2}/.dockerignore +0 -0
- {gpu_container-0.1.0 → gpu_container-0.1.2}/.github/workflows/ci.yml +0 -0
- {gpu_container-0.1.0 → gpu_container-0.1.2}/.github/workflows/pages.yml +0 -0
- {gpu_container-0.1.0 → gpu_container-0.1.2}/.github/workflows/release.yml +0 -0
- {gpu_container-0.1.0 → gpu_container-0.1.2}/.gitignore +0 -0
- {gpu_container-0.1.0 → gpu_container-0.1.2}/Dockerfile +0 -0
- {gpu_container-0.1.0 → gpu_container-0.1.2}/LICENSE +0 -0
- {gpu_container-0.1.0 → gpu_container-0.1.2}/README.es.md +0 -0
- {gpu_container-0.1.0 → gpu_container-0.1.2}/README.fr.md +0 -0
- {gpu_container-0.1.0 → gpu_container-0.1.2}/README.hi.md +0 -0
- {gpu_container-0.1.0 → gpu_container-0.1.2}/README.it.md +0 -0
- {gpu_container-0.1.0 → gpu_container-0.1.2}/README.ja.md +0 -0
- {gpu_container-0.1.0 → gpu_container-0.1.2}/README.md +0 -0
- {gpu_container-0.1.0 → gpu_container-0.1.2}/README.pt-BR.md +0 -0
- {gpu_container-0.1.0 → gpu_container-0.1.2}/README.zh.md +0 -0
- {gpu_container-0.1.0 → gpu_container-0.1.2}/RELEASE_ASSESSMENT.md +0 -0
- {gpu_container-0.1.0 → gpu_container-0.1.2}/SCORECARD.md +0 -0
- {gpu_container-0.1.0 → gpu_container-0.1.2}/SECURITY.md +0 -0
- {gpu_container-0.1.0 → gpu_container-0.1.2}/SHIP_GATE.md +0 -0
- {gpu_container-0.1.0 → gpu_container-0.1.2}/assets/logo.png +0 -0
- {gpu_container-0.1.0 → gpu_container-0.1.2}/docs/architecture.md +0 -0
- {gpu_container-0.1.0 → gpu_container-0.1.2}/docs/cli.md +0 -0
- {gpu_container-0.1.0 → gpu_container-0.1.2}/docs/constraints.md +0 -0
- {gpu_container-0.1.0 → gpu_container-0.1.2}/docs/decisions/0001-per-expert-cache-build-vs-upstream.md +0 -0
- {gpu_container-0.1.0 → gpu_container-0.1.2}/docs/derisk-concentration.md +0 -0
- {gpu_container-0.1.0 → gpu_container-0.1.2}/docs/feasibility.md +0 -0
- {gpu_container-0.1.0 → gpu_container-0.1.2}/docs/features.md +0 -0
- {gpu_container-0.1.0 → gpu_container-0.1.2}/docs/moe-lane-architecture.md +0 -0
- {gpu_container-0.1.0 → gpu_container-0.1.2}/docs/prior-art.md +0 -0
- {gpu_container-0.1.0 → gpu_container-0.1.2}/docs/quickstart.md +0 -0
- {gpu_container-0.1.0 → gpu_container-0.1.2}/gpu_container/errors.py +0 -0
- {gpu_container-0.1.0 → gpu_container-0.1.2}/gpu_container/planner/__init__.py +0 -0
- {gpu_container-0.1.0 → gpu_container-0.1.2}/gpu_container/planner/activation.py +0 -0
- {gpu_container-0.1.0 → gpu_container-0.1.2}/gpu_container/planner/calibration.py +0 -0
- {gpu_container-0.1.0 → gpu_container-0.1.2}/gpu_container/planner/calibration_seed.json +0 -0
- {gpu_container-0.1.0 → gpu_container-0.1.2}/gpu_container/planner/cli.py +0 -0
- {gpu_container-0.1.0 → gpu_container-0.1.2}/gpu_container/planner/concentration_cli.py +0 -0
- {gpu_container-0.1.0 → gpu_container-0.1.2}/gpu_container/planner/placement.py +0 -0
- {gpu_container-0.1.0 → gpu_container-0.1.2}/gpu_container/planner/receipt.py +0 -0
- {gpu_container-0.1.0 → gpu_container-0.1.2}/gpu_container/planner/receipt_cli.py +0 -0
- {gpu_container-0.1.0 → gpu_container-0.1.2}/gpu_container/profiler/__init__.py +0 -0
- {gpu_container-0.1.0 → gpu_container-0.1.2}/gpu_container/profiler/baseline.py +0 -0
- {gpu_container-0.1.0 → gpu_container-0.1.2}/gpu_container/profiler/cli.py +0 -0
- {gpu_container-0.1.0 → gpu_container-0.1.2}/gpu_container/profiler/cuda_bench.py +0 -0
- {gpu_container-0.1.0 → gpu_container-0.1.2}/gpu_container/profiler/hardware.py +0 -0
- {gpu_container-0.1.0 → gpu_container-0.1.2}/gpu_container/profiler/model.py +0 -0
- {gpu_container-0.1.0 → gpu_container-0.1.2}/gpu_container/profiler/nvme_bench.py +0 -0
- {gpu_container-0.1.0 → gpu_container-0.1.2}/gpu_container/profiler/schema.py +0 -0
- {gpu_container-0.1.0 → gpu_container-0.1.2}/gpu_container/watchdog.py +0 -0
- {gpu_container-0.1.0 → gpu_container-0.1.2}/npm/LICENSE +0 -0
- {gpu_container-0.1.0 → gpu_container-0.1.2}/scripts/gen_calibration_seed.py +0 -0
- {gpu_container-0.1.0 → gpu_container-0.1.2}/scripts/ingest_sweep.py +0 -0
- {gpu_container-0.1.0 → gpu_container-0.1.2}/scripts/verify.py +0 -0
- {gpu_container-0.1.0 → gpu_container-0.1.2}/site/astro.config.mjs +0 -0
- {gpu_container-0.1.0 → gpu_container-0.1.2}/site/package-lock.json +0 -0
- {gpu_container-0.1.0 → gpu_container-0.1.2}/site/package.json +0 -0
- {gpu_container-0.1.0 → gpu_container-0.1.2}/site/src/content/docs/handbook/cli.md +0 -0
- {gpu_container-0.1.0 → gpu_container-0.1.2}/site/src/content/docs/handbook/derisk.md +0 -0
- {gpu_container-0.1.0 → gpu_container-0.1.2}/site/src/content/docs/handbook/getting-started.md +0 -0
- {gpu_container-0.1.0 → gpu_container-0.1.2}/site/src/content/docs/handbook/index.md +0 -0
- {gpu_container-0.1.0 → gpu_container-0.1.2}/site/src/content/docs/handbook/moe-lane.md +0 -0
- {gpu_container-0.1.0 → gpu_container-0.1.2}/site/src/content/docs/handbook/reference.md +0 -0
- {gpu_container-0.1.0 → gpu_container-0.1.2}/site/src/content/docs/handbook/safety.md +0 -0
- {gpu_container-0.1.0 → gpu_container-0.1.2}/site/src/content.config.ts +0 -0
- {gpu_container-0.1.0 → gpu_container-0.1.2}/site/src/pages/index.astro +0 -0
- {gpu_container-0.1.0 → gpu_container-0.1.2}/site/src/site-config.ts +0 -0
- {gpu_container-0.1.0 → gpu_container-0.1.2}/site/src/styles/global.css +0 -0
- {gpu_container-0.1.0 → gpu_container-0.1.2}/site/src/styles/starlight-custom.css +0 -0
- {gpu_container-0.1.0 → gpu_container-0.1.2}/site/tsconfig.json +0 -0
- {gpu_container-0.1.0 → gpu_container-0.1.2}/tests/test_activation.py +0 -0
- {gpu_container-0.1.0 → gpu_container-0.1.2}/tests/test_calibration.py +0 -0
- {gpu_container-0.1.0 → gpu_container-0.1.2}/tests/test_concentration_cli.py +0 -0
- {gpu_container-0.1.0 → gpu_container-0.1.2}/tests/test_errors.py +0 -0
- {gpu_container-0.1.0 → gpu_container-0.1.2}/tests/test_measure.py +0 -0
- {gpu_container-0.1.0 → gpu_container-0.1.2}/tests/test_planner.py +0 -0
- {gpu_container-0.1.0 → gpu_container-0.1.2}/tests/test_profiler.py +0 -0
- {gpu_container-0.1.0 → gpu_container-0.1.2}/tests/test_receipt_trace.py +0 -0
- {gpu_container-0.1.0 → gpu_container-0.1.2}/tests/test_watchdog.py +0 -0
- {gpu_container-0.1.0 → gpu_container-0.1.2}/watchdog.example.json +0 -0
@@ -5,9 +5,21 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/),
 and this project adheres to [Semantic Versioning](https://semver.org/).

-## [
+## [0.1.2] - 2026-06-04

-
+### Changed
+- **npm package README** is now first-class — logo, badges, full content, and 8-language translations (it was a thin stub). No code change from 0.1.1; this release exists to publish the corrected npm page (npm only updates the page on a new version, and the launcher version must match a release that carries the binaries).
+
+## [0.1.1] - 2026-06-04
+
+First **complete** beta (all four channels: PyPI · npm · Docker/ghcr · GitHub Release binaries).
+
+### Fixed
+- Standalone PyInstaller binary + npm launcher: the unified `gpu-container` entry used a relative import (`from . import …`) that raises *"attempted relative import with no known parent package"* in a frozen binary (it works under `python -m`, the trap). Switched to an absolute import so `gpu-container <command>` runs from the binary. 0.1.0 shipped to PyPI + Docker, but the binary smoke test correctly gated out the binaries + npm launcher; 0.1.1 ships them.
+
+## [0.1.0] - 2026-06-04
+
+Initial public beta — PyPI + Docker (the binaries + npm launcher land in 0.1.1).

 ### Added
 - **Hardware + model profiler** (`gpu-container-profile`) — measured PCIe H2D/D2H, NVMe sequential + random-QD1, pinnable-RAM ceiling, CPU RAM bandwidth (all measured in-container, `None`-not-guess); closed-form model param-split (expert vs always-resident) and KV growth.
@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: gpu-container
-Version: 0.1.
+Version: 0.1.2
 Summary: Model-aware inference memory-placement planner for single-GPU rigs — profile, plan, prove.
 Author-email: mcp-tool-shop <64996768+mcp-tool-shop@users.noreply.github.com>
 License-Expression: MIT
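The PKG-INFO hunk above is one of several files that carry the same version string. PKG-INFO uses RFC 822-style headers, so the standard-library email parser can read the `Version` field. The following is a minimal sketch of how a release check might do that, not code from the package; the helper name `read_pkg_info_version` and the `SAMPLE` text are illustrative assumptions.

```python
from email.parser import Parser


def read_pkg_info_version(pkg_info_text: str) -> str:
    """Return the Version header from PKG-INFO-style metadata text.

    Hypothetical helper for illustration; not part of gpu-container.
    """
    # headersonly=True stops at the header block and ignores the long description.
    headers = Parser().parsestr(pkg_info_text, headersonly=True)
    version = headers["Version"]
    if version is None:
        raise ValueError("no Version header found")
    return version.strip()


# Sample metadata mirroring the fields visible in the hunk above.
SAMPLE = "Metadata-Version: 2.4\nName: gpu-container\nVersion: 0.1.2\n"

if __name__ == "__main__":
    print(read_pkg_info_version(SAMPLE))
```

A release script could compare this value against the versions in `pyproject.toml` and `npm/package.json` before publishing, since the changelog notes that the launcher version must match a release that carries the binaries.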
@@ -16,7 +16,10 @@ from __future__ import annotations
 import sys
 from typing import List, Optional

-from . import
+# Absolute (not `from . import`): this module is the PyInstaller --onefile entry, run as a top-level
+# `__main__` with no parent package — a relative import raises "attempted relative import with no
+# known parent package" in the frozen binary (it works under `python -m`, which is the trap).
+from gpu_container import __version__

 _SUB = {
     "profile": "gpu_container.profiler.cli",
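The behaviour described in the new comment can be reproduced without PyInstaller. The sketch below builds a throwaway package and runs its `__main__.py` two ways: as a top-level script (no parent package, which approximates the frozen entry point) and via `python -m`. The package name `demo_pkg` and the helper `run_entry` are hypothetical, and setting `PYTHONPATH` stands in for the assumption that the bundled package is importable by its absolute name.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path


def run_entry(entry_source: str) -> dict:
    """Run a throwaway package's __main__.py as a script and via `python -m`."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp) / "demo_pkg"  # hypothetical package name
        pkg.mkdir()
        (pkg / "__init__.py").write_text('__version__ = "0.0.0"\n')
        (pkg / "__main__.py").write_text(entry_source)
        # Assumption: the package is importable by name, as it would be in a bundle.
        env = dict(os.environ, PYTHONPATH=tmp)
        # Top-level script: __package__ is empty, so relative imports cannot resolve.
        as_script = subprocess.run(
            [sys.executable, str(pkg / "__main__.py")],
            cwd=tmp, env=env, capture_output=True, text=True,
        )
        # `python -m` sets the parent package, so relative imports resolve.
        as_module = subprocess.run(
            [sys.executable, "-m", "demo_pkg"],
            cwd=tmp, env=env, capture_output=True, text=True,
        )
        return {"script": as_script, "module": as_module}


RELATIVE = "from . import __version__\nprint(__version__)\n"
ABSOLUTE = "from demo_pkg import __version__\nprint(__version__)\n"

if __name__ == "__main__":
    for label, source in (("relative", RELATIVE), ("absolute", ABSOLUTE)):
        result = run_entry(source)
        print(label, result["script"].returncode, result["module"].returncode)
```

The relative form fails only in the script run, which is why a test suite that exercises `python -m` alone would not catch it; the absolute form works in both.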
@@ -0,0 +1,67 @@
+<p align="center">
+<a href="README.ja.md">日本語</a> | <a href="README.zh.md">中文</a> | <a href="README.md">English</a> | <a href="README.fr.md">Français</a> | <a href="README.hi.md">हिन्दी</a> | <a href="README.it.md">Italiano</a> | <a href="README.pt-BR.md">Português (BR)</a>
+</p>
+
+<div align="center">
+
+<img src="https://raw.githubusercontent.com/mcp-tool-shop-org/gpu-container/main/assets/logo.png" width="400" alt="gpu-container" />
+
+[](https://github.com/mcp-tool-shop-org/gpu-container/actions/workflows/ci.yml)
+[](https://pypi.org/project/gpu-container/)
+[](https://www.npmjs.com/package/gpu-container)
+[](https://github.com/mcp-tool-shop-org/gpu-container/blob/main/LICENSE)
+[](https://mcp-tool-shop-org.github.io/gpu-container/)
+
+**Un contenedor habilitado para GPU expone el dispositivo. Un entorno de ejecución consciente del modelo decide qué se almacena en la VRAM, la RAM asignada y la NVMe.**
+
+</div>
+
+Ejecute el modelo local más grande y útil que su máquina pueda soportar de manera realista, con planes de ubicación explícitos, resultados de pruebas comparativas y rechazo cuando el plan cause problemas. Este paquete npm es un **programa de inicio sin requisitos previos**: `npx gpu-container` descarga el binario de la plataforma desde [GitHub Release](https://github.com/mcp-tool-shop-org/gpu-container/releases), verifica su SHA256 con las sumas de comprobación publicadas, lo almacena en caché y lo ejecuta. **No se requiere Python.**
+
+```bash
+npx gpu-container --help
+npx gpu-container plan --profile profile.json --model-config qwen3.json --quant gguf-q4_k_m
+```
+
+> ¿Prefiere Python? `pip install "gpu-container[host]"` instala directamente los cinco comandos `gpu-container-*`.
+
+## Por qué existe
+
+En Windows/WSL2, la sobreasignación de memoria unificada de CUDA **no está disponible** (confirmado por NVIDIA) y no es la herramienta adecuada para la decodificación, incluso en Linux. Por lo tanto, `gpu-container` no depende de la magia del entorno de ejecución; en su lugar, hace que la **ubicación explícita y declarada** sea el producto. Esa es la ventaja competitiva.
+
+## Qué hace
+
+`gpu-container <command>` es un conjunto de cinco herramientas en un solo binario:
+
+| Comando | Hace |
+|---|---|
+| `profile` | Mide el hardware (VRAM, PCIe, NVMe, RAM asignable, ancho de banda de la CPU) + el modelo |
+| `plan` | Calcula la ubicación explícita en VRAM/RAM/NVMe + una previsión de rendimiento calibrada; **acepta o rechaza** |
+| `receipt` | Verifica un plan con una ejecución real de `llama-bench`; escribe un punto de calibración |
+| `concentration` | Reduce el riesgo de la caché por experto: mide la concentración del enrutamiento antes de construir para ello |
+| `watchdog` | Supervisa un trabajo de GPU; interrumpe si se supera el límite de memoria del host/potencia/VRAM |
+
+- **Niveles de expertos MoE** (principal) — capas compartidas/de atención en VRAM, expertos en la RAM de la CPU a través de llama.cpp `--n-cpu-moe`. Probado en vivo en Qwen3-30B-A3B.
+- **Resultados medidos** — una ejecución real verifica la previsión con un *límite máximo* y una *banda* calibrada; el resultado mejora el siguiente plan.
+- **Rechazo honesto** — si ningún plan supera los >1 tok/s, lo rechaza y explica por qué.
+- **Supervisión de seguridad del hardware** — nació de un incidente real; supervisa cualquier trabajo de GPU para que un plan incorrecto no pueda inutilizar la máquina.
+
+## Ejecute un trabajo de GPU de forma segura
+
+```bash
+gpu-container watchdog run --on-breach kill-job --peaks-out peaks.json -- \
+docker run --rm --gpus all -v "E:/AI-Models:/models" ghcr.io/ggml-org/llama.cpp:full-cuda \
+llama-bench -m /models/model.gguf --n-cpu-moe 0 -o json > bench.json
+```
+
+## Documentación
+
+- **Guía de inicio rápido + manual:** https://mcp-tool-shop-org.github.io/gpu-container/handbook/
+- **Código fuente + documentación completa:** https://github.com/mcp-tool-shop-org/gpu-container
+- **Privacidad y seguridad:** local, sin conexión, sin telemetría, sin salida de red. [SECURITY.md](https://github.com/mcp-tool-shop-org/gpu-container/blob/main/SECURITY.md)
+
+<div align="center">
+
+Creado por <a href="https://mcp-tool-shop.github.io/">MCP Tool Shop</a> · Licencia MIT
+
+</div>
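The README added above says the npm launcher downloads the platform binary, verifies its SHA256 against the published checksums, caches it, and runs it. The launcher itself is JavaScript (`npm/bin/gpu-container.js`) and its source is not shown in this diff, so the following is only a Python sketch of the verification step. The `sha256sum`-style checksum file layout and the function names are assumptions, not the project's actual format or API.

```python
import hashlib
from pathlib import Path
from typing import Dict


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large binaries need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_checksums(text: str) -> Dict[str, str]:
    """Parse `<hex digest>  <filename>` lines (assumed sha256sum-style layout)."""
    table: Dict[str, str] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            digest, name = parts
            # sha256sum marks binary-mode entries with a leading '*'.
            table[name.lstrip("*")] = digest.lower()
    return table


def verify_download(path: Path, checksums_text: str) -> bool:
    """Return True only if the file is listed and its digest matches."""
    expected = parse_checksums(checksums_text).get(path.name)
    return expected is not None and expected == sha256_of(path)
```

A launcher following this pattern would refuse to cache or execute the binary when `verify_download` returns `False`, including the case where the file is missing from the checksum list.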
@@ -0,0 +1,67 @@
+<p align="center">
+<a href="README.ja.md">日本語</a> | <a href="README.zh.md">中文</a> | <a href="README.es.md">Español</a> | <a href="README.md">English</a> | <a href="README.hi.md">हिन्दी</a> | <a href="README.it.md">Italiano</a> | <a href="README.pt-BR.md">Português (BR)</a>
+</p>
+
+<div align="center">
+
+<img src="https://raw.githubusercontent.com/mcp-tool-shop-org/gpu-container/main/assets/logo.png" width="400" alt="gpu-container" />
+
+[](https://github.com/mcp-tool-shop-org/gpu-container/actions/workflows/ci.yml)
+[](https://pypi.org/project/gpu-container/)
+[](https://www.npmjs.com/package/gpu-container)
+[](https://github.com/mcp-tool-shop-org/gpu-container/blob/main/LICENSE)
+[](https://mcp-tool-shop-org.github.io/gpu-container/)
+
+**Un conteneur compatible GPU expose le périphérique. Un environnement d’exécution prenant en compte le modèle détermine ce qui doit être placé dans la VRAM, la RAM allouée et le NVMe.**
+
+</div>
+
+Exécutez le modèle local le plus volumineux et le plus utile que votre machine puisse réellement prendre en charge, avec des plans de placement explicites, des résultats de tests de performance et un refus si le plan risque de provoquer des problèmes. Ce paquet npm est un **lanceur sans prérequis** : `npx gpu-container` télécharge le binaire de la plateforme à partir de [GitHub Release](https://github.com/mcp-tool-shop-org/gpu-container/releases), vérifie son SHA256 par rapport aux sommes de contrôle publiées, le met en cache et l’exécute. **Aucun Python requis.**
+
+```bash
+npx gpu-container --help
+npx gpu-container plan --profile profile.json --model-config qwen3.json --quant gguf-q4_k_m
+```
+
+> Vous préférez Python ? `pip install "gpu-container[host]"` installe directement les cinq commandes `gpu-container-*`.
+
+## Pourquoi il existe
+
+Sur Windows/WSL2, la surallocation de mémoire unifiée CUDA est **indisponible** (confirmé par NVIDIA) et n’est pas l’outil approprié pour le décodage, même sous Linux. Ainsi, `gpu-container` ne repose pas sur une magie d’exécution, mais rend le **placement explicite et déclaré** le produit. C’est sa force.
+
+## Ce qu’il fait
+
+`gpu-container <commande>` est un ensemble de cinq outils dans un seul binaire :
+
+| Commande | Fait |
+|---|---|
+| `profile` | Mesure la configuration (VRAM, PCIe, NVMe, RAM allouée, bande passante du CPU) + le modèle |
+| `plan` | Calcule le placement explicite dans la VRAM/RAM/NVMe + une prévision de débit calibrée ; **accepte ou refuse** |
+| `receipt` | Vérifie un plan par rapport à une exécution réelle de `llama-bench` ; enregistre un point de calibration |
+| `concentration` | Réduit les risques liés au cache par expert : mesure la concentration du routage avant de construire pour cela |
+| `watchdog` | Supervise un travail GPU ; interrompt en cas de dépassement de la mémoire hôte/de la puissance/de la VRAM |
+
+- **Hiérarchisation des experts MoE** (principal) : couches partagées/d’attention dans la VRAM, experts dans la RAM du CPU via llama.cpp `--n-cpu-moe`. Déjà testé en direct sur Qwen3-30B-A3B.
+- **Résultats mesurés** : une exécution réelle vérifie la prévision par rapport à une limite *théorique* et une bande passante *calibrée* ; les résultats affinent le plan suivant.
+- **Refus honnête** : si aucun plan ne permet d’atteindre > 1 tok/s, il refuse et explique pourquoi.
+- **Surveillance de la sécurité de la configuration** : né d’un incident réel ; supervise tout travail GPU afin qu’un mauvais plan ne puisse pas entraîner l’arrêt de la machine.
+
+## Exécutez un travail GPU en toute sécurité
+
+```bash
+gpu-container watchdog run --on-breach kill-job --peaks-out peaks.json -- \
+docker run --rm --gpus all -v "E:/AI-Models:/models" ghcr.io/ggml-org/llama.cpp:full-cuda \
+llama-bench -m /models/model.gguf --n-cpu-moe 0 -o json > bench.json
+```
+
+## Documentation
+
+- **Guide de démarrage rapide + manuel :** https://mcp-tool-shop-org.github.io/gpu-container/handbook/
+- **Code source + documentation complète :** https://github.com/mcp-tool-shop-org/gpu-container
+- **Confidentialité et sécurité :** local, hors ligne, pas de télémétrie, pas de transfert de données sur le réseau. [SECURITY.md](https://github.com/mcp-tool-shop-org/gpu-container/blob/main/SECURITY.md)
+
+<div align="center">
+
+Créé par <a href="https://mcp-tool-shop.github.io/">MCP Tool Shop</a> · Licence MIT
+
+</div>
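The README added above describes `plan` as accepting or refusing a placement, with an "honest refusal" when no plan clears more than 1 tok/s. The planner's real data model is not part of this diff, so the sketch below only illustrates that accept-or-refuse rule. The `Candidate` and `Decision` types, the function name, and the wording of the reasons are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

MIN_TOK_PER_S = 1.0  # floor quoted in the README text; everything else is assumed


@dataclass
class Candidate:
    name: str
    forecast_tok_per_s: float


@dataclass
class Decision:
    shipped: Optional[Candidate]
    reason: str


def ship_or_refuse(candidates: List[Candidate], floor: float = MIN_TOK_PER_S) -> Decision:
    """Pick the fastest forecast above the floor, or refuse with an explanation."""
    viable = [c for c in candidates if c.forecast_tok_per_s > floor]
    if not viable:
        if not candidates:
            return Decision(None, "refused: no candidate plans were produced")
        best = max(candidates, key=lambda c: c.forecast_tok_per_s)
        return Decision(
            None,
            f"refused: best forecast {best.forecast_tok_per_s:.2f} tok/s "
            f"({best.name}) does not exceed {floor:.2f} tok/s",
        )
    best = max(viable, key=lambda c: c.forecast_tok_per_s)
    return Decision(best, f"shipped: {best.name} at {best.forecast_tok_per_s:.2f} tok/s")
```

Returning the reason alongside the decision mirrors the README's point that a refusal should explain itself rather than fail silently.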
@@ -0,0 +1,67 @@
+<p align="center">
+<a href="README.ja.md">日本語</a> | <a href="README.zh.md">中文</a> | <a href="README.es.md">Español</a> | <a href="README.fr.md">Français</a> | <a href="README.md">English</a> | <a href="README.it.md">Italiano</a> | <a href="README.pt-BR.md">Português (BR)</a>
+</p>
+
+<div align="center">
+
+<img src="https://raw.githubusercontent.com/mcp-tool-shop-org/gpu-container/main/assets/logo.png" width="400" alt="gpu-container" />
+
+[](https://github.com/mcp-tool-shop-org/gpu-container/actions/workflows/ci.yml)
+[](https://pypi.org/project/gpu-container/)
+[](https://www.npmjs.com/package/gpu-container)
+[](https://github.com/mcp-tool-shop-org/gpu-container/blob/main/LICENSE)
+[](https://mcp-tool-shop-org.github.io/gpu-container/)
+
+**एक जीपीयू-सक्षम कंटेनर डिवाइस को उजागर करता है। एक मॉडल-जागरूक रनटाइम यह तय करता है कि वीआरएएम, पिन की गई रैम और एनवीएमई में क्या होगा।**
+
+</div>
+
+अपने मशीन द्वारा समर्थित सबसे बड़े उपयोगी स्थानीय मॉडल को चलाएं - स्पष्ट प्लेसमेंट योजनाओं, बेंचमार्क परिणामों और उस स्थिति में इनकार के साथ जब योजना विफल हो जाए। यह एनपीएम पैकेज एक **शून्य-आवश्यकता वाला लॉन्चर** है: `npx gpu-container` प्लेटफ़ॉर्म बाइनरी को [गिटहब रिलीज़](https://github.com/mcp-tool-shop-org/gpu-container/releases) से डाउनलोड करता है, प्रकाशित चेकसम के विरुद्ध इसके SHA256 को सत्यापित करता है, इसे कैश करता है, और इसे चलाता है। **पायथन की आवश्यकता नहीं है।**
+
+```bash
+npx gpu-container --help
+npx gpu-container plan --profile profile.json --model-config qwen3.json --quant gguf-q4_k_m
+```
+
+> क्या आप पायथन पसंद करते हैं? `pip install "gpu-container[host]"` सीधे पांच `gpu-container-*` कमांड स्थापित करता है।
+
+## यह क्यों मौजूद है
+
+विंडोज/डब्ल्यूएसएल2 पर, क्यूडीए यूनिफाइड-मेमोरी ओवरसब्सक्रिप्शन **उपलब्ध नहीं** है (एनवीडिया द्वारा पुष्टि की गई) और लिनक्स पर भी डिकोड के लिए गलत उपकरण है। इसलिए `gpu-container` रनटाइम ओवरफ्लो जादू पर निर्भर नहीं करता है - यह **स्पष्ट, घोषित प्लेसमेंट** को उत्पाद बनाता है। यही इसकी ताकत है।
+
+## यह क्या करता है
+
+`gpu-container <कमांड>` एक बाइनरी में पांच उपकरण हैं:
+
+| कमांड | यह करता है |
+|---|---|
+| `profile` | मशीन (वीआरएएम, पीसीआईई, एनवीएमई, पिन करने योग्य रैम, सीपीयू बैंडविड्थ) + मॉडल को मापता है |
+| `plan` | स्पष्ट वीआरएएम/रैम/एनवीएमई प्लेसमेंट + एक कैलिब्रेटेड थ्रूपुट पूर्वानुमान की गणना करता है; **शिप या इनकार** |
+| `receipt` | वास्तविक `llama-bench` रन के विरुद्ध एक योजना को सत्यापित करता है; एक कैलिब्रेशन बिंदु वापस लिखता है |
+| `concentration` | प्रति-विशेषज्ञ कैश को जोखिम से बचाता है - इसके लिए निर्माण करने से पहले रूटिंग एकाग्रता को मापता है |
+| `watchdog` | एक जीपीयू नौकरी की निगरानी करता है; होस्ट-मेमोरी / पावर / वीआरएएम उल्लंघन पर रद्द करता है |
+
+- **एमओई विशेषज्ञ टियरिंग** (प्रमुख) - वीआरएएम में साझा/अटेंशन परतें, सीपीयू रैम में विशेषज्ञ `llama.cpp --n-cpu-moe` के माध्यम से। Qwen3-30B-A3B पर लाइव साबित।
+- **मापे गए परिणाम** - एक वास्तविक रन छत *सीलिंग* और एक कैलिब्रेटेड *बैंड* के विरुद्ध पूर्वानुमान को सत्यापित करता है; परिणाम अगली योजना को बेहतर बनाता है।
+- **ईमानदार इनकार** - क्या कोई योजना >1 टोकन/सेकंड से अधिक नहीं है? यह इनकार कर देता है, और बताता है कि क्यों।
+- **मशीन-सुरक्षा वॉचडॉग** - एक वास्तविक घटना से उत्पन्न; किसी भी जीपीयू नौकरी की निगरानी करें ताकि एक खराब योजना मशीन को बंद न कर सके।
+
+## सुरक्षित रूप से एक जीपीयू नौकरी चलाएं
+
+```bash
+gpu-container watchdog run --on-breach kill-job --peaks-out peaks.json -- \
+docker run --rm --gpus all -v "E:/AI-Models:/models" ghcr.io/ggml-org/llama.cpp:full-cuda \
+llama-bench -m /models/model.gguf --n-cpu-moe 0 -o json > bench.json
+```
+
+## दस्तावेज़
+
+- **क्विकस्टार्ट + हैंडबुक:** https://mcp-tool-shop-org.github.io/gpu-container/handbook/
+- **स्रोत + पूर्ण दस्तावेज़:** https://github.com/mcp-tool-shop-org/gpu-container
+- **गोपनीयता और सुरक्षा:** स्थानीय, ऑफ़लाइन, कोई टेलीमेट्री नहीं, कोई नेटवर्क आउटगोइंग नहीं। [SECURITY.md](https://github.com/mcp-tool-shop-org/gpu-container/blob/main/SECURITY.md)
+
+<div align="center">
+
+<a href="https://mcp-tool-shop.github.io/">एमसीपी टूल शॉप</a> द्वारा निर्मित · एमआईटी लाइसेंस प्राप्त
+
+</div>
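The README added above describes `watchdog` as supervising a GPU job and aborting it on a host-memory, power, or VRAM breach, with peaks written out via `--peaks-out`. The real implementation (`gpu_container/watchdog.py`) is unchanged in this diff and not shown, so this is only a rough sketch of the supervise-and-kill pattern. The metric names, the `sampler` callback, and both function names are assumptions; a real sampler would query the GPU and the host rather than return fixed values.

```python
import subprocess
import sys
import time
from typing import Callable, Dict, List

Sample = Dict[str, float]


def find_breaches(sample: Sample, limits: Sample) -> List[str]:
    """Return the names of metrics whose sampled value exceeds its limit."""
    return [name for name, limit in limits.items() if sample.get(name, 0.0) > limit]


def supervise(cmd: List[str], sampler: Callable[[], Sample], limits: Sample,
              interval_s: float = 0.05) -> dict:
    """Run cmd, poll the sampler, and kill the job on the first breach."""
    peaks: Sample = {}
    breached: List[str] = []
    proc = subprocess.Popen(cmd)
    while proc.poll() is None:
        sample = sampler()
        for name, value in sample.items():
            peaks[name] = max(peaks.get(name, value), value)  # track observed peaks
        breached = find_breaches(sample, limits)
        if breached:
            proc.kill()  # roughly what a kill-job policy implies (assumed)
            proc.wait()
            break
        time.sleep(interval_s)
    return {"returncode": proc.returncode, "breached": breached, "peaks": peaks}


if __name__ == "__main__":
    # Hypothetical job and sampler: the job sleeps; the sampler reports a fixed VRAM reading.
    job = [sys.executable, "-c", "import time; time.sleep(30)"]
    print(supervise(job, lambda: {"vram_mib": 9000.0}, {"vram_mib": 8000.0}))
```

Keeping the breach check as a pure function makes the policy testable without launching a real GPU workload.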
@@ -0,0 +1,67 @@
+<p align="center">
+<a href="README.ja.md">日本語</a> | <a href="README.zh.md">中文</a> | <a href="README.es.md">Español</a> | <a href="README.fr.md">Français</a> | <a href="README.hi.md">हिन्दी</a> | <a href="README.md">English</a> | <a href="README.pt-BR.md">Português (BR)</a>
+</p>
+
+<div align="center">
+
+<img src="https://raw.githubusercontent.com/mcp-tool-shop-org/gpu-container/main/assets/logo.png" width="400" alt="gpu-container" />
+
+[](https://github.com/mcp-tool-shop-org/gpu-container/actions/workflows/ci.yml)
+[](https://pypi.org/project/gpu-container/)
+[](https://www.npmjs.com/package/gpu-container)
+[](https://github.com/mcp-tool-shop-org/gpu-container/blob/main/LICENSE)
+[](https://mcp-tool-shop-org.github.io/gpu-container/)
+
+**Un container abilitato per GPU espone il dispositivo. Un runtime consapevole del modello decide cosa deve essere allocato nella VRAM, nella RAM allocata e nella NVMe.**
+
+</div>
+
+Esegui il modello locale più grande e utile che la tua macchina può effettivamente supportare, con piani di allocazione espliciti, risultati dei benchmark e rifiuto nel caso in cui il piano causerebbe problemi. Questo pacchetto npm è un **launcher senza prerequisiti**: `npx gpu-container` scarica il binario della piattaforma da [GitHub Release](https://github.com/mcp-tool-shop-org/gpu-container/releases), verifica il suo SHA256 rispetto alle checksum pubblicate, lo memorizza nella cache e lo esegue. **Non è necessario Python.**
+
+```bash
+npx gpu-container --help
+npx gpu-container plan --profile profile.json --model-config qwen3.json --quant gguf-q4_k_m
+```
+
+> Preferisci Python? `pip install "gpu-container[host]"` installa direttamente i cinque comandi `gpu-container-*`.
+
+## Perché esiste
+
+Su Windows/WSL2, l'oversubscription di CUDA Unified-Memory **non è disponibile** (confermato da NVIDIA) e non è lo strumento giusto per la decodifica, nemmeno su Linux. Quindi, `gpu-container` non si basa su una "magia" di overflow in fase di esecuzione, ma rende **esplicita e dichiarata l'allocazione** come elemento centrale. Questo è il vantaggio competitivo.
+
+## Cosa fa
+
+`gpu-container <command>` è costituito da cinque strumenti in un unico binario:
+
+| Comando | Fa |
+|---|---|
+| `profile` | Misura le risorse (VRAM, PCIe, NVMe, RAM allocabile, larghezza di banda della CPU) + il modello |
+| `plan` | Calcola l'allocazione esplicita in VRAM/RAM/NVMe + una previsione di throughput calibrata; **accetta o rifiuta** |
+| `receipt` | Verifica un piano rispetto a un'esecuzione reale di `llama-bench`; scrive un punto di calibrazione |
+| `concentration` | Riduce il rischio della cache per ogni esperto: misura la concentrazione del routing prima di procedere alla sua creazione |
+| `watchdog` | Supervisiona un lavoro della GPU; interrompe in caso di superamento dei limiti di memoria host, potenza o VRAM |
+
+- **Tiering degli esperti MoE** (funzionalità principale): livelli condivisi/di attenzione in VRAM, esperti nella RAM della CPU tramite llama.cpp `--n-cpu-moe`. Testato in diretta su Qwen3-30B-A3B.
+- **Risultati misurati**: un'esecuzione reale verifica la previsione rispetto a un limite massimo e a una banda calibrata; i risultati affinano il piano successivo.
+- **Rifiuto onesto**: se nessun piano supera 1 tok/s, viene rifiutato e viene spiegato il motivo.
+- **Watchdog per la sicurezza del sistema**: nato da un incidente reale; supervisiona qualsiasi lavoro della GPU in modo che un piano errato non possa compromettere il sistema.
+
+## Esegui un lavoro della GPU in modo sicuro
+
+```bash
+gpu-container watchdog run --on-breach kill-job --peaks-out peaks.json -- \
+docker run --rm --gpus all -v "E:/AI-Models:/models" ghcr.io/ggml-org/llama.cpp:full-cuda \
+llama-bench -m /models/model.gguf --n-cpu-moe 0 -o json > bench.json
+```
+
+## Documentazione
+
+- **Guida rapida + manuale**: https://mcp-tool-shop-org.github.io/gpu-container/handbook/
+- **Codice sorgente + documentazione completa**: https://github.com/mcp-tool-shop-org/gpu-container
+- **Privacy e sicurezza**: locale, offline, senza telemetria, nessun trasferimento di dati in rete. [SECURITY.md](https://github.com/mcp-tool-shop-org/gpu-container/blob/main/SECURITY.md)
+
+<div align="center">
+
+Creato da <a href="https://mcp-tool-shop.github.io/">MCP Tool Shop</a> · Licenza MIT
+
+</div>
@@ -0,0 +1,67 @@
+<p align="center">
+<a href="README.md">English</a> | <a href="README.zh.md">中文</a> | <a href="README.es.md">Español</a> | <a href="README.fr.md">Français</a> | <a href="README.hi.md">हिन्दी</a> | <a href="README.it.md">Italiano</a> | <a href="README.pt-BR.md">Português (BR)</a>
+</p>
+
+<div align="center">
+
+<img src="https://raw.githubusercontent.com/mcp-tool-shop-org/gpu-container/main/assets/logo.png" width="400" alt="gpu-container" />
+
+[](https://github.com/mcp-tool-shop-org/gpu-container/actions/workflows/ci.yml)
+[](https://pypi.org/project/gpu-container/)
+[](https://www.npmjs.com/package/gpu-container)
+[](https://github.com/mcp-tool-shop-org/gpu-container/blob/main/LICENSE)
+[](https://mcp-tool-shop-org.github.io/gpu-container/)
+
+**GPU を搭載したコンテナは、デバイスを公開します。モデルを認識するランタイムは、VRAM、固定 RAM、および NVMe に何を配置するかを決定します。**
+
+</div>
+
+マシンが実際にサポートできる最大の有用なローカルモデルを、明示的な配置計画、ベンチマークの結果、および計画が過剰な負荷をかける場合に拒否する機能とともに実行します。この npm パッケージは、**前提条件がゼロのランチャー**です。`npx gpu-container` は、[GitHub リリース](https://github.com/mcp-tool-shop-org/gpu-container/releases) からプラットフォームバイナリをダウンロードし、公開されているチェックサムに対して SHA256 を検証し、キャッシュし、実行します。**Python は不要です。**
+
+```bash
+npx gpu-container --help
+npx gpu-container plan --profile profile.json --model-config qwen3.json --quant gguf-q4_k_m
+```
+
+> Python を使用したいですか? `pip install "gpu-container[host]"` を使用すると、5 つの `gpu-container-*` コマンドが直接インストールされます。
+
+## このツールの存在意義
+
+Windows/WSL2 では、CUDA Unified-Memory の過剰な割り当ては**利用できません**(NVIDIA が確認済み)であり、Linux でもデコードには適切なツールではありません。したがって、`gpu-container` はランタイムの動的なオーバーフローに依存するのではなく、**明示的で宣言的な配置**を製品の核とします。それがこのツールの強みです。
+
+## このツールの機能
+
+`gpu-container <command>` は、1 つのバイナリに 5 つのツールをまとめたものです。
+
+| コマンド | 実行内容 |
+|---|---|
+| `profile` | マシン(VRAM、PCIe、NVMe、固定 RAM、CPU 帯域幅)とモデルを測定します。 |
+| `plan` | 明示的な VRAM/RAM/NVMe 配置と、キャリブレーションされたスループット予測を計算します。**配置を承認または拒否**します。 |
+| `receipt` | 実際の `llama-bench` 実行に対して計画を検証し、キャリブレーションポイントを記録します。 |
+| `concentration` | 各専門家キャッシュのリスクを軽減します。それに向けてビルドする前に、ルーティングの集中度を測定します。 |
+| `watchdog` | GPU ジョブを監視し、ホストメモリ、電力、または VRAM の制限を超えた場合にジョブを中止します。 |
+
+- **MoE 専門家階層化**(主要機能)—共有/アテンション層を VRAM に、専門家を CPU RAM に配置します(llama.cpp の `--n-cpu-moe` オプションを使用)。Qwen3-30B-A3B で実証済み。
+- **測定された結果**—実際の実行で、予測を屋根の線(*ceiling*)とキャリブレーションされた帯域(*band*)に対して検証します。結果は、次の計画を改善するために使用されます。
+- **正直な拒否**—1 秒あたり 1 トークン以上の処理ができない計画の場合、拒否し、その理由を説明します。
+- **マシン保護ウォッチドッグ**—実際のインシデントから生まれました。すべての GPU ジョブを監視し、不適切な計画によってマシンが停止しないようにします。
+
+## GPU ジョブを安全に実行します
+
+```bash
+gpu-container watchdog run --on-breach kill-job --peaks-out peaks.json -- \
+docker run --rm --gpus all -v "E:/AI-Models:/models" ghcr.io/ggml-org/llama.cpp:full-cuda \
+llama-bench -m /models/model.gguf --n-cpu-moe 0 -o json > bench.json
+```
+
+## ドキュメント
+
+- **クイックスタート + ハンドブック:** https://mcp-tool-shop-org.github.io/gpu-container/handbook/
+- **ソースコード + 完全なドキュメント:** https://github.com/mcp-tool-shop-org/gpu-container
+- **プライバシーと安全性:** ローカル、オフライン、テレメトリなし、ネットワークへのデータ送信なし。[SECURITY.md](https://github.com/mcp-tool-shop-org/gpu-container/blob/main/SECURITY.md)
+
+<div align="center">
+
+<a href="https://mcp-tool-shop.github.io/">MCP Tool Shop</a> によって作成されました。MIT ライセンス。
+
+</div>
@@ -0,0 +1,67 @@
+<p align="center">
+<a href="README.ja.md">日本語</a> | <a href="README.zh.md">中文</a> | <a href="README.es.md">Español</a> | <a href="README.fr.md">Français</a> | <a href="README.hi.md">हिन्दी</a> | <a href="README.it.md">Italiano</a> | <a href="README.pt-BR.md">Português (BR)</a>
+</p>
+
+<div align="center">
+
+<img src="https://raw.githubusercontent.com/mcp-tool-shop-org/gpu-container/main/assets/logo.png" width="400" alt="gpu-container" />
+
+[](https://github.com/mcp-tool-shop-org/gpu-container/actions/workflows/ci.yml)
+[](https://pypi.org/project/gpu-container/)
+[](https://www.npmjs.com/package/gpu-container)
+[](https://github.com/mcp-tool-shop-org/gpu-container/blob/main/LICENSE)
+[](https://mcp-tool-shop-org.github.io/gpu-container/)
+
+**A GPU-enabled container exposes the device. A model-aware runtime decides what lives in VRAM, pinned RAM, and NVMe.**
+
+</div>
+
+Run the largest useful local model your machine can honestly support — with explicit placement plans, benchmark receipts, and refusal when the plan would thrash. This npm package is a **zero-prerequisite launcher**: `npx gpu-container` downloads the platform binary from the [GitHub Release](https://github.com/mcp-tool-shop-org/gpu-container/releases), verifies its SHA256 against the published checksums, caches it, and runs it. **No Python required.**
+
+```bash
+npx gpu-container --help
+npx gpu-container plan --profile profile.json --model-config qwen3.json --quant gguf-q4_k_m
+```
+
+> Prefer Python? `pip install "gpu-container[host]"` installs the five `gpu-container-*` commands directly.
+
+## Why it exists
+
+On Windows/WSL2, CUDA Unified-Memory oversubscription is **unavailable** (NVIDIA-confirmed) and the wrong tool for decode even on Linux. So `gpu-container` doesn't rely on runtime overflow magic — it makes **explicit, declared placement** the product. That's the moat.
+
+## What it does
+
+`gpu-container <command>` is five tools in one binary:
+
+| Command | Does |
+|---|---|
+| `profile` | Measure the rig (VRAM, PCIe, NVMe, pinnable RAM, CPU bandwidth) + the model |
+| `plan` | Compute explicit VRAM/RAM/NVMe placement + a calibrated throughput forecast; **ship or refuse** |
+| `receipt` | Verify a plan against a real `llama-bench` run; write a calibration point back |
+| `concentration` | De-risk the per-expert cache — measure routing concentration before building for it |
+| `watchdog` | Supervise a GPU job; abort on a host-memory / power / VRAM breach |
+
+- **MoE expert tiering** (flagship) — shared/attention layers in VRAM, experts in CPU RAM via llama.cpp `--n-cpu-moe`. Proven live on Qwen3-30B-A3B.
+- **Measured receipts** — a real run verifies the forecast against a roofline *ceiling* and a calibrated *band*; the receipt sharpens the next plan.
+- **Honest refusal** — no plan clears >1 tok/s? It refuses, and explains why.
+- **Rig-safety watchdog** — born from a real incident; supervise any GPU job so a bad plan can't take the machine down.
+
+## Run a GPU job safely
+
+```bash
+gpu-container watchdog run --on-breach kill-job --peaks-out peaks.json -- \
+docker run --rm --gpus all -v "E:/AI-Models:/models" ghcr.io/ggml-org/llama.cpp:full-cuda \
+llama-bench -m /models/model.gguf --n-cpu-moe 0 -o json > bench.json
+```
+
+## Docs
+
+- **Quickstart + handbook:** https://mcp-tool-shop-org.github.io/gpu-container/handbook/
+- **Source + full docs:** https://github.com/mcp-tool-shop-org/gpu-container
+- **Privacy & safety:** local, offline, no telemetry, no network egress. [SECURITY.md](https://github.com/mcp-tool-shop-org/gpu-container/blob/main/SECURITY.md)
+
+<div align="center">
+
+Built by <a href="https://mcp-tool-shop.github.io/">MCP Tool Shop</a> · MIT Licensed
+
+</div>
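The README's "honest refusal" rule and its roofline *ceiling* can be illustrated with a small bandwidth model. This is only a sketch, not gpu-container's planner: it assumes decode is memory-bandwidth bound, so each tier costs (bytes read per token) / (tier bandwidth) seconds per token, and it refuses when the resulting ceiling does not clear 1 tok/s. The `Tier` type, the function names, and the example numbers are all hypothetical.

```python
from dataclasses import dataclass

MIN_TOKENS_PER_S = 1.0  # refusal threshold quoted in the README (>1 tok/s)


@dataclass(frozen=True)
class Tier:
    """One memory tier of a placement (hypothetical structure, not the tool's schema)."""
    name: str
    bytes_per_token: float    # weight bytes read from this tier per decoded token
    bandwidth_bytes_s: float  # sustained read bandwidth of this tier


def roofline_tokens_per_s(tiers):
    """Upper bound on decode speed if every listed byte is read once per token."""
    if not tiers:
        raise ValueError("a placement needs at least one tier")
    seconds_per_token = sum(t.bytes_per_token / t.bandwidth_bytes_s for t in tiers)
    return 1.0 / seconds_per_token


def decide(tiers):
    """Return (verdict, ceiling, reason), where verdict is 'ship' or 'refuse'."""
    ceiling = roofline_tokens_per_s(tiers)
    if ceiling > MIN_TOKENS_PER_S:
        return "ship", ceiling, "ceiling clears the threshold"
    # Name the tier that contributes the most time per token.
    slowest = max(tiers, key=lambda t: t.bytes_per_token / t.bandwidth_bytes_s)
    return "refuse", ceiling, f"{slowest.name} dominates time per token"


GB = 1e9  # illustrative numbers only, not measurements
fast_plan = [Tier("vram", 2 * GB, 400 * GB), Tier("ram", 1 * GB, 50 * GB)]
slow_plan = [Tier("vram", 2 * GB, 400 * GB), Tier("nvme", 12 * GB, 3 * GB)]
```

With these made-up numbers, `decide(fast_plan)` ships and `decide(slow_plan)` refuses, naming the NVMe tier as the reason. A real planner would also need the calibrated *band* the README mentions, which this sketch leaves out.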
@@ -0,0 +1,67 @@
+<p align="center">
+<a href="README.ja.md">日本語</a> | <a href="README.zh.md">中文</a> | <a href="README.es.md">Español</a> | <a href="README.fr.md">Français</a> | <a href="README.hi.md">हिन्दी</a> | <a href="README.it.md">Italiano</a> | <a href="README.md">English</a>
+</p>
+
+<div align="center">
+
+<img src="https://raw.githubusercontent.com/mcp-tool-shop-org/gpu-container/main/assets/logo.png" width="400" alt="gpu-container" />
+
+[](https://github.com/mcp-tool-shop-org/gpu-container/actions/workflows/ci.yml)
+[](https://pypi.org/project/gpu-container/)
+[](https://www.npmjs.com/package/gpu-container)
+[](https://github.com/mcp-tool-shop-org/gpu-container/blob/main/LICENSE)
+[](https://mcp-tool-shop-org.github.io/gpu-container/)
+
+**A GPU-enabled container exposes the device. A model-aware runtime decides what is stored in VRAM, pinned RAM, and NVMe.**
+
+</div>
+
+Run the largest useful local model your machine can support, with explicit placement plans, benchmark results, and refusal when the plan would cause problems. This npm package is a **zero-prerequisite launcher**: `npx gpu-container` downloads the platform binary from the [GitHub Release](https://github.com/mcp-tool-shop-org/gpu-container/releases), verifies its SHA256 against the published hashes, caches it, and runs it. **No Python required.**
+
+```bash
+npx gpu-container --help
+npx gpu-container plan --profile profile.json --model-config qwen3.json --quant gguf-q4_k_m
+```
+
+> Prefer Python? `pip install "gpu-container[host]"` installs the five `gpu-container-*` commands directly.
+
+## Why it exists
+
+On Windows/WSL2, CUDA Unified Memory oversubscription is **unavailable** (confirmed by NVIDIA), and it is not the right tool for decode even on Linux. So `gpu-container` does not depend on runtime allocation tricks: it makes **explicit, declared placement** the product. That is the advantage.
+
+## What it does
+
+`gpu-container <command>` is five tools in a single binary:
+
+| Command | Does |
+|---|---|
+| `profile` | Measures the hardware (VRAM, PCIe, NVMe, pinnable RAM, CPU bandwidth) + the model |
+| `plan` | Computes explicit VRAM/RAM/NVMe placement + a calibrated performance forecast; **runs or refuses** |
+| `receipt` | Verifies a plan against a real `llama-bench` run; writes a calibration point |
+| `concentration` | Reduces the risk of the per-expert cache: measures routing concentration before building for it |
+| `watchdog` | Supervises a GPU job; aborts on a host-memory / power / VRAM breach |
+
+- **MoE expert tiers** (flagship): shared/attention layers in VRAM, experts in CPU RAM via llama.cpp `--n-cpu-moe`. Proven working on Qwen3-30B-A3B.
+- **Measured results**: a real run verifies the forecast against a *ceiling* and a calibrated *band*; the result refines the next plan.
+- **Honest refusal**: no plan reaches >1 token/s? It refuses and explains why.
+- **Hardware safety monitor**: born from a real incident; supervises any GPU job so that a bad plan cannot crash the machine.
+
+## Run a GPU job safely
+
+```bash
+gpu-container watchdog run --on-breach kill-job --peaks-out peaks.json -- \
+docker run --rm --gpus all -v "E:/AI-Models:/models" ghcr.io/ggml-org/llama.cpp:full-cuda \
+llama-bench -m /models/model.gguf --n-cpu-moe 0 -o json > bench.json
+```
+
+## Documentation
+
+- **Quickstart + handbook:** https://mcp-tool-shop-org.github.io/gpu-container/handbook/
+- **Source code + full documentation:** https://github.com/mcp-tool-shop-org/gpu-container
+- **Privacy and safety:** local, offline, no telemetry, no data sent to the network. [SECURITY.md](https://github.com/mcp-tool-shop-org/gpu-container/blob/main/SECURITY.md)
+
+<div align="center">
+
+Built by <a href="https://mcp-tool-shop.github.io/">MCP Tool Shop</a> · MIT licensed
+
+</div>
@@ -0,0 +1,67 @@
+<p align="center">
+<a href="README.ja.md">日本語</a> | <a href="README.md">English</a> | <a href="README.es.md">Español</a> | <a href="README.fr.md">Français</a> | <a href="README.hi.md">हिन्दी</a> | <a href="README.it.md">Italiano</a> | <a href="README.pt-BR.md">Português (BR)</a>
+</p>
+
+<div align="center">
+
+<img src="https://raw.githubusercontent.com/mcp-tool-shop-org/gpu-container/main/assets/logo.png" width="400" alt="gpu-container" />
+
+
+
+
+
+
+
+**A GPU-enabled container exposes the device. A model-aware runtime decides which data lives in VRAM, pinned RAM, and NVMe storage.**
+
+</div>
+
+Run the largest practical local model your machine can stably support, with explicit placement plans, benchmark results, and refusal when a plan could crash the system. This npm package is a **zero-prerequisite launcher**: `npx gpu-container` downloads the platform binary from the [GitHub Releases page](https://github.com/mcp-tool-shop-org/gpu-container/releases), verifies that its SHA256 matches the published checksums, then caches and runs it. **No Python required.**
+
+```bash
+npx gpu-container --help
+npx gpu-container plan --profile profile.json --model-config qwen3.json --quant gguf-q4_k_m
+```
+
+If you prefer Python, run `pip install "gpu-container[host]"`, which installs the five `gpu-container-*` commands directly.
+
+## Why does it exist?
+
+On Windows/WSL2, CUDA Unified Memory oversubscription is **unavailable** (confirmed by NVIDIA), and even on Linux it is not the right tool for decode. So `gpu-container` does not rely on a runtime overflow mechanism and instead uses **explicit, declarative resource placement**. That is its advantage.
+
+## What it does
+
+`gpu-container <command>` is one program containing five tools:
+
+| Command | Does |
+|---|---|
+| `profile` | Measures the hardware (VRAM, PCIe, NVMe, pinnable RAM, CPU bandwidth) and the model. |
+| `plan` | Computes an explicit VRAM/RAM/NVMe placement and a calibrated performance forecast; **ships if viable, refuses if not.** |
+| `receipt` | Compares a plan against a real `llama-bench` run and writes a calibration point back. |
+| `concentration` | Reduces the risk of the per-expert cache by measuring routing concentration before building the cache. |
+| `watchdog` | Monitors a GPU job and aborts it on a host-memory, power, or VRAM breach. |
+
+- **MoE expert tiering** (flagship): shared/attention layers in VRAM, experts in CPU RAM via llama.cpp's `--n-cpu-moe` option. Tested live on Qwen3-30B-A3B.
+- **Measured verification**: a real run compares the forecast against a performance ceiling and a calibrated band; the result is used to improve the next plan.
+- **Honest refusal**: if no plan can exceed 1 token/s, it refuses and explains why.
+- **Hardware safety monitoring**: born from a real incident; monitors any GPU job to keep a bad plan from crashing the system.
+
+## Run GPU jobs safely
+
+```bash
+gpu-container watchdog run --on-breach kill-job --peaks-out peaks.json -- \
+docker run --rm --gpus all -v "E:/AI-Models:/models" ghcr.io/ggml-org/llama.cpp:full-cuda \
+llama-bench -m /models/model.gguf --n-cpu-moe 0 -o json > bench.json
+```
+
+## Documentation
+
+- **Quickstart + handbook:** https://mcp-tool-shop-org.github.io/gpu-container/handbook/
+- **Source code + full documentation:** https://github.com/mcp-tool-shop-org/gpu-container
+- **Privacy and safety:** runs locally, works offline, collects no user data, sends no data over the network. [SECURITY.md](https://github.com/mcp-tool-shop-org/gpu-container/blob/main/SECURITY.md)
+
+<div align="center">
+
+Built by <a href="https://mcp-tool-shop.github.io/">MCP Tool Shop</a> · MIT licensed.
+
+</div>
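The `watchdog` behavior these READMEs describe (supervise a job, abort it on a breach) follows a common supervise-and-kill pattern. The sketch below shows only that pattern, under stated assumptions: `supervise`, its `sample` callback, `limit`, and `poll_s` are hypothetical names, and the real watchdog's sampling of host memory, power, and VRAM is not shown in this diff.

```python
import subprocess
import sys
import time


def supervise(cmd, sample, limit, poll_s=0.05):
    """Run cmd and kill it if sample() ever exceeds limit.

    Returns (exit_code, peak, breached). `sample` is any zero-argument
    callable returning a number, e.g. a host-memory reading (assumption).
    """
    proc = subprocess.Popen(cmd)
    peak = None
    breached = False
    try:
        while proc.poll() is None:
            value = sample()
            peak = value if peak is None else max(peak, value)
            if value > limit:
                breached = True
                proc.kill()  # analogous in spirit to `--on-breach kill-job`
                break
            time.sleep(poll_s)
    finally:
        # Never leave the child running, even if sample() raised.
        if proc.poll() is None:
            proc.kill()
        proc.wait()
    return proc.returncode, peak, breached


# Demo: a fake sampler that climbs by 1 per poll while the child just sleeps.
readings = iter(range(100))
code, peak, breached = supervise(
    [sys.executable, "-c", "import time; time.sleep(30)"],
    sample=lambda: next(readings),
    limit=3,
)
```

Injecting the sampler keeps the loop testable without real hardware; recording `peak` mirrors the idea behind `--peaks-out`, though the actual file format is not specified here.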
@@ -5,14 +5,14 @@
 // the release-asset names from convention, downloads the platform binary from the gpu-container
 // GitHub Release, verifies its SHA256 against checksums-<version>.txt, caches it, and runs it with
 // full arg passthrough.
-// binary: gpu-container-0.1.
-// checksums: checksums-0.1.
+// binary: gpu-container-0.1.2-linux-x64
+// checksums: checksums-0.1.2.txt
 process.env.MCPTOOLSHOP_LAUNCH_CONFIG = JSON.stringify({
   toolName: "gpu-container",
   owner: "mcp-tool-shop-org",
   repo: "gpu-container",
-  version: "0.1.
-  tag: "v0.1.
+  version: "0.1.2",
+  tag: "v0.1.2",
 });
 
 require("@mcptoolshop/npm-launcher/bin/mcptoolshop-launch.js");
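The launcher comments above say the binary's SHA256 is checked against `checksums-<version>.txt` before it is cached and run. The actual check lives in `@mcptoolshop/npm-launcher`, whose code is not part of this diff, so the following is only a sketch of that verification step. It assumes the checksum file uses the `sha256sum` layout (`<hex digest>  <filename>` per line); the function names are hypothetical.

```python
import hashlib
import hmac
import tempfile
from pathlib import Path


def sha256_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large binaries are not loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_checksums(text):
    """Parse '<hex digest>  <filename>' lines (assumed sha256sum-style format)."""
    table = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            digest, name = parts
            table[name.lstrip("*")] = digest.lower()  # '*' marks binary mode
    return table


def verify(binary_path, checksums_text):
    """Return True only if the file's SHA256 matches its published entry."""
    expected = parse_checksums(checksums_text).get(Path(binary_path).name)
    if expected is None:
        return False  # no published checksum for this asset: treat as unverified
    return hmac.compare_digest(sha256_file(binary_path), expected)


# Demo with a throwaway file standing in for the downloaded binary.
with tempfile.TemporaryDirectory() as tmp:
    asset = Path(tmp) / "gpu-container-0.1.2-linux-x64"
    asset.write_bytes(b"not a real binary")
    listing = f"{sha256_file(asset)}  {asset.name}\n"
    ok = verify(asset, listing)
    tampered = verify(asset, "0" * 64 + f"  {asset.name}\n")
```

Failing closed when the asset has no entry is a design choice in this sketch: a missing checksum is treated the same as a mismatch, so an unverified binary is never run.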
@@ -1,6 +1,6 @@
 {
   "name": "gpu-container",
-  "version": "0.1.
+  "version": "0.1.2",
   "description": "gpu-container — model-aware inference memory-placement planner for single-GPU rigs. Zero-prerequisite npx install via a verified binary launcher.",
   "type": "commonjs",
   "license": "MIT",
@@ -4,7 +4,7 @@ build-backend = "hatchling.build"
 
 [project]
 name = "gpu-container"
-version = "0.1.
+version = "0.1.2"
 description = "Model-aware inference memory-placement planner for single-GPU rigs — profile, plan, prove."
 readme = "README.md"
 requires-python = ">=3.10"
@@ -1,4 +1,5 @@
 """Tests for the unified `gpu-container <command>` dispatcher (the binary/launcher entry)."""
+from gpu_container import __version__
 from gpu_container.__main__ import main as gpc
 
 
@@ -7,7 +8,7 @@ def test_version_help_and_bare_exit_zero(capsys):
     assert gpc(["--help"]) == 0
     assert gpc([]) == 0
     out = capsys.readouterr().out
-    assert
+    assert __version__ in out  # --version printed the static package version
 
 
 def test_unknown_command_exits_2():
@@ -1,16 +0,0 @@
-# gpu-container (npm launcher)
-
-Zero-prerequisite launcher for **gpu-container** — a model-aware inference memory-placement planner for single-GPU rigs.
-
-```bash
-npx gpu-container --help
-npx gpu-container plan --profile profile.json --model-config qwen3.json
-```
-
-This npm package is a thin launcher: on first run it downloads the platform binary from the [gpu-container GitHub Release](https://github.com/mcp-tool-shop-org/gpu-container/releases), verifies its SHA256 against the published `checksums-<version>.txt`, caches it, and runs it with full argument passthrough. No Python required.
-
-Prefer Python? `pip install "gpu-container[host]"` installs the five `gpu-container-*` commands directly.
-
-- **Source + docs:** https://github.com/mcp-tool-shop-org/gpu-container
-- **Handbook:** https://mcp-tool-shop-org.github.io/gpu-container/handbook/
-- **License:** MIT
File without changes
{gpu_container-0.1.0 → gpu_container-0.1.2}/site/src/content/docs/handbook/getting-started.md RENAMED
File without changes