nvdc-0.1.0.tar.gz

nvdc-0.1.0/PKG-INFO ADDED
@@ -0,0 +1,169 @@
1
+ Metadata-Version: 2.4
2
+ Name: nvdc
3
+ Version: 0.1.0
4
+ Summary: Bring your GPU onto the network: one command turns a GPU into a verifiable, OpenAI-compatible inference node.
5
+ Author: NVDC
6
+ License: Apache-2.0
7
+ Requires-Python: >=3.9
8
+ Description-Content-Type: text/markdown
9
+ Requires-Dist: fastapi>=0.110
10
+ Requires-Dist: uvicorn[standard]>=0.27
11
+ Requires-Dist: websockets>=12.0
12
+ Requires-Dist: httpx>=0.27
13
+ Requires-Dist: cryptography>=42.0
14
+ Requires-Dist: nvidia-ml-py>=12.535.77
15
+ Requires-Dist: redis>=5.0
16
+ Requires-Dist: stripe>=9.0
17
+ Provides-Extra: attestation
18
+ Requires-Dist: nv-attestation-sdk>=2.7.0; extra == "attestation"
19
+ Requires-Dist: nv-local-gpu-verifier>=2.7.0; extra == "attestation"
20
+
21
+ # NVDC — bring your GPU onto the network
22
+
23
+ NVDC turns any GPU machine into a **verifiable, OpenAI-compatible inference node**
24
+ on a shared network. The node operator runs one command, opens a small visual
25
+ client, picks a model to hold hot in memory, and flips the switch to go live.
26
+ A coordinator exposes a standard `POST /v1/chat/completions` endpoint and routes
27
+ each request — over an outbound tunnel — to a connected GPU node.
28
+
29
+ ```
+ ┌─────────────┐   OpenAI API    ┌──────────────┐   WebSocket tunnel   ┌──────────────┐
+ │ any client  │ ───────────────▶│ coordinator  │◀────────────────────▶│   GPU node   │
+ │ (OpenAI SDK)│  /v1/chat/...   │   (public)   │   (node dials out)   │  Ollama + UI │
+ └─────────────┘                 └──────────────┘                      └──────────────┘
34
+ ```
35
+
36
+ ## Why a tunnel?
37
+ The node opens a single **outbound** WebSocket to the coordinator, so it never
38
+ needs an inbound public port and its IP stays private — the same pattern used by
39
+ `brev register` (NetBird) and consumer GPU marketplaces.
40
+
41
+ ## Deployment (split: hosted web + downloadable client)
42
+
43
+ Three pieces, three homes:
44
+
45
+ | Component | Where it runs | Notes |
46
+ |---|---|---|
47
+ | **Coordinator** (`nvdc coordinator`) | A **persistent host** (Railway / Render / Fly.io / VM) | Needs long-lived WebSockets + in-memory state. **Not** Vercel serverless. A `Dockerfile` + `Procfile` are included. |
48
+ | **Web app** (`site/`) | **Vercel** (static) | Mirrors the client UI: Home/Chat/Network read from the coordinator (CORS); Mine shows a download CTA + live market figures, and lights up with real data if the client is running locally. |
49
+ | **Downloadable client** (`nvdc app`) | The miner's GPU box | The full app from above — detects the GPU, mines, holds the signing identity. |
50
+
51
+ ### Deploy the coordinator (example: Railway)
52
+ ```bash
53
+ # from the repo root — Railway/Render auto-detect the Dockerfile
54
+ # exposes the OpenAI API + /node/ws tunnel + ledger on $PORT
55
+ # After deploy you'll get a URL like https://nvdc-xxxx.up.railway.app
56
+ ```
57
+
58
+ ### Deploy the web app to Vercel
59
+ The root `vercel.json` deploys `site/` as a static site (bypassing the Python
60
+ FastAPI auto-detection). If Vercel still tries a Python build, set the project's
61
+ **Root Directory** to `site/` in the Vercel dashboard.
62
+
63
+ In the deployed site, click **"set network…"** under the logo and paste your
64
+ coordinator URL (or load it with `?coordinator=https://...`). The page then reads
65
+ the live network and, if the downloadable client is running on the visitor's
66
+ machine, recognizes it automatically (CORS + Private Network Access).
67
+
68
+ ## Quick start
69
+
70
+ One-line install (installs Python deps + Ollama + the `nvdc` client, then launches it):
71
+
72
+ ```bash
73
+ # macOS / Linux
74
+ curl -fsSL https://nvdc.ai/download/install.sh | bash # Linux
75
+ curl -fsSL https://nvdc.ai/download/install.command | bash # macOS
76
+ ```
77
+ ```powershell
78
+ # Windows (PowerShell)
79
+ irm https://nvdc.ai/download/install.ps1 | iex
80
+ ```
81
+
82
+ Or install the package directly (Python 3.9+):
83
+
84
+ ```bash
85
+ pipx install nvdc # or: pip install nvdc
86
+
87
+ # on the GPU machine, launch the visual client
88
+ # it defaults to the public network at wss://api.nvdc.ai
89
+ nvdc app
90
+
91
+ # (running your own hub? point the client at it)
92
+ nvdc coordinator --port 8000
93
+ nvdc app --coordinator ws://<coordinator-host>:8000
94
+ ```
95
+
96
+ Then in the browser UI: see your hardware, pick a model (it must load **hot into
97
+ memory** first), and click **Go Live**. The green light turns on only when a
98
+ model is hot *and* the node is live.
99
+
100
+ ### Try it without a GPU / without downloading weights
101
+ ```bash
102
+ nvdc coordinator --port 8000 &
103
+ nvdc app --mock --coordinator ws://127.0.0.1:8000
104
+ ```
105
+ Mock mode simulates model loading and uses an echo backend, so you can exercise
106
+ the entire flow (load → hot → go live → green light → routed inference).
107
+
108
+ ### Use it from any OpenAI client
109
+ ```python
110
+ from openai import OpenAI
111
+ client = OpenAI(base_url="https://api.nvdc.ai/v1", api_key="x")
112
+ client.chat.completions.create(model="llama3.1:8b",
113
+ messages=[{"role": "user", "content": "hello"}])
114
+ ```
115
+
116
+ ## CLI
117
+
118
+ | Command | What it does |
119
+ |---|---|
120
+ | `nvdc app` | Launch the visual node client (web UI) |
121
+ | `nvdc serve` | Headless node: bring this GPU onto the network |
122
+ | `nvdc coordinator` | Run the public hub + OpenAI-compatible API |
123
+ | `nvdc status` | Print local GPU + attestation status as JSON |
124
+
125
+ ## Models
126
+
127
+ The catalog is pinned to the **Ollama** library (reliable, known sizes; Ollama
128
+ also handles CUDA / Apple Metal / CPU offload). Each node reports its memory
129
+ budget and the UI marks every model **Fits / Tight / Won't fit** against it:
130
+
131
+ - unified-memory systems (DGX Spark / GB10, Apple Silicon) → budget = system RAM
132
+ - dedicated-VRAM GPUs → budget = VRAM
133
+
134
+ Popular tags included: `gpt-oss:20b`, `gpt-oss:120b`, `llama3.1:8b/70b`,
135
+ `qwen2.5:7b/32b`, `deepseek-r1`, `mistral`, `gemma2`, `phi4`.
136
+
137
+ ## Attestation (verifiable work)
138
+
139
+ Attestation is a first-class, pluggable component (`nvdc/attestation.py`):
140
+
141
+ - On a **Confidential-Computing-capable** GPU (H100/H200, B100/B200, GB200,
142
+ RTX PRO 6000 Blackwell) with CC enabled, it performs a real NVIDIA **nvTrust**
143
+ local GPU attestation and reports the verdict + claims.
144
+ - On hardware without CC (e.g. **GB10 / DGX Spark**, consumer GPUs), it reports
145
+ `supported: false` with a clear reason — it never fabricates a "verified"
146
+ result.
147
+
148
+ A coordinator can enforce policy with `--require-attested` to only route work to
149
+ nodes whose attestation verifies.
150
+
151
+ > Note: the DGX Spark / GB10 cannot produce hardware attestation (NVIDIA disabled
152
+ > CC on this SKU). It serves inference fine; it just joins as an unattested node.
153
+
154
+ ## Layout
155
+
156
+ ```
+ src/nvdc/
+   cli.py            # nvdc app | serve | coordinator | status
+   app.py            # local web server for the visual client
+   web/index.html    # the visual client UI
+   runtime.py        # node state machine: load → hot → live
+   hardware.py       # accelerator + memory-budget detection (CUDA/MPS/CPU)
+   catalog.py        # curated Ollama model catalog + fit logic
+   attestation.py    # pluggable nvTrust attestation hook
+   agent.py          # node agent: outbound tunnel + request handling
+   coordinator.py    # hub: node registry + OpenAI-compatible API
+   inference.py      # Ollama + echo backends
+   protocol.py       # tiny JSON wire protocol
169
+ ```
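
The Models section above describes a per-node memory budget and a **Fits / Tight / Won't fit** label, but the thresholds themselves are not shown here. A minimal sketch of how such a check could work, assuming a hypothetical 85% cutoff for "Tight" and hypothetical function names (neither is taken from `catalog.py` or `hardware.py`):

```python
def memory_budget_mb(ram_mb: int, vram_mb: int, unified: bool) -> int:
    """Unified-memory systems budget against system RAM, others against VRAM."""
    return ram_mb if unified else vram_mb


def classify_fit(model_mb: int, budget_mb: int, tight_ratio: float = 0.85) -> str:
    """Label a model against a memory budget.

    `tight_ratio` is an assumed cutoff for illustration, not the package's value.
    """
    if budget_mb <= 0 or model_mb > budget_mb:
        return "Won't fit"
    if model_mb > budget_mb * tight_ratio:
        return "Tight"
    return "Fits"


if __name__ == "__main__":
    # A dedicated-VRAM GPU with 24 GiB of VRAM and 64 GiB of system RAM.
    budget = memory_budget_mb(ram_mb=65536, vram_mb=24576, unified=False)
    for name, size_mb in [("small", 5000), ("medium", 22000), ("large", 40000)]:
        print(name, classify_fit(size_mb, budget))
```

The only point of the sketch is the split between choosing the budget (RAM or VRAM) and comparing a model size against it; real model sizes would come from the catalog.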
nvdc-0.1.0/README.md ADDED
@@ -0,0 +1,149 @@
1
+ # NVDC — bring your GPU onto the network
2
+
3
+ NVDC turns any GPU machine into a **verifiable, OpenAI-compatible inference node**
4
+ on a shared network. The node operator runs one command, opens a small visual
5
+ client, picks a model to hold hot in memory, and flips the switch to go live.
6
+ A coordinator exposes a standard `POST /v1/chat/completions` endpoint and routes
7
+ each request — over an outbound tunnel — to a connected GPU node.
8
+
9
+ ```
+ ┌─────────────┐   OpenAI API    ┌──────────────┐   WebSocket tunnel   ┌──────────────┐
+ │ any client  │ ───────────────▶│ coordinator  │◀────────────────────▶│   GPU node   │
+ │ (OpenAI SDK)│  /v1/chat/...   │   (public)   │   (node dials out)   │  Ollama + UI │
+ └─────────────┘                 └──────────────┘                      └──────────────┘
14
+ ```
15
+
16
+ ## Why a tunnel?
17
+ The node opens a single **outbound** WebSocket to the coordinator, so it never
18
+ needs an inbound public port and its IP stays private — the same pattern used by
19
+ `brev register` (NetBird) and consumer GPU marketplaces.
20
+
21
+ ## Deployment (split: hosted web + downloadable client)
22
+
23
+ Three pieces, three homes:
24
+
25
+ | Component | Where it runs | Notes |
26
+ |---|---|---|
27
+ | **Coordinator** (`nvdc coordinator`) | A **persistent host** (Railway / Render / Fly.io / VM) | Needs long-lived WebSockets + in-memory state. **Not** Vercel serverless. A `Dockerfile` + `Procfile` are included. |
28
+ | **Web app** (`site/`) | **Vercel** (static) | Mirrors the client UI: Home/Chat/Network read from the coordinator (CORS); Mine shows a download CTA + live market figures, and lights up with real data if the client is running locally. |
29
+ | **Downloadable client** (`nvdc app`) | The miner's GPU box | The full app from above — detects the GPU, mines, holds the signing identity. |
30
+
31
+ ### Deploy the coordinator (example: Railway)
32
+ ```bash
33
+ # from the repo root — Railway/Render auto-detect the Dockerfile
34
+ # exposes the OpenAI API + /node/ws tunnel + ledger on $PORT
35
+ # After deploy you'll get a URL like https://nvdc-xxxx.up.railway.app
36
+ ```
37
+
38
+ ### Deploy the web app to Vercel
39
+ The root `vercel.json` deploys `site/` as a static site (bypassing the Python
40
+ FastAPI auto-detection). If Vercel still tries a Python build, set the project's
41
+ **Root Directory** to `site/` in the Vercel dashboard.
42
+
43
+ In the deployed site, click **"set network…"** under the logo and paste your
44
+ coordinator URL (or load it with `?coordinator=https://...`). The page then reads
45
+ the live network and, if the downloadable client is running on the visitor's
46
+ machine, recognizes it automatically (CORS + Private Network Access).
47
+
48
+ ## Quick start
49
+
50
+ One-line install (installs Python deps + Ollama + the `nvdc` client, then launches it):
51
+
52
+ ```bash
53
+ # macOS / Linux
54
+ curl -fsSL https://nvdc.ai/download/install.sh | bash # Linux
55
+ curl -fsSL https://nvdc.ai/download/install.command | bash # macOS
56
+ ```
57
+ ```powershell
58
+ # Windows (PowerShell)
59
+ irm https://nvdc.ai/download/install.ps1 | iex
60
+ ```
61
+
62
+ Or install the package directly (Python 3.9+):
63
+
64
+ ```bash
65
+ pipx install nvdc # or: pip install nvdc
66
+
67
+ # on the GPU machine, launch the visual client
68
+ # it defaults to the public network at wss://api.nvdc.ai
69
+ nvdc app
70
+
71
+ # (running your own hub? point the client at it)
72
+ nvdc coordinator --port 8000
73
+ nvdc app --coordinator ws://<coordinator-host>:8000
74
+ ```
75
+
76
+ Then in the browser UI: see your hardware, pick a model (it must load **hot into
77
+ memory** first), and click **Go Live**. The green light turns on only when a
78
+ model is hot *and* the node is live.
79
+
80
+ ### Try it without a GPU / without downloading weights
81
+ ```bash
82
+ nvdc coordinator --port 8000 &
83
+ nvdc app --mock --coordinator ws://127.0.0.1:8000
84
+ ```
85
+ Mock mode simulates model loading and uses an echo backend, so you can exercise
86
+ the entire flow (load → hot → go live → green light → routed inference).
87
+
88
+ ### Use it from any OpenAI client
89
+ ```python
90
+ from openai import OpenAI
91
+ client = OpenAI(base_url="https://api.nvdc.ai/v1", api_key="x")
92
+ client.chat.completions.create(model="llama3.1:8b",
93
+ messages=[{"role": "user", "content": "hello"}])
94
+ ```
95
+
96
+ ## CLI
97
+
98
+ | Command | What it does |
99
+ |---|---|
100
+ | `nvdc app` | Launch the visual node client (web UI) |
101
+ | `nvdc serve` | Headless node: bring this GPU onto the network |
102
+ | `nvdc coordinator` | Run the public hub + OpenAI-compatible API |
103
+ | `nvdc status` | Print local GPU + attestation status as JSON |
104
+
105
+ ## Models
106
+
107
+ The catalog is pinned to the **Ollama** library (reliable, known sizes; Ollama
108
+ also handles CUDA / Apple Metal / CPU offload). Each node reports its memory
109
+ budget and the UI marks every model **Fits / Tight / Won't fit** against it:
110
+
111
+ - unified-memory systems (DGX Spark / GB10, Apple Silicon) → budget = system RAM
112
+ - dedicated-VRAM GPUs → budget = VRAM
113
+
114
+ Popular tags included: `gpt-oss:20b`, `gpt-oss:120b`, `llama3.1:8b/70b`,
115
+ `qwen2.5:7b/32b`, `deepseek-r1`, `mistral`, `gemma2`, `phi4`.
116
+
117
+ ## Attestation (verifiable work)
118
+
119
+ Attestation is a first-class, pluggable component (`nvdc/attestation.py`):
120
+
121
+ - On a **Confidential-Computing-capable** GPU (H100/H200, B100/B200, GB200,
122
+ RTX PRO 6000 Blackwell) with CC enabled, it performs a real NVIDIA **nvTrust**
123
+ local GPU attestation and reports the verdict + claims.
124
+ - On hardware without CC (e.g. **GB10 / DGX Spark**, consumer GPUs), it reports
125
+ `supported: false` with a clear reason — it never fabricates a "verified"
126
+ result.
127
+
128
+ A coordinator can enforce policy with `--require-attested` to only route work to
129
+ nodes whose attestation verifies.
130
+
131
+ > Note: the DGX Spark / GB10 cannot produce hardware attestation (NVIDIA disabled
132
+ > CC on this SKU). It serves inference fine; it just joins as an unattested node.
133
+
134
+ ## Layout
135
+
136
+ ```
+ src/nvdc/
+   cli.py            # nvdc app | serve | coordinator | status
+   app.py            # local web server for the visual client
+   web/index.html    # the visual client UI
+   runtime.py        # node state machine: load → hot → live
+   hardware.py       # accelerator + memory-budget detection (CUDA/MPS/CPU)
+   catalog.py        # curated Ollama model catalog + fit logic
+   attestation.py    # pluggable nvTrust attestation hook
+   agent.py          # node agent: outbound tunnel + request handling
+   coordinator.py    # hub: node registry + OpenAI-compatible API
+   inference.py      # Ollama + echo backends
+   protocol.py       # tiny JSON wire protocol
149
+ ```
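
The README says a coordinator started with `--require-attested` only routes work to nodes whose attestation verifies. The coordinator's actual selection code is not shown here, so the following is only a sketch of that policy under stated assumptions: the `Node` record and `pick_node` function are hypothetical, and real routing presumably weighs more than model name and attestation.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Node:
    """Hypothetical view of a registered node, for illustration only."""
    name: str
    model: str      # the single model the node has hot-loaded
    attested: bool  # True only if hardware attestation verified


def pick_node(nodes: Iterable[Node], model: str,
              require_attested: bool = False) -> Optional[Node]:
    """Return the first node serving `model` that satisfies the policy."""
    for node in nodes:
        if node.model != model:
            continue  # a node serves exactly one model
        if require_attested and not node.attested:
            continue  # policy: skip unattested nodes
        return node
    return None


if __name__ == "__main__":
    fleet = [
        Node("spark-1", "llama3.1:8b", attested=False),
        Node("h100-1", "llama3.1:8b", attested=True),
    ]
    print(pick_node(fleet, "llama3.1:8b"))
    print(pick_node(fleet, "llama3.1:8b", require_attested=True))
```

Under this policy an unattested machine such as a DGX Spark still receives work on an open coordinator and is simply skipped when the policy is on.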
@@ -0,0 +1,34 @@
1
+ [project]
2
+ name = "nvdc"
3
+ version = "0.1.0"
4
+ description = "Bring your GPU onto the network: one command turns a GPU into a verifiable, OpenAI-compatible inference node."
5
+ readme = "README.md"
6
+ requires-python = ">=3.9"
7
+ license = { text = "Apache-2.0" }
8
+ authors = [{ name = "NVDC" }]
9
+ dependencies = [
10
+ "fastapi>=0.110",
11
+ "uvicorn[standard]>=0.27",
12
+ "websockets>=12.0",
13
+ "httpx>=0.27",
14
+ "cryptography>=42.0",
15
+ "nvidia-ml-py>=12.535.77",
16
+ "redis>=5.0",
17
+ "stripe>=9.0",
18
+ ]
19
+
20
+ [project.optional-dependencies]
21
+ attestation = ["nv-attestation-sdk>=2.7.0", "nv-local-gpu-verifier>=2.7.0"]
22
+
23
+ [project.scripts]
24
+ nvdc = "nvdc.cli:main"
25
+
26
+ [build-system]
27
+ requires = ["setuptools>=68"]
28
+ build-backend = "setuptools.build_meta"
29
+
30
+ [tool.setuptools.packages.find]
31
+ where = ["src"]
32
+
33
+ [tool.setuptools.package-data]
34
+ nvdc = ["web/*.html"]
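
The `attestation` extra above is optional, so a plain `pip install nvdc` does not pull in the nvTrust packages; `pip install "nvdc[attestation]"` is the standard pip syntax for requesting an extra. A small standard-library sketch for checking which of those distributions are installed; the helper name is hypothetical and the distribution names are copied from the `attestation` list above:

```python
from importlib import metadata
from typing import Dict, Optional, Sequence

# Distribution names as declared under [project.optional-dependencies].
ATTESTATION_DISTS = ("nv-attestation-sdk", "nv-local-gpu-verifier")


def attestation_extra_status(
    dists: Sequence[str] = ATTESTATION_DISTS,
) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    status: Dict[str, Optional[str]] = {}
    for dist in dists:
        try:
            status[dist] = metadata.version(dist)
        except metadata.PackageNotFoundError:
            status[dist] = None
    return status


if __name__ == "__main__":
    for dist, version in attestation_extra_status().items():
        print(f"{dist}: {version or 'not installed'}")
```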
nvdc-0.1.0/setup.cfg ADDED
@@ -0,0 +1,4 @@
1
+ [egg_info]
2
+ tag_build =
3
+ tag_date = 0
4
+
@@ -0,0 +1,3 @@
1
+ """nvdc: bring your GPU onto the network as a verifiable, OpenAI-compatible inference node."""
2
+
3
+ __version__ = "0.1.0"
@@ -0,0 +1,329 @@
+ """Node agent: the thing `nvdc serve` runs.
+
+ Opens ONE outbound WebSocket to the coordinator (so the node never needs an
+ inbound public port and its IP stays private), registers its GPU + attestation
+ profile, then services inference requests over that tunnel.
+ """
+
+ from __future__ import annotations
+
+ import asyncio
+ import hashlib
+ import json as _json
+ import logging
+ from typing import Any, Dict, Optional
+
+ import websockets
+
+ from . import __version__, keys, protocol
+ from .attestation import attest
+ from .gpu import detect_gpu, detect_gpus, detect_interconnect
+ from .hardware import detect_hardware, machine_id as hw_machine_id
+ from .inference import Backend, make_backend
+ from .keys import Identity
+
+ log = logging.getLogger("nvdc.agent")
+
+
+ def _extract_content(sse_line: str) -> str:
+     """Pull the delta content out of one OpenAI SSE 'data: {...}' line."""
+     line = sse_line.strip()
+     if not line.startswith("data:"):
+         return ""
+     try:
+         obj = _json.loads(line[len("data:"):].strip())
+         return obj["choices"][0]["delta"].get("content") or ""
+     except Exception:
+         # Non-JSON payloads (e.g. the terminal "data: [DONE]") carry no content.
+         return ""
+
+
+ class NodeAgent:
+     def __init__(
+         self,
+         coordinator_url: str,
+         name: str,
+         backend: Backend,
+         model: str,
+         token: str = "",
+         require_attestation: bool = False,
+         status_cb=None,
+         drain_timeout: float = 120.0,
+         price_per_mtok: float = 0.0,
+         account_id: str = "",
+         identity: Optional[Identity] = None,
+         owner_account: str = "",
+         machine_id: str = "",
+         cluster: str = "",
+     ):
+         # A node commits to exactly ONE hot-loaded model at a time: the
+         # "mining algorithm" it has chosen. It advertises and serves only this
+         # model; requests for anything else are rejected at the node boundary.
+         self.coordinator_url = coordinator_url
+         self.name = name
+         self.backend = backend
+         self.model = model
+         self.price_per_mtok = price_per_mtok
+         self.identity = identity or Identity()
+         self.account_id = account_id or self.identity.account_id
+         # Earnings credit the owner account; a single machine owns itself.
+         self.owner_account = owner_account or self.account_id
+         self.machine_id = machine_id or hw_machine_id()
+         self.cluster = cluster
+         self.token = token
+         self.require_attestation = require_attestation
+         self.status_cb = status_cb
+         self.drain_timeout = drain_timeout
+         self._ws = None
+         self._send_lock = asyncio.Lock()
+         # graceful drain bookkeeping
+         self._stopped = False
+         self._draining = False
+         self._inflight_ids = set()  # request ids currently being served
+         self._inflight_zero = asyncio.Event()
+         self._inflight_zero.set()  # starts idle
+
+     def _emit(self, status: str, **info):
+         # Status callbacks are best-effort: a failing callback never breaks the agent.
+         if self.status_cb:
+             try:
+                 self.status_cb(status, info)
+             except Exception:
+                 log.debug("status_cb error", exc_info=True)
+
+     async def run_forever(self):
+         # Reconnect with exponential backoff: 1s, 2s, 4s, ... capped at 30s.
+         backoff = 1
+         while not self._stopped:
+             try:
+                 await self._connect_and_serve()
+                 backoff = 1  # the session ended without an error: reset the backoff
+             except (OSError, websockets.WebSocketException) as e:
+                 if self._stopped:
+                     break
+                 log.warning("connection lost (%s); reconnecting in %ss", e, backoff)
+                 self._emit("connecting", detail=str(e))
+                 await asyncio.sleep(backoff)
+                 backoff = min(backoff * 2, 30)
+
+     async def drain(self):
+         """Gracefully leave: tell the coordinator to stop routing new work,
+         let in-flight requests finish, then disconnect. In-flight responses are
+         never interrupted, so the node's delivery/completion score is preserved.
+         """
+         if self._draining:
+             return
+         self._draining = True
+         log.info("draining: %d request(s) in flight", len(self._inflight_ids))
+         self._emit("draining", inflight=len(self._inflight_ids))
+         try:
+             await self._send(protocol.MSG_DRAIN)  # coordinator stops routing now
+         except Exception:
+             pass
+         try:
+             # block until all in-flight complete, but not forever
+             await asyncio.wait_for(self._inflight_zero.wait(), timeout=self.drain_timeout)
+             log.info("drain complete; disconnecting")
+         except asyncio.TimeoutError:
+             stuck = list(self._inflight_ids)
+             log.warning(
+                 "drain timeout after %ss; force-failing %d stuck request(s): %s",
+                 self.drain_timeout, len(stuck), stuck,
+             )
+             # Fail only the stuck requests as node_failed; completed ones are
+             # already done and unaffected.
+             for rid in stuck:
+                 try:
+                     await self._send(
+                         protocol.MSG_ERROR, id=rid,
+                         error=f"node_failed: drain timeout after {self.drain_timeout}s",
+                     )
+                 except Exception:
+                     pass
+         self._stopped = True
+         if self._ws is not None:
+             try:
+                 await self._ws.close()
+             except Exception:
+                 pass
+
+     async def _connect_and_serve(self):
+         headers = {"Authorization": f"Bearer {self.token}"} if self.token else {}
+         log.info("connecting to coordinator %s", self.coordinator_url)
+         async with websockets.connect(
+             self.coordinator_url,
+             # `additional_headers` is the keyword of the new asyncio client,
+             # which `websockets.connect` resolves to from websockets 14 on;
+             # the legacy client calls this argument `extra_headers`.
+             additional_headers=headers,
+             max_size=32 * 1024 * 1024,
+             ping_interval=20,
+         ) as ws:
+             self._ws = ws
+             await self._register()
+             try:
+                 async for raw in ws:
+                     msg = protocol.decode(raw)
+                     await self._dispatch(msg)
+             finally:
+                 self._ws = None
+                 self._emit("offline")
+
+     async def _register(self):
+         gpus = detect_gpus()
+         gpu = gpus[0] if gpus else detect_gpu()
+         interconnect = detect_interconnect() if len(gpus) > 1 else ""
+         hw = detect_hardware()
+         att = attest(require=self.require_attestation)
+         if self.require_attestation and not att.verified:
+             raise RuntimeError(
+                 f"attestation required but not verified: {att.reason or att.mode}"
+             )
+         profile = protocol.NodeProfile(
+             name=self.name,
+             models=[self.model],
+             gpu=gpu,
+             attestation=att,
+             gpus=gpus,
+             gpu_count=len(gpus),
+             interconnect=interconnect,
+             ram_mb=hw.ram_mb,
+             memory_budget_mb=hw.memory_budget_mb,
+             accelerator=hw.accelerator.type,
+             price_per_mtok=self.price_per_mtok,
+             account_id=self.account_id,
+             owner_account=self.owner_account,
+             machine_id=self.machine_id,
+             cluster=self.cluster,
+             agent_version=__version__,
+         )
+         await self._send(protocol.MSG_REGISTER, profile=protocol.node_profile_to_dict(profile))
+         log.info(
+             "registered '%s' | gpu=%s | serving=%s | attestation=%s(verified=%s)",
+             self.name, gpu.name, self.model, att.mode, att.verified,
+         )
+
+     async def _dispatch(self, msg: Dict[str, Any]):
+         t = msg.get("t")
+         if t == protocol.MSG_INFER:
+             # Run inference concurrently so the receive loop keeps reading.
+             # Note: the event loop keeps only a weak reference to tasks, so
+             # holding a strong reference (e.g. in a set on self) is what
+             # guarantees a task is not garbage-collected mid-flight.
+             asyncio.create_task(self._handle_infer(msg))
+         elif t == protocol.MSG_PING:
+             await self._send(protocol.MSG_PONG)
+         elif t == protocol.MSG_REGISTERED:
+             log.info("coordinator assigned node_id=%s", msg.get("node_id"))
+             self._emit("live", node_id=msg.get("node_id"))
+         else:
+             log.debug("ignoring message type %s", t)
+
+     async def _handle_infer(self, msg: Dict[str, Any]):
+         req_id = msg.get("id")
+         body = dict(msg.get("body", {}))
+         requested = body.get("model", "")
+
+         # Once draining, refuse new work so it can be routed elsewhere. This
+         # closes the race between the operator leaving and the coordinator
+         # marking us un-routable; in-flight requests (already past this point)
+         # are unaffected and run to completion.
+         if self._draining:
+             await self._send(
+                 protocol.MSG_ERROR, id=req_id,
+                 error="node is draining; request not accepted",
+             )
+             return
+
+         # Enforce the single committed model at the node boundary. A node only
+         # serves the model it has hot-loaded; anything else is refused so it
+         # can never be coerced into running a cold/different model.
+         if requested and requested != self.model:
+             await self._send(
+                 protocol.MSG_ERROR, id=req_id,
+                 error=f"this node only serves '{self.model}', not '{requested}'",
+             )
+             return
+         body["model"] = self.model  # pin, in case the request omitted it
+         # Commit to the prompt: SHA-256 over the sorted-key JSON of the messages.
+         prompt_commit = hashlib.sha256(
+             _json.dumps(body.get("messages", []), sort_keys=True).encode()
+         ).hexdigest()
+
+         self._inflight_ids.add(req_id)
+         self._inflight_zero.clear()
+         stream = bool(body.get("stream", False))
+         try:
+             if stream:
+                 acc, tokens = [], 0
+                 async for line in self.backend.chat_stream(body):
+                     await self._send(protocol.MSG_CHUNK, id=req_id, data=line)
+                     # Token count is approximated as the number of chunks
+                     # that carry a "content" field.
+                     if '"content"' in line:
+                         tokens += 1
+                     c = _extract_content(line)
+                     if c:
+                         acc.append(c)
+                 response_commit = hashlib.sha256("".join(acc).encode()).hexdigest()
+                 sig = self._sign_work(req_id, prompt_commit, tokens, response_commit, "complete")
+                 await self._send(protocol.MSG_END, id=req_id, tokens=tokens,
+                                  response_commit=response_commit, sig=sig)
+             else:
+                 result = await self.backend.chat_once(body)
+                 content = ""
+                 try:
+                     content = result["choices"][0]["message"].get("content") or ""
+                 except Exception:
+                     pass
+                 tokens = (result.get("usage", {}) or {}).get("completion_tokens", 0)
+                 response_commit = hashlib.sha256(content.encode()).hexdigest()
+                 sig = self._sign_work(req_id, prompt_commit, tokens, response_commit, "complete")
+                 await self._send(protocol.MSG_RESULT, id=req_id, body=result,
+                                  tokens=tokens, response_commit=response_commit, sig=sig)
+         except Exception as e:
+             log.exception("inference failed for %s", req_id)
+             await self._send(protocol.MSG_ERROR, id=req_id, error=str(e))
+         finally:
+             self._inflight_ids.discard(req_id)
+             if not self._inflight_ids:
+                 self._inflight_zero.set()
+
+     def _sign_work(self, req_id, prompt_commit, tokens, response_commit, delivery) -> str:
+         payload = keys.work_payload(req_id, self.model, prompt_commit, tokens,
+                                     response_commit, delivery)
+         return self.identity.sign(payload)
+
+     async def _send(self, msg_type: str, **fields: Any):
+         if self._ws is None:
+             return
+         frame = protocol.encode(msg_type, **fields)
+         async with self._send_lock:
+             await self._ws.send(frame)
+
+
+ async def serve(
+     coordinator_url: str,
+     name: str,
+     backend_kind: str,
+     model: str,
+     ollama_url: str,
+     token: str = "",
+     require_attestation: bool = False,
+     warm: bool = True,
+     drain_timeout: float = 120.0,
+     owner_account: str = "",
+     cluster: str = "",
+ ):
+     if not model:
+         raise ValueError("a single --model must be specified; a node serves exactly one model")
+     backend = make_backend(backend_kind, model=model, ollama_url=ollama_url)
+
+     # Hot-load the committed model before advertising it to the network, so the
+     # node is never live with a cold model (low TTFT guarantee).
+     if warm:
+         log.info("hot-loading '%s' into memory ...", model)
+         try:
+             await backend.warm(model)
+             log.info("'%s' is hot", model)
+         except Exception as e:
+             log.warning("warm-up failed for '%s' (%s); serving anyway", model, e)
+
+     agent = NodeAgent(
+         coordinator_url=coordinator_url,
+         name=name,
+         backend=backend,
+         model=model,
+         token=token,
+         require_attestation=require_attestation,
+         drain_timeout=drain_timeout,
+         owner_account=owner_account,
+         cluster=cluster,
+     )
+     await agent.run_forever()
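
`_handle_infer` above binds each signed work record to a SHA-256 commitment of the prompt and another of the response text. The sketch below reproduces only that hashing so it can be checked in isolation. The function names are illustrative rather than part of the package, and the signing step (`keys.work_payload`, `Identity.sign`) is left out because its implementation is not shown here.

```python
import hashlib
import json
from typing import Any, Dict, Iterable, List


def prompt_commit(messages: List[Dict[str, Any]]) -> str:
    """SHA-256 over the sorted-key JSON encoding of the chat messages."""
    canonical = json.dumps(messages, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()


def response_commit(chunks: Iterable[str]) -> str:
    """SHA-256 over the concatenated response text."""
    return hashlib.sha256("".join(chunks).encode()).hexdigest()


if __name__ == "__main__":
    messages = [{"role": "user", "content": "hello"}]
    print("prompt commit:  ", prompt_commit(messages))
    # Streamed deltas hash to the same value as the full text they spell out.
    print("response commit:", response_commit(["Hel", "lo"]))
    print("same as unstreamed:", response_commit(["Hel", "lo"]) == response_commit(["Hello"]))
```

Because `sort_keys=True` fixes the key order, two clients that send the same messages with keys in a different order produce the same prompt commitment. The response commitment depends only on the final text, not on how it was chunked.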