g2n-enterprise 1.3.4__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (45)
  1. g2n_enterprise-1.3.4/PKG-INFO +400 -0
  2. g2n_enterprise-1.3.4/README.md +382 -0
  3. g2n_enterprise-1.3.4/g2n_enterprise/__init__.py +120 -0
  4. g2n_enterprise-1.3.4/g2n_enterprise/__main__.py +5 -0
  5. g2n_enterprise-1.3.4/g2n_enterprise/accel/__init__.py +23 -0
  6. g2n_enterprise-1.3.4/g2n_enterprise/accel/registry.py +171 -0
  7. g2n_enterprise-1.3.4/g2n_enterprise/api.py +258 -0
  8. g2n_enterprise-1.3.4/g2n_enterprise/cache/__init__.py +3 -0
  9. g2n_enterprise-1.3.4/g2n_enterprise/cache/persistent.py +124 -0
  10. g2n_enterprise-1.3.4/g2n_enterprise/cli.py +239 -0
  11. g2n_enterprise-1.3.4/g2n_enterprise/licensing/__init__.py +246 -0
  12. g2n_enterprise-1.3.4/g2n_enterprise/licensing/_pubkey.py +6 -0
  13. g2n_enterprise-1.3.4/g2n_enterprise/licensing/features.py +81 -0
  14. g2n_enterprise-1.3.4/g2n_enterprise/licensing/keys.py +205 -0
  15. g2n_enterprise-1.3.4/g2n_enterprise/model_zoo.py +252 -0
  16. g2n_enterprise-1.3.4/g2n_enterprise/planner_pro.py +70 -0
  17. g2n_enterprise-1.3.4/g2n_enterprise/serve/__init__.py +60 -0
  18. g2n_enterprise-1.3.4/g2n_enterprise/serve/_demo.py +33 -0
  19. g2n_enterprise-1.3.4/g2n_enterprise/serve/bench.py +158 -0
  20. g2n_enterprise-1.3.4/g2n_enterprise/serve/client.py +98 -0
  21. g2n_enterprise-1.3.4/g2n_enterprise/serve/guard.py +110 -0
  22. g2n_enterprise-1.3.4/g2n_enterprise/serve/optimize.py +489 -0
  23. g2n_enterprise-1.3.4/g2n_enterprise/serve/reference.py +80 -0
  24. g2n_enterprise-1.3.4/g2n_enterprise/serve/registry.py +212 -0
  25. g2n_enterprise-1.3.4/g2n_enterprise/serve/runtime.py +427 -0
  26. g2n_enterprise-1.3.4/g2n_enterprise/serve/server.py +425 -0
  27. g2n_enterprise-1.3.4/g2n_enterprise/updates.py +106 -0
  28. g2n_enterprise-1.3.4/g2n_enterprise.egg-info/PKG-INFO +400 -0
  29. g2n_enterprise-1.3.4/g2n_enterprise.egg-info/SOURCES.txt +43 -0
  30. g2n_enterprise-1.3.4/g2n_enterprise.egg-info/dependency_links.txt +1 -0
  31. g2n_enterprise-1.3.4/g2n_enterprise.egg-info/entry_points.txt +2 -0
  32. g2n_enterprise-1.3.4/g2n_enterprise.egg-info/requires.txt +9 -0
  33. g2n_enterprise-1.3.4/g2n_enterprise.egg-info/top_level.txt +1 -0
  34. g2n_enterprise-1.3.4/pyproject.toml +43 -0
  35. g2n_enterprise-1.3.4/setup.cfg +4 -0
  36. g2n_enterprise-1.3.4/tests/test_fusion_guard.py +73 -0
  37. g2n_enterprise-1.3.4/tests/test_licensing.py +118 -0
  38. g2n_enterprise-1.3.4/tests/test_lookup_and_version.py +98 -0
  39. g2n_enterprise-1.3.4/tests/test_paddle.py +185 -0
  40. g2n_enterprise-1.3.4/tests/test_persistent_cache.py +75 -0
  41. g2n_enterprise-1.3.4/tests/test_rate_limit.py +33 -0
  42. g2n_enterprise-1.3.4/tests/test_registry.py +53 -0
  43. g2n_enterprise-1.3.4/tests/test_serving.py +539 -0
  44. g2n_enterprise-1.3.4/tests/test_serving_torch.py +102 -0
  45. g2n_enterprise-1.3.4/tests/test_updates_and_zoo.py +56 -0
@@ -0,0 +1,400 @@
+ Metadata-Version: 2.4
+ Name: g2n-enterprise
+ Version: 1.3.4
+ Summary: A platform to optimize AND run PyTorch models: license-gated compiler (enhanced planner, persistent cache, multi-accelerator routing) plus a model registry and inference server, on top of open-core g2n.
+ Author: g2n
+ License: Proprietary
+ Project-URL: Homepage, https://g2n.example.com
+ Keywords: pytorch,triton,compiler,gpu,npu,inference,serving,model-server,license
+ Requires-Python: >=3.10
+ Description-Content-Type: text/markdown
+ Requires-Dist: cryptography>=41.0
+ Provides-Extra: runtime
+ Requires-Dist: torch>=2.11.0; extra == "runtime"
+ Requires-Dist: g2n>=0.4; extra == "runtime"
+ Requires-Dist: triton>=3.6; extra == "runtime"
+ Provides-Extra: dev
+ Requires-Dist: pytest>=7; extra == "dev"
+
+ # g2n-Enterprise
+
+ A platform to **optimize and run** PyTorch models. It has two halves that share
+ one code path and one license:
+
+ * **Optimize** — `g2n.compile(model)` (or `torch.compile(model, backend="g2n")`):
+   hybrid fusion, a custom Triton LayerNorm(+GELU) kernel, an in-place-aware
+   buffer planner, a persistent cross-run compile cache, and multi-accelerator
+   routing.
+ * **Run** — a built-in **model registry** + **inference server**
+   (`g2n.serve()`): register a model once, and the node loads it, optimizes it on
+   the way in, and serves predictions over HTTP, with optional dynamic batching.
+
+ Around that sits the full machinery to sell it as a service: a signed **license
+ system**, a zero-dependency **license server**, a license-management
+ **dashboard**, and an ancient-Greek-styled **WordPress** front end (theme +
+ plugin) that talks to the license API.
+
+ > **Deployment model.** The customer installs g2n inside *their* environment.
+ > Everything (compile + serve) executes there; the license server only mints and
+ > validates entitlements.
+
+ ```python
+ import g2n_enterprise as g2n
+ g2n.activate("G2N-8H4K-L92X-QF7M") # online once, then cached + offline
+
+ # OPTIMIZE
+ model = g2n.compile(model) # or torch.compile(model, backend="g2n")
+
+ # RUN
+ g2n.register_model("resnet", "torchscript:/models/resnet50.pt",
+                    max_batch=16, max_latency_ms=8)
+ g2n.serve(port=8900) # POST /v1/models/resnet/predict
+ ```
+
+ ---
+
+ ## What's sold (tiers)
+
+ | Capability | Community | Pro | Enterprise |
+ |---|:--:|:--:|:--:|
+ | Hybrid fusion + JIT pointwise codegen | ✓ | ✓ | ✓ |
+ | Enhanced buffer planner (in-place aware) | | ✓ | ✓ |
+ | Persistent cross-run compile cache | | ✓ | ✓ |
+ | **Model registry + inference server (run models)** | | ✓ | ✓ |
+ | **Dynamic request batching (autotuned)** | | | ✓ |
+ | Multi-accelerator auto-routing (GPU / NPU / CPU) | | | ✓ |
+ | Validated model-zoo configs + priority support | | | ✓ |
+
+ Code never changes between tiers — gated features light up when the license
+ grants them and silently fall back to the open-core path otherwise.
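The gate-and-fall-back behaviour described above can be sketched as follows. This is an illustration under stated assumptions, not the package's API: only `auto_batch` is a flag name taken from this README, while the other flag names, `TIER_FEATURES`, `has_feature` and `pick_path` are hypothetical.

```python
# Hypothetical tier -> feature-flag mapping, mirroring the tier table.
# Only "auto_batch" is named in the README; the other flag names are invented.
TIER_FEATURES = {
    "community": frozenset(),
    "pro": frozenset({"planner_pro", "persistent_cache", "serve"}),
    "enterprise": frozenset({"planner_pro", "persistent_cache", "serve",
                             "auto_batch", "multi_accel"}),
}


def has_feature(tier, feature):
    """Return True when the tier grants the feature; unknown tiers grant nothing."""
    return feature in TIER_FEATURES.get(tier, frozenset())


def pick_path(tier, feature, gated, fallback):
    """Return the gated implementation if licensed, otherwise the fallback."""
    return gated if has_feature(tier, feature) else fallback


def batched(items):
    """Stand-in for a license-gated code path."""
    return ("batched", list(items))


def one_at_a_time(items):
    """Stand-in for the open-core fallback path."""
    return ("sequential", list(items))


# The calling code is identical for every tier; only the selected path differs.
run = pick_path("pro", "auto_batch", batched, one_at_a_time)
print(run([1, 2, 3])[0])
```

Keeping the lookup in one place means callers never branch on the tier name themselves, which is one way to get the "code never changes between tiers" property.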
+
+ ---
+
+ ## How the three pieces connect
+
+ ```
+ ┌─────────────────────────┐ ┌──────────────────────────────┐
+ │ WordPress storefront │ │ License server │
+ │ (frontend repo) │ │ (backend/license_server) │
+ │ • [g2n_pricing] ────────┼──GET /v1/catalog──▶ tiers + price ────┤ one source
+ │ • [g2n_buy] ────────────┼──POST /v1/checkout─▶ Paddle ──webhook─▶│ of truth for
+ │ • [g2n_dashboard] ──────┼──POST /v1/portal ──▶ Paddle portal │ entitlements
+ │ • [g2n_status] ─────────┼──GET /v1/health,/version │
+ │ • [g2n_node_status] ──┐ │ └──────────┬───────────────────┘
+ └────────────────────────┼─┘ │ mints signed token
+ │ ▼
+ │ ┌──────────────────────────────┐
+ browser reads the user's│ │ pip install g2n │
+ OWN node directly ──────┘ │ + g2n-enterprise (backend) │
+ ┌────────│ g2n.activate(KEY) ─verifies──┘ offline,
+ │ │ g2n.compile(model) OPTIMIZE Ed25519
+ ▼ │ g2n.serve() RUN
+ ┌─────────────────────────┐ └──────────────┬───────────────┘
+ │ customer inference node │◀── activate/validate │
+ │ /v1/healthz /readyz │── runs in the customer's environment
+ │ /v1/models/<id>/predict │
+ └─────────────────────────┘
+ ```
+
+ * **WordPress ↔ license server** — the plugin proxies the server's `/v1/catalog`
+   (price/blurb/seats), `/v1/checkout`, `/v1/portal`, `/v1/health` and `/v1/version`
+   server-side; the storefront price always reflects the server (single source of
+   truth). The admin token never reaches the browser.
+ * **license server ↔ client** — `g2n.activate(KEY)` exchanges a short key for an
+   Ed25519-signed token (once, online), then verifies it **offline**. The same
+   feature flags (`g2n_enterprise/licensing/features.py`) gate the compiler and
+   the serving platform, and feed the WordPress pricing — one definition,
+   everywhere.
+ * **WordPress ↔ customer node** — `[g2n_node_status]` reads the customer's *own*
+   inference node directly from the browser (the node's CORS allows it); the
+   vendor never touches customer inference.
+
+ ---
+
+ ## Monorepo layout
+
+ ```
+ g2n-enterprise/
+   g2n_enterprise/          # the closed-source client package (pip-installable)
+     licensing/             # Ed25519 keys, signed tokens, activate(), gating
+     accel/registry.py      # AcceleratorBackend ABC + auto_select (CUDA/CPU/NPU)
+     cache/persistent.py    # cross-run Triton + artifact cache (Windows warmup fix)
+     planner_pro.py         # in-place-aware planner (gated)
+     model_zoo.py           # validated compile configs + parity harness (gated)
+     serve/                 # -- the "run models" half --
+       registry.py          # ModelRegistry: register/list/version models (JSON)
+       runtime.py           # ModelRuntime + DynamicBatcher + latency stats
+       server.py            # stdlib inference HTTP node (/v1/models/.../predict)
+       _demo.py             # torch-free demo models (python: sources)
+     api.py                 # compile() + serve()/register_model()/load_model()
+     cli.py                 # doctor/activate/status + models/register/predict/serve
+   license_server/          # ZERO-dependency server (stdlib http.server + sqlite3)
+     app.py                 # /v1/activate /v1/validate /v1/catalog + Paddle + admin
+     mint.py                # CLI: keygen / issue / list / trial
+     paddle_gateway.py      # Paddle Billing (checkout, webhook verify, portal)
+     dashboard/index.html   # license-management dashboard
+   packaging/               # builds the open-core `g2n` wheel (Apache-2.0)
+   examples/                # quickstart, torch.compile backend, serve_quickstart
+   tests/                   # licensing / registry / cache / serving (no GPU needed)
+ ```
+
+ ---
+
+ ## Optimize: `g2n.compile`
+
+ `g2n.compile(model)` routes to the best available accelerator and runs the
+ open-core g2n pipeline (custom-kernel FX pass + Inductor) under license-gated
+ config. As a `torch.compile` backend:
+
+ ```python
+ import torch
+ import g2n_enterprise # registers backend="g2n" on import
+ compiled = torch.compile(model, backend="g2n")
+ ```
+
+ Pro lights up memory-fusion + the persistent cache; Enterprise adds
+ max-autotune and multi-accelerator routing. Without the entitlement (or without
+ torch/Triton/CUDA) every path degrades to stock — never worse than eager.
+
+ ## Run: `g2n.serve` (Pro+)
+
+ The serving half turns a node into a model server.
+
+ ```python
+ import g2n_enterprise as g2n
+
+ # register once (persisted under ~/.g2n/models)
+ g2n.register_model("bert", "torchscript:/models/bert.pt",
+                    max_batch=32, max_latency_ms=10)
+
+ # bring one model up locally
+ rt = g2n.load_model("bert")
+ rt.predict(batch) # g2n-optimized on load
+
+ # or serve every registered model over HTTP
+ g2n.serve(port=8900, token="node-admin-token")
+ ```
+
+ Source URIs: `torchscript:/path.pt`, `state_dict:/w.pt@pkg.mod:build_fn`,
+ `callable:pkg.mod:factory`, and `python:pkg.mod:fn` (torch-free — used by the
+ demo models so the node runs anywhere).
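To make the four source-URI shapes concrete, here is a small parser sketch. It is inferred only from the examples listed above; `ModelSource` and `parse_source` are hypothetical names, and the package's real parsing rules may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelSource:
    scheme: str                    # "torchscript", "state_dict", "callable" or "python"
    path: Optional[str] = None     # file path, for the file-backed schemes
    target: Optional[str] = None   # "pkg.mod:attr" import target, when present


def parse_source(uri):
    """Split a source URI into scheme, file path and import target."""
    scheme, sep, rest = uri.partition(":")
    if not sep or not rest:
        raise ValueError(f"not a source URI: {uri!r}")
    if scheme == "torchscript":
        return ModelSource(scheme, path=rest)
    if scheme == "state_dict":
        # Weights file and builder function are separated by "@".
        path, at, target = rest.partition("@")
        if not at or ":" not in target:
            raise ValueError("state_dict source needs '<path>@<module>:<builder>'")
        return ModelSource(scheme, path=path, target=target)
    if scheme in ("callable", "python"):
        if ":" not in rest:
            raise ValueError(f"{scheme} source needs '<module>:<attr>'")
        return ModelSource(scheme, target=rest)
    raise ValueError(f"unknown source scheme: {scheme!r}")


print(parse_source("state_dict:/w.pt@pkg.mod:build_fn"))
```

Splitting on the first `:` only keeps later colons (as in `pkg.mod:fn`) inside the remainder, so the import target survives intact.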
+
+ ### Faster, lower-VRAM inference
+
+ Each served model gets real inference optimizations (engage on CUDA, no-op on
+ CPU): `inference_mode` always on (free latency + memory); `precision="auto"`
+ (fp16/bf16 autocast — halves activation VRAM, tensor cores) or `"int8"` (dynamic
+ quantization — halves weight memory again, CPU path, opt-in); `cuda_graph=True`
+ (capture + replay the forward, which removes the kernel-launch overhead that
+ makes "compiled tie eager" on small GPUs); `channels_last` for conv nets; a
+ **ResidencyManager** that keeps K models hot on the GPU and pages the rest from
+ CPU (`G2N_SERVE_RESIDENT_MODELS=2`); and **admission control**
+ (`G2N_SERVE_MAX_CONCURRENCY`, `G2N_SERVE_VRAM_FLOOR_MB`) so a saturated 6 GB node
+ returns 503 + `Retry-After` instead of OOM-crashing.
+
+ ```python
+ g2n.register_model("resnet", "torchscript:/m/resnet50.pt",
+                    precision="auto", cuda_graph=True, channels_last=True, max_batch=16)
+ res = g2n.benchmark("resnet", sample, rounds=200) # eager vs optimized, measured on YOUR box
+ ```
+
+ `g2n.benchmark` / `g2n-enterprise bench` report latency percentiles (p50/p95/p99),
+ throughput and **peak VRAM** for eager vs optimized — measured on your hardware,
+ never hardcoded. See [`docs/SERVING.md`](docs/SERVING.md) §4b.
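For readers who want to reproduce the p50/p95/p99 figures from their own timing samples, the percentiles can be computed with the standard library. This is a generic sketch, not how `g2n.benchmark` is implemented; `latency_summary` is a hypothetical helper and the choice of the inclusive interpolation method is an assumption.

```python
import statistics


def latency_summary(samples_ms):
    """Summarize latency samples (milliseconds) as p50/p95/p99."""
    if len(samples_ms) < 2:
        raise ValueError("need at least two samples")
    # quantiles(n=100) returns the 99 cut points p1..p99, so index k-1 is pk.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}


# Synthetic samples purely for illustration: 1 ms, 2 ms, ..., 101 ms.
summary = latency_summary(list(range(1, 102)))
print(summary)
```

Tail percentiles need enough samples to be meaningful, which is presumably why the README example uses a few hundred rounds.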
+
+ Inference HTTP contract (stdlib server):
+
+ | Method | Path | Auth | Purpose |
+ |---|---|---|---|
+ | GET | `/v1/healthz` | — | liveness + uptime + ready-model count |
+ | GET | `/v1/models` | — | list models + per-model latency stats |
+ | GET | `/v1/models/<id>` | — | one model's info |
+ | GET | `/v1/metrics` | — | aggregate counters (JSON) |
+ | POST | `/v1/models/<id>/predict` | optional | `{inputs}` -> `{outputs, latency_ms}` |
+ | POST | `/v1/models` | node token | register + load a model entry |
+
+ CLI mirror: `g2n-enterprise register NAME SOURCE`, `... models`, `... predict
+ NAME JSON`, `... serve --port 8900`.
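A client for the predict endpoint in the table above needs nothing beyond the standard library. The sketch below follows the documented path and the `{inputs}` -> `{outputs, latency_ms}` shape; the helper names are hypothetical, and sending the token as `Authorization: Bearer` is an assumption, since the table only says auth is optional.

```python
import json
import urllib.request
from urllib.parse import quote


def build_predict_request(base_url, model_id, inputs, token=None):
    """Build (but do not send) a POST to /v1/models/<id>/predict."""
    url = f"{base_url.rstrip('/')}/v1/models/{quote(model_id, safe='')}/predict"
    headers = {"Content-Type": "application/json"}
    if token is not None:
        # Assumption: the node accepts a bearer token on this endpoint.
        headers["Authorization"] = f"Bearer {token}"
    body = json.dumps({"inputs": inputs}).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def predict(base_url, model_id, inputs, token=None, timeout=10.0):
    """Send the request and decode the JSON reply from a running node."""
    req = build_predict_request(base_url, model_id, inputs, token)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


req = build_predict_request("http://localhost:8900", "resnet", [[0.1, 0.2]])
print(req.get_method(), req.full_url)
```

Separating request construction from sending makes the URL and payload easy to inspect without a node running; `predict(...)` is the call to use against a live server.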
+
+ **Dynamic batching (Enterprise).** When the license grants `auto_batch`, the
+ runtime coalesces concurrent requests into one batched call within
+ `max_latency_ms`, preserving per-caller order and length. Below Enterprise,
+ models serve one item at a time (still fully functional).
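The coalescing idea can be illustrated with a queue, a worker thread and one future per request. This is a simplified sketch, not the package's `DynamicBatcher`: `MiniBatcher` is a hypothetical class, each call submits a single item, and shutdown handling is omitted.

```python
import queue
import threading
import time
from concurrent.futures import Future


class MiniBatcher:
    """Coalesce single-item requests into batches of at most max_batch items."""

    def __init__(self, batch_fn, max_batch=16, max_latency_ms=8):
        self._batch_fn = batch_fn              # list of inputs -> list of outputs
        self._max_batch = max_batch
        self._max_wait = max_latency_ms / 1000.0
        self._queue = queue.Queue()
        threading.Thread(target=self._loop, daemon=True).start()

    def submit(self, item):
        """Enqueue one item and return a Future that resolves to its output."""
        fut = Future()
        self._queue.put((item, fut))
        return fut

    def _loop(self):
        while True:
            batch = [self._queue.get()]        # block until the first request
            deadline = time.monotonic() + self._max_wait
            while len(batch) < self._max_batch:
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    break
                try:
                    batch.append(self._queue.get(timeout=remaining))
                except queue.Empty:
                    break
            items = [item for item, _ in batch]
            try:
                outputs = self._batch_fn(items)
                if len(outputs) != len(items):
                    raise RuntimeError("batch_fn must return one output per input")
            except Exception as exc:
                for _, fut in batch:
                    fut.set_exception(exc)
                continue
            # Outputs are handed back positionally, so each caller gets its own result.
            for (_, fut), out in zip(batch, outputs):
                fut.set_result(out)


batcher = MiniBatcher(lambda xs: [x * 2 for x in xs], max_batch=8, max_latency_ms=50)
futures = [batcher.submit(i) for i in range(5)]
print([f.result(timeout=5) for f in futures])
```

The first request in a batch starts the latency clock, so no caller waits longer than `max_latency_ms` for the batch to close, and the positional `zip` is what preserves the request-to-result mapping.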
+
+ ---
+
+ ## Deploy the license server (runs immediately)
+
+ Pure Python stdlib (only `cryptography` for signing). No web framework, no DB
+ server.
+
+ ```bash
+ cd license_server
+ pip install cryptography
+ cp .env.example .env # set a strong G2N_ADMIN_TOKEN
+ python3 mint.py keygen # generate YOUR signing key (rotate the demo one!)
+ # -> paste the printed public key into g2n_enterprise/licensing/_pubkey.py
+ ./run.sh # serves http://0.0.0.0:8800 (+ dashboard at /)
+ ```
+
+ Mint and inspect licenses:
+
+ ```bash
+ python3 mint.py issue --tier enterprise --seats 25 --days 365 --email acme@co.com
+ python3 mint.py list
+ ```
+
+ ### License-server API
+
+ | Method | Path | Auth | Purpose |
+ |---|---|---|---|
+ | POST | `/v1/activate` | — | `{key, machine_id}` -> signed token |
+ | POST | `/v1/trial` | — | `{machine_id, email?}` -> 14-day hardware-bound trial |
+ | POST | `/v1/validate` | — | `{token}` -> server-side verify |
+ | POST | `/v1/checkout` | — | `{tier, billing, email}` -> **Paddle** checkout URL |
+ | POST | `/v1/paddle/webhook` | Paddle sig | mint/cancel on subscription + transaction events |
+ | POST | `/v1/portal` | — | `{key}` -> Paddle Customer Portal URL (self-service) |
+ | POST | `/v1/license/lookup` | — | `{key}` -> single masked license row (key is the credential) |
+ | GET | `/v1/catalog` | — | tiers + pricing (used by WordPress) |
+ | GET | `/v1/health` | — | uptime + status (status widget) |
+ | GET | `/v1/version` | — | latest client version + `info_url` (auto-update channel) |
+ | GET | `/v1/admin/licenses` | admin | list |
+ | POST | `/v1/admin/licenses` | admin | mint a key |
+ | POST | `/v1/admin/licenses/<KEY>/revoke` | admin | revoke |
+ | GET | `/v1/admin/outbox` | admin | recent license-delivery emails |
+ | GET | `/v1/admin/subscriptions` | admin | active subscriptions |
+
+ Admin auth: `X-Admin-Token: <token>` **or** `Authorization: Bearer <token>`
+ (compared in constant time).
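A constant-time check that accepts either header form could look like the sketch below. It is not the code in `app.py`; `extract_admin_token` and `is_admin` are hypothetical helpers, and the header mapping is assumed to use lower-cased keys.

```python
import hmac


def extract_admin_token(headers):
    """Return the presented token from either supported header, or None."""
    token = headers.get("x-admin-token")
    if token:
        return token
    scheme, _, value = headers.get("authorization", "").partition(" ")
    if scheme.lower() == "bearer" and value.strip():
        return value.strip()
    return None


def is_admin(headers, expected_token):
    """Compare the presented token with the configured one in constant time."""
    presented = extract_admin_token(headers)
    if presented is None or not expected_token:
        return False
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(presented.encode("utf-8"),
                               expected_token.encode("utf-8"))


print(is_admin({"authorization": "Bearer s3cret"}, "s3cret"))
```

Refusing an empty configured token means a missing `G2N_ADMIN_TOKEN` can never be matched by an empty header.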
+
+ ---
+
+ ## License system: how the security actually works
+
+ * **Asymmetric (Ed25519).** The server holds the *private* signing key; the
+   client ships only the *public* key (`_pubkey.py`). Clients verify, never forge.
+ * **Short keys, signed tokens.** A customer buys `G2N-XXXX-XXXX-XXXX`.
+   `activate()` exchanges it (once, online) for a signed token encoding
+   tier/features/expiry/seat-binding, cached under `~/.g2n/` and verified
+   **offline** with a 14-day grace window.
+ * **Seat binding + protected trials.** Activation registers a hashed machine id;
+   the server enforces the seat cap and allows one hardware-bound trial per
+   machine.
+
+ **Honest limitation (important):** any license check that runs inside code the
+ customer controls is *soft* protection — a determined customer can patch the
+ client. This is the correct cryptographic backbone plus a professional deterrent
+ and contractual line, **not** unbreakable DRM. Don't price or promise as if it
+ were. Concretely: `G2N_MACHINE_ID` can be overridden via env, so a determined
+ user can present as a "new machine" to dodge seat binding and the
+ one-trial-per-machine rule — treat seat/trial enforcement as a deterrent, not a
+ hard wall.
+
+ Built-in abuse controls: the public endpoints (`/v1/license/lookup`,
+ `/v1/checkout`, `/v1/trial`, `/v1/activate`) are per-IP rate-limited in-process
+ (tune with `G2N_RL_*`); for multi-worker deployments put your reverse proxy's
+ limiter in front too.
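An in-process per-IP limiter of the kind mentioned above can be sketched as a sliding window of timestamps per address. The server's actual algorithm and its `G2N_RL_*` settings are not shown in this README, so `PerIPRateLimiter` and its parameters are assumptions for illustration.

```python
import threading
import time
from collections import defaultdict, deque


class PerIPRateLimiter:
    """Allow at most max_requests per IP within any window_seconds span."""

    def __init__(self, max_requests, window_seconds, clock=time.monotonic):
        self._max = max_requests
        self._window = window_seconds
        self._clock = clock                    # injectable for testing
        self._hits = defaultdict(deque)        # ip -> timestamps of accepted hits
        self._lock = threading.Lock()          # handler threads share this state

    def allow(self, ip):
        now = self._clock()
        with self._lock:
            hits = self._hits[ip]
            # Drop timestamps that have aged out of the window.
            while hits and now - hits[0] >= self._window:
                hits.popleft()
            if len(hits) >= self._max:
                return False
            hits.append(now)
            return True


limiter = PerIPRateLimiter(max_requests=2, window_seconds=60)
print([limiter.allow("203.0.113.7") for _ in range(3)])
```

Because the counters live in one process's memory, each worker process would keep its own tally, which is why a shared limiter at the reverse proxy is the sensible complement in multi-worker deployments.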
+
+ Operational security: put the server behind **HTTPS**, restrict the wide-open
+ CORS (`*`) in `app.py` to your origins, **change** `G2N_ADMIN_TOKEN`, and
+ **replace** the demo `secrets/signing_key.pem` (never commit/ship it). All four
+ are mandatory pre-launch steps in `PRODUCTION_CHECKLIST.md`.
+
+ ## Versioning
+
+ The three shipping artifacts version independently — they are separate products
+ with separate release cadences, so their numbers will not match:
+
+ | Artifact | Current | What it is |
+ |---|---|---|
+ | `g2n` (PyPI, open-core) | 0.5.x | the compiler wheel customers `pip install` |
+ | `g2n-enterprise` (client + server) | 1.3.x | the closed client + license server |
+ | WordPress plugin + theme | 1.4.x | the storefront |
+
+ The client and license server share a version (they're released together); the
+ server advertises the latest **client** version separately via `/v1/version`.
+
+ ---
+
+ ## Payments — the self-serve loop (Paddle Billing)
+
+ Paddle is the **Merchant of Record**: it hosts the payment page and issues
+ invoices, so there is no PCI scope on your server. The financial loop is closed:
+
+ ```
+ buyer -> [g2n_buy] (WordPress) -> POST /v1/checkout -> Paddle Checkout (hosted)
+ -> pays -> Paddle fires webhook -> POST /v1/paddle/webhook (signature-verified)
+ -> mint_license(tier) -> email key to buyer -> license active 24/7
+ cancel/expire -> subscription.* / transaction.* webhook -> license canceled/past_due
+ ```
+
+ * **Stdlib only.** Paddle's REST API is called with `urllib`; webhook signatures
+   are verified with `hmac` (HMAC-SHA256 over `"{ts}:{body}"`, constant-time
+   compare, replay-window check). No `paddle` SDK needed.
+ * **Idempotent.** Events are de-duplicated by id; a subscription never mints two
+   licenses.
+ * **Email delivery.** Keys are sent via SMTP if configured; otherwise they queue
+   in an `email_outbox` table so nothing is lost.
+ * **Self-service portal.** `[g2n_dashboard]` (key in) -> `/v1/portal` -> the
+   official Paddle Customer Portal (upgrade/downgrade, cancel, update card,
+   invoices). The admin dashboard adds live **Active subscriptions** and **Email
+   outbox** panels.
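The webhook check described in the first bullet (HMAC-SHA256 over `"{ts}:{body}"`, constant-time compare, replay window) can be sketched with the standard library alone. This is not the code in `paddle_gateway.py`: `verify_webhook` is a hypothetical helper, the `ts=<unix seconds>;h1=<hex digest>` header layout is assumed, and the 300-second window is an arbitrary illustrative default.

```python
import hashlib
import hmac
import time


def verify_webhook(secret, signature_header, raw_body, max_age_seconds=300, now=None):
    """Return True if the signature matches and the timestamp is fresh."""
    # Assumed header layout: "ts=<unix seconds>;h1=<hex digest>".
    parts = dict(p.split("=", 1) for p in signature_header.split(";") if "=" in p)
    ts, presented = parts.get("ts"), parts.get("h1")
    if not ts or not presented or not (ts.isascii() and ts.isdigit()):
        return False
    current = time.time() if now is None else now
    if abs(current - int(ts)) > max_age_seconds:
        return False                           # outside the replay window
    signed = ts.encode("ascii") + b":" + raw_body
    expected = hmac.new(secret.encode("utf-8"), signed, hashlib.sha256).hexdigest()
    # Compare as bytes so a non-ASCII header cannot raise inside compare_digest.
    return hmac.compare_digest(expected.encode("ascii"), presented.encode("utf-8"))


body = b'{"event_id": "evt_demo"}'
ts = "1700000000"
digest = hmac.new(b"whsec_demo", ts.encode() + b":" + body, hashlib.sha256).hexdigest()
print(verify_webhook("whsec_demo", f"ts={ts};h1={digest}", body, now=1700000010))
```

The signature must be computed over the raw request bytes exactly as received; re-serializing the parsed JSON before verifying would change the bytes and break the match.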
+
+ Setup: create recurring Prices in Paddle, set `PADDLE_*` and `SMTP_*` in `.env`,
+ and register the webhook endpoint `https://your-server/v1/paddle/webhook`
+ (events: `subscription.created/activated/updated/canceled`,
+ `transaction.completed`, `transaction.payment_failed`).
+
+ ---
+
+ ## WordPress front end
+
+ 1. Copy the plugin + theme into `wp-content/`, activate both.
+ 2. Under **Settings -> G2N**, set the API base (`https://your-server/v1`) and
+    admin token.
+ 3. The plugin auto-creates **Pricing**, **Account**, and **Status** pages and a
+    `/docs` library. Shortcodes: `[g2n_pricing]`, `[g2n_dashboard]`,
+    `[g2n_status]`, `[g2n_buy tier="pro"]`.
+
+ The plugin calls the API server-side via the WP HTTP API; the admin token never
+ reaches the browser.
+
+ ---
+
+ ## What is verified vs. what is scaffold
+
+ **Verified runnable (in this build, CPU-only — no GPU was available):**
+ * Licensing crypto: sign/verify, tamper + foreign-signer rejection, expiry,
+   offline grace, machine binding, feature gating.
+ * License server over real HTTP: mint -> activate -> validate -> **seat-limit
+   enforcement** -> offline verify; Paddle webhook signature verification and the
+   full webhook-driven lifecycle (mint, idempotency, cancel, payment-failed).
+ * **Serving platform:** model registry persistence + name-uniqueness, the
+   dynamic batcher's order/length/coalescing guarantees, the torch-free runtime,
+   latency stats, and the inference HTTP node end-to-end (health -> register ->
+   predict -> metrics). Plus the entitlement gate (community refused, Pro
+   unlocked). See `tests/test_serving.py`.
+ * Enterprise package imports and runs **without torch** (degrades to Community).
+
+ **Scaffold / needs your hardware or vendor SDKs:**
+ * The CUDA/Triton compile path and on-GPU speedups are **your** measurements.
+ * `NPUBackend` is an integration contract — subclass it for OpenVINO / CoreML /
+   QNN / ONNX Runtime.
+ * The in-place planner path is correctness-sensitive; gated AND opt-in. Validate
+   on a real model first.
+ * Serving real `torchscript:`/`state_dict:` models requires torch in the node
+   environment; the torch-free `python:` path is what's exercised here.
+ * WordPress PHP is written to standard but not executed here (no PHP runtime).
+
+ ---
+
+ ## A note on the benchmarks
+
+ Two reports exist: a friend's RTX 5070 Ti report (large wins: +16.5% latency,
+ 50.8% VRAM via the planner) and your own RTX 4050 numbers (g2n roughly ties
+ eager; the real win shows in the `g2n + torch.compile` synergy). Neither was
+ produced or verified in this build environment. For marketing, lead with the
+ conservative, reproducible numbers from your own hardware and clearly attribute
+ the 5070 Ti figures as an independent third-party run.
+
+ ## License
+
+ Proprietary. © g2n. (The open-core `g2n` wheel under `packaging/` is Apache-2.0.)