PyPI - g2n-enterprise - Versions diffs - 1.3.4__tar.gz - Mend

g2n-enterprise 1.3.4__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (45)

g2n_enterprise-1.3.4/PKG-INFO +400 -0
g2n_enterprise-1.3.4/README.md +382 -0
g2n_enterprise-1.3.4/g2n_enterprise/__init__.py +120 -0
g2n_enterprise-1.3.4/g2n_enterprise/__main__.py +5 -0
g2n_enterprise-1.3.4/g2n_enterprise/accel/__init__.py +23 -0
g2n_enterprise-1.3.4/g2n_enterprise/accel/registry.py +171 -0
g2n_enterprise-1.3.4/g2n_enterprise/api.py +258 -0
g2n_enterprise-1.3.4/g2n_enterprise/cache/__init__.py +3 -0
g2n_enterprise-1.3.4/g2n_enterprise/cache/persistent.py +124 -0
g2n_enterprise-1.3.4/g2n_enterprise/cli.py +239 -0
g2n_enterprise-1.3.4/g2n_enterprise/licensing/__init__.py +246 -0
g2n_enterprise-1.3.4/g2n_enterprise/licensing/_pubkey.py +6 -0
g2n_enterprise-1.3.4/g2n_enterprise/licensing/features.py +81 -0
g2n_enterprise-1.3.4/g2n_enterprise/licensing/keys.py +205 -0
g2n_enterprise-1.3.4/g2n_enterprise/model_zoo.py +252 -0
g2n_enterprise-1.3.4/g2n_enterprise/planner_pro.py +70 -0
g2n_enterprise-1.3.4/g2n_enterprise/serve/__init__.py +60 -0
g2n_enterprise-1.3.4/g2n_enterprise/serve/_demo.py +33 -0
g2n_enterprise-1.3.4/g2n_enterprise/serve/bench.py +158 -0
g2n_enterprise-1.3.4/g2n_enterprise/serve/client.py +98 -0
g2n_enterprise-1.3.4/g2n_enterprise/serve/guard.py +110 -0
g2n_enterprise-1.3.4/g2n_enterprise/serve/optimize.py +489 -0
g2n_enterprise-1.3.4/g2n_enterprise/serve/reference.py +80 -0
g2n_enterprise-1.3.4/g2n_enterprise/serve/registry.py +212 -0
g2n_enterprise-1.3.4/g2n_enterprise/serve/runtime.py +427 -0
g2n_enterprise-1.3.4/g2n_enterprise/serve/server.py +425 -0
g2n_enterprise-1.3.4/g2n_enterprise/updates.py +106 -0
g2n_enterprise-1.3.4/g2n_enterprise.egg-info/PKG-INFO +400 -0
g2n_enterprise-1.3.4/g2n_enterprise.egg-info/SOURCES.txt +43 -0
g2n_enterprise-1.3.4/g2n_enterprise.egg-info/dependency_links.txt +1 -0
g2n_enterprise-1.3.4/g2n_enterprise.egg-info/entry_points.txt +2 -0
g2n_enterprise-1.3.4/g2n_enterprise.egg-info/requires.txt +9 -0
g2n_enterprise-1.3.4/g2n_enterprise.egg-info/top_level.txt +1 -0
g2n_enterprise-1.3.4/pyproject.toml +43 -0
g2n_enterprise-1.3.4/setup.cfg +4 -0
g2n_enterprise-1.3.4/tests/test_fusion_guard.py +73 -0
g2n_enterprise-1.3.4/tests/test_licensing.py +118 -0
g2n_enterprise-1.3.4/tests/test_lookup_and_version.py +98 -0
g2n_enterprise-1.3.4/tests/test_paddle.py +185 -0
g2n_enterprise-1.3.4/tests/test_persistent_cache.py +75 -0
g2n_enterprise-1.3.4/tests/test_rate_limit.py +33 -0
g2n_enterprise-1.3.4/tests/test_registry.py +53 -0
g2n_enterprise-1.3.4/tests/test_serving.py +539 -0
g2n_enterprise-1.3.4/tests/test_serving_torch.py +102 -0
g2n_enterprise-1.3.4/tests/test_updates_and_zoo.py +56 -0

g2n_enterprise-1.3.4/PKG-INFO ADDED

@@ -0,0 +1,400 @@
+Metadata-Version: 2.4
+Name: g2n-enterprise
+Version: 1.3.4
+Summary: A platform to optimize AND run PyTorch models: license-gated compiler (enhanced planner, persistent cache, multi-accelerator routing) plus a model registry and inference server, on top of open-core g2n.
+Author: g2n
+License: Proprietary
+Project-URL: Homepage, https://g2n.example.com
+Keywords: pytorch,triton,compiler,gpu,npu,inference,serving,model-server,license
+Requires-Python: >=3.10
+Description-Content-Type: text/markdown
+Requires-Dist: cryptography>=41.0
+Provides-Extra: runtime
+Requires-Dist: torch>=2.11.0; extra == "runtime"
+Requires-Dist: g2n>=0.4; extra == "runtime"
+Requires-Dist: triton>=3.6; extra == "runtime"
+Provides-Extra: dev
+Requires-Dist: pytest>=7; extra == "dev"
+# g2n-Enterprise
+A platform to **optimize and run** PyTorch models. It has two halves that share
+one code path and one license:
+* **Optimize** — `g2n.compile(model)` (or `torch.compile(model, backend="g2n")`):
+  hybrid fusion, a custom Triton LayerNorm(+GELU) kernel, an in-place-aware
+  buffer planner, a persistent cross-run compile cache, and multi-accelerator
+  routing.
+* **Run** — a built-in **model registry** + **inference server**
+  (`g2n.serve()`): register a model once, and the node loads it, optimizes it on
+  the way in, and serves predictions over HTTP, with optional dynamic batching.
+Around that sits the full machinery to sell it as a service: a signed **license
+system**, a zero-dependency **license server**, a license-management
+**dashboard**, and an ancient-Greek-styled **WordPress** front end (theme +
+plugin) that talks to the license API.
+> **Deployment model.** The customer installs g2n inside *their* environment.
+> Everything (compile + serve) executes there; the license server only mints and
+> validates entitlements.
+```python
+import g2n_enterprise as g2n
+g2n.activate("G2N-8H4K-L92X-QF7M")          # online once, then cached + offline
+# OPTIMIZE
+model = g2n.compile(model)                   # or torch.compile(model, backend="g2n")
+# RUN
+g2n.register_model("resnet", "torchscript:/models/resnet50.pt",
+                   max_batch=16, max_latency_ms=8)
+g2n.serve(port=8900)                          # POST /v1/models/resnet/predict
+```
+---
+## What's sold (tiers)
+| Capability | Community | Pro | Enterprise |
+|---|:--:|:--:|:--:|
+| Hybrid fusion + JIT pointwise codegen | ✓ | ✓ | ✓ |
+| Enhanced buffer planner (in-place aware) | | ✓ | ✓ |
+| Persistent cross-run compile cache | | ✓ | ✓ |
+| **Model registry + inference server (run models)** | | ✓ | ✓ |
+| **Dynamic request batching (autotuned)** | | | ✓ |
+| Multi-accelerator auto-routing (GPU / NPU / CPU) | | | ✓ |
+| Validated model-zoo configs + priority support | | | ✓ |
+Code never changes between tiers — gated features light up when the license
+grants them and silently fall back to the open-core path otherwise.
+---
+## How the three pieces connect
+```
+   ┌─────────────────────────┐        ┌──────────────────────────────┐
+   │  WordPress storefront    │        │   License server             │
+   │  (frontend repo)         │        │   (backend/license_server)   │
+   │  • [g2n_pricing] ────────┼──GET /v1/catalog──▶ tiers + price ────┤  one source
+   │  • [g2n_buy] ────────────┼──POST /v1/checkout─▶ Paddle ──webhook─▶│  of truth for
+   │  • [g2n_dashboard] ──────┼──POST /v1/portal ──▶ Paddle portal     │  entitlements
+   │  • [g2n_status] ─────────┼──GET /v1/health,/version               │
+   │  • [g2n_node_status] ──┐ │        └──────────┬───────────────────┘
+   └────────────────────────┼─┘                   │ mints signed token
+                            │                      ▼
+                            │        ┌──────────────────────────────┐
+   browser reads the user's│        │  pip install g2n              │
+   OWN node directly ──────┘        │  + g2n-enterprise (backend)   │
+                            ┌────────│  g2n.activate(KEY) ─verifies──┘ offline,
+                            │        │  g2n.compile(model)   OPTIMIZE   Ed25519
+                            ▼        │  g2n.serve()          RUN
+   ┌─────────────────────────┐      └──────────────┬───────────────┘
+   │ customer inference node  │◀── activate/validate │
+   │ /v1/healthz /readyz      │── runs in the customer's environment
+   │ /v1/models/<id>/predict  │
+   └─────────────────────────┘
+```
+* **WordPress ↔ license server** — the plugin proxies the server's `/v1/catalog`
+  (price/blurb/seats), `/v1/checkout`, `/v1/portal`, `/v1/health` and `/v1/version`
+  server-side; the storefront price always reflects the server (single source of
+  truth). The admin token never reaches the browser.
+* **license server ↔ client** — `g2n.activate(KEY)` exchanges a short key for an
+  Ed25519-signed token (once, online), then verifies it **offline**. The same
+  feature flags (`g2n_enterprise/licensing/features.py`) gate the compiler and
+  the serving platform, and feed the WordPress pricing — one definition,
+  everywhere.
+* **WordPress ↔ customer node** — `[g2n_node_status]` reads the customer's *own*
+  inference node directly from the browser (the node's CORS allows it); the
+  vendor never touches customer inference.
+---
+## Monorepo layout
+```
+g2n-enterprise/
+  g2n_enterprise/            # the closed-source client package (pip-installable)
+    licensing/              # Ed25519 keys, signed tokens, activate(), gating
+    accel/registry.py       # AcceleratorBackend ABC + auto_select (CUDA/CPU/NPU)
+    cache/persistent.py     # cross-run Triton + artifact cache (Windows warmup fix)
+    planner_pro.py          # in-place-aware planner (gated)
+    model_zoo.py            # validated compile configs + parity harness (gated)
+    serve/                  # -- the "run models" half --
+      registry.py           #   ModelRegistry: register/list/version models (JSON)
+      runtime.py            #   ModelRuntime + DynamicBatcher + latency stats
+      server.py             #   stdlib inference HTTP node (/v1/models/.../predict)
+      _demo.py              #   torch-free demo models (python: sources)
+    api.py                  # compile() + serve()/register_model()/load_model()
+    cli.py                  # doctor/activate/status + models/register/predict/serve
+  license_server/           # ZERO-dependency server (stdlib http.server + sqlite3)
+    app.py                  #   /v1/activate /v1/validate /v1/catalog + Paddle + admin
+    mint.py                 #   CLI: keygen / issue / list / trial
+    paddle_gateway.py       #   Paddle Billing (checkout, webhook verify, portal)
+    dashboard/index.html    #   license-management dashboard
+  packaging/                # builds the open-core `g2n` wheel (Apache-2.0)
+  examples/                 # quickstart, torch.compile backend, serve_quickstart
+  tests/                    # licensing / registry / cache / serving (no GPU needed)
+```
+---
+## Optimize: `g2n.compile`
+`g2n.compile(model)` routes to the best available accelerator and runs the
+open-core g2n pipeline (custom-kernel FX pass + Inductor) under license-gated
+config. As a `torch.compile` backend:
+```python
+import g2n_enterprise            # registers backend="g2n" on import
+compiled = torch.compile(model, backend="g2n")
+```
+Pro lights up memory-fusion + the persistent cache; Enterprise adds
+max-autotune and multi-accelerator routing. Without the entitlement (or without
+torch/Triton/CUDA) every path degrades to stock — never worse than eager.
+## Run: `g2n.serve` (Pro+)
+The serving half turns a node into a model server.
+```python
+import g2n_enterprise as g2n
+# register once (persisted under ~/.g2n/models)
+g2n.register_model("bert", "torchscript:/models/bert.pt",
+                   max_batch=32, max_latency_ms=10)
+# bring one model up locally
+rt = g2n.load_model("bert")
+rt.predict(batch)                 # g2n-optimized on load
+# or serve every registered model over HTTP
+g2n.serve(port=8900, token="node-admin-token")
+```
+Source URIs: `torchscript:/path.pt`, `state_dict:/w.pt@pkg.mod:build_fn`,
+`callable:pkg.mod:factory`, and `python:pkg.mod:fn` (torch-free — used by the
+demo models so the node runs anywhere).
+### Faster, lower-VRAM inference
+Each served model gets real inference optimizations (engage on CUDA, no-op on
+CPU): `inference_mode` always on (free latency + memory); `precision="auto"`
+(fp16/bf16 autocast — halves activation VRAM, tensor cores) or `"int8"` (dynamic
+quantization — halves weight memory again, CPU path, opt-in); `cuda_graph=True`
+(capture + replay the forward, which removes the kernel-launch overhead that
+makes "compiled tie eager" on small GPUs); `channels_last` for conv nets; a
+**ResidencyManager** that keeps K models hot on the GPU and pages the rest from
+CPU (`G2N_SERVE_RESIDENT_MODELS=2`); and **admission control**
+(`G2N_SERVE_MAX_CONCURRENCY`, `G2N_SERVE_VRAM_FLOOR_MB`) so a saturated 6 GB node
+returns 503 + `Retry-After` instead of OOM-crashing.
+```python
+g2n.register_model("resnet", "torchscript:/m/resnet50.pt",
+                   precision="auto", cuda_graph=True, channels_last=True, max_batch=16)
+res = g2n.benchmark("resnet", sample, rounds=200)   # eager vs optimized, measured on YOUR box
+```
+`g2n.benchmark` / `g2n-enterprise bench` report latency percentiles (p50/p95/p99),
+throughput and **peak VRAM** for eager vs optimized — measured on your hardware,
+never hardcoded. See [`docs/SERVING.md`](docs/SERVING.md) §4b.
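To make the p50/p95/p99 figures concrete, here is a minimal sketch of how such percentiles can be derived from raw per-request timings using only the standard library. The function name `latency_percentiles` and the choice of the "inclusive" interpolation method are assumptions for illustration; the README does not say how `g2n.benchmark` computes its numbers.

```python
import statistics


def latency_percentiles(samples_ms):
    """Return p50/p95/p99 for a list of per-request latencies (milliseconds).

    Hypothetical helper, not part of g2n-enterprise.
    """
    if len(samples_ms) < 2:
        raise ValueError("need at least two latency samples")
    # quantiles(n=100) returns the 99 cut points p1..p99 (index 0 is p1).
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}


if __name__ == "__main__":
    timings = [float(i) for i in range(1, 102)]  # synthetic 1.0 .. 101.0 ms
    print(latency_percentiles(timings))
```

The p50 value is the median; p95 and p99 describe the tail, which is usually what matters for a serving node.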
+Inference HTTP contract (stdlib server):
+| Method | Path | Auth | Purpose |
+|---|---|---|---|
+| GET | `/v1/healthz` | — | liveness + uptime + ready-model count |
+| GET | `/v1/models` | — | list models + per-model latency stats |
+| GET | `/v1/models/<id>` | — | one model's info |
+| GET | `/v1/metrics` | — | aggregate counters (JSON) |
+| POST | `/v1/models/<id>/predict` | optional | `{inputs}` -> `{outputs, latency_ms}` |
+| POST | `/v1/models` | node token | register + load a model entry |
+CLI mirror: `g2n-enterprise register NAME SOURCE`, `... models`, `... predict
+NAME JSON`, `... serve --port 8900`.
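As a rough illustration of the predict contract in the table above, the sketch below builds a `POST /v1/models/<id>/predict` request with the standard library. The helper name `build_predict_request` and the nested-list shape of `inputs` are assumptions; the README only states that the body is `{inputs}` and the reply is `{outputs, latency_ms}`.

```python
import json
import urllib.request


def build_predict_request(base_url, model_id, inputs):
    """Build (but do not send) a predict request for one registered model.

    Hypothetical helper; the payload shape of `inputs` is an assumption.
    """
    url = f"{base_url.rstrip('/')}/v1/models/{model_id}/predict"
    body = json.dumps({"inputs": inputs}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


req = build_predict_request("http://localhost:8900", "resnet", [[0.1, 0.2, 0.3]])

# Sending requires a running node, so it is left commented out:
# with urllib.request.urlopen(req, timeout=10) as resp:
#     reply = json.loads(resp.read())  # per the table: outputs, latency_ms
```

If the node was started with a token, the request would also need whatever auth header the node expects; the README does not name it, so it is omitted here.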
+**Dynamic batching (Enterprise).** When the license grants `auto_batch`, the
+runtime coalesces concurrent requests into one batched call within
+`max_latency_ms`, preserving per-caller order and length. Below Enterprise,
+models serve one item at a time (still fully functional).
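The order and length guarantee can be sketched independently of any threading: flatten the pending requests, make one batched call, then slice the outputs back per caller. This is a simplified illustration, not the package's `DynamicBatcher`; the name `run_coalesced` is hypothetical, and the sketch deliberately leaves out the part that waits up to `max_latency_ms` for requests to arrive.

```python
def run_coalesced(requests, batched_fn):
    """Run several callers' inputs as one batch and split the results back.

    requests   : list of per-caller input lists, in arrival order
    batched_fn : callable taking a flat list and returning one output per item
    Hypothetical helper for illustration only.
    """
    flat = [item for req in requests for item in req]
    outputs = batched_fn(flat)  # a single batched call for all callers
    if len(outputs) != len(flat):
        raise ValueError("batched_fn must return exactly one output per input")

    results, start = [], 0
    for req in requests:
        # Each caller gets back exactly as many outputs as it sent, in order.
        results.append(outputs[start:start + len(req)])
        start += len(req)
    return results


if __name__ == "__main__":
    pending = [[1, 2], [3], [4, 5, 6]]
    print(run_coalesced(pending, lambda xs: [x * 10 for x in xs]))
```

A real batcher would collect `pending` from concurrent threads and flush either when `max_batch` is reached or when the latency budget expires.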
+---
+## Deploy the license server (runs immediately)
+Pure Python stdlib (only `cryptography` for signing). No web framework, no DB
+server.
+```bash
+cd license_server
+pip install cryptography
+cp .env.example .env            # set a strong G2N_ADMIN_TOKEN
+python3 mint.py keygen          # generate YOUR signing key (rotate the demo one!)
+#   -> paste the printed public key into g2n_enterprise/licensing/_pubkey.py
+./run.sh                        # serves http://0.0.0.0:8800  (+ dashboard at /)
+```
+Mint and inspect licenses:
+```bash
+python3 mint.py issue --tier enterprise --seats 25 --days 365 --email acme@co.com
+python3 mint.py list
+```
+### License-server API
+| Method | Path | Auth | Purpose |
+|---|---|---|---|
+| POST | `/v1/activate` | — | `{key, machine_id}` -> signed token |
+| POST | `/v1/trial` | — | `{machine_id, email?}` -> 14-day hardware-bound trial |
+| POST | `/v1/validate` | — | `{token}` -> server-side verify |
+| POST | `/v1/checkout` | — | `{tier, billing, email}` -> **Paddle** checkout URL |
+| POST | `/v1/paddle/webhook` | Paddle sig | mint/cancel on subscription + transaction events |
+| POST | `/v1/portal` | — | `{key}` -> Paddle Customer Portal URL (self-service) |
+| POST | `/v1/license/lookup` | — | `{key}` -> single masked license row (key is the credential) |
+| GET | `/v1/catalog` | — | tiers + pricing (used by WordPress) |
+| GET | `/v1/health` | — | uptime + status (status widget) |
+| GET | `/v1/version` | — | latest client version + `info_url` (auto-update channel) |
+| GET | `/v1/admin/licenses` | admin | list |
+| POST | `/v1/admin/licenses` | admin | mint a key |
+| POST | `/v1/admin/licenses/<KEY>/revoke` | admin | revoke |
+| GET | `/v1/admin/outbox` | admin | recent license-delivery emails |
+| GET | `/v1/admin/subscriptions` | admin | active subscriptions |
+Admin auth: `X-Admin-Token: <token>` **or** `Authorization: Bearer <token>`
+(compared in constant time).
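A minimal sketch of the two accepted header forms with a constant-time comparison might look like the following. The function name `is_admin` and the plain-dict `headers` argument are assumptions for illustration; the actual check in `app.py` is not shown in this README.

```python
import hmac


def is_admin(headers, expected_token):
    """Accept `X-Admin-Token: <token>` or `Authorization: Bearer <token>`.

    Hypothetical helper. `headers` is a plain dict here; a real
    http.server handler exposes a case-insensitive mapping instead.
    """
    presented = headers.get("X-Admin-Token", "")
    auth = headers.get("Authorization", "")
    if not presented and auth.startswith("Bearer "):
        presented = auth[len("Bearer "):]
    if not presented or not expected_token:
        return False  # never treat an empty token as valid
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(
        presented.encode("utf-8"), expected_token.encode("utf-8")
    )


if __name__ == "__main__":
    print(is_admin({"X-Admin-Token": "s3cret"}, "s3cret"))
    print(is_admin({"Authorization": "Bearer wrong"}, "s3cret"))
```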
+---
+## License system: how the security actually works
+* **Asymmetric (Ed25519).** The server holds the *private* signing key; the
+  client ships only the *public* key (`_pubkey.py`). Clients verify, never forge.
+* **Short keys, signed tokens.** A customer buys `G2N-XXXX-XXXX-XXXX`.
+  `activate()` exchanges it (once, online) for a signed token encoding
+  tier/features/expiry/seat-binding, cached under `~/.g2n/` and verified
+  **offline** with a 14-day grace window.
+* **Seat binding + protected trials.** Activation registers a hashed machine id;
+  the server enforces the seat cap and allows one hardware-bound trial per
+  machine.
+**Honest limitation (important):** any license check that runs inside code the
+customer controls is *soft* protection — a determined customer can patch the
+client. This is the correct cryptographic backbone plus a professional deterrent
+and contractual line, **not** unbreakable DRM. Don't price or promise as if it
+were. Concretely: `G2N_MACHINE_ID` can be overridden via env, so a determined
+user can present as a "new machine" to dodge seat binding and the one-trial-per-
+machine rule — treat seat/trial enforcement as a deterrent, not a hard wall.
+Built-in abuse controls: the public endpoints (`/v1/license/lookup`,
+`/v1/checkout`, `/v1/trial`, `/v1/activate`) are per-IP rate-limited in-process
+(tune with `G2N_RL_*`); for multi-worker deployments put your reverse proxy's
+limiter in front too.
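For readers wondering what "per-IP rate-limited in-process" can look like, here is a small sliding-window sketch. The algorithm, the class name `PerIPRateLimiter`, and its parameters are all assumptions; the README does not describe which scheme the server uses or what the `G2N_RL_*` variables map to.

```python
import threading
import time
from collections import defaultdict, deque


class PerIPRateLimiter:
    """Sliding-window limiter: at most `max_requests` per `window_s` per IP.

    Hypothetical sketch; state lives in this process only, which is why a
    multi-worker deployment still needs a limiter at the reverse proxy.
    """

    def __init__(self, max_requests, window_s, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_s = window_s
        self.clock = clock
        self._hits = defaultdict(deque)
        self._lock = threading.Lock()

    def allow(self, ip):
        now = self.clock()
        with self._lock:
            hits = self._hits[ip]
            # Drop timestamps that have aged out of the window.
            while hits and now - hits[0] >= self.window_s:
                hits.popleft()
            if len(hits) >= self.max_requests:
                return False
            hits.append(now)
            return True


if __name__ == "__main__":
    limiter = PerIPRateLimiter(max_requests=2, window_s=60)
    print([limiter.allow("203.0.113.7") for _ in range(3)])
```

A handler would typically answer a refused request with HTTP 429 rather than processing it.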
+Operational security: put the server behind **HTTPS**, restrict the wide-open
+CORS (`*`) in `app.py` to your origins, **change** `G2N_ADMIN_TOKEN`, and
+**replace** the demo `secrets/signing_key.pem` (never commit/ship it). All four
+are mandatory pre-launch steps in `PRODUCTION_CHECKLIST.md`.
+## Versioning
+The three shipping artifacts version independently — they are separate products
+with separate release cadences, so their numbers will not match:
+| Artifact | Current | What it is |
+|---|---|---|
+| `g2n` (PyPI, open-core) | 0.5.x | the compiler wheel customers `pip install` |
+| `g2n-enterprise` (client + server) | 1.3.x | the closed client + license server |
+| WordPress plugin + theme | 1.4.x | the storefront |
+The client and license server share a version (they're released together); the
+server advertises the latest **client** version separately via `/v1/version`.
+---
+## Payments — the self-serve loop (Paddle Billing)
+Paddle is the **Merchant of Record**: it hosts the payment page and issues
+invoices, so there is no PCI scope on your server. The financial loop is closed:
+```
+buyer -> [g2n_buy] (WordPress) -> POST /v1/checkout -> Paddle Checkout (hosted)
+      -> pays -> Paddle fires webhook -> POST /v1/paddle/webhook (signature-verified)
+      -> mint_license(tier) -> email key to buyer -> license active 24/7
+cancel/expire -> subscription.* / transaction.* webhook -> license canceled/past_due
+```
+* **Stdlib only.** Paddle's REST API is called with `urllib`; webhook signatures
+  are verified with `hmac` (HMAC-SHA256 over `"{ts}:{body}"`, constant-time
+  compare, replay-window check). No `paddle` SDK needed.
+* **Idempotent.** Events are de-duplicated by id; a subscription never mints two
+  licenses.
+* **Email delivery.** Keys are sent via SMTP if configured; otherwise they queue
+  in an `email_outbox` table so nothing is lost.
+* **Self-service portal.** `[g2n_dashboard]` (key in) -> `/v1/portal` -> the
+  official Paddle Customer Portal (upgrade/downgrade, cancel, update card,
+  invoices). The admin dashboard adds live **Active subscriptions** and **Email
+  outbox** panels.
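The webhook check described in the first bullet (HMAC-SHA256 over `"{ts}:{body}"`, constant-time compare, replay window) can be sketched with the standard library alone. The function name `verify_webhook`, the 300-second window, and the `ts=...;h1=...` header layout are assumptions made for this illustration; this is not the code in `paddle_gateway.py`.

```python
import hashlib
import hmac
import time


def verify_webhook(secret, signature_header, body, max_age_s=300, now=None):
    """Return True if the signature matches and the timestamp is fresh.

    Hypothetical helper. Assumes a header of the form "ts=<unix>;h1=<hex>"
    and that `body` is the raw request bytes exactly as received.
    """
    parts = dict(p.split("=", 1) for p in signature_header.split(";") if "=" in p)
    ts, sig = parts.get("ts", ""), parts.get("h1", "")
    if not ts.isdigit() or not sig:
        return False

    # Replay-window check: reject events that are too old or too far ahead.
    now = time.time() if now is None else now
    if abs(now - int(ts)) > max_age_s:
        return False

    signed = ts.encode("utf-8") + b":" + body
    expected = hmac.new(secret.encode("utf-8"), signed, hashlib.sha256).hexdigest()
    # Constant-time comparison of the two hex digests.
    return hmac.compare_digest(expected.encode("utf-8"), sig.encode("utf-8"))
```

Verification has to run on the raw bytes before any JSON parsing, since re-serializing the payload can change whitespace and break the digest.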
+Setup: create recurring Prices in Paddle, set `PADDLE_*` and `SMTP_*` in `.env`,
+and register the webhook endpoint `https://your-server/v1/paddle/webhook`
+(events: `subscription.created/activated/updated/canceled`,
+`transaction.completed`, `transaction.payment_failed`).
+---
+## WordPress front end
+1. Copy the plugin + theme into `wp-content/`, activate both.
+2. Under **Settings -> G2N**, set the API base (`https://your-server/v1`) and
+   admin token.
+3. The plugin auto-creates **Pricing**, **Account**, and **Status** pages and a
+   `/docs` library. Shortcodes: `[g2n_pricing]`, `[g2n_dashboard]`,
+   `[g2n_status]`, `[g2n_buy tier="pro"]`.
+The plugin calls the API server-side via the WP HTTP API; the admin token never
+reaches the browser.
+---
+## What is verified vs. what is scaffold
+**Verified runnable (in this build, CPU-only — no GPU was available):**
+* Licensing crypto: sign/verify, tamper + foreign-signer rejection, expiry,
+  offline grace, machine binding, feature gating.
+* License server over real HTTP: mint -> activate -> validate -> **seat-limit
+  enforcement** -> offline verify; Paddle webhook signature verification and the
+  full webhook-driven lifecycle (mint, idempotency, cancel, payment-failed).
+* **Serving platform:** model registry persistence + name-uniqueness, the
+  dynamic batcher's order/length/coalescing guarantees, the torch-free runtime,
+  latency stats, and the inference HTTP node end-to-end (health -> register ->
+  predict -> metrics). Plus the entitlement gate (community refused, Pro
+  unlocked). See `tests/test_serving.py`.
+* Enterprise package imports and runs **without torch** (degrades to Community).
+**Scaffold / needs your hardware or vendor SDKs:**
+* The CUDA/Triton compile path and on-GPU speedups are **your** measurements.
+* `NPUBackend` is an integration contract — subclass it for OpenVINO / CoreML /
+  QNN / ONNX Runtime.
+* The in-place planner path is correctness-sensitive; gated AND opt-in. Validate
+  on a real model first.
+* Serving real `torchscript:`/`state_dict:` models requires torch in the node
+  environment; the torch-free `python:` path is what's exercised here.
+* WordPress PHP is written to standard but not executed here (no PHP runtime).
+---
+## A note on the benchmarks
+Two reports exist: a friend's RTX 5070 Ti report (large wins: +16.5% latency,
+50.8% VRAM via the planner) and your own RTX 4050 numbers (g2n roughly ties
+eager; the real win shows in the `g2n + torch.compile` synergy). Neither was
+produced or verified in this build environment. For marketing, lead with the
+conservative, reproducible numbers from your own hardware and clearly attribute
+the 5070 Ti figures as an independent third-party run.
+## License
+Proprietary. © g2n. (The open-core `g2n` wheel under `packaging/` is Apache-2.0.)