hpc-as-api 0.3.2__tar.gz → 0.3.4__tar.gz

Files changed (34)
  1. {hpc_as_api-0.3.2 → hpc_as_api-0.3.4}/CHANGELOG.md +66 -0
  2. {hpc_as_api-0.3.2 → hpc_as_api-0.3.4}/PKG-INFO +159 -49
  3. hpc_as_api-0.3.4/README.md +267 -0
  4. hpc_as_api-0.3.4/docs/tutorial.ipynb +1308 -0
  5. {hpc_as_api-0.3.2 → hpc_as_api-0.3.4}/hpc_as_api/__init__.py +12 -4
  6. hpc_as_api-0.3.4/hpc_as_api/app.py +512 -0
  7. {hpc_as_api-0.3.2 → hpc_as_api-0.3.4}/hpc_as_api/core.py +20 -0
  8. {hpc_as_api-0.3.2 → hpc_as_api-0.3.4}/hpc_as_api/presets/openai.py +20 -38
  9. hpc_as_api-0.3.4/paper/paper.md +229 -0
  10. {hpc_as_api-0.3.2 → hpc_as_api-0.3.4}/pyproject.toml +3 -3
  11. hpc_as_api-0.3.4/tests/test_app.py +240 -0
  12. {hpc_as_api-0.3.2 → hpc_as_api-0.3.4}/tests/test_compute.py +5 -0
  13. {hpc_as_api-0.3.2 → hpc_as_api-0.3.4}/uv.lock +6 -6
  14. hpc_as_api-0.3.2/README.md +0 -157
  15. hpc_as_api-0.3.2/docs/tutorial.ipynb +0 -811
  16. hpc_as_api-0.3.2/hpc_as_api/app.py +0 -552
  17. hpc_as_api-0.3.2/paper/paper.md +0 -224
  18. hpc_as_api-0.3.2/tests/test_app.py +0 -70
  19. {hpc_as_api-0.3.2 → hpc_as_api-0.3.4}/.github/workflows/security.yml +0 -0
  20. {hpc_as_api-0.3.2 → hpc_as_api-0.3.4}/.gitignore +0 -0
  21. {hpc_as_api-0.3.2 → hpc_as_api-0.3.4}/.gitleaks.toml +0 -0
  22. {hpc_as_api-0.3.2 → hpc_as_api-0.3.4}/CONTRIBUTING.md +0 -0
  23. {hpc_as_api-0.3.2 → hpc_as_api-0.3.4}/LICENSE +0 -0
  24. {hpc_as_api-0.3.2 → hpc_as_api-0.3.4}/Relay_Architecture.png +0 -0
  25. {hpc_as_api-0.3.2 → hpc_as_api-0.3.4}/docs/deployment.md +0 -0
  26. {hpc_as_api-0.3.2 → hpc_as_api-0.3.4}/hpc_as_api/auth.py +0 -0
  27. {hpc_as_api-0.3.2 → hpc_as_api-0.3.4}/hpc_as_api/cli.py +0 -0
  28. {hpc_as_api-0.3.2 → hpc_as_api-0.3.4}/hpc_as_api/compute.py +0 -0
  29. {hpc_as_api-0.3.2 → hpc_as_api-0.3.4}/hpc_as_api/crypto.py +0 -0
  30. {hpc_as_api-0.3.2 → hpc_as_api-0.3.4}/hpc_as_api/presets/__init__.py +0 -0
  31. {hpc_as_api-0.3.2 → hpc_as_api-0.3.4}/hpc_as_api/utils.py +0 -0
  32. {hpc_as_api-0.3.2 → hpc_as_api-0.3.4}/paper/paper.bib +0 -0
  33. {hpc_as_api-0.3.2 → hpc_as_api-0.3.4}/tests/__init__.py +0 -0
  34. {hpc_as_api-0.3.2 → hpc_as_api-0.3.4}/tests/test_utils.py +0 -0
@@ -1,5 +1,71 @@
1
1
  # Changelog
2
2
 
3
+ ## 0.3.4 (2026-06-10)
4
+
5
+ ### Messaging: domain-agnostic positioning throughout
6
+
7
+ All public-facing materials now consistently present `hpc-as-api` as a
8
+ **domain-agnostic HTTP gateway for any HPC function** — not as an LLM-specific tool.
9
+
10
+ - **README**: Rewritten to lead with `HPCApp` and the general-purpose streaming
11
+ gateway pattern. The simulation example appears before the LLM preset. "OpenAI-compatible
12
+ gateway" is now framed as a built-in preset, not the product's identity.
13
+ - **paper/paper.md**: Title changed to *"A Domain-Agnostic HTTP Gateway for HPC Functions…"*.
14
+ `HPCApp`, `make_app()`, and the framework architecture are now described as first-class
15
+ subjects. The LLM preset is introduced as one application of the framework.
16
+ - **`hpc_as_api/__init__.py`**: Module docstring updated to lead with the domain-agnostic
17
+ description and mark the LLM preset as "built-in application of the framework".
18
+ - **`pyproject.toml`**: Description updated to "Domain-agnostic HTTP gateway…"; keywords
19
+ updated to add `domain-agnostic` and `scientific-computing`, remove `openai`.
20
+
21
+ No API or behavior changes.
22
+
23
+ ## 0.3.3 (2026-06-10)
24
+
25
+ ### Refactor: `make_app()` factory — multiple independent instances, no env-var injection
26
+
27
+ `app.py` now exposes a `make_app()` factory function. Every call returns a
28
+ fresh, independent `FastAPI` instance that captures its configuration in closures
29
+ — there are no module-level globals involved in request handling.
30
+
31
+ **Before (0.3.x):** `create_openai_app()` in `presets/openai.py` worked by
32
+ injecting arguments into `os.environ` and then importing the cached module-level
33
+ singleton. Calling it a second time with different arguments silently had no
34
+ effect.
35
+
36
+ **After:** `create_openai_app()` calls `make_app()` directly. Each call returns
37
+ a genuinely independent app — two gateways with different endpoints, models, or
38
+ auth settings can live in the same process without interfering.
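The difference between the two patterns can be illustrated with a small, framework-free sketch. Nothing here is the package's actual code: `create_app_via_env` and `make_app_sketch` are hypothetical stand-ins, and the "app" is a plain dict or closure rather than a FastAPI instance.

```python
import os

# Hypothetical stand-in for the pre-0.3.3 pattern: arguments are written to
# os.environ, and a module-level singleton is built once and reused afterwards.
_cached_app = None


def create_app_via_env(endpoint_id):
    global _cached_app
    os.environ["GLOBUS_COMPUTE_ENDPOINT_ID"] = endpoint_id
    if _cached_app is None:
        # Configuration is read only on the first call.
        _cached_app = {"endpoint_id": os.environ["GLOBUS_COMPUTE_ENDPOINT_ID"]}
    return _cached_app


# Hypothetical stand-in for the factory pattern: each call captures its own
# configuration in a closure, so instances never share state.
def make_app_sketch(endpoint_id):
    def handle_request(payload):
        return {"endpoint_id": endpoint_id, "payload": payload}

    return handle_request


first = create_app_via_env("endpoint-a")
second = create_app_via_env("endpoint-b")  # silently ignored: same object as `first`

app_a = make_app_sketch("endpoint-a")
app_b = make_app_sketch("endpoint-b")  # independent of app_a
```

In the first pattern the second call returns the cached object, which mirrors the "silently had no effect" behavior described above. In the second, each handler keeps the endpoint it was created with.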
39
+
40
+ ### Fix: per-user Globus token fully wired end-to-end
41
+
42
+ `app.py` now passes `caller.globus_token` to `submit_streaming_inference()` when
43
+ the caller authenticated with a Globus token. In 0.3.1–0.3.2, `submit_streaming_inference()`
44
+ gained the `globus_token=` parameter but `app.py` never forwarded it.
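A rough sketch of the kind of wiring this fix describes, using hypothetical stand-ins: `Caller`, `submit_job`, and `handle` are illustrative names, not the package's classes or functions. The only point is that a parameter on the submit function has no effect unless the request handler forwards it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Caller:
    """Hypothetical authenticated caller; globus_token is None for API-key callers."""
    name: str
    globus_token: Optional[str] = None


def submit_job(payload, globus_token=None):
    # Stand-in for a submit function that can run under the caller's identity.
    identity = "per-user token" if globus_token else "service credentials"
    return {"payload": payload, "identity": identity}


def handle(caller, payload):
    # Forward the caller's token instead of dropping it at the handler boundary.
    return submit_job(payload, globus_token=caller.globus_token)


globus_result = handle(Caller("alice", globus_token="token-from-globus-auth"), {"x": 1})
api_key_result = handle(Caller("my-service"), {"x": 1})
```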
45
+
46
+ ### Fix: `HPCApp` payload size check
47
+
48
+ `HPCApp._add_route` now strips old images and enforces the Globus payload limit
49
+ before submitting any job with a `messages` field. Previously only
50
+ `GlobusComputeClient.submit_inference()` did this check; the domain-agnostic
51
+ `HPCApp` path bypassed it entirely.
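A minimal sketch of what such a pre-submit check might look like. It makes several assumptions that the changelog does not confirm: that messages follow the OpenAI content-part shape with `image_url` parts, that "old" means every image except those in the most recent image-bearing message, and that the limit is measured on the JSON-encoded payload. `MAX_PAYLOAD_BYTES` is a placeholder value, not the real Globus limit.

```python
import json

MAX_PAYLOAD_BYTES = 1_000_000  # placeholder for illustration only


def _has_image(message):
    content = message.get("content")
    return isinstance(content, list) and any(
        isinstance(part, dict) and part.get("type") == "image_url" for part in content
    )


def strip_old_images(messages):
    """Return a copy that keeps image parts only in the last message that has any."""
    last_with_image = max(
        (i for i, m in enumerate(messages) if _has_image(m)), default=-1
    )
    cleaned = []
    for i, message in enumerate(messages):
        if _has_image(message) and i != last_with_image:
            parts = [
                p
                for p in message["content"]
                if not (isinstance(p, dict) and p.get("type") == "image_url")
            ]
            message = {**message, "content": parts}  # copy; the input is not mutated
        cleaned.append(message)
    return cleaned


def check_payload(messages, limit=MAX_PAYLOAD_BYTES):
    """Strip old images, then reject the payload if it is still too large."""
    cleaned = strip_old_images(messages)
    size = len(json.dumps(cleaned).encode("utf-8"))
    if size > limit:
        raise ValueError(f"payload is {size} bytes; limit is {limit}")
    return cleaned


example = [
    {"role": "user", "content": [
        {"type": "text", "text": "first"},
        {"type": "image_url", "image_url": {"url": "data:old"}},
    ]},
    {"role": "user", "content": [
        {"type": "image_url", "image_url": {"url": "data:new"}},
    ]},
]
cleaned_example = check_payload(example)
```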
52
+
53
+ ### Fix: version mismatch
54
+
55
+ `__init__.py.__version__` was stuck at `"0.2.0"` while `pyproject.toml` had
56
+ already advanced to `0.3.2`. Both are now `"0.3.3"`.
57
+
58
+ ### Fix: `globus_sdk.authorizers` mock missing in test_compute.py
59
+
60
+ The `mock_globus_modules` fixture now also stubs `globus_sdk.authorizers`, which
61
+ `compute.py` imports at module level via `AccessTokenAuthorizer`.
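The stubbing technique can be sketched with the standard library alone. This is not the project's `mock_globus_modules` fixture; `stub_globus_modules` is a hypothetical helper showing why both the package and its submodule need an entry in `sys.modules` when the code under test does a module-level `from globus_sdk.authorizers import ...`.

```python
import sys
from unittest import mock


def stub_globus_modules():
    """Return a patcher that installs fake globus_sdk modules into sys.modules."""
    fake_sdk = mock.MagicMock(name="globus_sdk")
    fake_authorizers = mock.MagicMock(name="globus_sdk.authorizers")
    fake_sdk.authorizers = fake_authorizers
    # patch.dict restores the original sys.modules entries on exit.
    return mock.patch.dict(
        sys.modules,
        {"globus_sdk": fake_sdk, "globus_sdk.authorizers": fake_authorizers},
    )


with stub_globus_modules():
    # Resolves against the stubs, whether or not the real SDK is installed.
    from globus_sdk.authorizers import AccessTokenAuthorizer

    authorizer = AccessTokenAuthorizer("fake-token")
    stubbed = isinstance(authorizer, mock.MagicMock)
```

If only `globus_sdk` were stubbed, the submodule import could still fall through to the real import machinery and fail on machines without the SDK, which is consistent with the fix described above.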
62
+
63
+ ### Notes
64
+
65
+ - No breaking changes. `app.py` still exposes a module-level `app` singleton
66
+ (built from env vars) and a `router` for embedding; both continue to work.
67
+ - `make_app()` is now the recommended entry point for programmatic configuration.
68
+
3
69
  ## 0.3.2 (2026-06-04)
4
70
 
5
71
  ### New feature: programmatic auth configuration (`AuthConfig`)
@@ -1,7 +1,7 @@
1
1
  Metadata-Version: 2.4
2
2
  Name: hpc-as-api
3
- Version: 0.3.2
4
- Summary: HTTP gateway for any HPC function via Globus Compute + WebSocket relay
3
+ Version: 0.3.4
4
+ Summary: Domain-agnostic HTTP gateway for any HPC function via Globus Compute + WebSocket relay
5
5
  Project-URL: Homepage, https://github.com/uicacer/hpc-as-api
6
6
  Project-URL: Repository, https://github.com/uicacer/hpc-as-api
7
7
  Project-URL: Bug Tracker, https://github.com/uicacer/hpc-as-api/issues
@@ -160,7 +160,7 @@ License: Apache License
160
160
  See the License for the specific language governing permissions and
161
161
  limitations under the License.
162
162
  License-File: LICENSE
163
- Keywords: api-gateway,globus-compute,hpc,llm,openai,slurm,streaming,vllm
163
+ Keywords: api-gateway,domain-agnostic,globus-compute,hpc,llm,scientific-computing,slurm,streaming,vllm
164
164
  Classifier: Development Status :: 3 - Alpha
165
165
  Classifier: Intended Audience :: Science/Research
166
166
  Classifier: License :: OSI Approved :: Apache Software License
@@ -193,58 +193,63 @@ Description-Content-Type: text/markdown
193
193
  [![License](https://img.shields.io/badge/license-Apache%202.0-blue)](https://github.com/uicacer/hpc-as-api/blob/main/LICENSE)
194
194
  [![Tests](https://github.com/uicacer/hpc-as-api/actions/workflows/tests.yml/badge.svg)](https://github.com/uicacer/hpc-as-api/actions)
195
195
 
196
- **OpenAI-compatible API gateway for HPC clusters via Globus Compute.**
196
+ **HTTP gateway for any HPC function: real-time streaming from any HPC workload.**
197
197
 
198
- `hpc-as-api` exposes any vLLM-served model running on an HPC cluster (SLURM, PBS, etc.) as a standard OpenAI-compatible REST API. It handles authentication, rate limiting, payload size management, and real-time token streaming — so your existing OpenAI clients work without modification.
198
+ `hpc-as-api` turns any Python function running on an HPC cluster into a streaming HTTP endpoint. Register your function, define its input schema with Pydantic, and get a production-ready REST API with authentication, rate limiting, and live SSE streaming — no open ports, no VPN, no firewall changes on the HPC side.
199
199
 
200
200
  ```python
- from hpc_as_api.compute import GlobusComputeClient
-
- client = GlobusComputeClient(
-     endpoint_id="8d978809-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
-     models={
-         "qwen25-vl-72b": {
-             "hf_name": "Qwen/Qwen2.5-VL-72B-Instruct-AWQ",
-             "url": "http://ghi2-002:8000",
-             "context_reserve_output": 4096,
-         }
-     },
- )
- result = await client.submit_inference(
-     messages=[{"role": "user", "content": "Explain quantum entanglement."}],
-     model="qwen25-vl-72b",
- )
+ from hpc_as_api.core import HPCApp
+ from pydantic import BaseModel
+
+ class SimRequest(BaseModel):
+     steps: int = 1000
+     grid_size: int = 100
+
+ def hpc_simulation(steps, grid_size, relay_url, channel_id, relay_secret=""):
+     from streamrelay import RelayProducer
+     with RelayProducer(relay_url, channel_id, relay_secret=relay_secret) as relay:
+         for i in range(steps):
+             result = run_timestep(i, grid_size)
+             relay.send_token(f"step={i} energy={result:.4f}\n")
+
+ app = HPCApp(endpoint_id="...", relay_url="wss://relay.example.com") \
+     .mount("/simulate", hpc_simulation, SimRequest) \
+     .create_app()
217
218
  ```
218
219
 
220
+ Any output produced incrementally on the HPC side arrives in real time: simulation checkpoints, solver residuals, genome alignment progress, molecular dynamics snapshots, LLM tokens — anything.
221
+
219
222
  ## Why
220
223
 
221
- HPC clusters run the largest open-source LLMs (72B+ parameters) on GPU hardware that typical cloud users can't afford. But HPC infrastructure has no standard API surface — each cluster has its own SLURM scripts, SSH tunnels, and authentication systems. `hpc-as-api` provides a uniform OpenAI-compatible interface over any vLLM-served model, using [Globus Compute](https://www.globus.org/compute) for authentication and job dispatch (no open ports required on the HPC side).
224
+ HPC clusters run workloads impossible on commodity hardware — 72B+ parameter models, climate simulations, molecular dynamics at scale. But they expose no standard API. Each cluster has its own SLURM scripts, SSH tunnels, authentication systems, and job submission conventions.
225
+
226
+ `hpc-as-api` provides a uniform HTTP interface over any HPC function using [Globus Compute](https://www.globus.org/compute) for authentication and job dispatch and [streamrelay](https://github.com/uicacer/streamrelay) for real-time output streaming. Callers send a POST request; the framework handles everything else.
222
227
 
223
228
  ## Architecture
224
229
 
225
230
  ![Relay architecture: the HPC compute node and gateway consumer both connect outbound to the WebSocket relay, traversing firewalls without VPN or inbound ports.](Relay_Architecture.png)
226
231
 
227
232
  ```
228
- Your App / OpenAI Client
229
- │ POST /v1/chat/completions
233
+ Your Application / HTTP Client
234
+ │ POST /your-endpoint (any input schema)
230
235
 
231
236
  hpc-as-api (FastAPI)
232
237
  │ Globus Compute (AMQP — no HPC firewall holes)
233
238
 
234
- HPC Cluster (SLURM)
235
- vLLM HTTP API (internal LAN)
239
+ HPC Cluster (SLURM / PBS / …)
240
+ │ your function runs; output flows via streamrelay
236
241
 
237
- GPU Compute Node
238
- │ tokens flow via WebSocket relay (streamrelay)
242
+ GPU / CPU Compute Node
243
+ │ tokens / results / checkpoints via WebSocket relay
239
244
 
240
- hpc-as-api → SSE stream → Your App
245
+ hpc-as-api → SSE stream → Your Application
241
246
  ```
242
247
 
243
248
  Key design points:
244
249
  - **No open ports on HPC**: Globus Compute is outbound-only from the cluster
245
- - **Real-time streaming**: Tokens stream back via [streamrelay](https://github.com/uicacer/streamrelay) WebSocket relay
246
- - **E2E encryption**: Optional AES-256-GCM encryption between HPC and consumer (relay sees only ciphertext)
247
- - **OpenAI-compatible**: Drop-in for any client using the OpenAI SDK
250
+ - **Real-time streaming**: Any incremental output arrives as SSE via [streamrelay](https://github.com/uicacer/streamrelay)
251
+ - **E2E encryption**: Optional AES-256-GCM encryption (relay sees only ciphertext)
252
+ - **Domain-agnostic**: Register any Python function; not limited to LLMs
248
253
 
249
254
  ## Installation
250
255
 
@@ -256,9 +261,69 @@ pip install hpc-as-api
256
261
  pip install "hpc-as-api[globus]"
257
262
  ```
258
263
 
259
- ## Quickstart: Run as a service
264
+ ## Quickstart: Domain-agnostic gateway
260
265
 
261
- Set environment variables and start:
266
+ Register any HPC function and stream its output:
267
+
268
+ ```python
+ from hpc_as_api.core import HPCApp
+ from pydantic import BaseModel
+
+ class RunRequest(BaseModel):
+     steps: int = 1000
+     param: float = 0.5
+
+ def my_hpc_function(steps, param, relay_url, channel_id, relay_secret=""):
+     from streamrelay import RelayProducer
+     with RelayProducer(relay_url, channel_id, relay_secret=relay_secret) as relay:
+         for i in range(steps):
+             relay.send_token(f"step={i} value={compute(i, param)}\n")
+
+ gateway = HPCApp(
+     endpoint_id="your-globus-endpoint-uuid",
+     relay_url="wss://relay.example.com",
+     relay_secret="your-relay-secret",
+ )
+ gateway.mount("/run", my_hpc_function, RunRequest)
+ app = gateway.create_app()
289
+ ```
290
+
291
+ Run with:
292
+ ```bash
293
+ uvicorn mymodule:app --host 0.0.0.0 --port 8001
294
+ ```
295
+
296
+ Clients stream the output in real time:
297
+ ```bash
298
+ curl -X POST http://localhost:8001/run \
299
+ -H "Authorization: Bearer <token>" \
300
+ -H "Content-Type: application/json" \
301
+ -d '{"steps": 500, "param": 0.7}'
302
+ ```
303
+
304
+ ## Built-in preset: OpenAI-compatible LLM gateway
305
+
306
+ For vLLM-served language models, the OpenAI preset provides a drop-in
307
+ `/v1/chat/completions` endpoint compatible with any OpenAI client:
308
+
309
+ ```python
+ from hpc_as_api.presets.openai import create_openai_app
+
+ app = create_openai_app(
+     endpoint_id="8d978809-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
+     models={
+         "qwen25-vl-72b": {
+             "hf_name": "Qwen/Qwen2.5-VL-72B-Instruct-AWQ",
+             "url": "http://ghi2-002:8000",
+             "context_reserve_output": 4096,
+         }
+     },
+     relay_url="wss://relay.example.com",
+     relay_secret="your-relay-secret",
+ )
324
+ ```
325
+
326
+ Or run as a service from environment variables:
262
327
 
263
328
  ```bash
264
329
  export GLOBUS_COMPUTE_ENDPOINT_ID="your-endpoint-uuid"
@@ -269,7 +334,24 @@ export RELAY_SECRET="your-relay-secret"
269
334
  uvicorn hpc_as_api.app:app --host 0.0.0.0 --port 8001
270
335
  ```
271
336
 
272
- The gateway is now reachable at `http://localhost:8001/v1/chat/completions` with the standard OpenAI API schema.
337
+ Any OpenAI client works without modification:
338
+ ```python
339
+ import openai
340
+ client = openai.OpenAI(base_url="http://localhost:8001/v1", api_key="sk-xxxx")
341
+ response = client.chat.completions.create(model="qwen25-vl-72b", messages=[...], stream=True)
342
+ ```
343
+
344
+ ## Multiple independent gateways
345
+
346
+ `make_app()` returns a fresh, independent instance each time — safe to use
347
+ multiple gateways with different configurations in the same process:
348
+
349
+ ```python
350
+ from hpc_as_api.app import make_app
351
+
352
+ sim_app = make_app(endpoint_id="endpoint-a", relay_url="wss://relay.example.com", models={...})
353
+ llm_app = make_app(endpoint_id="endpoint-b", relay_url="wss://relay.example.com", models={...})
354
+ ```
273
355
 
274
356
  ## Embed in an existing FastAPI app
275
357
 
@@ -281,21 +363,47 @@ app = FastAPI()
281
363
  app.include_router(router, prefix="/hpc")
282
364
  ```
283
365
 
366
+ ## Programmatic auth configuration
367
+
368
+ ```python
+ from hpc_as_api import AuthConfig
+ from hpc_as_api.core import HPCApp
+
+ gateway = HPCApp(
+     endpoint_id="...",
+     relay_url="wss://relay.example.com",
+     auth=AuthConfig(
+         globus_client_id="your-client-id",
+         globus_client_secret="your-client-secret",
+         allowed_domains=["university.edu"],
+         api_keys={"my-service": "sk-xxxx"},
+         rate_limit_requests=20,
+         rate_limit_window=60,
+     ),
+ )
384
+ ```
385
+
284
386
  ## Configuration reference
285
387
 
388
+ ### HPCApp / make_app()
389
+
390
+ | Argument | Env var fallback | Description |
391
+ |---|---|---|
392
+ | `endpoint_id` | `GLOBUS_COMPUTE_ENDPOINT_ID` | Globus endpoint UUID for the HPC cluster |
393
+ | `relay_url` | `RELAY_URL` | WebSocket relay URL for streaming |
394
+ | `relay_secret` | `RELAY_SECRET` | Shared secret for relay auth |
395
+ | `relay_encryption_key` | `RELAY_ENCRYPTION_KEY` | AES-256 hex key for E2E encryption |
396
+ | `auth` | — | `AuthConfig` or `Authenticator` instance |
397
+
398
+ ### OpenAI preset additional settings
399
+
286
400
  | Variable | Default | Description |
287
401
  |---|---|---|
288
- | `GLOBUS_COMPUTE_ENDPOINT_ID` | — | Globus endpoint UUID for the HPC cluster |
289
402
  | `HPC_MODELS` | `{}` | JSON dict: model name → HPC config |
290
- | `RELAY_URL` | | WebSocket relay URL for token streaming |
291
- | `RELAY_SECRET` | | Shared secret for relay auth |
292
- | `RELAY_ENCRYPTION_KEY` | — | AES-256 hex key for E2E encryption |
293
- | `USE_GLOBUS_COMPUTE` | `true` | `false` to route directly via SSH tunnel |
294
- | `LAKESHORE_VLLM_ENDPOINT` | `http://localhost:8000` | Direct vLLM URL (SSH mode) |
295
- | `HPC_PROXY_HOST` | `0.0.0.0` | Bind host |
296
- | `HPC_PROXY_PORT` | `8001` | Bind port |
403
+ | `USE_GLOBUS_COMPUTE` | `true` | `false` to route directly via vLLM URL |
404
+ | `LAKESHORE_VLLM_ENDPOINT` | `http://localhost:8000` | Direct vLLM URL (non-Globus mode) |
297
405
 
298
- ### HPC_MODELS schema
406
+ ### HPC_MODELS schema (LLM preset)
299
407
 
300
408
  ```json
301
409
  {
@@ -309,10 +417,12 @@ app.include_router(router, prefix="/hpc")
309
417
 
310
418
  ## Authentication
311
419
 
312
- The gateway supports two auth modes (configured in `hpc_as_api/auth.py`):
420
+ Two auth modes, configurable via `AuthConfig` or environment variables:
421
+
422
+ - **Globus token**: Bearer token from Globus Auth, validated via introspection; email domain filtering supported
423
+ - **API key**: Static key from `HPC_API_KEYS` env var (comma-separated `name:key` pairs)
313
424
 
314
- - **Globus token**: Bearer token from Globus Auth, validated via introspection
315
- - **API key**: Static key from `HPC_API_KEYS` env var (comma-separated)
425
+ Both modes coexist on the same endpoint.
316
426
 
317
427
  ## Development
318
428
 
@@ -325,7 +435,7 @@ uv run pytest
325
435
 
326
436
  ## Related
327
437
 
328
- - [streamrelay](https://github.com/uicacer/streamrelay) — WebSocket relay for real-time token streaming from Globus Compute
438
+ - [streamrelay](https://github.com/uicacer/streamrelay) — WebSocket relay for real-time output streaming from Globus Compute
329
439
  - [STREAM](https://github.com/uicacer/stream) — Full tiered LLM routing system that uses hpc-as-api
330
440
 
331
441
  ## License
@@ -339,7 +449,7 @@ If you use hpc-as-api in research, please cite:
339
449
  ```bibtex
340
450
  @software{nassar2025hpcgateway,
341
451
  author = {Nassar, Anas},
342
- title = {hpc-as-api: OpenAI-compatible API gateway for HPC clusters via Globus Compute},
452
+ title = {hpc-as-api: HTTP gateway for any HPC function via Globus Compute and WebSocket relay},
343
453
  year = {2025},
344
454
  url = {https://github.com/uicacer/hpc-as-api}
345
455
  }
@@ -0,0 +1,267 @@
1
+ # hpc-as-api
2
+
3
+ [![PyPI](https://img.shields.io/pypi/v/hpc-as-api)](https://pypi.org/project/hpc-as-api/)
4
+ [![License](https://img.shields.io/badge/license-Apache%202.0-blue)](https://github.com/uicacer/hpc-as-api/blob/main/LICENSE)
5
+ [![Tests](https://github.com/uicacer/hpc-as-api/actions/workflows/tests.yml/badge.svg)](https://github.com/uicacer/hpc-as-api/actions)
6
+
7
+ **HTTP gateway for any HPC function — real-time streaming from any HPC workload.**
8
+
9
+ `hpc-as-api` turns any Python function running on an HPC cluster into a streaming HTTP endpoint. Register your function, define its input schema with Pydantic, and get a production-ready REST API with authentication, rate limiting, and live SSE streaming — no open ports, no VPN, no firewall changes on the HPC side.
10
+
11
+ ```python
+ from hpc_as_api.core import HPCApp
+ from pydantic import BaseModel
+
+ class SimRequest(BaseModel):
+     steps: int = 1000
+     grid_size: int = 100
+
+ def hpc_simulation(steps, grid_size, relay_url, channel_id, relay_secret=""):
+     from streamrelay import RelayProducer
+     with RelayProducer(relay_url, channel_id, relay_secret=relay_secret) as relay:
+         for i in range(steps):
+             result = run_timestep(i, grid_size)
+             relay.send_token(f"step={i} energy={result:.4f}\n")
+
+ app = HPCApp(endpoint_id="...", relay_url="wss://relay.example.com") \
+     .mount("/simulate", hpc_simulation, SimRequest) \
+     .create_app()
29
+ ```
30
+
31
+ Any output produced incrementally on the HPC side arrives in real time: simulation checkpoints, solver residuals, genome alignment progress, molecular dynamics snapshots, LLM tokens — anything.
32
+
33
+ ## Why
34
+
35
+ HPC clusters run workloads impossible on commodity hardware — 72B+ parameter models, climate simulations, molecular dynamics at scale. But they expose no standard API. Each cluster has its own SLURM scripts, SSH tunnels, authentication systems, and job submission conventions.
36
+
37
+ `hpc-as-api` provides a uniform HTTP interface over any HPC function using [Globus Compute](https://www.globus.org/compute) for authentication and job dispatch and [streamrelay](https://github.com/uicacer/streamrelay) for real-time output streaming. Callers send a POST request; the framework handles everything else.
38
+
39
+ ## Architecture
40
+
41
+ ![Relay architecture: the HPC compute node and gateway consumer both connect outbound to the WebSocket relay, traversing firewalls without VPN or inbound ports.](Relay_Architecture.png)
42
+
43
+ ```
44
+ Your Application / HTTP Client
45
+ │ POST /your-endpoint (any input schema)
46
+
47
+ hpc-as-api (FastAPI)
48
+ │ Globus Compute (AMQP — no HPC firewall holes)
49
+
50
+ HPC Cluster (SLURM / PBS / …)
51
+ │ your function runs; output flows via streamrelay
52
+
53
+ GPU / CPU Compute Node
54
+ │ tokens / results / checkpoints via WebSocket relay
55
+
56
+ hpc-as-api → SSE stream → Your Application
57
+ ```
58
+
59
+ Key design points:
60
+ - **No open ports on HPC**: Globus Compute is outbound-only from the cluster
61
+ - **Real-time streaming**: Any incremental output arrives as SSE via [streamrelay](https://github.com/uicacer/streamrelay)
62
+ - **E2E encryption**: Optional AES-256-GCM encryption — relay sees only ciphertext
63
+ - **Domain-agnostic**: Register any Python function; not limited to LLMs
64
+
65
+ ## Installation
66
+
67
+ ```bash
68
+ # Base package (no Globus SDK)
69
+ pip install hpc-as-api
70
+
71
+ # With Globus Compute support
72
+ pip install "hpc-as-api[globus]"
73
+ ```
74
+
75
+ ## Quickstart: Domain-agnostic gateway
76
+
77
+ Register any HPC function and stream its output:
78
+
79
+ ```python
+ from hpc_as_api.core import HPCApp
+ from pydantic import BaseModel
+
+ class RunRequest(BaseModel):
+     steps: int = 1000
+     param: float = 0.5
+
+ def my_hpc_function(steps, param, relay_url, channel_id, relay_secret=""):
+     from streamrelay import RelayProducer
+     with RelayProducer(relay_url, channel_id, relay_secret=relay_secret) as relay:
+         for i in range(steps):
+             relay.send_token(f"step={i} value={compute(i, param)}\n")
+
+ gateway = HPCApp(
+     endpoint_id="your-globus-endpoint-uuid",
+     relay_url="wss://relay.example.com",
+     relay_secret="your-relay-secret",
+ )
+ gateway.mount("/run", my_hpc_function, RunRequest)
+ app = gateway.create_app()
100
+ ```
101
+
102
+ Run with:
103
+ ```bash
104
+ uvicorn mymodule:app --host 0.0.0.0 --port 8001
105
+ ```
106
+
107
+ Clients stream the output in real time:
108
+ ```bash
109
+ curl -X POST http://localhost:8001/run \
110
+ -H "Authorization: Bearer <token>" \
111
+ -H "Content-Type: application/json" \
112
+ -d '{"steps": 500, "param": 0.7}'
113
+ ```
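On the client side, the streamed response can be consumed line by line. The sketch below parses the `data:` fields of a Server-Sent Events stream following the general SSE framing rules. It assumes the gateway emits one event per relay message, which this README does not spell out, and `iter_sse_data` is a hypothetical helper rather than part of `hpc-as-api`.

```python
def iter_sse_data(lines):
    """Yield the data payload of each SSE event from an iterable of text lines."""
    buffer = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space after the colon is dropped
        if field == "data":
            buffer.append(value)
    # An event that is not terminated by a blank line is discarded.


sample_stream = [
    "data: step=0 value=0.10\n",
    "\n",
    ": keep-alive\n",
    "data: step=1 value=0.12\n",
    "\n",
]
events = list(iter_sse_data(sample_stream))
```

With a real HTTP client, the same function can be fed from any iterator that yields decoded response lines.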
114
+
115
+ ## Built-in preset: OpenAI-compatible LLM gateway
116
+
117
+ For vLLM-served language models, the OpenAI preset provides a drop-in
118
+ `/v1/chat/completions` endpoint compatible with any OpenAI client:
119
+
120
+ ```python
+ from hpc_as_api.presets.openai import create_openai_app
+
+ app = create_openai_app(
+     endpoint_id="8d978809-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
+     models={
+         "qwen25-vl-72b": {
+             "hf_name": "Qwen/Qwen2.5-VL-72B-Instruct-AWQ",
+             "url": "http://ghi2-002:8000",
+             "context_reserve_output": 4096,
+         }
+     },
+     relay_url="wss://relay.example.com",
+     relay_secret="your-relay-secret",
+ )
135
+ ```
136
+
137
+ Or run as a service from environment variables:
138
+
139
+ ```bash
140
+ export GLOBUS_COMPUTE_ENDPOINT_ID="your-endpoint-uuid"
141
+ export HPC_MODELS='{"qwen25-vl-72b": {"hf_name": "Qwen/Qwen2.5-VL-72B-Instruct-AWQ", "url": "http://ghi2-002:8000", "context_reserve_output": 4096}}'
142
+ export RELAY_URL="wss://relay.example.com"
143
+ export RELAY_SECRET="your-relay-secret"
144
+
145
+ uvicorn hpc_as_api.app:app --host 0.0.0.0 --port 8001
146
+ ```
147
+
148
+ Any OpenAI client works without modification:
149
+ ```python
150
+ import openai
151
+ client = openai.OpenAI(base_url="http://localhost:8001/v1", api_key="sk-xxxx")
152
+ response = client.chat.completions.create(model="qwen25-vl-72b", messages=[...], stream=True)
153
+ ```
154
+
155
+ ## Multiple independent gateways
156
+
157
+ `make_app()` returns a fresh, independent instance each time — safe to use
158
+ multiple gateways with different configurations in the same process:
159
+
160
+ ```python
161
+ from hpc_as_api.app import make_app
162
+
163
+ sim_app = make_app(endpoint_id="endpoint-a", relay_url="wss://relay.example.com", models={...})
164
+ llm_app = make_app(endpoint_id="endpoint-b", relay_url="wss://relay.example.com", models={...})
165
+ ```
166
+
167
+ ## Embed in an existing FastAPI app
168
+
169
+ ```python
170
+ from fastapi import FastAPI
171
+ from hpc_as_api.app import router
172
+
173
+ app = FastAPI()
174
+ app.include_router(router, prefix="/hpc")
175
+ ```
176
+
177
+ ## Programmatic auth configuration
178
+
179
+ ```python
+ from hpc_as_api import AuthConfig
+ from hpc_as_api.core import HPCApp
+
+ gateway = HPCApp(
+     endpoint_id="...",
+     relay_url="wss://relay.example.com",
+     auth=AuthConfig(
+         globus_client_id="your-client-id",
+         globus_client_secret="your-client-secret",
+         allowed_domains=["university.edu"],
+         api_keys={"my-service": "sk-xxxx"},
+         rate_limit_requests=20,
+         rate_limit_window=60,
+     ),
+ )
195
+ ```
196
+
197
+ ## Configuration reference
198
+
199
+ ### HPCApp / make_app()
200
+
201
+ | Argument | Env var fallback | Description |
202
+ |---|---|---|
203
+ | `endpoint_id` | `GLOBUS_COMPUTE_ENDPOINT_ID` | Globus endpoint UUID for the HPC cluster |
204
+ | `relay_url` | `RELAY_URL` | WebSocket relay URL for streaming |
205
+ | `relay_secret` | `RELAY_SECRET` | Shared secret for relay auth |
206
+ | `relay_encryption_key` | `RELAY_ENCRYPTION_KEY` | AES-256 hex key for E2E encryption |
207
+ | `auth` | — | `AuthConfig` or `Authenticator` instance |
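The "argument, else env var" fallback in the table above could be resolved along these lines. This is a sketch under assumptions: `resolve_setting` is a hypothetical helper, and whether the package treats an empty string as "unset" is not stated here, so only `None` falls through to the environment.

```python
import os


def resolve_setting(explicit, env_var, environ=None):
    """Prefer the explicit argument; otherwise fall back to the environment variable."""
    environ = os.environ if environ is None else environ
    if explicit is not None:
        return explicit
    return environ.get(env_var)  # None when neither source provides a value


# A plain dict stands in for os.environ so the example is reproducible.
fake_env = {"RELAY_URL": "wss://relay.example.com"}

from_argument = resolve_setting("wss://other.example.com", "RELAY_URL", fake_env)
from_environment = resolve_setting(None, "RELAY_URL", fake_env)
missing = resolve_setting(None, "RELAY_SECRET", fake_env)
```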
208
+
209
+ ### OpenAI preset additional settings
210
+
211
+ | Variable | Default | Description |
212
+ |---|---|---|
213
+ | `HPC_MODELS` | `{}` | JSON dict: model name → HPC config |
214
+ | `USE_GLOBUS_COMPUTE` | `true` | `false` to route directly via vLLM URL |
215
+ | `LAKESHORE_VLLM_ENDPOINT` | `http://localhost:8000` | Direct vLLM URL (non-Globus mode) |
216
+
217
+ ### HPC_MODELS schema (LLM preset)
218
+
219
+ ```json
220
+ {
221
+ "my-model-name": {
222
+ "hf_name": "org/ModelName",
223
+ "url": "http://compute-node:8000",
224
+ "context_reserve_output": 4096
225
+ }
226
+ }
227
+ ```
228
+
229
+ ## Authentication
230
+
231
+ Two auth modes, configurable via `AuthConfig` or environment variables:
232
+
233
+ - **Globus token**: Bearer token from Globus Auth, validated via introspection; email domain filtering supported
234
+ - **API key**: Static key from `HPC_API_KEYS` env var (comma-separated `name:key` pairs)
235
+
236
+ Both modes coexist on the same endpoint.
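For the API-key mode, a comma-separated list of `name:key` pairs can be parsed as sketched below. `parse_api_keys` is a hypothetical helper, not the package's parser. The name-to-key direction of the resulting dict is chosen to match the `api_keys={"my-service": "sk-xxxx"}` shape used with `AuthConfig`, and how the real implementation handles whitespace or malformed pairs is an assumption.

```python
def parse_api_keys(raw):
    """Parse 'name:key,name2:key2' into a {name: key} dict."""
    keys = {}
    for pair in raw.split(","):
        pair = pair.strip()
        if not pair:
            continue  # tolerate empty input and trailing commas
        name, sep, key = pair.partition(":")
        name, key = name.strip(), key.strip()
        if not sep or not name or not key:
            # The offending text is not echoed, to avoid leaking a key into logs.
            raise ValueError("expected a name:key pair")
        keys[name] = key
    return keys


parsed = parse_api_keys("my-service:sk-xxxx, batch-runner:sk-yyyy")
```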
237
+
238
+ ## Development
239
+
240
+ ```bash
241
+ git clone https://github.com/uicacer/hpc-as-api
242
+ cd hpc-as-api
243
+ uv sync --extra dev
244
+ uv run pytest
245
+ ```
246
+
247
+ ## Related
248
+
249
+ - [streamrelay](https://github.com/uicacer/streamrelay) — WebSocket relay for real-time output streaming from Globus Compute
250
+ - [STREAM](https://github.com/uicacer/stream) — Full tiered LLM routing system that uses hpc-as-api
251
+
252
+ ## License
253
+
254
+ Apache 2.0 — see [LICENSE](LICENSE).
255
+
256
+ ## Citation
257
+
258
+ If you use hpc-as-api in research, please cite:
259
+
260
+ ```bibtex
261
+ @software{nassar2025hpcgateway,
262
+ author = {Nassar, Anas},
263
+ title = {hpc-as-api: HTTP gateway for any HPC function via Globus Compute and WebSocket relay},
264
+ year = {2025},
265
+ url = {https://github.com/uicacer/hpc-as-api}
266
+ }
267
+ ```