runinfra 0.1.0__tar.gz
- runinfra-0.1.0/MANIFEST.in +8 -0
- runinfra-0.1.0/PKG-INFO +386 -0
- runinfra-0.1.0/README.md +362 -0
- runinfra-0.1.0/pyproject.toml +47 -0
- runinfra-0.1.0/runinfra/__init__.py +1388 -0
- runinfra-0.1.0/runinfra/py.typed +1 -0
- runinfra-0.1.0/runinfra.egg-info/PKG-INFO +386 -0
- runinfra-0.1.0/runinfra.egg-info/SOURCES.txt +9 -0
- runinfra-0.1.0/runinfra.egg-info/dependency_links.txt +1 -0
- runinfra-0.1.0/runinfra.egg-info/top_level.txt +1 -0
- runinfra-0.1.0/setup.cfg +4 -0
runinfra-0.1.0/PKG-INFO
ADDED
@@ -0,0 +1,386 @@
Metadata-Version: 2.4
Name: runinfra
Version: 0.1.0
Summary: RunInfra SDK for optimized inference deployments
Author: RunInfra
License-Expression: LicenseRef-Proprietary
Project-URL: Documentation, https://runinfra.ai/docs/tools-sdks/runinfra-sdk
Project-URL: Homepage, https://runinfra.ai
Project-URL: Issues, https://github.com/RightNow-AI/RunPipe/issues
Keywords: runinfra,inference,openai,responses,embeddings,tts,asr,image-generation
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Typing :: Typed
Requires-Python: >=3.9
Description-Content-Type: text/markdown

# RunInfra Python SDK

Access optimized RunInfra deployments through the verified public gateway.

Requires Python 3.9 or newer.

## Install

```bash
pip install runinfra
```

## Create a client

Use a workspace-scoped key to reach verified active deployments through the `model` field.
In RunPipe, open Settings, then API Keys, choose Create key, and keep Scope set to Workspace.

The Deploy tab can create a pipeline-scoped key for one optimized pipeline.
The secret is shown only once, right after creation. Store it as `RUNINFRA_API_KEY`
for app snippets before leaving the page. For repo live canaries, keep the
workspace key in `RUNINFRA_API_KEY` and put the pipeline-scoped key for
`TEST_PIPELINE_ID` in `RUNINFRA_PIPELINE_API_KEY` so flat and pipeline routes
are verified independently.

After a runbook finishes in RunPipe, choose Open Deploy from the runbook handoff.
Deploy only shows SDK operations that the verified endpoint supports, so copy
the native or OpenAI-compatible snippet from there instead of guessing a route.

```python
import os
from runinfra import RunInfra

api_key = os.environ.get("RUNINFRA_API_KEY")
if not api_key:
    raise RuntimeError("Set RUNINFRA_API_KEY before running this snippet.")

client = RunInfra(api_key=api_key)
```

Use `pipeline_id` when the key or integration should be locked to one optimized pipeline.

```python
import os
from runinfra import RunInfra

api_key = os.environ.get("RUNINFRA_API_KEY")
if not api_key:
    raise RuntimeError("Set RUNINFRA_API_KEY before running this snippet.")

client = RunInfra(
    api_key=api_key,
    pipeline_id="pipe_123",
)
```

The default base URL is `https://api.runinfra.ai/v1`.
`pipeline_id` is stripped and URL-encoded before it is added to the base URL. Use either `pipeline_id` with the default base URL, or a pipeline-scoped `base_url` such as `https://api.runinfra.ai/v1/pipe_123`. If both point to the same pipeline, the SDK keeps the URL scoped once.
RunPipe-generated native SDK snippets prefer `pipeline_id` with the root `https://api.runinfra.ai/v1` base URL. OpenAI-compatible snippets use the pipeline-scoped base URL because the OpenAI SDK has no RunInfra pipeline option.
Custom base URLs must use `http` or `https`. Other schemes and malformed URLs are rejected before a bearer API key can be sent.
Remote custom base URLs must use `https`. Plain `http` is accepted only for local development hosts: `localhost`, `127.0.0.1`, `0.0.0.0`, and `[::1]`.
Custom base URLs must not include usernames or passwords.
Custom base URLs must not include query strings or fragments.
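
The base URL rules above can be sketched with the standard library alone. This is an illustration of the documented rules, not the SDK's actual implementation: `validate_base_url` and `scoped_url` are hypothetical helper names, and the exact error types the SDK raises may differ.

```python
from urllib.parse import quote, urlsplit

# Hosts for which plain http is documented as acceptable.
LOCAL_HTTP_HOSTS = {"localhost", "127.0.0.1", "0.0.0.0", "::1"}


def validate_base_url(base_url: str) -> str:
    """Hypothetical check mirroring the documented base URL rules."""
    cleaned = base_url.strip()
    parts = urlsplit(cleaned)  # raises ValueError on some malformed URLs
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError("base_url must be a well-formed http or https URL")
    if parts.username is not None or parts.password is not None:
        raise ValueError("base_url must not include a username or password")
    if parts.query or parts.fragment:
        raise ValueError("base_url must not include a query string or fragment")
    if parts.scheme == "http" and parts.hostname not in LOCAL_HTTP_HOSTS:
        raise ValueError("plain http is only allowed for local development hosts")
    return cleaned.rstrip("/")


def scoped_url(base_url: str, pipeline_id: str) -> str:
    """Append a stripped, URL-encoded pipeline ID unless it is already there."""
    root = validate_base_url(base_url)
    segment = quote(pipeline_id.strip(), safe="")
    if root.endswith("/" + segment):
        return root  # already pipeline-scoped, keep it scoped once
    return f"{root}/{segment}"


print(scoped_url("https://api.runinfra.ai/v1", " pipe_123 "))
```

Validating before any request is built is what keeps a bearer key from being sent to a mistyped or downgraded URL.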

## Responses and streaming

```python
# Reuses the `client` created in "Create a client".
stream = client.responses.create(
    model="llama-3.1-8b",
    input="Hello",
    max_output_tokens=512,
    stream=True,
)

for event in stream:
    if event.get("type") == "response.output_text.delta":
        print(event.get("delta", ""), end="")

print(stream.request_id)
```
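
When the full text is needed rather than incremental printing, the same event filter can accumulate deltas. The sketch below assumes only that stream events are dict-like, as in the snippet above; `collect_output_text` is a hypothetical helper, and the sample events are made up so the example runs without a live endpoint.

```python
from typing import Any, Dict, Iterable


def collect_output_text(events: Iterable[Dict[str, Any]]) -> str:
    """Join the text deltas from a sequence of stream events."""
    parts = []
    for event in events:
        # Only text delta events carry output text; everything else is skipped.
        if event.get("type") == "response.output_text.delta":
            parts.append(event.get("delta", ""))
    return "".join(parts)


# Made-up events standing in for a real stream.
sample_events = [
    {"type": "response.created"},
    {"type": "response.output_text.delta", "delta": "Hel"},
    {"type": "response.output_text.delta", "delta": "lo"},
    {"type": "response.completed"},
]
print(collect_output_text(sample_events))  # prints "Hello"
```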

## Supported public routes

- `models.list()`
- `models.retrieve(model)`
- `responses.create()`
- `chat.completions.create()`
- `embeddings.create()`
- `audio.speech.create()`
- `audio.transcriptions.create()`
- `images.generate()`
- `voice.pipeline.create()`

## Text to speech

TTS deployments can expose named voices or Base/reference-audio voice cloning.
Use `RUNINFRA_TTS_VOICE` when the deployment lists a voice or speaker. Use
`RUNINFRA_TTS_REF_AUDIO` and `RUNINFRA_TTS_REF_TEXT` when the deployment expects
reference-audio input.

```python
import os

# Reuses the `client` created in "Create a client".
voice = os.environ.get("RUNINFRA_TTS_VOICE", "").strip()
ref_audio = os.environ.get("RUNINFRA_TTS_REF_AUDIO", "").strip()
ref_text = os.environ.get("RUNINFRA_TTS_REF_TEXT", "").strip()

if voice:
    speech_voice = {"voice": voice}
elif ref_audio and ref_text:
    speech_voice = {
        "ref_audio": ref_audio,
        "ref_text": ref_text,
        "task_type": os.environ.get("RUNINFRA_TTS_TASK_TYPE", "Base").strip() or "Base",
    }
else:
    raise RuntimeError("Set RUNINFRA_TTS_VOICE, or RUNINFRA_TTS_REF_AUDIO and RUNINFRA_TTS_REF_TEXT.")

audio = client.audio.speech.create(
    model="your-tts-model-id",
    input="Hello from your optimized RunInfra endpoint.",
    **speech_voice,
)
```

## Timeouts and retries

```python
import os
from runinfra import RunInfra

api_key = os.environ.get("RUNINFRA_API_KEY")
if not api_key:
    raise RuntimeError("Set RUNINFRA_API_KEY before running this snippet.")

client = RunInfra(
    api_key=api_key,
    timeout_seconds=60,
    max_retries=2,
    retry_base_seconds=0.25,
)
```

The SDK retries transient transport failures and `408`, `409`, `429`, `500`, `502`, `503`, and `504` responses for safe `GET` requests. Charge-bearing `POST` inference requests retry only when you provide `idempotency_key`, and automatic POST retries are limited to non-streaming JSON calls whose gateway responses can be replayed safely. That covers `responses.create()`, non-streaming `chat.completions.create()`, `embeddings.create()`, and `images.generate()`.

Streaming calls, binary TTS responses, and multipart ASR uploads are sent once even when you provide an idempotency key. The gateway still binds idempotency keys for TTS and ASR, so a manual retry with the same key will not run or charge a second inference after the first request settles.

Automatic retries honor reasonable `Retry-After` values up to 60 seconds when the header is a plain integer second value or HTTP-date, then fall back to bounded exponential backoff. The SDK does not retry authentication errors, insufficient credits, or unsupported operations.

If the gateway successfully finishes a request but the response body is too large to replay from the idempotency cache, later calls with the same `idempotency_key` return `idempotency_replay_unavailable` without running or charging the inference again.

`timeout_seconds` must be positive, `max_retries` must be a non-negative integer, and `retry_base_seconds` must be non-negative. Unknown per-request option keys are rejected so typos do not silently disable idempotency, tracing, timeout, or retry behavior. Python request option aliases cannot be mixed; choose either snake_case or camelCase for a given option. Invalid values raise `RunInfraError` with `type == "invalid_request_options"` before any network request is sent.
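
The `Retry-After` handling described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the SDK's code: `parse_retry_after` and `retry_delay` are hypothetical names, the 8-second backoff cap is an arbitrary choice for the example, and the real SDK may add jitter or use different bounds.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional

MAX_RETRY_AFTER_SECONDS = 60.0  # documented upper bound for honored values


def parse_retry_after(value: Optional[str], now: Optional[datetime] = None) -> Optional[float]:
    """Return a delay in seconds, or None when the header should be ignored."""
    if not value:
        return None
    value = value.strip()
    if value.isascii() and value.isdigit():
        seconds = float(value)  # plain integer seconds
    else:
        try:
            when = parsedate_to_datetime(value)  # HTTP-date form
        except (TypeError, ValueError):
            return None
        if when.tzinfo is None:
            when = when.replace(tzinfo=timezone.utc)
        now = now or datetime.now(timezone.utc)
        seconds = (when - now).total_seconds()
    if 0 <= seconds <= MAX_RETRY_AFTER_SECONDS:
        return seconds
    return None


def retry_delay(attempt: int, retry_after: Optional[str],
                base_seconds: float = 0.25, cap_seconds: float = 8.0) -> float:
    """Prefer a usable Retry-After hint, else bounded exponential backoff."""
    hinted = parse_retry_after(retry_after)
    if hinted is not None:
        return hinted
    return min(cap_seconds, base_seconds * (2 ** attempt))


print(retry_delay(0, None), retry_delay(2, None), retry_delay(1, "3"))
```

Ignoring out-of-range hints keeps one misconfigured upstream header from stalling a caller for minutes.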

## Request validation

Required request fields are validated before any network request is sent:

- The model must be a non-blank string.
- Chat messages must be a non-empty array, and each chat message must be an object with a non-empty role.
- Responses input must be a non-empty string or array, and Responses input array items must be objects.
- JSON request bodies must be serializable and contain only finite numbers.
- Embedding input must be a non-empty string or an array of non-empty strings.
- TTS input and image prompts must be non-empty strings.
- The ASR file must be `bytes` or `bytearray`.

ASR multipart filenames, content types, and extra form field names and values are validated before the multipart body is built. Invalid request values raise `RunInfraError` with `type == "invalid_request_options"` and do not reach the gateway or billing path.

Use per-request options when a call needs a shorter timeout, a trace ID, or a retry-safe idempotency key.
Custom headers are for app metadata only. They cannot override SDK-controlled headers such as `Authorization`, `Content-Type`, `X-Client-Request-Id`, `Idempotency-Key`, `X-RunInfra-SDK`, or `X-RunInfra-SDK-Version`, and they cannot set transport or credential headers such as `Host`, `Cookie`, `Content-Length`, `Transfer-Encoding`, `Connection`, `Proxy-Authorization`, `Api-Key`, `X-API-Key`, `X-Auth-Token`, or `X-Access-Token`.

```python
import uuid

# Reuses the `client` created in "Create a client".
client.responses.create(
    model="llama-3.1-8b",
    input="Summarize this incident.",
    request_options={
        "client_request_id": str(uuid.uuid4()),
        "idempotency_key": str(uuid.uuid4()),
        "timeout_seconds": 20,
        "max_retries": 0,
    },
)
```
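
Two of the checks described in this section are easy to reproduce in application code before a call is made. The sketch below is an assumption-laden illustration, not the SDK's validator: `ensure_json_body` and `check_custom_headers` are hypothetical helpers, and the reserved header set only contains the names listed above, which the text presents as examples rather than a complete list.

```python
import json
from typing import Any, Dict

# Header names taken from the examples in this section, lowercased.
RESERVED_HEADERS = {
    "authorization", "content-type", "x-client-request-id", "idempotency-key",
    "x-runinfra-sdk", "x-runinfra-sdk-version", "host", "cookie",
    "content-length", "transfer-encoding", "connection", "proxy-authorization",
    "api-key", "x-api-key", "x-auth-token", "x-access-token",
}


def ensure_json_body(body: Dict[str, Any]) -> str:
    """Serialize a body, rejecting NaN/Infinity and non-serializable values."""
    try:
        return json.dumps(body, allow_nan=False)
    except (TypeError, ValueError) as exc:
        raise ValueError(f"invalid_request_options: {exc}") from exc


def check_custom_headers(headers: Dict[str, str]) -> Dict[str, str]:
    """Reject custom headers that collide with reserved names (case-insensitive)."""
    blocked = sorted(name for name in headers if name.lower() in RESERVED_HEADERS)
    if blocked:
        raise ValueError(f"invalid_request_options: reserved headers {blocked}")
    return dict(headers)


print(ensure_json_body({"model": "llama-3.1-8b", "temperature": 0.2}))
print(check_custom_headers({"X-App-Team": "search"}))
```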

## Typed errors

The SDK exposes `AuthenticationError`, `PermissionDeniedError`, `RateLimitError`, `InsufficientCreditsError`, `DeploymentError`, `ModelNotFoundError`, `RunInfraTimeoutError`, `RunInfraConnectionError`, `RunInfraStreamParseError`, and `UnsupportedOperationError`.
`RateLimitError` includes `retry_after_seconds` when the gateway returns `Retry-After`.
`RunInfraStreamParseError` includes `request_id` when a malformed SSE frame came from a traced gateway response.
`RunInfraTimeoutError` also covers stalled streaming reads and default non-streaming body reads after headers arrive, and includes `request_id` when the response was traced.
`RunInfraConnectionError` also covers streaming body transport failures and default non-streaming body transport failures after headers arrive, and includes `request_id` when the response was traced.

## Traceability and typing

Every request includes `X-RunInfra-SDK: python`, `X-RunInfra-SDK-Version`, and `X-Client-Request-Id`. These headers help support trace requests without changing billing or routing.

When `idempotency_key` is provided, the SDK sends it as `Idempotency-Key`. Use a unique value for each logical retry-safe operation. Idempotency keys must be non-blank, ASCII, 255 characters or fewer, and must not contain secrets or personal data.
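
The idempotency key constraints above can be checked locally before a key is attached to a request. This is a minimal sketch of the stated rules; `validate_idempotency_key` is a hypothetical helper, and it cannot detect secrets or personal data, which remains the caller's responsibility.

```python
import uuid


def validate_idempotency_key(key: str) -> str:
    """Apply the documented shape rules: non-blank, ASCII, at most 255 characters."""
    if not isinstance(key, str) or not key.strip():
        raise ValueError("idempotency_key must be a non-blank string")
    if not key.isascii():
        raise ValueError("idempotency_key must be ASCII")
    if len(key) > 255:
        raise ValueError("idempotency_key must be 255 characters or fewer")
    return key


# A random UUID is one simple way to get a unique, secret-free key per operation.
print(validate_idempotency_key(str(uuid.uuid4())))
```

Generate the key once per logical operation and reuse it for every retry of that operation; a fresh key per attempt would defeat the replay protection.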

Successful JSON object responses include `_request_id` when the gateway returns `x-request-id`. Streaming responses expose the same value as `stream.request_id`, malformed stream frames raise `RunInfraStreamParseError` with that request id, and binary audio responses expose it as `audio.request_id`. Log that value with production errors and customer support reports.

The wheel ships `py.typed` so type checkers can inspect the package. Fixed-shape helpers expose `TypedDict` response contracts: `ModelListResponse`, `ModelObject`, `ResponsesCreateResponse`, `ChatCompletionResponse`, `EmbeddingResponse`, `TranscriptionResponse`, and `ImageGenerationResponse`. Stream-capable helpers are typed as either the JSON response contract or `RunInfraStream` when `stream=True`.

## Webhook verification

Public webhook delivery routes are not shipped yet, but the SDK includes local verification helpers for signed RunInfra webhook deliveries once you receive them in your own server. Always verify the exact raw body before parsing JSON. The `RunInfra-Signature` timestamp must be a non-negative integer Unix second.

```python
import os

from runinfra import (
    WebhookVerificationError,
    construct_webhook_event,
    verify_webhook_signature,
)

webhook_secret = os.environ.get("RUNINFRA_WEBHOOK_SECRET")
if not webhook_secret or not webhook_secret.strip():
    raise RuntimeError("Set RUNINFRA_WEBHOOK_SECRET before verifying webhook events.")

# `raw_body` is the unparsed request body and `signature_header` is the
# `RunInfra-Signature` header value, both supplied by your web framework.
event = construct_webhook_event(
    payload=raw_body,
    signature_header=signature_header,
    secret=webhook_secret,
)
```

`construct_webhook_event` verifies the signature, checks timestamp tolerance, and parses JSON. Use `verify_webhook_signature` when your framework parses JSON separately and you only need to validate the raw delivery. Invalid signatures, stale timestamps, and invalid webhook JSON raise `WebhookVerificationError`.
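
For readers unfamiliar with timestamped webhook signatures, the general technique looks like the sketch below. Everything specific here is an assumption made for illustration: the source does not document the `RunInfra-Signature` header layout, so the `t=<unix seconds>,v1=<hex HMAC-SHA256>` format, the signed message `"<t>.<raw body>"`, the 300-second tolerance, and the helper names `sign_payload` and `verify_delivery` are all hypothetical. Use the SDK helpers for real deliveries.

```python
import hashlib
import hmac
import time
from typing import Optional


def sign_payload(secret: str, timestamp: int, payload: bytes) -> str:
    """HMAC-SHA256 over "<timestamp>.<raw body>" (assumed message layout)."""
    message = str(timestamp).encode("ascii") + b"." + payload
    return hmac.new(secret.encode("utf-8"), message, hashlib.sha256).hexdigest()


def verify_delivery(payload: bytes, header: str, secret: str,
                    tolerance_seconds: int = 300, now: Optional[int] = None) -> bool:
    """Check the timestamp window, then compare signatures in constant time."""
    fields = dict(part.split("=", 1) for part in header.split(",") if "=" in part)
    raw_ts = fields.get("t", "")
    # The timestamp must be a non-negative integer Unix second.
    if not (raw_ts.isascii() and raw_ts.isdigit()):
        return False
    timestamp = int(raw_ts)
    current = int(time.time()) if now is None else now
    if abs(current - timestamp) > tolerance_seconds:
        return False  # stale or future-dated delivery
    expected = sign_payload(secret, timestamp, payload)
    provided = fields.get("v1", "")
    return hmac.compare_digest(expected.encode("ascii"), provided.encode("utf-8"))


body = b'{"type": "example.event"}'
header = f"t=1000,v1={sign_payload('whsec_demo', 1000, body)}"
print(verify_delivery(body, header, "whsec_demo", now=1010))
```

The signature covers the raw bytes, which is why the body must be verified before any JSON parsing or re-serialization changes it.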

## OpenAI-compatible clients

OpenAI-compatible clients can use the same verified base URL:

```python
import os
from openai import OpenAI

api_key = os.environ.get("RUNINFRA_API_KEY")
if not api_key:
    raise RuntimeError("Set RUNINFRA_API_KEY before running this snippet.")

client = OpenAI(
    api_key=api_key,
    base_url="https://api.runinfra.ai/v1/pipe_123",
)
```

## Production promotion

Local package tests prove SDK shape, retry behavior, streaming parsing, and
typed errors. They do not prove that a newly optimized deployment is ready for
customers. For production promotion, run the strict live SDK canary gate from
the RunPipe repo against the same base URL and API key you plan to expose. The gate starts with `test:sdk-live-api-key`, which verifies that the plaintext key hashes to an active workspace-scoped `api_keys` row before any paid inference canary runs:

```bash
pnpm verify:sdk-release
pnpm test:sdk-canary:live -- --print-env-template
pnpm test:sdk-canary:live -- --env-file .env.sdk-live.local --print-env-status
pnpm sync:sdk-live-env -- --source .env.local --target .env.sdk-live.local
pnpm discover:sdk-live-targets -- --env-file .env.sdk-live.local --probe-inference --report artifacts/sdk/live-targets-discovery.json
pnpm bootstrap:sdk-live-key -- --env-file .env.sdk-live.local
pnpm discover:sdk-live-targets -- --env-file .env.sdk-live.local --probe-inference --report artifacts/sdk/live-targets-discovery.json
pnpm prepare:sdk-live-env -- --discovery-report artifacts/sdk/live-targets-discovery.json --output .env.sdk-live.local
pnpm discover:sdk-live-targets -- --env-file .env.sdk-live.local --probe-inference --report artifacts/sdk/live-targets-discovery.json
pnpm verify:sdk-live-targets -- --env-file .env.sdk-live.local --require-available --discovery-report artifacts/sdk/live-targets-discovery.json
pnpm test:sdk-canary:live -- --env-file .env.sdk-live.local --check-env-only
pnpm test:sdk-canary:live -- --env-file .env.sdk-live.local --discovery-report artifacts/sdk/live-targets-discovery.json --report artifacts/sdk/live-canary.json
pnpm verify:sdk-live-report -- artifacts/sdk/live-canary.json
pnpm test:sdk-canary -- --env-file .env.sdk-live.local --report artifacts/sdk/native-focused-smoke.json
pnpm test:openai-compat -- --env-file .env.sdk-live.local --report artifacts/sdk/openai-focused-smoke.json
pnpm verify:sdk-goal -- --release-report artifacts/sdk/release-verification.json --live-report artifacts/sdk/live-canary.json --live-targets-report artifacts/sdk/live-targets-discovery.json --env-check-report artifacts/sdk/live-canary-env-check.json --focused-smoke-report artifacts/sdk/native-focused-smoke.json --openai-focused-smoke-report artifacts/sdk/openai-focused-smoke.json --report artifacts/sdk/goal-readiness.json
pnpm verify:sdk-publish -- --release-report artifacts/sdk/release-verification.json --goal-report artifacts/sdk/goal-readiness.json --live-report artifacts/sdk/live-canary.json --live-targets-report artifacts/sdk/live-targets-discovery.json --env-check-report artifacts/sdk/live-canary-env-check.json --focused-smoke-report artifacts/sdk/native-focused-smoke.json --openai-focused-smoke-report artifacts/sdk/openai-focused-smoke.json --report artifacts/sdk/publish-readiness.json
```

Save the printed template as `.env.sdk-live.local`; it is ignored by git and
should contain the real production gateway, workspace-scoped API key, database
URL, pipeline-scoped API key for the optimized LLM pipeline, `RUNPOD_API_KEY`,
deployed model IDs, the optimized LLM `TEST_PIPELINE_ID`, TTS proof inputs, and
ASR clip path.

`RUNPOD_API_KEY` is used only by discovery to prove checked RunPod endpoint
inventory. A checked inventory needs `endpointCount` greater than zero, and
`endpointCount: 0` blocks promotion even if old database rows still mention
active deployments. Discovery also blocks any selected target with
`endpoint_not_in_runpod_inventory` and emits `runpod_endpoint_inventory_empty`
when RunPod returns no endpoints. For operator handoffs, set optional
`RUNPOD_EXPECTED_ENDPOINT_IDS` to a comma-separated list of endpoint IDs.
Discovery compares those IDs against the same checked inventory and reports
only redacted verified or missing endpoint IDs, so a wrong RunPod account/scope
becomes an explicit blocker.
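
The expected-endpoint comparison described above amounts to a set check with redacted output. The sketch below is a Python illustration of that idea only; the real discovery tooling lives in the RunPipe repo and its report shape is not shown in this document, so `check_expected_endpoints`, `redact`, and the `verified`, `missing`, `reasons`, and `blocked` fields are hypothetical. Only `endpointCount` and `runpod_endpoint_inventory_empty` come from the text.

```python
from typing import Dict, Iterable, List


def redact(endpoint_id: str) -> str:
    """Keep only the last four characters so reports stay redacted."""
    return "***" + endpoint_id[-4:]


def check_expected_endpoints(expected_csv: str, inventory_ids: Iterable[str]) -> Dict[str, object]:
    """Compare a comma-separated expectation list against a checked inventory."""
    inventory = set(inventory_ids)
    expected = [item.strip() for item in expected_csv.split(",") if item.strip()]
    verified = [redact(e) for e in expected if e in inventory]
    missing = [redact(e) for e in expected if e not in inventory]
    reasons: List[str] = []
    if not inventory:
        # An empty inventory blocks promotion regardless of database rows.
        reasons.append("runpod_endpoint_inventory_empty")
    return {
        "endpointCount": len(inventory),
        "verified": verified,
        "missing": missing,
        "reasons": reasons,
        "blocked": bool(reasons or missing),
    }


print(check_expected_endpoints("ep_alpha1, ep_beta22", ["ep_alpha1"]))
```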

For TTS, set either `TEST_TTS_VOICE` or both `TEST_TTS_REF_AUDIO` and
`TEST_TTS_REF_TEXT` for Base/voice-cloning models.

`sync:sdk-live-env` copies `RUNPOD_API_KEY` from the source env file when it is
present. If you keep the RunPod key only in the shell, discovery uses that
process value instead of the generated placeholder in `.env.sdk-live.local`.

Use `--print-env-status` before running the canary to see missing, placeholder,
or invalid fields without printing API keys, database URLs, or file paths. Use `pnpm sync:sdk-live-env -- --source .env.local --target .env.sdk-live.local` to copy whitelisted local values such as `DATABASE_URL` without printing secrets, then use `pnpm discover:sdk-live-targets -- --env-file .env.sdk-live.local --probe-inference --report artifacts/sdk/live-targets-discovery.json` to inspect
which `active_verified` deployments can satisfy strict modality coverage without
printing API keys, key hashes, or database credentials. Deployments that are
close but not promotable appear under redacted `nearEligibleTargets` with
reasons such as `status_not_active_verified`, `missing_inference_url`, or
`missing_endpoint_id`. The discovery report also includes `nextActions`, so
deployment and SDK agents can follow safe commands without scraping error text.
A skipped probe is diagnostic only. It means discovery intentionally avoided a
live inference call after an earlier eligibility failure; only `passed` probes can promote `targets_available`.

After discovery reports eligible `active_verified` targets, `pnpm bootstrap:sdk-live-key -- --env-file .env.sdk-live.local` can create a workspace-scoped key for that workspace, store only its hash in the database, and write the plaintext once into the ignored live env file without printing it. Rerun discovery after bootstrap so the report proves the selected workspace now has an active workspace-scoped key.

When discovery is complete, use
`pnpm prepare:sdk-live-env -- --discovery-report artifacts/sdk/live-targets-discovery.json --output .env.sdk-live.local`
to fill the deployment model IDs and the optimized LLM `TEST_PIPELINE_ID`, then rerun `discover:sdk-live-targets` against the prepared env file.
`prepare:sdk-live-env` cannot recover a one-time plaintext pipeline secret. Set
`RUNINFRA_PIPELINE_API_KEY` from the Deploy tab for `TEST_PIPELINE_ID` before
strict live canaries, while keeping `RUNINFRA_API_KEY` workspace-scoped for
billing and flat-route verification.

Before running live canaries, run
`pnpm verify:sdk-live-targets -- --env-file .env.sdk-live.local --require-available --discovery-report artifacts/sdk/live-targets-discovery.json`
against the prepared-env discovery report to validate that it is redacted, same-workspace, uses exact live env values, and only promotes callable `active_verified` targets.

If the output file already has `RUNINFRA_API_KEY`, `RUNINFRA_PIPELINE_API_KEY`,
`DATABASE_URL`, `TEST_ASR_FILE`, or local TTS reference inputs, the helper
preserves them and does not print them.

That gate must cover LLM, embeddings, image, TTS, and ASR endpoints before the
deployment is treated as production verified. Those strict modality targets must
be distinct deployed model IDs in the same workspace, because the promotion
canary uses one workspace-scoped key and then proves billing for every reported
model. The generated live report also records the SDK version and source digest,
so stale canaries cannot promote a newer SDK build.

Focused `pnpm test:sdk-canary -- --report ...` smoke reports also record the
same SDK / Docs / Engine source digest and stay redacted. The raw
OpenAI-compatible focused smoke writes `artifacts/sdk/openai-focused-smoke.json`
and the native SDK smoke writes `artifacts/sdk/native-focused-smoke.json`.
`verify:sdk-goal` rejects either focused smoke report when it was generated from older source,
so focused LLM debugging evidence cannot be reused as fresh promotion evidence.

Each canary result also records redacted `checks` for the required proof checks
it emitted. If a canary exits successfully but misses a required proof line, the
strict report records `missingChecks` and stays blocked. The runner only counts
a proof line when the child canary prints `[ OK ] <required check>`; `[FAIL]` lines do not satisfy promotion evidence. The proof set covers model discovery
and retrieval, LLM responses and streaming,
pipeline-scoped native SDK responses, OpenAI-compatible pipeline-scoped `/v1/responses`, embeddings vectors, image data, TTS audio bytes, ASR transcription text,
OpenAI-compatible auth and error paths, native SDK typed `AuthenticationError`
mapping, request ID propagation, unsupported webhook guards, API key scope, and
billing usage verification. OpenAI-compatible security checks also prove request tracing, HSTS, `nosniff`, path traversal blocking, invalid model 404, missing model 400, and auth failures before publish promotion can pass.

The live-target gate also requires checked RunPod endpoint inventory before
promotion. `selectedTargets.*.runpodEndpointVerified` must be true for every
strict modality.
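
The proof-line rule described above, where only `[ OK ] <required check>` lines count and a clean exit is not enough, can be sketched as a small parser. This is an illustration in Python rather than the actual runner from the RunPipe repo; `missing_checks` is a hypothetical function, and the check names in the sample output are made up.

```python
import re
from typing import List

# Matches lines of the form "[ OK ] <check name>"; "[FAIL]" lines never match.
OK_LINE = re.compile(r"^\[ OK \]\s+(.*\S)\s*$")


def missing_checks(output: str, required: List[str]) -> List[str]:
    """Return the required checks that have no matching "[ OK ]" line."""
    passed = set()
    for line in output.splitlines():
        match = OK_LINE.match(line.strip())
        if match:
            passed.add(match.group(1))
    return [check for check in required if check not in passed]


sample_output = """\
[ OK ] model discovery
[FAIL] billing usage
[ OK ] request id propagation
"""
required = ["model discovery", "billing usage", "request id propagation"]
print(missing_checks(sample_output, required))  # prints ['billing usage']
```

A non-empty result corresponds to a blocked report, even when the child process exited with status zero.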

Publish through registry trusted publishing only, using GitHub trusted publishing.
Registry tokens are not used, so do not provide npm or PyPI publish tokens. If
trusted publishing is unavailable, do not publish until the registry identity is fixed.

The publish-readiness report ties the local release verification,
goal-readiness report, live-target discovery report, and strict live canary
report, plus both focused smoke reports, to the same source digest.
PyPI publishing should use the same verified release flow; do not upload the
wheel or sdist from a local shell until the publish-readiness report passes.

Use `pnpm prepare:sdk-publish` to build the npm package, Python wheel, and
Python sdist only after publish readiness passes; the command writes a
source-digest-tied manifest with artifact SHA-256 hashes, byte counts, and
checksummed release / goal / live-target / live-canary proof reports at
`artifacts/sdk/publish/publish-artifacts.json`.

Use `pnpm publish:sdk -- --dry-run` to validate that manifest and print the npm
trusted-publishing and PyPI action handoff without sending packages. The guarded publish wrapper
refuses `--execute` outside CI, validates artifact checksums again, and supports
npm trusted publishing through GitHub OIDC on the Node 22.14.0 publish workflow;
PyPI should be uploaded by `pypa/gh-action-pypi-publish` in the `SDK Publish`
workflow.

GitHub Actions has two SDK-only workflows for the same path. `SDK Release Gate`
runs `pnpm verify:sdk-release` and requires `RUNINFRA_SDK_CI_TOKEN` with
read-only access to the docs and Engine contract repositories. `SDK Publish` is
manual only, defaults to `publish: false`, runs `verify:sdk-release`, creates
the strict live SDK env from GitHub secrets, discovers `active_verified` targets,
prepares the modality env, runs strict live canaries, verifies goal and publish
readiness, prepares artifacts, runs `publish:sdk -- --dry-run`, then uploads the
proof reports and package artifacts. The verification job has no registry OIDC permission;
the npm and PyPI publish jobs download the checked artifacts and are the only
jobs with registry trusted publishing identity. The npm publish job uses GitHub environment `npm`;
the PyPI publish job revalidates the publish manifest before upload and uses GitHub environment `pypi`. The live proof secrets are
`RUNINFRA_SDK_LIVE_API_KEY`, `RUNINFRA_SDK_LIVE_DATABASE_URL`, `RUNPOD_API_KEY`, and
`RUNINFRA_SDK_LIVE_ASR_FILE_BASE64`; voice-less Base TTS canaries can also set
`RUNINFRA_SDK_LIVE_TTS_REF_AUDIO` and `RUNINFRA_SDK_LIVE_TTS_REF_TEXT`. Only
after those gates pass should a maintainer rerun the workflow with `publish: true`; npm
and PyPI must use trusted publishers.

`pnpm verify:sdk-release` also runs SDK secret hygiene and fails if non-test
handoff docs or release files contain full-shaped RunInfra keys, service tokens,
or package publish tokens.
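
The manifest idea mentioned above, artifact SHA-256 hashes and byte counts tied to one source digest and rechecked before upload, can be sketched as follows. The actual layout of `artifacts/sdk/publish/publish-artifacts.json` is not shown in this document, so the field names `sourceDigest`, `artifacts`, `name`, `bytes`, and `sha256`, and the helpers `describe_artifact`, `build_manifest`, and `verify_manifest`, are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, Iterable, List


def describe_artifact(path: Path) -> Dict[str, object]:
    """Record the name, byte count, and SHA-256 digest of one file."""
    data = path.read_bytes()
    return {"name": path.name, "bytes": len(data), "sha256": hashlib.sha256(data).hexdigest()}


def build_manifest(paths: Iterable[Path], source_digest: str) -> Dict[str, object]:
    """Tie every artifact description to a single source digest."""
    return {"sourceDigest": source_digest, "artifacts": [describe_artifact(p) for p in sorted(paths)]}


def verify_manifest(manifest: Dict[str, object], directory: Path) -> List[str]:
    """Return the names of artifacts that are missing or no longer match."""
    mismatched = []
    for entry in manifest["artifacts"]:
        path = directory / entry["name"]
        if not path.is_file() or describe_artifact(path) != entry:
            mismatched.append(entry["name"])
    return mismatched


# Demo with throwaway files standing in for a wheel and an sdist.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "pkg.whl").write_bytes(b"wheel bytes")
    (folder / "pkg.tar.gz").write_bytes(b"sdist bytes")
    manifest = build_manifest(folder.iterdir(), source_digest="example-digest")
    print(verify_manifest(manifest, folder))  # prints []
```

Rechecking the manifest in the publish job, as the text describes for PyPI, catches artifacts that changed between the verification job and the upload.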

Co-located voice pipelines are available through the native `client.voice.pipeline.create()` helper on pipeline-scoped keys. The helper posts binary audio to the verified `/pipeline` route and returns the JSON transcript / response envelope. Public webhook create/list calls are intentionally unavailable until their gateway routes are verified, and the SDK raises `UnsupportedOperationError` locally for those webhook capabilities without making a request.