npm - @aexhq/sdk - Versions diffs - 0.21.0 → 0.22.1 - Mend

@aexhq/sdk 0.21.0 → 0.22.1

Files changed (26)

package/dist/_contracts/index.d.ts +1 -0
package/dist/_contracts/index.js +1 -0
package/dist/_contracts/run-config.d.ts +23 -0
package/dist/_contracts/run-config.js +44 -2
package/dist/_contracts/run-custody.js +24 -1
package/dist/_contracts/run-trace.d.ts +105 -0
package/dist/_contracts/run-trace.js +174 -0
package/dist/_contracts/runtime-manifest.d.ts +24 -0
package/dist/_contracts/runtime-manifest.js +16 -1
package/dist/_contracts/runtime-types.d.ts +17 -0
package/dist/_contracts/submission.js +2 -3
package/dist/cli.mjs +39 -3
package/dist/cli.mjs.sha256 +1 -1
package/dist/client.js +6 -12
package/dist/client.js.map +1 -1
package/dist/file.d.ts +25 -10
package/dist/file.js +87 -26
package/dist/file.js.map +1 -1
package/dist/index.d.ts +2 -0
package/dist/index.js +4 -0
package/dist/index.js.map +1 -1
package/dist/version.d.ts +1 -1
package/dist/version.js +1 -1
package/docs/quickstart.md +18 -1
package/docs/vision-skills.md +159 -0
package/package.json +2 -2

package/docs/vision-skills.md ADDED

@@ -0,0 +1,159 @@
+---
+title: Call a vision (or any model) API from a skill
+---
+# Call a vision (or any model) API from a skill
+aex has no built-in vision tool. The agent's `provider`/`model` selects the
+*reasoning* model — it is not an endpoint a skill can POST an image to mid-run.
+To give a run image understanding (or to call any other model/HTTP API), ship a
+**skill** that POSTs to the provider's OpenAI-compatible endpoint **through the
+managed proxy**, with the key supplied via `secrets.proxyEndpointAuth`. The raw
+key never enters the container.
+This is the same proxy described in `credentials.md` — this page is the worked
+recipe for the model-API case, which has two wrinkles a plain JSON call does not:
+the image rides as a **base64 data URL** in the request body, and that body is
+large enough to need a raised `maxRequestBytes`.
+The canonical, runnable example lives in the repo at
+[`examples/vision-skill/`](../../../examples/vision-skill) (`SKILL.md`,
+`caption_frame.py`, `verify_frame.py`, `submit_with_vision_skill.mjs`). It
+captions a frame with ByteDance Doubao Seed Vision (Ark) and returns a per-noun
+"does the frame depict X?" verdict. Everything below is taken from it.
+## 1. Declare the model endpoint as a proxy endpoint
+The vision provider's API is just an HTTPS host. Declare it as a `bearer` proxy
+endpoint and supply the key in `secrets.proxyEndpointAuth`. The two model-specific
+settings are `responseMode: "full"` (so the skill gets the upstream JSON back) and
+a raised `maxRequestBytes` (so the base64 image fits):
+```ts
+import { AgentExecutor, RunModels, Skill, ProxyEndpoint, validateProxyAuth } from "@aexhq/sdk";
+const aex = new AgentExecutor({ apiToken: process.env.AEX_WORKSPACE_TOKEN! });
+const proxyEndpoints = [
+  ProxyEndpoint.bearer({
+    name: "doubao-ark",
+    baseUrl: "https://ark.ap-southeast.bytepluses.com", // intl BytePlus gateway
+    allowMethods: ["POST"],
+    allowPathPrefixes: ["/api/v3/chat/completions"],
+    maxRequestBytes: 2_000_000, // base64 image POSTs — see note below
+    responseMode: "full",
+    timeoutMs: 60_000
+  })
+];
+const proxyEndpointAuth = [
+  { name: "doubao-ark", value: { type: "bearer", token: process.env.DOUBAO_API_KEY! } }
+];
+validateProxyAuth(proxyEndpoints, proxyEndpointAuth); // fail fast at submit time
+const runId = await aex.submit({
+  model: RunModels.CLAUDE_HAIKU_4_5,
+  prompt: "…read skills/frame-vision-gate/SKILL.md, then caption + verify the frame…",
+  skills: [await Skill.fromPath("./vision-skill", { name: "frame-vision-gate" })],
+  proxyEndpoints,
+  secrets: {
+    apiKey: process.env.ANTHROPIC_API_KEY!,
+    proxyEndpointAuth
+  }
+});
+```
+`Skill.fromPath("./vision-skill", …)` is resolved relative to the process CWD, so
+run the submit script from the directory that *contains* `vision-skill/` (in the
+repo, that is `examples/`). The same pattern works for OpenAI, Gemini's
+OpenAI-compatible endpoint, or any other OpenAI-chat-shaped vision API — only
+`baseUrl` and the path prefix change.
+## 2. POST the image as a base64 data URL through the proxy
+Inside the run, the skill builds the OpenAI-compatible chat-completions body. The
+image is **base64-inlined as a data URL** in an `image_url` content part — it is
+not uploaded:
+```python
+import base64, json
+b64 = base64.b64encode(open("/workspace/files/frame.jpg", "rb").read()).decode()
+request_body = {
+    "model": "doubao-seed-1-6-vision-250815",
+    "temperature": 0,
+    "response_format": {"type": "json_object"},
+    "messages": [
+        {"role": "system", "content": "Describe only what the pixels show."},
+        {"role": "user", "content": [
+            {"type": "text", "text": "Does this frame depict an owlbear? Answer as JSON."},
+            {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{b64}"}}
+        ]}
+    ]
+}
+```
+Write the body to a file and hand it to the mounted CLI with `--data @<file>`
+(the mount has no execute bit, so invoke through `node`; see `credentials.md`):
+```python
+import subprocess
+body_path = "/workspace/.aex/_ark_request.json"
+open(body_path, "w").write(json.dumps(request_body))
+result = subprocess.run(
+    ["node", "/mnt/session/uploads/aex/aex", "proxy", "doubao-ark",
+     "--method", "POST",
+     "--path", "/api/v3/chat/completions",
+     "--header", "content-type=application/json",
+     "--data", f"@{body_path}",
+     "--response-mode", "full"],
+    capture_output=True, text=True, timeout=90,
+)
+```
+In `--response-mode full` the CLI prints a `ProxyResponseEnvelope` on stdout. The
+upstream JSON is **base64-encoded** in `upstreamBodyBase64`; an error instead
+carries an `error` field. Unwrap it:
+```python
+envelope = json.loads(result.stdout)
+if "error" in envelope:
+    raise RuntimeError(f"proxy error: {envelope['error']}: {envelope['message']}")
+upstream = json.loads(base64.b64decode(envelope["upstreamBodyBase64"]).decode())
+content = upstream["choices"][0]["message"]["content"]  # the model's JSON answer
+```
+The key is injected by the BFF on the outbound call; it never appears on disk in
+the container or in the model's context.
+## `maxRequestBytes` is required for image POSTs
+The per-endpoint `maxRequestBytes` default is **1 MiB** (1,048,576 bytes). A base64
+data-URL image is ~1.33x the raw bytes, so a ~480px JPEG (~40-150 KB raw) becomes
+~55-200 KB in the request, which fits the default. **Set `maxRequestBytes`
+explicitly** (a couple of MB) whenever you POST images so a higher-res frame or a
+larger prompt does not trip the cap. If a body does exceed the cap, the proxy
+rejects it before any upstream call with an explicit error naming the observed
+size, the configured cap, and how to raise it:
+> request body is 2400000 bytes, which exceeds this endpoint's maxRequestBytes
+> (1048576). Raise the per-endpoint maxRequestBytes in the proxy endpoint policy …
+Two ways to stay under the cap: raise `maxRequestBytes`, and/or scale frames to
+~480px wide before captioning (`ffmpeg -i source.mp4 -vf fps=1,scale=480:-1
+frame_%03d.jpg`) — full-res adds payload and cost, not signal.
+## Notes
+- **Egress.** The named proxy reaches any HTTPS host you declare as `baseUrl`
+  (no upstream allow-list; only a literal-IP SSRF deny-list). The international
+  BytePlus host (`ark.ap-southeast.bytepluses.com`) is a normal public host. The
+  China host (`ark.cn-beijing.volces.com`) is reachable in principle but the
+  platform's egress to Beijing is currently unverified — prefer the BytePlus host.
+- **Keyless model hosts.** If the upstream takes no credential, declare the
+  endpoint with `authShape: { type: "none" }` and omit the `proxyEndpointAuth`
+  entry (see `credentials.md`).
+- **Response size.** `responseMode: "full"` is required to read the model's reply
+  back. Leave `maxResponseBytes` at its default (`0` = unlimited, streamed) unless
+  you want a truncation cap.
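
Note on the envelope-unwrapping step in the added `vision-skills.md`: the doc's snippet reads `envelope['message']` unconditionally and stops at the raw `content` string. The sketch below is one hedged way a skill might wrap that step. It uses only the field names the doc itself mentions (`error`, `message`, `upstreamBodyBase64`); treating `message` as optional, the `ProxyError` class, and both helper names are assumptions made for illustration, not part of the SDK. Parsing `content` as JSON assumes the request asked for `response_format: {"type": "json_object"}`, as the doc's request body does.

```python
import base64
import json


class ProxyError(RuntimeError):
    """Hypothetical error type for an envelope that carries an `error` field."""


def unwrap_envelope(stdout: str) -> dict:
    """Decode the envelope printed in full response mode into the upstream JSON."""
    envelope = json.loads(stdout)
    if "error" in envelope:
        # Assumption: `message` may be absent, so fall back to an empty string.
        detail = envelope.get("message", "")
        raise ProxyError(f"proxy error: {envelope['error']}: {detail}")
    raw = base64.b64decode(envelope["upstreamBodyBase64"])
    return json.loads(raw.decode("utf-8"))


def model_answer(upstream: dict) -> dict:
    """Pull the first choice's content and parse it as the model's JSON answer."""
    content = upstream["choices"][0]["message"]["content"]
    return json.loads(content)
```

In the skill this would be called as `model_answer(unwrap_envelope(result.stdout))`, where `result` is the `subprocess.run` result from the doc's own snippet.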

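Note on the `maxRequestBytes` arithmetic in the added `vision-skills.md`: the ~1.33x figure follows from base64 emitting 4 output characters for every 3 input bytes. The sketch below shows how a skill could measure the serialized request before calling the proxy. The cap value is the 1 MiB default quoted in the doc; the helper names are hypothetical, and the body shape simply mirrors the doc's chat-completions example.

```python
import base64
import json

DEFAULT_MAX_REQUEST_BYTES = 1_048_576  # the 1 MiB default quoted in the doc


def base64_len(raw_len: int) -> int:
    """Length of padded base64 output for raw_len input bytes."""
    return 4 * ((raw_len + 2) // 3)


def request_body_bytes(raw_image: bytes, prompt: str, model: str) -> int:
    """Size in bytes of the JSON body once the image is inlined as a data URL."""
    b64 = base64.b64encode(raw_image).decode()
    body = {
        "model": model,
        "messages": [
            {"role": "user", "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ]}
        ],
    }
    return len(json.dumps(body).encode("utf-8"))


def fits_cap(raw_image: bytes, prompt: str, model: str,
             cap: int = DEFAULT_MAX_REQUEST_BYTES) -> bool:
    """True when the serialized body is within the given cap."""
    return request_body_bytes(raw_image, prompt, model) <= cap
```

A 150 KB frame encodes to 200,000 base64 characters and stays under the default, while a frame of roughly 800 KB or more does not, which is the case the doc's advice to raise `maxRequestBytes` guards against.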
package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@aexhq/sdk",
-  "version": "0.21.0",
+  "version": "0.22.1",
   "description": "TypeScript SDK for running autonomous agent sessions across providers (Anthropic, OpenAI, DeepSeek, Gemini, Mistral) behind one interface.",
   "license": "Apache-2.0",
   "repository": {
@@ -26,7 +26,7 @@
     "examples"
   ],
   "devDependencies": {
-    "@aexhq/contracts": "0.21.0"
+    "@aexhq/contracts": "0.22.1"
   },
   "engines": {
     "node": ">=20"