@aexhq/sdk 0.21.0 → 0.22.1

@@ -0,0 +1,159 @@
---
title: Call a vision (or any model) API from a skill
---

# Call a vision (or any model) API from a skill

aex has no built-in vision tool. The agent's `provider`/`model` selects the
*reasoning* model; it is not an endpoint a skill can POST an image to mid-run.
To give a run image understanding (or to call any other model or HTTP API), ship
a **skill** that POSTs to the provider's OpenAI-compatible endpoint **through the
managed proxy**, with the key supplied via `secrets.proxyEndpointAuth`. The raw
key never enters the container.

This is the same proxy described in `credentials.md`; this page is the worked
recipe for the model-API case, which has two wrinkles a plain JSON call does not:
the image rides as a **base64 data URL** in the request body, and that body can
grow large enough to need a raised `maxRequestBytes`.

The canonical, runnable example lives in the repo at
[`examples/vision-skill/`](../../../examples/vision-skill) (`SKILL.md`,
`caption_frame.py`, `verify_frame.py`, `submit_with_vision_skill.mjs`). It
captions a frame with ByteDance Doubao Seed Vision (Ark) and returns a per-noun
"does the frame depict X?" verdict. Everything below is taken from it.

## 1. Declare the model endpoint as a proxy endpoint

The vision provider's API is just an HTTPS host. Declare it as a `bearer` proxy
endpoint and supply the key in `secrets.proxyEndpointAuth`. The two model-specific
settings are `responseMode: "full"` (so the skill gets the upstream JSON back) and
a raised `maxRequestBytes` (so the base64 image fits):

```ts
import { AgentExecutor, RunModels, Skill, ProxyEndpoint, validateProxyAuth } from "@aexhq/sdk";

const aex = new AgentExecutor({ apiToken: process.env.AEX_WORKSPACE_TOKEN! });

// The vision provider's API, declared as a bearer-auth proxy endpoint.
const proxyEndpoints = [
  ProxyEndpoint.bearer({
    name: "doubao-ark",
    baseUrl: "https://ark.ap-southeast.bytepluses.com", // intl BytePlus gateway
    allowMethods: ["POST"],
    allowPathPrefixes: ["/api/v3/chat/completions"],
    maxRequestBytes: 2_000_000, // base64 image POSTs; see the maxRequestBytes section below
    responseMode: "full",       // return the upstream JSON to the skill
    timeoutMs: 60_000
  })
];

// Auth for the endpoint above, keyed by the same `name`.
const proxyEndpointAuth = [
  { name: "doubao-ark", value: { type: "bearer", token: process.env.DOUBAO_API_KEY! } }
];

validateProxyAuth(proxyEndpoints, proxyEndpointAuth); // fail fast at submit time

const runId = await aex.submit({
  model: RunModels.CLAUDE_HAIKU_4_5,
  prompt: "…read skills/frame-vision-gate/SKILL.md, then caption + verify the frame…",
  skills: [await Skill.fromPath("./vision-skill", { name: "frame-vision-gate" })],
  proxyEndpoints,
  secrets: {
    apiKey: process.env.ANTHROPIC_API_KEY!, // key for the reasoning model
    proxyEndpointAuth
  }
});
```

`Skill.fromPath("./vision-skill", …)` is resolved relative to the process CWD, so
run the submit script from the directory that *contains* `vision-skill/` (in the
repo, that is `examples/`). The same pattern works for OpenAI, Gemini's
OpenAI-compatible endpoint, or any other OpenAI-chat-shaped vision API: on the
submit side only `baseUrl`, the path prefix, and the key change; in the skill,
only the `model` id and the `--path` passed to the proxy.
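
Since switching providers mostly means changing the base URL and the
chat-completions path, a skill that may target several providers can keep those
values in one table. A minimal sketch: the `doubao-ark` row comes from this page;
the `openai` and `gemini` proxy names are hypothetical endpoint names you would
declare yourself, with the providers' public OpenAI-compatible chat-completions
routes. `ChatRoute` and `route_for` are illustrative helpers, not part of the
example repo.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChatRoute:
    """Where an OpenAI-compatible chat-completions call goes."""
    proxy_name: str  # the `name` given to ProxyEndpoint.bearer(...) at submit time
    base_url: str    # that endpoint's baseUrl
    path: str        # used in allowPathPrefixes and as the CLI's --path


# "doubao-ark" matches this page; the other proxy names are hypothetical.
ROUTES = {
    "doubao": ChatRoute("doubao-ark", "https://ark.ap-southeast.bytepluses.com",
                        "/api/v3/chat/completions"),
    "openai": ChatRoute("openai", "https://api.openai.com", "/v1/chat/completions"),
    "gemini": ChatRoute("gemini", "https://generativelanguage.googleapis.com",
                        "/v1beta/openai/chat/completions"),
}


def route_for(provider: str) -> ChatRoute:
    """Look up a provider's route, failing loudly on an unknown name."""
    try:
        return ROUTES[provider]
    except KeyError:
        raise ValueError(f"unknown provider {provider!r}; known: {sorted(ROUTES)}") from None
```

The `model` id in the request body still has to be one the chosen provider serves.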

## 2. POST the image as a base64 data URL through the proxy

Inside the run, the skill builds the OpenAI-compatible chat-completions body. The
image is **base64-inlined as a data URL** in an `image_url` content part; it is
not uploaded:

```python
import base64, json

# Read the frame and base64-encode it for inlining in the request body.
with open("/workspace/files/frame.jpg", "rb") as f:
    b64 = base64.b64encode(f.read()).decode("ascii")

request_body = {
    "model": "doubao-seed-1-6-vision-250815",
    "temperature": 0,
    "response_format": {"type": "json_object"},  # ask the model for a JSON reply
    "messages": [
        {"role": "system", "content": "Describe only what the pixels show."},
        {"role": "user", "content": [
            {"type": "text", "text": "Does this frame depict an owlbear? Answer as JSON."},
            {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
        ]},
    ],
}
```
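
The data URL's media type has to match the file, and the snippet above hard-codes
`image/jpeg`. A minimal sketch of a helper that derives it from the file
extension with Python's standard `mimetypes` module, so PNG frames work too;
`image_data_url` is a hypothetical name, not part of the example repo.

```python
import base64
import mimetypes


def image_data_url(path: str) -> str:
    """Return the file at `path` as a base64 data URL for an image_url content part."""
    mime, _ = mimetypes.guess_type(path)  # e.g. "image/jpeg" for .jpg, "image/png" for .png
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"cannot infer an image media type for {path!r}")
    with open(path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("ascii")
    return f"data:{mime};base64,{b64}"
```

In the body, the content part then becomes
`{"type": "image_url", "image_url": {"url": image_data_url(frame_path)}}`.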

Write the body to a file and hand it to the mounted CLI with `--data @<file>`; a
data-URL body of a few hundred KB is too large to pass safely as a single
command-line argument (Linux caps each argument at 128 KiB). The mount has no
execute bit, so invoke the CLI through `node` (see `credentials.md`):

```python
import subprocess

body_path = "/workspace/.aex/_ark_request.json"
with open(body_path, "w") as f:  # closed (and flushed) before the CLI reads it
    json.dump(request_body, f)

result = subprocess.run(
    ["node", "/mnt/session/uploads/aex/aex", "proxy", "doubao-ark",
     "--method", "POST",
     "--path", "/api/v3/chat/completions",
     "--header", "content-type=application/json",
     "--data", f"@{body_path}",
     "--response-mode", "full"],
    capture_output=True, text=True, timeout=90,  # longer than the endpoint's 60 s timeoutMs
)
```
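
When a skill makes several calls (one per frame or per noun), the write-then-invoke
steps above can be wrapped once. A minimal sketch; `build_proxy_argv` and
`proxy_post` are hypothetical helpers, and the CLI path and flags are exactly the
ones used above, nothing more. Each call gets its own temporary body file, so two
concurrent calls never overwrite one shared file. It assumes the CLI can read a
body file from any path the skill can write; pass `tmp_dir="/workspace/.aex"` to
keep the example's location.

```python
import json
import os
import subprocess
import tempfile
from typing import Optional

AEX_CLI = "/mnt/session/uploads/aex/aex"  # mounted without an execute bit, so run via node


def build_proxy_argv(endpoint: str, path: str, body_path: str) -> list[str]:
    """argv for a JSON POST through the named proxy endpoint in full response mode."""
    return ["node", AEX_CLI, "proxy", endpoint,
            "--method", "POST",
            "--path", path,
            "--header", "content-type=application/json",
            "--data", f"@{body_path}",
            "--response-mode", "full"]


def proxy_post(endpoint: str, path: str, body: dict,
               tmp_dir: Optional[str] = None, timeout_s: float = 90) -> str:
    """POST `body` as JSON through the proxy; return the CLI's stdout (the envelope)."""
    # One temp file per call, removed even if the CLI times out.
    fd, body_path = tempfile.mkstemp(suffix=".json", dir=tmp_dir)
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(body, f)
        result = subprocess.run(build_proxy_argv(endpoint, path, body_path),
                                capture_output=True, text=True, timeout=timeout_s)
    finally:
        os.remove(body_path)
    if not result.stdout.strip():
        raise RuntimeError(f"aex proxy printed nothing (exit {result.returncode}): "
                           f"{result.stderr.strip()}")
    return result.stdout
```

Usage inside the skill:
`stdout = proxy_post("doubao-ark", "/api/v3/chat/completions", request_body, tmp_dir="/workspace/.aex")`.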

In `--response-mode full` the CLI prints a `ProxyResponseEnvelope` on stdout. The
upstream JSON is **base64-encoded** in `upstreamBodyBase64`; an error instead
carries an `error` field. Unwrap it:

```python
envelope = json.loads(result.stdout)
if "error" in envelope:
    raise RuntimeError(f"proxy error: {envelope['error']}: {envelope.get('message', '')}")
upstream = json.loads(base64.b64decode(envelope["upstreamBodyBase64"]).decode("utf-8"))
content = upstream["choices"][0]["message"]["content"]  # the model's answer, as a JSON string
```
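
With `response_format: {"type": "json_object"}`, `content` is still a string that
has to be parsed. A minimal sketch of the whole unwrap as one testable function;
`unwrap_json_answer` is a hypothetical name, and it relies only on the envelope
fields named above (`error`, `message`, `upstreamBodyBase64`). Which keys the
answer object holds depends entirely on your prompt.

```python
import base64
import json


def unwrap_json_answer(stdout: str) -> dict:
    """Turn `aex proxy --response-mode full` stdout into the model's parsed JSON answer."""
    envelope = json.loads(stdout)
    if "error" in envelope:
        raise RuntimeError(f"proxy error: {envelope['error']}: {envelope.get('message', '')}")
    upstream = json.loads(base64.b64decode(envelope["upstreamBodyBase64"]).decode("utf-8"))
    content = upstream["choices"][0]["message"]["content"]
    answer = json.loads(content)  # json_object mode: content is a JSON-encoded string
    if not isinstance(answer, dict):
        raise ValueError(f"expected a JSON object from the model, got {type(answer).__name__}")
    return answer
```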

The key is injected by the BFF on the outbound call; it never appears on disk in
the container or in the model's context.

## Raise `maxRequestBytes` for image POSTs

The per-endpoint `maxRequestBytes` default is **1 MiB** (1,048,576 bytes). Base64
turns every 3 raw bytes into 4 characters, so a data-URL image is ~1.33x its raw
size: a ~480px JPEG (~40-150 KB raw) becomes ~55-200 KB in the request. That fits
within the default, but a higher-res frame or a larger prompt can push a body past
it, so **set `maxRequestBytes` explicitly** (a couple of MB) whenever you POST
images. If a body does exceed the cap, the proxy rejects it before any upstream
call with an explicit error naming the observed size, the configured cap, and how
to raise it:

> request body is 2400000 bytes, which exceeds this endpoint's maxRequestBytes
> (1048576). Raise the per-endpoint maxRequestBytes in the proxy endpoint policy …
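
To catch an oversized body before calling the proxy, measure the serialized JSON
directly. A minimal sketch: `base64_len` is the exact padded base64 length (4
characters per started 3-byte group), and `body_bytes` assumes the proxy counts
the UTF-8 bytes of the body exactly as `json.dumps` writes it. All three helper
names are hypothetical.

```python
import json


def base64_len(raw_bytes: int) -> int:
    """Exact length of standard padded base64 output for `raw_bytes` input bytes."""
    return 4 * ((raw_bytes + 2) // 3)


def body_bytes(request_body: dict) -> int:
    """UTF-8 size of the JSON body as it would be written to the --data file."""
    return len(json.dumps(request_body).encode("utf-8"))


def check_body_size(request_body: dict, max_request_bytes: int) -> int:
    """Raise before the proxy call if the body would exceed the endpoint's cap."""
    size = body_bytes(request_body)
    if size > max_request_bytes:
        raise ValueError(f"request body is {size} bytes; maxRequestBytes is {max_request_bytes}")
    return size
```

For example, `base64_len(150_000)` is 200,000, which is the ~200 KB upper estimate
above; calling `check_body_size(request_body, 2_000_000)` mirrors the cap declared
in section 1.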

Two complementary fixes: raise `maxRequestBytes`, and/or shrink the payload by
scaling frames to ~480px wide before captioning (`ffmpeg -i source.mp4 -vf
fps=1,scale=480:-1 frame_%03d.jpg` extracts one frame per second at 480px wide,
preserving the aspect ratio). Full resolution adds payload and cost, not signal.

## Notes

- **Egress.** The named proxy reaches any HTTPS host you declare as `baseUrl`
  (no upstream allow-list; only a literal-IP SSRF deny-list). The international
  BytePlus host (`ark.ap-southeast.bytepluses.com`) is a normal public host. The
  China host (`ark.cn-beijing.volces.com`) is reachable in principle, but the
  platform's egress to Beijing is currently unverified, so prefer the BytePlus host.
- **Keyless model hosts.** If the upstream takes no credential, declare the
  endpoint with `authShape: { type: "none" }` and omit the `proxyEndpointAuth`
  entry (see `credentials.md`).
- **Response size.** `responseMode: "full"` is required to read the model's reply
  back. Leave `maxResponseBytes` at its default (`0` = unlimited, streamed) unless
  you want a truncation cap.
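
Because there is no upstream allow-list, a typo in `baseUrl` goes to whatever host
it names. A minimal client-side lint you could run over your endpoint declarations
before submitting, as a sketch: it requires `https` and a DNS hostname, rejecting
every literal IP. That is deliberately stricter than, and not a copy of, the
proxy's own deny-list; `lint_base_url` is a hypothetical helper.

```python
import ipaddress
from urllib.parse import urlsplit


def lint_base_url(base_url: str) -> str:
    """Return the hostname of an https baseUrl; reject other schemes and literal IPs.

    Client-side sanity check only; the proxy applies its own rules.
    """
    parts = urlsplit(base_url)
    if parts.scheme != "https":
        raise ValueError(f"{base_url!r}: baseUrl must use https")
    host = parts.hostname  # lowercased, with IPv6 brackets stripped
    if not host:
        raise ValueError(f"{base_url!r}: no hostname")
    try:
        ipaddress.ip_address(host)
    except ValueError:
        return host  # not an IP literal, so a DNS name as expected
    raise ValueError(f"{base_url!r}: use a DNS hostname, not a literal IP")
```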
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "@aexhq/sdk",
- "version": "0.21.0",
+ "version": "0.22.1",
  "description": "TypeScript SDK for running autonomous agent sessions across providers (Anthropic, OpenAI, DeepSeek, Gemini, Mistral) behind one interface.",
  "license": "Apache-2.0",
  "repository": {
@@ -26,7 +26,7 @@
  "examples"
  ],
  "devDependencies": {
- "@aexhq/contracts": "0.21.0"
+ "@aexhq/contracts": "0.22.1"
  },
  "engines": {
  "node": ">=20"