@cuylabs/physical-capx-agent-core 0.1.1

@@ -0,0 +1,344 @@
+ # Examples
+
+ Runnable examples for `@cuylabs/physical-capx-agent-core`.
+
+ These examples are the `@cuylabs/agent-core` client path.
+ `capx-agent-runtime` remains the harness-neutral Python service; any other
+ agent or workflow can use the same CaP-X runtime by calling its HTTP API
+ directly.
+
+ ## Main Examples
+
+ There are two solver examples. Both connect an `agent-core` agent to an
+ already-running `capx-agent-runtime` service and expose the CaP-X session
+ through the `capx_*` tools.
+
+ `01-capx-runtime-solver.ts` is the default single-turn example. It creates one
+ agent, starts one runtime session, and gives the model one user turn. Inside
+ that turn, `agent-core` may still run multiple model/tool steps, but the prompt
+ asks for one useful Code-as-Policy action and a short result summary.
+
+ That flow is:
+
+ 1. observe the CaP-X task, simulator state, and rendered frame,
+ 2. inspect runtime turn history and available policy-code context,
+ 3. write one Python Code-as-Policy step,
+ 4. execute that Python through `capx-agent-runtime`,
+ 5. observe again and summarize reward, stdout/stderr, artifacts, and task
+    completion.
+
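The five steps above can be sketched as one TypeScript function. This is a minimal sketch, not the example's actual source: the `RuntimeSession` interface, its method names, and the `Observation` and `TurnSummary` fields are hypothetical stand-ins for the `capx_*` tools, and the policy-writing step is a callback where the real example relies on the model.

```typescript
// Hypothetical shapes for illustration; the real capx_* tool payloads may differ.
interface Observation {
  taskPrompt: string;
  reward: number;
  taskCompleted: boolean;
}

interface ExecutionResult {
  stdout: string;
  stderr: string;
}

// Assumed wrapper around one live runtime session.
interface RuntimeSession {
  observe(): Promise<Observation>;
  turnHistory(): Promise<string[]>;
  runPolicyCode(code: string): Promise<ExecutionResult>;
}

// Stands in for the model writing one Python Code-as-Policy step.
type PolicyWriter = (observation: Observation, history: string[]) => Promise<string>;

interface TurnSummary {
  code: string;
  stdout: string;
  stderr: string;
  reward: number;
  taskCompleted: boolean;
}

async function runSingleTurn(
  session: RuntimeSession,
  writePolicy: PolicyWriter,
): Promise<TurnSummary> {
  const before = await session.observe(); // 1. observe task and state
  const history = await session.turnHistory(); // 2. inspect turn history
  const code = await writePolicy(before, history); // 3. write one policy step
  const result = await session.runPolicyCode(code); // 4. execute it in the runtime
  const after = await session.observe(); // 5. observe again and summarize
  return {
    code,
    stdout: result.stdout,
    stderr: result.stderr,
    reward: after.reward,
    taskCompleted: after.taskCompleted,
  };
}
```

Passing the session and the policy writer as parameters keeps the sketch independent of any particular model provider, so the same function can be exercised against a fake session in a test.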
+ `02-capx-runtime-autosolve.ts` is the multi-turn example. It keeps the same
+ agent and runtime session open across several user turns. After each turn, the
+ script observes the runtime result and stops when CaP-X reports task completion
+ or when `CAPX_MAX_SOLVER_TURNS` is reached. Use it when the harness should keep
+ trying the task instead of exiting after one solver turn.
+
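The stopping rule described above can be illustrated with a small outer loop. This is a sketch under stated assumptions: `TurnOutcome`, `parseMaxTurns`, and `autosolve` are hypothetical names, and the fallback of 6 turns is chosen for illustration only, not taken from the example's source.

```typescript
// Hypothetical per-turn result; the real example reads this from the runtime.
interface TurnOutcome {
  taskCompleted: boolean;
  reward: number;
}

// Parse CAPX_MAX_SOLVER_TURNS. The fallback value is an assumption of this sketch.
function parseMaxTurns(raw: string | undefined, fallback = 6): number {
  const parsed = Number.parseInt(raw ?? "", 10);
  return Number.isInteger(parsed) && parsed > 0 ? parsed : fallback;
}

// Run solver turns until the task completes or the turn budget is spent.
async function autosolve(
  runTurn: (turn: number) => Promise<TurnOutcome>,
  maxTurns: number,
): Promise<{ turnsUsed: number; completed: boolean }> {
  let turnsUsed = 0;
  while (turnsUsed < maxTurns) {
    turnsUsed += 1;
    const outcome = await runTurn(turnsUsed);
    if (outcome.taskCompleted) {
      return { turnsUsed, completed: true };
    }
  }
  return { turnsUsed, completed: false };
}
```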
+ Both examples enable the packaged `capx-code-as-policy` agent-core skill by
+ default. That skill teaches the model how to use the `capx_*` tools and how to
+ write policy code for the runtime. It is separate from CaP-X's runtime-side
+ Python skill library, which appears dynamically in observation `codeContext`
+ and deliberate runtime APIs.
+
+ ## Service-First Setup
+
+ The normal path is to start the runtime service first, usually on a Linux GPU
+ workstation, then run the TypeScript agent from your local machine or another
+ client.
+
+ Follow the runtime project docs first:
+
+ 1. Prepare the GPU workstation with
+    [Workstation Setup](https://github.com/cuylabs-ai/capx-agent-runtime/blob/main/docs/workstation-setup.md).
+ 2. Start and validate the runtime server with
+    [Runtime Server](https://github.com/cuylabs-ai/capx-agent-runtime/blob/main/docs/runtime-server.md).
+
+ The runtime server is typically started from the CaP-X checkout like this:
+
+ ```bash
+ cd /path/to/cap-x
+ uv run --no-sync --active capx-agent-runtime serve \
+   --repo-path "$(pwd)" \
+   --config-path env_configs/cube_stack/franka_robosuite_cube_stack.yaml \
+   --host 127.0.0.1 \
+   --port 8210
+ ```
+
+ That command starts the CaP-X runtime around the selected YAML config. For the
+ cube-stack config, the runtime is a Robosuite simulation. The TypeScript example
+ connects to this service and acts as the external solver agent.
+
+ You can point the runtime service at another compatible CaP-X config, for
+ example:
+
+ ```bash
+ --config-path env_configs/cube_stack/franka_robosuite_cube_stack_multiturn.yaml
+ --config-path env_configs/cube_stack/franka_robosuite_cube_stack_multiturn_vf.yaml
+ --config-path env_configs/cube_stack/franka_robosuite_cube_stack_multiturn_vdm.yaml
+ ```
+
+ The `02-capx-runtime-autosolve.ts` example can run multiple external-agent
+ turns against any of those configs. The `multiturn` configs add CaP-X
+ continuation prompt text. The `vf` and `vdm` configs expose visual-feedback or
+ visual-differencing intent; in this bring-your-own-agent path, the agent should
+ call `capx_observe` with `includeImages=true` and do the comparison in the host
+ model/harness.
+
+ If the runtime is remote, open an SSH tunnel so your local machine can reach
+ the service:
+
+ ```bash
+ ssh -L 8210:127.0.0.1:8210 <user>@<gpu-host>
+ ```
+
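Before running an example through the tunnel, it can help to confirm that the service answers on the forwarded port. The sketch below assumes that `GET /sessions` (the endpoint used by the `curl` command in Recovery And Cleanup) responds with a success status on a healthy server. `checkRuntimeReachable` and `FetchLike` are hypothetical names, and the fetch implementation is passed in so the function can be tested without a network.

```typescript
// Minimal subset of the fetch contract this sketch relies on.
type FetchLike = (url: string) => Promise<{ ok: boolean; status: number }>;

// Build the session-list URL, tolerating a trailing slash on the base URL.
function sessionsUrl(baseUrl: string): string {
  return `${baseUrl.replace(/\/+$/, "")}/sessions`;
}

// Returns true when the runtime answers with a 2xx status, false on any failure.
async function checkRuntimeReachable(
  baseUrl: string,
  fetchImpl: FetchLike,
): Promise<boolean> {
  try {
    const response = await fetchImpl(sessionsUrl(baseUrl));
    return response.ok;
  } catch {
    // Connection refused, closed tunnel, DNS failure, and similar errors.
    return false;
  }
}
```

With Node 20 or later, the global `fetch` can be passed directly, for example `checkRuntimeReachable("http://127.0.0.1:8210", fetch)`.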
+ ## Client Setup
+
+ In an application that consumes the released packages, install the TypeScript
+ client packages and the example runner dependencies:
+
+ ```bash
+ npm install @cuylabs/agent-core @cuylabs/physical-core @cuylabs/physical-capx-agent-core
+ npm install --save-dev @ai-sdk/openai dotenv tsx
+ ```
+
+ The released package already includes its built `dist/` files, so there is no
+ workspace build step in the normal install path.
+
+ If you are running the examples from a local `physical-ai-ts` monorepo checkout
+ while changing package source, install workspace dependencies first:
+
+ ```bash
+ cd /path/to/physical-ai-ts
+ pnpm install
+ ```
+
+ Use the `pnpm` already available on your machine. If `pnpm` is missing and your
+ Node install includes Corepack, you can enable it with `corepack enable`; if
+ `corepack` is not available, install `pnpm` directly with your normal Node
+ package-manager setup.
+
+ For the checked-in examples, configure the local example environment from this
+ package directory:
+
+ ```bash
+ cd packages/physical-capx-agent-core
+ cp examples/.env.example examples/.env
+ ```
+
+ Both examples import `examples/_setup.ts`. You do not run `_setup.ts`
+ directly; it loads `examples/.env` and creates the OpenAI-compatible provider
+ used by `agent-core`.
+
+ Set the required values in `examples/.env`:
+
+ ```bash
+ OPENAI_API_KEY=...
+ OPENAI_MODEL=gpt-4o-mini
+
+ CAPX_RUNTIME_SERVER_URL=http://127.0.0.1:8210
+ ```
+
+ `OPENAI_BASE_URL` is optional. Leave it unset for the default OpenAI endpoint.
+ Set it only when using an OpenAI-compatible provider, for example a local
+ gateway or hosted inference endpoint.
+
+ ## Run Modes
+
+ The examples default to observe-only mode. In that mode, the agent can inspect
+ the task, frame, runtime state, and policy-code context, but it cannot call
+ `capx_run_policy_code`.
+
+ ### Observe Only
+
+ ```bash
+ npx tsx examples/01-capx-runtime-solver.ts
+ ```
+
+ Use this first to confirm that the runtime URL, model provider, session
+ creation, observation, and tool wiring are working.
+
+ ### Single Policy Step
+
+ Allow the single-turn example to execute one Python Code-as-Policy action in
+ simulation.
+
+ ```bash
+ CAPX_ALLOW_DESTRUCTIVE=1 \
+ npx tsx examples/01-capx-runtime-solver.ts
+ ```
+
+ The startup line should show `approval=policy-code-enabled`. If it still shows
+ `approval=observe-only`, the environment variable did not reach the Node
+ process. Use a single-line command to verify:
+
+ ```bash
+ env CAPX_ALLOW_DESTRUCTIVE=1 npx tsx examples/01-capx-runtime-solver.ts
+ ```
+
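The gate behind the `approval=` startup line can be pictured as a small function over the environment. This is a hedged sketch, not the example's approval policy: `approvalMode` and `isToolAllowed` are hypothetical names, the exact-match comparison against `"1"` is an assumption, and treating every tool other than `capx_run_policy_code` as always allowed is a simplification.

```typescript
type ApprovalMode = "observe-only" | "policy-code-enabled";
type Env = Record<string, string | undefined>;

// Derive the mode shown on the startup line from CAPX_ALLOW_DESTRUCTIVE.
function approvalMode(env: Env): ApprovalMode {
  return env["CAPX_ALLOW_DESTRUCTIVE"] === "1" ? "policy-code-enabled" : "observe-only";
}

// Decide whether a tool call may run. Hardware configs need the extra gate
// CAPX_ALLOW_HARDWARE_POLICY_EXECUTION=1 on top of the destructive flag.
function isToolAllowed(toolName: string, env: Env, targetsHardware = false): boolean {
  if (toolName !== "capx_run_policy_code") {
    return true; // Assumption: other capx_* tools are read-only in this sketch.
  }
  if (approvalMode(env) !== "policy-code-enabled") {
    return false;
  }
  return !targetsHardware || env["CAPX_ALLOW_HARDWARE_POLICY_EXECUTION"] === "1";
}
```

Calling `approvalMode(process.env)` at startup and printing the result is one way to reproduce the check described above when the variable does not seem to reach the Node process.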
+ ### Single Policy Step With Video
+
+ ```bash
+ CAPX_ALLOW_DESTRUCTIVE=1 \
+ CAPX_POLICY_EXECUTION_RECORD_VIDEO=1 \
+ npx tsx examples/01-capx-runtime-solver.ts
+ ```
+
+ ### Multi-Turn Autosolve
+
+ Run the autosolver in observe-only mode.
+
+ ```bash
+ CAPX_MAX_SOLVER_TURNS=6 npx tsx examples/02-capx-runtime-autosolve.ts
+ ```
+
+ Allow policy-code execution across the autosolver loop.
+
+ ```bash
+ CAPX_ALLOW_DESTRUCTIVE=1 \
+ CAPX_MAX_SOLVER_TURNS=6 \
+ npx tsx examples/02-capx-runtime-autosolve.ts
+ ```
+
+ For the most complete demo, enable execution, video recording, one runtime
+ recovery reset, and stop-on-exit so the combined video artifact is flushed.
+
+ ```bash
+ CAPX_ALLOW_DESTRUCTIVE=1 \
+ CAPX_POLICY_EXECUTION_RECORD_VIDEO=1 \
+ CAPX_MAX_SOLVER_TURNS=6 \
+ CAPX_RECOVER_ON_RUNTIME_ERROR=reset \
+ CAPX_MAX_RUNTIME_RESETS=1 \
+ CAPX_STOP_ON_EXIT=1 \
+ npx tsx examples/02-capx-runtime-autosolve.ts
+ ```
+
+ ## Expected Output
+
+ For the default Franka cube-stack config, a healthy run usually finishes after
+ one useful policy-code turn. Exact sampled poses and artifact paths vary, but
+ the important terminal lines look like this:
+
+ ```text
+ executionOk=true, taskCompleted=true, reward=1
+ terminated=true, truncated=false
+ sandboxRc=0
+ CaP-X reported completion state: taskCompleted=true terminated=true truncated=false sandboxRc=0 reward=1
+ ```
+
+ The server log should show the same lifecycle:
+
+ ```text
+ POST /sessions ... 200 OK
+ POST /sessions/<id>/execute-code ... 200 OK
+ Saved interaction video to .../video_1.000_turn_00.mp4
+ Saved interaction video to .../video_session_combined.mp4
+ POST /sessions/<id>/stop ... 200 OK
+ ```
+
+ The `video_..._turn_00.mp4` file is the per-policy-turn recording.
+ `video_session_combined.mp4` is written when the session stops, so
+ `CAPX_STOP_ON_EXIT=1` is recommended for video examples. The runtime console
+ shows the combined session video first and links the per-turn videos as
+ individual artifact files.
+
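When scripting around the examples, the completion line shown above can be turned into a typed value. This is a sketch that assumes the `key=value` layout of that sample line stays stable, which the examples do not guarantee. `CompletionState` and `parseCompletionLine` are hypothetical names.

```typescript
interface CompletionState {
  taskCompleted: boolean;
  terminated: boolean;
  truncated: boolean;
  sandboxRc: number;
  reward: number;
}

// Parse whitespace-separated key=value tokens; returns undefined if any field is missing.
function parseCompletionLine(line: string): CompletionState | undefined {
  const fields = new Map<string, string>();
  for (const token of line.split(/\s+/)) {
    const eq = token.indexOf("=");
    if (eq > 0) {
      // Drop a trailing comma so comma-separated lines also parse.
      fields.set(token.slice(0, eq), token.slice(eq + 1).replace(/,$/, ""));
    }
  }
  const bool = (key: string): boolean | undefined => {
    const value = fields.get(key);
    return value === "true" ? true : value === "false" ? false : undefined;
  };
  const num = (key: string): number | undefined => {
    const value = fields.get(key);
    if (value === undefined || value === "") return undefined;
    const parsed = Number(value);
    return Number.isFinite(parsed) ? parsed : undefined;
  };
  const taskCompleted = bool("taskCompleted");
  const terminated = bool("terminated");
  const truncated = bool("truncated");
  const sandboxRc = num("sandboxRc");
  const reward = num("reward");
  if (
    taskCompleted === undefined ||
    terminated === undefined ||
    truncated === undefined ||
    sandboxRc === undefined ||
    reward === undefined
  ) {
    return undefined;
  }
  return { taskCompleted, terminated, truncated, sandboxRc, reward };
}
```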
+ ## Recovery And Cleanup
+
+ The autosolver distinguishes ordinary policy-code failures from runtime-level
+ CaP-X failures.
+
+ | Case                                                                | What Happens                                                                   |
+ | ------------------------------------------------------------------- | ------------------------------------------------------------------------------ |
+ | Python policy returns stderr                                        | The next agent turn can inspect the error and write better code.               |
+ | Observation or depth pipeline fails before `env.step(code)` returns | The autosolver stops or uses `CAPX_RECOVER_ON_RUNTIME_ERROR=reset` if enabled. |
+ | Recovery reset is enabled                                           | The session resets to the next trial/seed. The default reset budget is `1`.    |
+ | `CAPX_STOP_ON_EXIT=1` is set                                        | The example stops the runtime session at exit and flushes the combined video.  |
+
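The table above amounts to a small decision rule, sketched here in TypeScript. The type names, `recoveryConfigFromEnv`, and `decideRecovery` are hypothetical; only the variable names and the default reset budget of `1` come from the table, and how the real autosolver classifies a failure is not shown.

```typescript
type FailureKind = "policy-error" | "runtime-error";
type RecoveryAction = "retry-next-turn" | "reset-session" | "stop";
type Env = Record<string, string | undefined>;

interface RecoveryConfig {
  recoverOnRuntimeError: boolean;
  maxResets: number;
}

// Read the recovery settings; the budget falls back to 1 as stated in the table.
function recoveryConfigFromEnv(env: Env): RecoveryConfig {
  const parsed = Number.parseInt(env["CAPX_MAX_RUNTIME_RESETS"] ?? "", 10);
  return {
    recoverOnRuntimeError: env["CAPX_RECOVER_ON_RUNTIME_ERROR"] === "reset",
    maxResets: Number.isInteger(parsed) && parsed >= 0 ? parsed : 1,
  };
}

// Policy errors go back to the agent; runtime errors reset while budget remains.
function decideRecovery(
  kind: FailureKind,
  resetsUsed: number,
  config: RecoveryConfig,
): RecoveryAction {
  if (kind === "policy-error") {
    return "retry-next-turn";
  }
  if (config.recoverOnRuntimeError && resetsUsed < config.maxResets) {
    return "reset-session";
  }
  return "stop";
}
```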
+ If the reset budget is exhausted, clean up first and retry with a fresh
+ session. When a session is still running, find its id and stop it:
+
+ ```bash
+ curl -sS http://127.0.0.1:8210/sessions
+ curl -X POST http://127.0.0.1:8210/sessions/<session-id>/stop
+ ```
+
+ You can also reset an existing session, but for observation/depth assertion
+ failures a fresh session is usually clearer:
+
+ ```bash
+ curl -X POST \
+   -H 'content-type: application/json' \
+   -d '{}' \
+   http://127.0.0.1:8210/sessions/<session-id>/reset
+ ```
+
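The same cleanup calls can be issued from a script. The sketch below only builds the request descriptions that mirror the two `curl` commands above: it uses the `/sessions/<session-id>/stop` and `/sessions/<session-id>/reset` paths and the empty JSON body from those commands, while `RuntimeRequest` and `buildSessionRequest` are hypothetical names and the response shape is deliberately not assumed.

```typescript
interface RuntimeRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body?: string;
}

// Build a stop or reset request for one session, mirroring the curl commands.
function buildSessionRequest(
  baseUrl: string,
  sessionId: string,
  action: "stop" | "reset",
): RuntimeRequest {
  const root = baseUrl.replace(/\/+$/, "");
  const url = `${root}/sessions/${encodeURIComponent(sessionId)}/${action}`;
  if (action === "reset") {
    // Reset sends an empty JSON object, as in the curl example.
    return { url, method: "POST", headers: { "content-type": "application/json" }, body: "{}" };
  }
  return { url, method: "POST", headers: {} };
}
```

With Node 20 or later, a request built this way can be sent with the global `fetch`, for example `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })`.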
+ If the depth assertion repeats immediately on a clean session, restart the
+ `capx-agent-runtime serve` process too. That recreates the Python environment
+ and the child API services instead of reusing the same process state.
+
+ To isolate the TypeScript adapter and `agent-core` loop from the vision/depth
+ stack, start `capx-agent-runtime` with a privileged cube-stack config when that
+ config is available:
+
+ ```bash
+ uv run --no-sync --active capx-agent-runtime serve \
+   --repo-path "$(pwd)" \
+   --config-path env_configs/cube_stack/franka_robosuite_cube_stack_privileged.yaml \
+   --host 127.0.0.1 \
+   --port 8210
+ ```
+
+ That path avoids some vision-derived object-pose calls and is useful when you
+ want to validate HTTP tools, approvals, artifacts, videos, and the external
+ agent loop before debugging the Robosuite camera/depth pipeline.
+
+ ## Environment Variables
+
+ | Variable                                 | Purpose                                                                                       |
+ | ---------------------------------------- | --------------------------------------------------------------------------------------------- |
+ | `OPENAI_API_KEY`                         | Configures the `agent-core` model provider.                                                   |
+ | `OPENAI_MODEL`                           | Model id. Defaults to `gpt-4o-mini` in `examples/_setup.ts`.                                  |
+ | `OPENAI_BASE_URL`                        | Optional OpenAI-compatible provider endpoint.                                                 |
+ | `CAPX_RUNTIME_SERVER_URL`                | URL for the running `capx-agent-runtime` service.                                             |
+ | `CAPX_ALLOW_DESTRUCTIVE=1`               | Lets the example approval policy execute `capx_run_policy_code`.                              |
+ | `CAPX_ALLOW_HARDWARE_POLICY_EXECUTION=1` | Extra gate required before policy execution against hardware configs.                         |
+ | `CAPX_MAX_SOLVER_TURNS`                  | Outer loop limit for `02-capx-runtime-autosolve.ts`.                                          |
+ | `CAPX_RECOVER_ON_RUNTIME_ERROR=reset`    | Reset the live runtime session after runtime-level observation/depth failures.                |
+ | `CAPX_MAX_RUNTIME_RESETS`                | Recovery reset budget. Defaults to `1` when recovery is enabled.                              |
+ | `CAPX_POLICY_EXECUTION_RECORD_VIDEO`     | Optional `1` or `0` override for the selected YAML's video setting.                           |
+ | `CAPX_STOP_ON_EXIT=1`                    | Stop the runtime session when the example exits and flush combined video artifacts.           |
+ | `CAPX_SESSION_OUTPUT_DIR`                | Privileged per-session output override. Leave unset for normal server-owned paths.            |
+ | `CAPX_SESSION_SKILL_LIBRARY_PATH`        | Privileged per-session skill-library override. Leave unset unless path overrides are enabled. |
+ | `CAPX_TOOL_RESULT_MAX_CHARS`             | Increase printed tool-result previews while debugging.                                        |
+
+ By default, each example run uses the runtime server's configured output
+ directory and skill-library path. Set `CAPX_SESSION_OUTPUT_DIR` or
+ `CAPX_SESSION_SKILL_LIBRARY_PATH` only when the runtime server was started with
+ `--allow-client-path-overrides` and allowed roots for those paths.
+
+ ## Runtime Contract
+
+ The examples always use the live runtime path: `mode: "runtime"`,
+ `startSession: true`, `enablePolicyCodeExecution: true`, and
+ `policyExecutionMode: "live-runtime"`.
+
+ The adapter does not accept `repoPath` or `configPath`, and it omits
+ `outputDir` and `skillLibraryPath` by default. Those path choices belong to the
+ runtime server startup command. That keeps the architecture clean: the Python
+ runtime service owns the CaP-X repo/config/output/simulator setup, and
+ `agent-core` owns the external agent loop.
+
+ The adapter defaults to `toolExecutionMode: "plan"`. In `agent-core`, "plan"
+ means framework-owned tool dispatch, not "only write a textual plan." The model
+ can still emit tool calls; `agent-core` applies approval and scheduling policy,
+ executes approved tools, then records tool results before the next model step.
+
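Framework-owned dispatch, as described for `"plan"` mode, can be pictured with the following sketch. It is not `agent-core`'s implementation: `ToolCall`, `ToolResult`, and `dispatchToolCalls` are hypothetical names, and running approved calls one after another is an assumed scheduling policy chosen to keep the sketch short.

```typescript
interface ToolCall {
  id: string;
  name: string;
  args: unknown;
}

interface ToolResult {
  id: string;
  name: string;
  status: "executed" | "denied";
  output?: unknown;
}

type Approve = (call: ToolCall) => boolean;
type Execute = (call: ToolCall) => Promise<unknown>;

// The framework, not the model, decides which emitted calls actually run.
async function dispatchToolCalls(
  calls: ToolCall[],
  approve: Approve,
  execute: Execute,
): Promise<ToolResult[]> {
  const results: ToolResult[] = [];
  for (const call of calls) {
    if (!approve(call)) {
      // Denied calls are still recorded so the next model step can see them.
      results.push({ id: call.id, name: call.name, status: "denied" });
      continue;
    }
    const output = await execute(call);
    results.push({ id: call.id, name: call.name, status: "executed", output });
  }
  return results;
}
```

The returned list is what would be appended to the conversation before the next model step, so a denied `capx_run_policy_code` call shows up as a recorded result rather than disappearing silently.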
+ Both examples use `agent-core`'s `createEventPrinter` to render steps, tool
+ calls, tool results, approval events, text output, and completion. For CaP-X,
+ those logs are the easiest way to see the external agent loop: status, observe,
+ optional policy-code execution, observe again, then final summary.
+
+ This package does not copy CaP-X prompt templates into TypeScript. In runtime
+ mode, `capx-agent-runtime` loads the selected CaP-X YAML config and trial.
+ `capx_observe` returns the CaP-X task prompt, full prompt, observations, API
+ descriptions, rendered frame when available, and last-step result. The external
+ agent reads that CaP-X-provided context and acts by calling
+ `capx_run_policy_code`.
@@ -0,0 +1,61 @@
+ /**
+  * Shared example setup — loads `.env` from the examples directory.
+  */
+
+ import { createOpenAI } from "@ai-sdk/openai";
+ import { config } from "dotenv";
+ import { dirname, join } from "node:path";
+ import { fileURLToPath } from "node:url";
+
+ export const examplesDir = dirname(fileURLToPath(import.meta.url));
+
+ config({ path: join(examplesDir, ".env"), quiet: true });
+
+ const DEFAULT_OPENAI_MODEL = "gpt-4o-mini";
+
+ function firstEnv(names: string[]): string | undefined {
+   for (const name of names) {
+     const value = process.env[name];
+     if (value && value.trim()) {
+       return value.trim();
+     }
+   }
+   return undefined;
+ }
+
+ export function getExampleOpenAIModelId(
+   fallback = DEFAULT_OPENAI_MODEL,
+ ): string {
+   return (
+     firstEnv([
+       "OPENAI_MODEL",
+       "OPENAI_MODEL_ID",
+       "openai_model",
+       "openai_model_id",
+     ]) ?? fallback
+   );
+ }
+
+ export function getExampleOpenAIBaseURL(): string | undefined {
+   return firstEnv([
+     "OPENAI_BASE_URL",
+     "OPENAI_API_BASE_URL",
+     "OPENAI_BASEURL",
+     "openai_base_url",
+     "openai_api_base_url",
+   ]);
+ }
+
+ export function createExampleOpenAIProvider() {
+   const apiKey = firstEnv(["OPENAI_API_KEY", "openai_api_key"]);
+   const baseURL = getExampleOpenAIBaseURL();
+
+   return createOpenAI({
+     ...(apiKey ? { apiKey } : {}),
+     ...(baseURL ? { baseURL } : {}),
+   });
+ }
+
+ export function exampleOpenAIModel(modelId = getExampleOpenAIModelId()) {
+   return createExampleOpenAIProvider()(modelId);
+ }
package/package.json ADDED
@@ -0,0 +1,86 @@
+ {
+   "name": "@cuylabs/physical-capx-agent-core",
+   "version": "0.1.1",
+   "description": "Agent-core CaP-X agent and physical tool adapter",
+   "type": "module",
+   "main": "./dist/index.js",
+   "types": "./dist/index.d.ts",
+   "exports": {
+     ".": {
+       "types": "./dist/index.d.ts",
+       "import": "./dist/index.js",
+       "default": "./dist/index.js"
+     },
+     "./agent": {
+       "types": "./dist/agent.d.ts",
+       "import": "./dist/agent.js",
+       "default": "./dist/agent.js"
+     },
+     "./tools": {
+       "types": "./dist/tools.d.ts",
+       "import": "./dist/tools.js",
+       "default": "./dist/tools.js"
+     },
+     "./session": {
+       "types": "./dist/session.d.ts",
+       "import": "./dist/session.js",
+       "default": "./dist/session.js"
+     }
+   },
+   "files": [
+     "dist",
+     "docs",
+     "examples/*.ts",
+     "examples/.env.example",
+     "examples/README.md",
+     "skills",
+     "README.md"
+   ],
+   "dependencies": {
+     "zod": "^3.25.76 || ^4.1.8",
+     "@cuylabs/physical-agent-core": "^0.1.1",
+     "@cuylabs/physical-capx": "^0.1.1",
+     "@cuylabs/physical-core": "^0.1.1"
+   },
+   "devDependencies": {
+     "@ai-sdk/openai": "4.0.0-beta.38",
+     "@cuylabs/agent-core": "^7.2.1",
+     "@types/node": "^22.0.0",
+     "dotenv": "^17.2.3",
+     "tsup": "^8.0.0",
+     "tsx": "^4.21.0",
+     "typescript": "^5.7.0",
+     "vitest": "^4.0.18"
+   },
+   "peerDependencies": {
+     "@cuylabs/agent-core": "^7.0.0"
+   },
+   "keywords": [
+     "agent",
+     "physical-ai",
+     "robotics",
+     "capx",
+     "code-as-policy"
+   ],
+   "author": "cuylabs",
+   "license": "Apache-2.0",
+   "repository": {
+     "type": "git",
+     "url": "https://github.com/cuylabs-ai/physical-ai-ts.git",
+     "directory": "packages/physical-capx-agent-core"
+   },
+   "engines": {
+     "node": ">=20"
+   },
+   "publishConfig": {
+     "access": "public"
+   },
+   "scripts": {
+     "build": "tsup --config tsup.config.ts",
+     "dev": "tsup --config tsup.config.ts --watch",
+     "typecheck": "tsc --noEmit",
+     "test": "vitest run",
+     "test:watch": "vitest",
+     "clean": "rm -rf dist"
+   }
+ }
@@ -0,0 +1,22 @@
+ ---
+ name: capx-code-as-policy
+ description: Use this when controlling a CaP-X Code-as-Policy runtime through capx_* tools. It explains the observe, render, policy-code, turn-history, and artifact loop for an external agent harness.
+ version: 1.0.0
+ tags: [capx, robotics, physical-ai, code-as-policy]
+ ---
+
+ # CaP-X Code-as-Policy Agent Skill
+
+ Use CaP-X as the Python robotics runtime. The external agent owns the reasoning loop. First inspect the session with `capx_status` and `capx_observe`. Treat the CaP-X task prompt, full prompt, `codeContext`, policy-code context, reset metadata, rendered frames, and last-step result as the source of truth.
+
+ One CaP-X runtime session represents one live environment. Keep using that same session across observe/run/observe turns. Do not ask for a new session, reset, or stop unless the user asks, the current trial needs a deliberate retry, or continuing would be unsafe.
+
+ When `capx_run_policy_code` is available, write concise Python policy code using APIs exposed by the observed CaP-X prompt and `codeContext`. Prefer one purposeful code step at a time, then observe again.
+
+ Use `capx_observe` with image observations when visual state matters. Compare the new observation, frame, stdout, stderr, reward, and task-completion status against the previous turn before deciding whether another policy-code step is needed.
+
+ Use `capx_turn_history` to inspect prior submitted code and results. Use `capx_artifacts` when you need saved code, logs, summaries, images, or videos.
+
+ Use policy-code context returned by `capx_observe` for reusable CaP-X Python helper hints. These are runtime-side Python functions, not agent-core skills or separate robot tools. Submit policy code through `capx_run_policy_code`.
+
+ For ensemble reasoning, generate and compare candidate plans in the agent harness, then submit only the selected Python code through `capx_run_policy_code`. Keep execution approval and hardware safety policy in the host application.
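
The ensemble step can be pictured as a selection over candidates held in the harness. This is a minimal sketch: `Candidate` and `selectCandidate` are hypothetical names, and the numeric `score` is an assumed harness-side judgment, since the skill does not prescribe how candidates are compared.

```typescript
// Hypothetical candidate produced by one reasoning pass in the harness.
interface Candidate {
  code: string; // Python policy code to submit if selected
  rationale: string;
  score: number; // Assumed harness-assigned score; higher is better
}

// Pick the highest-scoring candidate; earlier candidates win ties.
function selectCandidate(candidates: Candidate[]): Candidate | undefined {
  let best: Candidate | undefined;
  for (const candidate of candidates) {
    if (best === undefined || candidate.score > best.score) {
      best = candidate;
    }
  }
  return best;
}
```

Only the `code` of the selected candidate would then be passed to `capx_run_policy_code`; the rejected candidates never reach the runtime.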