@autenai/sdk 0.5.0 → 0.7.0

package/README.md CHANGED
@@ -1,154 +1,760 @@
1
- # auten
1
+ # @autenai/sdk
2
2
 
3
- Programmatic control of Android phones through the Auten relay.
3
+ > Programmatic control of Android phones via the Auten relay — send tasks in plain English, query device state, manage encrypted credentials, watch live screens. SDK + CLI in one package.
4
4
 
5
5
  ```bash
6
6
  npm install @autenai/sdk
7
7
  ```
8
8
 
9
- Or install the CLI globally:
9
+ ```ts
10
+ import { Auten } from "@autenai/sdk";
10
11
 
11
- ```bash
12
- npm install -g @autenai/sdk
13
- auten login
12
+ const auten = new Auten({ apiKey: process.env.AUTEN_API_KEY! });
13
+
14
+ const phone = await auten.devices.firstOnline();
15
+ const result = await auten.tasks.run({
16
+ device: phone!.serial,
17
+ prompt: "Open Calculator and compute 999 ÷ 3",
18
+ speed: "lightning",
19
+ });
20
+ console.log(result.verified, result.result?.summary);
21
+ // → true, "Calculator displays 333."
14
22
  ```
15
23
 
16
- ## Quickstart
24
+ ---
25
+
26
+ ## Table of contents
27
+
28
+ - [Concepts](#concepts)
29
+ - [Setup](#setup)
30
+ - [Quickstart](#quickstart)
31
+ - [SDK reference](#sdk-reference)
32
+ - [`new Auten(config)`](#new-autenconfig)
33
+ - [`auten.me()`](#autenme)
34
+ - [`auten.devices`](#autendevices)
35
+ - [`auten.tasks`](#autentasks)
36
+ - [`auten.keys`](#autenkeys)
37
+ - [`auten.phone(serial)`](#autenphoneserial)
38
+ - [CLI reference](#cli-reference)
39
+ - [REST API reference](#rest-api-reference)
40
+ - [Recipes](#recipes)
41
+ - [Errors](#errors)
42
+ - [Pitfalls](#pitfalls)
43
+ - [For AI agents](#for-ai-agents)
44
+ - [Versioning](#versioning)
45
+
46
+ ---
47
+
48
+ ## Concepts
49
+
50
+ | Term | Meaning |
51
+ |---|---|
52
+ | **Relay** | Server (`https://relay.auten.ai` by default) that sits between your code and the phones. Phones connect outbound (so they work behind NAT). |
53
+ | **Owner** | The principal an API key belongs to. All resources (devices, tasks, sessions, credentials) are filtered by `ownerId` server-side — your key only sees your stuff. |
54
+ | **Device** | A physical Android phone running the Auten APK. Registered to one owner. |
55
+ | **Task** | A natural-language goal dispatched to a phone ("open Chrome and search for X"). The agent on the relay decomposes it into per-tap actions. |
56
+ | **Plan** | Cleaned-up action sequence extracted from a verified-successful task. Future tasks with similar prompts replay the plan deterministically (cheap + fast) before invoking the LLM. |
57
+ | **Screen graph** | Per-device DAG of `(fromFP, action, toFP)` edges learned from every successful tap. Powers cached replay on familiar screens. |
58
+ | **Speed preset** | One knob (`fast` / `instant` / `lightning`) that drives every artificial delay during replay. See [speed table](#speed-presets). |
59
+
60
+ Tasks resolve in this order, cheapest first:
61
+
62
+ 1. **Synthesize** from the per-screen Step KB if every required label was observed before. No LLM call.
63
+ 2. **Replay** a similar past task's `cleanPlanJson`. Deterministic, label-based; auto-scrolls to find off-screen targets.
64
+ 3. **Delegate** to Claude Opus 4.7 via the engine loop — only when the first two miss.
65
+
66
+ You don't pick which path runs; the relay does. Your code just calls `auten.tasks.create` or `.run` and gets a result.
67
+
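The cheapest-first order above can be pictured as a fallback chain. The sketch below is illustrative only: the relay's implementation is not shown in this README, and `Resolver`, `resolveCheapestFirst`, and the three stub resolvers are hypothetical names invented for the example.

```typescript
// Hypothetical model of a cheapest-first fallback chain; not the relay's actual code.
type Resolution = { path: "synthesize" | "replay" | "delegate"; plan: string[] };

// A resolver returns a plan, or null when it cannot handle the prompt.
type Resolver = (prompt: string) => Resolution | null;

function resolveCheapestFirst(prompt: string, resolvers: Resolver[]): Resolution {
  for (const resolve of resolvers) {
    const hit = resolve(prompt);
    if (hit !== null) return hit; // the first (cheapest) hit wins
  }
  throw new Error("no resolver could handle the prompt");
}

// Stub resolvers, ordered from cheapest to most expensive.
const synthesize: Resolver = () => null; // pretend a required label was never observed
const replay: Resolver = (p) =>
  p.includes("calculator") ? { path: "replay", plan: ["tap:Calculator"] } : null;
const delegate: Resolver = () => ({ path: "delegate", plan: [] }); // always available

const chain: Resolver[] = [synthesize, replay, delegate];
const familiar = resolveCheapestFirst("open the calculator", chain);
const novel = resolveCheapestFirst("book a flight", chain);
console.log(familiar.path, novel.path);
```

With these stubs, the familiar prompt stops at the replay step and the novel one falls through to the delegate step, which is the behavior the list describes.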
68
+ ---
69
+
70
+ ## Setup
17
71
 
18
72
  ### 1. Get a key
19
73
 
20
- Sign up at https://auten.ai (or contact the operator of your relay) to receive an API key.
74
+ Sign up at https://auten.ai, or contact your relay operator. Keys look like `sk_live_<48 hex>`.
21
75
 
22
76
  ### 2. Save it
23
77
 
24
78
  ```bash
25
- $ auten login
79
+ $ npx @autenai/sdk login
26
80
  Relay base URL [https://relay.auten.ai]:
27
81
  API key: ********************************
28
82
  ✓ Authenticated as my-team — 1 device(s), 0 task(s).
29
83
  ✓ Saved to ~/.autenrc
30
84
  ```
31
85
 
32
- The CLI also reads `AUTEN_API_KEY` and `AUTEN_BASE_URL` from the environment, so you don't need `.autenrc` in CI.
86
+ The CLI also reads `AUTEN_API_KEY` and `AUTEN_BASE_URL` from the environment, so CI doesn't need an `.autenrc`.
33
87
 
34
- ### 3. Send a task
88
+ ### 3. Pair a phone
35
89
 
36
90
  ```bash
37
- $ auten devices
38
- STATUS SERIAL MODEL SCREEN LASTSEEN
39
- ────── ──────────────── ──────── ───────── ───────────────────
40
- online a4e0eff201d020fd SM-A556B 1080x2116 2026-05-04 09:00:00
91
+ auten add-phone # USB-tethered Samsung-style wizard
92
+ ```
41
93
 
42
- $ auten task --speed lightning "open the calculator and compute 999÷3"
43
- auten task
44
- Phone: a4e0eff201d020fd
45
- Speed: lightning
46
- Prompt: open the calculator and compute 999÷3
94
+ Or have the operator pre-pair phones for you and share the serials.
47
95
 
48
- ✓ Task 0f3789ec-68b1-4322-9fe2-15b33dfd8dc6 dispatched.
49
- → Live: https://relay.auten.ai/w/a4e0eff201d020fd?t=...
50
- → status: running
51
- → status: completed
96
+ ---
52
97
 
53
- done (10992ms, $0.0308) — Calculator displays 333.
54
- ```
98
+ ## Quickstart
55
99
 
56
- ## SDK
100
+ ### Run a task and wait
57
101
 
58
102
  ```ts
59
103
  import { Auten } from "@autenai/sdk";
60
104
 
61
105
  const auten = new Auten({ apiKey: process.env.AUTEN_API_KEY! });
62
106
 
63
- // Identity & devices
64
- const me = await auten.me();
65
- const devices = await auten.devices.list();
66
-
67
- // Run a task and wait for the result
68
107
  const result = await auten.tasks.run({
69
- device: devices[0].serial,
70
- prompt: "open instagram, scroll the feed for 30 seconds",
71
- speed: "lightning",
108
+ device: "a4e0eff201d020fd",
109
+ prompt: "Open Instagram and like the latest 5 posts on the feed",
110
+ speed: "fast",
111
+ timeout_seconds: 300,
112
+ });
113
+
114
+ if (result.verified) {
115
+ console.log("Done:", result.result?.summary);
116
+ } else {
117
+ console.warn("Verifier flagged failure:", result.result?.verify_reason);
118
+ }
119
+ ```
120
+
121
+ ### Fire-and-forget + poll later
122
+
123
+ ```ts
124
+ const { task_id, watch_url } = await auten.tasks.create({
125
+ device,
126
+ prompt: "Translate the article on news.tv3.lt and save the title to clipboard",
72
127
  });
73
- console.log(result.status, result.verified, result.result?.summary);
74
128
 
75
- // Or fire-and-forget + poll later
76
- const { task_id } = await auten.tasks.create({ device, prompt });
77
- const final = await auten.tasks.wait(task_id);
129
+ // browser-renderable live view
130
+ console.log(`Watch: ${auten.baseUrl}${watch_url}`);
131
+
132
+ // later:
133
+ const final = await auten.tasks.wait(task_id, { timeoutMs: 10 * 60_000 });
134
+ ```
135
+
136
+ ### Direct phone control
137
+
138
+ ```ts
139
+ const phone = auten.phone(device);
140
+
141
+ const screen = await phone.look(); // SoM screenshot + element list
142
+ const submit = screen.elements.find(e => /sign in/i.test(e.text ?? ""));
143
+ if (submit) await phone.tap(submit.x, submit.y);
78
144
 
79
- // Direct phone control
80
- const phone = auten.phone(devices[0].serial);
81
- const screen = await phone.look(); // SoM screenshot + element list
82
- await phone.tap(500, 800);
83
- await phone.type("hello", { x: 540, y: 700 });
84
145
  await phone.openUrl("https://news.tv3.lt");
146
+ await phone.type("hello", { x: 540, y: 800 }); // accessibility ACTION_SET_TEXT, no keyboard
147
+ await phone.key("back");
85
148
  ```
86
149
 
87
- ## Per-device credentials
150
+ ---
88
151
 
89
- Login data is encrypted server-side and only ever decrypted into the agent's
90
- runtime when it actually needs to fill a form.
152
+ ## SDK reference
91
153
 
92
- ```bash
93
- $ auten creds add --device a4e0eff... --service instagram \
94
- --username myaccount --password secret
95
- $ auten creds ls --device a4e0eff...
96
- $ auten creds rm --device a4e0eff... instagram
154
+ ### `new Auten(config)`
155
+
156
+ ```ts
157
+ type AutenConfig = {
158
+ apiKey: string; // required
159
+ baseUrl?: string; // default: https://relay.auten.ai
160
+ timeoutMs?: number; // per-request, default 60_000
161
+ };
162
+ ```
163
+
164
+ Construct one client per process. The `Transport` underneath uses `fetch` (Node 18.17+ has it built in). Bearer auth is applied to every request.
165
+
166
+ `auten.baseUrl` getter returns the resolved base URL.
167
+
168
+ ### `auten.me()`
169
+
170
+ ```ts
171
+ auten.me(): Promise<{
172
+ owner_id: string;
173
+ key_id: string;
174
+ is_root: boolean; // env-key admin (sees all owners)
175
+ device_count: number; // owner's devices
176
+ task_count: number; // owner's tasks ever
177
+ }>
178
+ ```
179
+
180
+ Identity probe. Use after login to verify the key is valid and to log who's running.
181
+
182
+ ### `auten.devices`
183
+
184
+ ```ts
185
+ auten.devices.list(): Promise<Device[]>
186
+ auten.devices.get(serial: string): Promise<Device | null>
187
+ auten.devices.firstOnline(): Promise<Device | null>
188
+ auten.devices.stats(serial: string): Promise<{
189
+ serial: string;
190
+ online: boolean;
191
+ graph_edges: number;
192
+ task_count: number;
193
+ cache_hit_rate: number; // 0..1, fraction of past turns served from graph cache
194
+ }>
195
+ ```
196
+
197
+ ```ts
198
+ type Device = {
199
+ serial: string;
200
+ model: string | null;
201
+ online: boolean;
202
+ type: string; // "physical" | "emulator"
203
+ lastSeenAt: string | null; // ISO timestamp
204
+ androidVersion: string | null;
205
+ screenW: number | null;
206
+ screenH: number | null;
207
+ };
208
+ ```
209
+
210
+ `online` flips to `false` when the phone disconnects from the relay's WS reverse tunnel. It is safe to poll; the relay updates it within a few seconds of a disconnect.
211
+
212
+ ### `auten.tasks`
213
+
214
+ ```ts
215
+ type Speed = "fast" | "instant" | "lightning";
216
+ type TaskMode = "task" | "explore";
217
+ type TaskStatus = "queued" | "running" | "completed" | "failed" | "cancelled";
218
+
219
+ auten.tasks.create(input: {
220
+ device: string;
221
+ prompt: string;
222
+ mode?: TaskMode;
223
+ speed?: Speed;
224
+ webhook_url?: string;
225
+ webhook_secret?: string; // HMAC-SHA256 signing for webhook deliveries
226
+ timeout_seconds?: number; // default 300; 0 = no limit (explore mode)
227
+ }): Promise<{ task_id: string; status: string; watch_url: string }>
228
+
229
+ auten.tasks.get(id: string): Promise<Task>
230
+ auten.tasks.list(opts?: {
231
+ device?: string;
232
+ status?: TaskStatus;
233
+ limit?: number; // 1..100, default 25
234
+ }): Promise<Task[]>
235
+ auten.tasks.cancel(id: string): Promise<{ task_id: string; status: string }>
236
+
237
+ // Poll until terminal state. Returns the final Task.
238
+ auten.tasks.wait(id: string, opts?: {
239
+ intervalMs?: number; // default 1000
240
+ timeoutMs?: number; // default 300_000
241
+ }): Promise<Task>
242
+
243
+ // Sugar: create + wait. Same options as create + wait.
244
+ auten.tasks.run(input: CreateTaskInput, waitOpts?: WaitOpts): Promise<Task>
245
+ ```
246
+
247
+ ```ts
248
+ type Task = {
249
+ task_id: string;
250
+ device_serial: string;
251
+ prompt: string;
252
+ status: TaskStatus;
253
+ mode: TaskMode;
254
+ verified?: boolean | null;
255
+ result?: {
256
+ summary?: string;
257
+ cost_usd?: number;
258
+ duration_ms?: number;
259
+ verify_reason?: string; // verifier's explanation when verified=false
260
+ } | null;
261
+ error?: { code?: string; message?: string; retryable?: boolean } | null;
262
+ created_at?: string;
263
+ started_at?: string | null;
264
+ completed_at?: string | null;
265
+ turns?: TaskTurn[]; // present in get(), not list()
266
+ artifacts?: TaskArtifact[]; // screenshots, recordings, files saved during run
267
+ };
268
+
269
+ type TaskTurn = {
270
+ index: number;
271
+ source: string; // "cached" | "decide" | "delegate" | "replay"
272
+ label: string | null;
273
+ ok: boolean;
274
+ cost_usd: number;
275
+ duration_ms: number | null;
276
+ };
277
+ ```
278
+
279
+ #### Speed presets
280
+
281
+ | Preset | settle | postAction | look-after-each | Use when |
282
+ |---|---|---|---|---|
283
+ | `fast` (default) | 250ms | 350ms | yes | First runs of unfamiliar tasks; debugging. Human-pace. |
284
+ | `instant` | 50ms | 100ms | yes | Familiar flows that already verified once at `fast`. |
285
+ | `lightning` | 10ms | 30ms | **no** | Production replay of plans you trust. Skips per-step look — fastest, less verifiable mid-flight. |
286
+
287
+ `lightning` is the "this plan worked yesterday, just run it" knob. Don't use it for the first run of anything novel.
288
+
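To get a feel for what each preset costs, the table can be turned into a small lookup. This is a rough sketch, not SDK code: `PRESETS` and `artificialDelayMs` are hypothetical names, and the calculation assumes the settle and postAction delays each apply once per action, which the table does not state explicitly.

```typescript
type Speed = "fast" | "instant" | "lightning";

// Values copied from the preset table; the structure itself is illustrative.
const PRESETS: Record<Speed, { settleMs: number; postActionMs: number; lookAfterEach: boolean }> = {
  fast: { settleMs: 250, postActionMs: 350, lookAfterEach: true },
  instant: { settleMs: 50, postActionMs: 100, lookAfterEach: true },
  lightning: { settleMs: 10, postActionMs: 30, lookAfterEach: false },
};

// Artificial delay contributed by the preset alone (assumption: settle + postAction
// are each paid once per action). Excludes look() time and real device latency.
function artificialDelayMs(speed: Speed, actions: number): number {
  const p = PRESETS[speed];
  return actions * (p.settleMs + p.postActionMs);
}

for (const speed of ["fast", "instant", "lightning"] as const) {
  console.log(speed, artificialDelayMs(speed, 20), "ms for 20 actions");
}
```

Under that assumption a 20-action replay spends 12 000 ms in artificial delay at `fast`, 3 000 ms at `instant`, and 800 ms at `lightning`, before counting any per-step `look()`.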
289
+ ### `auten.keys`
290
+
291
+ ```ts
292
+ auten.keys.list(): Promise<ApiKey[]> // never reveals full secrets
293
+ auten.keys.create(input?: {
294
+ name?: string;
295
+ ownerId?: string; // root-only override
296
+ }): Promise<ApiKeyWithSecret> // .key contains the full secret — printed once
297
+ auten.keys.revoke(id: string): Promise<{ ok: boolean; id: string }>
298
+ ```
299
+
300
+ Self-service rotation: mint a new key, deploy it, revoke the old one. Cache invalidates within 30s server-side.
301
+
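The rotation sequence might be scripted roughly as follows. This is a sketch under assumptions: `KeysApi` is a hand-written slice of the `auten.keys` surface (the `id` field on the created key is assumed, since only `.key` is documented above), and `deploy` is a hypothetical callback standing in for however you ship the new secret to your services.

```typescript
// Minimal slice of the documented auten.keys surface. The `id` field on the
// created key is an assumption; only `.key` is described in the reference.
interface KeysApi {
  create(input?: { name?: string }): Promise<{ id: string; key: string }>;
  revoke(id: string): Promise<{ ok: boolean; id: string }>;
}

// Mint, deploy, then revoke. The old key is revoked only after the new one
// has been deployed, so a failed deploy leaves the old key working.
async function rotateKey(
  keys: KeysApi,
  oldKeyId: string,
  deploy: (secret: string) => Promise<void>, // hypothetical: push the secret to your secret store
  name?: string,
): Promise<string> {
  const fresh = await keys.create({ name });
  await deploy(fresh.key); // if this throws, nothing has been revoked yet
  const revoked = await keys.revoke(oldKeyId);
  if (!revoked.ok) throw new Error(`failed to revoke key ${oldKeyId}`);
  return fresh.id;
}
```

Given the 30 s cache invalidation mentioned above, it may be worth waiting briefly after `revoke` before treating the old key as dead.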
302
+ ### `auten.phone(serial)`
303
+
304
+ Returns a `Phone` handle. All methods route through the relay's owner-scoped phone-proxy endpoint, which forwards via the WS reverse tunnel to the APK. The phone doesn't need to be reachable from your IP — only from the relay.
305
+
306
+ #### Vision
307
+
308
+ ```ts
309
+ phone.look(): Promise<LookResult> // SoM annotated screenshot + element list (accessibility tree)
310
+ phone.screenshot(): Promise<{ jpeg: string; width: number; height: number; durationMs: number }>
311
+ // raw pixels, no SoM, ~280ms faster than look()
312
+ ```
313
+
314
+ ```ts
315
+ type LookResult = {
316
+ annotated: string; // base64 JPEG with numbered markers
317
+ elements: SomElement[];
318
+ width: number;
319
+ height: number;
320
+ };
321
+
322
+ type SomElement = {
323
+ id: number; // SoM marker number, matches the image overlay
324
+ x: number; y: number; w: number; h: number; // pixel coords (center)
325
+ text?: string;
326
+ desc?: string; // content-description (accessibility label)
327
+ cls?: string; // class name (e.g. "android.widget.Button")
328
+ res?: string; // resource id (e.g. "com.app:id/submit_button")
329
+ clickable?: boolean;
330
+ editable?: boolean;
331
+ scrollable?: boolean;
332
+ via_ocr?: boolean; // synthesized from on-device OCR (Compose / Flutter / canvas UIs)
333
+ };
334
+ ```
335
+
336
+ #### Input
337
+
338
+ ```ts
339
+ phone.tap(x: number, y: number): Promise<{ ok: boolean }>
340
+ phone.longPress(x: number, y: number, durationMs?: number): Promise<{ ok: boolean }>
341
+ phone.swipe(x1: number, y1: number, x2: number, y2: number, durationMs?: number): Promise<{ ok: boolean }>
342
+
343
+ // `target` of an editable element bypasses the soft keyboard via accessibility
344
+ // ACTION_SET_TEXT — instant, no layout shift. Without target, routes through IME.
345
+ phone.type(text: string, target?: { x: number; y: number }): Promise<{ ok: boolean }>
346
+
347
+ phone.key(name: KeyName): Promise<{ ok: boolean }>
348
+ // KeyName: "back" | "home" | "recents" | "enter" | "delete" | "tab" | "menu"
349
+ // | "search" | "volume_up" | "volume_down" | "power"
350
+ ```
351
+
352
+ #### Apps
353
+
354
+ ```ts
355
+ phone.launch(packageName: string): Promise<{ ok: boolean }> // force-stops then launches
356
+ phone.stop(packageName: string): Promise<{ ok: boolean }>
357
+ phone.openUrl(url: string, pkg?: string): Promise<{ ok: boolean; url: string; package?: string }>
358
+ // ACTION_VIEW intent
359
+ ```
360
+
361
+ #### Other
362
+
363
+ ```ts
364
+ phone.status(): Promise<unknown> // APK health: { ok, service, version, accessibility }
365
+ phone.info(): Promise<unknown>
366
+ phone.reset(): Promise<{ ok: boolean; killedCount: number }> // kill all 3rd-party + key:home
367
+ phone.notifications(): Promise<unknown[]>
368
+ phone.clearNotifications(): Promise<{ ok: boolean }>
369
+ ```
370
+
371
+ #### Clipboard *(varies by APK build — see [Pitfalls](#pitfalls))*
372
+
373
+ ```ts
374
+ phone.clipboardSet(text: string): Promise<{ ok: boolean }>
375
+ phone.clipboardGet(): Promise<{ ok: boolean; text: string }>
376
+ phone.pasteClipboard(target?: { x: number; y: number }): Promise<{ ok: boolean }>
377
+ // ACTION_PASTE on focused or coord-targeted editable
378
+ ```
379
+
380
+ #### Task sugar
381
+
382
+ ```ts
383
+ phone.task(prompt: string, opts?: {
384
+ speed?: Speed;
385
+ mode?: TaskMode;
386
+ timeout_seconds?: number;
387
+ webhook_url?: string;
388
+ webhook_secret?: string;
389
+ }): Promise<{ task_id: string; status: string; watch_url: string }>
390
+ ```
391
+
392
+ Equivalent to `auten.tasks.create({ device: this.serial, prompt, ...opts })`.
393
+
394
+ #### Credentials
395
+
396
+ Encrypted server-side with AES; only ever decrypted into the agent's runtime when it actually needs to fill a form. Per-device.
397
+
398
+ ```ts
399
+ phone.credentials.save(input: {
400
+ service: string;
401
+ username?: string;
402
+ password?: string;
403
+ totp_secret?: string;
404
+ notes?: string;
405
+ [k: string]: unknown; // any extra fields are encrypted with the rest
406
+ }): Promise<{ ok: boolean; service: string }>
407
+
408
+ phone.credentials.list(): Promise<Credential[]> // each row carries `id` (UUID) + `deviceSerial`
409
+ phone.credentials.reveal<T>(service: string): Promise<T> // full payload, decrypted
410
+ phone.credentials.delete(service: string): Promise<{ ok: boolean }> // by service name
411
+ phone.credentials.deleteById(id: string): Promise<{ ok: boolean; id: string }> // by UUID
412
+
413
+ // Cross-device variants on the top-level client:
414
+ auten.credentials.list(): Promise<Credential[]> // every credential the owner has, across all devices
415
+ auten.credentials.deleteById(id: string): Promise<{ ok: boolean; id: string }> // by UUID, any device the owner has
97
416
  ```
98
417
 
99
- From code:
418
+ #### Escape hatch
100
419
 
101
420
  ```ts
102
- const phone = auten.phone(serial);
103
- await phone.credentials.save({ service: "instagram", username: "x", password: "y" });
104
- const list = await phone.credentials.list();
421
+ phone.proxy<T>(method: "GET" | "POST", path: string, body?: unknown, timeoutMs?: number): Promise<T>
105
422
  ```
106
423
 
107
- ## API keys
424
+ For APK endpoints not yet wrapped by the SDK. Forwards `{method, path, body}` through the WS tunnel as-is. Useful if the APK ships a new route before the SDK does.
425
+
426
+ ---
427
+
428
+ ## CLI reference
429
+
430
+ Installed as `auten` when the package is global, or via `npx @autenai/sdk <cmd>`.
431
+
432
+ | Command | Aliases | Description |
433
+ |---|---|---|
434
+ | `auten login` | | Save API key + relay URL to `~/.autenrc` (chmod 600). |
435
+ | `auten me` | `whoami` | Show whoami + counts for the calling key. |
436
+ | `auten devices` | `list`, `ls` | List devices belonging to your owner. |
437
+ | `auten add-phone` | `add` | Interactive USB-pair wizard. |
438
+ | `auten build-apk` | `build` | Build a fresh APK on the relay host (requires SSH access). |
439
+ | `auten task "<prompt>"` | `run` | Dispatch a task and follow until done. |
440
+ | `auten task --no-follow` | | Fire-and-forget; returns task id. |
441
+ | `auten task --device <serial>` | | Pin to a specific phone. |
442
+ | `auten task --speed lightning\|instant\|fast` | | Speed preset. |
443
+ | `auten tasks` | | List recent tasks. |
444
+ | `auten tasks <id>` | | Show one task in detail. |
445
+ | `auten creds add` | `save` | Save a service login (interactive — password input is masked). |
446
+ | `auten creds ls` | `list` | List saved services for a device. |
447
+ | `auten creds show <service>` | `reveal` | Print the full credential JSON (passwords included). |
448
+ | `auten creds rm <service>` | `delete` | Delete a credential. |
449
+ | `auten keys` | | List your API keys. |
450
+ | `auten keys create [name]` | `add`, `new` | Mint a new key — secret printed once. |
451
+ | `auten keys revoke <id>` | `rm`, `delete` | Disable a key (instant). |
452
+ | `auten version` | `-v` | Print package version. |
453
+ | `auten help` | `-h` | Print usage. |
454
+
455
+ Common flags across `creds`/`task`: `--device <serial>` to pin (defaults to `lastSerial` from `~/.autenrc`, then first online).
456
+
457
+ ---
458
+
459
+ ## REST API reference
460
+
461
+ The SDK is a thin wrapper over a small REST surface. If you're not using Node, hit it directly.
462
+
463
+ ### Auth
464
+
465
+ `Authorization: Bearer <api-key>` header on every request. Or `?apiKey=<key>` query param if you can't set headers (e.g. WS).
466
+
467
+ ### Endpoints
468
+
469
+ | Method | Path | Description |
470
+ |---|---|---|
471
+ | `GET` | `/v1/me` | Identity. |
472
+ | `GET` | `/v1/keys` | List your keys (no full secrets). |
473
+ | `POST` | `/v1/keys` | Mint a new key. Body: `{name?, ownerId?}`. Returns full secret once. |
474
+ | `DELETE` | `/v1/keys/:id` | Revoke a key. |
475
+ | `GET` | `/v1/devices` | List your devices. |
476
+ | `GET` | `/v1/devices/:serial/stats` | Per-device counters (graph edges, cache hit rate). |
477
+ | `GET` | `/v1/devices/:serial/graph` | Top 500 screen-transition edges for the device. |
478
+ | `POST` | `/v1/devices/:serial/proxy` | Forward `{method, path, body, timeout_ms?}` to the APK over the WS tunnel. The SDK's `phone.*` methods all route through this. |
479
+ | `GET` | `/v1/tasks` | List tasks. Query: `device`, `status`, `limit`. |
480
+ | `POST` | `/v1/tasks` | Create a task. Body: `{device_serial, prompt, mode?, speed?, webhook_url?, webhook_secret?, timeout_seconds?}`. |
481
+ | `GET` | `/v1/tasks/:id` | Get one task with turns + artifacts. |
482
+ | `POST` | `/v1/tasks/:id/cancel` | Cancel a running task. |
483
+ | `GET` | `/v1/credentials` | Every credential the caller owns, across all their devices. Each row has `id` + `deviceSerial`. |
484
+ | `DELETE` | `/v1/credentials/:id` | Delete one by UUID (any device the caller owns). |
485
+ | `POST` | `/v1/devices/:serial/credentials` | Save a credential. |
486
+ | `GET` | `/v1/devices/:serial/credentials` | One device's credentials (now includes `id`). |
487
+ | `GET` | `/v1/devices/:serial/credentials/:service/reveal` | Decrypt + return one by service name. |
488
+ | `DELETE` | `/v1/devices/:serial/credentials/:service` | Delete by service name. |
489
+ | `POST` | `/v1/transitions` | Used by APK to record passive learning edges. |
490
+ | `GET` | `/health` | Liveness probe (no auth). |
491
+ | `GET` | `/w/:serial?t=<watch-token>` | HTML viewer for the live phone screen + chat. |
492
+
493
+ ### curl example
108
494
 
109
495
  ```bash
110
- $ auten keys # list yours
111
- $ auten keys create laptop # mint a new key
112
- Key created.
113
- sk_live_abcdef0123456789... # printed once
114
- Save this key now — it will never be shown again.
115
- $ auten keys revoke <id> # disable instantly
496
+ curl -X POST https://relay.auten.ai/v1/tasks \
497
+ -H "Authorization: Bearer $AUTEN_API_KEY" \
498
+ -H "Content-Type: application/json" \
499
+ -d '{
500
+ "device_serial": "a4e0eff201d020fd",
501
+ "prompt": "Compute 999 ÷ 3 in the calculator",
502
+ "speed": "lightning"
503
+ }'
504
+ # → {"task_id":"...","status":"running","watch_url":"/w/.../?t=..."}
505
+ ```
506
+
507
+ ### Webhook deliveries
508
+
509
+ If `webhook_url` was set on `tasks.create`, the relay POSTs to it on every status change with body:
510
+
511
+ ```json
512
+ {
513
+ "task_id": "...",
514
+ "status": "completed",
515
+ "result": { "summary": "...", "cost_usd": 0.03, "duration_ms": 12000, "verified": true },
516
+ "error": null
517
+ }
518
+ ```
519
+
520
+ Headers:
521
+ - `Content-Type: application/json`
522
+ - `X-Auten-Signature: sha256=<hex>` if `webhook_secret` was provided. Body is HMAC-SHA256-signed.
523
+
524
+ Verify with:
525
+
526
+ ```ts
527
+ import crypto from "node:crypto";
528
+ const expected = "sha256=" + crypto.createHmac("sha256", secret).update(rawBody).digest("hex");
529
+ if (!crypto.timingSafeEqual(Buffer.from(sig), Buffer.from(expected))) reject();
530
+ ```
531
+
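A slightly more defensive version of that check is sketched below. `crypto.timingSafeEqual` throws when the two buffers differ in length, so a malformed or truncated header would surface as an exception rather than a clean rejection; comparing lengths first avoids that. The function name `verifyAutenSignature` is hypothetical and not part of the SDK.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Returns true only when `header` is the HMAC-SHA256 of the raw request body.
// `rawBody` must be the exact bytes received, not re-serialized JSON.
function verifyAutenSignature(
  rawBody: string | Buffer,
  header: string | undefined, // value of X-Auten-Signature, if present
  secret: string,
): boolean {
  if (!header) return false;
  const expected = "sha256=" + createHmac("sha256", secret).update(rawBody).digest("hex");
  const received = Buffer.from(header);
  const wanted = Buffer.from(expected);
  // timingSafeEqual throws on a length mismatch, so check lengths first.
  return received.length === wanted.length && timingSafeEqual(received, wanted);
}
```

In a handler you would call it with the raw body and the `x-auten-signature` header, and respond with a 4xx status when it returns `false`.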
532
+ ---
533
+
534
+ ## Recipes
535
+
536
+ ### Run a complex multi-step task
537
+
538
+ ```ts
539
+ const result = await auten.tasks.run({
540
+ device,
541
+ prompt: `
542
+ 1. Open Chrome and go to coingecko.com
543
+ 2. Find Bitcoin's average price for April 2026
544
+ 3. Open Calculator and compute that price ÷ 2
545
+ 4. Report the result
546
+ `.trim(),
547
+ speed: "fast",
548
+ timeout_seconds: 600,
549
+ });
550
+ ```
551
+
552
+ The agent will pick the path itself (web fetching, app switching, math). You don't pre-compose actions.
553
+
554
+ ### Save a login and let the agent use it
555
+
556
+ ```ts
557
+ await auten.phone(device).credentials.save({
558
+ service: "instagram",
559
+ username: "myhandle",
560
+ password: process.env.INSTAGRAM_PW!,
561
+ });
562
+
563
+ await auten.tasks.run({
564
+ device,
565
+ prompt: "Log into Instagram (credentials are saved as service=instagram) and post 'Hello from Auten' as a story",
566
+ });
116
567
  ```
117
568
 
118
- ## Speed presets
569
+ The agent calls `get_credentials("instagram")` server-side; the password is decrypted into the runtime, used, then dropped. It never lands in logs or the LLM context.
119
570
 
120
- `--speed` controls every artificial delay during replay:
571
+ ### Stream live progress to a browser
121
572
 
122
- | preset | settle | post-action | look-after-each |
123
- |-------------|--------|-------------|-----------------|
124
- | `fast` | 250ms | 350ms | yes (default) |
125
- | `instant` | 50ms | 100ms | yes |
126
- | `lightning` | 10ms | 30ms | no (skipped) |
573
+ ```ts
574
+ const { task_id, watch_url } = await auten.tasks.create({ device, prompt });
575
+ // share watch_url with the operator — it's signed (HMAC-SHA256, 1h expiry default)
576
+ console.log(`Live: https://relay.auten.ai${watch_url}`);
577
+ ```
127
578
 
128
- `lightning` is the fastest but skips per-action verification only use it for
129
- flows that have already been verified once at `fast`.
579
+ The viewer is a vanilla HTML page (no auth needed beyond the signed token in the URL). It mirrors the phone screen + tool calls in real time.
130
580
 
131
- ## Configuration
581
+ ### Webhook-driven async pipeline
132
582
 
583
+ ```ts
584
+ // in your worker
585
+ const { task_id } = await auten.tasks.create({
586
+ device,
587
+ prompt,
588
+ webhook_url: "https://my-app.com/auten-webhook",
589
+ webhook_secret: process.env.WEBHOOK_SECRET,
590
+ });
591
+
592
+ // in your /auten-webhook handler:
593
+ app.post("/auten-webhook", async (req, res) => {
594
+ const sig = req.headers["x-auten-signature"];
595
+ const body = req.rawBody; // make sure you have raw bytes, not parsed JSON
596
+ // ... verify HMAC, then act on body.task_id + body.status
597
+ });
133
598
  ```
134
- AUTEN_API_KEY Your key (can be set in ~/.autenrc instead)
135
- AUTEN_BASE_URL Relay URL — default https://relay.auten.ai
599
+
600
+ ### Cross-app workflow
601
+
602
+ Mix high-level prompts with direct phone calls when you need precision:
603
+
604
+ ```ts
605
+ const phone = auten.phone(device);
606
+
607
+ // 1. natural-language step
608
+ await auten.tasks.run({ device, prompt: "Open Samsung Internet and go to icecode.lt" });
609
+
610
+ // 2. take over for a precise interaction
611
+ const screen = await phone.look();
612
+ const submit = screen.elements.find(e => /submit|send/i.test(e.text ?? ""));
613
+ if (submit) await phone.tap(submit.x, submit.y);
614
+
615
+ // 3. natural-language verification
616
+ const verify = await auten.tasks.run({ device, prompt: "Confirm the form said 'thank you' or similar" });
136
617
  ```
137
618
 
619
+ ---
620
+
138
621
  ## Errors
139
622
 
140
623
  ```ts
141
- import { AuthError, ApiError, DeviceOfflineError } from "@autenai/sdk";
624
+ import { AuthError, ApiError, DeviceOfflineError, AutenError } from "@autenai/sdk";
142
625
 
143
626
  try {
144
627
  await auten.tasks.create({ device, prompt });
145
628
  } catch (err) {
146
- if (err instanceof DeviceOfflineError) /* phone disconnected from WS */;
147
- if (err instanceof AuthError) /* bad / revoked key */;
148
- if (err instanceof ApiError && err.status >= 500) /* relay error */;
629
+ if (err instanceof AuthError) { /* 401 / 403: bad or revoked key */ }
630
+ else if (err instanceof DeviceOfflineError) { /* phone disconnected from WS */ }
631
+ else if (err instanceof ApiError) { /* generic HTTP error; .status has the code */ }
632
+ else if (err instanceof AutenError) { /* base class for all of the above */ }
633
+ throw err;
149
634
  }
150
635
  ```
151
636
 
637
+ `ApiError.status === 0` indicates a network error before the relay was reached (DNS failure, timeout, etc.).
638
+
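One way to act on that distinction is a small retry wrapper that retries only network-level failures (`status === 0`) and 5xx responses. This is a sketch, not an SDK feature: `isRetryable` and `withRetry` are hypothetical helpers, and they assume nothing about the error beyond a numeric `status` property.

```typescript
// Treat status 0 (relay never reached) and 5xx as transient; everything else is final.
function isRetryable(err: unknown): boolean {
  if (typeof err !== "object" || err === null || !("status" in err)) return false;
  const status = (err as { status: unknown }).status;
  return typeof status === "number" && (status === 0 || status >= 500);
}

// Retry `fn` up to `attempts` times with exponential backoff between tries.
async function withRetry<T>(fn: () => Promise<T>, attempts = 3, baseDelayMs = 500): Promise<T> {
  for (let i = 0; ; i++) {
    try {
      return await fn();
    } catch (err) {
      if (i >= attempts - 1 || !isRetryable(err)) throw err;
      await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** i));
    }
  }
}
```

This is safest around read-only calls such as `auten.devices.list()`. Wrapping `auten.tasks.create` could dispatch a task twice if a request timed out after the relay had already accepted it.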
639
+ ---
640
+
641
+ ## Pitfalls
642
+
643
+ These are real failure modes you will hit. Read them once.
644
+
645
+ ### `phone.type()` returns `ok: true` but the text doesn't appear
646
+
647
+ Some custom Android editors (notably **Samsung Notes Compose canvas**, some Flutter apps, certain Compose-only inputs) ignore accessibility `ACTION_SET_TEXT`. The relay's IME path also fails on these.
648
+
649
+ **Detection:** call `phone.look()` after `type()` and check whether the text is visible. The relay returns `ok: true` because the call was *accepted*, not because the text *landed*.
650
+
651
+ **Workaround:** route through a different surface — a web form in the browser, the calculator, a regular EditText in a non-Compose app. The agent's task runner has the same limitation; it'll loop and the verifier will catch it. If you control the target app, expose the field via standard `EditText`.
652
+
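The detection step can be wrapped in a helper along these lines. It is a sketch: `PhoneLike` is a hand-written slice of the documented `Phone` surface, and `typeAndConfirm` is a hypothetical name, not an SDK method.

```typescript
// Only the two Phone methods this check needs, with the shapes documented above.
interface PhoneLike {
  type(text: string, target?: { x: number; y: number }): Promise<{ ok: boolean }>;
  look(): Promise<{ elements: { text?: string }[] }>;
}

// Types the text, then looks at the screen to confirm it actually landed.
// `ok: true` from type() only means the call was accepted.
async function typeAndConfirm(
  phone: PhoneLike,
  text: string,
  target?: { x: number; y: number },
): Promise<boolean> {
  const { ok } = await phone.type(text, target);
  if (!ok) return false;
  const screen = await phone.look();
  return screen.elements.some((e) => (e.text ?? "").includes(text));
}
```

Masked fields such as password inputs will not expose the typed text, so this check would report a false negative there.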
653
+ ### `phone.clipboardSet()` may silently fail on Android 10+
654
+
655
+ Samsung and some other OEMs have blocked background apps from writing to the system clipboard since Android 10. The APK call returns `ok: true` (the write was accepted), but `clipboardGet()` then returns empty. Treat the clipboard as best-effort.
656
+
657
+ ### `lightning` speed skips per-action `look()`
658
+
659
+ The relay records `fpAfter: null` for actions taken in lightning mode, which means subsequent screen-graph cache hits use the post-replay terminal `look()` only. Fine for plans you trust; bad for first runs because failures don't get caught mid-flight.
660
+
661
+ ### Plans expire when the UI drifts
662
+
663
+ A `cleanPlanJson` keyed on labels ("Sign in", "Continue") survives most UI drift, but a major redesign breaks it. The verifier catches it (`verified: false`), and the next run replans via the LLM. No manual intervention is needed: the failed plan auto-deprecates once `planUseFailure > planUseSuccess`.
664
+
665
+ ### Owner scoping is enforced server-side
666
+
667
+ Passing a `device_serial` you don't own returns `404 not found`, not `403 forbidden`, on purpose — we don't want to leak the existence of other owners' phones. If you're sure you should own it but get 404, run `auten devices` to confirm the relay has it under your `ownerId`.
668
+
669
+ ### `auten.tasks.wait` polls — there's no SSE yet
670
+
671
+ Each `wait` call hits `GET /v1/tasks/:id` every `intervalMs` (default 1s). For very long tasks, raise `intervalMs` to 5–10 s. Server-Sent Events streaming is on the roadmap; in the meantime, if you need real-time telemetry, use webhooks.
672
+
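For reference, the polling behavior described here amounts to a loop like the one below. It is an approximation written for this README, not the SDK's source: `pollUntilTerminal` is a hypothetical name, and treating `completed`, `failed`, and `cancelled` as the terminal states is an inference from the `TaskStatus` type.

```typescript
type TaskStatus = "queued" | "running" | "completed" | "failed" | "cancelled";

// Assumed terminal states; the reference only says "poll until terminal state".
const TERMINAL: ReadonlySet<TaskStatus> = new Set<TaskStatus>(["completed", "failed", "cancelled"]);

// Calls `getTask` every `intervalMs` until the task is terminal or `timeoutMs` elapses.
async function pollUntilTerminal<T extends { status: TaskStatus }>(
  getTask: () => Promise<T>,
  opts: { intervalMs?: number; timeoutMs?: number } = {},
): Promise<T> {
  const intervalMs = opts.intervalMs ?? 1000; // same defaults as tasks.wait
  const timeoutMs = opts.timeoutMs ?? 300_000;
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const task = await getTask();
    if (TERMINAL.has(task.status)) return task;
    if (Date.now() + intervalMs > deadline) {
      throw new Error(`task still ${task.status} after ${timeoutMs}ms`);
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

With a 10-minute task, the default 1 s interval means roughly 600 requests, while `intervalMs: 5000` cuts that to about 120.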
673
+ ---
674
+
675
+ ## For AI agents
676
+
677
+ This package is the **only** supported way to control Auten phones from JavaScript/TypeScript today. If you're an AI agent integrating it into another project, here's what to know up front.
678
+
679
+ ### Verify you're using the current SDK
680
+
681
+ The class name is `Auten` (not `MobileAgent`, which was a deprecated pre-1.0 surface). Endpoints live under `/v1/...`. Quick check:
682
+
683
+ ```ts
684
+ import { Auten } from "@autenai/sdk";
685
+ console.log(typeof Auten); // "function"
686
+ ```
687
+
688
+ If you're seeing `MobileAgent`, you're on an old version (`@autenai/sdk@<0.5.0`, now deprecated, or the unscoped `auten` package, which was never published).
689
+
690
+ ### Default to high-level prompts, not low-level calls
691
+
692
+ When you have a goal, prefer:
693
+
694
+ ```ts
695
+ await auten.tasks.run({ device, prompt: "<natural language>" });
696
+ ```
697
+
698
+ Over hand-orchestrating taps/types. The relay's planner handles cross-app navigation, retries, and verification. Reach for `auten.phone(serial).tap(x, y)` only when you need precision the LLM can't reach (e.g. exact pixel placement in a canvas).
699
+
700
+ ### Picking a speed
701
+
702
+ - Use `fast` (default) for the first run of any new prompt or after a UI redesign.
703
+ - Use `lightning` for prompts that already succeeded once verified — replays will skip per-action `look()` and shave seconds.
704
+ - Use `instant` rarely; it's a middle point.
705
+
706
+ ### Error-handling pattern
707
+
708
+ ```ts
709
+ const task = await auten.tasks.run({ device, prompt }, { timeoutMs: 600_000 });
710
+
711
+ if (task.status === "completed" && task.verified === true) {
712
+ return task.result?.summary ?? "ok";
713
+ }
714
+
715
+ // completed-but-verifier-flagged: the agent thought it was done but the verifier disagreed
716
+ if (task.status === "completed" && task.verified === false) {
717
+ return { failed: true, reason: task.result?.verify_reason };
718
+ }
719
+
720
+ // runtime failures (device offline, max turns hit, etc.)
721
+ return { failed: true, reason: task.error?.message ?? "unknown" };
722
+ ```
723
+
724
+ ### When to use `phone.proxy()`
725
+
726
+ Almost never. The named methods (`tap`, `type`, `look`, etc.) cover the entire stable APK surface. `proxy()` exists for forward compatibility — if the APK ships a new endpoint that the SDK hasn't wrapped yet, you can call it without waiting for an SDK release. Don't reach for it on the first try.
727
+
728
+ ### Owner scoping summary
729
+
730
+ - Your key has an `ownerId`. Every resource you create gets that `ownerId`.
731
+ - You can't see other owners' resources. `auten.me()` shows your scope.
732
+ - The `is_root` flag (env-set admin key) sees everything — use it for ops, not for app code.
733
+
734
+ ### Deprecated surfaces — do NOT use
735
+
736
+ - `MobileAgent` class → replaced by `Auten`
737
+ - `/devices`, `/sessions`, `/recordings` paths → replaced by `/v1/...`
738
+ - `@autenai/cli` package → bundled into `@autenai/sdk` now
739
+ - `auten run` SSH-based command → replaced by `auten task` (HTTP)
740
+
741
+ If you find docs or example code mentioning these, treat as outdated.
742
+
743
+ ---
744
+
745
+ ## Versioning
746
+
747
+ `@autenai/sdk` follows semver:
748
+
749
+ - `0.x.y` — APIs may break between minor versions while we shake out the surface
750
+ - `1.0.0` — first frozen public surface (TBD)
751
+
752
+ Breaking changes always land in CHANGELOG with migration notes. Pre-1.0 deprecations are marked in JSDoc comments and shipped as runtime warnings for at least one minor version before removal.
753
+
754
+ Latest version: see https://www.npmjs.com/package/@autenai/sdk.
755
+
756
+ ---
757
+
152
758
  ## License
153
759
 
154
- MIT
760
+ MIT.