@autenai/sdk 0.4.0 → 0.6.0

Files changed (88)
  1. package/README.md +760 -0
  2. package/bin/auten.js +2 -0
  3. package/dist/cli/add-phone.d.ts +2 -0
  4. package/dist/cli/add-phone.d.ts.map +1 -0
  5. package/dist/cli/add-phone.js +254 -0
  6. package/dist/cli/add-phone.js.map +1 -0
  7. package/dist/cli/autocomplete.d.ts +31 -0
  8. package/dist/cli/autocomplete.d.ts.map +1 -0
  9. package/dist/cli/autocomplete.js +438 -0
  10. package/dist/cli/autocomplete.js.map +1 -0
  11. package/dist/cli/build-apk.d.ts +2 -0
  12. package/dist/cli/build-apk.d.ts.map +1 -0
  13. package/dist/cli/build-apk.js +97 -0
  14. package/dist/cli/build-apk.js.map +1 -0
  15. package/dist/cli/creds.d.ts +2 -0
  16. package/dist/cli/creds.d.ts.map +1 -0
  17. package/dist/cli/creds.js +161 -0
  18. package/dist/cli/creds.js.map +1 -0
  19. package/dist/cli/devices.d.ts +2 -0
  20. package/dist/cli/devices.d.ts.map +1 -0
  21. package/dist/cli/devices.js +38 -0
  22. package/dist/cli/devices.js.map +1 -0
  23. package/dist/cli/index.d.ts +2 -0
  24. package/dist/cli/index.d.ts.map +1 -0
  25. package/dist/cli/index.js +111 -0
  26. package/dist/cli/index.js.map +1 -0
  27. package/dist/cli/keys.d.ts +2 -0
  28. package/dist/cli/keys.d.ts.map +1 -0
  29. package/dist/cli/keys.js +88 -0
  30. package/dist/cli/keys.js.map +1 -0
  31. package/dist/cli/login.d.ts +2 -0
  32. package/dist/cli/login.d.ts.map +1 -0
  33. package/dist/cli/login.js +35 -0
  34. package/dist/cli/login.js.map +1 -0
  35. package/dist/cli/logo.d.ts +2 -0
  36. package/dist/cli/logo.d.ts.map +1 -0
  37. package/dist/cli/logo.js +11 -0
  38. package/dist/cli/logo.js.map +1 -0
  39. package/dist/cli/me.d.ts +2 -0
  40. package/dist/cli/me.d.ts.map +1 -0
  41. package/dist/cli/me.js +19 -0
  42. package/dist/cli/me.js.map +1 -0
  43. package/dist/cli/shell.d.ts +2 -0
  44. package/dist/cli/shell.d.ts.map +1 -0
  45. package/dist/cli/shell.js +22 -0
  46. package/dist/cli/shell.js.map +1 -0
  47. package/dist/cli/task.d.ts +2 -0
  48. package/dist/cli/task.d.ts.map +1 -0
  49. package/dist/cli/task.js +148 -0
  50. package/dist/cli/task.js.map +1 -0
  51. package/dist/cli/tasks.d.ts +2 -0
  52. package/dist/cli/tasks.d.ts.map +1 -0
  53. package/dist/cli/tasks.js +76 -0
  54. package/dist/cli/tasks.js.map +1 -0
  55. package/dist/cli/util.d.ts +46 -0
  56. package/dist/cli/util.d.ts.map +1 -0
  57. package/dist/cli/util.js +134 -0
  58. package/dist/cli/util.js.map +1 -0
  59. package/dist/package.json +56 -0
  60. package/dist/src/client.d.ts +92 -0
  61. package/dist/src/client.d.ts.map +1 -0
  62. package/dist/src/client.js +131 -0
  63. package/dist/src/client.js.map +1 -0
  64. package/dist/src/errors.d.ts +16 -0
  65. package/dist/src/errors.d.ts.map +1 -0
  66. package/dist/src/errors.js +31 -0
  67. package/dist/src/errors.js.map +1 -0
  68. package/dist/src/index.d.ts +7 -0
  69. package/dist/src/index.d.ts.map +1 -0
  70. package/dist/src/index.js +5 -0
  71. package/dist/src/index.js.map +1 -0
  72. package/dist/src/phone.d.ts +124 -0
  73. package/dist/src/phone.d.ts.map +1 -0
  74. package/dist/src/phone.js +129 -0
  75. package/dist/src/phone.js.map +1 -0
  76. package/dist/src/transport.d.ts +14 -0
  77. package/dist/src/transport.d.ts.map +1 -0
  78. package/dist/src/transport.js +71 -0
  79. package/dist/src/transport.js.map +1 -0
  80. package/dist/src/types.d.ts +142 -0
  81. package/dist/src/types.d.ts.map +1 -0
  82. package/dist/src/types.js +2 -0
  83. package/dist/src/types.js.map +1 -0
  84. package/package.json +43 -26
  85. package/dist/index.d.mts +0 -500
  86. package/dist/index.d.ts +0 -500
  87. package/dist/index.js +0 -450
  88. package/dist/index.mjs +0 -418
package/README.md ADDED
@@ -0,0 +1,760 @@
1
+ # @autenai/sdk
2
+
3
+ > Programmatic control of Android phones via the Auten relay — send tasks in plain English, query device state, manage encrypted credentials, watch live screens. SDK + CLI in one package.
4
+
5
+ ```bash
6
+ npm install @autenai/sdk
7
+ ```
8
+
9
+ ```ts
10
+ import { Auten } from "@autenai/sdk";
11
+
12
+ const auten = new Auten({ apiKey: process.env.AUTEN_API_KEY! });
13
+
14
+ const phone = await auten.devices.firstOnline();
15
+ const result = await auten.tasks.run({
16
+ device: phone!.serial,
17
+ prompt: "Open Calculator and compute 999 ÷ 3",
18
+ speed: "lightning",
19
+ });
20
+ console.log(result.verified, result.result?.summary);
21
+ // → true, "Calculator displays 333."
22
+ ```
23
+
24
+ ---
25
+
26
+ ## Table of contents
27
+
28
+ - [Concepts](#concepts)
29
+ - [Setup](#setup)
30
+ - [Quickstart](#quickstart)
31
+ - [SDK reference](#sdk-reference)
32
+ - [`new Auten(config)`](#new-autenconfig)
33
+ - [`auten.me()`](#autenme)
34
+ - [`auten.devices`](#autendevices)
35
+ - [`auten.tasks`](#autentasks)
36
+ - [`auten.keys`](#autenkeys)
37
+ - [`auten.phone(serial)`](#autenphoneserial)
38
+ - [CLI reference](#cli-reference)
39
+ - [REST API reference](#rest-api-reference)
40
+ - [Recipes](#recipes)
41
+ - [Errors](#errors)
42
+ - [Pitfalls](#pitfalls)
43
+ - [For AI agents](#for-ai-agents)
44
+ - [Versioning](#versioning)
45
+
46
+ ---
47
+
48
+ ## Concepts
49
+
50
+ | Term | Meaning |
51
+ |---|---|
52
+ | **Relay** | Server (`https://relay.auten.ai` by default) that sits between your code and the phones. Phones connect outbound (so they work behind NAT). |
53
+ | **Owner** | The principal an API key belongs to. All resources (devices, tasks, sessions, credentials) are filtered by `ownerId` server-side — your key only sees your stuff. |
54
+ | **Device** | A physical Android phone running the Auten APK. Registered to one owner. |
55
+ | **Task** | A natural-language goal dispatched to a phone ("open Chrome and search for X"). The agent on the relay decomposes it into per-tap actions. |
56
+ | **Plan** | Cleaned-up action sequence extracted from a verified-successful task. Future tasks with similar prompts replay the plan deterministically (cheap + fast) before invoking the LLM. |
57
+ | **Screen graph** | Per-device DAG of `(fromFP, action, toFP)` edges learned from every successful tap. Powers cached replay on familiar screens. |
58
+ | **Speed preset** | One knob (`fast` / `instant` / `lightning`) that drives every artificial delay during replay. See [speed table](#speed-presets). |
59
+
60
+ Tasks resolve in this order, cheapest first:
61
+
62
+ 1. **Synthesize** from the per-screen Step KB if every required label was observed before. No LLM call.
63
+ 2. **Replay** a similar past task's `cleanPlanJson`. Deterministic, label-based; auto-scrolls to find off-screen targets.
64
+ 3. **Delegate** to Claude Opus 4.7 via the engine loop — only when the first two miss.
65
+
66
+ You don't pick which path runs; the relay does. Your code just calls `auten.tasks.create` or `.run` and gets a result.
67
+
68
+ ---
69
+
70
+ ## Setup
71
+
72
+ ### 1. Get a key
73
+
74
+ Sign up at https://auten.ai, or contact your relay operator. Keys look like `sk_live_<48 hex>`.
75
+
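The key format quoted above can be sanity-checked before any request is sent. This is a minimal sketch, not part of the SDK: `looksLikeAutenKey` is a hypothetical helper, and it assumes "48 hex" means exactly 48 lowercase hexadecimal characters.

```ts
// Hypothetical helper, not exported by @autenai/sdk.
// Assumption: "sk_live_<48 hex>" means exactly 48 lowercase hex characters.
const API_KEY_PATTERN = /^sk_live_[0-9a-f]{48}$/;

function looksLikeAutenKey(value: string | undefined): value is string {
  return typeof value === "string" && API_KEY_PATTERN.test(value);
}

// Fail fast on an obviously malformed key instead of waiting for a 401.
const candidate: string | undefined = "sk_live_" + "0f".repeat(24);
if (!looksLikeAutenKey(candidate)) {
  throw new Error("AUTEN_API_KEY is missing or malformed");
}
```

A passing check only says the string has the right shape; `auten.me()` remains the way to confirm the relay accepts the key.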
76
+ ### 2. Save it
77
+
78
+ ```bash
79
+ $ npx @autenai/sdk login
80
+ Relay base URL [https://relay.auten.ai]:
81
+ API key: ********************************
82
+ ✓ Authenticated as my-team — 1 device(s), 0 task(s).
83
+ ✓ Saved to ~/.autenrc
84
+ ```
85
+
86
+ The CLI also reads `AUTEN_API_KEY` and `AUTEN_BASE_URL` from the environment, so CI doesn't need an `.autenrc`.
87
+
88
+ ### 3. Pair a phone
89
+
90
+ ```bash
91
+ auten add-phone # USB-tethered Samsung-style wizard
92
+ ```
93
+
94
+ Or have the operator pre-pair phones for you and share the serials.
95
+
96
+ ---
97
+
98
+ ## Quickstart
99
+
100
+ ### Run a task and wait
101
+
102
+ ```ts
103
+ import { Auten } from "@autenai/sdk";
104
+
105
+ const auten = new Auten({ apiKey: process.env.AUTEN_API_KEY! });
106
+
107
+ const result = await auten.tasks.run({
108
+ device: "a4e0eff201d020fd",
109
+ prompt: "Open Instagram and like the latest 5 posts on the feed",
110
+ speed: "fast",
111
+ timeout_seconds: 300,
112
+ });
113
+
114
+ if (result.verified) {
115
+ console.log("Done:", result.result?.summary);
116
+ } else {
117
+ console.warn("Verifier flagged failure:", result.result?.verify_reason);
118
+ }
119
+ ```
120
+
121
+ ### Fire-and-forget + poll later
122
+
123
+ ```ts
124
+ const { task_id, watch_url } = await auten.tasks.create({
125
+ device,
126
+ prompt: "Translate the article on news.tv3.lt and save the title to clipboard",
127
+ });
128
+
129
+ // browser-renderable live view
130
+ console.log(`Watch: ${auten.baseUrl}${watch_url}`);
131
+
132
+ // later:
133
+ const final = await auten.tasks.wait(task_id, { timeoutMs: 10 * 60_000 });
134
+ ```
135
+
136
+ ### Direct phone control
137
+
138
+ ```ts
139
+ const phone = auten.phone(device);
140
+
141
+ const screen = await phone.look(); // SoM screenshot + element list
142
+ const submit = screen.elements.find(e => /sign in/i.test(e.text ?? ""));
143
+ if (submit) await phone.tap(submit.x, submit.y);
144
+
145
+ await phone.openUrl("https://news.tv3.lt");
146
+ await phone.type("hello", { x: 540, y: 800 }); // accessibility ACTION_SET_TEXT, no keyboard
147
+ await phone.key("back");
148
+ ```
149
+
150
+ ---
151
+
152
+ ## SDK reference
153
+
154
+ ### `new Auten(config)`
155
+
156
+ ```ts
157
+ type AutenConfig = {
158
+ apiKey: string; // required
159
+ baseUrl?: string; // default: https://relay.auten.ai
160
+ timeoutMs?: number; // per-request, default 60_000
161
+ };
162
+ ```
163
+
164
+ Construct one client per process. The `Transport` underneath uses `fetch` (Node 18.17+ has it built in). Bearer auth is applied to every request.
165
+
166
+ `auten.baseUrl` getter returns the resolved base URL.
167
+
168
+ ### `auten.me()`
169
+
170
+ ```ts
171
+ auten.me(): Promise<{
172
+ owner_id: string;
173
+ key_id: string;
174
+ is_root: boolean; // env-key admin (sees all owners)
175
+ device_count: number; // owner's devices
176
+ task_count: number; // owner's tasks ever
177
+ }>
178
+ ```
179
+
180
+ Identity probe. Use after login to verify the key is valid and to log who's running.
181
+
182
+ ### `auten.devices`
183
+
184
+ ```ts
185
+ auten.devices.list(): Promise<Device[]>
186
+ auten.devices.get(serial: string): Promise<Device | null>
187
+ auten.devices.firstOnline(): Promise<Device | null>
188
+ auten.devices.stats(serial: string): Promise<{
189
+ serial: string;
190
+ online: boolean;
191
+ graph_edges: number;
192
+ task_count: number;
193
+ cache_hit_rate: number; // 0..1, fraction of past turns served from graph cache
194
+ }>
195
+ ```
196
+
197
+ ```ts
198
+ type Device = {
199
+ serial: string;
200
+ model: string | null;
201
+ online: boolean;
202
+ type: string; // "physical" | "emulator"
203
+ lastSeenAt: string | null; // ISO timestamp
204
+ androidVersion: string | null;
205
+ screenW: number | null;
206
+ screenH: number | null;
207
+ };
208
+ ```
209
+
210
+ `online` flips to `false` when the phone disconnects from the relay's WS reverse tunnel. Poll it as needed; the relay updates it within a few seconds of a disconnect.
211
+
212
+ ### `auten.tasks`
213
+
214
+ ```ts
215
+ type Speed = "fast" | "instant" | "lightning";
216
+ type TaskMode = "task" | "explore";
217
+ type TaskStatus = "queued" | "running" | "completed" | "failed" | "cancelled";
218
+
219
+ auten.tasks.create(input: {
220
+ device: string;
221
+ prompt: string;
222
+ mode?: TaskMode;
223
+ speed?: Speed;
224
+ webhook_url?: string;
225
+ webhook_secret?: string; // HMAC-SHA256 signing for webhook deliveries
226
+ timeout_seconds?: number; // default 300; 0 = no limit (explore mode)
227
+ }): Promise<{ task_id: string; status: string; watch_url: string }>
228
+
229
+ auten.tasks.get(id: string): Promise<Task>
230
+ auten.tasks.list(opts?: {
231
+ device?: string;
232
+ status?: TaskStatus;
233
+ limit?: number; // 1..100, default 25
234
+ }): Promise<Task[]>
235
+ auten.tasks.cancel(id: string): Promise<{ task_id: string; status: string }>
236
+
237
+ // Poll until terminal state. Returns the final Task.
238
+ auten.tasks.wait(id: string, opts?: {
239
+ intervalMs?: number; // default 1000
240
+ timeoutMs?: number; // default 300_000
241
+ }): Promise<Task>
242
+
243
+ // Sugar: create + wait. Same options as create + wait.
244
+ auten.tasks.run(input: CreateTaskInput, waitOpts?: WaitOpts): Promise<Task>
245
+ ```
246
+
247
+ ```ts
248
+ type Task = {
249
+ task_id: string;
250
+ device_serial: string;
251
+ prompt: string;
252
+ status: TaskStatus;
253
+ mode: TaskMode;
254
+ verified?: boolean | null;
255
+ result?: {
256
+ summary?: string;
257
+ cost_usd?: number;
258
+ duration_ms?: number;
259
+ verify_reason?: string; // verifier's explanation when verified=false
260
+ } | null;
261
+ error?: { code?: string; message?: string; retryable?: boolean } | null;
262
+ created_at?: string;
263
+ started_at?: string | null;
264
+ completed_at?: string | null;
265
+ turns?: TaskTurn[]; // present in get(), not list()
266
+ artifacts?: TaskArtifact[]; // screenshots, recordings, files saved during run
267
+ };
268
+
269
+ type TaskTurn = {
270
+ index: number;
271
+ source: string; // "cached" | "decide" | "delegate" | "replay"
272
+ label: string | null;
273
+ ok: boolean;
274
+ cost_usd: number;
275
+ duration_ms: number | null;
276
+ };
277
+ ```
278
+
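The `turns` array returned by `tasks.get()` shows which path served each step. The sketch below tallies turns by `source`; `summarizeTurns` and `SourceSummary` are hypothetical names, and the `TaskTurn` type is copied from the listing above so the block stands alone.

```ts
// Copied from the Task types above so this block is self-contained.
type TaskTurn = {
  index: number;
  source: string; // "cached" | "decide" | "delegate" | "replay"
  label: string | null;
  ok: boolean;
  cost_usd: number;
  duration_ms: number | null;
};

// Hypothetical summary shape, not part of the SDK.
interface SourceSummary {
  turns: number;
  failed: number;
  costUsd: number;
}

// Group turns by their source and add up counts and cost per group.
function summarizeTurns(turns: TaskTurn[]): Record<string, SourceSummary> {
  const summary: Record<string, SourceSummary> = {};
  for (const turn of turns) {
    const entry = summary[turn.source] ?? { turns: 0, failed: 0, costUsd: 0 };
    entry.turns += 1;
    if (!turn.ok) entry.failed += 1;
    entry.costUsd += turn.cost_usd;
    summary[turn.source] = entry;
  }
  return summary;
}
```

Run over a finished task, a summary dominated by `delegate` turns suggests the prompt is still being solved by the LLM rather than replayed.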
279
+ #### Speed presets
280
+
281
+ | Preset | settle | postAction | look-after-each | Use when |
282
+ |---|---|---|---|---|
283
+ | `fast` (default) | 250ms | 350ms | yes | First runs of unfamiliar tasks; debugging. Human-pace. |
284
+ | `instant` | 50ms | 100ms | yes | Familiar flows that already verified once at `fast`. |
285
+ | `lightning` | 10ms | 30ms | **no** | Production replay of plans you trust. Skips per-step look — fastest, less verifiable mid-flight. |
286
+
287
+ `lightning` is the "this plan worked yesterday, just run it" knob. Don't use it for the first run of anything novel.
288
+
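To get a feel for what the presets mean in wall-clock terms, the table can be turned into a lookup. This is a rough sketch: the numbers are copied from the table, but `artificialDelayMs` is a hypothetical helper and it assumes both delays are paid once per action and simply add up, which the README does not state.

```ts
type Speed = "fast" | "instant" | "lightning";

interface SpeedPreset {
  settleMs: number;
  postActionMs: number;
  lookAfterEach: boolean;
}

// Values copied from the speed preset table.
const SPEED_PRESETS: Record<Speed, SpeedPreset> = {
  fast: { settleMs: 250, postActionMs: 350, lookAfterEach: true },
  instant: { settleMs: 50, postActionMs: 100, lookAfterEach: true },
  lightning: { settleMs: 10, postActionMs: 30, lookAfterEach: false },
};

// Assumption: each action pays settle + postAction exactly once.
// The cost of the per-step look() itself is not included.
function artificialDelayMs(speed: Speed, actionCount: number): number {
  const { settleMs, postActionMs } = SPEED_PRESETS[speed];
  return actionCount * (settleMs + postActionMs);
}
```

Under that assumption a 20-action replay spends 12,000 ms in artificial delays at `fast`, 3,000 ms at `instant`, and 800 ms at `lightning`.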
289
+ ### `auten.keys`
290
+
291
+ ```ts
292
+ auten.keys.list(): Promise<ApiKey[]> // never reveals full secrets
293
+ auten.keys.create(input?: {
294
+ name?: string;
295
+ ownerId?: string; // root-only override
296
+ }): Promise<ApiKeyWithSecret> // .key contains the full secret — printed once
297
+ auten.keys.revoke(id: string): Promise<{ ok: boolean; id: string }>
298
+ ```
299
+
300
+ Self-service rotation: mint a new key, deploy it, revoke the old one. Cache invalidates within 30s server-side.
301
+
302
+ ### `auten.phone(serial)`
303
+
304
+ Returns a `Phone` handle. All methods route through the relay's owner-scoped phone-proxy endpoint, which forwards via the WS reverse tunnel to the APK. The phone doesn't need to be reachable from your IP — only from the relay.
305
+
306
+ #### Vision
307
+
308
+ ```ts
309
+ phone.look(): Promise<LookResult> // SoM annotated screenshot + element list (accessibility tree)
310
+ phone.screenshot(): Promise<{ jpeg: string; width: number; height: number; durationMs: number }>
311
+ // raw pixels, no SoM, ~280ms faster than look()
312
+ ```
313
+
314
+ ```ts
315
+ type LookResult = {
316
+ annotated: string; // base64 JPEG with numbered markers
317
+ elements: SomElement[];
318
+ width: number;
319
+ height: number;
320
+ };
321
+
322
+ type SomElement = {
323
+ id: number; // SoM marker number, matches the image overlay
324
+ x: number; y: number; w: number; h: number; // pixel coords (center)
325
+ text?: string;
326
+ desc?: string; // content-description (accessibility label)
327
+ cls?: string; // class name (e.g. "android.widget.Button")
328
+ res?: string; // resource id (e.g. "com.app:id/submit_button")
329
+ clickable?: boolean;
330
+ editable?: boolean;
331
+ scrollable?: boolean;
332
+ via_ocr?: boolean; // synthesized from on-device OCR (Compose / Flutter / canvas UIs)
333
+ };
334
+ ```
335
+
336
+ #### Input
337
+
338
+ ```ts
339
+ phone.tap(x: number, y: number): Promise<{ ok: boolean }>
340
+ phone.longPress(x: number, y: number, durationMs?: number): Promise<{ ok: boolean }>
341
+ phone.swipe(x1: number, y1: number, x2: number, y2: number, durationMs?: number): Promise<{ ok: boolean }>
342
+
343
+ // `target` of an editable element bypasses the soft keyboard via accessibility
344
+ // ACTION_SET_TEXT — instant, no layout shift. Without target, routes through IME.
345
+ phone.type(text: string, target?: { x: number; y: number }): Promise<{ ok: boolean }>
346
+
347
+ phone.key(name: KeyName): Promise<{ ok: boolean }>
348
+ // KeyName: "back" | "home" | "recents" | "enter" | "delete" | "tab" | "menu"
349
+ // | "search" | "volume_up" | "volume_down" | "power"
350
+ ```
351
+
352
+ #### Apps
353
+
354
+ ```ts
355
+ phone.launch(packageName: string): Promise<{ ok: boolean }> // force-stops then launches
356
+ phone.stop(packageName: string): Promise<{ ok: boolean }>
357
+ phone.openUrl(url: string, pkg?: string): Promise<{ ok: boolean; url: string; package?: string }>
358
+ // ACTION_VIEW intent
359
+ ```
360
+
361
+ #### Other
362
+
363
+ ```ts
364
+ phone.status(): Promise<unknown> // APK health: { ok, service, version, accessibility }
365
+ phone.info(): Promise<unknown>
366
+ phone.reset(): Promise<{ ok: boolean; killedCount: number }> // kill all 3rd-party + key:home
367
+ phone.notifications(): Promise<unknown[]>
368
+ phone.clearNotifications(): Promise<{ ok: boolean }>
369
+ ```
370
+
371
+ #### Clipboard *(varies by APK build — see [Pitfalls](#pitfalls))*
372
+
373
+ ```ts
374
+ phone.clipboardSet(text: string): Promise<{ ok: boolean }>
375
+ phone.clipboardGet(): Promise<{ ok: boolean; text: string }>
376
+ phone.pasteClipboard(target?: { x: number; y: number }): Promise<{ ok: boolean }>
377
+ // ACTION_PASTE on focused or coord-targeted editable
378
+ ```
379
+
380
+ #### Task sugar
381
+
382
+ ```ts
383
+ phone.task(prompt: string, opts?: {
384
+ speed?: Speed;
385
+ mode?: TaskMode;
386
+ timeout_seconds?: number;
387
+ webhook_url?: string;
388
+ webhook_secret?: string;
389
+ }): Promise<{ task_id: string; status: string; watch_url: string }>
390
+ ```
391
+
392
+ Equivalent to `auten.tasks.create({ device: this.serial, prompt, ...opts })`.
393
+
394
+ #### Credentials
395
+
396
+ Encrypted server-side with AES; only ever decrypted into the agent's runtime when it actually needs to fill a form. Per-device.
397
+
398
+ ```ts
399
+ phone.credentials.save(input: {
400
+ service: string;
401
+ username?: string;
402
+ password?: string;
403
+ totp_secret?: string;
404
+ notes?: string;
405
+ [k: string]: unknown; // any extra fields are encrypted with the rest
406
+ }): Promise<{ ok: boolean; service: string }>
407
+
408
+ phone.credentials.list(): Promise<Credential[]> // each row carries `id` (UUID) + `deviceSerial`
409
+ phone.credentials.reveal<T>(service: string): Promise<T> // full payload, decrypted
410
+ phone.credentials.delete(service: string): Promise<{ ok: boolean }> // by service name
411
+ phone.credentials.deleteById(id: string): Promise<{ ok: boolean; id: string }> // by UUID
412
+
413
+ // Cross-device variants on the top-level client:
414
+ auten.credentials.list(): Promise<Credential[]> // every credential the owner has, across all devices
415
+ auten.credentials.deleteById(id: string): Promise<{ ok: boolean; id: string }> // by UUID, any device the owner has
416
+ ```
417
+
418
+ #### Escape hatch
419
+
420
+ ```ts
421
+ phone.proxy<T>(method: "GET" | "POST", path: string, body?: unknown, timeoutMs?: number): Promise<T>
422
+ ```
423
+
424
+ For APK endpoints not yet wrapped by the SDK. Forwards `{method, path, body}` through the WS tunnel as-is. Useful if the APK ships a new route before the SDK does.
425
+
426
+ ---
427
+
428
+ ## CLI reference
429
+
430
+ Installed as `auten` when the package is global, or via `npx @autenai/sdk <cmd>`.
431
+
432
+ | Command | Aliases | Description |
433
+ |---|---|---|
434
+ | `auten login` | | Save API key + relay URL to `~/.autenrc` (chmod 600). |
435
+ | `auten me` | `whoami` | Show identity (owner, key) and device/task counts for the calling key. |
436
+ | `auten devices` | `list`, `ls` | List devices belonging to your owner. |
437
+ | `auten add-phone` | `add` | Interactive USB-pair wizard. |
438
+ | `auten build-apk` | `build` | Build a fresh APK on the relay host (requires SSH access). |
439
+ | `auten task "<prompt>"` | `run` | Dispatch a task and follow until done. |
440
+ | `auten task --no-follow` | | Fire-and-forget; returns task id. |
441
+ | `auten task --device <serial>` | | Pin to a specific phone. |
442
+ | `auten task --speed lightning\|instant\|fast` | | Speed preset. |
443
+ | `auten tasks` | | List recent tasks. |
444
+ | `auten tasks <id>` | | Show one task in detail. |
445
+ | `auten creds add` | `save` | Save a service login (interactive — password input is masked). |
446
+ | `auten creds ls` | `list` | List saved services for a device. |
447
+ | `auten creds show <service>` | `reveal` | Print the full credential JSON (passwords included). |
448
+ | `auten creds rm <service>` | `delete` | Delete a credential. |
449
+ | `auten keys` | | List your API keys. |
450
+ | `auten keys create [name]` | `add`, `new` | Mint a new key — secret printed once. |
451
+ | `auten keys revoke <id>` | `rm`, `delete` | Disable a key (instant). |
452
+ | `auten version` | `-v` | Print package version. |
453
+ | `auten help` | `-h` | Print usage. |
454
+
455
+ Common flags across `creds`/`task`: `--device <serial>` to pin (defaults to `lastSerial` from `~/.autenrc`, then first online).
456
+
457
+ ---
458
+
459
+ ## REST API reference
460
+
461
+ The SDK is a thin wrapper over a small REST surface. If you're not using Node, hit it directly.
462
+
463
+ ### Auth
464
+
465
+ `Authorization: Bearer <api-key>` header on every request. Or `?apiKey=<key>` query param if you can't set headers (e.g. WS).
466
+
467
+ ### Endpoints
468
+
469
+ | Method | Path | Description |
470
+ |---|---|---|
471
+ | `GET` | `/v1/me` | Identity. |
472
+ | `GET` | `/v1/keys` | List your keys (no full secrets). |
473
+ | `POST` | `/v1/keys` | Mint a new key. Body: `{name?, ownerId?}`. Returns full secret once. |
474
+ | `DELETE` | `/v1/keys/:id` | Revoke a key. |
475
+ | `GET` | `/v1/devices` | List your devices. |
476
+ | `GET` | `/v1/devices/:serial/stats` | Per-device counters (graph edges, cache hit rate). |
477
+ | `GET` | `/v1/devices/:serial/graph` | Top 500 screen-transition edges for the device. |
478
+ | `POST` | `/v1/devices/:serial/proxy` | Forward `{method, path, body, timeout_ms?}` to the APK over the WS tunnel. The SDK's `phone.*` methods all route through this. |
479
+ | `GET` | `/v1/tasks` | List tasks. Query: `device`, `status`, `limit`. |
480
+ | `POST` | `/v1/tasks` | Create a task. Body: `{device_serial, prompt, mode?, speed?, webhook_url?, webhook_secret?, timeout_seconds?}`. |
481
+ | `GET` | `/v1/tasks/:id` | Get one task with turns + artifacts. |
482
+ | `POST` | `/v1/tasks/:id/cancel` | Cancel a running task. |
483
+ | `GET` | `/v1/credentials` | Every credential the caller owns, across all their devices. Each row has `id` + `deviceSerial`. |
484
+ | `DELETE` | `/v1/credentials/:id` | Delete one by UUID (any device the caller owns). |
485
+ | `POST` | `/v1/devices/:serial/credentials` | Save a credential. |
486
+ | `GET` | `/v1/devices/:serial/credentials` | One device's credentials (now includes `id`). |
487
+ | `GET` | `/v1/devices/:serial/credentials/:service/reveal` | Decrypt + return one by service name. |
488
+ | `DELETE` | `/v1/devices/:serial/credentials/:service` | Delete by service name. |
489
+ | `POST` | `/v1/transitions` | Used by APK to record passive learning edges. |
490
+ | `GET` | `/health` | Liveness probe (no auth). |
491
+ | `GET` | `/w/:serial?t=<watch-token>` | HTML viewer for the live phone screen + chat. |
492
+
493
+ ### curl example
494
+
495
+ ```bash
496
+ curl -X POST https://relay.auten.ai/v1/tasks \
497
+ -H "Authorization: Bearer $AUTEN_API_KEY" \
498
+ -H "Content-Type: application/json" \
499
+ -d '{
500
+ "device_serial": "a4e0eff201d020fd",
501
+ "prompt": "Compute 999 ÷ 3 in the calculator",
502
+ "speed": "lightning"
503
+ }'
504
+ # → {"task_id":"...","status":"running","watch_url":"/w/.../?t=..."}
505
+ ```
506
+
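The same request can be issued from any runtime that has `fetch`, without the SDK. The sketch below only builds the request; `buildCreateTaskRequest` and `CreateTaskBody` are hypothetical names, and the body fields are the ones listed for `POST /v1/tasks` in the endpoint table.

```ts
// Subset of the documented POST /v1/tasks body.
interface CreateTaskBody {
  device_serial: string;
  prompt: string;
  speed?: "fast" | "instant" | "lightning";
  timeout_seconds?: number;
}

interface BuiltRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Hypothetical helper: assembles URL, Bearer header, and JSON body.
function buildCreateTaskRequest(baseUrl: string, apiKey: string, body: CreateTaskBody): BuiltRequest {
  return {
    url: `${baseUrl.replace(/\/+$/, "")}/v1/tasks`, // tolerate a trailing slash
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(body),
    },
  };
}
```

Passing the result to `fetch(url, init)` sends the request; keeping the construction separate makes it easy to log or unit-test without touching the network.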
507
+ ### Webhook deliveries
508
+
509
+ If `webhook_url` was set on `tasks.create`, the relay POSTs to it on every status change with body:
510
+
511
+ ```json
512
+ {
513
+ "task_id": "...",
514
+ "status": "completed",
515
+ "result": { "summary": "...", "cost_usd": 0.03, "duration_ms": 12000, "verified": true },
516
+ "error": null
517
+ }
518
+ ```
519
+
520
+ Headers:
521
+ - `Content-Type: application/json`
522
+ - `X-Auten-Signature: sha256=<hex>` if `webhook_secret` was provided. Body is HMAC-SHA256-signed.
523
+
524
+ Verify with:
525
+
526
+ ```ts
527
+ import crypto from "node:crypto";
528
+ const expected = "sha256=" + crypto.createHmac("sha256", secret).update(rawBody).digest("hex");
529
+ const a = Buffer.from(sig), b = Buffer.from(expected); if (a.length !== b.length || !crypto.timingSafeEqual(a, b)) reject(); // timingSafeEqual throws on length mismatch
530
+ ```
531
+
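The verification snippet above can be wrapped into a reusable function. A minimal sketch follows: `verifyAutenSignature` is a hypothetical helper, and it assumes the header value has exactly the `sha256=<hex>` form described under Headers. `timingSafeEqual` throws when its inputs differ in length, so the lengths are compared first.

```ts
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical helper, not part of @autenai/sdk.
// rawBody must be the exact bytes received, not re-serialized JSON.
function verifyAutenSignature(
  rawBody: string | Buffer,
  signatureHeader: string | undefined,
  secret: string,
): boolean {
  if (!signatureHeader) return false;
  const digest = createHmac("sha256", secret).update(rawBody).digest("hex");
  const expected = Buffer.from("sha256=" + digest, "utf8");
  const received = Buffer.from(signatureHeader, "utf8");
  // Length check first: timingSafeEqual throws on unequal lengths.
  return received.length === expected.length && timingSafeEqual(received, expected);
}
```

A handler would call this with the raw request body and the `X-Auten-Signature` header, and reject the delivery when it returns `false`.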
532
+ ---
533
+
534
+ ## Recipes
535
+
536
+ ### Run a complex multi-step task
537
+
538
+ ```ts
539
+ const result = await auten.tasks.run({
540
+ device,
541
+ prompt: `
542
+ 1. Open Chrome and go to coingecko.com
543
+ 2. Find Bitcoin's average price for April 2026
544
+ 3. Open Calculator and compute that price ÷ 2
545
+ 4. Report the result
546
+ `.trim(),
547
+ speed: "fast",
548
+ timeout_seconds: 600,
549
+ });
550
+ ```
551
+
552
+ The agent will pick the path itself (web fetching, app switching, math). You don't pre-compose actions.
553
+
554
+ ### Save a login and let the agent use it
555
+
556
+ ```ts
557
+ await auten.phone(device).credentials.save({
558
+ service: "instagram",
559
+ username: "myhandle",
560
+ password: process.env.INSTAGRAM_PW!,
561
+ });
562
+
563
+ await auten.tasks.run({
564
+ device,
565
+ prompt: "Log into Instagram (credentials are saved as service=instagram) and post 'Hello from Auten' as a story",
566
+ });
567
+ ```
568
+
569
+ The agent calls `get_credentials("instagram")` server-side; the password is decrypted into the runtime, used, then dropped. It never lands in logs or the LLM context.
570
+
571
+ ### Stream live progress to a browser
572
+
573
+ ```ts
574
+ const { task_id, watch_url } = await auten.tasks.create({ device, prompt });
575
+ // share watch_url with the operator — it's signed (HMAC-SHA256, 1h expiry default)
576
+ console.log(`Live: ${auten.baseUrl}${watch_url}`);
577
+ ```
578
+
579
+ The viewer is a vanilla HTML page (no auth needed beyond the signed token in the URL). It mirrors the phone screen + tool calls in real time.
580
+
581
+ ### Webhook-driven async pipeline
582
+
583
+ ```ts
584
+ // in your worker
585
+ const { task_id } = await auten.tasks.create({
586
+ device,
587
+ prompt,
588
+ webhook_url: "https://my-app.com/auten-webhook",
589
+ webhook_secret: process.env.WEBHOOK_SECRET,
590
+ });
591
+
592
+ // in your /auten-webhook handler:
593
+ app.post("/auten-webhook", async (req, res) => {
594
+ const sig = req.headers["x-auten-signature"];
595
+ const body = req.rawBody; // make sure you have raw bytes, not parsed JSON
596
+ // ... verify HMAC, then act on body.task_id + body.status
597
+ });
598
+ ```
599
+
600
+ ### Cross-app workflow
601
+
602
+ Mix high-level prompts with direct phone calls when you need precision:
603
+
604
+ ```ts
605
+ const phone = auten.phone(device);
606
+
607
+ // 1. natural-language step
608
+ await auten.tasks.run({ device, prompt: "Open Samsung Internet and go to icecode.lt" });
609
+
610
+ // 2. take over for a precise interaction
611
+ const screen = await phone.look();
612
+ const submit = screen.elements.find(e => /submit|send/i.test(e.text ?? ""));
613
+ if (submit) await phone.tap(submit.x, submit.y);
614
+
615
+ // 3. natural-language verification
616
+ const verify = await auten.tasks.run({ device, prompt: "Confirm the form said 'thank you' or similar" });
617
+ ```
618
+
619
+ ---
620
+
621
+ ## Errors
622
+
623
+ ```ts
624
+ import { AuthError, ApiError, DeviceOfflineError, AutenError } from "@autenai/sdk";
625
+
626
+ try {
627
+ await auten.tasks.create({ device, prompt });
628
+ } catch (err) {
629
+ if (err instanceof AuthError) { /* 401 / 403: bad or revoked key */ }
630
+ else if (err instanceof DeviceOfflineError) { /* phone disconnected from WS */ }
631
+ else if (err instanceof ApiError) { /* generic HTTP error; .status has the code */ }
632
+ else if (err instanceof AutenError) { /* base class for all of the above */ }
633
+ else throw err;
634
+ }
635
+ ```
636
+
637
+ `ApiError.status === 0` indicates a network error before the relay was reached (DNS failure, timeout, etc.).
638
+
639
+ ---
640
+
641
+ ## Pitfalls
642
+
643
+ These are real failure modes you will hit. Read them once.
644
+
645
+ ### `phone.type()` returns `ok: true` but the text doesn't appear
646
+
647
+ Some custom Android editors (notably **Samsung Notes Compose canvas**, some Flutter apps, certain Compose-only inputs) ignore accessibility `ACTION_SET_TEXT`. The relay's IME path also fails on these.
648
+
649
+ **Detection:** call `phone.look()` after `type()` and check whether the text is visible. The relay returns `ok: true` because the call was *accepted*, not because the text *landed*.
650
+
651
+ **Workaround:** route through a different surface — a web form in the browser, the calculator, a regular EditText in a non-Compose app. The agent's task runner has the same limitation; it'll loop and the verifier will catch it. If you control the target app, expose the field via standard `EditText`.
652
+
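The detection step for this pitfall amounts to searching the `look()` element list for the text that was just typed. The sketch below is one way to do that: `textLanded` is a hypothetical helper, the element type is a trimmed copy of `SomElement`, and it assumes the typed text shows up in an element's `text` field and that a case-insensitive substring match is good enough.

```ts
// Trimmed copy of SomElement from the SDK reference.
type SomElement = {
  id: number;
  x: number; y: number; w: number; h: number;
  text?: string;
  desc?: string;
  editable?: boolean;
};

// Hypothetical helper: true when any element's text contains the expected string.
function textLanded(elements: SomElement[], expected: string): boolean {
  const needle = expected.trim().toLowerCase();
  if (needle === "") return false; // nothing meaningful to look for
  return elements.some((e) => (e.text ?? "").toLowerCase().includes(needle));
}

// Intended use (requires a live phone, so it is left as a comment):
//   await phone.type("hello", { x: 540, y: 800 });
//   const screen = await phone.look();
//   if (!textLanded(screen.elements, "hello")) { /* fall back to another surface */ }
```

Password fields and editors that mask or transform input will defeat this check, so treat a `false` result as a hint to inspect the screenshot rather than as proof of failure.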
653
+ ### `phone.clipboardSet()` may silently fail on Android 10+
654
+
655
+ Samsung and some other OEMs have blocked background apps from writing to the system clipboard since Android 10. The APK call returns `ok: true` (the write was accepted) but `clipboardGet()` returns empty. Treat clipboard as best-effort.
656
+
657
+ ### `lightning` speed skips per-action `look()`
658
+
659
+ The relay records `fpAfter: null` for actions taken in lightning mode, which means subsequent screen-graph cache hits use the post-replay terminal `look()` only. Fine for plans you trust; bad for first runs because failures don't get caught mid-flight.
660
+
661
+ ### Plans expire when the UI drifts
662
+
663
+ A `cleanPlanJson` keyed on labels ("Sign in", "Continue") survives most UI drift, but a major redesign breaks it. The verifier catches it (`verified: false`); the next run will replan via the LLM. No manual intervention is needed: the failed plan auto-deprecates once `planUseFailure` exceeds `planUseSuccess`.
664
+
665
+ ### Owner scoping is enforced server-side
666
+
667
+ Passing a `device_serial` you don't own returns `404 not found`, not `403 forbidden`, on purpose — we don't want to leak the existence of other owners' phones. If you're sure you should own it but get 404, run `auten devices` to confirm the relay has it under your `ownerId`.
668
+
669
+ ### `auten.tasks.wait` polls — there's no SSE yet
670
+
671
+ Each `wait` call hits `GET /v1/tasks/:id` every `intervalMs` (default 1s). For very long tasks, raise `intervalMs` to 5–10 s. Server-Sent Events streaming is on the roadmap; in the meantime, if you need real-time telemetry, use webhooks.
672
+
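The polling behaviour described above boils down to a loop with an interval and a deadline. The following is an illustrative sketch, not the SDK's implementation: `pollUntilTerminal` is a hypothetical name, the defaults mirror the documented `intervalMs` and `timeoutMs`, and it assumes `completed`, `failed`, and `cancelled` are the terminal states.

```ts
type TaskStatus = "queued" | "running" | "completed" | "failed" | "cancelled";

// Assumption: these three statuses are the terminal ones.
const TERMINAL: ReadonlySet<TaskStatus> = new Set<TaskStatus>(["completed", "failed", "cancelled"]);

async function pollUntilTerminal<T extends { status: TaskStatus }>(
  fetchTask: () => Promise<T>, // e.g. () => auten.tasks.get(id)
  opts: { intervalMs?: number; timeoutMs?: number } = {},
): Promise<T> {
  const intervalMs = opts.intervalMs ?? 1000;
  const timeoutMs = opts.timeoutMs ?? 300_000;
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const task = await fetchTask();
    if (TERMINAL.has(task.status)) return task;
    // Give up if the next poll would land past the deadline.
    if (Date.now() + intervalMs > deadline) {
      throw new Error(`task still ${task.status} after ${timeoutMs} ms`);
    }
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

With the defaults, one task costs roughly one `GET /v1/tasks/:id` per second; raising `intervalMs` to 5000 cuts that to about twelve requests per minute.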
673
+ ---
674
+
675
+ ## For AI agents
676
+
677
+ This package is the **only** supported way to control Auten phones from JavaScript/TypeScript today. If you're an AI agent integrating it into another project, here's what to know up front.
678
+
679
+ ### Verify you're using the current SDK
680
+
681
+ The class name is `Auten` (not `MobileAgent`, which was a deprecated pre-1.0 surface). Endpoints live under `/v1/...`. Quick check:
682
+
683
+ ```ts
684
+ import { Auten } from "@autenai/sdk";
685
+ console.log(typeof Auten); // "function"
686
+ ```
687
+
688
+ If you're seeing `MobileAgent`, you're on an old version (`@autenai/sdk@<0.5.0`, deprecated, or the unscoped `auten` package, which was never published).
689
+
690
+ ### Default to high-level prompts, not low-level calls
691
+
692
+ When you have a goal, prefer:
693
+
694
+ ```ts
695
+ await auten.tasks.run({ device, prompt: "<natural language>" });
696
+ ```
697
+
698
+ Over hand-orchestrating taps/types. The relay's planner handles cross-app navigation, retries, and verification. Reach for `auten.phone(serial).tap(x, y)` only when you need precision the LLM can't reach (e.g. exact pixel placement in a canvas).
699
+
700
+ ### Picking a speed
701
+
702
+ - Use `fast` (default) for the first run of any new prompt or after a UI redesign.
703
+ - Use `lightning` for prompts that have already succeeded and been verified once; replays will skip per-action `look()` and shave seconds.
704
+ - Use `instant` rarely; it's a middle point.
705
+
706
+ ### Error-handling pattern
707
+
708
+ ```ts
709
+ const task = await auten.tasks.run({ device, prompt }, { timeoutMs: 600_000 });
710
+
711
+ if (task.status === "completed" && task.verified === true) {
712
+ return task.result?.summary ?? "ok";
713
+ }
714
+
715
+ // completed-but-verifier-flagged: the agent thought it was done but the verifier disagreed
716
+ if (task.status === "completed" && task.verified === false) {
717
+ return { failed: true, reason: task.result?.verify_reason };
718
+ }
719
+
720
+ // runtime failures (device offline, max turns hit, etc.)
721
+ return { failed: true, reason: task.error?.message ?? "unknown" };
722
+ ```
723
+
724
+ ### When to use `phone.proxy()`
725
+
726
+ Almost never. The named methods (`tap`, `type`, `look`, etc.) cover the entire stable APK surface. `proxy()` exists for forward compatibility — if the APK ships a new endpoint that the SDK hasn't wrapped yet, you can call it without waiting for an SDK release. Don't reach for it on the first try.
727
+
728
+ ### Owner scoping summary
729
+
730
+ - Your key has an `ownerId`. Every resource you create gets that `ownerId`.
731
+ - You can't see other owners' resources. `auten.me()` shows your scope.
732
+ - The `is_root` flag (env-set admin key) sees everything — use it for ops, not for app code.
733
+
734
+ ### Deprecated surfaces — do NOT use
735
+
736
+ - `MobileAgent` class → replaced by `Auten`
737
+ - `/devices`, `/sessions`, `/recordings` paths → replaced by `/v1/...`
738
+ - `@autenai/cli` package → bundled into `@autenai/sdk` now
739
+ - `auten run` SSH-based command → replaced by `auten task` (HTTP)
740
+
741
+ If you find docs or example code mentioning these, treat as outdated.
742
+
743
+ ---
744
+
745
+ ## Versioning
746
+
747
+ `@autenai/sdk` follows semver:
748
+
749
+ - `0.x.y` — APIs may break between minor versions while we shake out the surface
750
+ - `1.0.0` — first frozen public surface (TBD)
751
+
752
+ Breaking changes always land in CHANGELOG with migration notes. Pre-1.0 deprecations are marked in JSDoc comments and shipped as runtime warnings for at least one minor version before removal.
753
+
754
+ Latest version: see https://www.npmjs.com/package/@autenai/sdk.
755
+
756
+ ---
757
+
758
+ ## License
759
+
760
+ MIT.