@onkernel/cua-cli 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md ADDED
@@ -0,0 +1,170 @@
1
+ # `@onkernel/cua-cli`
2
+
3
+ The CLI / TUI binary for the [`cua`](../../README.md) monorepo. Wires
4
+ [`@onkernel/cua-agent`](../agent)'s `CuaAgentHarness` to
5
+ [`pi-tui`](https://www.npmjs.com/package/@earendil-works/pi-tui) for an
6
+ interactive front-end and to
7
+ [`pi-coding-agent`](https://www.npmjs.com/package/@earendil-works/pi-coding-agent)'s
8
+ coding tools for workspace access.
9
+
10
+ ## Install (from the monorepo)
11
+
12
+ ```bash
13
+ # from the repo root:
14
+ npm install
15
+ # run directly from source via tsx (no global install required):
16
+ npx tsx packages/cli/src/cli.ts --help
17
+
18
+ # optional: pin a shell function in your rc so `cua` works from any cwd
19
+ # while preserving the caller's directory (so `--out`, transcript
20
+ # bucketing, and `.agents/skills` discovery use the directory you
21
+ # invoked from):
22
+ # CUA_REPO=/absolute/path/to/cua
23
+ # cua() { "$CUA_REPO/node_modules/.bin/tsx" "$CUA_REPO/packages/cli/src/cli.ts" "$@"; }
24
+ ```
25
+
26
+ ## Usage
27
+
28
+ ```bash
29
+ # Interactive TUI:
30
+ cua
31
+
32
+ # Single-shot prompt:
33
+ cua --print "open https://example.com and tell me the heading"
34
+
35
+ # Constrained one-shot subcommands (deterministic exit codes):
36
+ cua open https://example.com
37
+ cua click "Sign in button"
38
+ cua type "email field" "alice@example.com"
39
+ cua press ctrl l # Ctrl+L (focus address bar)
40
+ cua url
41
+ cua observe "what page is loaded?"
42
+ cua screenshot --out shot.png
43
+ cua do "buy a pair of socks on amazon" --max-steps 20
44
+
45
+ # List and pick supported models:
46
+ cua models
47
+ cua models -p openai
48
+ cua --print --model openai:gpt-5.5 "..."
49
+ cua --print --model anthropic:claude-opus-4-7 "..."
50
+ cua --print --model google:gemini-3-flash-preview "..."
51
+ cua --print --model yutori:n1.5-latest "..."
52
+
53
+ # Named sessions (browser stays alive across calls):
54
+ cua session start login # provisions Kernel browser
55
+ cua -s login open https://github.com/login
56
+ cua -s login type "email field" "$EMAIL"
57
+ cua -s login click "Sign in"
58
+ cua session stop login
59
+
60
+ cua session list # NAME / KERNEL_ID / AGE / LIVE_URL
61
+ cua session show login # full JSON metadata
62
+
63
+ # Resume a prior session transcript into a fresh browser:
64
+ cua --continue
65
+ cua --resume # picker
66
+ cua --session abc12345 # by id prefix
67
+ ```
68
+
69
+ ## Models
70
+
71
+ Run `cua models` to list every supported `-m` / `--model` value and the
72
+ provider it routes to. Filter by provider with `cua models -p openai`,
73
+ `cua models -p anthropic`, `cua models -p google` (alias: `gemini`), or
74
+ `cua models -p yutori`.
75
+
76
+ `-m` / `--model` accepts a provider-qualified `provider:model` ref (e.g.
77
+ `openai:gpt-5.5`) or a bare model id when it matches exactly one catalog
78
+ entry. The default is `openai:gpt-5.5`.
79
+
80
+ ## Configuration
81
+
82
+ Configuration is by environment variable. There is no config file.
83
+
84
+ | Env | Used for |
85
+ | -------------------- | ---------------------------------------------- |
86
+ | `KERNEL_API_KEY` | Kernel API key (required) |
87
+ | `OPENAI_API_KEY` | OpenAI API key (required when `-m openai:…`) |
88
+ | `ANTHROPIC_API_KEY` | Anthropic API key (required when `-m anthropic:…`) |
89
+ | `GOOGLE_API_KEY` | Google API key (required when `-m google:…`) |
90
+ | `GEMINI_API_KEY` | alias of `GOOGLE_API_KEY` |
91
+ | `TZAFON_API_KEY` | Tzafon API key (required when `-m tzafon:…`) |
92
+ | `YUTORI_API_KEY` | Yutori API key (required when `-m yutori:…`) |
93
+ | `KERNEL_BASE_URL` | override Kernel base URL |
94
+ | `OPENAI_BASE_URL` | override OpenAI base URL |
95
+ | `ANTHROPIC_BASE_URL` | override Anthropic base URL |
96
+ | `GOOGLE_BASE_URL` | override Google base URL |
97
+ | `TZAFON_BASE_URL` | override Tzafon base URL |
98
+ | `YUTORI_BASE_URL` | override Yutori base URL |
99
+ | `XDG_DATA_HOME` | sessions dir base (defaults to `~/.local/share`) |
100
+ | `CUA_IMAGE_PROTOCOL` | force inline image protocol (`kitty`/`iterm2`/`none`/`auto`) |
101
+
102
+ Use `--thinking <level>` (`off | minimal | low | medium | high | xhigh`,
103
+ default `low`) for providers that support reasoning effort.
104
+
105
+ ## Output formats
106
+
107
+ `--print` defaults to streaming text. Pass `-o jsonl` for one
108
+ structured event per line (good for scripting):
109
+
110
+ ```bash
111
+ cua --print -o jsonl "open https://example.com" \
112
+ | jq -c 'select(.type=="tool_call" or .type=="assistant_text_done")'
113
+ ```
114
+
115
+ Add `--jsonl-include-deltas` for assistant-token deltas and
116
+ `--jsonl-include-images` for base64 screenshots in `tool_result` events.
117
+
118
+ The first event of every `--print -o jsonl` run is
119
+ `session_created` with a `schema_version` field. The current schema
120
+ version is `1`. The `model` field carries a provider-qualified ref
121
+ (e.g. `openai:gpt-5.5`); use `parseCuaModelRef` from `@onkernel/cua-ai`
122
+ if you only need the bare model id.
123
+
124
+ ## Sessions and transcripts
125
+
126
+ `--print`, the interactive TUI, and any `-s <name>` invocation persist
127
+ a JSONL transcript to
128
+ `$XDG_DATA_HOME/cua/sessions/<cwd-hash>/<id>.jsonl` by default
129
+ (typically `~/.local/share/cua/sessions/...`). Pass `--no-session` to
130
+ keep a run in-memory only, or `--session-dir <path>` to override the
131
+ location.
132
+
133
+ For named sessions, the exact transcript path is in
134
+ `cua session show <name>` under `transcript_path`. See the
135
+ [Session transcripts section in the top-level README](../../README.md#session-transcripts)
136
+ for the JSONL schema and `jq` analysis examples.
137
+
138
+ ## Skills
139
+
140
+ `cua` follows the cross-agent
141
+ [`~/.agents/skills/`](https://agentskills.io) standard. Discovery
142
+ defaults:
143
+
144
+ - `~/.agents/skills/` (user-global)
145
+ - `<cwd>/.agents/skills/` (project-local)
146
+
147
+ Plus any explicit `--skill <path>` flags. Disable with `--no-skills`
148
+ (`-ns`).
149
+
150
+ Each skill's `name`, `description`, and file `location` are added to
151
+ the system prompt; the model uses the `read` tool to load a skill's
152
+ full body when its description matches the task. Use `/skill:<name>`
153
+ in a prompt to force-load a skill body inline.
154
+
155
+ ## Image protocol
156
+
157
+ Force the inline-screenshot protocol with `--image-protocol` or
158
+ `CUA_IMAGE_PROTOCOL`:
159
+
160
+ - `kitty` — Kitty graphics protocol (also covers Ghostty / WezTerm).
161
+ - `iterm2` — iTerm2 inline images.
162
+ - `none` — disable inline images; show a compact text card instead.
163
+ - `auto` — auto-detect based on `TERM_PROGRAM` / `TMUX` / etc. (default).
164
+
165
+ The TUI prints the resolved capability as the second header line so
166
+ you can see at a glance whether inline images will render.
167
+
168
+ ## License
169
+
170
+ MIT.