aipeek 0.2.6 → 0.2.8

package/README.md CHANGED
@@ -1,10 +1,12 @@
1
1
  # aipeek
2
2
 
3
- Gives AI a peek into — and a hand on — your running browser app. Reads the UI tree (React fiber), semantic DOM, console, network, errors, and store state; drives the page (click/fill/press/wait/screenshot). All over plain-text HTTP on your Vite dev server — zero resident context cost, unlike a browser MCP whose tool schemas sit in the model's context whether used or not.
3
+ Gives AI a peek into — and a hand on — your running browser app. Reads the UI tree (React fiber), semantic DOM, console, network, errors, and store state; drives the page (click, fill, press, wait, drag/drop, clipboard, screenshot) and profiles it. All over plain-text HTTP on your Vite dev server — zero resident context cost, unlike a browser MCP whose tool schemas sit in the model's context whether used or not.
4
4
 
5
5
  **10× faster end-to-end.** What you feel is wall-clock from prompt to done — model thinking, round-trips, all of it. Screenshot agents (Playwright + vision) pay 2–5s of pixel-parsing *every step*; aipeek reads semantic text (instant) and batches a whole interaction into one round-trip with `/chain` — the model thinks once, not N times.
6
6
 
7
- It lives **inside** the open page (injected client + HMR channel), so it reads React/store internals a DOM-only driver can't, and acts on the current tab with no separate browser process. It does **not** open browsers, navigate, run headless, or fire real pointer events — it's the dev inner loop, not E2E. For that, use Playwright.
7
+ It lives **inside** the open page (injected client + HMR channel), so it reads React/store internals a DOM-only driver can't, and acts on the current tab with no separate browser process. It does **not** open browsers, navigate, or run headless — it's the dev inner loop, not E2E. For that, use Playwright. (It *can* fire trusted OS-level pointer events via `realclick` when a synthetic click won't do — see Actions.)
8
+
9
+ **Self-healing.** The transport rides the Vite HMR socket; the injected client re-announces itself on connect and polls a process-level `BOOT_ID` over HTTP, so a full dev-server restart re-handshakes automatically instead of stranding the page until a human hits ⌘R.
8
10
 
9
11
  ## Install
10
12
 
@@ -29,19 +31,30 @@ export default defineConfig({
29
31
 
30
32
  ## API Endpoints
31
33
 
32
- All endpoints are available on your Vite dev server:
34
+ All endpoints are available on your Vite dev server. Reads are listed cheapest-first.
33
35
 
34
36
  | Endpoint | Description |
35
37
  |----------|-------------|
36
- | `GET /__aipeek/screen` | **State-machine projection** — `{view, modal, focus, knobs}`. Start here. |
37
- | `GET /__aipeek` | Summary of all sections (UI, console, network, errors, state) |
38
- | `GET /__aipeek/{section}` | Detail for a section (`ui`, `console`, `network`, `errors`, `state`) |
38
+ | `GET /__aipeek/screen` | **State-machine projection** — `{view, modal, focus, knobs, domain}`. Each read prints a `token: tN`. Start here. |
39
+ | `GET /__aipeek/screen?since=tN` | **Delta**: only what moved since token `tN` (view/modal/focus + new errors/failed requests), or `(no state change)` if nothing did. The cheap "what changed after I acted" read. |
40
+ | `GET /__aipeek` | Summary of all sections (UI, console, network, errors, state); healthy sections fold to one line, issues expand |
41
+ | `GET /__aipeek?full` | Full dump — UI tree + console + network + errors + state |
42
+ | `GET /__aipeek/{section}` | Detail for a section (`ui`, `console`, `network`, `errors`, `state`, `profile`) |
39
43
  | `GET /__aipeek/{section}/{index}` | Detail for a specific item in a section |
40
44
  | `GET /__aipeek/{section}?full` | Full detail (no truncation) |
41
45
  | `GET /__aipeek/dom[?scope=Name\|?sel=css]` | Semantic DOM — UI as text (see below) |
46
+ | `GET /__aipeek/query?sel=css` | Read-side twin of `sel=`: a selector's live `count` + each match's `text`/`visible`/`attrs` (role, `data-state`, `aria-*`/`data-*`, value, disabled). Per-element assertions without `/eval`. |
47
+ | `GET /__aipeek/check` | Pass/fail health check (no console errors / no uncaught errors / no failed requests / UI rendered). `417` when any assertion fails. Use after a code change. |
48
+ | `GET /__aipeek/console` · `/network` · `/errors` · `/state` | Section shortcuts (same as `/{section}`) |
49
+ | `GET /__aipeek/profile` | Performance profiler — which component/function is burning frames. `/profile/reset` clears the window; `/profile/diff` gives a baseline→after IMPROVED/REGRESSED verdict |
42
50
  | `GET /__aipeek/{action}?...` | Drive the page (see Actions) |
51
+ | `GET /__aipeek/tabs` | List live tabs (id, visible/background, title) for `?tab=` addressing |
52
+ | `GET /__aipeek/timeline` | Interleaved action stream across **all** tabs in time order — who clicked what, and the resulting UI change |
43
53
  | `POST /__aipeek/chain` | Run a JSON array of actions in one round-trip (see Actions) |
44
- | `GET\|POST /__aipeek/eval` | Run arbitrary JS in the page (`?code=` or POST body); returns the result. Escape hatch for what typed endpoints can't do. |
54
+ | `GET\|POST /__aipeek/eval` | Run arbitrary JS in the page (`?code=` or POST body); returns the result. Escape hatch for what typed endpoints can't do — for count/text/state/attr checks reach for `/query` first. |
55
+
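The `token: tN` / `?since=tN` pairing in the table above can be sketched as two small helpers. This is a minimal illustration, not aipeek's own client code: it assumes the `/screen` response contains a line matching `token: t<digits>` (the surrounding layout is not specified in this README), and `extractToken` and `deltaPath` are hypothetical names.

```typescript
// Assumption: a /screen response contains a line such as "token: t12".
// Only that pattern is matched; the rest of the response layout is not relied on.
const TOKEN_PATTERN = /token:\s*(t\d+)/;

// Pull the token out of a /screen response body, or return null if none is found.
function extractToken(screenText: string): string | null {
  const match = TOKEN_PATTERN.exec(screenText);
  return match?.[1] ?? null;
}

// Build the path for the follow-up delta read that uses the token.
function deltaPath(token: string): string {
  return `/__aipeek/screen?since=${encodeURIComponent(token)}`;
}
```

A caller would read `/screen` once, keep the token, act on the page, then request `deltaPath(token)` to see only what moved.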
56
+ **Secret redaction.** Password inputs and API-key/token fields render as `‹redacted N chars›`
57
+ across `/dom`, `/query`, and `/screen` — the length stays visible, the value doesn't.
45
58
 
46
59
  ### Perception layers — UI as text, not pixels
47
60
 
@@ -51,8 +64,10 @@ information is already textual in the DOM. aipeek exposes four layers, cheapest
51
64
 
52
65
  - **`/screen`** — state-machine projection. The whole UI collapsed to what a human reads
53
66
  off a washing-machine panel: `view` (which area), `modal` (is something covering it),
54
- `focus`, and `knobs` (the few *reachable* controls now — repeated rows fold to `source ×N`,
55
- and when a modal is open only its subtree counts). A handful of lines. *Start here.*
67
+ `focus`, `knobs` (the few *reachable* controls now — repeated rows fold to `source ×N`,
68
+ and when a modal is open only its subtree counts), and `domain` (the app's own state
69
+ variables, if it sets `window.__AIPEEK_SCREEN__`). A handful of lines. *Start here.*
70
+ Append `?full` for untruncated output; pass `?since=tN` for just the delta.
56
71
  - **`/ui`** — React component tree. Full structure. Deep-dive when `/screen` isn't enough.
57
72
  - **`/dom`** — semantic DOM: `tag·role·semantic-class·data-*·state` per element, with
58
73
  Tailwind/atomic noise stripped and each line tagged with its **source location**
@@ -78,43 +93,103 @@ as the component boundary. Each line's `@File.tsx:line` then tells you exactly w
78
93
  | Endpoint | Params | Effect |
79
94
  |----------|--------|--------|
80
95
  | `/click` | `sel=` (CSS) or `text=` (visible text) | dispatch a real click |
81
- | `/fill` | `sel=`/`text=` + `value=` | set value on input/textarea/select; **contenteditable** via `execCommand` |
96
+ | `/fill` | `sel=`/`text=` + `value=` | set value on input/textarea/select via React's native value setter (fires onChange on **controlled** inputs); **contenteditable** via `execCommand` |
82
97
  | `/press` | `key=` (e.g. `Enter`, `Control+a`) | keydown/keyup on the focused element |
83
- | `/wait` | `text=`/`sel=`, `timeout=` (ms, default 5000) | poll until it appears; 504 on timeout |
98
+ | `/wait` | `text=`/`sel=`, `timeout=` (ms, default 5000), `gone=1` | poll until it appears (or, with `gone=1`, disappears); 504 on timeout |
99
+ | `/realclick` | `sel=`/`text=`, `button=left\|right` | **trusted** OS-level click via the extension's CDP channel — for popups/context-menus a synthetic click can't open. Electron fires it in-process |
100
+ | `/scrollIntoView` | `sel=`/`text=` | scroll a target into view (off-screen virtualized rows) |
101
+ | `/drag` | `sel=` + `to=` | synthetic pointer drag, source → destination (steps past dnd-kit's activation distance) |
102
+ | `/drop` | `sel=` + `files=a.png,b.pdf` | fire a file-drop (`DataTransfer`) on a target (synthetic Files trigger handlers, no byte content) |
103
+ | `/clipboard` | `mode=read\|write`, `value=` | seed or read the clipboard (needs the tab focused) |
84
104
  | `/screenshot` | `sel=`, `out=` | DOM→PNG into `.aipeek/`; skips cross-origin/broken images |
85
- | `POST /chain` | JSON array of `{type, sel?, text?, value?, key?, timeout?}` | run in sequence, settle between steps, stop on first failure |
105
+ | `POST /chain` | JSON array of steps | run in sequence, settle between steps, stop on first failure |
106
+
107
+ `click`/`fill`/`press` **settle the DOM and append `--- changed ---`** — only the state-machine
108
+ transition this action caused (`view: a → b`, `modal: opened X`, `focus: …`) plus any new
109
+ errors/failed requests, not a fresh snapshot. `(no state change)` means nothing moved. You read
110
+ the delta and drill into `/ui` or `/dom` for detail only if you need it. On a target miss,
111
+ `/click` and `/fill` return the reachable clickable elements (clipped to the open modal's
112
+ subtree) so you can re-target.
86
113
 
87
- `click`/`fill`/`press` **settle the DOM and append the UI tree after** (`--- ui after ---`)
88
- to the response, so no follow-up read is needed. On a target miss, `/click` and `/fill` return the
89
- reachable clickable elements (clipped to the open modal's subtree) so you can re-target.
114
+ They also append a **`--- recent actions ---` timeline**: the semantic page actions in order
115
+ (`T`=trusted human / `S`=synthetic aipeek), each with its resulting UI change, your own action
116
+ bracketed by `你当前的行为` ("your current action") dividers. If the user manipulates the page concurrently (closes a
117
+ dialog you opened), their action shows up in your next response — conflict surfaces automatically.
90
118
 
91
119
  A CSS `sel=` with non-ASCII or quotes/brackets must be URL-encoded, or the query parser
92
120
  mangles it: `curl -G .../click --data-urlencode 'sel=button[title="知识库"]'`.
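When the request is built in code rather than with `curl --data-urlencode`, the same encoding can be done with `encodeURIComponent`. The helper below is a sketch under assumptions: `buildActionUrl` is a hypothetical name, and the base URL is whatever your dev server listens on.

```typescript
// Build an aipeek action URL with every query value percent-encoded, so
// selectors containing quotes, brackets, or non-ASCII survive the query parser.
function buildActionUrl(
  base: string,
  action: string,
  params: Record<string, string>,
): string {
  const query = Object.entries(params)
    .map(([key, value]) => `${encodeURIComponent(key)}=${encodeURIComponent(value)}`)
    .join("&");
  const path = `${base}/__aipeek/${action}`;
  return query ? `${path}?${query}` : path;
}

// Example: the selector from the curl command above.
const clickUrl = buildActionUrl("http://localhost:5195", "click", {
  sel: 'button[title="知识库"]',
});
```

The resulting `clickUrl` contains no raw brackets or quotes and decodes back to the original selector.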
93
121
 
94
- **Chain** packs a whole interaction into one round-trip:
122
+ **Multiple tabs.** Every read/drive command takes `?tab=<id>` to address one tab — including
123
+ a **background** one (drive the Chat tab while the user reads a different tab; synthetic events
124
+ and Electron `sendInputEvent` don't need foreground). `GET /tabs` lists the live ids. One tab
125
+ open → omit `?tab=`, it just works. Several open + no `?tab=` → the command returns `409` + the
126
+ tab list instead of randomly hitting one; pick an id and retry with `?tab=`. `GET /timeline`
127
+ interleaves every tab's actions in time order, so an A/B comparison (drive A, watch B react) is
128
+ one read.
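The status codes this README mentions (`409` for an ambiguous tab, `417` from `/check`, `504` from `/wait`) lend themselves to a small dispatch in a client script. The sketch below is illustrative only: `interpretStatus` and the `Hint` shape are hypothetical, and treating any 2xx as success is an assumption rather than documented behavior.

```typescript
// What a caller might do next for a given HTTP status from aipeek.
type Hint = { ok: boolean; next: string };

function interpretStatus(status: number): Hint {
  switch (status) {
    case 409:
      // Several tabs are open and no ?tab= was given; the body lists the tabs.
      return { ok: false, next: "pick a tab id from the response and retry with ?tab=" };
    case 417:
      // /check reports that at least one health assertion failed.
      return { ok: false, next: "inspect /console, /errors, and /network" };
    case 504:
      // /wait timed out before the target appeared (or disappeared).
      return { ok: false, next: "raise timeout= or re-read /screen" };
    default:
      // Assumption: any 2xx means the command went through.
      return { ok: status >= 200 && status < 300, next: "continue" };
  }
}
```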
129
+
130
+ **Federation across dev servers.** When several dev servers run at once (a micro-frontend, a
131
+ separate front/back, a teammate's machine), any command takes `?host=<host:port>` to reach a
132
+ *sibling* aipeek. The plugin you curl reverse-proxies the request server-side (no browser, no
133
+ CORS): `…/screen?host=localhost:5174` reads the app on :5174; combine with `?tab=` to point at
134
+ one tab over there. No registry — you name the peer, so list its tabs with `/tabs?host=…` first.
135
+
136
+ **Trusted input.** A control tagged `{needs-trusted?}` in `/screen` or `/dom` opens a popup
137
+ (`aria-haspopup`) a synthetic click may not trigger — use `/realclick` on it. Right-click menus
138
+ carry no DOM marker; reach for `/realclick` with `button=right` there. (Requires the browser
139
+ extension for a plain tab; Electron fires trusted events in-process.)
140
+
141
+ **Chain** packs a whole interaction into one round-trip. An `assert` step is a mid-chain judge —
142
+ `{type:"assert", screen, equals}` checks an app domain variable (from `window.__AIPEEK_SCREEN__`),
143
+ or `{type:"assert", sel, equals}` an element's text — and stops the chain with `asserted X=="Y",
144
+ actual "Z"` on mismatch:
95
145
 
96
146
  ```bash
97
147
  curl -X POST localhost:5195/__aipeek/chain -d '[
98
148
  {"type":"click","sel":"button[title=\"知识库\"]"},
99
149
  {"type":"wait","text":"Done"},
100
150
  {"type":"fill","sel":"textarea","value":"hi"},
151
+ {"type":"assert","screen":"streaming","equals":"false"},
101
152
  {"type":"press","key":"Enter"}
102
153
  ]'
103
154
  ```
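The same chain can be assembled in TypeScript before posting it. This is a sketch, assuming only the step fields shown in this README (`type`, `sel`, `text`, `value`, `key`, `timeout`, plus the two `assert` forms); the `ChainStep` type and `chainBody` helper are hypothetical names, and other action types may also be accepted in a chain.

```typescript
// Step shapes taken from the fields this README shows; not an exhaustive schema.
type ChainStep =
  | { type: "click"; sel?: string; text?: string }
  | { type: "fill"; sel?: string; text?: string; value: string }
  | { type: "press"; key: string }
  | { type: "wait"; sel?: string; text?: string; timeout?: number }
  | { type: "assert"; screen: string; equals: string }
  | { type: "assert"; sel: string; equals: string };

// Serialize the steps into the JSON array that POST /__aipeek/chain expects.
function chainBody(steps: ChainStep[]): string {
  return JSON.stringify(steps);
}

// The interaction from the curl example above, as typed data.
const steps: ChainStep[] = [
  { type: "click", sel: 'button[title="知识库"]' },
  { type: "wait", text: "Done" },
  { type: "fill", sel: "textarea", value: "hi" },
  { type: "assert", screen: "streaming", equals: "false" },
  { type: "press", key: "Enter" },
];

const body = chainBody(steps);
```

Building the array as data avoids hand-escaping quotes inside a shell string; `body` can be sent as the POST payload by any HTTP client.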
104
155
 
156
+ **Connection diagnostics.** When a read can't get an answer, aipeek never returns a bare "no
157
+ tab" — it distinguishes, and tells you the fix: *tab connected but the handler hung* (check
158
+ `/console`), *tab closed / mid self-heal* (retry), *pages connected via HMR but the client
159
+ didn't load* (wrong vite server / `vite preview` / production build — check the port), or *no
160
+ browser pointed here at all* (open the app). `/profile` on a backgrounded tab says so too —
161
+ the browser throttles rAF for hidden tabs, so bring it to the foreground for ~2s; it's the only read that
162
+ needs foreground.
163
+
105
164
  ## CLI
106
165
 
166
+ A thin wrapper over the HTTP endpoints, with colored `check` output:
167
+
107
168
  ```bash
108
- npx aipeek # fetch from localhost:5195
109
- npx aipeek --port=3000 # custom port
169
+ npx aipeek # full summary (localhost:5173)
170
+ npx aipeek check # health check (pass/fail)
171
+ npx aipeek console # console logs
172
+ npx aipeek network/0 # detail for one network request
173
+ npx aipeek errors/1 --full # untruncated error detail
174
+ npx aipeek --port=5195 # custom port
110
175
  ```
111
176
 
112
177
  ## Store Registration (optional)
113
178
 
114
- Register MobX/other stores for state inspection:
179
+ Register MobX/other stores so their snapshots appear in the `state` section:
115
180
 
116
181
  ```ts
117
182
  window.__AIPEEK_STORES__ = { myStore, anotherStore }
118
183
  ```
119
184
 
120
185
  State is snapshotted on demand (depth-limited, bounded) and included in the `<state>` section.
186
+
187
+ ## Domain projection (optional)
188
+
189
+ Let aipeek read your app's own state machine — the variables show up in `/screen`'s `domain`
190
+ block, in every `--- changed ---` diff, and are assertable in a chain (`{type:"assert", screen,
191
+ equals}`). Opt in by exposing a snapshot function:
192
+
193
+ ```ts
194
+ window.__AIPEEK_SCREEN__ = () => ({ view, streaming, modal: openDialog })
195
+ ```
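A slightly fuller version of the one-liner above, for an app whose state lives in a plain object. This is a sketch under assumptions: `AppState`, `appState`, and `screenSnapshot` are hypothetical names, and it relies only on `window.__AIPEEK_SCREEN__` being a function that returns the current values.

```typescript
// Hypothetical app state; substitute your own store or state machine.
interface AppState {
  view: string;
  streaming: boolean;
  openDialog: string | null;
}

const appState: AppState = { view: "chat", streaming: false, openDialog: null };

// Return a fresh snapshot on every call so each read sees current values.
function screenSnapshot(): Record<string, unknown> {
  return {
    view: appState.view,
    streaming: appState.streaming,
    modal: appState.openDialog,
  };
}

// In a browser, globalThis is window, so this sets window.__AIPEEK_SCREEN__.
const globalScope = globalThis as unknown as Record<string, unknown>;
globalScope["__AIPEEK_SCREEN__"] = screenSnapshot;
```

Because the function reads `appState` at call time rather than capturing values once, later state changes show up in subsequent reads.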