@kata-sh/cli 0.1.0 → 0.1.1
- package/LICENSE +21 -0
- package/README.md +156 -0
- package/dist/app-paths.d.ts +4 -0
- package/dist/app-paths.js +6 -0
- package/dist/cli.d.ts +1 -0
- package/dist/cli.js +56 -0
- package/dist/loader.d.ts +2 -0
- package/dist/loader.js +95 -0
- package/dist/resource-loader.d.ts +18 -0
- package/dist/resource-loader.js +50 -0
- package/dist/wizard.d.ts +15 -0
- package/dist/wizard.js +159 -0
- package/package.json +50 -21
- package/pkg/dist/modes/interactive/theme/dark.json +85 -0
- package/pkg/dist/modes/interactive/theme/light.json +84 -0
- package/pkg/dist/modes/interactive/theme/theme-schema.json +335 -0
- package/pkg/dist/modes/interactive/theme/theme.d.ts +78 -0
- package/pkg/dist/modes/interactive/theme/theme.d.ts.map +1 -0
- package/pkg/dist/modes/interactive/theme/theme.js +949 -0
- package/pkg/dist/modes/interactive/theme/theme.js.map +1 -0
- package/pkg/package.json +8 -0
- package/scripts/postinstall.js +45 -0
- package/src/resources/AGENTS.md +108 -0
- package/src/resources/KATA-WORKFLOW.md +661 -0
- package/src/resources/agents/researcher.md +29 -0
- package/src/resources/agents/scout.md +56 -0
- package/src/resources/agents/worker.md +31 -0
- package/src/resources/extensions/ask-user-questions.ts +200 -0
- package/src/resources/extensions/bg-shell/index.ts +2758 -0
- package/src/resources/extensions/browser-tools/BROWSER-TOOLS-V2-PROPOSAL.md +1277 -0
- package/src/resources/extensions/browser-tools/core.js +1057 -0
- package/src/resources/extensions/browser-tools/index.ts +4916 -0
- package/src/resources/extensions/browser-tools/package.json +20 -0
- package/src/resources/extensions/context7/index.ts +428 -0
- package/src/resources/extensions/context7/package.json +11 -0
- package/src/resources/extensions/get-secrets-from-user.ts +352 -0
- package/src/resources/extensions/github/formatters.ts +207 -0
- package/src/resources/extensions/github/gh-api.ts +537 -0
- package/src/resources/extensions/github/index.ts +778 -0
- package/src/resources/extensions/kata/activity-log.ts +88 -0
- package/src/resources/extensions/kata/auto.ts +2786 -0
- package/src/resources/extensions/kata/commands.ts +355 -0
- package/src/resources/extensions/kata/crash-recovery.ts +85 -0
- package/src/resources/extensions/kata/dashboard-overlay.ts +516 -0
- package/src/resources/extensions/kata/docs/preferences-reference.md +103 -0
- package/src/resources/extensions/kata/doctor.ts +683 -0
- package/src/resources/extensions/kata/files.ts +730 -0
- package/src/resources/extensions/kata/gitignore.ts +165 -0
- package/src/resources/extensions/kata/guided-flow.ts +976 -0
- package/src/resources/extensions/kata/index.ts +556 -0
- package/src/resources/extensions/kata/metrics.ts +397 -0
- package/src/resources/extensions/kata/observability-validator.ts +408 -0
- package/src/resources/extensions/kata/package.json +11 -0
- package/src/resources/extensions/kata/paths.ts +346 -0
- package/src/resources/extensions/kata/preferences.ts +695 -0
- package/src/resources/extensions/kata/prompt-loader.ts +50 -0
- package/src/resources/extensions/kata/prompts/complete-milestone.md +25 -0
- package/src/resources/extensions/kata/prompts/complete-slice.md +27 -0
- package/src/resources/extensions/kata/prompts/discuss.md +151 -0
- package/src/resources/extensions/kata/prompts/doctor-heal.md +29 -0
- package/src/resources/extensions/kata/prompts/execute-task.md +64 -0
- package/src/resources/extensions/kata/prompts/guided-complete-slice.md +1 -0
- package/src/resources/extensions/kata/prompts/guided-discuss-milestone.md +3 -0
- package/src/resources/extensions/kata/prompts/guided-discuss-slice.md +59 -0
- package/src/resources/extensions/kata/prompts/guided-execute-task.md +1 -0
- package/src/resources/extensions/kata/prompts/guided-plan-milestone.md +23 -0
- package/src/resources/extensions/kata/prompts/guided-plan-slice.md +1 -0
- package/src/resources/extensions/kata/prompts/guided-research-slice.md +11 -0
- package/src/resources/extensions/kata/prompts/guided-resume-task.md +1 -0
- package/src/resources/extensions/kata/prompts/plan-milestone.md +47 -0
- package/src/resources/extensions/kata/prompts/plan-slice.md +63 -0
- package/src/resources/extensions/kata/prompts/queue.md +85 -0
- package/src/resources/extensions/kata/prompts/reassess-roadmap.md +48 -0
- package/src/resources/extensions/kata/prompts/replan-slice.md +39 -0
- package/src/resources/extensions/kata/prompts/research-milestone.md +37 -0
- package/src/resources/extensions/kata/prompts/research-slice.md +28 -0
- package/src/resources/extensions/kata/prompts/run-uat.md +109 -0
- package/src/resources/extensions/kata/prompts/system.md +341 -0
- package/src/resources/extensions/kata/session-forensics.ts +550 -0
- package/src/resources/extensions/kata/skill-discovery.ts +137 -0
- package/src/resources/extensions/kata/state.ts +509 -0
- package/src/resources/extensions/kata/templates/context.md +76 -0
- package/src/resources/extensions/kata/templates/decisions.md +8 -0
- package/src/resources/extensions/kata/templates/milestone-summary.md +73 -0
- package/src/resources/extensions/kata/templates/plan.md +133 -0
- package/src/resources/extensions/kata/templates/preferences.md +15 -0
- package/src/resources/extensions/kata/templates/project.md +31 -0
- package/src/resources/extensions/kata/templates/reassessment.md +28 -0
- package/src/resources/extensions/kata/templates/requirements.md +81 -0
- package/src/resources/extensions/kata/templates/research.md +46 -0
- package/src/resources/extensions/kata/templates/roadmap.md +118 -0
- package/src/resources/extensions/kata/templates/slice-context.md +58 -0
- package/src/resources/extensions/kata/templates/slice-summary.md +99 -0
- package/src/resources/extensions/kata/templates/state.md +19 -0
- package/src/resources/extensions/kata/templates/task-plan.md +52 -0
- package/src/resources/extensions/kata/templates/task-summary.md +57 -0
- package/src/resources/extensions/kata/templates/uat.md +54 -0
- package/src/resources/extensions/kata/tests/activity-log-prune.test.ts +327 -0
- package/src/resources/extensions/kata/tests/auto-preflight.test.ts +97 -0
- package/src/resources/extensions/kata/tests/auto-supervisor.test.mjs +53 -0
- package/src/resources/extensions/kata/tests/complete-milestone.test.ts +317 -0
- package/src/resources/extensions/kata/tests/cost-projection.test.ts +160 -0
- package/src/resources/extensions/kata/tests/derive-state-deps.test.ts +477 -0
- package/src/resources/extensions/kata/tests/derive-state.test.ts +1013 -0
- package/src/resources/extensions/kata/tests/doctor.test.ts +718 -0
- package/src/resources/extensions/kata/tests/idle-recovery.test.ts +490 -0
- package/src/resources/extensions/kata/tests/metrics-io.test.ts +254 -0
- package/src/resources/extensions/kata/tests/metrics.test.ts +217 -0
- package/src/resources/extensions/kata/tests/must-have-parser.test.ts +309 -0
- package/src/resources/extensions/kata/tests/parsers.test.ts +1257 -0
- package/src/resources/extensions/kata/tests/plan-milestone.test.ts +185 -0
- package/src/resources/extensions/kata/tests/plan-quality-validator.test.ts +386 -0
- package/src/resources/extensions/kata/tests/reassess-prompt.test.ts +208 -0
- package/src/resources/extensions/kata/tests/replan-slice.test.ts +686 -0
- package/src/resources/extensions/kata/tests/requirements.test.ts +151 -0
- package/src/resources/extensions/kata/tests/resolve-ts-hooks.mjs +17 -0
- package/src/resources/extensions/kata/tests/resolve-ts.mjs +11 -0
- package/src/resources/extensions/kata/tests/run-uat.test.ts +383 -0
- package/src/resources/extensions/kata/tests/unit-runtime.test.ts +388 -0
- package/src/resources/extensions/kata/tests/workspace-index.test.ts +118 -0
- package/src/resources/extensions/kata/tests/worktree.test.ts +222 -0
- package/src/resources/extensions/kata/types.ts +159 -0
- package/src/resources/extensions/kata/unit-runtime.ts +163 -0
- package/src/resources/extensions/kata/workspace-index.ts +203 -0
- package/src/resources/extensions/kata/worktree.ts +182 -0
- package/src/resources/extensions/mac-tools/index.ts +852 -0
- package/src/resources/extensions/mac-tools/swift-cli/Package.swift +22 -0
- package/src/resources/extensions/mac-tools/swift-cli/Sources/main.swift +1318 -0
- package/src/resources/extensions/search-the-web/cache.ts +78 -0
- package/src/resources/extensions/search-the-web/format.ts +258 -0
- package/src/resources/extensions/search-the-web/http.ts +238 -0
- package/src/resources/extensions/search-the-web/index.ts +68 -0
- package/src/resources/extensions/search-the-web/tool-fetch-page.ts +519 -0
- package/src/resources/extensions/search-the-web/tool-llm-context.ts +404 -0
- package/src/resources/extensions/search-the-web/tool-search.ts +503 -0
- package/src/resources/extensions/search-the-web/url-utils.ts +91 -0
- package/src/resources/extensions/shared/confirm-ui.ts +126 -0
- package/src/resources/extensions/shared/interview-ui.ts +822 -0
- package/src/resources/extensions/shared/next-action-ui.ts +235 -0
- package/src/resources/extensions/shared/progress-widget.ts +282 -0
- package/src/resources/extensions/shared/thinking-widget.ts +107 -0
- package/src/resources/extensions/shared/ui.ts +400 -0
- package/src/resources/extensions/shared/wizard-ui.ts +551 -0
- package/src/resources/extensions/slash-commands/audit.ts +92 -0
- package/src/resources/extensions/slash-commands/create-extension.ts +375 -0
- package/src/resources/extensions/slash-commands/create-slash-command.ts +280 -0
- package/src/resources/extensions/slash-commands/index.ts +12 -0
- package/src/resources/extensions/slash-commands/kata-run.ts +34 -0
- package/src/resources/extensions/subagent/agents.ts +126 -0
- package/src/resources/extensions/subagent/index.ts +1293 -0
- package/src/resources/skills/debug-like-expert/SKILL.md +231 -0
- package/src/resources/skills/debug-like-expert/references/debugging-mindset.md +253 -0
- package/src/resources/skills/debug-like-expert/references/hypothesis-testing.md +373 -0
- package/src/resources/skills/debug-like-expert/references/investigation-techniques.md +337 -0
- package/src/resources/skills/debug-like-expert/references/verification-patterns.md +425 -0
- package/src/resources/skills/debug-like-expert/references/when-to-research.md +361 -0
- package/src/resources/skills/frontend-design/SKILL.md +45 -0
- package/src/resources/skills/swiftui/SKILL.md +208 -0
- package/src/resources/skills/swiftui/references/animations.md +921 -0
- package/src/resources/skills/swiftui/references/architecture.md +1561 -0
- package/src/resources/skills/swiftui/references/layout-system.md +1186 -0
- package/src/resources/skills/swiftui/references/navigation.md +1492 -0
- package/src/resources/skills/swiftui/references/networking-async.md +214 -0
- package/src/resources/skills/swiftui/references/performance.md +1706 -0
- package/src/resources/skills/swiftui/references/platform-integration.md +204 -0
- package/src/resources/skills/swiftui/references/state-management.md +1443 -0
- package/src/resources/skills/swiftui/references/swiftdata.md +297 -0
- package/src/resources/skills/swiftui/references/testing-debugging.md +247 -0
- package/src/resources/skills/swiftui/references/uikit-appkit-interop.md +218 -0
- package/src/resources/skills/swiftui/workflows/add-feature.md +191 -0
- package/src/resources/skills/swiftui/workflows/build-new-app.md +311 -0
- package/src/resources/skills/swiftui/workflows/debug-swiftui.md +192 -0
- package/src/resources/skills/swiftui/workflows/optimize-performance.md +197 -0
- package/src/resources/skills/swiftui/workflows/ship-app.md +203 -0
- package/src/resources/skills/swiftui/workflows/write-tests.md +235 -0
- package/dist/commands/task.d.ts +0 -9
- package/dist/commands/task.d.ts.map +0 -1
- package/dist/commands/task.js +0 -129
- package/dist/commands/task.js.map +0 -1
- package/dist/commands/task.test.d.ts +0 -2
- package/dist/commands/task.test.d.ts.map +0 -1
- package/dist/commands/task.test.js +0 -169
- package/dist/commands/task.test.js.map +0 -1
- package/dist/e2e/task-e2e.test.d.ts +0 -2
- package/dist/e2e/task-e2e.test.d.ts.map +0 -1
- package/dist/e2e/task-e2e.test.js +0 -173
- package/dist/e2e/task-e2e.test.js.map +0 -1
- package/dist/index.d.ts +0 -3
- package/dist/index.d.ts.map +0 -1
- package/dist/index.js +0 -93
- package/dist/index.js.map +0 -1
- package/dist/slug.d.ts +0 -2
- package/dist/slug.d.ts.map +0 -1
- package/dist/slug.js +0 -12
- package/dist/slug.js.map +0 -1
- package/dist/slug.test.d.ts +0 -2
- package/dist/slug.test.d.ts.map +0 -1
- package/dist/slug.test.js +0 -32
- package/dist/slug.test.js.map +0 -1
@@ -0,0 +1,1277 @@
# Browser Tools V2 Proposal

## Purpose

This document proposes a comprehensive evolution of `agent/extensions/browser-tools/` from a strong set of browser-control primitives into a world-class AI-native browser device for:

- autonomous verification
- end-to-end testing
- Kata slice validation
- debugging and observability
- general internet task execution
- low-token, high-reliability browser interaction

The goal is not just to let the agent click around in a browser. The goal is to give the agent **hands, eyes, memory, verification, and local judgment** in a way that is:

- context-efficient
- fast
- deterministic where possible
- observable when things fail
- composable for larger workflows
- optimized for LLM use, not human scripting ergonomics

---

## Executive Summary

The current browser tools already make several unusually good architectural choices:

- accessibility-first inspection instead of screenshot-first browsing
- deterministic versioned element refs
- compact post-action summaries instead of full DOM spam
- buffered observability surfaces for console, network, and dialogs
- lightweight success verification after actions
- adaptive settling instead of blindly waiting for `networkidle`

Those choices align well with March 2026 best practices in AI browser automation.

However, the current system still operates mostly as a **toolbox of action primitives**. To become a truly elite AI-native browser device, it should evolve in six major directions:

1. **Assertions over prose**: explicit PASS/FAIL verification tools
2. **Composite actions over chatty primitive loops**: batch, form fill, goal-oriented flows
3. **Diffs over full resnapshots**: tell the agent what changed, not just what exists now
4. **Stateful browser modeling**: tabs, frames, forms, dialogs, refs, action history
5. **Failure artifacts and observability**: traces, bundles, structured debug evidence
6. **Intent-aware semantic helpers**: find the best next element/action for a goal

If implemented well, these changes would make browser-tools materially better for both:

- **Kata automatic verification and UAT generation**
- **general-purpose agentic browser use on arbitrary websites and apps**

---

## Current State: What Browser Tools Already Does Well

The existing extension in `agent/extensions/browser-tools/index.ts` already gets several important things right.

### 1. Accessibility-first state representation

The system already prefers:

- `browser_get_accessibility_tree`
- `browser_find`
- `browser_snapshot_refs`

This is the correct strategic direction. Accessibility snapshots are usually far more token-efficient and reliable than:

- full HTML dumps
- screenshot-only operation
- coordinate-based automation

### 2. Deterministic element references

The versioned ref system (`@vN:e1`) is one of the strongest parts of the current design.

It provides:

- compact handles for later actions
- stale-ref detection
- lower repeated selector verbosity
- less guesswork for the model

This aligns closely with current agent-browser and Playwright MCP design patterns.
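
As a rough illustration of why versioned refs make stale-ref detection cheap, the sketch below parses a handle of the form `@vN:eM` and compares its version with the current snapshot version. The names `parseRef`, `resolveRef`, and `currentVersion` are hypothetical and are not taken from the extension's source.

```typescript
// Hypothetical sketch: parse a versioned ref such as "@v3:e7" and reject it
// when the page has been re-snapshotted since the ref was issued.
interface ParsedRef {
  version: number; // snapshot version the ref was issued under
  element: number; // element index within that snapshot
}

function parseRef(ref: string): ParsedRef | null {
  const match = /^@v(\d+):e(\d+)$/.exec(ref);
  if (!match) return null;
  return { version: Number(match[1]), element: Number(match[2]) };
}

type RefResolution =
  | { ok: true; element: number }
  | { ok: false; reason: "malformed" | "stale" };

// currentVersion is an assumed counter that increases on every new snapshot.
function resolveRef(ref: string, currentVersion: number): RefResolution {
  const parsed = parseRef(ref);
  if (parsed === null) return { ok: false, reason: "malformed" };
  if (parsed.version !== currentVersion) return { ok: false, reason: "stale" };
  return { ok: true, element: parsed.element };
}
```

With this shape, a stale ref fails fast with a structured reason instead of silently acting on the wrong element.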
### 3. Compact post-action summaries

The `postActionSummary()` helper is a strong design decision.

It gives the agent:

- title
- URL
- high-level element counts
- important headings
- focus state
- dialog hints

without flooding context.

### 4. Pull-based observability

Buffered logs for:

- console
- network
- dialogs

are exactly the right pattern.

This prevents every tool call from becoming noisy while still preserving rich debugging when needed.
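
The pull-based pattern can be sketched as a bounded buffer that event listeners write to and that the agent drains only on request. This is a minimal illustration; the class name `BoundedLog` and its `capacity` parameter are assumptions, not the extension's actual data structure.

```typescript
// Hypothetical sketch of a bounded, pull-based log buffer.
class BoundedLog<T> {
  private entries: T[] = [];
  private dropped = 0;
  private readonly capacity: number;

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  // Called from event listeners (console, network, dialog). Returns nothing,
  // so recording an event never adds output to a tool call.
  push(entry: T): void {
    this.entries.push(entry);
    if (this.entries.length > this.capacity) {
      this.entries.shift(); // discard the oldest entry
      this.dropped += 1;
    }
  }

  // Called only when the agent explicitly asks for logs; empties the buffer.
  drain(): { entries: T[]; dropped: number } {
    const result = { entries: this.entries, dropped: this.dropped };
    this.entries = [];
    this.dropped = 0;
    return result;
  }
}
```

Reporting the `dropped` count lets the agent know that older events were discarded, which matters when diagnosing a failure after a long session.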
### 5. Built-in self-verification on interactions

The current tools already attempt to verify success through signals like:

- URL changes
- hash changes
- target ARIA state changes
- value changes
- focus changes
- dialog count changes

This is much better than blind action execution.

### 6. Adaptive settling

The mutation counter plus pending-critical-request model is clever and practical.

It is better than:

- fixed sleeps everywhere
- hard dependence on `networkidle`
- no settle logic at all
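
One way to picture the mutation-counter plus pending-request model is a small tracker fed with periodic samples. The sketch below is an assumption about how such logic could look, not the extension's implementation; `SettleTracker`, `SettleSample`, and `requiredQuietSamples` are hypothetical names.

```typescript
// Hypothetical sketch: decide when a page has "settled" from periodic samples
// of a DOM mutation counter and a count of in-flight critical requests.
interface SettleSample {
  mutationCount: number; // running total of observed DOM mutations
  pendingCritical: number; // in-flight requests considered critical
}

class SettleTracker {
  private last: SettleSample | null = null;
  private quietStreak = 0;
  private readonly requiredQuietSamples: number;

  constructor(requiredQuietSamples: number) {
    this.requiredQuietSamples = requiredQuietSamples;
  }

  // Feed one sample. Returns true once enough consecutive samples showed no
  // new mutations and no pending critical requests.
  observe(sample: SettleSample): boolean {
    const quiet =
      this.last !== null &&
      sample.mutationCount === this.last.mutationCount &&
      sample.pendingCritical === 0;
    this.quietStreak = quiet ? this.quietStreak + 1 : 0;
    this.last = sample;
    return this.quietStreak >= this.requiredQuietSamples;
  }
}
```

A caller would poll at a short interval and stop as soon as `observe` returns true or an overall timeout expires, so quiet pages return quickly while busy pages get more time.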
### 7. Sensible visual fallback strategy

The extension already uses screenshots as:

- navigation-time context
- explicit inspection output
- failure debugging evidence

That is good. Screenshots should support semantics, not replace them.

---

## Core Diagnosis

### What the current system is

Right now, browser-tools is primarily a **semantic browser control toolkit**.

That is already useful and better than many browser agent stacks.

### What it should become

It should become an **AI-native browser operating layer** that gives the model:

- reliable control
- compact semantic state
- explicit verification
- efficient action composition
- better local reasoning support
- durable debugging artifacts

### The central gap

The biggest gap is that the extension currently optimizes for **individual actions** more than **successful browser tasks**.

That difference matters.

An elite browser device for AI should optimize for:

- “did the task succeed?”
- “what changed?”
- “what should I do next?”
- “can I verify this automatically?”
- “if it failed, what evidence do I have?”

not just:

- “did the click happen?”
- “here is the current page summary”

---

## Design Principles for V2

The proposed system should follow these principles.

### 1. Semantics first, vision second

Preferred order of understanding:

1. structured semantic state
2. scoped accessibility/tree snapshots
3. ranked semantic refs
4. DOM or JS inspection when needed
5. screenshots only when semantics are insufficient or visual truth matters

### 2. Assertions are first-class

Every serious verification system needs explicit assertions.

Tool outputs should prefer structured verification objects over prose.

### 3. Minimize round trips

The fastest tool call is the one the model does not need to make.

Obvious action sequences should be batchable.

### 4. Model the browser as state, not just a stream of actions

The extension should internally track:

- pages/tabs
- frames
- dialogs
- form structures
- refs
- last known page summaries
- diffs across actions
- recent action outcomes

### 5. Tell the agent what changed

State deltas are often more useful than fresh full state.

### 6. Heavy artifacts belong on disk, not in context

Trace files, HAR data, visual diffs, and debug bundles should generally be persisted and summarized, not inlined.

### 7. Optimize for Kata verification

The browser device should be excellent at producing:

- deterministic pass/fail checks
- concise verification summaries
- debug artifacts on failure
- machine-usable evidence for slice/task summaries and UAT

---
## Proposed Changes

# 1. Add a First-Class Assertion System

## Proposal

Add a `browser_assert` tool and a small assertion language built around common browser verification needs.

## Why it matters

This is the single most important missing capability for Kata and autonomous QA.

Today the agent must infer correctness from prose and heuristics. That is weaker than explicit pass/fail evaluation.

## What it enables

- deterministic verification
- clean Kata artifact generation
- structured failure reporting
- simpler agent reasoning
- less repeated browser inspection

## Suggested assertion kinds

### Page state assertions
- `url_contains`
- `url_equals`
- `title_contains`
- `page_ready`
- `page_has_dialog`
- `page_has_alert`

### Element assertions
- `selector_visible`
- `selector_hidden`
- `ref_visible`
- `ref_enabled`
- `text_visible`
- `text_not_visible`
- `focused_matches`
- `value_equals`
- `value_contains`
- `checked_equals`
- `count_equals`
- `count_at_least`

### Accessibility assertions
- `aria_snapshot_contains`
- `aria_snapshot_matches`
- `role_name_exists`
- `dialog_open`
- `alert_visible`

### Observability assertions
- `no_console_errors`
- `network_request_seen`
- `response_status_seen`
- `no_failed_requests`
- `dialog_seen`

### Visual assertions
- `screenshot_changed`
- `element_visually_changed`
- `layout_breakpoint_ok`

## Suggested output shape

```json
{
  "verified": true,
  "checks": [
    {
      "name": "url_contains",
      "passed": true,
      "actual": "http://localhost:3000/dashboard",
      "expected": "/dashboard"
    },
    {
      "name": "no_console_errors",
      "passed": true,
      "actual": 0
    }
  ],
  "summary": "PASS (2/2 checks)",
  "agent_hint": "Dashboard loaded without browser-side errors"
}
```

## Additional recommendation

Support both:

- single assertions
- multi-check assertions in one call

This keeps verification compact and expressive.
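
A minimal sketch of multi-check evaluation that produces the output shape suggested above might look like the following. It implements only three of the proposed kinds, and `PageState`, `runCheck`, and `browserAssert` are hypothetical names; a real tool would read state from the live page rather than from a plain object.

```typescript
// Hypothetical sketch of multi-check assertion evaluation.
// PageState is an assumed, simplified snapshot of the page.
interface PageState {
  url: string;
  visibleText: string;
  consoleErrors: number;
}

type Assertion =
  | { kind: "url_contains"; expected: string }
  | { kind: "text_visible"; expected: string }
  | { kind: "no_console_errors" };

interface CheckResult {
  name: string;
  passed: boolean;
  actual: string | number;
  expected?: string;
}

function runCheck(state: PageState, a: Assertion): CheckResult {
  switch (a.kind) {
    case "url_contains":
      return { name: a.kind, passed: state.url.includes(a.expected), actual: state.url, expected: a.expected };
    case "text_visible": {
      const found = state.visibleText.includes(a.expected);
      return { name: a.kind, passed: found, actual: found ? "present" : "absent", expected: a.expected };
    }
    case "no_console_errors":
      return { name: a.kind, passed: state.consoleErrors === 0, actual: state.consoleErrors };
  }
}

// Evaluates every assertion and summarizes the outcome as PASS or FAIL.
function browserAssert(state: PageState, assertions: Assertion[]) {
  const checks = assertions.map((a) => runCheck(state, a));
  const passedCount = checks.filter((c) => c.passed).length;
  const verified = passedCount === checks.length;
  return {
    verified,
    checks,
    summary: `${verified ? "PASS" : "FAIL"} (${passedCount}/${checks.length} checks)`,
  };
}
```

Because every check reports `actual` alongside `expected`, a failing result already carries the evidence the agent needs, without another inspection call.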

---

# 2. Add `browser_batch` for Composite Action Execution

## Proposal

Add a batch or transaction-style tool that executes multiple browser steps in a single tool call.

## Why it matters

This is one of the highest-ROI speed and token-efficiency improvements.

Many browser tasks currently require a chatty loop:

- find
- click
- type
- wait
- inspect
- verify

A batch tool collapses obvious sequential actions into one round trip.

## What it enables

- fewer tool invocations
- lower latency
- lower schema overhead
- less repetitive page-summary generation
- more deterministic execution of known action sequences

## Example

```json
{
  "steps": [
    { "action": "click_ref", "ref": "@v3:e2" },
    { "action": "fill_ref", "ref": "@v3:e5", "text": "lex@example.com" },
    { "action": "fill_ref", "ref": "@v3:e6", "text": "password123" },
    { "action": "click_ref", "ref": "@v3:e7" },
    { "action": "wait_for", "condition": "url_contains", "value": "/dashboard" },
    { "action": "assert", "kind": "text_visible", "text": "Dashboard" }
  ],
  "stopOnFailure": true,
  "finalSummaryOnly": true
}
```

## Recommended options

- `stopOnFailure`
- `captureIntermediateState`
- `includeIntermediateDiagnostics`
- `finalSummaryOnly`
- `returnStepResults`

## Design note

This should not replace primitive tools. It should sit above them.
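
To make the "sit above primitives" idea concrete, here is a hedged sketch of a batch runner that dispatches each step to an existing primitive handler and honors `stopOnFailure`. The handler map, `runBatch`, and the result fields are assumptions for illustration. Handlers are synchronous here only to keep the sketch short; real browser actions would be asynchronous.

```typescript
// Hypothetical sketch of a batch runner layered over primitive actions.
type Step = { action: string; [param: string]: unknown };

interface StepResult {
  action: string;
  ok: boolean;
  detail?: string;
}

type Handler = (step: Step) => StepResult;

function runBatch(
  steps: Step[],
  handlers: Record<string, Handler | undefined>,
  options: { stopOnFailure: boolean },
) {
  const results: StepResult[] = [];
  for (const step of steps) {
    const handler = handlers[step.action];
    // Unknown actions are reported as failed steps rather than thrown errors.
    const result: StepResult = handler
      ? handler(step)
      : { action: step.action, ok: false, detail: "unknown action" };
    results.push(result);
    if (!result.ok && options.stopOnFailure) break;
  }
  const failedAt = results.findIndex((r) => !r.ok);
  return {
    ok: failedAt === -1,
    completed: results.length, // number of steps attempted
    total: steps.length,
    failedAt: failedAt === -1 ? null : failedAt,
    results,
  };
}
```

Returning `failedAt` together with the per-step results gives the agent one compact answer to both "did the sequence succeed?" and "where did it stop?".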

---

# 3. Add `browser_diff` to Report What Changed

## Proposal

Add a diff tool that compares two browser states or the pre/post state around an action.

## Why it matters

The model frequently needs to answer:

- did the click do anything?
- what changed after submit?
- what new UI appeared?
- what should I inspect next?

A change summary is usually more useful than a fresh full snapshot.

## What it enables

- faster reasoning after actions
- better success detection
- lower token usage
- easier failure diagnosis
- improved “next action” selection

## Suggested diff dimensions

- URL change
- title change
- focus change
- dialog open/close
- heading additions/removals
- new alerts/errors/toasts
- interactive element count changes
- text changes in scoped region
- ARIA subtree changes
- validation error changes
- scroll position changes
- form state changes

## Example output

```json
{
  "changed": true,
  "changes": [
    { "type": "url", "before": "/login", "after": "/dashboard" },
    { "type": "dialog_closed", "value": "Sign in" },
    { "type": "new_heading", "value": "Dashboard" }
  ],
  "summary": "Navigation completed and login modal closed",
  "agent_hint": "Authentication likely succeeded"
}
```

## Implementation note

A lightweight internal state snapshot should be stored after major actions so diffs are cheap.
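
The following sketch shows how a diff over such lightweight snapshots could be computed for a few of the suggested dimensions (URL, title, headings, dialogs). The `Snapshot` shape and the function names are hypothetical, and the change types mirror the example output rather than any confirmed schema.

```typescript
// Hypothetical sketch: diff two lightweight state snapshots.
interface Snapshot {
  url: string;
  title: string;
  headings: string[];
  openDialogs: string[];
}

type Change =
  | { type: "url" | "title"; before: string; after: string }
  | { type: "new_heading" | "removed_heading" | "dialog_opened" | "dialog_closed"; value: string };

// Items present only in `after` are added; items present only in `before` are removed.
function listDelta(before: string[], after: string[]) {
  return {
    added: after.filter((x) => !before.includes(x)),
    removed: before.filter((x) => !after.includes(x)),
  };
}

function diffSnapshots(before: Snapshot, after: Snapshot) {
  const changes: Change[] = [];
  if (before.url !== after.url) changes.push({ type: "url", before: before.url, after: after.url });
  if (before.title !== after.title) changes.push({ type: "title", before: before.title, after: after.title });

  const headings = listDelta(before.headings, after.headings);
  for (const h of headings.added) changes.push({ type: "new_heading", value: h });
  for (const h of headings.removed) changes.push({ type: "removed_heading", value: h });

  const dialogs = listDelta(before.openDialogs, after.openDialogs);
  for (const d of dialogs.added) changes.push({ type: "dialog_opened", value: d });
  for (const d of dialogs.removed) changes.push({ type: "dialog_closed", value: d });

  return { changed: changes.length > 0, changes };
}
```

Since both inputs are small plain objects, the comparison needs no extra browser round trip, which is the point of storing a snapshot after each major action.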

---

# 4. Add Form Intelligence

## Proposal

Add form-specific analysis and fill tools.

### New tools
- `browser_analyze_form`
- `browser_fill_form`

## Why it matters

A large percentage of browser tasks are fundamentally form tasks:

- sign in
- sign up
- checkout
- onboarding
- search
- settings
- admin actions
- content publishing

Forms are one of the highest-leverage abstractions in browser automation.

## What it enables

- fewer calls for common flows
- stronger semantic mapping between labels and inputs
- automatic handling of required fields and validation messages
- better submit targeting
- more robust Kata verification of user flows

## `browser_analyze_form` should return

- form purpose inference
- fields and labels
- field types
- required status
- current values
- current validation errors
- submit controls
- grouped sections
- likely primary action

## `browser_fill_form` should support

```json
{
  "selector": "form",
  "values": {
    "email": "lex@example.com",
    "password": "hunter2"
  },
  "submit": true,
  "strict": false
}
```

## Important design behavior

It should map values by:

- label text
- accessible name
- field name
- placeholder when needed
- form-local semantic inference

## Recommended output

- matched fields
- unmatched requested values
- fields skipped
- validation state after fill
- submit result summary
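
The mapping order described above (label, then accessible name, then field name, then placeholder) can be sketched as a priority-ordered lookup. This is a simplified illustration under stated assumptions: it uses exact case-insensitive matching, omits semantic inference, and `FormField` and `matchFields` are hypothetical names.

```typescript
// Hypothetical sketch: match requested values to form fields by trying
// identifiers in priority order (label, accessible name, name, placeholder).
interface FormField {
  ref: string;
  label?: string;
  accessibleName?: string;
  name?: string;
  placeholder?: string;
}

const normalize = (s: string): string => s.trim().toLowerCase();

function matchFields(fields: FormField[], values: Record<string, string>) {
  const matched: { key: string; ref: string; by: string }[] = [];
  const unmatched: string[] = [];
  const used = new Set<string>(); // each field can be claimed only once
  const priorities = ["label", "accessibleName", "name", "placeholder"] as const;

  for (const key of Object.keys(values)) {
    const wanted = normalize(key);
    let found = false;
    for (const by of priorities) {
      const field = fields.find((f) => {
        const candidate = f[by];
        return !used.has(f.ref) && candidate !== undefined && normalize(candidate) === wanted;
      });
      if (field) {
        used.add(field.ref);
        matched.push({ key, ref: field.ref, by });
        found = true;
        break;
      }
    }
    if (!found) unmatched.push(key);
  }
  return { matched, unmatched };
}
```

Returning `unmatched` explicitly corresponds to the "unmatched requested values" item in the recommended output, so the agent can decide whether to retry with a different key or inspect the form.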
|
|
541
|
+
|
|
542
|
+
---
|
|
543
|
+
|
|
544
|
+
# 5. Add Intent-Ranked Element Retrieval
|
|
545
|
+
|
|
546
|
+
## Proposal
|
|
547
|
+
|
|
548
|
+
Add a smarter semantic finder, such as `browser_find_best`.
|
|
549
|
+
|
|
550
|
+
## Why it matters
|
|
551
|
+
|
|
552
|
+
The current `browser_find` is useful but still fairly literal. Agents often need a ranked answer to questions like:
|
|
553
|
+
|
|
554
|
+
- what is the primary CTA?
|
|
555
|
+
- which button submits this form?
|
|
556
|
+
- which textbox is the email field?
|
|
557
|
+
- what element most likely advances login?
|
|
558
|
+
- which visible error is most relevant right now?
|
|
559
|
+
|
|
560
|
+
## What it enables
|
|
561
|
+
|
|
562
|
+
- better action selection
|
|
563
|
+
- fewer failed clicks
|
|
564
|
+
- fewer tokens spent interpreting noisy candidate lists
|
|
565
|
+
- more autonomous local decisions
|
|
566
|
+
|
|
567
|
+
## Example
|
|
568
|
+
|
|
569
|
+
```json
|
|
570
|
+
{
|
|
571
|
+
"intent": "submit login form",
|
|
572
|
+
"candidates": [
|
|
573
|
+
{
|
|
574
|
+
"ref": "@v5:e7",
|
|
575
|
+
"score": 0.93,
|
|
576
|
+
"reason": "button in same form as email and password fields named Sign in"
|
|
577
|
+
},
|
|
578
|
+
{
|
|
579
|
+
"ref": "@v5:e9",
|
|
580
|
+
"score": 0.41,
|
|
581
|
+
"reason": "secondary link outside form"
|
|
582
|
+
}
|
|
583
|
+
]
|
|
584
|
+
}
|
|
585
|
+
```
|
|
586
|
+
|
|
587
|
+
## Suggested intents
|
|
588
|
+
|
|
589
|
+
- submit form
|
|
590
|
+
- primary CTA
|
|
591
|
+
- close dialog
|
|
592
|
+
- search field
|
|
593
|
+
- next step
|
|
594
|
+
- destructive action
|
|
595
|
+
- auth action
|
|
596
|
+
- error surface
|
|
597
|
+
- back navigation
|
|
598
|
+
- menu trigger
|
|
599
|
+
|
|
600
|
+
## Design recommendation
|
|
601
|
+
|
|
602
|
+
This should be deterministic heuristic ranking first, not a hidden LLM.
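
A minimal sketch of what deterministic heuristic ranking could look like for a single intent ("submit form"). The `Candidate` fields, the weights, and the keyword list are all assumptions chosen for illustration; the proposal does not specify them.

```typescript
// Hypothetical candidate description; field names are illustrative only.
interface Candidate {
  ref: string;
  role: string;
  name: string;
  inForm: boolean;
  visible: boolean;
}

interface Ranked {
  ref: string;
  score: number;
  reason: string;
}

// Assumed keyword list for submit-like controls.
const SUBMIT_WORDS = /sign in|log in|submit|continue/i;

// Fixed, inspectable weights (in points out of 100) for the "submit form" intent.
function scoreSubmit(c: Candidate): Ranked {
  let points = 0;
  const reasons: string[] = [];
  if (c.role === "button") { points += 40; reasons.push("button"); }
  if (c.inForm) { points += 30; reasons.push("inside form"); }
  if (SUBMIT_WORDS.test(c.name)) { points += 20; reasons.push("named " + c.name); }
  if (c.visible) { points += 10; reasons.push("visible"); }
  return {
    ref: c.ref,
    score: points / 100,
    reason: reasons.join(", ") || "no matching signals",
  };
}

// Highest score first; ties are broken by ref so the order never varies between runs.
function rankSubmit(candidates: Candidate[]): Ranked[] {
  return candidates
    .map(scoreSubmit)
    .sort((a, b) => b.score - a.score || (a.ref < b.ref ? -1 : a.ref > b.ref ? 1 : 0));
}
```

Because every score is a sum of named signals, the `reason` string can be generated from the same rules that produced the score, which keeps the ranking auditable.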
|
|
603
|
+
|
|
604
|
+
---
|
|
605
|
+
|
|
606
|
+
# 6. Upgrade the Ref System
|
|
607
|
+
|
|
608
|
+
## Proposal
|
|
609
|
+
|
|
610
|
+
Keep versioned refs, but evolve them into a richer semantic reference system.
|
|
611
|
+
|
|
612
|
+
## Why it matters
|
|
613
|
+
|
|
614
|
+
Refs are the backbone of efficient browser interaction. The current system is good; the next step is to make refs more resilient, more semantic, and more useful across changing DOMs.
|
|
615
|
+
|
|
616
|
+
## What it enables
|
|
617
|
+
|
|
618
|
+
- lower selector dependence
|
|
619
|
+
- better recovery from DOM churn
|
|
620
|
+
- more compact instructions
|
|
621
|
+
- clearer reasoning for the agent
|
|
622
|
+
|
|
623
|
+
## Proposed upgrades
|
|
624
|
+
|
|
625
|
+
### A. Snapshot modes
|
|
626
|
+
Allow specialized snapshot modes:
|
|
627
|
+
|
|
628
|
+
- `interactive`
|
|
629
|
+
- `form`
|
|
630
|
+
- `dialog`
|
|
631
|
+
- `navigation`
|
|
632
|
+
- `errors`
|
|
633
|
+
- `headings`
|
|
634
|
+
- `visible_only`
|
|
635
|
+
|
|
636
|
+
This reduces token waste and improves relevance.
|
|
637
|
+
|
|
638
|
+
### B. Better internal fingerprints
|
|
639
|
+
Track more stable descriptors:
|
|
640
|
+
|
|
641
|
+
- role
|
|
642
|
+
- accessible name
|
|
643
|
+
- type
|
|
644
|
+
- href
|
|
645
|
+
- form ownership
|
|
646
|
+
- ancestry signature
|
|
647
|
+
- relative region
|
|
648
|
+
- label association
|
|
649
|
+
- nearby headings
|
|
650
|
+
|
|
651
|
+
This helps ref remapping across light DOM changes.
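
As a rough sketch of how such descriptors might drive remapping, the example below compares a stored fingerprint against fresh candidates and remaps only when exactly one candidate clears a threshold. The `Fingerprint` shape, the equal weighting, and the 0.8 threshold are assumptions for illustration.

```typescript
// Hypothetical subset of the descriptors listed above.
interface Fingerprint {
  role: string;
  accessibleName: string;
  type?: string;
  href?: string;
  formId?: string;
  ancestry: string; // e.g. "main>form>div"
}

const FINGERPRINT_KEYS: (keyof Fingerprint)[] = [
  "role", "accessibleName", "type", "href", "formId", "ancestry",
];

// Fraction of descriptors that are identical (a missing value matches a missing value).
function similarity(a: Fingerprint, b: Fingerprint): number {
  let same = 0;
  for (const key of FINGERPRINT_KEYS) {
    if (a[key] === b[key]) same++;
  }
  return same / FINGERPRINT_KEYS.length;
}

// Return the ref of the single best candidate, or undefined when the best
// score is below the threshold or shared by several candidates.
function remapRef(
  old: Fingerprint,
  current: { ref: string; fp: Fingerprint }[],
  threshold = 0.8,
): string | undefined {
  let bestRef: string | undefined = undefined;
  let bestScore = -1;
  let tie = false;
  for (const candidate of current) {
    const score = similarity(old, candidate.fp);
    if (score > bestScore) {
      bestRef = candidate.ref;
      bestScore = score;
      tie = false;
    } else if (score === bestScore) {
      tie = true;
    }
  }
  return bestScore >= threshold && !tie ? bestRef : undefined;
}
```

Refusing to remap on a tie trades a little convenience for safety: an ambiguous match surfaces as a stale ref instead of a click on the wrong element.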
|
|
652
|
+
|
|
653
|
+
### C. Semantic aliases
|
|
654
|
+
Potentially expose alias-like labels such as:
|
|
655
|
+
|
|
656
|
+
- primary submit
|
|
657
|
+
- close dialog
|
|
658
|
+
- current tab
|
|
659
|
+
- email field
|
|
660
|
+
- password field
|
|
661
|
+
|
|
662
|
+
Even if these remain derived rather than canonical, they can improve action clarity.
|
|
663
|
+
|
|
664
|
+
### D. Scoped ref groups
|
|
665
|
+
Allow refs generated per region:
|
|
666
|
+
|
|
667
|
+
- within dialog
|
|
668
|
+
- within main
|
|
669
|
+
- within form
|
|
670
|
+
- within sidebar
|
|
671
|
+
|
|
672
|
+
This helps reduce ambiguity.
|
|
673
|
+
|
|
674
|
+
---
|
|
675
|
+
|
|
676
|
+
# 7. Add Browser Session Modeling: Tabs, Pages, Frames
|
|
677
|
+
|
|
678
|
+
## Proposal
|
|
679
|
+
|
|
680
|
+
Promote the internal browser model from “single active page” to a real page registry.
|
|
681
|
+
|
|
682
|
+
### New tools
|
|
683
|
+
- `browser_list_pages`
|
|
684
|
+
- `browser_switch_page`
|
|
685
|
+
- `browser_close_page`
|
|
686
|
+
- `browser_list_frames`
|
|
687
|
+
- `browser_select_frame`
|
|
688
|
+
|
|
689
|
+
## Why it matters
|
|
690
|
+
|
|
691
|
+
Real browser flows often involve:
|
|
692
|
+
|
|
693
|
+
- popups
|
|
694
|
+
- auth redirects
|
|
695
|
+
- payment tabs
|
|
696
|
+
- docs tabs
|
|
697
|
+
- embedded auth iframes
|
|
698
|
+
- admin consoles with frames
|
|
699
|
+
|
|
700
|
+
A single global `page` pointer does not scale well.
|
|
701
|
+
|
|
702
|
+
## What it enables
|
|
703
|
+
|
|
704
|
+
- more reliable multi-tab flows
|
|
705
|
+
- less hidden state confusion
|
|
706
|
+
- better popup handling
|
|
707
|
+
- frame-aware automation
|
|
708
|
+
- clearer debugging when navigation opens a new surface
|
|
709
|
+
|
|
710
|
+
## Recommended session model
|
|
711
|
+
|
|
712
|
+
Track:
|
|
713
|
+
|
|
714
|
+
- page id
|
|
715
|
+
- opener relationship
|
|
716
|
+
- title
|
|
717
|
+
- URL
|
|
718
|
+
- last active time
|
|
719
|
+
- frame inventory
|
|
720
|
+
- whether page was auto-opened or explicitly targeted
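
The tracked fields above could be held in a small registry along these lines. This is a sketch under assumptions: `PageRecord`, `PageRegistry`, and the fall-back-to-opener rule on close are hypothetical and not taken from the extension.

```typescript
// One record per open page, mirroring the fields listed above.
interface PageRecord {
  id: string;
  openerId?: string;
  title: string;
  url: string;
  lastActiveAt: number;
  frames: string[];
  autoOpened: boolean;
}

class PageRegistry {
  private pages = new Map<string, PageRecord>();
  private activeId: string | undefined = undefined;

  // Auto-opened pages become active immediately; the flag stays on the record
  // so the switch remains visible to whoever inspects the registry.
  register(page: PageRecord): void {
    this.pages.set(page.id, page);
    if (page.autoOpened || this.activeId === undefined) this.activeId = page.id;
  }

  // Explicitly target a page; unknown ids are an error, not a silent no-op.
  switchTo(id: string, now: number): PageRecord {
    const page = this.pages.get(id);
    if (!page) throw new Error("unknown page id: " + id);
    page.lastActiveAt = now;
    this.activeId = id;
    return page;
  }

  // Closing the active page falls back to its opener when that is still open.
  close(id: string): void {
    const page = this.pages.get(id);
    if (!page) return;
    this.pages.delete(id);
    if (this.activeId === id) {
      const opener = page.openerId;
      this.activeId = opener !== undefined && this.pages.has(opener) ? opener : undefined;
    }
  }

  // Inspectable view of every page, with the active one marked.
  list(): Array<PageRecord & { active: boolean }> {
    return Array.from(this.pages.values()).map((p) => ({ ...p, active: p.id === this.activeId }));
  }
}
```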
|
|
721
|
+
|
|
722
|
+
## Design recommendation
|
|
723
|
+
|
|
724
|
+
Auto-switching to a newly opened page is still useful, but should be visible and inspectable.
|
|
725
|
+
|
|
726
|
+
---
|
|
727
|
+
|
|
728
|
+
# 8. Add Tracing and Failure Artifacts
|
|
729
|
+
|
|
730
|
+
## Proposal
|
|
731
|
+
|
|
732
|
+
Add explicit debug artifact tools.
|
|
733
|
+
|
|
734
|
+
### New tools
|
|
735
|
+
- `browser_trace_start`
|
|
736
|
+
- `browser_trace_stop`
|
|
737
|
+
- `browser_export_har`
|
|
738
|
+
- `browser_debug_bundle`
|
|
739
|
+
- `browser_timeline`
|
|
740
|
+
- `browser_session_summary`
|
|
741
|
+
|
|
742
|
+
## Why it matters
|
|
743
|
+
|
|
744
|
+
For Kata and for hard UI debugging, you need failure evidence that survives the current context window.
|
|
745
|
+
|
|
746
|
+
## What it enables
|
|
747
|
+
|
|
748
|
+
- durable debugging artifacts
|
|
749
|
+
- post-failure inspection without replaying everything
|
|
750
|
+
- easier handoff across sessions or agents
|
|
751
|
+
- structured evidence for summaries and UAT docs
|
|
752
|
+
|
|
753
|
+
## `browser_debug_bundle` should ideally include
|
|
754
|
+
|
|
755
|
+
- current URL/title
|
|
756
|
+
- viewport
|
|
757
|
+
- recent actions
|
|
758
|
+
- compact recent warnings
|
|
759
|
+
- recent console errors
|
|
760
|
+
- recent failed/important requests
|
|
761
|
+
- active dialogs
|
|
762
|
+
- screenshot path or inline thumbnail
|
|
763
|
+
- scoped AX snapshot near likely failure area
|
|
764
|
+
- trace path if enabled
|
|
765
|
+
- concise failure hypothesis
|
|
766
|
+
|
|
767
|
+
## Artifact policy
|
|
768
|
+
|
|
769
|
+
Heavy artifacts should be written to disk and summarized in tool output.
|
|
770
|
+
|
|
771
|
+
Example return:
|
|
772
|
+
|
|
773
|
+
```json
|
|
774
|
+
{
|
|
775
|
+
"bundlePath": ".artifacts/browser/failure-2026-03-09T15-22-10Z/",
|
|
776
|
+
"files": ["trace.zip", "screenshot.jpg", "summary.json", "ax.md"],
|
|
777
|
+
"summary": "Submit button click did not change URL or form state; network returned 422"
|
|
778
|
+
}
|
|
779
|
+
```
|
|
780
|
+
|
|
781
|
+
---
|
|
782
|
+
|
|
783
|
+
# 9. Add Goal-Oriented Composite Tools
|
|
784
|
+
|
|
785
|
+
## Proposal
|
|
786
|
+
|
|
787
|
+
Add tools that operate one level above raw browser actions.
|
|
788
|
+
|
|
789
|
+
### Candidate tools
|
|
790
|
+
- `browser_act`
|
|
791
|
+
- `browser_run_task`
|
|
792
|
+
- `browser_recommend_next`
|
|
793
|
+
- `browser_verify_flow`
|
|
794
|
+
|
|
795
|
+
## Why it matters
|
|
796
|
+
|
|
797
|
+
The model should not have to re-solve every local browser decision over multiple turns when the browser device can cheaply reason about obvious next steps.
|
|
798
|
+
|
|
799
|
+
## What it enables
|
|
800
|
+
|
|
801
|
+
- reduced local decision overhead
|
|
802
|
+
- more agent autonomy
|
|
803
|
+
- bounded browser-side loops for repetitive UI micro-tasks
|
|
804
|
+
- cleaner higher-level orchestration
|
|
805
|
+
|
|
806
|
+
## Suggested roles
|
|
807
|
+
|
|
808
|
+
### `browser_recommend_next`
|
|
809
|
+
Given a goal and the current page state, return the three best next actions, each with a confidence score and a reason.
|
|
810
|
+
|
|
811
|
+
### `browser_act`
|
|
812
|
+
Perform one higher-level semantic action like:
|
|
813
|
+
|
|
814
|
+
- open login dialog
|
|
815
|
+
- submit current form
|
|
816
|
+
- close active modal
|
|
817
|
+
- click primary CTA
|
|
818
|
+
- expand navigation menu
|
|
819
|
+
|
|
820
|
+
### `browser_verify_flow`
|
|
821
|
+
Run a bounded set of assertions for a named flow such as:
|
|
822
|
+
|
|
823
|
+
- logged in
|
|
824
|
+
- signed out
|
|
825
|
+
- item created
|
|
826
|
+
- toast appeared
|
|
827
|
+
- navigation completed
|
|
828
|
+
|
|
829
|
+
### `browser_run_task`
|
|
830
|
+
Frontier tool: perform a bounded internal action loop toward a clear goal.
|
|
831
|
+
|
|
832
|
+
## Safety recommendations
|
|
833
|
+
|
|
834
|
+
These tools must be bounded by:
|
|
835
|
+
|
|
836
|
+
- max step count
|
|
837
|
+
- allowed action categories
|
|
838
|
+
- destructive action restrictions
|
|
839
|
+
- explicit halt conditions
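
One way to picture these bounds is a loop that checks every limit before each step and reports why it stopped. The sketch below is an assumption-laden illustration: `runBounded`, the `ActionCategory` values, and the stop reasons are hypothetical names, and the loop records steps instead of driving a real browser.

```typescript
type ActionCategory = "navigate" | "click" | "type" | "destructive";

interface Step {
  category: ActionCategory;
  description: string;
}

interface Bounds {
  maxSteps: number;                      // max step count
  allowed: ActionCategory[];             // allowed action categories
  halt: (stepsDone: number) => boolean;  // explicit halt condition
}

type StopReason = "halt_condition" | "max_steps" | "disallowed_action" | "no_more_steps";

// Ask `next` for one step at a time and stop as soon as any bound is hit.
// Leaving "destructive" out of `allowed` is how destructive actions are restricted.
function runBounded(
  next: (stepsDone: number) => Step | undefined,
  bounds: Bounds,
): { executed: Step[]; stopped: StopReason } {
  const executed: Step[] = [];
  while (true) {
    if (bounds.halt(executed.length)) return { executed, stopped: "halt_condition" };
    if (executed.length >= bounds.maxSteps) return { executed, stopped: "max_steps" };
    const step = next(executed.length);
    if (!step) return { executed, stopped: "no_more_steps" };
    if (bounds.allowed.indexOf(step.category) === -1) {
      return { executed, stopped: "disallowed_action" };
    }
    executed.push(step); // a real tool would perform the browser action here
  }
}
```

Returning the stop reason alongside the executed steps lets the calling agent distinguish "finished" from "refused" without parsing prose.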
|
|
840
|
+
|
|
841
|
+
---
|
|
842
|
+
|
|
843
|
+
# 10. Add Better Waits and Reactive Predicates
|
|
844
|
+
|
|
845
|
+
## Proposal
|
|
846
|
+
|
|
847
|
+
Replace or augment `browser_wait_for` with a richer `browser_wait_until`.
|
|
848
|
+
|
|
849
|
+
## Why it matters
|
|
850
|
+
|
|
851
|
+
Generic waiting is weaker than intent-aware waiting. The best wait is one that waits for the expected outcome itself.
|
|
852
|
+
|
|
853
|
+
## What it enables
|
|
854
|
+
|
|
855
|
+
- higher reliability
|
|
856
|
+
- fewer arbitrary delays
|
|
857
|
+
- better async app support
|
|
858
|
+
- less flakiness in SPA and real-time UIs
|
|
859
|
+
|
|
860
|
+
## Suggested predicates
|
|
861
|
+
|
|
862
|
+
- text appears/disappears
|
|
863
|
+
- ref state changes
|
|
864
|
+
- element count changes
|
|
865
|
+
- request matching pattern completes
|
|
866
|
+
- response with status seen
|
|
867
|
+
- toast appears
|
|
868
|
+
- dialog opens/closes
|
|
869
|
+
- loading spinner disappears
|
|
870
|
+
- route transition completes
|
|
871
|
+
- region stops changing
|
|
872
|
+
- focus reaches expected element
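
A few of these predicates can be expressed as plain data and evaluated against a sampled page state. The sketch below is illustrative only: the `PageState` fields, the predicate names, and the `waitUntil` signature are assumptions, and the state sampler is injected so the logic does not depend on any particular browser API.

```typescript
// Hypothetical compact view of the page, sampled on each poll.
interface PageState {
  text: string;
  dialogOpen: boolean;
  spinnerVisible: boolean;
  focusedRef?: string;
}

// Predicates as data, so they can be logged and replayed.
type Predicate =
  | { kind: "text_appears"; text: string }
  | { kind: "text_disappears"; text: string }
  | { kind: "dialog_open"; open: boolean }
  | { kind: "spinner_gone" }
  | { kind: "focus_on"; ref: string };

// Pure check of one predicate against one sampled state.
function holds(p: Predicate, s: PageState): boolean {
  switch (p.kind) {
    case "text_appears": return s.text.indexOf(p.text) !== -1;
    case "text_disappears": return s.text.indexOf(p.text) === -1;
    case "dialog_open": return s.dialogOpen === p.open;
    case "spinner_gone": return !s.spinnerVisible;
    case "focus_on": return s.focusedRef === p.ref;
  }
}

// Poll until the predicate holds or the timeout elapses; resolves to whether it held.
async function waitUntil(
  p: Predicate,
  sample: () => PageState,
  timeoutMs = 5000,
  intervalMs = 100,
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if (holds(p, sample())) return true;
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
  return holds(p, sample());
}
```

Keeping `holds` pure means the same predicate definitions could be reused by assertions and diffs, not only by waits.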
|
|
873
|
+
|
|
874
|
+
## Design note
|
|
875
|
+
|
|
876
|
+
This should integrate with the same state/diff infrastructure proposed above.
|
|
877
|
+
|
|
878
|
+
---
|
|
879
|
+
|
|
880
|
+
# 11. Make Screenshots More Selective and More Useful
|
|
881
|
+
|
|
882
|
+
## Proposal
|
|
883
|
+
|
|
884
|
+
Keep screenshots, but use them more surgically.
|
|
885
|
+
|
|
886
|
+
### New tools or behaviors
|
|
887
|
+
- `browser_screenshot_diff`
|
|
888
|
+
- `browser_capture_region`
|
|
889
|
+
- `browser_inspect_visual`
|
|
890
|
+
|
|
891
|
+
## Why it matters
|
|
892
|
+
|
|
893
|
+
Screenshots are valuable when:
|
|
894
|
+
|
|
895
|
+
- the UI is canvas-based
|
|
896
|
+
- layout quality matters
|
|
897
|
+
- icon-only controls are ambiguous
|
|
898
|
+
- a visual regression is suspected
|
|
899
|
+
- CSS behavior matters
|
|
900
|
+
- semantic state is insufficient
|
|
901
|
+
|
|
902
|
+
But screenshots are often too expensive and too noisy to be the default state transport.
|
|
903
|
+
|
|
904
|
+
## What it enables
|
|
905
|
+
|
|
906
|
+
- better visual debugging when actually needed
|
|
907
|
+
- less token waste than full-page screenshots
|
|
908
|
+
- pairing visual evidence with semantic evidence
|
|
909
|
+
|
|
910
|
+
## Recommended direction
|
|
911
|
+
|
|
912
|
+
- make screenshots scoped and purposeful
|
|
913
|
+
- prefer element/region crops over full-page captures
|
|
914
|
+
- pair screenshot outputs with semantic context and diffs
|
|
915
|
+
- support perceptual diff summaries instead of raw image-only comparisons
|
|
916
|
+
|
|
917
|
+
---
|
|
918
|
+
|
|
919
|
+
# 12. Add Structured Network and Console Assertions
|
|
920
|
+
|
|
921
|
+
## Proposal
|
|
922
|
+
|
|
923
|
+
Evolve buffered observability from passive retrieval into active verification and querying.
|
|
924
|
+
|
|
925
|
+
## Why it matters
|
|
926
|
+
|
|
927
|
+
Modern web apps often fail in ways only visible through:
|
|
928
|
+
|
|
929
|
+
- fetch/XHR failures
|
|
930
|
+
- console errors
|
|
931
|
+
- CSP/CORS issues
|
|
932
|
+
- React hydration errors
|
|
933
|
+
- auth-related 401/403s
|
|
934
|
+
|
|
935
|
+
These should be easy for the agent to test explicitly.
|
|
936
|
+
|
|
937
|
+
## What it enables
|
|
938
|
+
|
|
939
|
+
- stronger root-cause detection
|
|
940
|
+
- better end-to-end verification
|
|
941
|
+
- fewer false positives where UI looked okay but requests failed
|
|
942
|
+
|
|
943
|
+
## Suggested additions
|
|
944
|
+
|
|
945
|
+
- filter by request URL pattern
|
|
946
|
+
- filter by method/resource type/status range
|
|
947
|
+
- query logs since action id or timestamp
|
|
948
|
+
- assert request happened
|
|
949
|
+
- assert response status seen
|
|
950
|
+
- assert no console errors of severity >= error
|
|
951
|
+
- assert no failed XHR/fetch during flow
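
The additions above amount to filters and checks over buffered logs. A minimal sketch follows, assuming hypothetical `RequestLog` and `ConsoleLog` entry shapes that carry the id of the action during which they were recorded; none of these names come from the extension.

```typescript
interface RequestLog {
  actionId: number;
  method: string;
  url: string;
  status: number; // 0 is used here to mean "no response received"
}

interface ConsoleLog {
  actionId: number;
  level: "log" | "warn" | "error";
  message: string;
}

interface CheckResult {
  passed: boolean;
  detail: string;
}

interface RequestQuery {
  urlPattern: RegExp; // pass a pattern without the `g` flag
  method?: string;
  status?: number;
  sinceActionId?: number;
}

// Assert that at least one request matching the query happened.
function assertRequest(logs: RequestLog[], q: RequestQuery): CheckResult {
  const since = q.sinceActionId ?? 0;
  const hits = logs.filter((r) =>
    r.actionId >= since &&
    q.urlPattern.test(r.url) &&
    (q.method === undefined || r.method === q.method) &&
    (q.status === undefined || r.status === q.status));
  return { passed: hits.length > 0, detail: hits.length + " matching request(s)" };
}

// Assert that no request at or after the given action failed (status 0 or >= 400).
function assertNoFailedRequests(logs: RequestLog[], sinceActionId = 0): CheckResult {
  const failed = logs.filter((r) => r.actionId >= sinceActionId && (r.status === 0 || r.status >= 400));
  return { passed: failed.length === 0, detail: failed.map((r) => r.method + " " + r.url + " -> " + r.status).join("; ") };
}

// Assert that no console entry at or after the given action has level "error".
function assertNoConsoleErrors(logs: ConsoleLog[], sinceActionId = 0): CheckResult {
  const errors = logs.filter((l) => l.actionId >= sinceActionId && l.level === "error");
  return { passed: errors.length === 0, detail: errors.map((l) => l.message).join("; ") };
}
```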
|
|
952
|
+
|
|
953
|
+
---
|
|
954
|
+
|
|
955
|
+
# 13. Add an Action Timeline and Action IDs
|
|
956
|
+
|
|
957
|
+
## Proposal
|
|
958
|
+
|
|
959
|
+
Assign every browser action an internal action id and keep a lightweight action timeline.
|
|
960
|
+
|
|
961
|
+
## Why it matters
|
|
962
|
+
|
|
963
|
+
Action ids give diffs, logs, and network requests a shared reference point, which makes the system far more debuggable and composable.
|
|
964
|
+
|
|
965
|
+
## What it enables
|
|
966
|
+
|
|
967
|
+
- diff since action N
|
|
968
|
+
- logs since action N
|
|
969
|
+
- request correlation
|
|
970
|
+
- failure bundle generation
|
|
971
|
+
- concise flow summaries
|
|
972
|
+
- better Kata verification records
|
|
973
|
+
|
|
974
|
+
## Suggested stored fields per action
|
|
975
|
+
|
|
976
|
+
- action id
|
|
977
|
+
- tool name
|
|
978
|
+
- params summary
|
|
979
|
+
- page id
|
|
980
|
+
- timestamp start/end
|
|
981
|
+
- verification outcome
|
|
982
|
+
- detected changes
|
|
983
|
+
- relevant warnings
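
The stored fields map naturally onto a record type plus an append-only list. The following is a sketch, not the extension's data model: `ActionRecord`, `ActionTimeline`, and the sequential numeric ids are assumptions.

```typescript
// One entry per browser action, mirroring the fields listed above.
interface ActionRecord {
  id: number;
  tool: string;
  paramsSummary: string;
  pageId: string;
  startedAt: number;
  endedAt: number;
  verification: "pass" | "fail" | "skipped";
  changes: string[];
  warnings: string[];
}

class ActionTimeline {
  private records: ActionRecord[] = [];
  private nextId = 1;

  // Append an action and return the id assigned to it.
  record(entry: Omit<ActionRecord, "id">): number {
    const id = this.nextId++;
    this.records.push({ ...entry, id });
    return id;
  }

  // Everything recorded after the given action id ("since action N").
  since(actionId: number): ActionRecord[] {
    return this.records.filter((r) => r.id > actionId);
  }

  // One compact line per action, suitable for a flow summary.
  summary(): string[] {
    return this.records.map((r) =>
      "#" + r.id + " " + r.tool + " (" + r.paramsSummary + ") " + r.verification);
  }
}
```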
|
|
984
|
+
|
|
985
|
+
---
|
|
986
|
+
|
|
987
|
+
# 14. Tighten Tool Descriptions and Prompt Guidance
|
|
988
|
+
|
|
989
|
+
## Proposal
|
|
990
|
+
|
|
991
|
+
Refine tool descriptions so the model understands exactly what each tool returns and when to use it.
|
|
992
|
+
|
|
993
|
+
## Why it matters
|
|
994
|
+
|
|
995
|
+
A surprising amount of agent inefficiency comes from slightly misleading tool expectations.
|
|
996
|
+
|
|
997
|
+
## Current issue
|
|
998
|
+
|
|
999
|
+
Some tools describe outputs in terms like “returns accessibility snapshot” when they more accurately return a compact page summary.
|
|
1000
|
+
|
|
1001
|
+
## What it enables
|
|
1002
|
+
|
|
1003
|
+
- better tool selection
|
|
1004
|
+
- fewer redundant follow-up calls
|
|
1005
|
+
- less confusion about when to use full AX vs compact find vs summaries
|
|
1006
|
+
|
|
1007
|
+
## Recommended prompt guidance hierarchy
|
|
1008
|
+
|
|
1009
|
+
For state inspection, teach the model to prefer:
|
|
1010
|
+
|
|
1011
|
+
1. `browser_find`
|
|
1012
|
+
2. `browser_snapshot_refs`
|
|
1013
|
+
3. `browser_assert`
|
|
1014
|
+
4. `browser_diff`
|
|
1015
|
+
5. `browser_get_accessibility_tree`
|
|
1016
|
+
6. `browser_get_page_source`
|
|
1017
|
+
7. `browser_evaluate`
|
|
1018
|
+
|
|
1019
|
+
This keeps common browsing token-efficient.
|
|
1020
|
+
|
|
1021
|
+
---
|
|
1022
|
+
|
|
1023
|
+
# 15. Add Browser-Side State Compression and Delta Reporting
|
|
1024
|
+
|
|
1025
|
+
## Proposal
|
|
1026
|
+
|
|
1027
|
+
Internally maintain a compact page model and expose only deltas unless the agent asks for full detail.
|
|
1028
|
+
|
|
1029
|
+
## Why it matters
|
|
1030
|
+
|
|
1031
|
+
This is one of the biggest long-term wins for context efficiency.
|
|
1032
|
+
|
|
1033
|
+
## What it enables
|
|
1034
|
+
|
|
1035
|
+
- state reuse across tool calls
|
|
1036
|
+
- fewer repeated summaries
|
|
1037
|
+
- cheaper comparison after actions
|
|
1038
|
+
- better change detection
|
|
1039
|
+
- smarter internal recommendations
|
|
1040
|
+
|
|
1041
|
+
## Internal state could include
|
|
1042
|
+
|
|
1043
|
+
- last summary
|
|
1044
|
+
- heading set
|
|
1045
|
+
- visible alerts
|
|
1046
|
+
- dialog inventory
|
|
1047
|
+
- interactive ref list
|
|
1048
|
+
- form inventory
|
|
1049
|
+
- last screenshot hash
|
|
1050
|
+
- last AX signatures for key scopes
|
|
1051
|
+
|
|
1052
|
+
## Output policy
|
|
1053
|
+
|
|
1054
|
+
The default response should prefer:
|
|
1055
|
+
|
|
1056
|
+
- what changed
|
|
1057
|
+
- what likely matters
|
|
1058
|
+
- what the agent might want next
|
|
1059
|
+
|
|
1060
|
+
rather than always restating the whole page summary.
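
Delta reporting of this kind can be sketched as a per-scope set difference between two compact states, emitting only the scopes that changed. The `CompactState` fields below are a hypothetical subset of the internal state listed earlier, and `diffState` is an illustrative name.

```typescript
// Hypothetical compact page model; each scope is a list of short strings.
interface CompactState {
  headings: string[];
  alerts: string[];
  dialogs: string[];
  refs: string[];
}

interface ListDelta {
  added: string[];
  removed: string[];
}

const SCOPES: (keyof CompactState)[] = ["headings", "alerts", "dialogs", "refs"];

// Items present only in `after` are added; items present only in `before` are removed.
function diffList(before: string[], after: string[]): ListDelta {
  return {
    added: after.filter((x) => before.indexOf(x) === -1),
    removed: before.filter((x) => after.indexOf(x) === -1),
  };
}

// Report only the scopes that changed; unchanged scopes are omitted entirely.
function diffState(
  before: CompactState,
  after: CompactState,
): Partial<Record<keyof CompactState, ListDelta>> {
  const out: Partial<Record<keyof CompactState, ListDelta>> = {};
  for (const scope of SCOPES) {
    const delta = diffList(before[scope], after[scope]);
    if (delta.added.length > 0 || delta.removed.length > 0) out[scope] = delta;
  }
  return out;
}
```

An empty result object is itself informative: it tells the agent that the last action produced no observable change in the tracked scopes.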
|
|
1061
|
+
|
|
1062
|
+
---
|
|
1063
|
+
|
|
1064
|
+
# 16. Add Kata-Native Verification Outputs
|
|
1065
|
+
|
|
1066
|
+
## Proposal
|
|
1067
|
+
|
|
1068
|
+
Make browser-tools able to emit outputs that directly support Kata slice/task completion.
|
|
1069
|
+
|
|
1070
|
+
## Why it matters
|
|
1071
|
+
|
|
1072
|
+
You explicitly want browser tools to power automatic verification and testing during `@agent/extensions/kata/` use.
|
|
1073
|
+
|
|
1074
|
+
## What it enables
|
|
1075
|
+
|
|
1076
|
+
- easier automatic generation of `Sxx-UAT.md` content
|
|
1077
|
+
- deterministic slice verification evidence
|
|
1078
|
+
- less ad hoc summarization by the agent
|
|
1079
|
+
- clearer “done/not done” boundaries
|
|
1080
|
+
|
|
1081
|
+
## Suggested additions
|
|
1082
|
+
|
|
1083
|
+
### `browser_verify_flow`
|
|
1084
|
+
Return:
|
|
1085
|
+
|
|
1086
|
+
- named flow
|
|
1087
|
+
- steps attempted
|
|
1088
|
+
- checks passed/failed
|
|
1089
|
+
- evidence links/paths
|
|
1090
|
+
- final verdict
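
As an illustration of how these return fields could fit together, the sketch below derives the verdict from the checks and renders a short Markdown summary. The `FlowResult` shape, the rule that an empty check list fails, and the output layout are assumptions, not a specification of `browser_verify_flow`.

```typescript
interface FlowCheck {
  name: string;
  passed: boolean;
  evidence?: string; // path or link to a supporting artifact
}

interface FlowResult {
  flow: string;
  stepsAttempted: string[];
  checks: FlowCheck[];
  verdict: "PASS" | "FAIL";
}

// The verdict is PASS only when at least one check ran and every check passed.
function summarizeFlow(flow: string, stepsAttempted: string[], checks: FlowCheck[]): FlowResult {
  const allPassed = checks.length > 0 && checks.every((c) => c.passed);
  return { flow, stepsAttempted, checks, verdict: allPassed ? "PASS" : "FAIL" };
}

// Render the result as Markdown that could be pasted into a verification document.
function toMarkdown(r: FlowResult): string {
  const lines: string[] = ["## Flow: " + r.flow, "", "Verdict: " + r.verdict, "", "Steps attempted:"];
  for (const step of r.stepsAttempted) lines.push("- " + step);
  lines.push("", "Checks:");
  for (const c of r.checks) {
    const evidence = c.evidence ? " (" + c.evidence + ")" : "";
    lines.push("- [" + (c.passed ? "PASS" : "FAIL") + "] " + c.name + evidence);
  }
  return lines.join("\n");
}
```

Computing the verdict mechanically from the checks keeps the "done/not done" boundary out of the agent's prose.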
|
|
1091
|
+
|
|
1092
|
+
### `browser_export_verification_report`
|
|
1093
|
+
Write a markdown or JSON artifact summarizing:
|
|
1094
|
+
|
|
1095
|
+
- environment
|
|
1096
|
+
- URL(s)
|
|
1097
|
+
- viewport(s)
|
|
1098
|
+
- actions
|
|
1099
|
+
- assertions
|
|
1100
|
+
- outcome
|
|
1101
|
+
- diagnostics
|
|
1102
|
+
|
|
1103
|
+
This is especially useful for Kata artifacts.
|
|
1104
|
+
|
|
1105
|
+
---
|
|
1106
|
+
|
|
1107
|
+
## Proposed Roadmap
|
|
1108
|
+
|
|
1109
|
+
## Phase 1 — Highest-ROI Near-Term Upgrades
|
|
1110
|
+
|
|
1111
|
+
These are the best immediate improvements.
|
|
1112
|
+
|
|
1113
|
+
### 1. `browser_assert`
|
|
1114
|
+
Highest priority.
|
|
1115
|
+
|
|
1116
|
+
### 2. `browser_batch`
|
|
1117
|
+
Highest priority.
|
|
1118
|
+
|
|
1119
|
+
### 3. `browser_diff`
|
|
1120
|
+
Highest priority.
|
|
1121
|
+
|
|
1122
|
+
### 4. `browser_analyze_form`
|
|
1123
|
+
Very high priority.
|
|
1124
|
+
|
|
1125
|
+
### 5. `browser_fill_form`
|
|
1126
|
+
Very high priority.
|
|
1127
|
+
|
|
1128
|
+
### 6. Tighten tool descriptions and prompt guidance
|
|
1129
|
+
Low risk, immediate value.
|
|
1130
|
+
|
|
1131
|
+
### 7. Action timeline / action ids
|
|
1132
|
+
Important enabling infrastructure.
|
|
1133
|
+
|
|
1134
|
+
---
|
|
1135
|
+
|
|
1136
|
+
## Phase 2 — Strong Maturity Upgrades
|
|
1137
|
+
|
|
1138
|
+
### 8. Multi-page/tab/frame model
|
|
1139
|
+
### 9. Richer wait predicates
|
|
1140
|
+
### 10. Structured network/console assertions
|
|
1141
|
+
### 11. Ref snapshot modes and better ref fingerprints
|
|
1142
|
+
### 12. Debug bundle and trace export
|
|
1143
|
+
|
|
1144
|
+
---
|
|
1145
|
+
|
|
1146
|
+
## Phase 3 — Frontier AI-Native Capabilities
|
|
1147
|
+
|
|
1148
|
+
### 13. `browser_find_best`
|
|
1149
|
+
### 14. `browser_recommend_next`
|
|
1150
|
+
### 15. `browser_act`
|
|
1151
|
+
### 16. `browser_verify_flow`
|
|
1152
|
+
### 17. `browser_run_task`
|
|
1153
|
+
### 18. hybrid semantic + visual fallback targeting
|
|
1154
|
+
|
|
1155
|
+
These are the ideas that move the extension from excellent tooling into a genuinely mind-blowing browser device for agents.
|
|
1156
|
+
|
|
1157
|
+
---
|
|
1158
|
+
|
|
1159
|
+
## Detailed Impact Summary
|
|
1160
|
+
|
|
1161
|
+
## Biggest wins for context efficiency
|
|
1162
|
+
|
|
1163
|
+
1. `browser_batch`
|
|
1164
|
+
2. `browser_diff`
|
|
1165
|
+
3. snapshot modes for refs
|
|
1166
|
+
4. assertion outputs instead of prose
|
|
1167
|
+
5. browser-side state compression/deltas
|
|
1168
|
+
6. form-level tools replacing many small actions
|
|
1169
|
+
|
|
1170
|
+
## Biggest wins for reliability
|
|
1171
|
+
|
|
1172
|
+
1. `browser_assert`
|
|
1173
|
+
2. richer waits
|
|
1174
|
+
3. multi-page/frame awareness
|
|
1175
|
+
4. structured network/console assertions
|
|
1176
|
+
5. failure bundles and trace export
|
|
1177
|
+
6. smarter ref remapping
|
|
1178
|
+
|
|
1179
|
+
## Biggest wins for agent autonomy
|
|
1180
|
+
|
|
1181
|
+
1. `browser_assert`
|
|
1182
|
+
2. `browser_recommend_next`
|
|
1183
|
+
3. `browser_find_best`
|
|
1184
|
+
4. `browser_fill_form`
|
|
1185
|
+
5. `browser_verify_flow`
|
|
1186
|
+
6. `browser_run_task`
|
|
1187
|
+
|
|
1188
|
+
## Biggest wins for Kata
|
|
1189
|
+
|
|
1190
|
+
1. explicit verification outputs
|
|
1191
|
+
2. debug bundles on failure
|
|
1192
|
+
3. flow verification reports
|
|
1193
|
+
4. assertion-based PASS/FAIL summaries
|
|
1194
|
+
5. durable artifact export
|
|
1195
|
+
|
|
1196
|
+
---
|
|
1197
|
+
|
|
1198
|
+
## What Should Remain True in V2
|
|
1199
|
+
|
|
1200
|
+
As the extension evolves, it should preserve its best current qualities.
|
|
1201
|
+
|
|
1202
|
+
### Keep these principles
|
|
1203
|
+
- accessibility-first browsing
|
|
1204
|
+
- deterministic refs
|
|
1205
|
+
- compact summaries
|
|
1206
|
+
- pull-based diagnostics
|
|
1207
|
+
- verification after action
|
|
1208
|
+
- screenshots as support, not default state transport
|
|
1209
|
+
- adaptive settling
|
|
1210
|
+
|
|
1211
|
+
### Avoid these regressions
|
|
1212
|
+
- screenshot-first browsing as the normal path
|
|
1213
|
+
- giant raw DOM dumps as default output
|
|
1214
|
+
- excessive prose instead of structured results
|
|
1215
|
+
- hidden nondeterminism in action selection
|
|
1216
|
+
- too many tool calls for common flows
|
|
1217
|
+
- flaky fixed waits replacing intent-aware checks
|
|
1218
|
+
|
|
1219
|
+
---
|
|
1220
|
+
|
|
1221
|
+
## Recommended Implementation Order
|
|
1222
|
+
|
|
1223
|
+
If the goal is maximum practical value with strong architectural compounding, implement in this order:
|
|
1224
|
+
|
|
1225
|
+
1. `browser_assert`
|
|
1226
|
+
2. action timeline / action ids
|
|
1227
|
+
3. `browser_batch`
|
|
1228
|
+
4. `browser_diff`
|
|
1229
|
+
5. `browser_analyze_form`
|
|
1230
|
+
6. `browser_fill_form`
|
|
1231
|
+
7. structured network/console assertions
|
|
1232
|
+
8. multi-page and frame model
|
|
1233
|
+
9. trace/debug bundle tools
|
|
1234
|
+
10. ref snapshot modes and richer fingerprints
|
|
1235
|
+
11. `browser_find_best`
|
|
1236
|
+
12. `browser_recommend_next`
|
|
1237
|
+
13. `browser_verify_flow`
|
|
1238
|
+
14. `browser_run_task`
|
|
1239
|
+
|
|
1240
|
+
This order gives immediate value while laying down the right primitives for more ambitious features.
|
|
1241
|
+
|
|
1242
|
+
---
|
|
1243
|
+
|
|
1244
|
+
## Final Recommendation
|
|
1245
|
+
|
|
1246
|
+
The current browser-tools extension is already on the right side of the 2026 design curve. It has made several choices that are smarter than many contemporary AI browser stacks.
|
|
1247
|
+
|
|
1248
|
+
The next leap is to shift from:
|
|
1249
|
+
|
|
1250
|
+
- a browser control toolkit
|
|
1251
|
+
|
|
1252
|
+
into:
|
|
1253
|
+
|
|
1254
|
+
- a browser execution and verification device purpose-built for agents
|
|
1255
|
+
|
|
1256
|
+
The most important changes are:
|
|
1257
|
+
|
|
1258
|
+
- first-class assertions
|
|
1259
|
+
- batch execution
|
|
1260
|
+
- state diffs
|
|
1261
|
+
- form intelligence
|
|
1262
|
+
- session/page/frame modeling
|
|
1263
|
+
- durable debug artifacts
|
|
1264
|
+
- intent-aware semantic helpers
|
|
1265
|
+
|
|
1266
|
+
If these are implemented well, browser-tools can become not just a useful extension, but a foundational AI-native capability for both:
|
|
1267
|
+
|
|
1268
|
+
- **agentic browser use across the web**
|
|
1269
|
+
- **automatic verification inside Kata workflows**
|
|
1270
|
+
|
|
1271
|
+
---
|
|
1272
|
+
|
|
1273
|
+
## File Added
|
|
1274
|
+
|
|
1275
|
+
This proposal is stored at:
|
|
1276
|
+
|
|
1277
|
+
`agent/extensions/browser-tools/BROWSER-TOOLS-V2-PROPOSAL.md`
|