@kata-sh/cli 0.1.0 → 0.1.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (199)
  1. package/LICENSE +21 -0
  2. package/README.md +156 -0
  3. package/dist/app-paths.d.ts +4 -0
  4. package/dist/app-paths.js +6 -0
  5. package/dist/cli.d.ts +1 -0
  6. package/dist/cli.js +56 -0
  7. package/dist/loader.d.ts +2 -0
  8. package/dist/loader.js +95 -0
  9. package/dist/resource-loader.d.ts +18 -0
  10. package/dist/resource-loader.js +50 -0
  11. package/dist/wizard.d.ts +15 -0
  12. package/dist/wizard.js +159 -0
  13. package/package.json +50 -21
  14. package/pkg/dist/modes/interactive/theme/dark.json +85 -0
  15. package/pkg/dist/modes/interactive/theme/light.json +84 -0
  16. package/pkg/dist/modes/interactive/theme/theme-schema.json +335 -0
  17. package/pkg/dist/modes/interactive/theme/theme.d.ts +78 -0
  18. package/pkg/dist/modes/interactive/theme/theme.d.ts.map +1 -0
  19. package/pkg/dist/modes/interactive/theme/theme.js +949 -0
  20. package/pkg/dist/modes/interactive/theme/theme.js.map +1 -0
  21. package/pkg/package.json +8 -0
  22. package/scripts/postinstall.js +45 -0
  23. package/src/resources/AGENTS.md +108 -0
  24. package/src/resources/KATA-WORKFLOW.md +661 -0
  25. package/src/resources/agents/researcher.md +29 -0
  26. package/src/resources/agents/scout.md +56 -0
  27. package/src/resources/agents/worker.md +31 -0
  28. package/src/resources/extensions/ask-user-questions.ts +200 -0
  29. package/src/resources/extensions/bg-shell/index.ts +2758 -0
  30. package/src/resources/extensions/browser-tools/BROWSER-TOOLS-V2-PROPOSAL.md +1277 -0
  31. package/src/resources/extensions/browser-tools/core.js +1057 -0
  32. package/src/resources/extensions/browser-tools/index.ts +4916 -0
  33. package/src/resources/extensions/browser-tools/package.json +20 -0
  34. package/src/resources/extensions/context7/index.ts +428 -0
  35. package/src/resources/extensions/context7/package.json +11 -0
  36. package/src/resources/extensions/get-secrets-from-user.ts +352 -0
  37. package/src/resources/extensions/github/formatters.ts +207 -0
  38. package/src/resources/extensions/github/gh-api.ts +537 -0
  39. package/src/resources/extensions/github/index.ts +778 -0
  40. package/src/resources/extensions/kata/activity-log.ts +88 -0
  41. package/src/resources/extensions/kata/auto.ts +2786 -0
  42. package/src/resources/extensions/kata/commands.ts +355 -0
  43. package/src/resources/extensions/kata/crash-recovery.ts +85 -0
  44. package/src/resources/extensions/kata/dashboard-overlay.ts +516 -0
  45. package/src/resources/extensions/kata/docs/preferences-reference.md +103 -0
  46. package/src/resources/extensions/kata/doctor.ts +683 -0
  47. package/src/resources/extensions/kata/files.ts +730 -0
  48. package/src/resources/extensions/kata/gitignore.ts +165 -0
  49. package/src/resources/extensions/kata/guided-flow.ts +976 -0
  50. package/src/resources/extensions/kata/index.ts +556 -0
  51. package/src/resources/extensions/kata/metrics.ts +397 -0
  52. package/src/resources/extensions/kata/observability-validator.ts +408 -0
  53. package/src/resources/extensions/kata/package.json +11 -0
  54. package/src/resources/extensions/kata/paths.ts +346 -0
  55. package/src/resources/extensions/kata/preferences.ts +695 -0
  56. package/src/resources/extensions/kata/prompt-loader.ts +50 -0
  57. package/src/resources/extensions/kata/prompts/complete-milestone.md +25 -0
  58. package/src/resources/extensions/kata/prompts/complete-slice.md +27 -0
  59. package/src/resources/extensions/kata/prompts/discuss.md +151 -0
  60. package/src/resources/extensions/kata/prompts/doctor-heal.md +29 -0
  61. package/src/resources/extensions/kata/prompts/execute-task.md +64 -0
  62. package/src/resources/extensions/kata/prompts/guided-complete-slice.md +1 -0
  63. package/src/resources/extensions/kata/prompts/guided-discuss-milestone.md +3 -0
  64. package/src/resources/extensions/kata/prompts/guided-discuss-slice.md +59 -0
  65. package/src/resources/extensions/kata/prompts/guided-execute-task.md +1 -0
  66. package/src/resources/extensions/kata/prompts/guided-plan-milestone.md +23 -0
  67. package/src/resources/extensions/kata/prompts/guided-plan-slice.md +1 -0
  68. package/src/resources/extensions/kata/prompts/guided-research-slice.md +11 -0
  69. package/src/resources/extensions/kata/prompts/guided-resume-task.md +1 -0
  70. package/src/resources/extensions/kata/prompts/plan-milestone.md +47 -0
  71. package/src/resources/extensions/kata/prompts/plan-slice.md +63 -0
  72. package/src/resources/extensions/kata/prompts/queue.md +85 -0
  73. package/src/resources/extensions/kata/prompts/reassess-roadmap.md +48 -0
  74. package/src/resources/extensions/kata/prompts/replan-slice.md +39 -0
  75. package/src/resources/extensions/kata/prompts/research-milestone.md +37 -0
  76. package/src/resources/extensions/kata/prompts/research-slice.md +28 -0
  77. package/src/resources/extensions/kata/prompts/run-uat.md +109 -0
  78. package/src/resources/extensions/kata/prompts/system.md +341 -0
  79. package/src/resources/extensions/kata/session-forensics.ts +550 -0
  80. package/src/resources/extensions/kata/skill-discovery.ts +137 -0
  81. package/src/resources/extensions/kata/state.ts +509 -0
  82. package/src/resources/extensions/kata/templates/context.md +76 -0
  83. package/src/resources/extensions/kata/templates/decisions.md +8 -0
  84. package/src/resources/extensions/kata/templates/milestone-summary.md +73 -0
  85. package/src/resources/extensions/kata/templates/plan.md +133 -0
  86. package/src/resources/extensions/kata/templates/preferences.md +15 -0
  87. package/src/resources/extensions/kata/templates/project.md +31 -0
  88. package/src/resources/extensions/kata/templates/reassessment.md +28 -0
  89. package/src/resources/extensions/kata/templates/requirements.md +81 -0
  90. package/src/resources/extensions/kata/templates/research.md +46 -0
  91. package/src/resources/extensions/kata/templates/roadmap.md +118 -0
  92. package/src/resources/extensions/kata/templates/slice-context.md +58 -0
  93. package/src/resources/extensions/kata/templates/slice-summary.md +99 -0
  94. package/src/resources/extensions/kata/templates/state.md +19 -0
  95. package/src/resources/extensions/kata/templates/task-plan.md +52 -0
  96. package/src/resources/extensions/kata/templates/task-summary.md +57 -0
  97. package/src/resources/extensions/kata/templates/uat.md +54 -0
  98. package/src/resources/extensions/kata/tests/activity-log-prune.test.ts +327 -0
  99. package/src/resources/extensions/kata/tests/auto-preflight.test.ts +97 -0
  100. package/src/resources/extensions/kata/tests/auto-supervisor.test.mjs +53 -0
  101. package/src/resources/extensions/kata/tests/complete-milestone.test.ts +317 -0
  102. package/src/resources/extensions/kata/tests/cost-projection.test.ts +160 -0
  103. package/src/resources/extensions/kata/tests/derive-state-deps.test.ts +477 -0
  104. package/src/resources/extensions/kata/tests/derive-state.test.ts +1013 -0
  105. package/src/resources/extensions/kata/tests/doctor.test.ts +718 -0
  106. package/src/resources/extensions/kata/tests/idle-recovery.test.ts +490 -0
  107. package/src/resources/extensions/kata/tests/metrics-io.test.ts +254 -0
  108. package/src/resources/extensions/kata/tests/metrics.test.ts +217 -0
  109. package/src/resources/extensions/kata/tests/must-have-parser.test.ts +309 -0
  110. package/src/resources/extensions/kata/tests/parsers.test.ts +1257 -0
  111. package/src/resources/extensions/kata/tests/plan-milestone.test.ts +185 -0
  112. package/src/resources/extensions/kata/tests/plan-quality-validator.test.ts +386 -0
  113. package/src/resources/extensions/kata/tests/reassess-prompt.test.ts +208 -0
  114. package/src/resources/extensions/kata/tests/replan-slice.test.ts +686 -0
  115. package/src/resources/extensions/kata/tests/requirements.test.ts +151 -0
  116. package/src/resources/extensions/kata/tests/resolve-ts-hooks.mjs +17 -0
  117. package/src/resources/extensions/kata/tests/resolve-ts.mjs +11 -0
  118. package/src/resources/extensions/kata/tests/run-uat.test.ts +383 -0
  119. package/src/resources/extensions/kata/tests/unit-runtime.test.ts +388 -0
  120. package/src/resources/extensions/kata/tests/workspace-index.test.ts +118 -0
  121. package/src/resources/extensions/kata/tests/worktree.test.ts +222 -0
  122. package/src/resources/extensions/kata/types.ts +159 -0
  123. package/src/resources/extensions/kata/unit-runtime.ts +163 -0
  124. package/src/resources/extensions/kata/workspace-index.ts +203 -0
  125. package/src/resources/extensions/kata/worktree.ts +182 -0
  126. package/src/resources/extensions/mac-tools/index.ts +852 -0
  127. package/src/resources/extensions/mac-tools/swift-cli/Package.swift +22 -0
  128. package/src/resources/extensions/mac-tools/swift-cli/Sources/main.swift +1318 -0
  129. package/src/resources/extensions/search-the-web/cache.ts +78 -0
  130. package/src/resources/extensions/search-the-web/format.ts +258 -0
  131. package/src/resources/extensions/search-the-web/http.ts +238 -0
  132. package/src/resources/extensions/search-the-web/index.ts +68 -0
  133. package/src/resources/extensions/search-the-web/tool-fetch-page.ts +519 -0
  134. package/src/resources/extensions/search-the-web/tool-llm-context.ts +404 -0
  135. package/src/resources/extensions/search-the-web/tool-search.ts +503 -0
  136. package/src/resources/extensions/search-the-web/url-utils.ts +91 -0
  137. package/src/resources/extensions/shared/confirm-ui.ts +126 -0
  138. package/src/resources/extensions/shared/interview-ui.ts +822 -0
  139. package/src/resources/extensions/shared/next-action-ui.ts +235 -0
  140. package/src/resources/extensions/shared/progress-widget.ts +282 -0
  141. package/src/resources/extensions/shared/thinking-widget.ts +107 -0
  142. package/src/resources/extensions/shared/ui.ts +400 -0
  143. package/src/resources/extensions/shared/wizard-ui.ts +551 -0
  144. package/src/resources/extensions/slash-commands/audit.ts +92 -0
  145. package/src/resources/extensions/slash-commands/create-extension.ts +375 -0
  146. package/src/resources/extensions/slash-commands/create-slash-command.ts +280 -0
  147. package/src/resources/extensions/slash-commands/index.ts +12 -0
  148. package/src/resources/extensions/slash-commands/kata-run.ts +34 -0
  149. package/src/resources/extensions/subagent/agents.ts +126 -0
  150. package/src/resources/extensions/subagent/index.ts +1293 -0
  151. package/src/resources/skills/debug-like-expert/SKILL.md +231 -0
  152. package/src/resources/skills/debug-like-expert/references/debugging-mindset.md +253 -0
  153. package/src/resources/skills/debug-like-expert/references/hypothesis-testing.md +373 -0
  154. package/src/resources/skills/debug-like-expert/references/investigation-techniques.md +337 -0
  155. package/src/resources/skills/debug-like-expert/references/verification-patterns.md +425 -0
  156. package/src/resources/skills/debug-like-expert/references/when-to-research.md +361 -0
  157. package/src/resources/skills/frontend-design/SKILL.md +45 -0
  158. package/src/resources/skills/swiftui/SKILL.md +208 -0
  159. package/src/resources/skills/swiftui/references/animations.md +921 -0
  160. package/src/resources/skills/swiftui/references/architecture.md +1561 -0
  161. package/src/resources/skills/swiftui/references/layout-system.md +1186 -0
  162. package/src/resources/skills/swiftui/references/navigation.md +1492 -0
  163. package/src/resources/skills/swiftui/references/networking-async.md +214 -0
  164. package/src/resources/skills/swiftui/references/performance.md +1706 -0
  165. package/src/resources/skills/swiftui/references/platform-integration.md +204 -0
  166. package/src/resources/skills/swiftui/references/state-management.md +1443 -0
  167. package/src/resources/skills/swiftui/references/swiftdata.md +297 -0
  168. package/src/resources/skills/swiftui/references/testing-debugging.md +247 -0
  169. package/src/resources/skills/swiftui/references/uikit-appkit-interop.md +218 -0
  170. package/src/resources/skills/swiftui/workflows/add-feature.md +191 -0
  171. package/src/resources/skills/swiftui/workflows/build-new-app.md +311 -0
  172. package/src/resources/skills/swiftui/workflows/debug-swiftui.md +192 -0
  173. package/src/resources/skills/swiftui/workflows/optimize-performance.md +197 -0
  174. package/src/resources/skills/swiftui/workflows/ship-app.md +203 -0
  175. package/src/resources/skills/swiftui/workflows/write-tests.md +235 -0
  176. package/dist/commands/task.d.ts +0 -9
  177. package/dist/commands/task.d.ts.map +0 -1
  178. package/dist/commands/task.js +0 -129
  179. package/dist/commands/task.js.map +0 -1
  180. package/dist/commands/task.test.d.ts +0 -2
  181. package/dist/commands/task.test.d.ts.map +0 -1
  182. package/dist/commands/task.test.js +0 -169
  183. package/dist/commands/task.test.js.map +0 -1
  184. package/dist/e2e/task-e2e.test.d.ts +0 -2
  185. package/dist/e2e/task-e2e.test.d.ts.map +0 -1
  186. package/dist/e2e/task-e2e.test.js +0 -173
  187. package/dist/e2e/task-e2e.test.js.map +0 -1
  188. package/dist/index.d.ts +0 -3
  189. package/dist/index.d.ts.map +0 -1
  190. package/dist/index.js +0 -93
  191. package/dist/index.js.map +0 -1
  192. package/dist/slug.d.ts +0 -2
  193. package/dist/slug.d.ts.map +0 -1
  194. package/dist/slug.js +0 -12
  195. package/dist/slug.js.map +0 -1
  196. package/dist/slug.test.d.ts +0 -2
  197. package/dist/slug.test.d.ts.map +0 -1
  198. package/dist/slug.test.js +0 -32
  199. package/dist/slug.test.js.map +0 -1
@@ -0,0 +1,1277 @@
1
+ # Browser Tools V2 Proposal
2
+
3
+ ## Purpose
4
+
5
+ This document proposes a comprehensive evolution of `agent/extensions/browser-tools/` from a strong set of browser-control primitives into a world-class AI-native browser device for:
6
+
7
+ - autonomous verification
8
+ - end-to-end testing
9
+ - Kata slice validation
10
+ - debugging and observability
11
+ - general internet task execution
12
+ - low-token, high-reliability browser interaction
13
+
14
+ The goal is not just to let the agent click around in a browser. The goal is to give the agent **hands, eyes, memory, verification, and local judgment** in a way that is:
15
+
16
+ - context-efficient
17
+ - fast
18
+ - deterministic where possible
19
+ - observable when things fail
20
+ - composable for larger workflows
21
+ - optimized for LLM use, not human scripting ergonomics
22
+
23
+ ---
24
+
25
+ ## Executive Summary
26
+
27
+ The current browser tools already make several unusually good architectural choices:
28
+
29
+ - accessibility-first inspection instead of screenshot-first browsing
30
+ - deterministic versioned element refs
31
+ - compact post-action summaries instead of full DOM spam
32
+ - buffered observability surfaces for console, network, and dialogs
33
+ - lightweight success verification after actions
34
+ - adaptive settling instead of blindly waiting for `networkidle`
35
+
36
+ Those choices align well with March 2026 best practices in AI browser automation.
37
+
38
+ However, the current system still operates mostly as a **toolbox of action primitives**. To become a truly elite AI-native browser device, it should evolve in six major directions:
39
+
40
+ 1. **Assertions over prose** — explicit PASS/FAIL verification tools
41
+ 2. **Composite actions over chatty primitive loops** — batch, form fill, goal-oriented flows
42
+ 3. **Diffs over full resnapshots** — tell the agent what changed, not just what exists now
43
+ 4. **Stateful browser modeling** — tabs, frames, forms, dialogs, refs, action history
44
+ 5. **Failure artifacts and observability** — traces, bundles, structured debug evidence
45
+ 6. **Intent-aware semantic helpers** — find the best next element/action for a goal
46
+
47
+ If implemented well, these changes would make browser-tools materially better for both:
48
+
49
+ - **Kata automatic verification and UAT generation**
50
+ - **general-purpose agentic browser use on arbitrary websites and apps**
51
+
52
+ ---
53
+
54
+ ## Current State: What Browser Tools Already Does Well
55
+
56
+ The existing extension in `agent/extensions/browser-tools/index.ts` already gets several important things right.
57
+
58
+ ### 1. Accessibility-first state representation
59
+
60
+ The system already prefers:
61
+
62
+ - `browser_get_accessibility_tree`
63
+ - `browser_find`
64
+ - `browser_snapshot_refs`
65
+
66
+ This is the correct strategic direction. Accessibility snapshots are usually far more token-efficient and reliable than:
67
+
68
+ - full HTML dumps
69
+ - screenshot-only operation
70
+ - coordinate-based automation
71
+
72
+ ### 2. Deterministic element references
73
+
74
+ The versioned ref system (`@vN:e1`) is one of the strongest parts of the current design.
75
+
76
+ It provides:
77
+
78
+ - compact handles for later actions
79
+ - stale-ref detection
80
+ - lower repeated selector verbosity
81
+ - less guesswork for the model
82
+
83
+ This aligns closely with current agent-browser and Playwright MCP design patterns.
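As a rough illustration of stale-ref detection, the sketch below parses a ref in the `@vN:e1` format and compares its version against the current snapshot version. The names `ParsedRef`, `parseRef`, and `isStaleRef` are hypothetical and are not taken from `index.ts`; the actual extension may track versions differently.

```typescript
// Hypothetical parsed form of a versioned ref such as "@v3:e2".
interface ParsedRef {
  version: number; // snapshot version the ref was issued under
  element: number; // element index within that snapshot
}

// Parse "@vN:eM"; returns null for anything that does not match the format.
function parseRef(ref: string): ParsedRef | null {
  const match = /^@v(\d+):e(\d+)$/.exec(ref);
  if (match === null) return null;
  return { version: Number(match[1]), element: Number(match[2]) };
}

// Assumption: a ref is stale when its version differs from the
// version of the most recent snapshot.
function isStaleRef(ref: string, currentVersion: number): boolean {
  const parsed = parseRef(ref);
  if (parsed === null) throw new Error(`Malformed ref: ${ref}`);
  return parsed.version !== currentVersion;
}
```

Rejecting a stale ref up front lets a tool return a short "re-snapshot needed" message instead of acting on the wrong element.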
84
+
85
+ ### 3. Compact post-action summaries
86
+
87
+ The `postActionSummary()` helper is a strong design decision.
88
+
89
+ It gives the agent:
90
+
91
+ - title
92
+ - URL
93
+ - high-level element counts
94
+ - important headings
95
+ - focus state
96
+ - dialog hints
97
+
98
+ without flooding context.
99
+
100
+ ### 4. Pull-based observability
101
+
102
+ Buffered logs for:
103
+
104
+ - console
105
+ - network
106
+ - dialogs
107
+
108
+ are exactly the right pattern.
109
+
110
+ This prevents every tool call from becoming noisy while still preserving rich debugging when needed.
111
+
112
+ ### 5. Built-in self-verification on interactions
113
+
114
+ The current tools already attempt to verify success through signals like:
115
+
116
+ - URL changes
117
+ - hash changes
118
+ - target ARIA state changes
119
+ - value changes
120
+ - focus changes
121
+ - dialog count changes
122
+
123
+ This is much better than blind action execution.
124
+
125
+ ### 6. Adaptive settling
126
+
127
+ The mutation counter plus pending-critical-request model is clever and practical.
128
+
129
+ It is better than:
130
+
131
+ - fixed sleeps everywhere
132
+ - hard dependence on `networkidle`
133
+ - no settle logic at all
134
+
135
+ ### 7. Sensible visual fallback strategy
136
+
137
+ The extension already uses screenshots as:
138
+
139
+ - navigation-time context
140
+ - explicit inspection output
141
+ - failure debugging evidence
142
+
143
+ That is good. Screenshots should support semantics, not replace them.
144
+
145
+ ---
146
+
147
+ ## Core Diagnosis
148
+
149
+ ### What the current system is
150
+
151
+ Right now, browser-tools is primarily a **semantic browser control toolkit**.
152
+
153
+ That is already useful and better than many browser agent stacks.
154
+
155
+ ### What it should become
156
+
157
+ It should become an **AI-native browser operating layer** that gives the model:
158
+
159
+ - reliable control
160
+ - compact semantic state
161
+ - explicit verification
162
+ - efficient action composition
163
+ - better local reasoning support
164
+ - durable debugging artifacts
165
+
166
+ ### The central gap
167
+
168
+ The biggest gap is that the extension currently optimizes for **individual actions** more than **successful browser tasks**.
169
+
170
+ That difference matters.
171
+
172
+ An elite browser device for AI should optimize for:
173
+
174
+ - “did the task succeed?”
175
+ - “what changed?”
176
+ - “what should I do next?”
177
+ - “can I verify this automatically?”
178
+ - “if it failed, what evidence do I have?”
179
+
180
+ not just:
181
+
182
+ - “did the click happen?”
183
+ - “here is the current page summary”
184
+
185
+ ---
186
+
187
+ ## Design Principles for V2
188
+
189
+ The proposed system should follow these principles.
190
+
191
+ ### 1. Semantics first, vision second
192
+
193
+ Preferred order of understanding:
194
+
195
+ 1. structured semantic state
196
+ 2. scoped accessibility/tree snapshots
197
+ 3. ranked semantic refs
198
+ 4. DOM or JS inspection when needed
199
+ 5. screenshots only when semantics are insufficient or visual truth matters
200
+
201
+ ### 2. Assertions are first-class
202
+
203
+ Every serious verification system needs explicit assertions.
204
+
205
+ Tool outputs should prefer structured verification objects over prose.
206
+
207
+ ### 3. Minimize round trips
208
+
209
+ The fastest tool call is the one the model does not need to make.
210
+
211
+ Obvious action sequences should be batchable.
212
+
213
+ ### 4. Model the browser as state, not just a stream of actions
214
+
215
+ The extension should internally track:
216
+
217
+ - pages/tabs
218
+ - frames
219
+ - dialogs
220
+ - form structures
221
+ - refs
222
+ - last known page summaries
223
+ - diffs across actions
224
+ - recent action outcomes
225
+
226
+ ### 5. Tell the agent what changed
227
+
228
+ State deltas are often more useful than fresh full state.
229
+
230
+ ### 6. Heavy artifacts belong on disk, not in context
231
+
232
+ Trace files, HAR data, visual diffs, and debug bundles should generally be persisted and summarized, not inlined.
233
+
234
+ ### 7. Optimize for Kata verification
235
+
236
+ The browser device should be excellent at producing:
237
+
238
+ - deterministic pass/fail checks
239
+ - concise verification summaries
240
+ - debug artifacts on failure
241
+ - machine-usable evidence for slice/task summaries and UAT
242
+
243
+ ---
244
+
245
+ ## Proposed Changes
246
+
247
+ # 1. Add a First-Class Assertion System
248
+
249
+ ## Proposal
250
+
251
+ Add a `browser_assert` tool and a small assertion language built around common browser verification needs.
252
+
253
+ ## Why it matters
254
+
255
+ This is the single most important missing capability for Kata and autonomous QA.
256
+
257
+ Today the agent must infer correctness from prose and heuristics. That is weaker than explicit pass/fail evaluation.
258
+
259
+ ## What it enables
260
+
261
+ - deterministic verification
262
+ - clean Kata artifact generation
263
+ - structured failure reporting
264
+ - simpler agent reasoning
265
+ - less repeated browser inspection
266
+
267
+ ## Suggested assertion kinds
268
+
269
+ ### Page state assertions
270
+ - `url_contains`
271
+ - `url_equals`
272
+ - `title_contains`
273
+ - `page_ready`
274
+ - `page_has_dialog`
275
+ - `page_has_alert`
276
+
277
+ ### Element assertions
278
+ - `selector_visible`
279
+ - `selector_hidden`
280
+ - `ref_visible`
281
+ - `ref_enabled`
282
+ - `text_visible`
283
+ - `text_not_visible`
284
+ - `focused_matches`
285
+ - `value_equals`
286
+ - `value_contains`
287
+ - `checked_equals`
288
+ - `count_equals`
289
+ - `count_at_least`
290
+
291
+ ### Accessibility assertions
292
+ - `aria_snapshot_contains`
293
+ - `aria_snapshot_matches`
294
+ - `role_name_exists`
295
+ - `dialog_open`
296
+ - `alert_visible`
297
+
298
+ ### Observability assertions
299
+ - `no_console_errors`
300
+ - `network_request_seen`
301
+ - `response_status_seen`
302
+ - `no_failed_requests`
303
+ - `dialog_seen`
304
+
305
+ ### Visual assertions
306
+ - `screenshot_changed`
307
+ - `element_visually_changed`
308
+ - `layout_breakpoint_ok`
309
+
310
+ ## Suggested output shape
311
+
312
+ ```json
313
+ {
314
+ "verified": true,
315
+ "checks": [
316
+ {
317
+ "name": "url_contains",
318
+ "passed": true,
319
+ "actual": "http://localhost:3000/dashboard",
320
+ "expected": "/dashboard"
321
+ },
322
+ {
323
+ "name": "no_console_errors",
324
+ "passed": true,
325
+ "actual": 0
326
+ }
327
+ ],
328
+ "summary": "PASS (2/2 checks)",
329
+ "agent_hint": "Dashboard loaded without browser-side errors"
330
+ }
331
+ ```
332
+
333
+ ## Additional recommendation
334
+
335
+ Support both:
336
+
337
+ - single assertions
338
+ - multi-check assertions in one call
339
+
340
+ This keeps verification compact and expressive.
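A minimal sketch of how multi-check evaluation could produce the output shape suggested above. It covers only four of the listed assertion kinds and runs against a plain `PageState` object; `PageState`, `Check`, `runCheck`, and `browserAssert` are hypothetical names for illustration, not an existing API.

```typescript
// Hypothetical snapshot of the state that checks are evaluated against.
interface PageState {
  url: string;
  title: string;
  visibleText: string[];
  consoleErrorCount: number;
}

// A small subset of the assertion kinds listed in the proposal.
type Check =
  | { name: "url_contains"; expected: string }
  | { name: "title_contains"; expected: string }
  | { name: "text_visible"; expected: string }
  | { name: "no_console_errors" };

interface CheckResult {
  name: string;
  passed: boolean;
  actual: string | number | boolean;
  expected?: string;
}

function runCheck(state: PageState, check: Check): CheckResult {
  switch (check.name) {
    case "url_contains":
      return { name: check.name, passed: state.url.includes(check.expected), actual: state.url, expected: check.expected };
    case "title_contains":
      return { name: check.name, passed: state.title.includes(check.expected), actual: state.title, expected: check.expected };
    case "text_visible": {
      const found = state.visibleText.some((text) => text.includes(check.expected));
      return { name: check.name, passed: found, actual: found, expected: check.expected };
    }
    case "no_console_errors":
      return { name: check.name, passed: state.consoleErrorCount === 0, actual: state.consoleErrorCount };
  }
}

// Evaluate every check in one call and summarize as PASS/FAIL.
function browserAssert(state: PageState, checks: Check[]): { verified: boolean; checks: CheckResult[]; summary: string } {
  const results = checks.map((check) => runCheck(state, check));
  const passedCount = results.filter((result) => result.passed).length;
  const verified = passedCount === results.length;
  return { verified, checks: results, summary: `${verified ? "PASS" : "FAIL"} (${passedCount}/${results.length} checks)` };
}
```

Every check is evaluated even after a failure, so a single call reports all failing conditions rather than only the first one.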
341
+
342
+ ---
343
+
344
+ # 2. Add `browser_batch` for Composite Action Execution
345
+
346
+ ## Proposal
347
+
348
+ Add a batch or transaction-style tool that executes multiple browser steps in a single tool call.
349
+
350
+ ## Why it matters
351
+
352
+ This is one of the highest-ROI speed and token-efficiency improvements.
353
+
354
+ Many browser tasks currently require a chatty loop:
355
+
356
+ - find
357
+ - click
358
+ - type
359
+ - wait
360
+ - inspect
361
+ - verify
362
+
363
+ A batch tool collapses obvious sequential actions into one round trip.
364
+
365
+ ## What it enables
366
+
367
+ - fewer tool invocations
368
+ - lower latency
369
+ - lower schema overhead
370
+ - less repetitive page-summary generation
371
+ - more deterministic execution of known action sequences
372
+
373
+ ## Example
374
+
375
+ ```json
376
+ {
377
+ "steps": [
378
+ { "action": "click_ref", "ref": "@v3:e2" },
379
+ { "action": "fill_ref", "ref": "@v3:e5", "text": "lex@example.com" },
380
+ { "action": "fill_ref", "ref": "@v3:e6", "text": "password123" },
381
+ { "action": "click_ref", "ref": "@v3:e7" },
382
+ { "action": "wait_for", "condition": "url_contains", "value": "/dashboard" },
383
+ { "action": "assert", "kind": "text_visible", "text": "Dashboard" }
384
+ ],
385
+ "stopOnFailure": true,
386
+ "finalSummaryOnly": true
387
+ }
388
+ ```
389
+
390
+ ## Recommended options
391
+
392
+ - `stopOnFailure`
393
+ - `captureIntermediateState`
394
+ - `includeIntermediateDiagnostics`
395
+ - `finalSummaryOnly`
396
+ - `returnStepResults`
397
+
398
+ ## Design note
399
+
400
+ This should not replace primitive tools. It should sit above them.
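The layering described here can be sketched as a runner that delegates each step to an injected executor, so the primitive tools stay untouched. Everything below is hypothetical (`Step`, `StepExecutor`, `runBatch`), and the loop is synchronous only to keep the example short; real browser steps would be awaited.

```typescript
// Hypothetical step shape, mirroring the JSON example in this section.
interface Step {
  action: string;
  [param: string]: unknown;
}

interface StepResult {
  index: number;
  action: string;
  ok: boolean;
  error?: string;
}

interface BatchResult {
  ok: boolean;
  completed: number;
  stepResults: StepResult[];
}

// Runs one primitive step and throws on failure. Injecting it keeps the
// batch layer above the existing primitive tools.
type StepExecutor = (step: Step) => void;

function runBatch(steps: Step[], execute: StepExecutor, options: { stopOnFailure: boolean }): BatchResult {
  const stepResults: StepResult[] = [];
  for (const step of steps) {
    // One result is recorded per attempted step, so the length is the index.
    const index = stepResults.length;
    try {
      execute(step);
      stepResults.push({ index, action: step.action, ok: true });
    } catch (err) {
      const error = err instanceof Error ? err.message : String(err);
      stepResults.push({ index, action: step.action, ok: false, error });
      if (options.stopOnFailure) break;
    }
  }
  const completed = stepResults.filter((result) => result.ok).length;
  return { ok: completed === steps.length, completed, stepResults };
}
```

With `stopOnFailure` set, steps after the first failure are never attempted and do not appear in `stepResults`, which makes the failing step easy to locate.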
401
+
402
+ ---
403
+
404
+ # 3. Add `browser_diff` to Report What Changed
405
+
406
+ ## Proposal
407
+
408
+ Add a diff tool that compares two browser states or the pre/post state around an action.
409
+
410
+ ## Why it matters
411
+
412
+ The model frequently needs to answer:
413
+
414
+ - did the click do anything?
415
+ - what changed after submit?
416
+ - what new UI appeared?
417
+ - what should I inspect next?
418
+
419
+ A change summary is usually more useful than a fresh full snapshot.
420
+
421
+ ## What it enables
422
+
423
+ - faster reasoning after actions
424
+ - better success detection
425
+ - lower token usage
426
+ - easier failure diagnosis
427
+ - improved “next action” selection
428
+
429
+ ## Suggested diff dimensions
430
+
431
+ - URL change
432
+ - title change
433
+ - focus change
434
+ - dialog open/close
435
+ - heading additions/removals
436
+ - new alerts/errors/toasts
437
+ - interactive element count changes
438
+ - text changes in scoped region
439
+ - ARIA subtree changes
440
+ - validation error changes
441
+ - scroll position changes
442
+ - form state changes
443
+
444
+ ## Example output
445
+
446
+ ```json
447
+ {
448
+ "changed": true,
449
+ "changes": [
450
+ { "type": "url", "before": "/login", "after": "/dashboard" },
451
+ { "type": "dialog_closed", "value": "Sign in" },
452
+ { "type": "new_heading", "value": "Dashboard" }
453
+ ],
454
+ "summary": "Navigation completed and login modal closed",
455
+ "agent_hint": "Authentication likely succeeded"
456
+ }
457
+ ```
458
+
459
+ ## Implementation note
460
+
461
+ A lightweight internal state snapshot should be stored after major actions so diffs are cheap.
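To show why a small stored snapshot makes diffs cheap, here is a sketch that compares two snapshots across a few of the suggested dimensions. The `StateSnapshot` fields and the `diffSnapshots` function are assumptions made for illustration; the proposal does not fix a snapshot schema.

```typescript
// Hypothetical lightweight snapshot stored after each major action.
interface StateSnapshot {
  url: string;
  title: string;
  focused: string | null;
  headings: string[];
  openDialogs: string[];
}

type Change =
  | { type: "url" | "title" | "focus"; before: string | null; after: string | null }
  | { type: "new_heading" | "removed_heading" | "dialog_opened" | "dialog_closed"; value: string };

// Items present in `after` but not in `before`.
function added(before: string[], after: string[]): string[] {
  return after.filter((item) => before.indexOf(item) === -1);
}

function diffSnapshots(before: StateSnapshot, after: StateSnapshot): { changed: boolean; changes: Change[] } {
  const changes: Change[] = [];
  if (before.url !== after.url) changes.push({ type: "url", before: before.url, after: after.url });
  if (before.title !== after.title) changes.push({ type: "title", before: before.title, after: after.title });
  if (before.focused !== after.focused) changes.push({ type: "focus", before: before.focused, after: after.focused });
  for (const value of added(before.headings, after.headings)) changes.push({ type: "new_heading", value });
  for (const value of added(after.headings, before.headings)) changes.push({ type: "removed_heading", value });
  for (const value of added(before.openDialogs, after.openDialogs)) changes.push({ type: "dialog_opened", value });
  for (const value of added(after.openDialogs, before.openDialogs)) changes.push({ type: "dialog_closed", value });
  return { changed: changes.length > 0, changes };
}
```

Because the snapshot holds only short strings and small arrays, the comparison needs no second round trip to the page.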
462
+
463
+ ---
464
+
465
+ # 4. Add Form Intelligence
466
+
467
+ ## Proposal
468
+
469
+ Add form-specific analysis and fill tools.
470
+
471
+ ### New tools
472
+ - `browser_analyze_form`
473
+ - `browser_fill_form`
474
+
475
+ ## Why it matters
476
+
477
+ A large percentage of browser tasks are fundamentally form tasks:
478
+
479
+ - sign in
480
+ - sign up
481
+ - checkout
482
+ - onboarding
483
+ - search
484
+ - settings
485
+ - admin actions
486
+ - content publishing
487
+
488
+ Forms are one of the highest-leverage abstractions in browser automation.
489
+
490
+ ## What it enables
491
+
492
+ - fewer calls for common flows
493
+ - stronger semantic mapping between labels and inputs
494
+ - automatic handling of required fields and validation messages
495
+ - better submit targeting
496
+ - more robust Kata verification of user flows
497
+
498
+ ## `browser_analyze_form` should return
499
+
500
+ - form purpose inference
501
+ - fields and labels
502
+ - field types
503
+ - required status
504
+ - current values
505
+ - current validation errors
506
+ - submit controls
507
+ - grouped sections
508
+ - likely primary action
509
+
510
+ ## `browser_fill_form` should support
511
+
512
+ ```json
513
+ {
514
+ "selector": "form",
515
+ "values": {
516
+ "email": "lex@example.com",
517
+ "password": "hunter2"
518
+ },
519
+ "submit": true,
520
+ "strict": false
521
+ }
522
+ ```
523
+
524
+ ## Important design behavior
525
+
526
+ It should map values by:
527
+
528
+ - label text
529
+ - accessible name
530
+ - field name
531
+ - placeholder when needed
532
+ - form-local semantic inference
533
+
534
+ ## Recommended output
535
+
536
+ - matched fields
537
+ - unmatched requested values
538
+ - fields skipped
539
+ - validation state after fill
540
+ - submit result summary
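The value-to-field mapping could be sketched as a priority search over field descriptors: label text first, then accessible name, field name, and placeholder. This sketch uses exact matching after normalization and leaves out form-local semantic inference entirely; `FormField`, `FieldMatch`, and `matchValuesToFields` are hypothetical names.

```typescript
// Hypothetical descriptor for one analyzed form field.
interface FormField {
  ref: string;
  label?: string;
  accessibleName?: string;
  name?: string;
  placeholder?: string;
}

type MatchSource = "label" | "accessibleName" | "name" | "placeholder";

interface FieldMatch {
  key: string;
  ref: string;
  matchedBy: MatchSource;
}

// Priority order taken from the mapping list in this section.
const MATCH_ORDER: MatchSource[] = ["label", "accessibleName", "name", "placeholder"];

const normalize = (text: string): string => text.trim().toLowerCase();

function findField(key: string, fields: FormField[], used: Set<string>): FieldMatch | undefined {
  for (const source of MATCH_ORDER) {
    for (const field of fields) {
      const candidate = field[source];
      if (candidate !== undefined && !used.has(field.ref) && normalize(candidate) === normalize(key)) {
        return { key, ref: field.ref, matchedBy: source };
      }
    }
  }
  return undefined;
}

// Returns which requested values found a field and which did not.
function matchValuesToFields(values: Record<string, string>, fields: FormField[]): { matched: FieldMatch[]; unmatched: string[] } {
  const matched: FieldMatch[] = [];
  const unmatched: string[] = [];
  const used = new Set<string>();
  for (const key of Object.keys(values)) {
    const match = findField(key, fields, used);
    if (match === undefined) {
      unmatched.push(key);
    } else {
      used.add(match.ref); // each field receives at most one value
      matched.push(match);
    }
  }
  return { matched, unmatched };
}
```

Reporting `unmatched` keys explicitly supports the `strict` option shown earlier: a strict call could fail when the list is non-empty, while a lenient call could fill what it can.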
541
+
542
+ ---
543
+
544
+ # 5. Add Intent-Ranked Element Retrieval
545
+
546
+ ## Proposal
547
+
548
+ Add a smarter semantic finder, such as `browser_find_best`.
549
+
550
+ ## Why it matters
551
+
552
+ The current `browser_find` is useful but still fairly literal. Agents often need a ranked answer to questions like:
553
+
554
+ - what is the primary CTA?
555
+ - which button submits this form?
556
+ - which textbox is the email field?
557
+ - what element most likely advances login?
558
+ - which visible error is most relevant right now?
559
+
560
+ ## What it enables
561
+
562
+ - better action selection
563
+ - fewer failed clicks
564
+ - less token spent interpreting noisy candidate lists
565
+ - more autonomous local decisions
566
+
567
+ ## Example
568
+
569
+ ```json
570
+ {
571
+ "intent": "submit login form",
572
+ "candidates": [
573
+ {
574
+ "ref": "@v5:e7",
575
+ "score": 0.93,
576
+ "reason": "button in same form as email and password fields named Sign in"
577
+ },
578
+ {
579
+ "ref": "@v5:e9",
580
+ "score": 0.41,
581
+ "reason": "secondary link outside form"
582
+ }
583
+ ]
584
+ }
585
+ ```
586
+
587
+ ## Suggested intents
588
+
589
+ - submit form
590
+ - primary CTA
591
+ - close dialog
592
+ - search field
593
+ - next step
594
+ - destructive action
595
+ - auth action
596
+ - error surface
597
+ - back navigation
598
+ - menu trigger
599
+
600
+ ## Design recommendation
601
+
602
+ This should be deterministic heuristic ranking first, not a hidden LLM.
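One way to keep ranking deterministic is a fixed table of additive signals. The sketch below scores candidates for a single intent (submit form); the signals, weights, and names (`Candidate`, `scoreSubmitCandidate`, `findBestSubmit`) are illustrative assumptions, not values from the proposal.

```typescript
// Hypothetical facts about a candidate element, gathered from the page.
interface Candidate {
  ref: string;
  role: string;
  name: string;
  inForm: boolean;
  isSubmitType: boolean;
}

interface Ranked {
  ref: string;
  score: number; // 0..1
  reason: string;
}

// Additive integer weights (out of 100) avoid floating-point drift.
function scoreSubmitCandidate(candidate: Candidate): Ranked {
  let points = 0;
  const reasons: string[] = [];
  if (candidate.role === "button") { points += 30; reasons.push("is a button"); }
  if (candidate.inForm) { points += 30; reasons.push("inside the form"); }
  if (candidate.isSubmitType) { points += 20; reasons.push("has submit type"); }
  if (/sign in|log in|submit|continue/i.test(candidate.name)) { points += 20; reasons.push(`named ${candidate.name}`); }
  return {
    ref: candidate.ref,
    score: points / 100,
    reason: reasons.length > 0 ? reasons.join(", ") : "no matching signals",
  };
}

// Highest score first; ties are broken by ref so output order is stable.
function findBestSubmit(candidates: Candidate[]): Ranked[] {
  return candidates
    .map(scoreSubmitCandidate)
    .sort((a, b) => (b.score - a.score) || (a.ref < b.ref ? -1 : a.ref > b.ref ? 1 : 0));
}
```

Since the same inputs always yield the same ranking and reasons, a wrong pick can be debugged by reading the `reason` string rather than by re-running a model.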
603
+
604
+ ---
605
+
606
+ # 6. Upgrade the Ref System
607
+
608
+ ## Proposal
609
+
610
+ Keep versioned refs, but evolve them into a richer semantic reference system.
611
+
612
+ ## Why it matters
613
+
614
+ Refs are the backbone of efficient browser interaction. The current system is good; the next step is to make refs more resilient, more semantic, and more useful across changing DOMs.
615
+
616
+ ## What it enables
617
+
618
+ - lower selector dependence
619
+ - better recovery from DOM churn
620
+ - more compact instructions
621
+ - clearer reasoning for the agent
622
+
623
+ ## Proposed upgrades
624
+
625
+ ### A. Snapshot modes
626
+ Allow specialized snapshot modes:
627
+
628
+ - `interactive`
629
+ - `form`
630
+ - `dialog`
631
+ - `navigation`
632
+ - `errors`
633
+ - `headings`
634
+ - `visible_only`
635
+
636
+ This reduces token waste and improves relevance.
637
+
638
+ ### B. Better internal fingerprints
639
+ Track more stable descriptors:
640
+
641
+ - role
642
+ - accessible name
643
+ - type
644
+ - href
645
+ - form ownership
646
+ - ancestry signature
647
+ - relative region
648
+ - label association
649
+ - nearby headings
650
+
651
+ This helps ref remapping across light DOM changes.
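As a rough illustration of fingerprint-based remapping, the sketch below compares a stored fingerprint against fingerprints from a fresh snapshot and picks the closest match above a threshold. The `Fingerprint` fields cover only a subset of the descriptors listed above, and the equal weighting and the 0.7 threshold are assumptions chosen for the example.

```typescript
// Hypothetical fingerprint; fields mirror a subset of the descriptors above.
interface Fingerprint {
  role: string;
  accessibleName: string;
  type?: string;
  href?: string;
  formId?: string;
  ancestry: string; // e.g. "main>form"
}

const FIELDS: Array<keyof Fingerprint> = ["role", "accessibleName", "type", "href", "formId", "ancestry"];

// Fraction of descriptor fields that agree. Two absent optional fields count as agreeing.
function similarity(a: Fingerprint, b: Fingerprint): number {
  const matches = FIELDS.filter((f) => a[f] === b[f]).length;
  return matches / FIELDS.length;
}

// Map an old fingerprint to the best-matching ref in a fresh snapshot,
// or return null when nothing clears the (assumed) threshold.
function remapRef(old: Fingerprint, fresh: Record<string, Fingerprint>, threshold = 0.7): string | null {
  let bestRef: string | null = null;
  let bestScore = 0;
  for (const ref of Object.keys(fresh)) {
    const score = similarity(old, fresh[ref]);
    if (score >= threshold && score > bestScore) {
      bestRef = ref;
      bestScore = score;
    }
  }
  return bestRef;
}
```

Returning `null` rather than a low-confidence guess keeps remapping honest: the caller can fall back to a fresh snapshot instead of clicking the wrong element.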
652
+
653
+ ### C. Semantic aliases
654
+ Potentially expose alias-like labels such as:
655
+
656
+ - primary submit
657
+ - close dialog
658
+ - current tab
659
+ - email field
660
+ - password field
661
+
662
+ Even if these remain derived rather than canonical, they can improve action clarity.
663
+
664
+ ### D. Scoped ref groups
665
+ Allow refs generated per region:
666
+
667
+ - within dialog
668
+ - within main
669
+ - within form
670
+ - within sidebar
671
+
672
+ This helps reduce ambiguity.
673
+
674
+ ---
675
+
676
+ # 7. Add Browser Session Modeling: Tabs, Pages, Frames
677
+
678
+ ## Proposal
679
+
680
+ Promote the internal browser model from “single active page” to a real page registry.
681
+
682
+ ### New tools
683
+ - `browser_list_pages`
684
+ - `browser_switch_page`
685
+ - `browser_close_page`
686
+ - `browser_list_frames`
687
+ - `browser_select_frame`
688
+
689
+ ## Why it matters
690
+
691
+ Real browser flows often involve:
692
+
693
+ - popups
694
+ - auth redirects
695
+ - payment tabs
696
+ - docs tabs
697
+ - embedded auth iframes
698
+ - admin consoles with frames
699
+
700
+ A single global `page` pointer does not scale well.
701
+
702
+ ## What it enables
703
+
704
+ - more reliable multi-tab flows
705
+ - less hidden state confusion
706
+ - better popup handling
707
+ - frame-aware automation
708
+ - clearer debugging when navigation opens a new surface
709
+
710
+ ## Recommended session model
711
+
712
+ Track:
713
+
714
+ - page id
715
+ - opener relationship
716
+ - title
717
+ - URL
718
+ - last active time
719
+ - frame inventory
720
+ - whether page was auto-opened or explicitly targeted
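The tracked fields above could map onto a small in-memory registry. This is a minimal sketch under assumed names (`PageRecord`, `PageRegistry`); it does not touch a real browser and only shows how auto-switching can stay inspectable through `list()`.

```typescript
// Hypothetical record shape following the fields listed above.
interface PageRecord {
  id: string;
  openerId: string | null;
  title: string;
  url: string;
  lastActiveAt: number; // epoch milliseconds
  frames: string[];
  autoOpened: boolean;
}

class PageRegistry {
  private pages: PageRecord[] = [];
  private activeId: string | null = null;

  // Newly registered pages become active, matching the auto-switch behavior.
  register(page: PageRecord): void {
    this.pages.push(page);
    this.activeId = page.id;
  }

  switchTo(id: string, now: number): void {
    const page = this.pages.filter((p) => p.id === id)[0];
    if (!page) throw new Error(`unknown page id: ${id}`);
    page.lastActiveAt = now;
    this.activeId = id;
  }

  // Closing the active page falls back to the most recently active survivor.
  close(id: string): void {
    this.pages = this.pages.filter((p) => p.id !== id);
    if (this.activeId === id) {
      const latest = this.pages.slice().sort((a, b) => b.lastActiveAt - a.lastActiveAt)[0];
      this.activeId = latest ? latest.id : null;
    }
  }

  // Every entry reports whether it is active, so the switch is never hidden.
  list(): Array<PageRecord & { active: boolean }> {
    return this.pages.map((p) => ({ ...p, active: p.id === this.activeId }));
  }
}
```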
721
+
722
+ ## Design recommendation
723
+
724
+ Auto-switching to a newly opened page is still useful, but should be visible and inspectable.
725
+
726
+ ---
727
+
728
+ # 8. Add Tracing and Failure Artifacts
729
+
730
+ ## Proposal
731
+
732
+ Add explicit debug artifact tools.
733
+
734
+ ### New tools
735
+ - `browser_trace_start`
736
+ - `browser_trace_stop`
737
+ - `browser_export_har`
738
+ - `browser_debug_bundle`
739
+ - `browser_timeline`
740
+ - `browser_session_summary`
741
+
742
+ ## Why it matters
743
+
744
+ For Kata and for hard UI debugging, you need failure evidence that survives the current context window.
745
+
746
+ ## What it enables
747
+
748
+ - durable debugging artifacts
749
+ - post-failure inspection without replaying everything
750
+ - easier handoff across sessions or agents
751
+ - structured evidence for summaries and UAT docs
752
+
753
+ ## `browser_debug_bundle` should ideally include
754
+
755
+ - current URL/title
756
+ - viewport
757
+ - recent actions
758
+ - compact recent warnings
759
+ - recent console errors
760
+ - recent failed/important requests
761
+ - active dialogs
762
+ - screenshot path or inline thumbnail
763
+ - scoped AX snapshot near likely failure area
764
+ - trace path if enabled
765
+ - concise failure hypothesis
766
+
767
+ ## Artifact policy
768
+
769
+ Heavy artifacts should be written to disk and summarized in tool output.
770
+
771
+ Example return:
772
+
773
+ ```json
+ {
+   "bundlePath": ".artifacts/browser/failure-2026-03-09T15-22-10Z/",
+   "files": ["trace.zip", "screenshot.jpg", "summary.json", "ax.md"],
+   "summary": "Submit button click did not change URL or form state; network returned 422"
+ }
+ ```
780
+
781
+ ---
782
+
783
+ # 9. Add Goal-Oriented Composite Tools
784
+
785
+ ## Proposal
786
+
787
+ Add tools that operate one level above raw browser actions.
788
+
789
+ ### Candidate tools
790
+ - `browser_act`
791
+ - `browser_run_task`
792
+ - `browser_recommend_next`
793
+ - `browser_verify_flow`
794
+
795
+ ## Why it matters
796
+
797
+ The model should not have to fully re-solve every local browser decision through multiple turns if the browser device can cheaply reason about obvious next steps.
798
+
799
+ ## What it enables
800
+
801
+ - reduced local decision overhead
802
+ - more agent autonomy
803
+ - bounded browser-side loops for repetitive UI micro-tasks
804
+ - cleaner higher-level orchestration
805
+
806
+ ## Suggested roles
807
+
808
+ ### `browser_recommend_next`
809
+ Given a goal and the current page state, return the three best next actions, each with a confidence score and a reason.
810
+
811
+ ### `browser_act`
812
+ Perform one higher-level semantic action like:
813
+
814
+ - open login dialog
815
+ - submit current form
816
+ - close active modal
817
+ - click primary CTA
818
+ - expand navigation menu
819
+
820
+ ### `browser_verify_flow`
821
+ Run a bounded set of assertions for a named flow such as:
822
+
823
+ - logged in
824
+ - signed out
825
+ - item created
826
+ - toast appeared
827
+ - navigation completed
828
+
829
+ ### `browser_run_task`
830
+ Frontier tool: perform a bounded internal action loop toward a clear goal.
831
+
832
+ ## Safety recommendations
833
+
834
+ These tools must be bounded by:
835
+
836
+ - max step count
837
+ - allowed action categories
838
+ - destructive action restrictions
839
+ - explicit halt conditions
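The bounds above could be enforced by a small driver loop that checks every limit before each step. The sketch below is an assumption-heavy illustration: `Step`, `Bounds`, and `runBounded` are hypothetical names, and "executing" a step is simulated by recording its description rather than driving a browser.

```typescript
type ActionCategory = "navigate" | "click" | "type" | "destructive";

interface Step {
  category: ActionCategory;
  description: string;
}

interface Bounds {
  maxSteps: number;
  allowed: ActionCategory[];
}

interface RunResult {
  executed: string[];
  haltedBecause: "goal_reached" | "max_steps" | "disallowed_action" | "no_more_steps";
}

// Every exit path names its halt condition, so the caller never has to guess why the loop stopped.
function runBounded(nextStep: () => Step | null, goalReached: () => boolean, bounds: Bounds): RunResult {
  const executed: string[] = [];
  while (true) {
    if (goalReached()) return { executed, haltedBecause: "goal_reached" };
    if (executed.length >= bounds.maxSteps) return { executed, haltedBecause: "max_steps" };
    const step = nextStep();
    if (step === null) return { executed, haltedBecause: "no_more_steps" };
    // Refuse the step before it runs if its category is not allowed.
    if (bounds.allowed.indexOf(step.category) === -1) return { executed, haltedBecause: "disallowed_action" };
    executed.push(step.description); // stand-in for performing the action
  }
}
```

Leaving `"destructive"` out of `allowed` means a destructive step halts the loop instead of being skipped, so the decision returns to the agent.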
840
+
841
+ ---
842
+
843
+ # 10. Add Better Waits and Reactive Predicates
844
+
845
+ ## Proposal
846
+
847
+ Replace or augment `browser_wait_for` with a richer `browser_wait_until`.
848
+
849
+ ## Why it matters
850
+
851
+ Generic waiting is weaker than intent-aware waiting. The most reliable wait is one that waits for the expected outcome itself.
852
+
853
+ ## What it enables
854
+
855
+ - higher reliability
856
+ - fewer arbitrary delays
857
+ - better async app support
858
+ - less flakiness in SPA and real-time UIs
859
+
860
+ ## Suggested predicates
861
+
862
+ - text appears/disappears
863
+ - ref state changes
864
+ - element count changes
865
+ - request matching pattern completes
866
+ - response with status seen
867
+ - toast appears
868
+ - dialog opens/closes
869
+ - loading spinner disappears
870
+ - route transition completes
871
+ - region stops changing
872
+ - focus reaches expected element
873
+
874
+ ## Design note
875
+
876
+ This should integrate with the same state/diff infrastructure proposed above.
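A predicate-driven wait can be sketched as a polling loop with a timeout. The `waitUntil` helper and its option names below are assumptions for illustration, not the proposed `browser_wait_until` interface; a real implementation would likely evaluate predicates against the state/diff infrastructure rather than an arbitrary callback.

```typescript
interface WaitResult {
  ok: boolean;
  elapsedMs: number;
  polls: number;
}

// Poll a predicate until it returns true or the timeout elapses.
// The predicate is always checked at least once, even with a zero timeout.
async function waitUntil(
  predicate: () => boolean | Promise<boolean>,
  opts: { timeoutMs: number; intervalMs: number },
): Promise<WaitResult> {
  const start = Date.now();
  let polls = 0;
  while (true) {
    polls++;
    if (await predicate()) return { ok: true, elapsedMs: Date.now() - start, polls };
    if (Date.now() - start >= opts.timeoutMs) return { ok: false, elapsedMs: Date.now() - start, polls };
    await new Promise<void>((resolve) => setTimeout(resolve, opts.intervalMs));
  }
}
```

A call such as `waitUntil(() => spinnerGone(), { timeoutMs: 5000, intervalMs: 100 })` (with `spinnerGone` being a hypothetical check) returns as soon as the outcome is observed rather than after a fixed delay.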
877
+
878
+ ---
879
+
880
+ # 11. Make Screenshots More Selective and More Useful
881
+
882
+ ## Proposal
883
+
884
+ Keep screenshots, but use them more surgically.
885
+
886
+ ### New tools or behaviors
887
+ - `browser_screenshot_diff`
888
+ - `browser_capture_region`
889
+ - `browser_inspect_visual`
890
+
891
+ ## Why it matters
892
+
893
+ Screenshots are valuable when:
894
+
895
+ - the UI is canvas-based
896
+ - layout quality matters
897
+ - icon-only controls are ambiguous
898
+ - a visual regression is suspected
899
+ - CSS behavior matters
900
+ - semantic state is insufficient
901
+
902
+ But screenshots are often too expensive and too noisy to be the default state transport.
903
+
904
+ ## What it enables
905
+
906
+ - better visual debugging when actually needed
907
+ - less token waste than full-page screenshots
908
+ - pairing visual evidence with semantic evidence
909
+
910
+ ## Recommended direction
911
+
912
+ - make screenshots scoped and purposeful
913
+ - prefer element/region crops over full-page captures
914
+ - pair screenshot outputs with semantic context and diffs
915
+ - support perceptual diff summaries instead of raw image-only comparisons
916
+
917
+ ---
918
+
919
+ # 12. Add Structured Network and Console Assertions
920
+
921
+ ## Proposal
922
+
923
+ Evolve buffered observability from passive retrieval into active verification and querying.
924
+
925
+ ## Why it matters
926
+
927
+ Modern web apps often fail in ways only visible through:
928
+
929
+ - fetch/XHR failures
930
+ - console errors
931
+ - CSP/CORS issues
932
+ - React hydration errors
933
+ - auth-related 401/403s
934
+
935
+ These should be easy for the agent to test explicitly.
936
+
937
+ ## What it enables
938
+
939
+ - stronger root-cause detection
940
+ - better end-to-end verification
941
+ - fewer false positives where UI looked okay but requests failed
942
+
943
+ ## Suggested additions
944
+
945
+ - filter by request URL pattern
946
+ - filter by method/resource type/status range
947
+ - query logs since action id or timestamp
948
+ - assert request happened
949
+ - assert response status seen
950
+ - assert no console errors of severity >= error
951
+ - assert no failed XHR/fetch during flow
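The additions above amount to filters plus pass/fail checks over the buffered logs. The sketch below assumes hypothetical `RequestLog` and `ConsoleLog` shapes and shows three of the suggested assertions returning structured results with evidence instead of prose.

```typescript
// Hypothetical buffered-log entry shapes; field names are assumptions.
interface RequestLog {
  actionId: number;
  method: string;
  url: string;
  status: number;
  resourceType: "fetch" | "xhr" | "document" | "other";
}

interface ConsoleLog {
  actionId: number;
  level: "log" | "warn" | "error";
  text: string;
}

interface AssertionResult {
  name: string;
  pass: boolean;
  evidence: string[];
}

// Passes when no fetch/XHR request at or after the given action returned status >= 400.
function assertNoFailedRequests(logs: RequestLog[], sinceActionId: number): AssertionResult {
  const failed = logs.filter(
    (r) => r.actionId >= sinceActionId && (r.resourceType === "fetch" || r.resourceType === "xhr") && r.status >= 400,
  );
  return { name: "no_failed_xhr_fetch", pass: failed.length === 0, evidence: failed.map((r) => `${r.method} ${r.url} -> ${r.status}`) };
}

// Passes when at least one response matching the URL pattern had the expected status.
function assertResponseSeen(logs: RequestLog[], urlPattern: RegExp, status: number): AssertionResult {
  const hits = logs.filter((r) => urlPattern.test(r.url) && r.status === status);
  return { name: "response_seen", pass: hits.length > 0, evidence: hits.map((r) => `${r.method} ${r.url} -> ${r.status}`) };
}

// Passes when no console entry at error level was logged at or after the given action.
function assertNoConsoleErrors(logs: ConsoleLog[], sinceActionId: number): AssertionResult {
  const errors = logs.filter((c) => c.actionId >= sinceActionId && c.level === "error");
  return { name: "no_console_errors", pass: errors.length === 0, evidence: errors.map((c) => c.text) };
}
```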
952
+
953
+ ---
954
+
955
+ # 13. Add an Action Timeline and Action IDs
956
+
957
+ ## Proposal
958
+
959
+ Assign every browser action an internal action id and keep a lightweight action timeline.
960
+
961
+ ## Why it matters
962
+
963
+ This makes the system far more debuggable and composable.
964
+
965
+ ## What it enables
966
+
967
+ - diff since action N
968
+ - logs since action N
969
+ - request correlation
970
+ - failure bundle generation
971
+ - concise flow summaries
972
+ - better Kata verification records
973
+
974
+ ## Suggested stored fields per action
975
+
976
+ - action id
977
+ - tool name
978
+ - params summary
979
+ - page id
980
+ - timestamp start/end
981
+ - verification outcome
982
+ - detected changes
983
+ - relevant warnings
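The stored fields above fit a simple append-only timeline. This sketch uses assumed names (`ActionRecord`, `ActionTimeline`) and treats "since action N" as strictly after N, which is one possible convention rather than a settled one.

```typescript
// Hypothetical record following the suggested stored fields.
interface ActionRecord {
  id: number;
  tool: string;
  paramsSummary: string;
  pageId: string;
  startedAt: number; // epoch milliseconds
  endedAt: number;
  verified: boolean | null; // null when no verification ran
  changes: string[];
  warnings: string[];
}

class ActionTimeline {
  private records: ActionRecord[] = [];
  private nextId = 1;

  // Assigns a monotonically increasing id and returns it to the caller.
  record(entry: Omit<ActionRecord, "id">): number {
    const id = this.nextId++;
    this.records.push({ ...entry, id });
    return id;
  }

  // Actions strictly after the given id; the basis for "diff/logs since action N".
  since(actionId: number): ActionRecord[] {
    return this.records.filter((r) => r.id > actionId);
  }

  // One compact line per action, suitable for flow summaries.
  summarize(): string[] {
    return this.records.map((r) => {
      const verdict = r.verified === null ? "unverified" : r.verified ? "ok" : "failed";
      return `#${r.id} ${r.tool}(${r.paramsSummary}) ${verdict} ${r.endedAt - r.startedAt}ms`;
    });
  }
}
```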
984
+
985
+ ---
986
+
987
+ # 14. Tighten Tool Descriptions and Prompt Guidance
988
+
989
+ ## Proposal
990
+
991
+ Refine tool descriptions so the model understands exactly what each tool returns and when to use it.
992
+
993
+ ## Why it matters
994
+
995
+ A surprising amount of agent inefficiency comes from slightly misleading tool expectations.
996
+
997
+ ## Current issue
998
+
999
+ Some tool descriptions say things like “returns accessibility snapshot” when the tool actually returns a compact page summary.
1000
+
1001
+ ## What it enables
1002
+
1003
+ - better tool selection
1004
+ - fewer redundant follow-up calls
1005
+ - less confusion about when to use full AX vs compact find vs summaries
1006
+
1007
+ ## Recommended prompt guidance hierarchy
1008
+
1009
+ For state inspection, teach the model to prefer:
1010
+
1011
+ 1. `browser_find`
1012
+ 2. `browser_snapshot_refs`
1013
+ 3. `browser_assert`
1014
+ 4. `browser_diff`
1015
+ 5. `browser_get_accessibility_tree`
1016
+ 6. `browser_get_page_source`
1017
+ 7. `browser_evaluate`
1018
+
1019
+ This keeps common browsing token-efficient.
1020
+
1021
+ ---
1022
+
1023
+ # 15. Add Browser-Side State Compression and Delta Reporting
1024
+
1025
+ ## Proposal
1026
+
1027
+ Internally maintain a compact page model and expose only deltas unless the agent asks for full detail.
1028
+
1029
+ ## Why it matters
1030
+
1031
+ This is one of the biggest long-term wins for context efficiency.
1032
+
1033
+ ## What it enables
1034
+
1035
+ - state reuse across tool calls
1036
+ - fewer repeated summaries
1037
+ - cheaper comparison after actions
1038
+ - better change detection
1039
+ - smarter internal recommendations
1040
+
1041
+ ## Internal state could include
1042
+
1043
+ - last summary
1044
+ - heading set
1045
+ - visible alerts
1046
+ - dialog inventory
1047
+ - interactive ref list
1048
+ - form inventory
1049
+ - last screenshot hash
1050
+ - last AX signatures for key scopes
1051
+
1052
+ ## Output policy
1053
+
1054
+ The default response should prefer:
1055
+
1056
+ - what changed
1057
+ - what likely matters
1058
+ - what the agent might want next
1059
+
1060
+ rather than always restating the whole page summary.
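Delta reporting can be as simple as comparing the previous compact state with the new one and emitting only the differences. The `PageState` shape below is a reduced, assumed version of the internal state listed above, and the `+`/`-`/`~` line format is an arbitrary choice for the example.

```typescript
// Reduced, hypothetical page model; a real one would carry more of the fields above.
interface PageState {
  title: string;
  headings: string[];
  alerts: string[];
  dialogs: string[];
}

// Items present only in `next` are additions; items present only in `prev` are removals.
function listDelta(label: string, prev: string[], next: string[]): string[] {
  const added = next.filter((x) => prev.indexOf(x) === -1).map((x) => `+ ${label}: ${x}`);
  const removed = prev.filter((x) => next.indexOf(x) === -1).map((x) => `- ${label}: ${x}`);
  return added.concat(removed);
}

// Returns one line per change; an empty array means nothing the model tracks has changed.
function diffState(prev: PageState, next: PageState): string[] {
  const lines: string[] = [];
  if (prev.title !== next.title) lines.push(`~ title: ${prev.title} -> ${next.title}`);
  return lines
    .concat(listDelta("heading", prev.headings, next.headings))
    .concat(listDelta("alert", prev.alerts, next.alerts))
    .concat(listDelta("dialog", prev.dialogs, next.dialogs));
}
```

After a failed login, for instance, the whole response could shrink to a single added-alert line instead of a restated page summary.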
1061
+
1062
+ ---
1063
+
1064
+ # 16. Add Kata-Native Verification Outputs
1065
+
1066
+ ## Proposal
1067
+
1068
+ Make browser-tools able to emit outputs that directly support Kata slice/task completion.
1069
+
1070
+ ## Why it matters
1071
+
1072
+ You explicitly want browser tools to power automatic verification and testing during `@agent/extensions/kata/` use.
1073
+
1074
+ ## What it enables
1075
+
1076
+ - easier automatic generation of `Sxx-UAT.md` content
1077
+ - deterministic slice verification evidence
1078
+ - less ad hoc summarization by the agent
1079
+ - clearer “done/not done” boundaries
1080
+
1081
+ ## Suggested additions
1082
+
1083
+ ### `browser_verify_flow`
1084
+ Return:
1085
+
1086
+ - named flow
1087
+ - steps attempted
1088
+ - checks passed/failed
1089
+ - evidence links/paths
1090
+ - final verdict
1091
+
1092
+ ### `browser_export_verification_report`
1093
+ Write a markdown or JSON artifact summarizing:
1094
+
1095
+ - environment
1096
+ - URL(s)
1097
+ - viewport(s)
1098
+ - actions
1099
+ - assertions
1100
+ - outcome
1101
+ - diagnostics
1102
+
1103
+ This is especially useful for Kata artifacts.
1104
+
1105
+ ---
1106
+
1107
+ ## Proposed Roadmap
1108
+
1109
+ ## Phase 1 — Highest-ROI Near-Term Upgrades
1110
+
1111
+ These are the best immediate improvements.
1112
+
1113
+ ### 1. `browser_assert`
1114
+ Highest priority.
1115
+
1116
+ ### 2. `browser_batch`
1117
+ Highest priority.
1118
+
1119
+ ### 3. `browser_diff`
1120
+ Highest priority.
1121
+
1122
+ ### 4. `browser_analyze_form`
1123
+ Very high priority.
1124
+
1125
+ ### 5. `browser_fill_form`
1126
+ Very high priority.
1127
+
1128
+ ### 6. Tighten tool descriptions and prompt guidance
1129
+ Low risk, immediate value.
1130
+
1131
+ ### 7. Action timeline / action ids
1132
+ Important enabling infrastructure.
1133
+
1134
+ ---
1135
+
1136
+ ## Phase 2 — Strong Maturity Upgrades
1137
+
1138
+ ### 8. Multi-page/tab/frame model
1139
+ ### 9. Richer wait predicates
1140
+ ### 10. Structured network/console assertions
1141
+ ### 11. Ref snapshot modes and better ref fingerprints
1142
+ ### 12. Debug bundle and trace export
1143
+
1144
+ ---
1145
+
1146
+ ## Phase 3 — Frontier AI-Native Capabilities
1147
+
1148
+ ### 13. `browser_find_best`
1149
+ ### 14. `browser_recommend_next`
1150
+ ### 15. `browser_act`
1151
+ ### 16. `browser_verify_flow`
1152
+ ### 17. `browser_run_task`
1153
+ ### 18. hybrid semantic + visual fallback targeting
1154
+
1155
+ These are the ideas that move the extension from excellent tooling into a genuinely mind-blowing browser device for agents.
1156
+
1157
+ ---
1158
+
1159
+ ## Detailed Impact Summary
1160
+
1161
+ ## Biggest wins for context efficiency
1162
+
1163
+ 1. `browser_batch`
1164
+ 2. `browser_diff`
1165
+ 3. snapshot modes for refs
1166
+ 4. assertion outputs instead of prose
1167
+ 5. browser-side state compression/deltas
1168
+ 6. form-level tools replacing many small actions
1169
+
1170
+ ## Biggest wins for reliability
1171
+
1172
+ 1. `browser_assert`
1173
+ 2. richer waits
1174
+ 3. multi-page/frame awareness
1175
+ 4. structured network/console assertions
1176
+ 5. failure bundles and trace export
1177
+ 6. smarter ref remapping
1178
+
1179
+ ## Biggest wins for agent autonomy
1180
+
1181
+ 1. `browser_assert`
1182
+ 2. `browser_recommend_next`
1183
+ 3. `browser_find_best`
1184
+ 4. `browser_fill_form`
1185
+ 5. `browser_verify_flow`
1186
+ 6. `browser_run_task`
1187
+
1188
+ ## Biggest wins for Kata
1189
+
1190
+ 1. explicit verification outputs
1191
+ 2. debug bundles on failure
1192
+ 3. flow verification reports
1193
+ 4. assertion-based PASS/FAIL summaries
1194
+ 5. durable artifact export
1195
+
1196
+ ---
1197
+
1198
+ ## What Should Remain True in V2
1199
+
1200
+ As the extension evolves, it should preserve its best current qualities.
1201
+
1202
+ ### Keep these principles
1203
+ - accessibility-first browsing
1204
+ - deterministic refs
1205
+ - compact summaries
1206
+ - pull-based diagnostics
1207
+ - verification after action
1208
+ - screenshots as support, not default state transport
1209
+ - adaptive settling
1210
+
1211
+ ### Avoid these regressions
1212
+ - screenshot-first browsing as the normal path
1213
+ - giant raw DOM dumps as default output
1214
+ - excessive prose instead of structured results
1215
+ - hidden nondeterminism in action selection
1216
+ - too many tool calls for common flows
1217
+ - flaky fixed waits replacing intent-aware checks
1218
+
1219
+ ---
1220
+
1221
+ ## Recommended Implementation Order
1222
+
1223
+ If the goal is maximum practical value with strong architectural compounding, implement in this order:
1224
+
1225
+ 1. `browser_assert`
1226
+ 2. action timeline / action ids
1227
+ 3. `browser_batch`
1228
+ 4. `browser_diff`
1229
+ 5. `browser_analyze_form`
1230
+ 6. `browser_fill_form`
1231
+ 7. structured network/console assertions
1232
+ 8. multi-page and frame model
1233
+ 9. trace/debug bundle tools
1234
+ 10. ref snapshot modes and richer fingerprints
1235
+ 11. `browser_find_best`
1236
+ 12. `browser_recommend_next`
1237
+ 13. `browser_verify_flow`
1238
+ 14. `browser_run_task`
1239
+
1240
+ This order gives immediate value while laying down the right primitives for more ambitious features.
1241
+
1242
+ ---
1243
+
1244
+ ## Final Recommendation
1245
+
1246
+ The current browser-tools extension is already on the right side of the 2026 design curve. It has made several choices that are smarter than many contemporary AI browser stacks.
1247
+
1248
+ The next leap is to shift from:
1249
+
1250
+ - a browser control toolkit
1251
+
1252
+ into:
1253
+
1254
+ - a browser execution and verification device purpose-built for agents
1255
+
1256
+ The most important changes are:
1257
+
1258
+ - first-class assertions
1259
+ - batch execution
1260
+ - state diffs
1261
+ - form intelligence
1262
+ - session/page/frame modeling
1263
+ - durable debug artifacts
1264
+ - intent-aware semantic helpers
1265
+
1266
+ If these are implemented well, browser-tools can become not just a useful extension, but a foundational AI-native capability for both:
1267
+
1268
+ - **agentic browser use across the web**
1269
+ - **automatic verification inside Kata workflows**
1270
+
1271
+ ---
1272
+
1273
+ ## File Added
1274
+
1275
+ This proposal is stored at:
1276
+
1277
+ `agent/extensions/browser-tools/BROWSER-TOOLS-V2-PROPOSAL.md`