browser-use 0.1.0 → 0.3.0

Files changed (258)
  1. package/README.md +301 -636
  2. package/dist/actor/element.d.ts +19 -0
  3. package/dist/actor/element.js +46 -0
  4. package/dist/actor/index.d.ts +4 -0
  5. package/dist/actor/index.js +4 -0
  6. package/dist/actor/mouse.d.ts +19 -0
  7. package/dist/actor/mouse.js +39 -0
  8. package/dist/actor/page.d.ts +29 -0
  9. package/dist/actor/page.js +88 -0
  10. package/dist/actor/utils.d.ts +4 -0
  11. package/dist/actor/utils.js +35 -0
  12. package/dist/agent/cloud-events.d.ts +18 -0
  13. package/dist/agent/cloud-events.js +65 -2
  14. package/dist/agent/gif.d.ts +1 -0
  15. package/dist/agent/gif.js +24 -2
  16. package/dist/agent/judge.d.ts +17 -0
  17. package/dist/agent/judge.js +197 -0
  18. package/dist/agent/message-manager/service.d.ts +12 -4
  19. package/dist/agent/message-manager/service.js +205 -39
  20. package/dist/agent/message-manager/utils.js +0 -1
  21. package/dist/agent/message-manager/views.d.ts +4 -0
  22. package/dist/agent/message-manager/views.js +11 -7
  23. package/dist/agent/prompts.d.ts +24 -3
  24. package/dist/agent/prompts.js +274 -59
  25. package/dist/agent/service.d.ts +99 -40
  26. package/dist/agent/service.js +2282 -474
  27. package/dist/agent/variable-detector.d.ts +12 -0
  28. package/dist/agent/variable-detector.js +211 -0
  29. package/dist/agent/views.d.ts +237 -17
  30. package/dist/agent/views.js +446 -32
  31. package/dist/browser/cloud/cloud.d.ts +20 -0
  32. package/dist/browser/cloud/cloud.js +129 -0
  33. package/dist/browser/cloud/index.d.ts +2 -0
  34. package/dist/browser/cloud/index.js +2 -0
  35. package/dist/browser/cloud/views.d.ts +41 -0
  36. package/dist/browser/cloud/views.js +35 -0
  37. package/dist/browser/events.d.ts +345 -0
  38. package/dist/browser/events.js +566 -0
  39. package/dist/browser/extensions.js +17 -17
  40. package/dist/browser/index.d.ts +4 -0
  41. package/dist/browser/index.js +4 -0
  42. package/dist/browser/profile.d.ts +8 -2
  43. package/dist/browser/profile.js +79 -12
  44. package/dist/browser/session-manager.d.ts +85 -0
  45. package/dist/browser/session-manager.js +208 -0
  46. package/dist/browser/session.d.ts +100 -8
  47. package/dist/browser/session.js +1102 -63
  48. package/dist/browser/types.d.ts +0 -2
  49. package/dist/browser/views.d.ts +39 -0
  50. package/dist/browser/views.js +32 -0
  51. package/dist/browser/watchdogs/aboutblank-watchdog.d.ts +12 -0
  52. package/dist/browser/watchdogs/aboutblank-watchdog.js +131 -0
  53. package/dist/browser/watchdogs/base.d.ts +21 -0
  54. package/dist/browser/watchdogs/base.js +81 -0
  55. package/dist/browser/watchdogs/cdp-session-watchdog.d.ts +14 -0
  56. package/dist/browser/watchdogs/cdp-session-watchdog.js +177 -0
  57. package/dist/browser/watchdogs/crash-watchdog.d.ts +38 -0
  58. package/dist/browser/watchdogs/crash-watchdog.js +296 -0
  59. package/dist/browser/watchdogs/default-action-watchdog.d.ts +49 -0
  60. package/dist/browser/watchdogs/default-action-watchdog.js +212 -0
  61. package/dist/browser/watchdogs/dom-watchdog.d.ts +8 -0
  62. package/dist/browser/watchdogs/dom-watchdog.js +31 -0
  63. package/dist/browser/watchdogs/downloads-watchdog.d.ts +77 -0
  64. package/dist/browser/watchdogs/downloads-watchdog.js +409 -0
  65. package/dist/browser/watchdogs/har-recording-watchdog.d.ts +19 -0
  66. package/dist/browser/watchdogs/har-recording-watchdog.js +317 -0
  67. package/dist/browser/watchdogs/index.d.ts +15 -0
  68. package/dist/browser/watchdogs/index.js +15 -0
  69. package/dist/browser/watchdogs/local-browser-watchdog.d.ts +10 -0
  70. package/dist/browser/watchdogs/local-browser-watchdog.js +32 -0
  71. package/dist/browser/watchdogs/permissions-watchdog.d.ts +8 -0
  72. package/dist/browser/watchdogs/permissions-watchdog.js +73 -0
  73. package/dist/browser/watchdogs/popups-watchdog.d.ts +13 -0
  74. package/dist/browser/watchdogs/popups-watchdog.js +77 -0
  75. package/dist/browser/watchdogs/recording-watchdog.d.ts +27 -0
  76. package/dist/browser/watchdogs/recording-watchdog.js +249 -0
  77. package/dist/browser/watchdogs/screenshot-watchdog.d.ts +6 -0
  78. package/dist/browser/watchdogs/screenshot-watchdog.js +13 -0
  79. package/dist/browser/watchdogs/security-watchdog.d.ts +10 -0
  80. package/dist/browser/watchdogs/security-watchdog.js +84 -0
  81. package/dist/browser/watchdogs/storage-state-watchdog.d.ts +24 -0
  82. package/dist/browser/watchdogs/storage-state-watchdog.js +288 -0
  83. package/dist/cli.d.ts +41 -0
  84. package/dist/cli.js +820 -10
  85. package/dist/code-use/formatting.d.ts +3 -0
  86. package/dist/code-use/formatting.js +18 -0
  87. package/dist/code-use/index.d.ts +6 -0
  88. package/dist/code-use/index.js +6 -0
  89. package/dist/code-use/namespace.d.ts +5 -0
  90. package/dist/code-use/namespace.js +81 -0
  91. package/dist/code-use/notebook-export.d.ts +3 -0
  92. package/dist/code-use/notebook-export.js +56 -0
  93. package/dist/code-use/service.d.ts +24 -0
  94. package/dist/code-use/service.js +104 -0
  95. package/dist/code-use/utils.d.ts +4 -0
  96. package/dist/code-use/utils.js +98 -0
  97. package/dist/code-use/views.d.ts +108 -0
  98. package/dist/code-use/views.js +165 -0
  99. package/dist/config.d.ts +13 -0
  100. package/dist/config.js +69 -3
  101. package/dist/controller/registry/service.d.ts +10 -1
  102. package/dist/controller/registry/service.js +266 -10
  103. package/dist/controller/registry/views.d.ts +4 -1
  104. package/dist/controller/registry/views.js +25 -2
  105. package/dist/controller/service.d.ts +10 -1
  106. package/dist/controller/service.js +1849 -288
  107. package/dist/controller/views.d.ts +78 -155
  108. package/dist/controller/views.js +61 -12
  109. package/dist/dom/history-tree-processor/service.d.ts +5 -0
  110. package/dist/dom/history-tree-processor/service.js +169 -14
  111. package/dist/dom/history-tree-processor/view.d.ts +7 -1
  112. package/dist/dom/history-tree-processor/view.js +10 -1
  113. package/dist/dom/markdown-extractor.d.ts +37 -0
  114. package/dist/dom/markdown-extractor.js +345 -0
  115. package/dist/dom/service.d.ts +3 -1
  116. package/dist/dom/service.js +76 -0
  117. package/dist/dom/views.d.ts +1 -0
  118. package/dist/dom/views.js +45 -0
  119. package/dist/event-bus.d.ts +107 -7
  120. package/dist/event-bus.js +313 -10
  121. package/dist/filesystem/file-system.d.ts +18 -0
  122. package/dist/filesystem/file-system.js +530 -42
  123. package/dist/index.d.ts +7 -0
  124. package/dist/index.js +6 -0
  125. package/dist/integrations/gmail/actions.d.ts +3 -3
  126. package/dist/integrations/gmail/actions.js +5 -5
  127. package/dist/llm/anthropic/chat.d.ts +18 -1
  128. package/dist/llm/anthropic/chat.js +123 -55
  129. package/dist/llm/anthropic/serializer.d.ts +2 -0
  130. package/dist/llm/anthropic/serializer.js +81 -9
  131. package/dist/llm/aws/chat-anthropic.d.ts +17 -0
  132. package/dist/llm/aws/chat-anthropic.js +129 -40
  133. package/dist/llm/aws/chat-bedrock.d.ts +28 -1
  134. package/dist/llm/aws/chat-bedrock.js +161 -34
  135. package/dist/llm/aws/serializer.d.ts +13 -1
  136. package/dist/llm/aws/serializer.js +56 -17
  137. package/dist/llm/azure/chat.d.ts +53 -2
  138. package/dist/llm/azure/chat.js +366 -53
  139. package/dist/llm/base.d.ts +2 -0
  140. package/dist/llm/browser-use/chat.d.ts +40 -0
  141. package/dist/llm/browser-use/chat.js +305 -0
  142. package/dist/llm/browser-use/index.d.ts +1 -0
  143. package/dist/llm/browser-use/index.js +1 -0
  144. package/dist/llm/cerebras/chat.d.ts +39 -0
  145. package/dist/llm/cerebras/chat.js +178 -0
  146. package/dist/llm/cerebras/index.d.ts +2 -0
  147. package/dist/llm/cerebras/index.js +2 -0
  148. package/dist/llm/cerebras/serializer.d.ts +7 -0
  149. package/dist/llm/cerebras/serializer.js +82 -0
  150. package/dist/llm/deepseek/chat.d.ts +19 -2
  151. package/dist/llm/deepseek/chat.js +138 -25
  152. package/dist/llm/google/chat.d.ts +46 -2
  153. package/dist/llm/google/chat.js +268 -63
  154. package/dist/llm/google/serializer.d.ts +9 -1
  155. package/dist/llm/google/serializer.js +141 -34
  156. package/dist/llm/groq/chat.d.ts +21 -2
  157. package/dist/llm/groq/chat.js +125 -26
  158. package/dist/llm/groq/parser.js +3 -1
  159. package/dist/llm/messages.d.ts +4 -4
  160. package/dist/llm/mistral/chat.d.ts +43 -0
  161. package/dist/llm/mistral/chat.js +154 -0
  162. package/dist/llm/mistral/index.d.ts +2 -0
  163. package/dist/llm/mistral/index.js +2 -0
  164. package/dist/llm/mistral/schema.d.ts +8 -0
  165. package/dist/llm/mistral/schema.js +27 -0
  166. package/dist/llm/models.d.ts +2 -0
  167. package/dist/llm/models.js +317 -0
  168. package/dist/llm/ollama/chat.d.ts +13 -1
  169. package/dist/llm/ollama/chat.js +110 -19
  170. package/dist/llm/ollama/serializer.d.ts +1 -0
  171. package/dist/llm/ollama/serializer.js +34 -12
  172. package/dist/llm/openai/chat.d.ts +16 -0
  173. package/dist/llm/openai/chat.js +94 -44
  174. package/dist/llm/openai/like.d.ts +5 -3
  175. package/dist/llm/openai/like.js +7 -3
  176. package/dist/llm/openai/responses-serializer.d.ts +18 -0
  177. package/dist/llm/openai/responses-serializer.js +72 -0
  178. package/dist/llm/openrouter/chat.d.ts +28 -2
  179. package/dist/llm/openrouter/chat.js +115 -29
  180. package/dist/llm/schema.d.ts +11 -1
  181. package/dist/llm/schema.js +81 -1
  182. package/dist/llm/vercel/chat.d.ts +50 -0
  183. package/dist/llm/vercel/chat.js +276 -0
  184. package/dist/llm/vercel/index.d.ts +1 -0
  185. package/dist/llm/vercel/index.js +1 -0
  186. package/dist/llm/vercel/serializer.d.ts +5 -0
  187. package/dist/llm/vercel/serializer.js +7 -0
  188. package/dist/llm/views.d.ts +2 -1
  189. package/dist/llm/views.js +3 -1
  190. package/dist/logging-config.d.ts +2 -0
  191. package/dist/logging-config.js +82 -29
  192. package/dist/mcp/client.d.ts +10 -5
  193. package/dist/mcp/client.js +21 -15
  194. package/dist/mcp/controller.d.ts +42 -3
  195. package/dist/mcp/controller.js +56 -31
  196. package/dist/mcp/server.d.ts +14 -0
  197. package/dist/mcp/server.js +257 -51
  198. package/dist/observability.js +10 -4
  199. package/dist/sandbox/index.d.ts +2 -0
  200. package/dist/sandbox/index.js +2 -0
  201. package/dist/sandbox/sandbox.d.ts +19 -0
  202. package/dist/sandbox/sandbox.js +140 -0
  203. package/dist/sandbox/views.d.ts +67 -0
  204. package/dist/sandbox/views.js +121 -0
  205. package/dist/skill-cli/index.d.ts +3 -0
  206. package/dist/skill-cli/index.js +3 -0
  207. package/dist/skill-cli/protocol.d.ts +30 -0
  208. package/dist/skill-cli/protocol.js +48 -0
  209. package/dist/skill-cli/server.d.ts +11 -0
  210. package/dist/skill-cli/server.js +85 -0
  211. package/dist/skill-cli/sessions.d.ts +24 -0
  212. package/dist/skill-cli/sessions.js +47 -0
  213. package/dist/skills/index.d.ts +3 -0
  214. package/dist/skills/index.js +3 -0
  215. package/dist/skills/service.d.ts +27 -0
  216. package/dist/skills/service.js +266 -0
  217. package/dist/skills/utils.d.ts +6 -0
  218. package/dist/skills/utils.js +53 -0
  219. package/dist/skills/views.d.ts +40 -0
  220. package/dist/skills/views.js +10 -0
  221. package/dist/sync/auth.js +8 -3
  222. package/dist/sync/service.d.ts +6 -6
  223. package/dist/sync/service.js +54 -89
  224. package/dist/telemetry/views.d.ts +20 -6
  225. package/dist/telemetry/views.js +23 -5
  226. package/dist/tokens/custom-pricing.d.ts +2 -0
  227. package/dist/tokens/custom-pricing.js +22 -0
  228. package/dist/tokens/index.d.ts +2 -0
  229. package/dist/tokens/index.js +2 -0
  230. package/dist/tokens/mappings.d.ts +1 -0
  231. package/dist/tokens/mappings.js +3 -0
  232. package/dist/tokens/service.js +30 -12
  233. package/dist/tools/extraction/index.d.ts +2 -0
  234. package/dist/tools/extraction/index.js +2 -0
  235. package/dist/tools/extraction/schema-utils.d.ts +6 -0
  236. package/dist/tools/extraction/schema-utils.js +237 -0
  237. package/dist/tools/extraction/views.d.ts +7 -0
  238. package/dist/tools/index.d.ts +5 -0
  239. package/dist/tools/index.js +5 -0
  240. package/dist/tools/registry/index.d.ts +2 -0
  241. package/dist/tools/registry/index.js +2 -0
  242. package/dist/tools/registry/service.d.ts +1 -0
  243. package/dist/tools/registry/service.js +1 -0
  244. package/dist/tools/registry/views.d.ts +1 -0
  245. package/dist/tools/registry/views.js +1 -0
  246. package/dist/tools/service.d.ts +2 -0
  247. package/dist/tools/service.js +1 -0
  248. package/dist/tools/utils.d.ts +2 -0
  249. package/dist/tools/utils.js +57 -0
  250. package/dist/tools/views.d.ts +1 -0
  251. package/dist/tools/views.js +1 -0
  252. package/dist/utils.d.ts +10 -1
  253. package/dist/utils.js +70 -3
  254. package/package.json +265 -28
  255. package/dist/dom/playground/process-dom.js +0 -5
  256. package/dist/dom/playground/test-accessibility.d.ts +0 -44
  257. package/dist/dom/playground/test-accessibility.js +0 -111
  258. package/dist/{dom/playground/process-dom.d.ts → tools/extraction/views.js} +0 -0
package/README.md CHANGED
@@ -1,761 +1,426 @@
1
- # browser-use
2
-
3
- ![Node CI](https://github.com/webllm/browser-use/workflows/Node%20CI/badge.svg)
4
- [![npm](https://img.shields.io/npm/v/browser-use.svg)](https://www.npmjs.com/package/browser-use)
5
- ![license](https://img.shields.io/npm/l/browser-use)
6
-
7
- > 🙏 **A TypeScript port of the amazing [browser-use](https://github.com/browser-use/browser-use) Python library**
8
- >
9
- > This project is a faithful TypeScript/JavaScript implementation of the original [browser-use](https://github.com/browser-use/browser-use) Python library, bringing the power of AI-driven browser automation to the Node.js ecosystem. All credit for the innovative design and architecture goes to the original Python project and its creators.
10
-
11
- A TypeScript-first library for programmatic browser control, designed for building AI-powered web agents with vision capabilities and extensive LLM integrations.
12
-
13
- ## Why TypeScript?
14
-
15
- While the original [browser-use Python library](https://github.com/browser-use/browser-use) is excellent and feature-complete, this TypeScript port aims to:
16
-
17
- - 🌍 Bring browser-use capabilities to the JavaScript/TypeScript ecosystem
18
- - 🔧 Enable seamless integration with Node.js, Deno, and Bun projects
19
- - 📦 Provide native TypeScript type definitions for better DX
20
- - 🤝 Make browser automation accessible to frontend and full-stack developers
21
-
22
- ### Python vs TypeScript: Which Should You Use?
23
-
24
- | Feature | Python Version | TypeScript Version |
25
- | ------------------- | --------------------------------------------------------------------- | ----------------------------------------------------------- |
26
- | **Recommended for** | Python developers, Data scientists, AI/ML engineers | JavaScript/TypeScript developers, Full-stack engineers |
27
- | **Ecosystem** | PyPI, pip | npm, yarn, pnpm |
28
- | **Type Safety** | Optional (with type hints) | Built-in (TypeScript) |
29
- | **Runtime** | Python 3.x | Node.js, Deno, Bun |
30
- | **LLM Providers** | 10+ providers | 10+ providers (same) |
31
- | **Browser Support** | Playwright | Playwright (same) |
32
- | **Documentation** | Original & Complete | Port with TS-specific examples |
33
- | **Community** | Larger & More Established | Growing |
34
- | **GitHub** | [browser-use/browser-use](https://github.com/browser-use/browser-use) | [webllm/browser-use](https://github.com/webllm/browser-use) |
35
-
36
- **👉 If you're working in Python, we highly recommend using the [original browser-use library](https://github.com/browser-use/browser-use).** This TypeScript port is specifically for those who need to work within the JavaScript/TypeScript ecosystem.
37
-
38
- ### Commitment to the Original
39
-
40
- We are committed to:
41
-
42
- - ✅ Maintaining feature parity with the Python version whenever possible
43
- - 🔄 Keeping up with upstream updates and improvements
44
- - 🐛 Reporting bugs found in this port back to the original project when applicable
45
- - 📚 Directing users to the original project's documentation for core concepts
46
- - 🤝 Collaborating with the original authors and respecting their vision
47
-
48
- This is **not** a fork or competing project—it's a respectful port to serve a different programming language community.
49
-
50
- ## Features
51
-
52
- - 🤖 **AI-Powered**: Built specifically for LLM-driven web automation with structured output support
53
- - 🎯 **Type-Safe**: Full TypeScript support with comprehensive type definitions
54
- - 🌐 **Multi-Browser**: Support for Chromium, Firefox, and WebKit via Playwright
55
- - 🔌 **10+ LLM Providers**: OpenAI, Anthropic, Google, AWS, Azure, DeepSeek, Groq, Ollama, OpenRouter, and more
56
- - 👁️ **Vision Support**: Multimodal capabilities with screenshot analysis
57
- - 🛡️ **Robust**: Built-in error handling, recovery, graceful shutdown, and retry mechanisms
58
- - 📊 **Observable**: Comprehensive logging, execution history, and telemetry
59
- - 🔧 **Extensible**: Custom actions, MCP protocol, and plugin system
60
- - 📁 **FileSystem**: Built-in file operations with PDF parsing
61
- - 🔗 **Integrations**: Gmail API, Google Sheets, and MCP servers
62
-
63
- ## Quick Start
1
+ <p align="center">
2
+ <h1 align="center">🌐 Browser-Use</h1>
3
+ <p align="center">
4
+ <strong>Make websites accessible for AI agents — in TypeScript</strong>
5
+ </p>
6
+ <p align="center">
7
+ A TypeScript-first library for building AI-powered web agents that can autonomously browse, interact with, and extract data from the web using LLMs and Playwright.
8
+ </p>
9
+ </p>
10
+
11
+ <p align="center">
12
+ <a href="https://github.com/webllm/browser-use/workflows/Node%20CI"><img src="https://github.com/webllm/browser-use/workflows/Node%20CI/badge.svg" alt="Node CI"></a>
13
+ <a href="https://www.npmjs.com/package/browser-use"><img src="https://img.shields.io/npm/v/browser-use.svg" alt="npm"></a>
14
+ <a href="https://www.npmjs.com/package/browser-use"><img src="https://img.shields.io/npm/dm/browser-use.svg" alt="npm downloads"></a>
15
+ <img src="https://img.shields.io/npm/l/browser-use" alt="license">
16
+ <img src="https://img.shields.io/badge/TypeScript-first-blue" alt="TypeScript">
17
+ </p>
18
+
19
+ ---
20
+
21
+ > **TypeScript port** of the popular Python [browser-use](https://github.com/browser-use/browser-use) library — with a native Node.js experience, full type safety, and first-class support for all major LLM providers.
22
+
23
+ ## ✨ Features
24
+
25
+ - 🤖 **Autonomous Browser Control** AI-driven navigation, clicking, typing, form filling, scrolling, and tab management
26
+ - 🧠 **10+ LLM Providers** OpenAI, Anthropic, Google Gemini, Azure, AWS Bedrock, Groq, Ollama, DeepSeek, OpenRouter, Mistral, Cerebras, and custom providers
27
+ - 👁️ **Vision Support** Screenshot-based understanding for visual web interactions
28
+ - 🔧 **45+ Built-in Actions** Navigation, element interaction, scrolling, forms, tabs, content extraction, file I/O, and more
29
+ - 🧩 **Custom Actions** Extensible registry with Zod schema validation, domain restrictions, and page filters
30
+ - 🔌 **MCP Server** Model Context Protocol support for Claude Desktop and MCP-compatible clients
31
+ - ⌨️ **CLI Tool** Interactive and one-shot modes for quick browser tasks
32
+ - 🔒 **Security First** Sensitive data masking, domain restrictions, and Chromium sandboxing
33
+ - 📊 **Observability** Event system, telemetry, performance tracing, and session recording (GIF)
34
+ - 🐳 **Docker Ready** Configurable for containerized and CI/CD environments
35
+
36
+ ## 🚀 Quick Start
64
37
 
65
38
  ### Installation
66
39
 
67
40
  ```bash
68
41
  npm install browser-use
69
- # or
70
- yarn add browser-use
71
- # or
72
- pnpm add browser-use
42
+ # Playwright browsers are installed automatically via postinstall
73
43
  ```
74
44
 
75
- Playwright browsers will be installed automatically via postinstall hook.
45
+ ### Set Up Your API Key
76
46
 
77
- ### Basic Usage with Agent
78
-
79
- ```typescript
80
- import { Agent } from 'browser-use';
81
- import { ChatOpenAI } from 'browser-use/llm/openai';
82
-
83
- async function main() {
84
- const llm = new ChatOpenAI({
85
- model: 'gpt-4',
86
- apiKey: process.env.OPENAI_API_KEY,
87
- });
88
-
89
- const agent = new Agent({
90
- task: 'Go to google.com and search for "TypeScript browser automation"',
91
- llm,
92
- });
93
-
94
- const history = await agent.run();
95
-
96
- console.log(`Task completed in ${history.history.length} steps`);
97
-
98
- // Access the browser session
99
- const browserSession = agent.browser_session;
100
- const currentPage = await browserSession.get_current_page();
101
- console.log('Final URL:', currentPage?.url());
102
- }
103
-
104
- main();
47
+ ```bash
48
+ export OPENAI_API_KEY=sk-your-api-key
49
+ # or ANTHROPIC_API_KEY, GOOGLE_API_KEY, etc.
105
50
  ```
106
51
 
107
- ### Using Controller for Custom Actions
108
-
109
- Use `Controller` to register domain-specific actions, then pass it into `Agent`:
52
+ ### Run Your First Agent
110
53
 
111
54
  ```typescript
112
- import { Agent, Controller, ActionResult } from 'browser-use';
55
+ import { Agent } from 'browser-use';
113
56
  import { ChatOpenAI } from 'browser-use/llm/openai';
114
- import { z } from 'zod';
115
-
116
- const controller = new Controller();
117
-
118
- controller.registry.action('Extract product info from the current page', {
119
- param_model: z.object({
120
- include_price: z.boolean().default(true),
121
- include_reviews: z.boolean().default(false),
122
- }),
123
- })(async function extract_product_info(params, { page }) {
124
- const productData = await page.evaluate(() => ({
125
- title: document.querySelector('h1')?.textContent ?? null,
126
- price: document.querySelector('.price')?.textContent ?? null,
127
- }));
128
-
129
- return new ActionResult({
130
- extracted_content: JSON.stringify({ ...productData, ...params }),
131
- include_in_memory: true,
132
- });
133
- });
134
57
 
135
58
  const agent = new Agent({
136
- task: 'Open product page and extract product info',
59
+ task: 'Go to google.com and search for "TypeScript tutorials"',
137
60
  llm: new ChatOpenAI({
138
61
  model: 'gpt-4o',
139
62
  apiKey: process.env.OPENAI_API_KEY,
140
63
  }),
141
- controller,
142
64
  });
143
65
 
144
- const history = await agent.run(10);
145
- console.log(history.final_result());
66
+ const history = await agent.run();
67
+ console.log('Result:', history.final_result());
68
+ console.log('Success:', history.is_successful());
146
69
  ```
147
70
 
148
- ## Advanced Usage
71
+ ```bash
72
+ npx tsx example.ts
73
+ ```
149
74
 
150
- ### Vision/Multimodal Support
75
+ ### Use the CLI
151
76
 
152
- Enable vision capabilities to let the AI analyze screenshots:
77
+ ```bash
78
+ # Interactive mode
79
+ npx browser-use
153
80
 
154
- ```typescript
155
- import { Agent } from 'browser-use';
156
- import { ChatGoogle } from 'browser-use/llm/google';
81
+ # One-shot task
82
+ npx browser-use "Go to example.com and extract the page title"
157
83
 
158
- const llm = new ChatGoogle('gemini-2.5-flash');
84
+ # With specific model
85
+ npx browser-use --model claude-sonnet-4-20250514 -p "Search for AI news"
159
86
 
160
- const agent = new Agent({
161
- task: 'Describe what you see on this page and identify main visual elements',
162
- llm,
163
- use_vision: true,
164
- vision_detail_level: 'high', // 'auto' | 'low' | 'high'
165
- });
87
+ # Headless mode
88
+ npx browser-use --headless -p "Check the weather"
166
89
 
167
- const history = await agent.run(5);
90
+ # MCP server mode
91
+ npx browser-use --mcp
168
92
  ```
169
93
 
170
- ### Custom Actions with Controller Registry
171
-
172
- Extend the agent's capabilities with custom actions:
173
-
174
- ```typescript
175
- import { Controller, ActionResult } from 'browser-use';
176
- import { z } from 'zod';
177
-
178
- const controller = new Controller();
179
-
180
- controller.registry.action('Extract product information', {
181
- param_model: z.object({
182
- include_price: z.boolean().default(true),
183
- include_reviews: z.boolean().default(false),
184
- }),
185
- })(async function extract_product_info(params, { page }) {
186
- const productData = await page.evaluate(() => ({
187
- title: document.querySelector('h1')?.textContent ?? null,
188
- price: document.querySelector('.price')?.textContent ?? null,
189
- }));
94
+ ## 🏗️ Architecture
190
95
 
191
- return new ActionResult({
192
- extracted_content: JSON.stringify({ ...productData, ...params }),
193
- include_in_memory: true,
194
- });
195
- });
96
+ ```
97
+ ┌─────────────────────────────────────────────────────┐
98
+ │ Browser-Use │
99
+ ├─────────────────────────────────────────────────────┤
100
+ │ Agent ← MessageManager ← LLM Providers │
101
+ │ ↓ │
102
+ │ Controller → Action Registry → BrowserSession │
103
+ │ ↓ │
104
+ │ DomService │
105
+ └─────────────────────────────────────────────────────┘
196
106
  ```
197
107
 
198
- ### FileSystem Operations
199
-
200
- Built-in file system support with PDF parsing:
108
+ | Component | Description |
109
+ | ------------------ | ---------------------------------------------------------------------- |
110
+ | **Agent** | Central orchestrator runs the observe → think → act loop |
111
+ | **Controller** | Manages action registration and execution via Registry |
112
+ | **BrowserSession** | Playwright wrapper — browser lifecycle, tab management, screenshots |
113
+ | **DomService** | Extracts interactive elements with indexed mapping for LLM consumption |
114
+ | **MessageManager** | Manages LLM conversation history with token optimization |
115
+ | **LLM Providers** | Unified `BaseChatModel` interface across 10+ providers |
116
+
117
+ ### How It Works
118
+
119
+ 1. **Agent** receives a natural language task
120
+ 2. **DomService** extracts the current page state (interactive elements + optional screenshot)
121
+ 3. **LLM** analyzes the state and returns actions to take
122
+ 4. **Controller** validates and executes actions through the **Registry**
123
+ 5. Results feed back to the LLM for the next step
124
+ 6. Loop continues until `done` action or `max_steps`
125
+
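Reviewer note (not part of the package diff): the six-step loop in the new "How It Works" section can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the library's implementation. The names `Action`, `PageState`, `Llm`, `Executor`, and `runLoop` are hypothetical and do not come from browser-use; only the `done` action and the step limit are taken from the README text.

```typescript
// Hypothetical types for illustration only; not browser-use's actual API.
type Action = { name: string; params: Record<string, unknown> };
type PageState = { url: string; elements: string[] };

interface Llm {
  // Chooses the next actions from the task, the observed state, and prior results.
  decide(task: string, state: PageState, lastResults: string[]): Promise<Action[]>;
}

interface Executor {
  observe(): Promise<PageState>; // stands in for DomService
  execute(action: Action): Promise<string>; // stands in for Controller + Registry
}

async function runLoop(
  task: string,
  llm: Llm,
  executor: Executor,
  maxSteps: number,
): Promise<{ steps: number; done: boolean; results: string[] }> {
  let lastResults: string[] = [];
  for (let step = 1; step <= maxSteps; step++) {
    // Step 2: observe the current page state.
    const state = await executor.observe();
    // Step 3: ask the model what to do next, feeding back earlier results (step 5).
    const actions = await llm.decide(task, state, lastResults);
    lastResults = [];
    for (const action of actions) {
      // Step 6: a `done` action ends the loop early.
      if (action.name === 'done') {
        return { steps: step, done: true, results: lastResults };
      }
      // Step 4: execute each action and keep its result for the next step.
      lastResults.push(await executor.execute(action));
    }
  }
  // The step budget ran out before the model signalled completion.
  return { steps: maxSteps, done: false, results: lastResults };
}
```

Keeping the model and the executor behind small interfaces is a design choice of this sketch: it lets the loop be exercised with stubs, without a browser or an API key.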
126
+ ## 🔌 LLM Providers
127
+
128
+ | Provider | Import | Vision | Notes |
129
+ | ----------------- | ---------------------------- | ------ | --------------------------------------------- |
130
+ | **OpenAI** | `browser-use/llm/openai` | ✅ | Default provider, reasoning models (o1/o3/o4) |
131
+ | **Anthropic** | `browser-use/llm/anthropic` | ✅ | Prompt caching support |
132
+ | **Google Gemini** | `browser-use/llm/google` | ✅ | Extended thinking support |
133
+ | **Azure OpenAI** | `browser-use/llm/azure` | ✅ | Enterprise deployment |
134
+ | **AWS Bedrock** | `browser-use/llm/aws` | ✅ | Claude via AWS |
135
+ | **Groq** | `browser-use/llm/groq` | ❌ | Fastest inference |
136
+ | **Ollama** | `browser-use/llm/ollama` | ❌ | Local/self-hosted models |
137
+ | **DeepSeek** | `browser-use/llm/deepseek` | ❌ | Cost-effective |
138
+ | **OpenRouter** | `browser-use/llm/openrouter` | Varies | Multi-model routing |
139
+ | **Mistral** | `browser-use/llm/mistral` | Varies | Mistral models |
140
+ | **Cerebras** | `browser-use/llm/cerebras` | ❌ | Fast inference |
141
+
142
+ <details>
143
+ <summary>Provider examples</summary>
201
144
 
202
145
  ```typescript
203
- import { Agent } from 'browser-use';
146
+ // OpenAI
204
147
  import { ChatOpenAI } from 'browser-use/llm/openai';
205
-
206
- const agent = new Agent({
207
- task: 'Download the PDF and extract text from page 1',
208
- llm: new ChatOpenAI(),
209
- file_system_path: './agent-workspace',
148
+ const llm = new ChatOpenAI({
149
+ model: 'gpt-4o',
150
+ apiKey: process.env.OPENAI_API_KEY,
210
151
  });
211
152
 
212
- // FileSystem actions are available:
213
- // - read_file: Read file contents (supports PDF)
214
- // - write_file: Write content to file
215
- // - replace_file_str: Replace text in file
216
- ```
217
-
218
- ### Browser Profile Configuration
219
-
220
- Customize browser behavior with profiles:
221
-
222
- ```typescript
223
- import { BrowserProfile, BrowserSession } from 'browser-use';
224
-
225
- const profile = new BrowserProfile({
226
- window_size: { width: 1920, height: 1080 },
227
- disable_security: false,
228
- headless: true,
229
- chromium_sandbox: true, // Keep enabled by default in production
230
- args: ['--disable-blink-features=AutomationControlled'],
231
- wait_for_network_idle_page_load_time: 3, // seconds
232
- allowed_domains: ['example.com', '*.google.com'],
233
- cookies_file: './cookies.json',
234
- downloads_path: './downloads',
235
- highlight_elements: false, // Visual debugging
236
- viewport_expansion: 0, // Expand viewport for element detection
153
+ // Anthropic
154
+ import { ChatAnthropic } from 'browser-use/llm/anthropic';
155
+ const llm = new ChatAnthropic({
156
+ model: 'claude-sonnet-4-20250514',
157
+ apiKey: process.env.ANTHROPIC_API_KEY,
237
158
  });
238
159
 
239
- const browserSession = new BrowserSession({
240
- browser_profile: profile,
241
- });
160
+ // Google Gemini
161
+ import { ChatGoogle } from 'browser-use/llm/google';
162
+ const llm = new ChatGoogle('gemini-2.5-flash');
163
+
164
+ // Ollama (local)
165
+ import { ChatOllama } from 'browser-use/llm/ollama';
166
+ const llm = new ChatOllama('llama3', 'http://localhost:11434');
242
167
 
243
- await browserSession.start();
168
+ // OpenAI Reasoning Models
169
+ const llm = new ChatOpenAI({ model: 'o3-mini', reasoningEffort: 'medium' });
244
170
  ```
245
171
 
246
- If Chromium launch fails with `No usable sandbox` (common in restricted Linux CI),
247
- `BrowserSession` automatically retries once with `chromium_sandbox: false` and logs
248
- a warning. For deterministic CI behavior, set `chromium_sandbox: false` explicitly.
172
+ </details>
249
173
 
250
- ### MCP (Model Context Protocol) Integration
174
+ ## 🎯 Code Examples
251
175
 
252
- Connect to MCP servers for extended capabilities:
176
+ ### Data Extraction
253
177
 
254
178
  ```typescript
255
- import { MCPController } from 'browser-use';
256
-
257
- const mcpController = new MCPController();
258
-
259
- // Add MCP server
260
- await mcpController.addServer('my-server', 'npx', [
261
- '-y',
262
- '@modelcontextprotocol/server-filesystem',
263
- '/path/to/data',
264
- ]);
179
+ const agent = new Agent({
180
+ task: `Go to amazon.com, search for "wireless keyboard",
181
+ extract the name, price, and rating of the first 5 products as JSON`,
182
+ llm,
183
+ use_vision: true,
184
+ });
265
185
 
266
- // MCP tools are automatically available to the agent
267
- const tools = await mcpController.listAllTools();
268
- console.log('Available MCP tools:', tools);
186
+ const history = await agent.run(30);
187
+ console.log(history.final_result());
269
188
  ```
270
189
 
271
- ### Gmail Integration
272
-
273
- Built-in Gmail API support:
190
+ ### Form Filling with Sensitive Data
274
191
 
275
192
  ```typescript
276
- import { GmailService } from 'browser-use';
277
-
278
- // Gmail actions are automatically available:
279
- // - get_recent_emails: Fetch recent emails
280
- // - send_email: Send email via Gmail API
281
-
282
193
  const agent = new Agent({
283
- task: 'Check my last 5 emails and summarize them',
284
- llm: new ChatOpenAI(),
285
- // Gmail credentials loaded from config files (or explicit GmailService options)
194
+ task: 'Login to the dashboard',
195
+ llm,
196
+ sensitive_data: {
197
+ '*.example.com': {
198
+ username: process.env.SITE_USERNAME!,
199
+ password: process.env.SITE_PASSWORD!,
200
+ },
201
+ },
202
+ browser_session: new BrowserSession({
203
+ browser_profile: new BrowserProfile({
204
+ allowed_domains: ['*.example.com'],
205
+ }),
206
+ }),
286
207
  });
287
208
  ```
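The idea of keeping credentials out of logs and LLM context can be illustrated with a small, self-contained sketch. Everything here is an assumption made for illustration: `maskSecrets` is a hypothetical helper and the `<secret>name</secret>` placeholder format is not taken from this README, so the library's real behavior may differ.

```typescript
// Hypothetical sketch, not browser-use's implementation: replace every
// occurrence of a secret value in a string with a named placeholder.
function maskSecrets(text: string, secrets: Record<string, string>): string {
  // Index secrets by value so the replacement callback can look up the name.
  const nameByValue = new Map<string, string>();
  for (const [name, value] of Object.entries(secrets)) {
    if (value.length > 0 && !nameByValue.has(value)) {
      nameByValue.set(value, name);
    }
  }
  if (nameByValue.size === 0) return text;

  // Escape regex metacharacters so values are matched literally.
  const escapeRegex = (s: string): string =>
    s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

  // Longer values first, so a secret containing another secret is masked whole.
  const alternatives = Array.from(nameByValue.keys())
    .sort((a, b) => b.length - a.length)
    .map(escapeRegex)
    .join('|');

  // A single pass means inserted placeholders are never rescanned.
  return text.replace(
    new RegExp(alternatives, 'g'),
    (match) => `<secret>${nameByValue.get(match)}</secret>`,
  );
}
```

A single regex pass is used instead of repeated string replacement so that a secret whose value happens to appear inside a placeholder cannot corrupt text that has already been masked.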
288
209
 
289
- ## Configuration
290
-
291
- ### Environment Variables
292
-
293
- ```bash
294
- # LLM Configuration (provider-specific)
295
- OPENAI_API_KEY=your-openai-key
296
- ANTHROPIC_API_KEY=your-anthropic-key
297
- GOOGLE_API_KEY=your-google-key
298
- AWS_ACCESS_KEY_ID=your-aws-key
299
- AWS_SECRET_ACCESS_KEY=your-aws-secret
300
- AZURE_OPENAI_API_KEY=your-azure-key
301
- AZURE_OPENAI_ENDPOINT=your-azure-endpoint
302
- GROQ_API_KEY=your-groq-key
303
- DEEPSEEK_API_KEY=your-deepseek-key
304
-
305
- # Browser Configuration
306
- BROWSER_USE_HEADLESS=true
307
- BROWSER_USE_ALLOWED_DOMAINS=example.com,*.trusted.org
308
- IN_DOCKER=true
309
-
310
- # Logging Configuration
311
- BROWSER_USE_LOGGING_LEVEL=info # debug, info, warning, error
312
-
313
- # Telemetry (optional)
314
- ANONYMIZED_TELEMETRY=false
315
-
316
- # Observability (optional)
317
- LMNR_API_KEY=your-lmnr-key
318
- ```
319
-
320
- ### Agent Configuration
210
+ ### Custom Actions
321
211
 
322
212
  ```typescript
323
- interface AgentOptions {
324
- // Vision/multimodal
325
- use_vision?: boolean;
326
- vision_detail_level?: 'low' | 'high' | 'auto';
327
-
328
- // Error handling
329
- max_failures?: number; // default: 3
330
- retry_delay?: number; // seconds, default: 10
331
- max_actions_per_step?: number; // default: 10
332
-
333
- // Persistence / output
334
- save_conversation_path?: string | null;
335
- file_system_path?: string | null;
336
- validate_output?: boolean;
337
- include_attributes?: string[];
338
-
339
- // Runtime limits (seconds)
340
- llm_timeout?: number; // default: 60
341
- step_timeout?: number; // default: 180
342
- }
343
-
344
- // Max step count is configured per run call:
345
- await agent.run(100);
346
- ```
347
-
348
- ## Supported LLM Providers
349
-
350
- ### OpenAI
213
+ import * as fs from 'node:fs';
+ import { Agent, Controller, ActionResult } from 'browser-use';
214
+ import { z } from 'zod';
351
215
 
352
- ```typescript
353
- import { ChatOpenAI } from 'browser-use/llm/openai';
216
+ const controller = new Controller();
354
217
 
355
- const llm = new ChatOpenAI({
356
- model: 'gpt-4o', // or 'gpt-4', 'gpt-3.5-turbo'
357
- apiKey: process.env.OPENAI_API_KEY,
358
- temperature: 0.1,
359
- maxTokens: 4096,
218
+ controller.registry.action('Save screenshot to file', {
219
+ param_model: z.object({
220
+ filename: z.string().describe('Output filename'),
221
+ }),
222
+ })(async function save_screenshot(params, ctx) {
223
+ const screenshot = await ctx.page.screenshot();
224
+ fs.writeFileSync(`./screenshots/${params.filename}`, screenshot);
225
+ return new ActionResult({
226
+ extracted_content: `Screenshot saved as ${params.filename}`,
227
+ });
360
228
  });
229
+
230
+ const agent = new Agent({ task: '...', llm, controller });
361
231
  ```
362
232
 
363
- ### Anthropic Claude
233
+ ### Vision Mode & Session Recording
364
234
 
365
235
  ```typescript
366
- import { ChatAnthropic } from 'browser-use/llm/anthropic';
367
-
368
- const llm = new ChatAnthropic({
369
- model: 'claude-3-5-sonnet-20241022', // or other Claude models
370
- apiKey: process.env.ANTHROPIC_API_KEY,
371
- temperature: 0.1,
236
+ const agent = new Agent({
237
+ task: 'Navigate to hacker news and summarize the top stories',
238
+ llm,
239
+ use_vision: true,
240
+ vision_detail_level: 'high', // 'auto' | 'low' | 'high'
241
+ generate_gif: './session.gif',
372
242
  });
373
243
  ```
374
244
 
375
- ### Google Gemini
245
+ ### Multi-Tab Workflows
376
246
 
377
247
  ```typescript
378
- import { ChatGoogle } from 'browser-use/llm/google';
379
-
380
- const llm = new ChatGoogle('gemini-2.5-flash');
381
- // Configure GOOGLE_API_KEY in env. Optional:
382
- // GOOGLE_API_BASE_URL / GOOGLE_API_VERSION
248
+ const agent = new Agent({
249
+ task: `Compare "Sony WH-1000XM5" prices:
250
+ 1. Open amazon.com and search for the product
251
+ 2. Open bestbuy.com in a new tab and search
252
+ 3. Provide a comparison summary`,
253
+ llm,
254
+ use_vision: true,
255
+ });
383
256
  ```
384
257
 
385
- ### AWS Bedrock
258
+ ### Event System
386
259
 
387
260
  ```typescript
388
- import { ChatAnthropicBedrock } from 'browser-use/llm/aws';
261
+ const agent = new Agent({ task: '...', llm });
389
262
 
390
- const llm = new ChatAnthropicBedrock({
391
- model: 'anthropic.claude-3-5-sonnet-20241022-v2:0',
392
- region: 'us-east-1',
393
- max_tokens: 4096,
263
+ agent.eventbus.on('CreateAgentStepEvent', (event) => {
264
+ console.log('Step completed:', event.step_id);
394
265
  });
395
- ```
396
-
397
- ### Azure OpenAI
398
266
 
399
- ```typescript
400
- import { ChatAzure } from 'browser-use/llm/azure';
401
-
402
- const llm = new ChatAzure('gpt-4o');
403
- // Configure AZURE_OPENAI_API_KEY, AZURE_OPENAI_ENDPOINT, AZURE_OPENAI_API_VERSION in env.
267
+ await agent.run();
404
268
  ```
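The subscribe-then-run pattern above can be tried without a browser using a tiny typed event bus. This is a hypothetical stand-in, not the object behind `agent.eventbus`: only the `CreateAgentStepEvent` name and the `step_id` field come from the snippet, and `TinyEventBus`, its methods, and the `string` type of `step_id` are assumptions.

```typescript
// Hypothetical stand-in for an event bus; it is NOT browser-use's eventbus.
type Handler<T> = (event: T) => void;

class TinyEventBus<Events extends Record<string, unknown>> {
  private handlers = new Map<keyof Events, Handler<any>[]>();

  // Register a listener and return a function that removes it again.
  on<K extends keyof Events>(name: K, handler: Handler<Events[K]>): () => void {
    const list = this.handlers.get(name) ?? [];
    list.push(handler);
    this.handlers.set(name, list);
    return () => {
      const current = this.handlers.get(name) ?? [];
      // Store a new array so an in-progress emit keeps iterating the old one.
      this.handlers.set(
        name,
        current.filter((h) => h !== handler),
      );
    };
  }

  // Deliver the event synchronously to every listener registered for `name`.
  emit<K extends keyof Events>(name: K, event: Events[K]): void {
    for (const handler of this.handlers.get(name) ?? []) {
      handler(event);
    }
  }
}

// Assumed payload shape: only `step_id` appears in the README snippet.
type StepEvents = {
  CreateAgentStepEvent: { step_id: string };
};
```

Typing the event map up front means a misspelled event name or a wrong payload field is caught at compile time rather than silently never firing.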
405
269
 
406
- ### DeepSeek
407
-
408
- ```typescript
409
- import { ChatDeepSeek } from 'browser-use/llm/deepseek';
410
-
411
- const llm = new ChatDeepSeek('deepseek-chat');
412
- ```
270
+ ## ⚙️ Configuration
413
271
 
414
- ### Groq
272
+ ### Agent Options
415
273
 
416
274
  ```typescript
417
- import { ChatGroq } from 'browser-use/llm/groq';
275
+ const agent = new Agent({
276
+ task: 'Your task',
277
+ llm,
278
+ use_vision: true, // Enable screenshot analysis
279
+ max_actions_per_step: 5, // Actions per LLM call
280
+ max_failures: 3, // Max retries on failure
281
+ generate_gif: './recording.gif', // Session recording
282
+ validate_output: true, // Strict output validation
283
+ use_thinking: true, // Extended thinking prompts
284
+ llm_timeout: 60, // LLM call timeout (seconds)
285
+ step_timeout: 180, // Step timeout (seconds)
286
+ extend_system_message: 'Be concise', // Custom prompt additions
287
+ });
418
288
 
419
- const llm = new ChatGroq('mixtral-8x7b-32768');
289
+ const history = await agent.run(50); // Max 50 steps
420
290
  ```
421
291
 
422
- ### Ollama (Local)
292
+ ### Browser Profile
423
293
 
424
294
  ```typescript
425
- import { ChatOllama } from 'browser-use/llm/ollama';
426
-
427
- const llm = new ChatOllama('llama3.1', 'http://localhost:11434');
428
- ```
429
-
430
- ### OpenRouter
295
+ import { BrowserProfile, BrowserSession } from 'browser-use';
431
296
 
432
- ```typescript
433
- import { ChatOpenRouter } from 'browser-use/llm/openrouter';
297
+ const profile = new BrowserProfile({
298
+ headless: true,
299
+ viewport: { width: 1920, height: 1080 },
300
+ user_data_dir: './my-profile', // Persistent sessions
301
+ allowed_domains: ['*.example.com'], // Domain restrictions
302
+ highlight_elements: true, // Visual debugging
303
+ proxy: { server: 'http://proxy:8080' },
304
+ });
434
305
 
435
- const llm = new ChatOpenRouter('anthropic/claude-3-opus');
306
+ const session = new BrowserSession({ browser_profile: profile });
307
+ const agent = new Agent({ task: '...', llm, browser_session: session });
436
308
  ```
437
309
 
438
- ## Available Actions
439
-
440
- The AI agent can perform these actions:
441
-
442
- ### Navigation
443
-
444
- - **search_google** - Search query in Google (web results only)
445
- - **go_to_url** - Navigate to a specific URL (with optional new tab)
446
-
447
- ### Element Interaction
448
-
449
- - **click_element** - Click buttons, links, or clickable elements by index
450
- - **input_text** - Type text into input fields and textareas by index
451
-
452
- ### Dropdown/Select
453
-
454
- - **dropdown_options** - Get available options from a dropdown
455
- - **select_dropdown** - Select option from dropdown by index
456
-
457
- ### Scrolling
458
-
459
- - **scroll** - Scroll page up/down by pixels or direction
460
- - **scroll_to_text** - Scroll to text content on page
461
-
462
- ### Tabs
463
-
464
- - **switch_tab** - Switch to different browser tab by index
465
- - **close_tab** - Close current or specific tab
466
-
467
- ### Keyboard
468
-
469
- - **send_keys** - Send keyboard input (Enter, Tab, Escape, etc.)
470
-
471
- ### Content Extraction
472
-
473
- - **extract_structured_data** - Extract specific data using LLM from page markdown
474
-
475
- ### FileSystem
476
-
477
- - **read_file** - Read file contents (supports PDF parsing)
478
- - **write_file** - Write content to file
479
- - **replace_file_str** - Replace string in file
480
-
481
- ### Google Sheets
482
-
483
- - **sheets_range** - Get cell range from Google Sheet
484
- - **sheets_update** - Update Google Sheet cells
485
- - **sheets_input** - Input data into Google Sheet
486
-
487
- ### Gmail
488
-
489
- - **get_recent_emails** - Fetch recent emails from Gmail
490
- - **send_email** - Send email via Gmail API
491
-
492
- ### Completion
493
-
494
- - **done** - Mark task as completed with optional structured output
310
+ ### Environment Variables
495
311
 
496
- ## Examples
312
+ | Variable | Description |
313
+ | ----------------------------- | ---------------------------------------------- |
314
+ | `OPENAI_API_KEY` | OpenAI API key |
315
+ | `ANTHROPIC_API_KEY` | Anthropic API key |
316
+ | `GOOGLE_API_KEY` | Google API key |
317
+ | `BROWSER_USE_HEADLESS` | Run browser headlessly (`true`/`false`) |
318
+ | `BROWSER_USE_LOGGING_LEVEL` | Log level: `debug`, `info`, `warning`, `error` |
319
+ | `BROWSER_USE_ALLOWED_DOMAINS` | Comma-separated domain allowlist |
320
+ | `ANONYMIZED_TELEMETRY` | Enable/disable anonymous telemetry |
497
321
 
498
- See the `/examples` directory for detailed examples:
322
+ > See [Configuration Guide](./docs/CONFIGURATION.md) for the full list.
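As a rough illustration of how application code might interpret the variables in the table, here is a minimal sketch. `readEnvSettings` and `EnvSettings` are hypothetical names, and the fallback to `info` for an unknown log level is an assumption; none of this describes how browser-use itself parses its environment.

```typescript
// Hypothetical helper, not part of browser-use: turns the documented
// environment variables into a typed settings object.
type LogLevel = 'debug' | 'info' | 'warning' | 'error';

interface EnvSettings {
  headless: boolean | undefined; // undefined when unset or unrecognized
  loggingLevel: LogLevel;
  allowedDomains: string[];
}

const LOG_LEVELS: readonly LogLevel[] = ['debug', 'info', 'warning', 'error'];

function readEnvSettings(env: Record<string, string | undefined>): EnvSettings {
  // Accept `true`/`false` case-insensitively; anything else stays undefined.
  const rawHeadless = env['BROWSER_USE_HEADLESS']?.trim().toLowerCase();
  const headless =
    rawHeadless === 'true' ? true : rawHeadless === 'false' ? false : undefined;

  // Assumption: fall back to 'info' when the level is missing or unknown.
  const rawLevel = env['BROWSER_USE_LOGGING_LEVEL']?.trim().toLowerCase();
  const loggingLevel = LOG_LEVELS.find((level) => level === rawLevel) ?? 'info';

  // Split the comma-separated allowlist and drop empty entries.
  const allowedDomains = (env['BROWSER_USE_ALLOWED_DOMAINS'] ?? '')
    .split(',')
    .map((domain) => domain.trim())
    .filter((domain) => domain.length > 0);

  return { headless, loggingLevel, allowedDomains };
}
```

In a Node.js program this would typically be called as `readEnvSettings(process.env)`; taking the environment as a parameter keeps the function easy to test.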
499
323
 
500
- - `examples/simple-search.ts` - Basic web search automation
501
- - `examples/search-wikipedia.ts` - Wikipedia navigation with vision
502
- - `examples/test-vision.ts` - Vision/multimodal capabilities demo
503
- - `examples/test-filesystem.ts` - File operations and PDF parsing
504
- - `examples/openapi.ts` - Complex API documentation extraction
324
+ ## 🔌 MCP Server (Claude Desktop)
505
325
 
506
- ### Running Examples
326
+ Browser-Use can run as an [MCP](https://modelcontextprotocol.io/) server, exposing browser automation as tools for Claude Desktop:
507
327
 
508
328
  ```bash
509
- # Set your API key
510
- export OPENAI_API_KEY=your-key
511
- # or for Google
512
- export GOOGLE_API_KEY=your-key
513
-
514
- # Run an example
515
- npx tsx examples/simple-search.ts
329
+ npx browser-use --mcp
516
330
  ```
517
331
 
518
- ## Error Handling
519
-
520
- The library includes comprehensive error handling:
521
-
522
- ```typescript
523
- import { Agent, AgentError } from 'browser-use';
524
-
525
- try {
526
- const agent = new Agent({ task: 'Your task', llm });
527
- const history = await agent.run(10); // max 10 steps
528
-
529
- // Check completion status
530
- const lastStep = history.history[history.history.length - 1];
531
- if (lastStep?.result.is_done) {
532
- console.log('Task completed:', lastStep.result.extracted_content);
533
- } else {
534
- console.log('Task incomplete after max steps');
535
- }
536
- } catch (error) {
537
- if (error instanceof AgentError) {
538
- console.error('Agent error:', error.message);
539
- console.error('Failed at step:', error.step);
540
- } else {
541
- console.error('Unexpected error:', error);
332
+ Add to your Claude Desktop config (`~/Library/Application Support/Claude/claude_desktop_config.json`):
333
+
334
+ ```json
335
+ {
336
+ "mcpServers": {
337
+ "browser-use": {
338
+ "command": "npx",
339
+ "args": ["browser-use", "--mcp"],
340
+ "env": {
341
+ "OPENAI_API_KEY": "your-api-key"
342
+ }
343
+ }
542
344
  }
543
345
  }
544
346
  ```
545
347
 
546
- ## Development
547
-
548
- ### Building from Source
549
-
550
- ```bash
551
- git clone https://github.com/webllm/browser-use.git
552
- cd browser-use
553
- yarn install # Automatically installs Playwright browsers
554
- yarn build
555
- ```
556
-
557
- ### Running Tests
558
-
559
- ```bash
560
- # Run all tests
561
- yarn test
562
-
563
- # Run specific test
564
- yarn test test/integration-advanced.test.ts
565
-
566
- # Watch mode
567
- yarn test:watch
568
- ```
569
-
570
- ### Code Quality
571
-
572
- ```bash
573
- # Lint
574
- yarn lint
575
-
576
- # Format
577
- yarn prettier
578
-
579
- # Type check
580
- yarn build
581
- ```
582
-
583
- ## Architecture
584
-
585
- The library follows a modular, layered architecture:
586
-
587
- ```
588
- ┌─────────────────────────────────────────┐
589
- │ Agent (Orchestrator) │
590
- │ - Task execution & planning │
591
- │ - LLM message management │
592
- │ - Step execution loop │
593
- └─────────┬───────────────────────────────┘
594
-
595
- ┌─────────▼───────────────────────────────┐
596
- │ Controller (Actions) │
597
- │ - Action registry & execution │
598
- │ - Built-in actions (30+) │
599
- │ - Custom action support │
600
- └─────────┬───────────────────────────────┘
601
-
602
- ┌─────────▼───────────────────────────────┐
603
- │ BrowserSession (Browser) │
604
- │ - Playwright integration │
605
- │ - Tab & page management │
606
- │ - Navigation & interaction │
607
- └─────────┬───────────────────────────────┘
608
-
609
- ┌─────────▼───────────────────────────────┐
610
- │ DOMService (DOM Analysis) │
611
- │ - Element extraction │
612
- │ - Clickable element detection │
613
- │ - History tree processing │
614
- └──────────────────────────────────────────┘
615
-
616
- Supporting Services:
617
- ┌──────────────────────────────────────────┐
618
- │ - LLM Clients (10+ providers) │
619
- │ - FileSystem (with PDF support) │
620
- │ - Screenshot Service │
621
- │ - Token Tracking & Cost Calculation │
622
- │ - Telemetry (PostHog) │
623
- │ - Observability (LMNR) │
624
- │ - MCP Protocol Support │
625
- │ - Gmail/Sheets Integration │
626
- └──────────────────────────────────────────┘
627
- ```
628
-
629
- ### Key Components
348
+ Available MCP tools: `browser_run_task`, `browser_navigate`, `browser_click`, `browser_type`, `browser_scroll`, `browser_get_state`, `browser_extract`, `browser_screenshot`, `browser_close`.
630
349
 
631
- - **Agent**: High-level orchestrator managing task execution, LLM communication, and step-by-step planning
632
- - **Controller**: Action registry and executor with 30+ built-in actions and custom action support
633
- - **BrowserSession**: Browser lifecycle manager built on Playwright with tab management and state tracking
634
- - **DOMService**: Intelligent DOM analyzer extracting relevant elements for AI consumption
635
- - **MessageManager**: Manages conversation history with token optimization and context window management
636
- - **FileSystem**: File operations with PDF parsing and workspace management
637
- - **ScreenshotService**: Captures and manages screenshots for vision capabilities
638
- - **Registry**: Type-safe action registration system with Zod schema validation
350
+ > See [MCP Server Guide](./docs/MCP_SERVER.md) for more details.
639
351
 
640
- ## Token Usage & Cost Tracking
352
+ ## 🔒 Security
641
353
 
642
- The library automatically tracks token usage and calculates costs:
354
+ - **Sensitive Data Masking** — Credentials are automatically masked in logs and LLM context
355
+ - **Domain Restrictions** — Lock browser navigation to trusted domains
356
+ - **Domain-scoped Secrets** — Credentials are only injected on matching domains
357
+ - **Hard Safety Gate** — `sensitive_data` requires `allowed_domains` by default
358
+ - **Chromium Sandbox** — Enabled by default for production security
643
359
 
644
360
  ```typescript
645
- import { TokenCost } from 'browser-use';
646
-
647
- const agent = new Agent({ task: 'Your task', llm });
648
- const history = await agent.run();
649
-
650
- // Get token statistics
651
- const stats = history.stats();
652
- console.log(
653
- 'Total tokens:',
654
- stats.total_input_tokens + stats.total_output_tokens
655
- );
656
- console.log('Steps:', stats.n_steps);
657
-
658
- // Calculate cost (if pricing data available)
659
- const cost = TokenCost.calculate(history);
660
- console.log('Estimated cost: $', cost.toFixed(4));
661
- ```
662
-
663
- ## Screenshot & History Export
664
-
665
- Generate GIF animations from agent execution history:
666
-
667
- ```typescript
668
- import { create_history_gif } from 'browser-use';
669
-
670
- const history = await agent.run();
671
-
672
- await create_history_gif('My automation task', history, {
673
- output_path: 'agent-history.gif',
674
- duration: 3000, // ms per frame
675
- show_goals: true,
676
- show_task: true,
677
- show_logo: false,
361
+ const agent = new Agent({
362
+ task: 'Login and fetch invoices',
363
+ llm,
364
+ sensitive_data: {
365
+ '*.example.com': {
366
+ username: process.env.SITE_USERNAME!,
367
+ password: process.env.SITE_PASSWORD!,
368
+ },
369
+ },
370
+ browser_session: new BrowserSession({
371
+ browser_profile: new BrowserProfile({
372
+ allowed_domains: ['*.example.com'],
373
+ }),
374
+ }),
678
375
  });
679
-
680
- console.log('Created agent-history.gif');
681
376
  ```
682
377
 
683
- ## Observability
684
-
685
- Built-in observability with LMNR (Laminar) and custom debugging:
378
+ > See [Security Guide](./docs/SECURITY.md) for production deployment best practices.
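To make domain-scoped secrets more concrete, the following sketch shows one way a wildcard pattern such as `*.example.com` could be matched against a hostname before any credentials are released. `domainMatches` and `secretsForHost` are hypothetical helpers, and the rule that `*.example.com` also covers the bare `example.com` is an assumption; the library's actual matching rules are not stated in this README.

```typescript
// Hypothetical sketch: browser-use's real matching rules may differ.
function domainMatches(pattern: string, hostname: string): boolean {
  const p = pattern.toLowerCase();
  const h = hostname.toLowerCase();
  if (p.startsWith('*.')) {
    const base = p.slice(2);
    // Assumption: the wildcard covers the bare domain and any subdomain.
    // Requiring the leading dot keeps "evilexample.com" from matching.
    return h === base || h.endsWith('.' + base);
  }
  return h === p;
}

type ScopedSecrets = Record<string, Record<string, string>>;

// Collect only the secrets whose domain pattern matches the given hostname.
function secretsForHost(
  secrets: ScopedSecrets,
  hostname: string,
): Record<string, string> {
  const result: Record<string, string> = {};
  for (const [pattern, values] of Object.entries(secrets)) {
    if (domainMatches(pattern, hostname)) {
      Object.assign(result, values);
    }
  }
  return result;
}
```

Comparing against `'.' + base` rather than a plain suffix is the important design choice here: a bare `endsWith('example.com')` check would also accept look-alike hosts such as `evilexample.com`.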
686
379
 
687
- ```typescript
688
- import { observe, observe_debug } from 'browser-use';
689
-
690
- // Automatic tracing (if LMNR_API_KEY set)
691
- // All agent operations are automatically traced
380
+ ## 📚 Documentation
692
381
 
693
- // Custom debug observations
694
- @observe_debug({ name: 'my_custom_operation' })
695
- async function myFunction() {
696
- // Function execution is logged and timed
697
- }
698
- ```
382
+ | Document | Description |
383
+ | ---------------------------------------- | ------------------------------------ |
384
+ | [Quick Start](./docs/QUICKSTART.md) | Get started in 5 minutes |
385
+ | [Architecture](./docs/ARCHITECTURE.md) | System design and component overview |
386
+ | [API Reference](./docs/API_REFERENCE.md) | Complete API documentation |
387
+ | [Configuration](./docs/CONFIGURATION.md) | All configuration options |
388
+ | [LLM Providers](./docs/LLM_PROVIDERS.md) | Provider setup and comparison |
389
+ | [Actions](./docs/ACTIONS.md) | Built-in and custom actions |
390
+ | [MCP Server](./docs/MCP_SERVER.md) | MCP integration guide |
391
+ | [Security](./docs/SECURITY.md) | Security best practices |
392
+ | [Examples](./docs/EXAMPLES.md) | More code examples |
393
+ | [Contributing](./docs/CONTRIBUTING.md) | Contribution guidelines |
699
394
 
700
- ## Contributing
395
+ ## 🛠️ Development
701
396
 
702
- Contributions are welcome! Please feel free to submit a Pull Request.
703
-
704
- 1. Fork the repository
705
- 2. Create your feature branch (`git checkout -b feature/amazing-feature`)
706
- 3. Commit your changes (`git commit -m 'feat: add amazing feature'`)
707
- 4. Push to the branch (`git push origin feature/amazing-feature`)
708
- 5. Open a Pull Request
709
-
710
- ## Support
711
-
712
- - 📚 [Documentation](https://github.com/webllm/browser-use)
713
- - 🐛 [Issue Tracker](https://github.com/webllm/browser-use/issues)
714
- - 💬 [Discussions](https://github.com/webllm/browser-use/discussions)
715
-
716
- ## Acknowledgments
717
-
718
- ### Original Project
719
-
720
- This TypeScript implementation would not exist without the groundbreaking work of the original **[browser-use](https://github.com/browser-use/browser-use)** Python library:
721
-
722
- - 🎯 **Original Project**: [browser-use/browser-use](https://github.com/browser-use/browser-use) (Python)
723
- - 👏 **Created by**: The browser-use team and contributors
724
- - 💡 **Inspiration**: All architectural decisions, agent design patterns, and innovative approaches come from the original Python implementation
725
-
726
- We are deeply grateful to the original authors for creating such an elegant and powerful solution for AI-driven browser automation. This TypeScript port aims to faithfully replicate their excellent work for the JavaScript/TypeScript community.
727
-
728
- ### Key Differences from Python Version
729
-
730
- While we strive to maintain feature parity with the Python version, there are some differences due to platform constraints:
731
-
732
- - **Runtime**: Node.js/Deno/Bun instead of Python
733
- - **Type System**: TypeScript's structural typing vs Python's duck typing
734
- - **Async Model**: JavaScript Promises vs Python async/await (similar but different)
735
- - **Ecosystem**: npm packages vs PyPI packages
397
+ ```bash
398
+ # Install dependencies
399
+ npm install
736
400
 
737
- ### Technology Stack
401
+ # Build
402
+ npm run build
738
403
 
739
- This project is built with:
404
+ # Run tests
405
+ npm test
740
406
 
741
- - [Playwright](https://playwright.dev/) - Browser automation framework
742
- - [Zod](https://zod.dev/) - TypeScript-first schema validation
743
- - [OpenAI](https://openai.com/), [Anthropic](https://anthropic.com/), [Google](https://ai.google.dev/) - LLM providers
744
- - And many other excellent open-source libraries
407
+ # Lint & format
408
+ npm run lint
409
+ npm run prettier
745
410
 
746
- ### Community
411
+ # Type checking
412
+ npm run typecheck
747
413
 
748
- - 🌟 **Star the original Python project**: [browser-use/browser-use](https://github.com/browser-use/browser-use)
749
- - 🌟 **Star this TypeScript port**: [webllm/browser-use](https://github.com/webllm/browser-use)
750
- - 💬 **Join the community**: Share your use cases and contribute to both projects!
414
+ # Run an example
415
+ npx tsx examples/simple-search.ts
416
+ ```
751
417
 
752
- ## Related Projects
418
+ ## Requirements
753
419
 
754
- - 🐍 [browser-use (Python)](https://github.com/browser-use/browser-use) - The original and official implementation
755
- - 🎭 [Playwright](https://playwright.dev/) - The browser automation foundation
756
- - 🤖 [LangChain](https://www.langchain.com/) - LLM application framework
757
- - 🦜 [Laminar](https://laminar.run/) - LLM observability platform
420
+ - **Node.js** >= 18.0.0
421
+ - **LLM API Key**: At least one supported provider
422
+ - **Playwright**: Installed automatically as a dependency
758
423
 
759
- ## License
424
+ ## 📄 License
760
425
 
761
- MIT License - see [LICENSE](LICENSE) for details.
426
+ [MIT](./LICENSE) © Web LLM