browser-use 0.2.0 → 0.4.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (259)
  1. package/README.md +295 -686
  2. package/dist/actor/element.d.ts +19 -0
  3. package/dist/actor/element.js +46 -0
  4. package/dist/actor/index.d.ts +4 -0
  5. package/dist/actor/index.js +4 -0
  6. package/dist/actor/mouse.d.ts +19 -0
  7. package/dist/actor/mouse.js +39 -0
  8. package/dist/actor/page.d.ts +29 -0
  9. package/dist/actor/page.js +88 -0
  10. package/dist/actor/utils.d.ts +4 -0
  11. package/dist/actor/utils.js +35 -0
  12. package/dist/agent/cloud-events.d.ts +18 -0
  13. package/dist/agent/cloud-events.js +65 -2
  14. package/dist/agent/gif.d.ts +1 -0
  15. package/dist/agent/gif.js +24 -2
  16. package/dist/agent/judge.d.ts +17 -0
  17. package/dist/agent/judge.js +197 -0
  18. package/dist/agent/message-manager/service.d.ts +12 -4
  19. package/dist/agent/message-manager/service.js +205 -39
  20. package/dist/agent/message-manager/utils.js +0 -1
  21. package/dist/agent/message-manager/views.d.ts +4 -0
  22. package/dist/agent/message-manager/views.js +11 -7
  23. package/dist/agent/prompts.d.ts +24 -3
  24. package/dist/agent/prompts.js +274 -59
  25. package/dist/agent/service.d.ts +103 -41
  26. package/dist/agent/service.js +2336 -472
  27. package/dist/agent/variable-detector.d.ts +12 -0
  28. package/dist/agent/variable-detector.js +211 -0
  29. package/dist/agent/views.d.ts +237 -18
  30. package/dist/agent/views.js +446 -33
  31. package/dist/browser/cloud/cloud.d.ts +20 -0
  32. package/dist/browser/cloud/cloud.js +129 -0
  33. package/dist/browser/cloud/index.d.ts +2 -0
  34. package/dist/browser/cloud/index.js +2 -0
  35. package/dist/browser/cloud/views.d.ts +41 -0
  36. package/dist/browser/cloud/views.js +35 -0
  37. package/dist/browser/events.d.ts +345 -0
  38. package/dist/browser/events.js +566 -0
  39. package/dist/browser/extensions.js +17 -17
  40. package/dist/browser/index.d.ts +4 -0
  41. package/dist/browser/index.js +4 -0
  42. package/dist/browser/profile.d.ts +10 -4
  43. package/dist/browser/profile.js +79 -12
  44. package/dist/browser/session-manager.d.ts +85 -0
  45. package/dist/browser/session-manager.js +208 -0
  46. package/dist/browser/session.d.ts +105 -9
  47. package/dist/browser/session.js +1166 -95
  48. package/dist/browser/types.d.ts +153 -156
  49. package/dist/browser/views.d.ts +39 -0
  50. package/dist/browser/views.js +32 -0
  51. package/dist/browser/watchdogs/aboutblank-watchdog.d.ts +12 -0
  52. package/dist/browser/watchdogs/aboutblank-watchdog.js +131 -0
  53. package/dist/browser/watchdogs/base.d.ts +21 -0
  54. package/dist/browser/watchdogs/base.js +81 -0
  55. package/dist/browser/watchdogs/cdp-session-watchdog.d.ts +14 -0
  56. package/dist/browser/watchdogs/cdp-session-watchdog.js +177 -0
  57. package/dist/browser/watchdogs/crash-watchdog.d.ts +38 -0
  58. package/dist/browser/watchdogs/crash-watchdog.js +296 -0
  59. package/dist/browser/watchdogs/default-action-watchdog.d.ts +49 -0
  60. package/dist/browser/watchdogs/default-action-watchdog.js +212 -0
  61. package/dist/browser/watchdogs/dom-watchdog.d.ts +8 -0
  62. package/dist/browser/watchdogs/dom-watchdog.js +31 -0
  63. package/dist/browser/watchdogs/downloads-watchdog.d.ts +77 -0
  64. package/dist/browser/watchdogs/downloads-watchdog.js +409 -0
  65. package/dist/browser/watchdogs/har-recording-watchdog.d.ts +19 -0
  66. package/dist/browser/watchdogs/har-recording-watchdog.js +317 -0
  67. package/dist/browser/watchdogs/index.d.ts +15 -0
  68. package/dist/browser/watchdogs/index.js +15 -0
  69. package/dist/browser/watchdogs/local-browser-watchdog.d.ts +10 -0
  70. package/dist/browser/watchdogs/local-browser-watchdog.js +32 -0
  71. package/dist/browser/watchdogs/permissions-watchdog.d.ts +8 -0
  72. package/dist/browser/watchdogs/permissions-watchdog.js +73 -0
  73. package/dist/browser/watchdogs/popups-watchdog.d.ts +13 -0
  74. package/dist/browser/watchdogs/popups-watchdog.js +77 -0
  75. package/dist/browser/watchdogs/recording-watchdog.d.ts +27 -0
  76. package/dist/browser/watchdogs/recording-watchdog.js +249 -0
  77. package/dist/browser/watchdogs/screenshot-watchdog.d.ts +6 -0
  78. package/dist/browser/watchdogs/screenshot-watchdog.js +13 -0
  79. package/dist/browser/watchdogs/security-watchdog.d.ts +10 -0
  80. package/dist/browser/watchdogs/security-watchdog.js +84 -0
  81. package/dist/browser/watchdogs/storage-state-watchdog.d.ts +24 -0
  82. package/dist/browser/watchdogs/storage-state-watchdog.js +288 -0
  83. package/dist/cli.d.ts +7 -2
  84. package/dist/cli.js +182 -25
  85. package/dist/code-use/formatting.d.ts +3 -0
  86. package/dist/code-use/formatting.js +18 -0
  87. package/dist/code-use/index.d.ts +6 -0
  88. package/dist/code-use/index.js +6 -0
  89. package/dist/code-use/namespace.d.ts +5 -0
  90. package/dist/code-use/namespace.js +81 -0
  91. package/dist/code-use/notebook-export.d.ts +3 -0
  92. package/dist/code-use/notebook-export.js +56 -0
  93. package/dist/code-use/service.d.ts +24 -0
  94. package/dist/code-use/service.js +104 -0
  95. package/dist/code-use/utils.d.ts +4 -0
  96. package/dist/code-use/utils.js +98 -0
  97. package/dist/code-use/views.d.ts +108 -0
  98. package/dist/code-use/views.js +165 -0
  99. package/dist/config.d.ts +15 -0
  100. package/dist/config.js +109 -7
  101. package/dist/controller/registry/service.d.ts +10 -1
  102. package/dist/controller/registry/service.js +266 -10
  103. package/dist/controller/registry/views.d.ts +4 -1
  104. package/dist/controller/registry/views.js +25 -2
  105. package/dist/controller/service.d.ts +10 -1
  106. package/dist/controller/service.js +1814 -268
  107. package/dist/controller/views.d.ts +78 -155
  108. package/dist/controller/views.js +61 -12
  109. package/dist/dom/history-tree-processor/service.d.ts +5 -0
  110. package/dist/dom/history-tree-processor/service.js +169 -14
  111. package/dist/dom/history-tree-processor/view.d.ts +7 -1
  112. package/dist/dom/history-tree-processor/view.js +10 -1
  113. package/dist/dom/markdown-extractor.d.ts +37 -0
  114. package/dist/dom/markdown-extractor.js +345 -0
  115. package/dist/dom/service.d.ts +3 -1
  116. package/dist/dom/service.js +76 -0
  117. package/dist/dom/views.d.ts +1 -0
  118. package/dist/dom/views.js +45 -0
  119. package/dist/event-bus.d.ts +107 -7
  120. package/dist/event-bus.js +313 -10
  121. package/dist/exceptions.d.ts +0 -3
  122. package/dist/exceptions.js +0 -7
  123. package/dist/filesystem/file-system.d.ts +18 -0
  124. package/dist/filesystem/file-system.js +503 -42
  125. package/dist/index.d.ts +7 -0
  126. package/dist/index.js +6 -0
  127. package/dist/integrations/gmail/actions.d.ts +3 -3
  128. package/dist/integrations/gmail/actions.js +4 -4
  129. package/dist/llm/anthropic/chat.d.ts +18 -1
  130. package/dist/llm/anthropic/chat.js +123 -55
  131. package/dist/llm/anthropic/serializer.d.ts +2 -0
  132. package/dist/llm/anthropic/serializer.js +81 -9
  133. package/dist/llm/aws/chat-anthropic.d.ts +17 -0
  134. package/dist/llm/aws/chat-anthropic.js +126 -26
  135. package/dist/llm/aws/chat-bedrock.d.ts +28 -1
  136. package/dist/llm/aws/chat-bedrock.js +161 -34
  137. package/dist/llm/aws/serializer.d.ts +13 -1
  138. package/dist/llm/aws/serializer.js +56 -17
  139. package/dist/llm/azure/chat.d.ts +53 -2
  140. package/dist/llm/azure/chat.js +366 -54
  141. package/dist/llm/base.d.ts +2 -0
  142. package/dist/llm/browser-use/chat.d.ts +40 -0
  143. package/dist/llm/browser-use/chat.js +305 -0
  144. package/dist/llm/browser-use/index.d.ts +1 -0
  145. package/dist/llm/browser-use/index.js +1 -0
  146. package/dist/llm/cerebras/chat.d.ts +39 -0
  147. package/dist/llm/cerebras/chat.js +178 -0
  148. package/dist/llm/cerebras/index.d.ts +2 -0
  149. package/dist/llm/cerebras/index.js +2 -0
  150. package/dist/llm/cerebras/serializer.d.ts +7 -0
  151. package/dist/llm/cerebras/serializer.js +82 -0
  152. package/dist/llm/deepseek/chat.d.ts +19 -2
  153. package/dist/llm/deepseek/chat.js +138 -25
  154. package/dist/llm/google/chat.d.ts +46 -2
  155. package/dist/llm/google/chat.js +267 -64
  156. package/dist/llm/google/serializer.d.ts +9 -1
  157. package/dist/llm/google/serializer.js +141 -34
  158. package/dist/llm/groq/chat.d.ts +21 -2
  159. package/dist/llm/groq/chat.js +125 -26
  160. package/dist/llm/groq/parser.js +3 -1
  161. package/dist/llm/mistral/chat.d.ts +43 -0
  162. package/dist/llm/mistral/chat.js +154 -0
  163. package/dist/llm/mistral/index.d.ts +2 -0
  164. package/dist/llm/mistral/index.js +2 -0
  165. package/dist/llm/mistral/schema.d.ts +8 -0
  166. package/dist/llm/mistral/schema.js +27 -0
  167. package/dist/llm/models.d.ts +2 -0
  168. package/dist/llm/models.js +317 -0
  169. package/dist/llm/ollama/chat.d.ts +13 -1
  170. package/dist/llm/ollama/chat.js +110 -19
  171. package/dist/llm/ollama/serializer.d.ts +1 -0
  172. package/dist/llm/ollama/serializer.js +34 -12
  173. package/dist/llm/openai/chat.d.ts +16 -0
  174. package/dist/llm/openai/chat.js +94 -44
  175. package/dist/llm/openai/like.d.ts +5 -3
  176. package/dist/llm/openai/like.js +7 -3
  177. package/dist/llm/openai/responses-serializer.d.ts +18 -0
  178. package/dist/llm/openai/responses-serializer.js +72 -0
  179. package/dist/llm/openrouter/chat.d.ts +28 -2
  180. package/dist/llm/openrouter/chat.js +115 -29
  181. package/dist/llm/schema.d.ts +11 -1
  182. package/dist/llm/schema.js +109 -4
  183. package/dist/llm/vercel/chat.d.ts +50 -0
  184. package/dist/llm/vercel/chat.js +276 -0
  185. package/dist/llm/vercel/index.d.ts +1 -0
  186. package/dist/llm/vercel/index.js +1 -0
  187. package/dist/llm/vercel/serializer.d.ts +5 -0
  188. package/dist/llm/vercel/serializer.js +7 -0
  189. package/dist/llm/views.d.ts +2 -1
  190. package/dist/llm/views.js +3 -1
  191. package/dist/logging-config.d.ts +2 -0
  192. package/dist/logging-config.js +82 -29
  193. package/dist/mcp/client.d.ts +10 -5
  194. package/dist/mcp/client.js +14 -9
  195. package/dist/mcp/controller.d.ts +42 -3
  196. package/dist/mcp/controller.js +56 -31
  197. package/dist/mcp/server.d.ts +15 -0
  198. package/dist/mcp/server.js +261 -52
  199. package/dist/observability.js +10 -4
  200. package/dist/sandbox/index.d.ts +2 -0
  201. package/dist/sandbox/index.js +2 -0
  202. package/dist/sandbox/sandbox.d.ts +19 -0
  203. package/dist/sandbox/sandbox.js +140 -0
  204. package/dist/sandbox/views.d.ts +67 -0
  205. package/dist/sandbox/views.js +121 -0
  206. package/dist/skill-cli/index.d.ts +3 -0
  207. package/dist/skill-cli/index.js +3 -0
  208. package/dist/skill-cli/protocol.d.ts +30 -0
  209. package/dist/skill-cli/protocol.js +48 -0
  210. package/dist/skill-cli/server.d.ts +11 -0
  211. package/dist/skill-cli/server.js +85 -0
  212. package/dist/skill-cli/sessions.d.ts +24 -0
  213. package/dist/skill-cli/sessions.js +47 -0
  214. package/dist/skills/index.d.ts +3 -0
  215. package/dist/skills/index.js +3 -0
  216. package/dist/skills/service.d.ts +27 -0
  217. package/dist/skills/service.js +266 -0
  218. package/dist/skills/utils.d.ts +6 -0
  219. package/dist/skills/utils.js +53 -0
  220. package/dist/skills/views.d.ts +40 -0
  221. package/dist/skills/views.js +10 -0
  222. package/dist/sync/auth.js +8 -3
  223. package/dist/sync/service.d.ts +6 -6
  224. package/dist/sync/service.js +54 -89
  225. package/dist/telemetry/views.d.ts +20 -6
  226. package/dist/telemetry/views.js +23 -5
  227. package/dist/tokens/custom-pricing.d.ts +2 -0
  228. package/dist/tokens/custom-pricing.js +22 -0
  229. package/dist/tokens/index.d.ts +2 -0
  230. package/dist/tokens/index.js +2 -0
  231. package/dist/tokens/mappings.d.ts +1 -0
  232. package/dist/tokens/mappings.js +3 -0
  233. package/dist/tokens/service.js +27 -8
  234. package/dist/tools/extraction/index.d.ts +2 -0
  235. package/dist/tools/extraction/index.js +2 -0
  236. package/dist/tools/extraction/schema-utils.d.ts +6 -0
  237. package/dist/tools/extraction/schema-utils.js +237 -0
  238. package/dist/tools/extraction/views.d.ts +7 -0
  239. package/dist/tools/index.d.ts +5 -0
  240. package/dist/tools/index.js +5 -0
  241. package/dist/tools/registry/index.d.ts +2 -0
  242. package/dist/tools/registry/index.js +2 -0
  243. package/dist/tools/registry/service.d.ts +1 -0
  244. package/dist/tools/registry/service.js +1 -0
  245. package/dist/tools/registry/views.d.ts +1 -0
  246. package/dist/tools/registry/views.js +1 -0
  247. package/dist/tools/service.d.ts +2 -0
  248. package/dist/tools/service.js +1 -0
  249. package/dist/tools/utils.d.ts +2 -0
  250. package/dist/tools/utils.js +57 -0
  251. package/dist/tools/views.d.ts +1 -0
  252. package/dist/tools/views.js +1 -0
  253. package/dist/utils.d.ts +10 -1
  254. package/dist/utils.js +70 -3
  255. package/package.json +116 -49
  256. package/dist/dom/playground/process-dom.js +0 -5
  257. package/dist/dom/playground/test-accessibility.d.ts +0 -44
  258. package/dist/dom/playground/test-accessibility.js +0 -111
  259. /package/dist/{dom/playground/process-dom.d.ts → tools/extraction/views.js} +0 -0
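The README diffed below describes the agent as an observe → think → act loop. It extracts the page state, asks the LLM for actions, executes them through the controller, and feeds the results back. It stops on a `done` action or at `max_steps`.

The following is a minimal, library-independent TypeScript sketch of that control flow only. `PageState`, `Action`, `Decide`, `runAgentLoop`, and `demo` are hypothetical names introduced for illustration. The "LLM" is a scripted stub, and none of this is the browser-use `Agent` API.

```typescript
// Hypothetical, simplified types for illustration; these are NOT browser-use's own types.
type PageState = { url: string; elements: string[] };

type Action =
  | { kind: 'click'; index: number }
  | { kind: 'navigate'; url: string }
  | { kind: 'done'; result: string };

// Stand-in for the LLM call: picks the next action from the task, current state, and prior results.
type Decide = (task: string, state: PageState, history: string[]) => Promise<Action>;

interface LoopResult {
  finalResult: string | null; // null when the step budget runs out before `done`
  steps: number;
}

// Observe -> think -> act until the model returns `done` or maxSteps is reached.
async function runAgentLoop(
  task: string,
  observe: () => Promise<PageState>,
  decide: Decide,
  execute: (action: Action) => Promise<string>,
  maxSteps: number,
): Promise<LoopResult> {
  const history: string[] = [];
  for (let step = 1; step <= maxSteps; step++) {
    const state = await observe(); // 1. capture the current page state
    const action = await decide(task, state, history); // 2. ask the model for the next action
    if (action.kind === 'done') {
      return { finalResult: action.result, steps: step }; // 3a. stop on `done`
    }
    history.push(await execute(action)); // 3b. act, then feed the result back into the next step
  }
  return { finalResult: null, steps: maxSteps };
}

// Demo with a scripted "model": click once, then finish.
async function demo(): Promise<LoopResult> {
  let url = 'https://example.com';
  const observe = async (): Promise<PageState> => ({ url, elements: ['[0] <a>More information</a>'] });
  const decide: Decide = async (_task, _state, history) =>
    history.length === 0
      ? { kind: 'click', index: 0 }
      : { kind: 'done', result: `Visited ${url}` };
  const execute = async (action: Action): Promise<string> => {
    if (action.kind === 'click') {
      url = 'https://www.iana.org/domains/example'; // pretend the click navigated
      return `Clicked element ${action.index}`;
    }
    return `Executed ${action.kind}`;
  };
  return runAgentLoop('Follow the link on example.com', observe, decide, execute, 5);
}

demo().then((r) => console.log(r.finalResult, r.steps));
```

This sketch injects `decide` and `execute` as functions. That design choice lets the loop's termination logic (`done` versus the step budget) be exercised without a browser or an API key.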
package/README.md CHANGED
@@ -1,228 +1,213 @@
- # browser-use
-
- ![Node CI](https://github.com/webllm/browser-use/workflows/Node%20CI/badge.svg)
- [![npm](https://img.shields.io/npm/v/browser-use.svg)](https://www.npmjs.com/package/browser-use)
- ![license](https://img.shields.io/npm/l/browser-use)
-
- > 🙏 **A TypeScript port of the amazing [browser-use](https://github.com/browser-use/browser-use) Python library**
- >
- > This project is a faithful TypeScript/JavaScript implementation of the original [browser-use](https://github.com/browser-use/browser-use) Python library, bringing the power of AI-driven browser automation to the Node.js ecosystem. All credit for the innovative design and architecture goes to the original Python project and its creators.
-
- A TypeScript-first library for programmatic browser control, designed for building AI-powered web agents with vision capabilities and extensive LLM integrations.
-
- ## Why TypeScript?
-
- While the original [browser-use Python library](https://github.com/browser-use/browser-use) is excellent and feature-complete, this TypeScript port aims to:
-
- - 🌍 Bring browser-use capabilities to the JavaScript/TypeScript ecosystem
- - 🔧 Enable seamless integration with Node.js, Deno, and Bun projects
- - 📦 Provide native TypeScript type definitions for better DX
- - 🤝 Make browser automation accessible to frontend and full-stack developers
-
- ### Python vs TypeScript: Which Should You Use?
-
- | Feature | Python Version | TypeScript Version |
- | ------------------- | --------------------------------------------------------------------- | ----------------------------------------------------------- |
- | **Recommended for** | Python developers, Data scientists, AI/ML engineers | JavaScript/TypeScript developers, Full-stack engineers |
- | **Ecosystem** | PyPI, pip | npm, yarn, pnpm |
- | **Type Safety** | Optional (with type hints) | Built-in (TypeScript) |
- | **Runtime** | Python 3.x | Node.js, Deno, Bun |
- | **LLM Providers** | 10+ providers | 10+ providers (same) |
- | **Browser Support** | Playwright | Playwright (same) |
- | **Documentation** | Original & Complete | Port with TS-specific examples |
- | **Community** | Larger & More Established | Growing |
- | **GitHub** | [browser-use/browser-use](https://github.com/browser-use/browser-use) | [webllm/browser-use](https://github.com/webllm/browser-use) |
-
- **👉 If you're working in Python, we highly recommend using the [original browser-use library](https://github.com/browser-use/browser-use).** This TypeScript port is specifically for those who need to work within the JavaScript/TypeScript ecosystem.
-
- ### Commitment to the Original
-
- We are committed to:
-
- - ✅ Maintaining feature parity with the Python version whenever possible
- - 🔄 Keeping up with upstream updates and improvements
- - 🐛 Reporting bugs found in this port back to the original project when applicable
- - 📚 Directing users to the original project's documentation for core concepts
- - 🤝 Collaborating with the original authors and respecting their vision
-
- This is **not** a fork or competing project—it's a respectful port to serve a different programming language community.
-
- ### Upstream Parity Status
-
- This Node.js/TypeScript implementation is currently **strictly aligned** with the Python `browser-use` release
- [`v0.5.11`](https://github.com/browser-use/browser-use/releases/tag/0.5.11), published on **August 10, 2025**.
-
- - 📦 Core features and behavior are aligned against that upstream tag baseline.
- - ✅ Our test strategy is maintained to be as equivalent as practical to the Python coverage and behavior checks.
- - 🔄 We expect to move this parity baseline forward to the Python **January 2026** release line very soon.
-
- ## Features
-
- - 🤖 **AI-Powered**: Built specifically for LLM-driven web automation with structured output support
- - 🎯 **Type-Safe**: Full TypeScript support with comprehensive type definitions
- - 🌐 **Multi-Browser**: Support for Chromium, Firefox, and WebKit via Playwright
- - 🔌 **10+ LLM Providers**: OpenAI, Anthropic, Google, AWS, Azure, DeepSeek, Groq, Ollama, OpenRouter, and more
- - 👁️ **Vision Support**: Multimodal capabilities with screenshot analysis
- - 🛡️ **Robust**: Built-in error handling, recovery, graceful shutdown, and retry mechanisms
- - 📊 **Observable**: Comprehensive logging, execution history, and telemetry
- - 🔧 **Extensible**: Custom actions, MCP protocol, and plugin system
- - 📁 **FileSystem**: Built-in file operations with PDF parsing
- - 🔗 **Integrations**: Gmail API, Google Sheets, and MCP servers
-
- ## Quick Start
+ <p align="center">
+ <h1 align="center">🌐 Browser-Use</h1>
+ <p align="center">
+ <strong>Make websites accessible for AI agents — in TypeScript</strong>
+ </p>
+ <p align="center">
+ A TypeScript-first library for building AI-powered web agents that can autonomously browse, interact with, and extract data from the web using LLMs and Playwright.
+ </p>
+ </p>
+
+ <p align="center">
+ <a href="https://github.com/webllm/browser-use/workflows/Node%20CI"><img src="https://github.com/webllm/browser-use/workflows/Node%20CI/badge.svg" alt="Node CI"></a>
+ <a href="https://www.npmjs.com/package/browser-use"><img src="https://img.shields.io/npm/v/browser-use.svg" alt="npm"></a>
+ <a href="https://www.npmjs.com/package/browser-use"><img src="https://img.shields.io/npm/dm/browser-use.svg" alt="npm downloads"></a>
+ <img src="https://img.shields.io/npm/l/browser-use" alt="license">
+ <img src="https://img.shields.io/badge/TypeScript-first-blue" alt="TypeScript">
+ </p>
+
+ ---
+
+ > **TypeScript port** of the popular Python [browser-use](https://github.com/browser-use/browser-use) library — with a native Node.js experience, full type safety, and first-class support for all major LLM providers.
+
+ ## ✨ Features
+
+ - 🤖 **Autonomous Browser Control** AI-driven navigation, clicking, typing, form filling, scrolling, and tab management
+ - 🧠 **10+ LLM Providers** OpenAI, Anthropic, Google Gemini, Azure, AWS Bedrock, Groq, Ollama, DeepSeek, OpenRouter, Mistral, Cerebras, and custom providers
+ - 👁️ **Vision Support** Screenshot-based understanding for visual web interactions
+ - 🔧 **45+ Built-in Actions** Navigation, element interaction, scrolling, forms, tabs, content extraction, file I/O, and more
+ - 🧩 **Custom Actions** Extensible registry with Zod schema validation, domain restrictions, and page filters
+ - 🔌 **MCP Server** Model Context Protocol support for Claude Desktop and MCP-compatible clients
+ - ⌨️ **CLI Tool** Interactive and one-shot modes for quick browser tasks
+ - 🔒 **Security First** Sensitive data masking, domain restrictions, and Chromium sandboxing
+ - 📊 **Observability** Event system, telemetry, performance tracing, and session recording (GIF)
+ - 🐳 **Docker Ready** Configurable for containerized and CI/CD environments
+
+ ## 🚀 Quick Start
 
  ### Installation
 
  ```bash
  npm install browser-use
- # or
- yarn add browser-use
- # or
- pnpm add browser-use
+ # Playwright browsers are installed automatically via postinstall
  ```
 
- Playwright browsers will be installed automatically via postinstall hook.
-
- Use only documented public entrypoints such as `browser-use` and
- `browser-use/llm/openai`. Avoid deep imports like `browser-use/dist/...`.
-
- ### Basic Usage with Agent
-
- ```typescript
- import { Agent } from 'browser-use';
- import { ChatOpenAI } from 'browser-use/llm/openai';
-
- async function main() {
-   const llm = new ChatOpenAI({
-     model: 'gpt-4',
-     apiKey: process.env.OPENAI_API_KEY,
-   });
-
-   const agent = new Agent({
-     task: 'Go to google.com and search for "TypeScript browser automation"',
-     llm,
-   });
-
-   const history = await agent.run();
-
-   console.log(`Task completed in ${history.history.length} steps`);
-
-   // Access the browser session
-   const browserSession = agent.browser_session;
-   const currentPage = await browserSession.get_current_page();
-   console.log('Final URL:', currentPage?.url());
- }
+ ### Set Up Your API Key
 
- main();
+ ```bash
+ export OPENAI_API_KEY=sk-your-api-key
+ # or ANTHROPIC_API_KEY, GOOGLE_API_KEY, etc.
  ```
 
- ### Using Controller for Custom Actions
-
- Use `Controller` to register domain-specific actions, then pass it into `Agent`:
+ ### Run Your First Agent
 
  ```typescript
- import { Agent, Controller, ActionResult } from 'browser-use';
+ import { Agent } from 'browser-use';
  import { ChatOpenAI } from 'browser-use/llm/openai';
- import { z } from 'zod';
-
- const controller = new Controller();
-
- controller.registry.action('Extract product info from the current page', {
-   param_model: z.object({
-     include_price: z.boolean().default(true),
-     include_reviews: z.boolean().default(false),
-   }),
- })(async function extract_product_info(params, { page }) {
-   const productData = await page.evaluate(() => ({
-     title: document.querySelector('h1')?.textContent ?? null,
-     price: document.querySelector('.price')?.textContent ?? null,
-   }));
-
-   return new ActionResult({
-     extracted_content: JSON.stringify({ ...productData, ...params }),
-     include_in_memory: true,
-   });
- });
 
  const agent = new Agent({
-   task: 'Open product page and extract product info',
+   task: 'Go to google.com and search for "TypeScript tutorials"',
    llm: new ChatOpenAI({
      model: 'gpt-4o',
      apiKey: process.env.OPENAI_API_KEY,
    }),
-   controller,
  });
 
- const history = await agent.run(10);
- console.log(history.final_result());
+ const history = await agent.run();
+ console.log('Result:', history.final_result());
+ console.log('Success:', history.is_successful());
  ```
 
- ### CLI Usage
+ ```bash
+ npx tsx example.ts
+ ```
+
+ ### Use the CLI
 
  ```bash
- # Interactive mode (when running in a TTY)
+ # Interactive mode
  npx browser-use
 
  # One-shot task
- npx browser-use -p "Go to example.com and extract the page title"
-
- # Positional task mode
- npx browser-use "Search for TypeScript browser automation"
-
- # Pick model/provider by model name
- npx browser-use --model claude-sonnet-4-20250514 -p "Summarize latest AI news"
-
- # Pick provider explicitly (uses provider default model)
- npx browser-use --provider anthropic -p "Summarize latest AI news"
+ npx browser-use "Go to example.com and extract the page title"
 
- # Headless + custom browser profile settings
- npx browser-use --headless --window-width 1440 --window-height 900 -p "Check dashboard status"
+ # With specific model
+ npx browser-use --model claude-sonnet-4-20250514 -p "Search for AI news"
 
- # Restrict navigation to trusted domains (recommended with secrets)
- npx browser-use --allowed-domains "example.com,*.example.org" -p "Log in and fetch account info"
-
- # Connect to existing Chromium via CDP
- npx browser-use --cdp-url http://localhost:9222 -p "Inspect the active tab"
+ # Headless mode
+ npx browser-use --headless -p "Check the weather"
 
  # MCP server mode
  npx browser-use --mcp
  ```
 
- Interactive mode commands:
+ ## 🏗️ Architecture
+
+ ```
+ ┌─────────────────────────────────────────────────────┐
+ │ Browser-Use │
+ ├─────────────────────────────────────────────────────┤
+ │ Agent ← MessageManager ← LLM Providers │
+ │ ↓ │
+ │ Controller → Action Registry → BrowserSession │
+ │ ↓ │
+ │ DomService │
+ └─────────────────────────────────────────────────────┘
+ ```
+
+ | Component | Description |
+ | ------------------ | ---------------------------------------------------------------------- |
+ | **Agent** | Central orchestrator — runs the observe → think → act loop |
+ | **Controller** | Manages action registration and execution via Registry |
+ | **BrowserSession** | Playwright wrapper — browser lifecycle, tab management, screenshots |
+ | **DomService** | Extracts interactive elements with indexed mapping for LLM consumption |
+ | **MessageManager** | Manages LLM conversation history with token optimization |
+ | **LLM Providers** | Unified `BaseChatModel` interface across 10+ providers |
+
+ ### How It Works
+
+ 1. **Agent** receives a natural language task
+ 2. **DomService** extracts the current page state (interactive elements + optional screenshot)
+ 3. **LLM** analyzes the state and returns actions to take
+ 4. **Controller** validates and executes actions through the **Registry**
+ 5. Results feed back to the LLM for the next step
+ 6. Loop continues until `done` action or `max_steps`
+
+ ## 🔌 LLM Providers
+
+ | Provider | Import | Vision | Notes |
+ | ----------------- | ---------------------------- | ------ | --------------------------------------------- |
+ | **OpenAI** | `browser-use/llm/openai` | ✅ | Default provider, reasoning models (o1/o3/o4) |
+ | **Anthropic** | `browser-use/llm/anthropic` | ✅ | Prompt caching support |
+ | **Google Gemini** | `browser-use/llm/google` | ✅ | Extended thinking support |
+ | **Azure OpenAI** | `browser-use/llm/azure` | ✅ | Enterprise deployment |
+ | **AWS Bedrock** | `browser-use/llm/aws` | ✅ | Claude via AWS |
+ | **Groq** | `browser-use/llm/groq` | ❌ | Fastest inference |
+ | **Ollama** | `browser-use/llm/ollama` | ❌ | Local/self-hosted models |
+ | **DeepSeek** | `browser-use/llm/deepseek` | ❌ | Cost-effective |
+ | **OpenRouter** | `browser-use/llm/openrouter` | Varies | Multi-model routing |
+ | **Mistral** | `browser-use/llm/mistral` | Varies | Mistral models |
+ | **Cerebras** | `browser-use/llm/cerebras` | ❌ | Fast inference |
+
+ <details>
+ <summary>Provider examples</summary>
 
- - `help`: show interactive usage
- - `exit`: quit interactive mode
+ ```typescript
+ // OpenAI
+ import { ChatOpenAI } from 'browser-use/llm/openai';
+ const llm = new ChatOpenAI({
+   model: 'gpt-4o',
+   apiKey: process.env.OPENAI_API_KEY,
+ });
 
- Security notes:
+ // Anthropic
+ import { ChatAnthropic } from 'browser-use/llm/anthropic';
+ const llm = new ChatAnthropic({
+   model: 'claude-sonnet-4-20250514',
+   apiKey: process.env.ANTHROPIC_API_KEY,
+ });
 
- - Prefer `--allowed-domains` whenever tasks involve credentials or sensitive data.
- - `--allow-insecure` disables domain-lockdown enforcement for sensitive data and is unsafe for production.
+ // Google Gemini
+ import { ChatGoogle } from 'browser-use/llm/google';
+ const llm = new ChatGoogle('gemini-2.5-flash');
 
- ## Advanced Usage
+ // Ollama (local)
+ import { ChatOllama } from 'browser-use/llm/ollama';
+ const llm = new ChatOllama('llama3', 'http://localhost:11434');
 
- ### Vision/Multimodal Support
+ // OpenAI Reasoning Models
+ const llm = new ChatOpenAI({ model: 'o3-mini', reasoningEffort: 'medium' });
+ ```
 
- Enable vision capabilities to let the AI analyze screenshots:
+ </details>
 
- ```typescript
- import { Agent } from 'browser-use';
- import { ChatGoogle } from 'browser-use/llm/google';
+ ## 🎯 Code Examples
 
- const llm = new ChatGoogle('gemini-2.5-flash');
+ ### Data Extraction
 
+ ```typescript
  const agent = new Agent({
-   task: 'Describe what you see on this page and identify main visual elements',
+   task: `Go to amazon.com, search for "wireless keyboard",
+ extract the name, price, and rating of the first 5 products as JSON`,
    llm,
    use_vision: true,
-   vision_detail_level: 'high', // 'auto' | 'low' | 'high'
  });
 
- const history = await agent.run(5);
+ const history = await agent.run(30);
+ console.log(history.final_result());
  ```
 
- ### Custom Actions with Controller Registry
+ ### Form Filling with Sensitive Data
+
+ ```typescript
+ const agent = new Agent({
+   task: 'Login to the dashboard',
+   llm,
+   sensitive_data: {
+     '*.example.com': {
+       username: process.env.SITE_USERNAME!,
+       password: process.env.SITE_PASSWORD!,
+     },
+   },
+   browser_session: new BrowserSession({
+     browser_profile: new BrowserProfile({
+       allowed_domains: ['*.example.com'],
+     }),
+   }),
+ });
+ ```
 
- Extend the agent's capabilities with custom actions:
+ ### Custom Actions
 
  ```typescript
  import { Controller, ActionResult } from 'browser-use';
@@ -230,588 +215,212 @@ import { z } from 'zod';
230
215
 
231
216
  const controller = new Controller();
232
217
 
233
- controller.registry.action('Extract product information', {
218
+ controller.registry.action('Save screenshot to file', {
234
219
  param_model: z.object({
235
- include_price: z.boolean().default(true),
236
- include_reviews: z.boolean().default(false),
220
+ filename: z.string().describe('Output filename'),
237
221
  }),
238
- })(async function extract_product_info(params, { page }) {
239
- const productData = await page.evaluate(() => ({
240
- title: document.querySelector('h1')?.textContent ?? null,
241
- price: document.querySelector('.price')?.textContent ?? null,
242
- }));
243
-
222
+ })(async function save_screenshot(params, ctx) {
223
+ const screenshot = await ctx.page.screenshot();
224
+ fs.writeFileSync(`./screenshots/${params.filename}`, screenshot);
244
225
  return new ActionResult({
245
- extracted_content: JSON.stringify({ ...productData, ...params }),
246
- include_in_memory: true,
226
+ extracted_content: `Screenshot saved as ${params.filename}`,
247
227
  });
248
228
  });
249
- ```
250
229
 
251
- ### FileSystem Operations
252
-
253
- Built-in file system support with PDF parsing:
254
-
255
- ```typescript
256
- import { Agent } from 'browser-use';
257
- import { ChatOpenAI } from 'browser-use/llm/openai';
258
-
259
- const agent = new Agent({
260
- task: 'Download the PDF and extract text from page 1',
261
- llm: new ChatOpenAI(),
262
- file_system_path: './agent-workspace',
263
- });
264
-
265
- // FileSystem actions are available:
266
- // - read_file: Read file contents (supports PDF)
267
- // - write_file: Write content to file
268
- // - replace_file_str: Replace text in file
230
+ const agent = new Agent({ task: '...', llm, controller });
269
231
  ```
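The `filename` parameter in the custom action above is chosen by the LLM, so writing it straight into `./screenshots/${params.filename}` lets a value such as `../../.bashrc` escape the target directory, and the write fails if `./screenshots` does not exist yet. A minimal hardening sketch using only Node's built-in `fs` and `path` modules follows; `resolveInside` and `saveInside` are hypothetical helpers, not browser-use exports.

```typescript
import * as fs from 'node:fs';
import * as path from 'node:path';

// Hypothetical helper (not a browser-use export): resolve `filename` inside
// `baseDir` and reject anything that would land outside it.
function resolveInside(baseDir: string, filename: string): string {
  const base = path.resolve(baseDir);
  const target = path.resolve(base, filename);
  const rel = path.relative(base, target);
  // '' means the name resolved to baseDir itself; a leading '..' segment or an
  // absolute result (e.g. a different drive on Windows) means it escapes baseDir.
  if (rel === '' || rel === '..' || rel.startsWith('..' + path.sep) || path.isAbsolute(rel)) {
    throw new Error(`Refusing to write outside ${base}: ${filename}`);
  }
  return target;
}

// Hypothetical helper: write bytes to a validated path, creating folders as needed.
function saveInside(baseDir: string, filename: string, data: Uint8Array): string {
  const target = resolveInside(baseDir, filename);
  fs.mkdirSync(path.dirname(target), { recursive: true });
  fs.writeFileSync(target, data);
  return target;
}
```

Assuming `ctx.page.screenshot()` resolves to a Node `Buffer` (as Playwright's `page.screenshot()` does), the write inside `save_screenshot` could become `saveInside('./screenshots', params.filename, screenshot)`, which rejects traversal attempts and creates the folder on first use.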
270
232
 
271
- ### Browser Profile Configuration
272
-
273
- Customize browser behavior with profiles:
233
+ ### Vision Mode & Session Recording
274
234
 
275
235
  ```typescript
276
- import { BrowserProfile, BrowserSession } from 'browser-use';
277
-
278
- const profile = new BrowserProfile({
279
- window_size: { width: 1920, height: 1080 },
280
- disable_security: false,
281
- headless: true,
282
- chromium_sandbox: true, // Keep enabled by default in production
283
- args: ['--disable-blink-features=AutomationControlled'],
284
- wait_for_network_idle_page_load_time: 3, // seconds
285
- allowed_domains: ['example.com', '*.google.com'],
286
- cookies_file: './cookies.json',
287
- downloads_path: './downloads',
288
- highlight_elements: false, // Visual debugging
289
- viewport_expansion: 0, // Expand viewport for element detection
290
- });
291
-
292
- const browserSession = new BrowserSession({
293
- browser_profile: profile,
236
+ const agent = new Agent({
237
+ task: 'Navigate to hacker news and summarize the top stories',
238
+ llm,
239
+ use_vision: true,
240
+ vision_detail_level: 'high', // 'auto' | 'low' | 'high'
241
+ generate_gif: './session.gif',
294
242
  });
295
-
296
- await browserSession.start();
297
243
  ```
298
244
 
299
- If Chromium launch fails with `No usable sandbox` (common in restricted Linux CI),
300
- `BrowserSession` automatically retries once with `chromium_sandbox: false` and logs
301
- a warning. For deterministic CI behavior, set `chromium_sandbox: false` explicitly.
302
-
303
- ### MCP (Model Context Protocol) Integration
304
-
305
- Connect to MCP servers for extended capabilities:
245
+ ### Multi-Tab Workflows
306
246
 
307
247
  ```typescript
308
- import { MCPController } from 'browser-use';
309
-
310
- const mcpController = new MCPController();
311
-
312
- // Add MCP server
313
- await mcpController.addServer('my-server', 'npx', [
314
- '-y',
315
- '@modelcontextprotocol/server-filesystem',
316
- '/path/to/data',
317
- ]);
318
-
319
- // MCP tools are automatically available to the agent
320
- const tools = await mcpController.listAllTools();
321
- console.log('Available MCP tools:', tools);
322
- ```
323
-
324
- ### Gmail Integration
325
-
326
- Built-in Gmail API support:
327
-
328
- ```typescript
329
- import { GmailService } from 'browser-use';
330
-
331
- // Gmail actions are automatically available:
332
- // - get_recent_emails: Fetch recent emails
333
- // - send_email: Send email via Gmail API
334
-
335
248
  const agent = new Agent({
336
- task: 'Check my last 5 emails and summarize them',
337
- llm: new ChatOpenAI(),
338
- // Gmail credentials loaded from config files (or explicit GmailService options)
249
+ task: `Compare "Sony WH-1000XM5" prices:
250
+ 1. Open amazon.com and search for the product
251
+ 2. Open bestbuy.com in a new tab and search
252
+ 3. Provide a comparison summary`,
253
+ llm,
254
+ use_vision: true,
339
255
  });
340
256
  ```
341
257
 
342
- ## Configuration
343
-
344
- ### Environment Variables
345
-
346
- ```bash
347
- # LLM Configuration (provider-specific)
348
- OPENAI_API_KEY=your-openai-key
349
- ANTHROPIC_API_KEY=your-anthropic-key
350
- GOOGLE_API_KEY=your-google-key
351
- AWS_ACCESS_KEY_ID=your-aws-key
352
- AWS_SECRET_ACCESS_KEY=your-aws-secret
353
- AZURE_OPENAI_API_KEY=your-azure-key
354
- AZURE_OPENAI_ENDPOINT=your-azure-endpoint
355
- GROQ_API_KEY=your-groq-key
356
- DEEPSEEK_API_KEY=your-deepseek-key
357
-
358
- # Browser Configuration
359
- BROWSER_USE_HEADLESS=true
360
- BROWSER_USE_ALLOWED_DOMAINS=example.com,*.trusted.org
361
- IN_DOCKER=true
362
-
363
- # Logging Configuration
364
- BROWSER_USE_LOGGING_LEVEL=info # debug, info, warning, error
365
-
366
- # Telemetry (optional)
367
- ANONYMIZED_TELEMETRY=false
368
-
369
- # Observability (optional)
370
- LMNR_API_KEY=your-lmnr-key
371
- ```
372
-
373
- ### Agent Configuration
374
-
375
- ```typescript
376
- interface AgentOptions {
377
- // Vision/multimodal
378
- use_vision?: boolean;
379
- vision_detail_level?: 'low' | 'high' | 'auto';
380
-
381
- // Error handling
382
- max_failures?: number; // default: 3
383
- retry_delay?: number; // seconds, default: 10
384
- max_actions_per_step?: number; // default: 10
385
-
386
- // Persistence / output
387
- save_conversation_path?: string | null;
388
- file_system_path?: string | null;
389
- validate_output?: boolean;
390
- include_attributes?: string[];
391
-
392
- // Runtime limits (seconds)
393
- llm_timeout?: number; // default: 60
394
- step_timeout?: number; // default: 180
395
- }
396
-
397
- // Max step count is configured per run call:
398
- await agent.run(100);
399
- ```
400
-
401
- ## Supported LLM Providers
402
-
403
- ### OpenAI
258
+ ### Event System
404
259
 
405
260
  ```typescript
406
- import { ChatOpenAI } from 'browser-use/llm/openai';
261
+ const agent = new Agent({ task: '...', llm });
407
262
 
408
- const llm = new ChatOpenAI({
409
- model: 'gpt-4o', // or 'gpt-4', 'gpt-3.5-turbo'
410
- apiKey: process.env.OPENAI_API_KEY,
411
- temperature: 0.1,
412
- maxTokens: 4096,
263
+ agent.eventbus.on('CreateAgentStepEvent', (event) => {
264
+ console.log('Step completed:', event.step_id);
413
265
  });
414
- ```
415
266
 
416
- ### Anthropic Claude
417
-
418
- ```typescript
419
- import { ChatAnthropic } from 'browser-use/llm/anthropic';
420
-
421
- const llm = new ChatAnthropic({
422
- model: 'claude-3-5-sonnet-20241022', // or other Claude models
423
- apiKey: process.env.ANTHROPIC_API_KEY,
424
- temperature: 0.1,
425
- });
267
+ await agent.run();
426
268
  ```
427
269
 
428
- ### Google Gemini
429
-
430
- ```typescript
431
- import { ChatGoogle } from 'browser-use/llm/google';
432
-
433
- const llm = new ChatGoogle('gemini-2.5-flash');
434
- // Configure GOOGLE_API_KEY in env. Optional:
435
- // GOOGLE_API_BASE_URL / GOOGLE_API_VERSION
436
- ```
270
+ ## ⚙️ Configuration
437
271
 
438
- ### AWS Bedrock
272
+ ### Agent Options
439
273
 
440
274
  ```typescript
441
- import { ChatAnthropicBedrock } from 'browser-use/llm/aws';
442
-
443
- const llm = new ChatAnthropicBedrock({
444
- model: 'anthropic.claude-3-5-sonnet-20241022-v2:0',
445
- region: 'us-east-1',
446
- max_tokens: 4096,
275
+ const agent = new Agent({
276
+ task: 'Your task',
277
+ llm,
278
+ use_vision: true, // Enable screenshot analysis
279
+ max_actions_per_step: 5, // Actions per LLM call
280
+ max_failures: 3, // Max retries on failure
281
+ generate_gif: './recording.gif', // Session recording
282
+ validate_output: true, // Strict output validation
283
+ use_thinking: true, // Extended thinking prompts
284
+ llm_timeout: 60, // LLM call timeout (seconds)
285
+ step_timeout: 180, // Step timeout (seconds)
286
+ extend_system_message: 'Be concise', // Custom prompt additions
447
287
  });
448
- ```
449
288
 
450
- ### Azure OpenAI
451
-
452
- ```typescript
453
- import { ChatAzure } from 'browser-use/llm/azure';
454
-
455
- const llm = new ChatAzure('gpt-4o');
456
- // Configure AZURE_OPENAI_API_KEY, AZURE_OPENAI_ENDPOINT, AZURE_OPENAI_API_VERSION in env.
457
- ```
458
-
459
- ### DeepSeek
460
-
461
- ```typescript
462
- import { ChatDeepSeek } from 'browser-use/llm/deepseek';
463
-
464
- const llm = new ChatDeepSeek('deepseek-chat');
289
+ const history = await agent.run(50); // Max 50 steps
465
290
  ```
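`llm_timeout` and `step_timeout` are given in seconds. The README does not say how the agent enforces them; a common TypeScript pattern is to race the work against a timer, sketched below with a hypothetical `withTimeout` helper (not a browser-use export). Note that `Promise.race` only stops *waiting*: the underlying call keeps running unless it also honors an `AbortSignal`.

```typescript
// Hypothetical helper, not part of browser-use: reject if `work` does not
// settle within `seconds`.
async function withTimeout<T>(work: Promise<T>, seconds: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${seconds}s`)),
      seconds * 1000
    );
  });
  try {
    return await Promise.race([work, timeout]);
  } finally {
    clearTimeout(timer); // don't keep the event loop alive once `work` has won
  }
}

// Usage: a fake 50 ms "LLM call" under a 60 s budget, and one that overruns 0.01 s.
const sleep = (ms: number) => new Promise<string>((resolve) => setTimeout(() => resolve('done'), ms));

withTimeout(sleep(50), 60, 'llm call').then(console.log); // logs "done"
withTimeout(sleep(50), 0.01, 'step').catch((e: Error) => console.log(e.message)); // logs "step timed out after 0.01s"
```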
466
291
 
467
- ### Groq
292
+ ### Browser Profile
468
293
 
469
294
  ```typescript
470
- import { ChatGroq } from 'browser-use/llm/groq';
471
-
472
- const llm = new ChatGroq('mixtral-8x7b-32768');
473
- ```
474
-
475
- ### Ollama (Local)
476
-
477
- ```typescript
478
- import { ChatOllama } from 'browser-use/llm/ollama';
479
-
480
- const llm = new ChatOllama('llama3.1', 'http://localhost:11434');
481
- ```
482
-
483
- ### OpenRouter
295
+ import { BrowserProfile, BrowserSession } from 'browser-use';
484
296
 
485
- ```typescript
486
- import { ChatOpenRouter } from 'browser-use/llm/openrouter';
297
+ const profile = new BrowserProfile({
298
+ headless: true,
299
+ viewport: { width: 1920, height: 1080 },
300
+ user_data_dir: './my-profile', // Persistent sessions
301
+ allowed_domains: ['*.example.com'], // Domain restrictions
302
+ highlight_elements: true, // Visual debugging
303
+ proxy: { server: 'http://proxy:8080' },
304
+ });
487
305
 
488
- const llm = new ChatOpenRouter('anthropic/claude-3-opus');
306
+ const session = new BrowserSession({ browser_profile: profile });
307
+ const agent = new Agent({ task: '...', llm, browser_session: session });
489
308
  ```
490
309
 
491
- ## Available Actions
492
-
493
- The AI agent can perform these actions:
494
-
495
- ### Navigation
496
-
497
- - **search_google** - Search query in Google (web results only)
498
- - **go_to_url** - Navigate to a specific URL (with optional new tab)
499
-
500
- ### Element Interaction
501
-
502
- - **click_element** - Click buttons, links, or clickable elements by index
503
- - **input_text** - Type text into input fields and textareas by index
504
-
505
- ### Dropdown/Select
506
-
507
- - **dropdown_options** - Get available options from a dropdown
508
- - **select_dropdown** - Select option from dropdown by index
509
-
510
- ### Scrolling
511
-
512
- - **scroll** - Scroll page up/down by pixels or direction
513
- - **scroll_to_text** - Scroll to text content on page
514
-
515
- ### Tabs
516
-
517
- - **switch_tab** - Switch to different browser tab by index
518
- - **close_tab** - Close current or specific tab
519
-
520
- ### Keyboard
521
-
522
- - **send_keys** - Send keyboard input (Enter, Tab, Escape, etc.)
523
-
524
- ### Content Extraction
525
-
526
- - **extract_structured_data** - Extract specific data using LLM from page markdown
527
-
528
- ### FileSystem
529
-
530
- - **read_file** - Read file contents (supports PDF parsing)
531
- - **write_file** - Write content to file
532
- - **replace_file_str** - Replace string in file
533
-
534
- ### Google Sheets
535
-
536
- - **sheets_range** - Get cell range from Google Sheet
537
- - **sheets_update** - Update Google Sheet cells
538
- - **sheets_input** - Input data into Google Sheet
539
-
540
- ### Gmail
541
-
542
- - **get_recent_emails** - Fetch recent emails from Gmail
543
- - **send_email** - Send email via Gmail API
544
-
545
- ### Completion
546
-
547
- - **done** - Mark task as completed with optional structured output
310
+ ### Environment Variables
548
311
 
549
- ## Examples
312
+ | Variable | Description |
313
+ | ----------------------------- | ---------------------------------------------- |
314
+ | `OPENAI_API_KEY` | OpenAI API key |
315
+ | `ANTHROPIC_API_KEY` | Anthropic API key |
316
+ | `GOOGLE_API_KEY` | Google API key |
317
+ | `BROWSER_USE_HEADLESS` | Run browser headlessly (`true`/`false`) |
318
+ | `BROWSER_USE_LOGGING_LEVEL` | Log level: `debug`, `info`, `warning`, `error` |
319
+ | `BROWSER_USE_ALLOWED_DOMAINS` | Comma-separated domain allowlist |
320
+ | `ANONYMIZED_TELEMETRY` | Enable/disable anonymous telemetry (`true`/`false`) |
550
321
 
551
- See the `/examples` directory for detailed examples:
322
+ > See [Configuration Guide](./docs/CONFIGURATION.md) for the full list.
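Environment variables are always strings, so values like `BROWSER_USE_HEADLESS` and `BROWSER_USE_ALLOWED_DOMAINS` need parsing before use. The sketch below reads the browser-related variables from the table into typed values. The helper names, the set of strings treated as true, the trimming of spaces around commas, and the fallback defaults are all assumptions for illustration, not the library's documented behavior.

```typescript
// Hypothetical parsers, not browser-use exports.
type Env = Record<string, string | undefined>;

function parseBool(value: string | undefined, fallback: boolean): boolean {
  if (value === undefined || value.trim() === '') return fallback;
  return ['1', 'true', 'yes', 'on'].includes(value.trim().toLowerCase());
}

// "example.com, *.trusted.org" -> ['example.com', '*.trusted.org']
function parseDomainList(value: string | undefined): string[] {
  return (value ?? '')
    .split(',')
    .map((d) => d.trim())
    .filter((d) => d.length > 0);
}

function readBrowserEnv(env: Env = process.env) {
  return {
    headless: parseBool(env.BROWSER_USE_HEADLESS, true), // assumed default
    allowedDomains: parseDomainList(env.BROWSER_USE_ALLOWED_DOMAINS),
    loggingLevel: env.BROWSER_USE_LOGGING_LEVEL ?? 'info', // assumed default
  };
}

// Usage: pass an explicit object instead of process.env to test the parsing.
console.log(
  readBrowserEnv({
    BROWSER_USE_HEADLESS: 'false',
    BROWSER_USE_ALLOWED_DOMAINS: 'example.com, *.trusted.org',
  })
);
```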
552
323
 
553
- - `examples/simple-search.ts` - Basic web search automation
554
- - `examples/search-wikipedia.ts` - Wikipedia navigation with vision
555
- - `examples/test-vision.ts` - Vision/multimodal capabilities demo
556
- - `examples/test-filesystem.ts` - File operations and PDF parsing
557
- - `examples/openapi.ts` - Complex API documentation extraction
324
+ ## 🔌 MCP Server (Claude Desktop)
558
325
 
559
- ### Running Examples
326
+ Browser-Use can run as an [MCP](https://modelcontextprotocol.io/) server, exposing browser automation as tools for Claude Desktop:
560
327
 
561
328
  ```bash
562
- # Set your API key
563
- export OPENAI_API_KEY=your-key
564
- # or for Google
565
- export GOOGLE_API_KEY=your-key
566
-
567
- # Run an example
568
- npx tsx examples/simple-search.ts
329
+ npx browser-use --mcp
569
330
  ```
570
331
 
571
- ## Error Handling
572
-
573
- The library includes comprehensive error handling:
574
-
575
- ```typescript
576
- import { Agent, AgentError } from 'browser-use';
577
-
578
- try {
579
- const agent = new Agent({ task: 'Your task', llm });
580
- const history = await agent.run(10); // max 10 steps
581
-
582
- // Check completion status
583
- const lastStep = history.history[history.history.length - 1];
584
- if (lastStep?.result.is_done) {
585
- console.log('Task completed:', lastStep.result.extracted_content);
586
- } else {
587
- console.log('Task incomplete after max steps');
588
- }
589
- } catch (error) {
590
- if (error instanceof AgentError) {
591
- console.error('Agent error:', error.message);
592
- console.error('Failed at step:', error.step);
593
- } else {
594
- console.error('Unexpected error:', error);
332
+ Add the following to your Claude Desktop config file (macOS: `~/Library/Application Support/Claude/claude_desktop_config.json`; Windows: `%APPDATA%\Claude\claude_desktop_config.json`):
333
+
334
+ ```json
335
+ {
336
+ "mcpServers": {
337
+ "browser-use": {
338
+ "command": "npx",
339
+ "args": ["browser-use", "--mcp"],
340
+ "env": {
341
+ "OPENAI_API_KEY": "your-api-key"
342
+ }
343
+ }
595
344
  }
596
345
  }
597
346
  ```
598
347
 
599
- ## Development
600
-
601
- ### Building from Source
602
-
603
- ```bash
604
- git clone https://github.com/webllm/browser-use.git
605
- cd browser-use
606
- yarn install # Automatically installs Playwright browsers
607
- yarn build
608
- ```
609
-
610
- ### Running Tests
611
-
612
- ```bash
613
- # Run all tests
614
- yarn test
615
-
616
- # Run specific test
617
- yarn test test/integration-advanced.test.ts
618
-
619
- # Watch mode
620
- yarn test:watch
621
-
622
- # Validate published package exports
623
- yarn test:pack
624
- ```
625
-
626
- ### Code Quality
348
+ Available MCP tools: `browser_run_task`, `browser_navigate`, `browser_click`, `browser_type`, `browser_scroll`, `browser_get_state`, `browser_extract`, `browser_screenshot`, `browser_close`.
627
349
 
628
- ```bash
629
- # Lint
630
- yarn lint
631
-
632
- # Format
633
- yarn prettier
634
-
635
- # Type check
636
- yarn typecheck
637
- ```
350
+ > See [MCP Server Guide](./docs/MCP_SERVER.md) for more details.
638
351
 
639
- ## Architecture
352
+ ## 🔒 Security
640
353
 
641
- The library follows a modular, layered architecture:
642
-
643
- ```
644
- ┌──────────────────────────────────────────┐
645
- │ Agent (Orchestrator) │
646
- │ - Task execution & planning │
647
- │ - LLM message management │
648
- │ - Step execution loop │
649
- └─────────┬────────────────────────────────┘
650
-
651
- ┌─────────▼────────────────────────────────┐
652
- │ Controller (Actions) │
653
- │ - Action registry & execution │
654
- │ - Built-in actions (30+) │
655
- │ - Custom action support │
656
- └─────────┬────────────────────────────────┘
657
-
658
- ┌─────────▼────────────────────────────────┐
659
- │ BrowserSession (Browser) │
660
- │ - Playwright integration │
661
- │ - Tab & page management │
662
- │ - Navigation & interaction │
663
- └─────────┬────────────────────────────────┘
664
-
665
- ┌─────────▼────────────────────────────────┐
666
- │ DOMService (DOM Analysis) │
667
- │ - Element extraction │
668
- │ - Clickable element detection │
669
- │ - History tree processing │
670
- └──────────────────────────────────────────┘
671
-
672
- Supporting Services:
673
- ┌──────────────────────────────────────────┐
674
- │ - LLM Clients (10+ providers) │
675
- │ - FileSystem (with PDF support) │
676
- │ - Screenshot Service │
677
- │ - Token Tracking & Cost Calculation │
678
- │ - Telemetry (PostHog) │
679
- │ - Observability (LMNR) │
680
- │ - MCP Protocol Support │
681
- │ - Gmail/Sheets Integration │
682
- └──────────────────────────────────────────┘
683
- ```
684
-
685
- ### Key Components
686
-
687
- - **Agent**: High-level orchestrator managing task execution, LLM communication, and step-by-step planning
688
- - **Controller**: Action registry and executor with 30+ built-in actions and custom action support
689
- - **BrowserSession**: Browser lifecycle manager built on Playwright with tab management and state tracking
690
- - **DOMService**: Intelligent DOM analyzer extracting relevant elements for AI consumption
691
- - **MessageManager**: Manages conversation history with token optimization and context window management
692
- - **FileSystem**: File operations with PDF parsing and workspace management
693
- - **ScreenshotService**: Captures and manages screenshots for vision capabilities
694
- - **Registry**: Type-safe action registration system with Zod schema validation
695
-
696
- ## Token Usage & Cost Tracking
697
-
698
- The library automatically tracks token usage and calculates costs:
354
+ - **Sensitive Data Masking**: Credentials are automatically masked in logs and LLM context
355
+ - **Domain Restrictions**: Lock browser navigation to trusted domains
356
+ - **Domain-scoped Secrets**: Credentials are only injected on matching domains
357
+ - **Hard Safety Gate**: `sensitive_data` requires `allowed_domains` by default
358
+ - **Chromium Sandbox**: Enabled by default for production security
699
359
 
700
360
  ```typescript
701
- import { TokenCost } from 'browser-use';
702
-
703
- const agent = new Agent({ task: 'Your task', llm });
704
- const history = await agent.run();
705
-
706
- // Get token statistics
707
- const stats = history.stats();
708
- console.log(
709
- 'Total tokens:',
710
- stats.total_input_tokens + stats.total_output_tokens
711
- );
712
- console.log('Steps:', stats.n_steps);
713
-
714
- // Calculate cost (if pricing data available)
715
- const cost = TokenCost.calculate(history);
716
- console.log('Estimated cost: $', cost.toFixed(4));
717
- ```
718
-
719
- ## Screenshot & History Export
720
-
721
- Generate GIF animations from agent execution history:
722
-
723
- ```typescript
724
- import { create_history_gif } from 'browser-use';
725
-
726
- const history = await agent.run();
727
-
728
- await create_history_gif('My automation task', history, {
729
- output_path: 'agent-history.gif',
730
- duration: 3000, // ms per frame
731
- show_goals: true,
732
- show_task: true,
733
- show_logo: false,
361
+ const agent = new Agent({
362
+ task: 'Login and fetch invoices',
363
+ llm,
364
+ sensitive_data: {
365
+ '*.example.com': {
366
+ username: process.env.SITE_USERNAME!,
367
+ password: process.env.SITE_PASSWORD!,
368
+ },
369
+ },
370
+ browser_session: new BrowserSession({
371
+ browser_profile: new BrowserProfile({
372
+ allowed_domains: ['*.example.com'],
373
+ }),
374
+ }),
734
375
  });
735
-
736
- console.log('Created agent-history.gif');
737
376
  ```
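The masking feature listed above keeps credentials out of logs and LLM context. One simple way to do that, sketched here under the assumption that secrets are plain string values keyed by name, is to replace every occurrence of a secret value with a named placeholder before text is logged or sent to the model. The `<secret>name</secret>` placeholder format and the `maskSecrets` name are assumptions for illustration; the README does not specify the library's actual format.

```typescript
// Hypothetical helper, not a browser-use export.
type Secrets = Record<string, string>; // placeholder name -> secret value

function maskSecrets(text: string, secrets: Secrets): string {
  // Replace longer values first so a secret that contains another secret
  // is masked as a whole rather than partially.
  const entries = Object.entries(secrets)
    .filter(([, value]) => value.length > 0) // an empty value would match everywhere
    .sort(([, a], [, b]) => b.length - a.length);
  let masked = text;
  for (const [name, value] of entries) {
    masked = masked.split(value).join(`<secret>${name}</secret>`);
  }
  return masked;
}

// Usage
const line = 'Typed "hunter2" into #password for user alice@example.com';
console.log(maskSecrets(line, { password: 'hunter2', username: 'alice@example.com' }));
// Typed "<secret>password</secret>" into #password for user <secret>username</secret>
```

`split`/`join` is used instead of a regular expression so that characters such as `.` or `+` inside a password are matched literally.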
738
377
 
739
- ## Observability
378
+ > See [Security Guide](./docs/SECURITY.md) for production deployment best practices.
740
379
 
741
- Built-in observability with LMNR (Laminar) and custom debugging:
380
+ ## 📚 Documentation
742
381
 
743
- ```typescript
744
- import { observe, observe_debug } from 'browser-use';
382
+ | Document | Description |
383
+ | ---------------------------------------- | ------------------------------------ |
384
+ | [Quick Start](./docs/QUICKSTART.md) | Get started in 5 minutes |
385
+ | [Architecture](./docs/ARCHITECTURE.md) | System design and component overview |
386
+ | [API Reference](./docs/API_REFERENCE.md) | Complete API documentation |
387
+ | [Configuration](./docs/CONFIGURATION.md) | All configuration options |
388
+ | [LLM Providers](./docs/LLM_PROVIDERS.md) | Provider setup and comparison |
389
+ | [Actions](./docs/ACTIONS.md) | Built-in and custom actions |
390
+ | [MCP Server](./docs/MCP_SERVER.md) | MCP integration guide |
391
+ | [Security](./docs/SECURITY.md) | Security best practices |
392
+ | [Examples](./docs/EXAMPLES.md) | More code examples |
393
+ | [Contributing](./docs/CONTRIBUTING.md) | Contribution guidelines |
745
394
 
746
- // Automatic tracing (if LMNR_API_KEY set)
747
- // All agent operations are automatically traced
395
+ ## 🛠️ Development
748
396
 
749
- // Custom debug observations
750
- @observe_debug({ name: 'my_custom_operation' })
751
- async function myFunction() {
752
- // Function execution is logged and timed
753
- }
754
- ```
755
-
756
- ## Contributing
757
-
758
- Contributions are welcome! Please feel free to submit a Pull Request.
759
-
760
- 1. Fork the repository
761
- 2. Create your feature branch (`git checkout -b feature/amazing-feature`)
762
- 3. Commit your changes (`git commit -m 'feat: add amazing feature'`)
763
- 4. Push to the branch (`git push origin feature/amazing-feature`)
764
- 5. Open a Pull Request
765
-
766
- ## Support
767
-
768
- - 📚 [Documentation](https://github.com/webllm/browser-use)
769
- - 🐛 [Issue Tracker](https://github.com/webllm/browser-use/issues)
770
- - 💬 [Discussions](https://github.com/webllm/browser-use/discussions)
771
-
772
- ## Acknowledgments
773
-
774
- ### Original Project
775
-
776
- This TypeScript implementation would not exist without the groundbreaking work of the original **[browser-use](https://github.com/browser-use/browser-use)** Python library:
777
-
778
- - 🎯 **Original Project**: [browser-use/browser-use](https://github.com/browser-use/browser-use) (Python)
779
- - 👏 **Created by**: The browser-use team and contributors
780
- - 💡 **Inspiration**: All architectural decisions, agent design patterns, and innovative approaches come from the original Python implementation
781
-
782
- We are deeply grateful to the original authors for creating such an elegant and powerful solution for AI-driven browser automation. This TypeScript port aims to faithfully replicate their excellent work for the JavaScript/TypeScript community.
783
-
784
- ### Key Differences from Python Version
785
-
786
- While we strive to maintain feature parity with the Python version, there are some differences due to platform constraints:
787
-
788
- - **Runtime**: Node.js/Deno/Bun instead of Python
789
- - **Type System**: TypeScript's structural typing vs Python's duck typing
790
- - **Async Model**: JavaScript Promises vs Python async/await (similar but different)
791
- - **Ecosystem**: npm packages vs PyPI packages
397
+ ```bash
398
+ # Install dependencies
399
+ pnpm install
792
400
 
793
- ### Technology Stack
401
+ # Build
402
+ pnpm build
794
403
 
795
- This project is built with:
404
+ # Run tests
405
+ pnpm test
796
406
 
797
- - [Playwright](https://playwright.dev/) - Browser automation framework
798
- - [Zod](https://zod.dev/) - TypeScript-first schema validation
799
- - [OpenAI](https://openai.com/), [Anthropic](https://anthropic.com/), [Google](https://ai.google.dev/) - LLM providers
800
- - And many other excellent open-source libraries
407
+ # Lint & format
408
+ pnpm lint
409
+ pnpm prettier
801
410
 
802
- ### Community
411
+ # Type checking
412
+ pnpm typecheck
803
413
 
804
- - 🌟 **Star the original Python project**: [browser-use/browser-use](https://github.com/browser-use/browser-use)
805
- - 🌟 **Star this TypeScript port**: [webllm/browser-use](https://github.com/webllm/browser-use)
806
- - 💬 **Join the community**: Share your use cases and contribute to both projects!
414
+ # Run an example
415
+ pnpm exec tsx examples/simple-search.ts
416
+ ```
807
417
 
808
- ## Related Projects
418
+ ## Requirements
809
419
 
810
- - 🐍 [browser-use (Python)](https://github.com/browser-use/browser-use) - The original and official implementation
811
- - 🎭 [Playwright](https://playwright.dev/) - The browser automation foundation
812
- - 🤖 [LangChain](https://www.langchain.com/) - LLM application framework
813
- - 🦜 [Laminar](https://laminar.run/) - LLM observability platform
420
+ - **Node.js**: >= 18.0.0
421
+ - **LLM API Key**: At least one supported provider
422
+ - **Playwright**: Installed automatically as a dependency
+ - **Playwright** Installed automatically as a dependency
814
423
 
815
- ## License
424
+ ## 📄 License
816
425
 
817
- MIT License - see [LICENSE](LICENSE) for details.
426
+ [MIT](./LICENSE) © Web LLM