browser-use 0.2.0 → 0.4.0
- package/README.md +295 -686
- package/dist/actor/element.d.ts +19 -0
- package/dist/actor/element.js +46 -0
- package/dist/actor/index.d.ts +4 -0
- package/dist/actor/index.js +4 -0
- package/dist/actor/mouse.d.ts +19 -0
- package/dist/actor/mouse.js +39 -0
- package/dist/actor/page.d.ts +29 -0
- package/dist/actor/page.js +88 -0
- package/dist/actor/utils.d.ts +4 -0
- package/dist/actor/utils.js +35 -0
- package/dist/agent/cloud-events.d.ts +18 -0
- package/dist/agent/cloud-events.js +65 -2
- package/dist/agent/gif.d.ts +1 -0
- package/dist/agent/gif.js +24 -2
- package/dist/agent/judge.d.ts +17 -0
- package/dist/agent/judge.js +197 -0
- package/dist/agent/message-manager/service.d.ts +12 -4
- package/dist/agent/message-manager/service.js +205 -39
- package/dist/agent/message-manager/utils.js +0 -1
- package/dist/agent/message-manager/views.d.ts +4 -0
- package/dist/agent/message-manager/views.js +11 -7
- package/dist/agent/prompts.d.ts +24 -3
- package/dist/agent/prompts.js +274 -59
- package/dist/agent/service.d.ts +103 -41
- package/dist/agent/service.js +2336 -472
- package/dist/agent/variable-detector.d.ts +12 -0
- package/dist/agent/variable-detector.js +211 -0
- package/dist/agent/views.d.ts +237 -18
- package/dist/agent/views.js +446 -33
- package/dist/browser/cloud/cloud.d.ts +20 -0
- package/dist/browser/cloud/cloud.js +129 -0
- package/dist/browser/cloud/index.d.ts +2 -0
- package/dist/browser/cloud/index.js +2 -0
- package/dist/browser/cloud/views.d.ts +41 -0
- package/dist/browser/cloud/views.js +35 -0
- package/dist/browser/events.d.ts +345 -0
- package/dist/browser/events.js +566 -0
- package/dist/browser/extensions.js +17 -17
- package/dist/browser/index.d.ts +4 -0
- package/dist/browser/index.js +4 -0
- package/dist/browser/profile.d.ts +10 -4
- package/dist/browser/profile.js +79 -12
- package/dist/browser/session-manager.d.ts +85 -0
- package/dist/browser/session-manager.js +208 -0
- package/dist/browser/session.d.ts +105 -9
- package/dist/browser/session.js +1166 -95
- package/dist/browser/types.d.ts +153 -156
- package/dist/browser/views.d.ts +39 -0
- package/dist/browser/views.js +32 -0
- package/dist/browser/watchdogs/aboutblank-watchdog.d.ts +12 -0
- package/dist/browser/watchdogs/aboutblank-watchdog.js +131 -0
- package/dist/browser/watchdogs/base.d.ts +21 -0
- package/dist/browser/watchdogs/base.js +81 -0
- package/dist/browser/watchdogs/cdp-session-watchdog.d.ts +14 -0
- package/dist/browser/watchdogs/cdp-session-watchdog.js +177 -0
- package/dist/browser/watchdogs/crash-watchdog.d.ts +38 -0
- package/dist/browser/watchdogs/crash-watchdog.js +296 -0
- package/dist/browser/watchdogs/default-action-watchdog.d.ts +49 -0
- package/dist/browser/watchdogs/default-action-watchdog.js +212 -0
- package/dist/browser/watchdogs/dom-watchdog.d.ts +8 -0
- package/dist/browser/watchdogs/dom-watchdog.js +31 -0
- package/dist/browser/watchdogs/downloads-watchdog.d.ts +77 -0
- package/dist/browser/watchdogs/downloads-watchdog.js +409 -0
- package/dist/browser/watchdogs/har-recording-watchdog.d.ts +19 -0
- package/dist/browser/watchdogs/har-recording-watchdog.js +317 -0
- package/dist/browser/watchdogs/index.d.ts +15 -0
- package/dist/browser/watchdogs/index.js +15 -0
- package/dist/browser/watchdogs/local-browser-watchdog.d.ts +10 -0
- package/dist/browser/watchdogs/local-browser-watchdog.js +32 -0
- package/dist/browser/watchdogs/permissions-watchdog.d.ts +8 -0
- package/dist/browser/watchdogs/permissions-watchdog.js +73 -0
- package/dist/browser/watchdogs/popups-watchdog.d.ts +13 -0
- package/dist/browser/watchdogs/popups-watchdog.js +77 -0
- package/dist/browser/watchdogs/recording-watchdog.d.ts +27 -0
- package/dist/browser/watchdogs/recording-watchdog.js +249 -0
- package/dist/browser/watchdogs/screenshot-watchdog.d.ts +6 -0
- package/dist/browser/watchdogs/screenshot-watchdog.js +13 -0
- package/dist/browser/watchdogs/security-watchdog.d.ts +10 -0
- package/dist/browser/watchdogs/security-watchdog.js +84 -0
- package/dist/browser/watchdogs/storage-state-watchdog.d.ts +24 -0
- package/dist/browser/watchdogs/storage-state-watchdog.js +288 -0
- package/dist/cli.d.ts +7 -2
- package/dist/cli.js +182 -25
- package/dist/code-use/formatting.d.ts +3 -0
- package/dist/code-use/formatting.js +18 -0
- package/dist/code-use/index.d.ts +6 -0
- package/dist/code-use/index.js +6 -0
- package/dist/code-use/namespace.d.ts +5 -0
- package/dist/code-use/namespace.js +81 -0
- package/dist/code-use/notebook-export.d.ts +3 -0
- package/dist/code-use/notebook-export.js +56 -0
- package/dist/code-use/service.d.ts +24 -0
- package/dist/code-use/service.js +104 -0
- package/dist/code-use/utils.d.ts +4 -0
- package/dist/code-use/utils.js +98 -0
- package/dist/code-use/views.d.ts +108 -0
- package/dist/code-use/views.js +165 -0
- package/dist/config.d.ts +15 -0
- package/dist/config.js +109 -7
- package/dist/controller/registry/service.d.ts +10 -1
- package/dist/controller/registry/service.js +266 -10
- package/dist/controller/registry/views.d.ts +4 -1
- package/dist/controller/registry/views.js +25 -2
- package/dist/controller/service.d.ts +10 -1
- package/dist/controller/service.js +1814 -268
- package/dist/controller/views.d.ts +78 -155
- package/dist/controller/views.js +61 -12
- package/dist/dom/history-tree-processor/service.d.ts +5 -0
- package/dist/dom/history-tree-processor/service.js +169 -14
- package/dist/dom/history-tree-processor/view.d.ts +7 -1
- package/dist/dom/history-tree-processor/view.js +10 -1
- package/dist/dom/markdown-extractor.d.ts +37 -0
- package/dist/dom/markdown-extractor.js +345 -0
- package/dist/dom/service.d.ts +3 -1
- package/dist/dom/service.js +76 -0
- package/dist/dom/views.d.ts +1 -0
- package/dist/dom/views.js +45 -0
- package/dist/event-bus.d.ts +107 -7
- package/dist/event-bus.js +313 -10
- package/dist/exceptions.d.ts +0 -3
- package/dist/exceptions.js +0 -7
- package/dist/filesystem/file-system.d.ts +18 -0
- package/dist/filesystem/file-system.js +503 -42
- package/dist/index.d.ts +7 -0
- package/dist/index.js +6 -0
- package/dist/integrations/gmail/actions.d.ts +3 -3
- package/dist/integrations/gmail/actions.js +4 -4
- package/dist/llm/anthropic/chat.d.ts +18 -1
- package/dist/llm/anthropic/chat.js +123 -55
- package/dist/llm/anthropic/serializer.d.ts +2 -0
- package/dist/llm/anthropic/serializer.js +81 -9
- package/dist/llm/aws/chat-anthropic.d.ts +17 -0
- package/dist/llm/aws/chat-anthropic.js +126 -26
- package/dist/llm/aws/chat-bedrock.d.ts +28 -1
- package/dist/llm/aws/chat-bedrock.js +161 -34
- package/dist/llm/aws/serializer.d.ts +13 -1
- package/dist/llm/aws/serializer.js +56 -17
- package/dist/llm/azure/chat.d.ts +53 -2
- package/dist/llm/azure/chat.js +366 -54
- package/dist/llm/base.d.ts +2 -0
- package/dist/llm/browser-use/chat.d.ts +40 -0
- package/dist/llm/browser-use/chat.js +305 -0
- package/dist/llm/browser-use/index.d.ts +1 -0
- package/dist/llm/browser-use/index.js +1 -0
- package/dist/llm/cerebras/chat.d.ts +39 -0
- package/dist/llm/cerebras/chat.js +178 -0
- package/dist/llm/cerebras/index.d.ts +2 -0
- package/dist/llm/cerebras/index.js +2 -0
- package/dist/llm/cerebras/serializer.d.ts +7 -0
- package/dist/llm/cerebras/serializer.js +82 -0
- package/dist/llm/deepseek/chat.d.ts +19 -2
- package/dist/llm/deepseek/chat.js +138 -25
- package/dist/llm/google/chat.d.ts +46 -2
- package/dist/llm/google/chat.js +267 -64
- package/dist/llm/google/serializer.d.ts +9 -1
- package/dist/llm/google/serializer.js +141 -34
- package/dist/llm/groq/chat.d.ts +21 -2
- package/dist/llm/groq/chat.js +125 -26
- package/dist/llm/groq/parser.js +3 -1
- package/dist/llm/mistral/chat.d.ts +43 -0
- package/dist/llm/mistral/chat.js +154 -0
- package/dist/llm/mistral/index.d.ts +2 -0
- package/dist/llm/mistral/index.js +2 -0
- package/dist/llm/mistral/schema.d.ts +8 -0
- package/dist/llm/mistral/schema.js +27 -0
- package/dist/llm/models.d.ts +2 -0
- package/dist/llm/models.js +317 -0
- package/dist/llm/ollama/chat.d.ts +13 -1
- package/dist/llm/ollama/chat.js +110 -19
- package/dist/llm/ollama/serializer.d.ts +1 -0
- package/dist/llm/ollama/serializer.js +34 -12
- package/dist/llm/openai/chat.d.ts +16 -0
- package/dist/llm/openai/chat.js +94 -44
- package/dist/llm/openai/like.d.ts +5 -3
- package/dist/llm/openai/like.js +7 -3
- package/dist/llm/openai/responses-serializer.d.ts +18 -0
- package/dist/llm/openai/responses-serializer.js +72 -0
- package/dist/llm/openrouter/chat.d.ts +28 -2
- package/dist/llm/openrouter/chat.js +115 -29
- package/dist/llm/schema.d.ts +11 -1
- package/dist/llm/schema.js +109 -4
- package/dist/llm/vercel/chat.d.ts +50 -0
- package/dist/llm/vercel/chat.js +276 -0
- package/dist/llm/vercel/index.d.ts +1 -0
- package/dist/llm/vercel/index.js +1 -0
- package/dist/llm/vercel/serializer.d.ts +5 -0
- package/dist/llm/vercel/serializer.js +7 -0
- package/dist/llm/views.d.ts +2 -1
- package/dist/llm/views.js +3 -1
- package/dist/logging-config.d.ts +2 -0
- package/dist/logging-config.js +82 -29
- package/dist/mcp/client.d.ts +10 -5
- package/dist/mcp/client.js +14 -9
- package/dist/mcp/controller.d.ts +42 -3
- package/dist/mcp/controller.js +56 -31
- package/dist/mcp/server.d.ts +15 -0
- package/dist/mcp/server.js +261 -52
- package/dist/observability.js +10 -4
- package/dist/sandbox/index.d.ts +2 -0
- package/dist/sandbox/index.js +2 -0
- package/dist/sandbox/sandbox.d.ts +19 -0
- package/dist/sandbox/sandbox.js +140 -0
- package/dist/sandbox/views.d.ts +67 -0
- package/dist/sandbox/views.js +121 -0
- package/dist/skill-cli/index.d.ts +3 -0
- package/dist/skill-cli/index.js +3 -0
- package/dist/skill-cli/protocol.d.ts +30 -0
- package/dist/skill-cli/protocol.js +48 -0
- package/dist/skill-cli/server.d.ts +11 -0
- package/dist/skill-cli/server.js +85 -0
- package/dist/skill-cli/sessions.d.ts +24 -0
- package/dist/skill-cli/sessions.js +47 -0
- package/dist/skills/index.d.ts +3 -0
- package/dist/skills/index.js +3 -0
- package/dist/skills/service.d.ts +27 -0
- package/dist/skills/service.js +266 -0
- package/dist/skills/utils.d.ts +6 -0
- package/dist/skills/utils.js +53 -0
- package/dist/skills/views.d.ts +40 -0
- package/dist/skills/views.js +10 -0
- package/dist/sync/auth.js +8 -3
- package/dist/sync/service.d.ts +6 -6
- package/dist/sync/service.js +54 -89
- package/dist/telemetry/views.d.ts +20 -6
- package/dist/telemetry/views.js +23 -5
- package/dist/tokens/custom-pricing.d.ts +2 -0
- package/dist/tokens/custom-pricing.js +22 -0
- package/dist/tokens/index.d.ts +2 -0
- package/dist/tokens/index.js +2 -0
- package/dist/tokens/mappings.d.ts +1 -0
- package/dist/tokens/mappings.js +3 -0
- package/dist/tokens/service.js +27 -8
- package/dist/tools/extraction/index.d.ts +2 -0
- package/dist/tools/extraction/index.js +2 -0
- package/dist/tools/extraction/schema-utils.d.ts +6 -0
- package/dist/tools/extraction/schema-utils.js +237 -0
- package/dist/tools/extraction/views.d.ts +7 -0
- package/dist/tools/index.d.ts +5 -0
- package/dist/tools/index.js +5 -0
- package/dist/tools/registry/index.d.ts +2 -0
- package/dist/tools/registry/index.js +2 -0
- package/dist/tools/registry/service.d.ts +1 -0
- package/dist/tools/registry/service.js +1 -0
- package/dist/tools/registry/views.d.ts +1 -0
- package/dist/tools/registry/views.js +1 -0
- package/dist/tools/service.d.ts +2 -0
- package/dist/tools/service.js +1 -0
- package/dist/tools/utils.d.ts +2 -0
- package/dist/tools/utils.js +57 -0
- package/dist/tools/views.d.ts +1 -0
- package/dist/tools/views.js +1 -0
- package/dist/utils.d.ts +10 -1
- package/dist/utils.js +70 -3
- package/package.json +116 -49
- package/dist/dom/playground/process-dom.js +0 -5
- package/dist/dom/playground/test-accessibility.d.ts +0 -44
- package/dist/dom/playground/test-accessibility.js +0 -111
- /package/dist/{dom/playground/process-dom.d.ts → tools/extraction/views.js} +0 -0
package/README.md CHANGED
@@ -1,228 +1,213 @@
- |
- |
- |
- |
- |
- |
- |
- | >
- | >
- |
- |
- |
- |
- |
- |
- |
- |
- |
- |
- |
- |
- |
- |
- |
- |
- |
- |
- |
- |
- |
- |
- |
- |
- |
- |
- |
- |
- | ### Commitment to the Original
- |
- | We are committed to:
- |
- | - ✅ Maintaining feature parity with the Python version whenever possible
- | - 🔄 Keeping up with upstream updates and improvements
- | - 🐛 Reporting bugs found in this port back to the original project when applicable
- | - 📚 Directing users to the original project's documentation for core concepts
- | - 🤝 Collaborating with the original authors and respecting their vision
- |
- | This is **not** a fork or competing project—it's a respectful port to serve a different programming language community.
- |
- | ### Upstream Parity Status
- |
- | This Node.js/TypeScript implementation is currently **strictly aligned** with the Python `browser-use` release
- | [`v0.5.11`](https://github.com/browser-use/browser-use/releases/tag/0.5.11), published on **August 10, 2025**.
- |
- | - 📦 Core features and behavior are aligned against that upstream tag baseline.
- | - ✅ Our test strategy is maintained to be as equivalent as practical to the Python coverage and behavior checks.
- | - 🔄 We expect to move this parity baseline forward to the Python **January 2026** release line very soon.
- |
- | ## Features
- |
- | - 🤖 **AI-Powered**: Built specifically for LLM-driven web automation with structured output support
- | - 🎯 **Type-Safe**: Full TypeScript support with comprehensive type definitions
- | - 🌐 **Multi-Browser**: Support for Chromium, Firefox, and WebKit via Playwright
- | - 🔌 **10+ LLM Providers**: OpenAI, Anthropic, Google, AWS, Azure, DeepSeek, Groq, Ollama, OpenRouter, and more
- | - 👁️ **Vision Support**: Multimodal capabilities with screenshot analysis
- | - 🛡️ **Robust**: Built-in error handling, recovery, graceful shutdown, and retry mechanisms
- | - 📊 **Observable**: Comprehensive logging, execution history, and telemetry
- | - 🔧 **Extensible**: Custom actions, MCP protocol, and plugin system
- | - 📁 **FileSystem**: Built-in file operations with PDF parsing
- | - 🔗 **Integrations**: Gmail API, Google Sheets, and MCP servers
- |
- | ## Quick Start
+ | <p align="center">
+ | <h1 align="center">🌐 Browser-Use</h1>
+ | <p align="center">
+ | <strong>Make websites accessible for AI agents — in TypeScript</strong>
+ | </p>
+ | <p align="center">
+ | A TypeScript-first library for building AI-powered web agents that can autonomously browse, interact with, and extract data from the web using LLMs and Playwright.
+ | </p>
+ | </p>
+ |
+ | <p align="center">
+ | <a href="https://github.com/webllm/browser-use/workflows/Node%20CI"><img src="https://github.com/webllm/browser-use/workflows/Node%20CI/badge.svg" alt="Node CI"></a>
+ | <a href="https://www.npmjs.com/package/browser-use"><img src="https://img.shields.io/npm/v/browser-use.svg" alt="npm"></a>
+ | <a href="https://www.npmjs.com/package/browser-use"><img src="https://img.shields.io/npm/dm/browser-use.svg" alt="npm downloads"></a>
+ | <img src="https://img.shields.io/npm/l/browser-use" alt="license">
+ | <img src="https://img.shields.io/badge/TypeScript-first-blue" alt="TypeScript">
+ | </p>
+ |
+ | ---
+ |
+ | > **TypeScript port** of the popular Python [browser-use](https://github.com/browser-use/browser-use) library — with a native Node.js experience, full type safety, and first-class support for all major LLM providers.
+ |
+ | ## ✨ Features
+ |
+ | - 🤖 **Autonomous Browser Control** — AI-driven navigation, clicking, typing, form filling, scrolling, and tab management
+ | - 🧠 **10+ LLM Providers** — OpenAI, Anthropic, Google Gemini, Azure, AWS Bedrock, Groq, Ollama, DeepSeek, OpenRouter, Mistral, Cerebras, and custom providers
+ | - 👁️ **Vision Support** — Screenshot-based understanding for visual web interactions
+ | - 🔧 **45+ Built-in Actions** — Navigation, element interaction, scrolling, forms, tabs, content extraction, file I/O, and more
+ | - 🧩 **Custom Actions** — Extensible registry with Zod schema validation, domain restrictions, and page filters
+ | - 🔌 **MCP Server** — Model Context Protocol support for Claude Desktop and MCP-compatible clients
+ | - ⌨️ **CLI Tool** — Interactive and one-shot modes for quick browser tasks
+ | - 🔒 **Security First** — Sensitive data masking, domain restrictions, and Chromium sandboxing
+ | - 📊 **Observability** — Event system, telemetry, performance tracing, and session recording (GIF)
+ | - 🐳 **Docker Ready** — Configurable for containerized and CI/CD environments
+ |
+ | ## 🚀 Quick Start
  |
  | ### Installation
  |
  | ```bash
  | npm install browser-use
- | #
- | yarn add browser-use
- | # or
- | pnpm add browser-use
+ | # Playwright browsers are installed automatically via postinstall
  | ```
  |
- |
- |
- | Use only documented public entrypoints such as `browser-use` and
- | `browser-use/llm/openai`. Avoid deep imports like `browser-use/dist/...`.
- |
- | ### Basic Usage with Agent
- |
- | ```typescript
- | import { Agent } from 'browser-use';
- | import { ChatOpenAI } from 'browser-use/llm/openai';
- |
- | async function main() {
- |   const llm = new ChatOpenAI({
- |     model: 'gpt-4',
- |     apiKey: process.env.OPENAI_API_KEY,
- |   });
- |
- |   const agent = new Agent({
- |     task: 'Go to google.com and search for "TypeScript browser automation"',
- |     llm,
- |   });
- |
- |   const history = await agent.run();
- |
- |   console.log(`Task completed in ${history.history.length} steps`);
- |
- |   // Access the browser session
- |   const browserSession = agent.browser_session;
- |   const currentPage = await browserSession.get_current_page();
- |   console.log('Final URL:', currentPage?.url());
- | }
+ | ### Set Up Your API Key
  |
- |
+ | ```bash
+ | export OPENAI_API_KEY=sk-your-api-key
+ | # or ANTHROPIC_API_KEY, GOOGLE_API_KEY, etc.
  | ```
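The new README says any one of several provider keys can be exported. As a minimal sketch of failing fast when none is set, the hypothetical helper below reports which provider key is present. The `pickProvider` name and the precedence order are assumptions for illustration and are not part of browser-use; only the three environment variable names come from the README.

```typescript
// Hypothetical helper, not part of browser-use.
type Provider = 'openai' | 'anthropic' | 'google';

// Variable names are from the README; the precedence order is an assumption.
const KEY_VARS: Array<[Provider, string]> = [
  ['openai', 'OPENAI_API_KEY'],
  ['anthropic', 'ANTHROPIC_API_KEY'],
  ['google', 'GOOGLE_API_KEY'],
];

// Return the first provider whose key is set to a non-blank value.
function pickProvider(env: Record<string, string | undefined>): Provider {
  for (const [provider, name] of KEY_VARS) {
    const value = env[name];
    if (value !== undefined && value.trim() !== '') {
      return provider;
    }
  }
  // No key found: fail before any browser or LLM work starts.
  throw new Error(`Set one of: ${KEY_VARS.map(([, name]) => name).join(', ')}`);
}
```

In a script, `pickProvider(process.env)` could run before constructing `ChatOpenAI` or another chat class, so a missing key surfaces as a clear startup error.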
  |
- | ###
- |
- | Use `Controller` to register domain-specific actions, then pass it into `Agent`:
+ | ### Run Your First Agent
  |
  | ```typescript
- | import { Agent
+ | import { Agent } from 'browser-use';
  | import { ChatOpenAI } from 'browser-use/llm/openai';
- | import { z } from 'zod';
- |
- | const controller = new Controller();
- |
- | controller.registry.action('Extract product info from the current page', {
- |   param_model: z.object({
- |     include_price: z.boolean().default(true),
- |     include_reviews: z.boolean().default(false),
- |   }),
- | })(async function extract_product_info(params, { page }) {
- |   const productData = await page.evaluate(() => ({
- |     title: document.querySelector('h1')?.textContent ?? null,
- |     price: document.querySelector('.price')?.textContent ?? null,
- |   }));
- |
- |   return new ActionResult({
- |     extracted_content: JSON.stringify({ ...productData, ...params }),
- |     include_in_memory: true,
- |   });
- | });
  |
  | const agent = new Agent({
- |   task: '
+ |   task: 'Go to google.com and search for "TypeScript tutorials"',
  |   llm: new ChatOpenAI({
  |     model: 'gpt-4o',
  |     apiKey: process.env.OPENAI_API_KEY,
  |   }),
- |   controller,
  | });
  |
- | const history = await agent.run(
- | console.log(history.final_result());
+ | const history = await agent.run();
+ | console.log('Result:', history.final_result());
+ | console.log('Success:', history.is_successful());
  | ```
  |
- |
+ | ```bash
+ | npx tsx example.ts
+ | ```
+ |
+ | ### Use the CLI
  |
  | ```bash
- | # Interactive mode
+ | # Interactive mode
  | npx browser-use
  |
  | # One-shot task
- | npx browser-use
- |
- | # Positional task mode
- | npx browser-use "Search for TypeScript browser automation"
- |
- | # Pick model/provider by model name
- | npx browser-use --model claude-sonnet-4-20250514 -p "Summarize latest AI news"
- |
- | # Pick provider explicitly (uses provider default model)
- | npx browser-use --provider anthropic -p "Summarize latest AI news"
+ | npx browser-use "Go to example.com and extract the page title"
  |
- | #
- | npx browser-use --
+ | # With specific model
+ | npx browser-use --model claude-sonnet-4-20250514 -p "Search for AI news"
  |
- | #
- | npx browser-use --
- |
- | # Connect to existing Chromium via CDP
- | npx browser-use --cdp-url http://localhost:9222 -p "Inspect the active tab"
+ | # Headless mode
+ | npx browser-use --headless -p "Check the weather"
  |
  | # MCP server mode
  | npx browser-use --mcp
  | ```
  |
- |
+ | ## 🏗️ Architecture
+ |
+ | ```
+ | ┌─────────────────────────────────────────────────────┐
+ | │ Browser-Use │
+ | ├─────────────────────────────────────────────────────┤
+ | │ Agent ← MessageManager ← LLM Providers │
+ | │ ↓ │
+ | │ Controller → Action Registry → BrowserSession │
+ | │ ↓ │
+ | │ DomService │
+ | └─────────────────────────────────────────────────────┘
+ | ```
+ |
+ | | Component | Description |
+ | | ------------------ | ---------------------------------------------------------------------- |
+ | | **Agent** | Central orchestrator — runs the observe → think → act loop |
+ | | **Controller** | Manages action registration and execution via Registry |
+ | | **BrowserSession** | Playwright wrapper — browser lifecycle, tab management, screenshots |
+ | | **DomService** | Extracts interactive elements with indexed mapping for LLM consumption |
+ | | **MessageManager** | Manages LLM conversation history with token optimization |
+ | | **LLM Providers** | Unified `BaseChatModel` interface across 10+ providers |
+ |
+ | ### How It Works
+ |
+ | 1. **Agent** receives a natural language task
+ | 2. **DomService** extracts the current page state (interactive elements + optional screenshot)
+ | 3. **LLM** analyzes the state and returns actions to take
+ | 4. **Controller** validates and executes actions through the **Registry**
+ | 5. Results feed back to the LLM for the next step
+ | 6. Loop continues until `done` action or `max_steps`
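The six steps in "How It Works" can be sketched as a small loop. This is an illustration under stated assumptions, not the package's implementation: `PageState`, `Action`, `LoopDeps`, and `runLoop` are hypothetical names, the loop is synchronous for brevity (a real agent would await the browser and the model), and only the `done` action and the step limit are taken from the README.

```typescript
// Illustrative types only; these are NOT the browser-use interfaces.
interface PageState {
  url: string;
  elements: string[];
}

type Action =
  | { name: 'click'; index: number }
  | { name: 'done'; result: string };

interface LoopDeps {
  observe: () => PageState; // step 2: extract the current page state
  think: (task: string, state: PageState, lastResult: string | null) => Action; // step 3
  act: (action: Action) => string; // step 4: validate and execute the action
}

// Runs observe -> think -> act until a `done` action or the step limit.
function runLoop(
  task: string,
  deps: LoopDeps,
  maxSteps: number,
): { result: string | null; steps: number } {
  let lastResult: string | null = null;
  for (let step = 1; step <= maxSteps; step++) {
    const state = deps.observe();
    const action = deps.think(task, state, lastResult);
    if (action.name === 'done') {
      return { result: action.result, steps: step };
    }
    lastResult = deps.act(action); // step 5: the result feeds the next decision
  }
  return { result: null, steps: maxSteps }; // step 6: limit reached without `done`
}
```

Passing the three dependencies in as functions keeps the loop testable with stubs, which is one reason an orchestrator might be separated from the browser and model layers as the architecture table describes.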
+ |
+ | ## 🔌 LLM Providers
+ |
+ | | Provider | Import | Vision | Notes |
+ | | ----------------- | ---------------------------- | ------ | --------------------------------------------- |
+ | | **OpenAI** | `browser-use/llm/openai` | ✅ | Default provider, reasoning models (o1/o3/o4) |
+ | | **Anthropic** | `browser-use/llm/anthropic` | ✅ | Prompt caching support |
+ | | **Google Gemini** | `browser-use/llm/google` | ✅ | Extended thinking support |
+ | | **Azure OpenAI** | `browser-use/llm/azure` | ✅ | Enterprise deployment |
+ | | **AWS Bedrock** | `browser-use/llm/aws` | ✅ | Claude via AWS |
+ | | **Groq** | `browser-use/llm/groq` | ❌ | Fastest inference |
+ | | **Ollama** | `browser-use/llm/ollama` | ❌ | Local/self-hosted models |
+ | | **DeepSeek** | `browser-use/llm/deepseek` | ❌ | Cost-effective |
+ | | **OpenRouter** | `browser-use/llm/openrouter` | Varies | Multi-model routing |
+ | | **Mistral** | `browser-use/llm/mistral` | Varies | Mistral models |
+ | | **Cerebras** | `browser-use/llm/cerebras` | ❌ | Fast inference |
+ |
+ | <details>
+ | <summary>Provider examples</summary>
  |
- |
- |
+ | ```typescript
+ | // OpenAI
+ | import { ChatOpenAI } from 'browser-use/llm/openai';
+ | const llm = new ChatOpenAI({
+ |   model: 'gpt-4o',
+ |   apiKey: process.env.OPENAI_API_KEY,
+ | });
  |
- |
+ | // Anthropic
+ | import { ChatAnthropic } from 'browser-use/llm/anthropic';
+ | const llm = new ChatAnthropic({
+ |   model: 'claude-sonnet-4-20250514',
+ |   apiKey: process.env.ANTHROPIC_API_KEY,
+ | });
  |
- |
- |
+ | // Google Gemini
+ | import { ChatGoogle } from 'browser-use/llm/google';
+ | const llm = new ChatGoogle('gemini-2.5-flash');
  |
- |
+ | // Ollama (local)
+ | import { ChatOllama } from 'browser-use/llm/ollama';
+ | const llm = new ChatOllama('llama3', 'http://localhost:11434');
  |
- |
+ | // OpenAI Reasoning Models
+ | const llm = new ChatOpenAI({ model: 'o3-mini', reasoningEffort: 'medium' });
+ | ```
  |
- |
+ | </details>
  |
- |
- | import { Agent } from 'browser-use';
- | import { ChatGoogle } from 'browser-use/llm/google';
+ | ## 🎯 Code Examples
  |
- |
+ | ### Data Extraction
  |
+ | ```typescript
  | const agent = new Agent({
- |   task:
+ |   task: `Go to amazon.com, search for "wireless keyboard",
+ |     extract the name, price, and rating of the first 5 products as JSON`,
  |   llm,
  |   use_vision: true,
- |   vision_detail_level: 'high', // 'auto' | 'low' | 'high'
  | });
  |
- | const history = await agent.run(
+ | const history = await agent.run(30);
+ | console.log(history.final_result());
  | ```
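The Data Extraction example asks the agent to answer "as JSON" and then prints `history.final_result()`. A minimal sketch of turning such a result into typed records follows. It assumes the result is a string (or `null`) that may wrap a JSON array in surrounding prose; the `Product` shape and the `parseProducts` helper are hypothetical and not part of browser-use.

```typescript
// Hypothetical record shape matching the fields requested in the task prompt.
interface Product {
  name: string;
  price: string;
  rating: string;
}

// Assumption: the final result is text that may contain a JSON array somewhere inside it.
function parseProducts(raw: string | null): Product[] {
  if (!raw) return [];
  const start = raw.indexOf('[');
  const end = raw.lastIndexOf(']');
  if (start === -1 || end <= start) return [];

  let parsed: unknown;
  try {
    parsed = JSON.parse(raw.slice(start, end + 1));
  } catch (_err) {
    return []; // model output was not valid JSON
  }
  if (!Array.isArray(parsed)) return [];

  // Keep only entries that carry all three string fields.
  return parsed.filter(
    (item): item is Product =>
      typeof item === 'object' &&
      item !== null &&
      typeof (item as Product).name === 'string' &&
      typeof (item as Product).price === 'string' &&
      typeof (item as Product).rating === 'string',
  );
}
```

Model output is not guaranteed to be well-formed, so returning an empty list on any parse failure keeps downstream code simple; a stricter caller might prefer to throw instead.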
  |
- | ###
+ | ### Form Filling with Sensitive Data
+ |
+ | ```typescript
+ | const agent = new Agent({
+ |   task: 'Login to the dashboard',
+ |   llm,
+ |   sensitive_data: {
+ |     '*.example.com': {
+ |       username: process.env.SITE_USERNAME!,
+ |       password: process.env.SITE_PASSWORD!,
+ |     },
+ |   },
+ |   browser_session: new BrowserSession({
+ |     browser_profile: new BrowserProfile({
+ |       allowed_domains: ['*.example.com'],
+ |     }),
+ |   }),
+ | });
+ | ```
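The example above scopes credentials to `*.example.com` and restricts navigation to the same pattern. The sketch below shows one way such wildcard scoping and masking could work. Everything in it is an assumption for illustration: the matching rule (a `*.` prefix matches the bare domain and any subdomain), the `<secret>` placeholder format, and the helper names are not confirmed by this diff, and the package's actual behavior may differ.

```typescript
// Assumption: '*.example.com' matches example.com and any subdomain of it.
function matchesDomain(pattern: string, hostname: string): boolean {
  const host = hostname.toLowerCase();
  const pat = pattern.toLowerCase();
  if (pat.startsWith('*.')) {
    const base = pat.slice(2);
    return host === base || host.endsWith(`.${base}`);
  }
  return host === pat;
}

// Same shape as the `sensitive_data` option in the README example.
type SensitiveData = Record<string, Record<string, string>>;

// Collect the secrets whose domain pattern matches the current hostname.
function secretsFor(hostname: string, data: SensitiveData): Record<string, string> {
  const out: Record<string, string> = {};
  for (const [pattern, secrets] of Object.entries(data)) {
    if (!matchesDomain(pattern, hostname)) continue;
    for (const [key, value] of Object.entries(secrets)) {
      out[key] = value;
    }
  }
  return out;
}

// Replace secret values with named placeholders before text reaches a model.
// The placeholder format is hypothetical.
function maskSecrets(text: string, secrets: Record<string, string>): string {
  let masked = text;
  for (const [key, value] of Object.entries(secrets)) {
    if (value) masked = masked.split(value).join(`<secret>${key}</secret>`);
  }
  return masked;
}
```

Note that the suffix check uses a leading dot, so a lookalike host such as `evilexample.com` does not match `*.example.com`.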
  |
- |
+ | ### Custom Actions
  |
  | ```typescript
  | import { Controller, ActionResult } from 'browser-use';
@@ -230,588 +215,212 @@ import { z } from 'zod';
  |
  | const controller = new Controller();
  |
- | controller.registry.action('
+ | controller.registry.action('Save screenshot to file', {
  |   param_model: z.object({
- |
- |     include_reviews: z.boolean().default(false),
+ |     filename: z.string().describe('Output filename'),
  |   }),
- | })(async function
- |   const
- |
- |     price: document.querySelector('.price')?.textContent ?? null,
- |   }));
- |
+ | })(async function save_screenshot(params, ctx) {
+ |   const screenshot = await ctx.page.screenshot();
+ |   fs.writeFileSync(`./screenshots/${params.filename}`, screenshot);
  |   return new ActionResult({
- |     extracted_content:
- |     include_in_memory: true,
+ |     extracted_content: `Screenshot saved as ${params.filename}`,
  |   });
  | });
- | ```
  |
- |
- |
- | Built-in file system support with PDF parsing:
- |
- | ```typescript
- | import { Agent } from 'browser-use';
- | import { ChatOpenAI } from 'browser-use/llm/openai';
- |
- | const agent = new Agent({
- |   task: 'Download the PDF and extract text from page 1',
- |   llm: new ChatOpenAI(),
- |   file_system_path: './agent-workspace',
- | });
- |
- | // FileSystem actions are available:
- | // - read_file: Read file contents (supports PDF)
- | // - write_file: Write content to file
- | // - replace_file_str: Replace text in file
+ | const agent = new Agent({ task: '...', llm, controller });
  | ```
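The custom action above pairs a parameter schema with a handler, and the architecture notes say the Controller validates actions before executing them. The sketch below illustrates that register, validate, execute flow with a hypothetical `MiniRegistry`. It is not the browser-use `Registry`: the class, the `Validator` type, and the hand-written `filenameParams` check (a stand-in for a Zod schema's `parse`) are all assumptions made to keep the example dependency-free.

```typescript
// A validator throws on invalid input, similar in spirit to a schema's parse().
type Validator<P> = (input: unknown) => P;

interface RegisteredAction<P> {
  description: string;
  validate: Validator<P>;
  handler: (params: P) => string;
}

// Hypothetical mini-registry; the real Registry is richer than this.
class MiniRegistry {
  private actions = new Map<string, RegisteredAction<any>>();

  register<P>(name: string, action: RegisteredAction<P>): void {
    if (this.actions.has(name)) throw new Error(`Duplicate action: ${name}`);
    this.actions.set(name, action);
  }

  execute(name: string, rawParams: unknown): string {
    const action = this.actions.get(name);
    if (!action) throw new Error(`Unknown action: ${name}`);
    // Reject malformed model output before the handler runs.
    const params = action.validate(rawParams);
    return action.handler(params);
  }
}

// Stand-in for z.object({ filename: z.string() }).parse
const filenameParams: Validator<{ filename: string }> = (input) => {
  if (typeof input !== 'object' || input === null) {
    throw new Error('params must be an object');
  }
  const filename = (input as { filename?: unknown }).filename;
  if (typeof filename !== 'string' || filename === '') {
    throw new Error('filename must be a non-empty string');
  }
  return { filename };
};
```

Validating before dispatch means a handler such as `save_screenshot` only ever sees parameters of the declared shape, which matters when the parameters are produced by a model rather than typed by a developer.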
  |
- | ###
- |
- | Customize browser behavior with profiles:
+ | ### Vision Mode & Session Recording
  |
  | ```typescript
- |
- |
- |
- |
- |
- |
- |   chromium_sandbox: true, // Keep enabled by default in production
- |   args: ['--disable-blink-features=AutomationControlled'],
- |   wait_for_network_idle_page_load_time: 3, // seconds
- |   allowed_domains: ['example.com', '*.google.com'],
- |   cookies_file: './cookies.json',
- |   downloads_path: './downloads',
- |   highlight_elements: false, // Visual debugging
- |   viewport_expansion: 0, // Expand viewport for element detection
- | });
- |
- | const browserSession = new BrowserSession({
- |   browser_profile: profile,
+ | const agent = new Agent({
+ |   task: 'Navigate to hacker news and summarize the top stories',
+ |   llm,
+ |   use_vision: true,
+ |   vision_detail_level: 'high', // 'auto' | 'low' | 'high'
+ |   generate_gif: './session.gif',
  | });
- |
- | await browserSession.start();
  | ```
  |
- |
- | `BrowserSession` automatically retries once with `chromium_sandbox: false` and logs
- | a warning. For deterministic CI behavior, set `chromium_sandbox: false` explicitly.
- |
- | ### MCP (Model Context Protocol) Integration
- |
- | Connect to MCP servers for extended capabilities:
+ | ### Multi-Tab Workflows
  |
  | ```typescript
- | import { MCPController } from 'browser-use';
- |
- | const mcpController = new MCPController();
- |
- | // Add MCP server
- | await mcpController.addServer('my-server', 'npx', [
- |   '-y',
- |   '@modelcontextprotocol/server-filesystem',
- |   '/path/to/data',
- | ]);
- |
- | // MCP tools are automatically available to the agent
- | const tools = await mcpController.listAllTools();
- | console.log('Available MCP tools:', tools);
- | ```
- |
- | ### Gmail Integration
- |
- | Built-in Gmail API support:
- |
- | ```typescript
- | import { GmailService } from 'browser-use';
- |
- | // Gmail actions are automatically available:
- | // - get_recent_emails: Fetch recent emails
- | // - send_email: Send email via Gmail API
- |
  | const agent = new Agent({
- |   task:
- |
- |
+ |   task: `Compare "Sony WH-1000XM5" prices:
+ |     1. Open amazon.com and search for the product
+ |     2. Open bestbuy.com in a new tab and search
+ |     3. Provide a comparison summary`,
+ |   llm,
+ |   use_vision: true,
  | });
  | ```
  |
342
|
-
|
|
343
|
-
|
|
344
|
-
### Environment Variables
|
|
345
|
-
|
|
346
|
-
```bash
|
|
347
|
-
# LLM Configuration (provider-specific)
|
|
348
|
-
OPENAI_API_KEY=your-openai-key
|
|
349
|
-
ANTHROPIC_API_KEY=your-anthropic-key
|
|
350
|
-
GOOGLE_API_KEY=your-google-key
|
|
351
|
-
AWS_ACCESS_KEY_ID=your-aws-key
|
|
352
|
-
AWS_SECRET_ACCESS_KEY=your-aws-secret
|
|
353
|
-
AZURE_OPENAI_API_KEY=your-azure-key
|
|
354
|
-
AZURE_OPENAI_ENDPOINT=your-azure-endpoint
|
|
355
|
-
GROQ_API_KEY=your-groq-key
|
|
356
|
-
DEEPSEEK_API_KEY=your-deepseek-key
|
|
357
|
-
|
|
358
|
-
# Browser Configuration
|
|
359
|
-
BROWSER_USE_HEADLESS=true
|
|
360
|
-
BROWSER_USE_ALLOWED_DOMAINS=example.com,*.trusted.org
|
|
361
|
-
IN_DOCKER=true
|
|
362
|
-
|
|
363
|
-
# Logging Configuration
|
|
364
|
-
BROWSER_USE_LOGGING_LEVEL=info # debug, info, warning, error
|
|
365
|
-
|
|
366
|
-
# Telemetry (optional)
|
|
367
|
-
ANONYMIZED_TELEMETRY=false
|
|
368
|
-
|
|
369
|
-
# Observability (optional)
|
|
370
|
-
LMNR_API_KEY=your-lmnr-key
|
|
371
|
-
```
|
|
372
|
-
|
|
373
|
-
### Agent Configuration
|
|
374
|
-
|
|
375
|
-
```typescript
|
|
376
|
-
interface AgentOptions {
|
|
377
|
-
// Vision/multimodal
|
|
378
|
-
use_vision?: boolean;
|
|
379
|
-
vision_detail_level?: 'low' | 'high' | 'auto';
|
|
380
|
-
|
|
381
|
-
// Error handling
|
|
382
|
-
max_failures?: number; // default: 3
|
|
383
|
-
retry_delay?: number; // seconds, default: 10
|
|
384
|
-
max_actions_per_step?: number; // default: 10
|
|
385
|
-
|
|
386
|
-
// Persistence / output
|
|
387
|
-
save_conversation_path?: string | null;
|
|
388
|
-
file_system_path?: string | null;
|
|
389
|
-
validate_output?: boolean;
|
|
390
|
-
include_attributes?: string[];
|
|
391
|
-
|
|
392
|
-
// Runtime limits (seconds)
|
|
393
|
-
llm_timeout?: number; // default: 60
|
|
394
|
-
step_timeout?: number; // default: 180
|
|
395
|
-
}
|
|
396
|
-
|
|
397
|
-
// Max step count is configured per run call:
|
|
398
|
-
await agent.run(100);
|
|
399
|
-
```
|
|
400
|
-
|
|
401
|
-
## Supported LLM Providers
|
|
402
|
-
|
|
403
|
-
### OpenAI
|
|
258
|
+
### Event System
|
|
404
259
|
|
|
405
260
|
```typescript
|
|
406
|
-
|
|
261
|
+
const agent = new Agent({ task: '...', llm });
|
|
407
262
|
|
|
408
|
-
|
|
409
|
-
|
|
410
|
-
apiKey: process.env.OPENAI_API_KEY,
|
|
411
|
-
temperature: 0.1,
|
|
412
|
-
maxTokens: 4096,
|
|
263
|
+
agent.eventbus.on('CreateAgentStepEvent', (event) => {
|
|
264
|
+
console.log('Step completed:', event.step_id);
|
|
413
265
|
});
|
|
414
|
-
```
|
|
415
266
|
|
|
416
|
-
|
|
417
|
-
|
|
418
|
-
```typescript
|
|
419
|
-
import { ChatAnthropic } from 'browser-use/llm/anthropic';
|
|
420
|
-
|
|
421
|
-
const llm = new ChatAnthropic({
|
|
422
|
-
model: 'claude-3-5-sonnet-20241022', // or other Claude models
|
|
423
|
-
apiKey: process.env.ANTHROPIC_API_KEY,
|
|
424
|
-
temperature: 0.1,
|
|
425
|
-
});
|
|
267
|
+
await agent.run();
|
|
426
268
|
```
|
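The subscription pattern in the Event System snippet can be tried without a browser. The sketch below is a stand-in, not the library's implementation: `TinyEventBus` and the `AgentEvents` payload type are hypothetical, and only the event name `CreateAgentStepEvent` and the `step_id` field come from the README.

```typescript
// Hypothetical payload map; only `step_id` appears in the README snippet,
// and its real type is not stated there (a number is assumed here).
type AgentEvents = {
  CreateAgentStepEvent: { step_id: number };
};

type Handler<T> = (event: T) => void;

// Minimal stand-in for `agent.eventbus` (an assumption, not the real class).
class TinyEventBus<Events> {
  private handlers = new Map<keyof Events, Handler<any>[]>();

  // Register a handler for one named event.
  on<K extends keyof Events>(name: K, handler: Handler<Events[K]>): void {
    const list = this.handlers.get(name) ?? [];
    list.push(handler);
    this.handlers.set(name, list);
  }

  // Call every handler registered for the event, in registration order.
  emit<K extends keyof Events>(name: K, event: Events[K]): void {
    for (const handler of this.handlers.get(name) ?? []) {
      handler(event);
    }
  }
}

const eventbus = new TinyEventBus<AgentEvents>();
const completedSteps: number[] = [];

eventbus.on('CreateAgentStepEvent', (event) => {
  completedSteps.push(event.step_id);
});

// Simulate an agent finishing three steps.
for (const step_id of [1, 2, 3]) {
  eventbus.emit('CreateAgentStepEvent', { step_id });
}

console.log('Steps seen:', completedSteps.join(', '));
```

Collecting step ids in an array, rather than only logging them, makes it easy to assert on progress in tests.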
|
427
269
|
|
|
428
|
-
|
|
429
|
-
|
|
430
|
-
```typescript
|
|
431
|
-
import { ChatGoogle } from 'browser-use/llm/google';
|
|
432
|
-
|
|
433
|
-
const llm = new ChatGoogle('gemini-2.5-flash');
|
|
434
|
-
// Configure GOOGLE_API_KEY in env. Optional:
|
|
435
|
-
// GOOGLE_API_BASE_URL / GOOGLE_API_VERSION
|
|
436
|
-
```
|
|
270
|
+
## ⚙️ Configuration
|
|
437
271
|
|
|
438
|
-
###
|
|
272
|
+
### Agent Options
|
|
439
273
|
|
|
440
274
|
```typescript
|
|
441
|
-
|
|
442
|
-
|
|
443
|
-
|
|
444
|
-
|
|
445
|
-
|
|
446
|
-
|
|
275
|
+
const agent = new Agent({
|
|
276
|
+
task: 'Your task',
|
|
277
|
+
llm,
|
|
278
|
+
use_vision: true, // Enable screenshot analysis
|
|
279
|
+
max_actions_per_step: 5, // Actions per LLM call
|
|
280
|
+
max_failures: 3, // Max retries on failure
|
|
281
|
+
generate_gif: './recording.gif', // Session recording
|
|
282
|
+
validate_output: true, // Strict output validation
|
|
283
|
+
use_thinking: true, // Extended thinking prompts
|
|
284
|
+
llm_timeout: 60, // LLM call timeout (seconds)
|
|
285
|
+
step_timeout: 180, // Step timeout (seconds)
|
|
286
|
+
extend_system_message: 'Be concise', // Custom prompt additions
|
|
447
287
|
});
|
|
448
|
-
```
|
|
449
288
|
|
|
450
|
-
|
|
451
|
-
|
|
452
|
-
```typescript
|
|
453
|
-
import { ChatAzure } from 'browser-use/llm/azure';
|
|
454
|
-
|
|
455
|
-
const llm = new ChatAzure('gpt-4o');
|
|
456
|
-
// Configure AZURE_OPENAI_API_KEY, AZURE_OPENAI_ENDPOINT, AZURE_OPENAI_API_VERSION in env.
|
|
457
|
-
```
|
|
458
|
-
|
|
459
|
-
### DeepSeek
|
|
460
|
-
|
|
461
|
-
```typescript
|
|
462
|
-
import { ChatDeepSeek } from 'browser-use/llm/deepseek';
|
|
463
|
-
|
|
464
|
-
const llm = new ChatDeepSeek('deepseek-chat');
|
|
289
|
+
const history = await agent.run(50); // Max 50 steps
|
|
465
290
|
```
|
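The 0.2.0 README listed defaults for several of these options (`max_failures: 3`, `max_actions_per_step: 10`, `llm_timeout: 60`, `step_timeout: 180`), while the 0.4.0 example sets values explicitly. A rough sketch of how caller overrides could be merged with such defaults follows. `resolveTuning` and `AgentTuning` are hypothetical names, and whether 0.4.0 keeps the same defaults is an assumption.

```typescript
// Subset of agent options that carry numeric limits.
interface AgentTuning {
  max_failures: number;
  max_actions_per_step: number;
  llm_timeout: number; // seconds
  step_timeout: number; // seconds
}

// Defaults as documented in the 0.2.0 README; 0.4.0 may differ (assumption).
const DOCUMENTED_DEFAULTS: AgentTuning = {
  max_failures: 3,
  max_actions_per_step: 10,
  llm_timeout: 60,
  step_timeout: 180,
};

// Hypothetical helper: merge overrides onto defaults and reject bad values.
function resolveTuning(overrides: Partial<AgentTuning> = {}): AgentTuning {
  const resolved: AgentTuning = { ...DOCUMENTED_DEFAULTS, ...overrides };
  for (const [key, value] of Object.entries(resolved)) {
    if (!Number.isFinite(value) || value <= 0) {
      throw new RangeError(`${key} must be a positive number, got ${value}`);
    }
  }
  return resolved;
}

const tuning = resolveTuning({ max_actions_per_step: 5 });
console.log(tuning);
```

Validating up front keeps a mistyped timeout from surfacing only after a long run has started.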
|
466
291
|
|
|
467
|
-
###
|
|
292
|
+
### Browser Profile
|
|
468
293
|
|
|
469
294
|
```typescript
|
|
470
|
-
import {
|
|
471
|
-
|
|
472
|
-
const llm = new ChatGroq('mixtral-8x7b-32768');
|
|
473
|
-
```
|
|
474
|
-
|
|
475
|
-
### Ollama (Local)
|
|
476
|
-
|
|
477
|
-
```typescript
|
|
478
|
-
import { ChatOllama } from 'browser-use/llm/ollama';
|
|
479
|
-
|
|
480
|
-
const llm = new ChatOllama('llama3.1', 'http://localhost:11434');
|
|
481
|
-
```
|
|
482
|
-
|
|
483
|
-
### OpenRouter
|
|
295
|
+
import { BrowserProfile, BrowserSession } from 'browser-use';
|
|
484
296
|
|
|
485
|
-
|
|
486
|
-
|
|
297
|
+
const profile = new BrowserProfile({
|
|
298
|
+
headless: true,
|
|
299
|
+
viewport: { width: 1920, height: 1080 },
|
|
300
|
+
user_data_dir: './my-profile', // Persistent sessions
|
|
301
|
+
allowed_domains: ['*.example.com'], // Domain restrictions
|
|
302
|
+
highlight_elements: true, // Visual debugging
|
|
303
|
+
proxy: { server: 'http://proxy:8080' },
|
|
304
|
+
});
|
|
487
305
|
|
|
488
|
-
const
|
|
306
|
+
const session = new BrowserSession({ browser_profile: profile });
|
|
307
|
+
const agent = new Agent({ task: '...', llm, browser_session: session });
|
|
489
308
|
```
|
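To make the `allowed_domains` patterns concrete, here is a minimal matcher sketch. It is not the library's matching code: `matchesPattern` and `isUrlAllowed` are hypothetical helpers, and treating `*.example.com` as covering both `example.com` and its subdomains is an assumption that the real rules may not share.

```typescript
// Hypothetical helper: compare one hostname against one allowlist pattern.
function matchesPattern(hostname: string, pattern: string): boolean {
  const host = hostname.toLowerCase();
  const pat = pattern.toLowerCase();
  if (pat.startsWith('*.')) {
    const base = pat.slice(2);
    // Assumption: the wildcard covers the bare domain and any subdomain.
    return host === base || host.endsWith('.' + base);
  }
  // Without a wildcard, only an exact hostname match is accepted.
  return host === pat;
}

// Hypothetical helper: check a full URL against a list of patterns.
function isUrlAllowed(url: string, allowedDomains: string[]): boolean {
  let hostname: string;
  try {
    hostname = new URL(url).hostname;
  } catch {
    return false; // unparsable URLs are rejected
  }
  return allowedDomains.some((pattern) => matchesPattern(hostname, pattern));
}

const allowlist = ['example.com', '*.google.com'];
console.log(isUrlAllowed('https://mail.google.com/inbox', allowlist));
console.log(isUrlAllowed('https://evilgoogle.com/', allowlist));
```

Requiring the leading dot in the suffix check is what keeps look-alike hosts such as `evilgoogle.com` from matching `*.google.com`.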
|
490
309
|
|
|
491
|
-
|
|
492
|
-
|
|
493
|
-
The AI agent can perform these actions:
|
|
494
|
-
|
|
495
|
-
### Navigation
|
|
496
|
-
|
|
497
|
-
- **search_google** - Search query in Google (web results only)
|
|
498
|
-
- **go_to_url** - Navigate to a specific URL (with optional new tab)
|
|
499
|
-
|
|
500
|
-
### Element Interaction
|
|
501
|
-
|
|
502
|
-
- **click_element** - Click buttons, links, or clickable elements by index
|
|
503
|
-
- **input_text** - Type text into input fields and textareas by index
|
|
504
|
-
|
|
505
|
-
### Dropdown/Select
|
|
506
|
-
|
|
507
|
-
- **dropdown_options** - Get available options from a dropdown
|
|
508
|
-
- **select_dropdown** - Select option from dropdown by index
|
|
509
|
-
|
|
510
|
-
### Scrolling
|
|
511
|
-
|
|
512
|
-
- **scroll** - Scroll page up/down by pixels or direction
|
|
513
|
-
- **scroll_to_text** - Scroll to text content on page
|
|
514
|
-
|
|
515
|
-
### Tabs
|
|
516
|
-
|
|
517
|
-
- **switch_tab** - Switch to different browser tab by index
|
|
518
|
-
- **close_tab** - Close current or specific tab
|
|
519
|
-
|
|
520
|
-
### Keyboard
|
|
521
|
-
|
|
522
|
-
- **send_keys** - Send keyboard input (Enter, Tab, Escape, etc.)
|
|
523
|
-
|
|
524
|
-
### Content Extraction
|
|
525
|
-
|
|
526
|
-
- **extract_structured_data** - Extract specific data using LLM from page markdown
|
|
527
|
-
|
|
528
|
-
### FileSystem
|
|
529
|
-
|
|
530
|
-
- **read_file** - Read file contents (supports PDF parsing)
|
|
531
|
-
- **write_file** - Write content to file
|
|
532
|
-
- **replace_file_str** - Replace string in file
|
|
533
|
-
|
|
534
|
-
### Google Sheets
|
|
535
|
-
|
|
536
|
-
- **sheets_range** - Get cell range from Google Sheet
|
|
537
|
-
- **sheets_update** - Update Google Sheet cells
|
|
538
|
-
- **sheets_input** - Input data into Google Sheet
|
|
539
|
-
|
|
540
|
-
### Gmail
|
|
541
|
-
|
|
542
|
-
- **get_recent_emails** - Fetch recent emails from Gmail
|
|
543
|
-
- **send_email** - Send email via Gmail API
|
|
544
|
-
|
|
545
|
-
### Completion
|
|
546
|
-
|
|
547
|
-
- **done** - Mark task as completed with optional structured output
|
|
310
|
+
### Environment Variables
|
|
548
311
|
|
|
549
|
-
|
|
312
|
+
| Variable | Description |
|
|
313
|
+
| ----------------------------- | ---------------------------------------------- |
|
|
314
|
+
| `OPENAI_API_KEY` | OpenAI API key |
|
|
315
|
+
| `ANTHROPIC_API_KEY` | Anthropic API key |
|
|
316
|
+
| `GOOGLE_API_KEY` | Google API key |
|
|
317
|
+
| `BROWSER_USE_HEADLESS` | Run browser headlessly (`true`/`false`) |
|
|
318
|
+
| `BROWSER_USE_LOGGING_LEVEL` | Log level: `debug`, `info`, `warning`, `error` |
|
|
319
|
+
| `BROWSER_USE_ALLOWED_DOMAINS` | Comma-separated domain allowlist |
|
|
320
|
+
| `ANONYMIZED_TELEMETRY` | Enable/disable anonymous telemetry |
|
|
550
321
|
|
|
551
|
-
See
|
|
322
|
+
> See [Configuration Guide](./docs/CONFIGURATION.md) for the full list.
|
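As an illustration of the value formats in the table, the sketch below parses three of the variables from a plain object. `readEnvSettings` is a hypothetical helper, not part of the package, and the fallbacks it uses (`headless: false`, level `info`) are assumptions rather than documented defaults.

```typescript
type LogLevel = 'debug' | 'info' | 'warning' | 'error';

interface EnvSettings {
  headless: boolean;
  loggingLevel: LogLevel;
  allowedDomains: string[];
}

const LOG_LEVELS: LogLevel[] = ['debug', 'info', 'warning', 'error'];

// Hypothetical helper: interpret the documented variables from an env-like object.
function readEnvSettings(env: Record<string, string | undefined>): EnvSettings {
  const rawLevel = (env['BROWSER_USE_LOGGING_LEVEL'] ?? 'info').toLowerCase();
  const level = LOG_LEVELS.find((candidate) => candidate === rawLevel);
  return {
    // Only the literal string "true" enables headless mode (assumed fallback: false).
    headless: (env['BROWSER_USE_HEADLESS'] ?? '').toLowerCase() === 'true',
    // Unknown levels fall back to "info" (assumption).
    loggingLevel: level ?? 'info',
    // Comma-separated list; surrounding whitespace and empty entries are dropped.
    allowedDomains: (env['BROWSER_USE_ALLOWED_DOMAINS'] ?? '')
      .split(',')
      .map((entry) => entry.trim())
      .filter((entry) => entry.length > 0),
  };
}

const settings = readEnvSettings({
  BROWSER_USE_HEADLESS: 'true',
  BROWSER_USE_LOGGING_LEVEL: 'debug',
  BROWSER_USE_ALLOWED_DOMAINS: 'example.com, *.trusted.org',
});
console.log(settings);
```

In a Node.js program, `process.env` can be passed in place of the literal object.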
|
552
323
|
|
|
553
|
-
|
|
554
|
-
- `examples/search-wikipedia.ts` - Wikipedia navigation with vision
|
|
555
|
-
- `examples/test-vision.ts` - Vision/multimodal capabilities demo
|
|
556
|
-
- `examples/test-filesystem.ts` - File operations and PDF parsing
|
|
557
|
-
- `examples/openapi.ts` - Complex API documentation extraction
|
|
324
|
+
## 🔌 MCP Server (Claude Desktop)
|
|
558
325
|
|
|
559
|
-
|
|
326
|
+
Browser-Use can run as an [MCP](https://modelcontextprotocol.io/) server, exposing browser automation as tools for Claude Desktop:
|
|
560
327
|
|
|
561
328
|
```bash
|
|
562
|
-
|
|
563
|
-
export OPENAI_API_KEY=your-key
|
|
564
|
-
# or for Google
|
|
565
|
-
export GOOGLE_API_KEY=your-key
|
|
566
|
-
|
|
567
|
-
# Run an example
|
|
568
|
-
npx tsx examples/simple-search.ts
|
|
329
|
+
npx browser-use --mcp
|
|
569
330
|
```
|
|
570
331
|
|
|
571
|
-
|
|
572
|
-
|
|
573
|
-
|
|
574
|
-
|
|
575
|
-
|
|
576
|
-
|
|
577
|
-
|
|
578
|
-
|
|
579
|
-
|
|
580
|
-
|
|
581
|
-
|
|
582
|
-
|
|
583
|
-
const lastStep = history.history[history.history.length - 1];
|
|
584
|
-
if (lastStep?.result.is_done) {
|
|
585
|
-
console.log('Task completed:', lastStep.result.extracted_content);
|
|
586
|
-
} else {
|
|
587
|
-
console.log('Task incomplete after max steps');
|
|
588
|
-
}
|
|
589
|
-
} catch (error) {
|
|
590
|
-
if (error instanceof AgentError) {
|
|
591
|
-
console.error('Agent error:', error.message);
|
|
592
|
-
console.error('Failed at step:', error.step);
|
|
593
|
-
} else {
|
|
594
|
-
console.error('Unexpected error:', error);
|
|
332
|
+
Add to your Claude Desktop config (`~/Library/Application Support/Claude/claude_desktop_config.json`):
|
|
333
|
+
|
|
334
|
+
```json
|
|
335
|
+
{
|
|
336
|
+
"mcpServers": {
|
|
337
|
+
"browser-use": {
|
|
338
|
+
"command": "npx",
|
|
339
|
+
"args": ["browser-use", "--mcp"],
|
|
340
|
+
"env": {
|
|
341
|
+
"OPENAI_API_KEY": "your-api-key"
|
|
342
|
+
}
|
|
343
|
+
}
|
|
595
344
|
}
|
|
596
345
|
}
|
|
597
346
|
```
|
|
598
347
|
|
|
599
|
-
|
|
600
|
-
|
|
601
|
-
### Building from Source
|
|
602
|
-
|
|
603
|
-
```bash
|
|
604
|
-
git clone https://github.com/webllm/browser-use.git
|
|
605
|
-
cd browser-use
|
|
606
|
-
yarn install # Automatically installs Playwright browsers
|
|
607
|
-
yarn build
|
|
608
|
-
```
|
|
609
|
-
|
|
610
|
-
### Running Tests
|
|
611
|
-
|
|
612
|
-
```bash
|
|
613
|
-
# Run all tests
|
|
614
|
-
yarn test
|
|
615
|
-
|
|
616
|
-
# Run specific test
|
|
617
|
-
yarn test test/integration-advanced.test.ts
|
|
618
|
-
|
|
619
|
-
# Watch mode
|
|
620
|
-
yarn test:watch
|
|
621
|
-
|
|
622
|
-
# Validate published package exports
|
|
623
|
-
yarn test:pack
|
|
624
|
-
```
|
|
625
|
-
|
|
626
|
-
### Code Quality
|
|
348
|
+
Available MCP tools: `browser_run_task`, `browser_navigate`, `browser_click`, `browser_type`, `browser_scroll`, `browser_get_state`, `browser_extract`, `browser_screenshot`, `browser_close`.
|
|
627
349
|
|
|
628
|
-
|
|
629
|
-
# Lint
|
|
630
|
-
yarn lint
|
|
631
|
-
|
|
632
|
-
# Format
|
|
633
|
-
yarn prettier
|
|
634
|
-
|
|
635
|
-
# Type check
|
|
636
|
-
yarn typecheck
|
|
637
|
-
```
|
|
350
|
+
> See [MCP Server Guide](./docs/MCP_SERVER.md) for more details.
|
|
638
351
|
|
|
639
|
-
##
|
|
352
|
+
## 🔒 Security
|
|
640
353
|
|
|
641
|
-
|
|
642
|
-
|
|
643
|
-
|
|
644
|
-
|
|
645
|
-
|
|
646
|
-
│ - Task execution & planning │
|
|
647
|
-
│ - LLM message management │
|
|
648
|
-
│ - Step execution loop │
|
|
649
|
-
└─────────┬────────────────────────────────┘
|
|
650
|
-
│
|
|
651
|
-
┌─────────▼────────────────────────────────┐
|
|
652
|
-
│ Controller (Actions) │
|
|
653
|
-
│ - Action registry & execution │
|
|
654
|
-
│ - Built-in actions (30+) │
|
|
655
|
-
│ - Custom action support │
|
|
656
|
-
└─────────┬────────────────────────────────┘
|
|
657
|
-
│
|
|
658
|
-
┌─────────▼────────────────────────────────┐
|
|
659
|
-
│ BrowserSession (Browser) │
|
|
660
|
-
│ - Playwright integration │
|
|
661
|
-
│ - Tab & page management │
|
|
662
|
-
│ - Navigation & interaction │
|
|
663
|
-
└─────────┬────────────────────────────────┘
|
|
664
|
-
│
|
|
665
|
-
┌─────────▼────────────────────────────────┐
|
|
666
|
-
│ DOMService (DOM Analysis) │
|
|
667
|
-
│ - Element extraction │
|
|
668
|
-
│ - Clickable element detection │
|
|
669
|
-
│ - History tree processing │
|
|
670
|
-
└──────────────────────────────────────────┘
|
|
671
|
-
|
|
672
|
-
Supporting Services:
|
|
673
|
-
┌──────────────────────────────────────────┐
|
|
674
|
-
│ - LLM Clients (10+ providers) │
|
|
675
|
-
│ - FileSystem (with PDF support) │
|
|
676
|
-
│ - Screenshot Service │
|
|
677
|
-
│ - Token Tracking & Cost Calculation │
|
|
678
|
-
│ - Telemetry (PostHog) │
|
|
679
|
-
│ - Observability (LMNR) │
|
|
680
|
-
│ - MCP Protocol Support │
|
|
681
|
-
│ - Gmail/Sheets Integration │
|
|
682
|
-
└──────────────────────────────────────────┘
|
|
683
|
-
```
|
|
684
|
-
|
|
685
|
-
### Key Components
|
|
686
|
-
|
|
687
|
-
- **Agent**: High-level orchestrator managing task execution, LLM communication, and step-by-step planning
|
|
688
|
-
- **Controller**: Action registry and executor with 30+ built-in actions and custom action support
|
|
689
|
-
- **BrowserSession**: Browser lifecycle manager built on Playwright with tab management and state tracking
|
|
690
|
-
- **DOMService**: Intelligent DOM analyzer extracting relevant elements for AI consumption
|
|
691
|
-
- **MessageManager**: Manages conversation history with token optimization and context window management
|
|
692
|
-
- **FileSystem**: File operations with PDF parsing and workspace management
|
|
693
|
-
- **ScreenshotService**: Captures and manages screenshots for vision capabilities
|
|
694
|
-
- **Registry**: Type-safe action registration system with Zod schema validation
|
|
695
|
-
|
|
696
|
-
## Token Usage & Cost Tracking
|
|
697
|
-
|
|
698
|
-
The library automatically tracks token usage and calculates costs:
|
|
354
|
+
- **Sensitive Data Masking** — Credentials are automatically masked in logs and LLM context
|
|
355
|
+
- **Domain Restrictions** — Lock browser navigation to trusted domains
|
|
356
|
+
- **Domain-scoped Secrets** — Credentials are only injected on matching domains
|
|
357
|
+
- **Hard Safety Gate** — `sensitive_data` requires `allowed_domains` by default
|
|
358
|
+
- **Chromium Sandbox** — Enabled by default for production security
|
|
699
359
|
|
|
700
360
|
```typescript
|
|
701
|
-
|
|
702
|
-
|
|
703
|
-
|
|
704
|
-
|
|
705
|
-
|
|
706
|
-
|
|
707
|
-
|
|
708
|
-
|
|
709
|
-
|
|
710
|
-
|
|
711
|
-
|
|
712
|
-
|
|
713
|
-
|
|
714
|
-
|
|
715
|
-
const cost = TokenCost.calculate(history);
|
|
716
|
-
console.log('Estimated cost: $', cost.toFixed(4));
|
|
717
|
-
```
|
|
718
|
-
|
|
719
|
-
## Screenshot & History Export
|
|
720
|
-
|
|
721
|
-
Generate GIF animations from agent execution history:
|
|
722
|
-
|
|
723
|
-
```typescript
|
|
724
|
-
import { create_history_gif } from 'browser-use';
|
|
725
|
-
|
|
726
|
-
const history = await agent.run();
|
|
727
|
-
|
|
728
|
-
await create_history_gif('My automation task', history, {
|
|
729
|
-
output_path: 'agent-history.gif',
|
|
730
|
-
duration: 3000, // ms per frame
|
|
731
|
-
show_goals: true,
|
|
732
|
-
show_task: true,
|
|
733
|
-
show_logo: false,
|
|
361
|
+
const agent = new Agent({
|
|
362
|
+
task: 'Login and fetch invoices',
|
|
363
|
+
llm,
|
|
364
|
+
sensitive_data: {
|
|
365
|
+
'*.example.com': {
|
|
366
|
+
username: process.env.USERNAME!,
|
|
367
|
+
password: process.env.PASSWORD!,
|
|
368
|
+
},
|
|
369
|
+
},
|
|
370
|
+
browser_session: new BrowserSession({
|
|
371
|
+
browser_profile: new BrowserProfile({
|
|
372
|
+
allowed_domains: ['*.example.com'],
|
|
373
|
+
}),
|
|
374
|
+
}),
|
|
734
375
|
});
|
|
735
|
-
|
|
736
|
-
console.log('Created agent-history.gif');
|
|
737
376
|
```
|
|
738
377
|
|
|
739
|
-
|
|
378
|
+
> See [Security Guide](./docs/SECURITY.md) for production deployment best practices.
|
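The masking and safety-gate bullets above can be pictured with a small standalone sketch. None of this is the library's code: `maskSecrets`, `assertSafetyGate`, and the `<secret>name</secret>` placeholder format are hypothetical and chosen only for illustration.

```typescript
// Domain pattern -> { secret name -> secret value }, as in the README example.
type SensitiveData = Record<string, Record<string, string>>;

// Hypothetical helper: replace every secret value in a string with a named placeholder.
function maskSecrets(text: string, sensitiveData: SensitiveData): string {
  let masked = text;
  for (const secrets of Object.values(sensitiveData)) {
    for (const [name, value] of Object.entries(secrets)) {
      if (value.length === 0) continue; // skip empty values, which would match everywhere
      masked = masked.split(value).join(`<secret>${name}</secret>`);
    }
  }
  return masked;
}

// Hypothetical helper: refuse to run with secrets when no domain allowlist is set.
function assertSafetyGate(sensitiveData: SensitiveData, allowedDomains: string[]): void {
  if (Object.keys(sensitiveData).length > 0 && allowedDomains.length === 0) {
    throw new Error('sensitive_data requires allowed_domains to be set');
  }
}

const secrets: SensitiveData = {
  '*.example.com': { username: 'alice', password: 'hunter2' },
};

assertSafetyGate(secrets, ['*.example.com']); // passes: an allowlist is present
console.log(maskSecrets('login alice / hunter2', secrets));
```

Running the gate before any browser work starts means a misconfigured agent fails fast instead of leaking credentials to an unrestricted page.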
|
740
379
|
|
|
741
|
-
|
|
380
|
+
## 📚 Documentation
|
|
742
381
|
|
|
743
|
-
|
|
744
|
-
|
|
382
|
+
| Document | Description |
|
|
383
|
+
| ---------------------------------------- | ------------------------------------ |
|
|
384
|
+
| [Quick Start](./docs/QUICKSTART.md) | Get started in 5 minutes |
|
|
385
|
+
| [Architecture](./docs/ARCHITECTURE.md) | System design and component overview |
|
|
386
|
+
| [API Reference](./docs/API_REFERENCE.md) | Complete API documentation |
|
|
387
|
+
| [Configuration](./docs/CONFIGURATION.md) | All configuration options |
|
|
388
|
+
| [LLM Providers](./docs/LLM_PROVIDERS.md) | Provider setup and comparison |
|
|
389
|
+
| [Actions](./docs/ACTIONS.md) | Built-in and custom actions |
|
|
390
|
+
| [MCP Server](./docs/MCP_SERVER.md) | MCP integration guide |
|
|
391
|
+
| [Security](./docs/SECURITY.md) | Security best practices |
|
|
392
|
+
| [Examples](./docs/EXAMPLES.md) | More code examples |
|
|
393
|
+
| [Contributing](./docs/CONTRIBUTING.md) | Contribution guidelines |
|
|
745
394
|
|
|
746
|
-
|
|
747
|
-
// All agent operations are automatically traced
|
|
395
|
+
## 🛠️ Development
|
|
748
396
|
|
|
749
|
-
|
|
750
|
-
|
|
751
|
-
|
|
752
|
-
// Function execution is logged and timed
|
|
753
|
-
}
|
|
754
|
-
```
|
|
755
|
-
|
|
756
|
-
## Contributing
|
|
757
|
-
|
|
758
|
-
Contributions are welcome! Please feel free to submit a Pull Request.
|
|
759
|
-
|
|
760
|
-
1. Fork the repository
|
|
761
|
-
2. Create your feature branch (`git checkout -b feature/amazing-feature`)
|
|
762
|
-
3. Commit your changes (`git commit -m 'feat: add amazing feature'`)
|
|
763
|
-
4. Push to the branch (`git push origin feature/amazing-feature`)
|
|
764
|
-
5. Open a Pull Request
|
|
765
|
-
|
|
766
|
-
## Support
|
|
767
|
-
|
|
768
|
-
- 📚 [Documentation](https://github.com/webllm/browser-use)
|
|
769
|
-
- 🐛 [Issue Tracker](https://github.com/webllm/browser-use/issues)
|
|
770
|
-
- 💬 [Discussions](https://github.com/webllm/browser-use/discussions)
|
|
771
|
-
|
|
772
|
-
## Acknowledgments
|
|
773
|
-
|
|
774
|
-
### Original Project
|
|
775
|
-
|
|
776
|
-
This TypeScript implementation would not exist without the groundbreaking work of the original **[browser-use](https://github.com/browser-use/browser-use)** Python library:
|
|
777
|
-
|
|
778
|
-
- 🎯 **Original Project**: [browser-use/browser-use](https://github.com/browser-use/browser-use) (Python)
|
|
779
|
-
- 👏 **Created by**: The browser-use team and contributors
|
|
780
|
-
- 💡 **Inspiration**: All architectural decisions, agent design patterns, and innovative approaches come from the original Python implementation
|
|
781
|
-
|
|
782
|
-
We are deeply grateful to the original authors for creating such an elegant and powerful solution for AI-driven browser automation. This TypeScript port aims to faithfully replicate their excellent work for the JavaScript/TypeScript community.
|
|
783
|
-
|
|
784
|
-
### Key Differences from Python Version
|
|
785
|
-
|
|
786
|
-
While we strive to maintain feature parity with the Python version, there are some differences due to platform constraints:
|
|
787
|
-
|
|
788
|
-
- **Runtime**: Node.js/Deno/Bun instead of Python
|
|
789
|
-
- **Type System**: TypeScript's structural typing vs Python's duck typing
|
|
790
|
-
- **Async Model**: JavaScript Promises vs Python async/await (similar but different)
|
|
791
|
-
- **Ecosystem**: npm packages vs PyPI packages
|
|
397
|
+
```bash
|
|
398
|
+
# Install dependencies
|
|
399
|
+
pnpm install
|
|
792
400
|
|
|
793
|
-
|
|
401
|
+
# Build
|
|
402
|
+
pnpm build
|
|
794
403
|
|
|
795
|
-
|
|
404
|
+
# Run tests
|
|
405
|
+
pnpm test
|
|
796
406
|
|
|
797
|
-
|
|
798
|
-
|
|
799
|
-
|
|
800
|
-
- And many other excellent open-source libraries
|
|
407
|
+
# Lint & format
|
|
408
|
+
pnpm lint
|
|
409
|
+
pnpm prettier
|
|
801
410
|
|
|
802
|
-
|
|
411
|
+
# Type checking
|
|
412
|
+
pnpm typecheck
|
|
803
413
|
|
|
804
|
-
|
|
805
|
-
|
|
806
|
-
|
|
414
|
+
# Run an example
|
|
415
|
+
pnpm exec tsx examples/simple-search.ts
|
|
416
|
+
```
|
|
807
417
|
|
|
808
|
-
##
|
|
418
|
+
## Requirements
|
|
809
419
|
|
|
810
|
-
-
|
|
811
|
-
-
|
|
812
|
-
-
|
|
813
|
-
- 🦜 [Laminar](https://laminar.run/) - LLM observability platform
|
|
420
|
+
- **Node.js** >= 18.0.0
|
|
421
|
+
- **LLM API Key** — At least one supported provider
|
|
422
|
+
- **Playwright** — Installed automatically as a dependency
|
|
814
423
|
|
|
815
|
-
## License
|
|
424
|
+
## 📄 License
|
|
816
425
|
|
|
817
|
-
MIT
|
|
426
|
+
[MIT](./LICENSE) © Web LLM
|