yiyan-browser-agent 1.0.0 → 1.0.2

package/LICENSE CHANGED
@@ -1,6 +1,6 @@
  MIT License
 
- Copyright (c) 2026 doubao-browser-agent contributors
+ Copyright (c) 2026 Omar Azam
 
  Permission is hereby granted, free of charge, to any person obtaining a copy
  of this software and associated documentation files (the "Software"), to deal
package/README.md CHANGED
@@ -1,145 +1,316 @@
- # doubao-browser-agent
+ <div align="center">
 
- NPM package for interacting with Doubao (豆包) web version via Playwright.
+ <img src="https://img.shields.io/badge/status-in%20development-orange?style=for-the-badge" alt="Status: In Development"/>
+ <img src="https://img.shields.io/badge/node-%3E%3D18.0.0-brightgreen?style=for-the-badge" alt="Node.js"/>
+ <img src="https://img.shields.io/badge/license-MIT-green?style=for-the-badge" alt="License"/>
+ <img src="https://img.shields.io/badge/PRs-welcome-brightgreen?style=for-the-badge" alt="PRs Welcome"/>
 
- ## Features
+ # 🤖 Yiyan Browser Agent (文心一言)
 
- - Automated browser interaction with Doubao web version
- - Automatic Chrome profile management for login persistence
- - Retry mechanism for network failures
- - CLI and Node.js API support
- - TypeScript support with full type definitions
+ **An autonomous AI coding agent that runs entirely for free — no API key required.**
 
- ## Prerequisites
+ It drives a real browser to talk to [Yiyan (文心一言)](https://yiyan.baidu.com), giving you a Claude Code / Cursor-style coding agent powered by Baidu's AI models at zero cost.
 
- 1. **Chrome browser** installed on your system
- 2. **Logged into Doubao** (https://www.doubao.com/chat/) in your Chrome browser at least once
+ [Installation](#-installation) · [Quick Start](#-quick-start) · [Usage](#-usage) · [Configuration](#-configuration) · [Tools](#-available-tools) · [Contributing](#-contributing)
 
- The package uses your Chrome profile to maintain login state, so you need to login manually first.
+ ---
 
- ## Installation
+ > ⚠️ **This project is currently in active development.**
+ > Core functionality works, but you may encounter rough edges. Bug reports and contributions are very welcome — see [Contributing](#-contributing).
+
+ </div>
+
+ ---
+
+ ## 🧠 How It Works
+
+ Most AI coding agents talk to a paid API. This one doesn't.
+
+ Instead, it uses **Playwright** to control a real Chromium browser, navigates to `yiyan.baidu.com`, sends your task, waits for the response, and parses it to extract tool calls, all automatically. Your local files and terminal are wired up as tools the AI can use, so it can read code, write files, run commands, and build complete projects step by step.
+
+ ```
+ Your Terminal
+
+
+ Agent Core ← orchestrates the loop
+
+ ├──► Browser (Playwright) ← talks to yiyan.baidu.com
+ │ │
+ │ Yiyan AI (文心一言) ← thinks, decides what tool to use
+ │ │
+ └──► Tool Executor ← reads/writes files, runs commands
+
+ Your Project
+ ```
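The loop in the diagram above (send, wait, parse, execute) can be sketched roughly as follows. This is an illustrative sketch and not code from the package: `runAgent`, `askModel`, `parseToolCall`, and the shape of a tool call are hypothetical names chosen for the example, and the real `src/agent.js` may be organized quite differently.

```javascript
// Hypothetical sketch of an agent loop: send -> wait -> parse -> execute.
// askModel, parseToolCall and tools are injected, so nothing here depends
// on Playwright or on how the real package is implemented.
async function runAgent({ task, askModel, parseToolCall, tools, maxIterations = 40 }) {
  let message = task;
  for (let step = 1; step <= maxIterations; step++) {
    // Send the message and wait for the model's full reply.
    const reply = await askModel(message);

    // Try to extract a tool call; none means the model has finished.
    const call = parseToolCall(reply);
    if (!call) {
      return { done: true, steps: step, answer: reply };
    }

    // Execute the requested tool and feed its result back to the model.
    const tool = Object.hasOwn(tools, call.name) ? tools[call.name] : undefined;
    const result = tool ? await tool(call.args) : `Unknown tool: ${call.name}`;
    message = `Result of ${call.name}: ${result}`;
  }
  // Step budget exhausted without a final answer.
  return { done: false, steps: maxIterations, answer: null };
}
```

Injecting the model and the tools as parameters keeps the loop testable without a browser; the default step budget of 40 mirrors the `MAX_ITERATIONS` default listed in the README.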
+
+ ---
+
+ ## 📦 Installation
 
  ```bash
- npm install doubao-browser-agent
+ npm install -g yiyan-browser-agent
  ```
 
- ## Node.js API Usage
+ > Chromium downloads automatically after install (~150 MB, one time only).
 
- ```typescript
- import { DoubaoAgent } from 'doubao-browser-agent';
+ **Requirements:** Node.js ≥ 18
 
- // Create agent instance
- const agent = new DoubaoAgent({
-   timeout: 120000, // Timeout in milliseconds (default: 120000)
-   retryCount: 3, // Retry count (default: 3)
- });
+ ---
 
- // Send a question and get answer
- try {
-   const answer = await agent.ask('What is TypeScript?');
-   console.log('Answer:', answer);
- } catch (error) {
-   console.error('Error:', error.message);
- }
+ ## 🚀 Quick Start
+
+ **1. On first run, log in to Yiyan:**
+ ```bash
+ yiyan-agent --interactive
+ ```
+ A browser window opens. Log in to your Yiyan (文心一言) account, then come back to the terminal and press **Enter**. Your session is saved — you only do this once.
+
+ **2. Give it a task:**
+ ```bash
+ yiyan-agent "build a REST API in Express with user authentication"
+ ```
+
+ **3. Use the short alias `ya` from any project folder:**
+ ```bash
+ cd ~/my-project
+ ya "add input validation to all my API routes"
+ ```
+
+ ---
 
- // Check login status
- const status = agent.status();
- console.log('Logged in:', status.loggedIn);
- console.log('Profile path:', status.profilePath);
+ ## 💻 Usage
 
- // Clear saved profile (if needed)
- await agent.reset();
+ ```
+ yiyan-agent [OPTIONS] [TASK]
+
+   -t, --task <task>    Task to run (or just type it as the last argument)
+   -i, --interactive    Keep browser open, run multiple tasks in a session
+   -d, --dir <path>     Set working directory (default: current directory)
+       --debug          Print raw AI responses to the terminal
+       --headless       Run browser invisibly (requires prior login)
+       --save-log       Save full session log to ~/.yiyan-agent/logs/
+       --calibrate      Auto-detect DOM selectors (run if agent breaks)
+   -h, --help           Show help
+
+ Aliases:
+   ya                   Short form of yiyan-agent
  ```
 
- ## CLI Usage
+ ### Examples
 
  ```bash
- # Send a question
- doubao-agent ask "What is TypeScript?"
+ # Single task — runs and exits
+ yiyan-agent "create a Python script that scrapes Hacker News"
+
+ # Interactive mode — keeps browser open between tasks
+ yiyan-agent --interactive
 
- # With options
- doubao-agent ask "Explain Promise" --timeout 60000 --retry 5
+ # Run on a specific project
+ ya --dir ~/projects/my-app "refactor all callbacks to async/await"
 
- # Check login status
- doubao-agent status
+ # Debug mode (shows what Yiyan is actually outputting)
+ ya --debug "build a calculator"
 
- # Clear saved profile
- doubao-agent reset
+ # Headless mode (faster — browser runs in background)
+ ya --headless "write unit tests for utils.js"
 
- # Show help
- doubao-agent --help
+ # In interactive mode, type 'new' to start a fresh chat:
+ Task: new
+ ```
+
+ ---
+
+ ## ⚙️ Configuration
+
+ ### Global config — applies everywhere
+
+ Create `~/.yiyan-agent/config.json`:
+
+ ```json
+ {
+   "HEADLESS": true,
+   "MAX_ITERATIONS": 50,
+   "STABLE_DELAY": 3000,
+   "DEBUG": false
+ }
  ```
 
- ## API Documentation
+ ### Per-project config — overrides global
 
- ### `DoubaoAgent`
+ Drop `yiyan-agent.config.json` in your project root:
 
- Main class for interacting with Doubao.
+ ```json
+ {
+   "MAX_ITERATIONS": 60,
+   "MAX_OUTPUT_LENGTH": 12000
+ }
+ ```
 
- #### Constructor
+ ### All settings
+
+ | Setting | Default | Description |
+ |---|---|---|
+ | `HEADLESS` | `false` | Hide the browser window |
+ | `MAX_ITERATIONS` | `40` | Max agent steps per task before stopping |
+ | `RESPONSE_TIMEOUT` | `180000` | Max ms to wait for a response (3 min) |
+ | `STABLE_DELAY` | `2500` | Ms of silence that means Yiyan is done |
+ | `SEND_DELAY` | `400` | Ms between typing and pressing Enter |
+ | `MAX_OUTPUT_LENGTH` | `8000` | Truncate long command outputs sent to AI |
+ | `DEBUG` | `false` | Print raw AI responses to terminal |
+ | `SESSION_DIR` | `~/.yiyan-agent/session` | Where browser cookies are saved |
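One plausible reading of the override order described above is defaults first, then the global file, then the per-project file, with later sources winning key by key. The sketch below assumes a shallow merge. `resolveConfig`, `truncateOutput`, and the truncation marker are hypothetical and not taken from the package; `SESSION_DIR` is omitted to keep the example free of filesystem paths.

```javascript
// Defaults copied from the settings table (SESSION_DIR omitted for brevity).
const DEFAULTS = Object.freeze({
  HEADLESS: false,
  MAX_ITERATIONS: 40,
  RESPONSE_TIMEOUT: 180000,
  STABLE_DELAY: 2500,
  SEND_DELAY: 400,
  MAX_OUTPUT_LENGTH: 8000,
  DEBUG: false,
});

// Hypothetical resolver: later sources override earlier ones, key by key.
function resolveConfig(globalConfig = {}, projectConfig = {}) {
  return { ...DEFAULTS, ...globalConfig, ...projectConfig };
}

// Hypothetical helper showing one way MAX_OUTPUT_LENGTH could be applied
// to command output before it is sent back to the model.
function truncateOutput(output, maxLength) {
  if (output.length <= maxLength) return output;
  const dropped = output.length - maxLength;
  return `${output.slice(0, maxLength)}\n[truncated ${dropped} characters]`;
}

// The two example files from the README, merged.
const config = resolveConfig(
  { HEADLESS: true, MAX_ITERATIONS: 50, STABLE_DELAY: 3000, DEBUG: false },
  { MAX_ITERATIONS: 60, MAX_OUTPUT_LENGTH: 12000 }
);
console.log(config.MAX_ITERATIONS);   // 60 (the per-project value wins)
console.log(config.RESPONSE_TIMEOUT); // 180000 (falls back to the default)
```

A shallow spread is enough here because every documented setting is a flat scalar; nested settings would need a deeper merge.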
+
+ ---
+
+ ## 🛠️ Available Tools
+
+ The agent can use these tools autonomously to complete your task:
+
+ | Tool | Description |
+ |---|---|
+ | `read_file` | Read a file's contents, optionally by line range |
+ | `write_file` | Create or overwrite a file (auto-creates directories) |
+ | `append_to_file` | Append text to an existing file |
+ | `replace_in_file` | Find and replace text in a file (regex supported) |
+ | `delete_file` | Permanently delete a file |
+ | `list_directory` | List directory contents, optionally recursive |
+ | `create_directory` | Create a directory and all parents |
+ | `move_file` | Move or rename a file or directory |
+ | `copy_file` | Copy a file to a new location |
+ | `get_file_info` | Get file metadata (size, line count, dates) |
+ | `run_command` | Execute any shell command |
+ | `find_files` | Find files by name pattern (e.g. `*.ts`) |
+ | `search_in_files` | Search text inside files (like `grep -r`) |
+ | `read_url` | Fetch and read the content of a URL |
+ | `write_files` | Write multiple files at once (batch scaffold) |
+
+ ---
+
+ ## 📂 Where Data is Stored
+
+ Everything lives in `~/.yiyan-agent/` in your home directory:
 
- ```typescript
- new DoubaoAgent(options?: DoubaoAgentOptions)
+ ```
+ ~/.yiyan-agent/
+ ├── session/ ← Browser cookies (login once, runs forever)
+ ├── logs/ ← Session logs (only saved with --save-log)
+ └── config.json ← Your global settings
  ```
 
- **Options:**
- - `timeout?: number` - Timeout in milliseconds (default: 120000)
- - `retryCount?: number` - Number of retry attempts (default: 3)
- - `profileDir?: string` - Custom directory for storing Chrome profile copy
- - `chromePath?: string` - Custom Chrome executable path
+ ---
 
- #### Methods
+ ## 🔧 Troubleshooting
 
- ##### `ask(question: string): Promise<string>`
+ ### Agent responds but creates no files
+ The browser DOM rendered the AI's response in a way the parser didn't catch. Run with `--debug` to see exactly what's being received:
+ ```bash
+ yiyan-agent --debug "build a calculator"
+ ```
 
- Send a question to Doubao and return the answer.
+ ### Agent stops responding / loops
+ Yiyan's UI may have changed. Run the calibration tool — it inspects the live DOM and prints updated selectors:
+ ```bash
+ yiyan-agent --calibrate
+ ```
 
- ##### `status(): { loggedIn: boolean; profilePath: string }`
+ ### Login session expired
+ Just run without `--headless` — the browser opens and you log in again:
+ ```bash
+ yiyan-agent --interactive
+ ```
 
- Check the login status (whether profile exists).
+ ### Chromium didn't download automatically
+ ```bash
+ npx playwright install chromium
+ ```
 
- ##### `reset(): Promise<void>`
+ ### Response times out on long tasks
+ Increase the timeout in your config:
+ ```json
+ { "RESPONSE_TIMEOUT": 300000, "STABLE_DELAY": 4000 }
+ ```
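As a rough illustration of how these two settings might interact: `STABLE_DELAY` is described as the period of silence that marks a response as finished, while `RESPONSE_TIMEOUT` caps the total wait. The sketch below models that with a small tracker that is fed the latest scraped text and a timestamp. `createStabilityTracker` and its return values are hypothetical; the package's actual waiting logic is not shown in this diff.

```javascript
// Hypothetical tracker: decides whether a streamed response has finished.
// Times are plain millisecond numbers so the logic can be checked without
// a browser or real timers.
function createStabilityTracker(stableDelayMs, timeoutMs, startedAt) {
  let lastText = "";
  let lastChangeAt = startedAt;
  return {
    // Call with the latest text read from the page and the current time.
    observe(text, now) {
      if (text !== lastText) {
        // The response is still growing: restart the silence window.
        lastText = text;
        lastChangeAt = now;
      }
      if (now - startedAt >= timeoutMs) return "timeout";
      if (lastText !== "" && now - lastChangeAt >= stableDelayMs) return "done";
      return "waiting";
    },
  };
}
```

With the values from the config snippet above, a response would count as complete after 4 seconds without changes, and the wait would be abandoned after 5 minutes.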
 
- Clear the saved profile copy.
+ ---
 
- ### Error Types
+ ## 🗂️ Project Structure
 
- The package throws `DoubaoAgentError` with the following error types:
+ ```
+ yiyan-browser-agent/
+ ├── src/
+ │   ├── index.js ← CLI entry point and argument parsing
+ │   ├── agent.js ← Core agent loop (send → wait → parse → execute)
+ │   ├── browser.js ← Playwright controller for yiyan.baidu.com
+ │   ├── tools.js ← All 15 filesystem and shell tools
+ │   ├── parser.js ← Extracts tool calls from AI responses (6 strategies)
+ │   ├── prompt.js ← System prompt and conversation history manager
+ │   ├── config.js ← Configuration loader (global + per-project)
+ │   ├── logger.js ← ANSI-colored terminal output
+ │   ├── calibrate.js ← DOM selector inspector / auto-fix tool
+ │   └── postinstall.js ← Auto-downloads Chromium after npm install
+ ├── LICENSE
+ ├── README.md
+ └── package.json
+ ```
 
- | Type | Description |
- |------|-------------|
- | `BROWSER_LAUNCH` | Failed to launch Chrome browser |
- | `PROFILE_COPY` | Failed to copy Chrome profile |
- | `TIMEOUT` | Timeout while waiting for response |
- | `NETWORK` | Network or connection error |
+ ---
 
- ```typescript
- import { DoubaoAgentError } from 'doubao-browser-agent';
+ ## 🤝 Contributing
 
- try {
-   const answer = await agent.ask('Hello');
- } catch (error) {
-   if (error instanceof DoubaoAgentError) {
-     console.log('Error type:', error.type);
-     console.log('Error message:', error.message);
-   }
- }
+ Contributions are very welcome — this project is in active development and there's plenty of room to grow.
+
+ ### Setting up locally
+
+ ```bash
+ git clone https://github.com/YOUR_USERNAME/yiyan-browser-agent
+ cd yiyan-browser-agent
+ npm install
+ npx playwright install chromium
+ node src/index.js --interactive
  ```
 
- ## How It Works
+ ### Areas that need work
+
+ - 🧪 **Tests** — there are currently no automated tests; a test suite would be a great contribution
+ - 🎨 **UI selector resilience** — Yiyan updates their UI occasionally; better selector strategies are welcome
+ - 🔌 **More tools** — image generation, browser control, database tools, etc.
+ - 🌐 **Other AI frontends** — adapting the browser layer to work with other free AI chats
+ - 📦 **Windows support** — currently tested on Linux; Windows path handling may need fixes
+ - 📝 **Better error messages** — making failures easier to diagnose
+
+ ### How to contribute
+
+ 1. Fork the repo
+ 2. Create a branch: `git checkout -b feature/my-improvement`
+ 3. Make your changes
+ 4. Open a Pull Request with a clear description
+
+ Please keep PRs focused — one feature or fix per PR makes review much faster.
+
+ ### Reporting bugs
+
+ Open an issue on GitHub with:
+ - What you ran
+ - What you expected
+ - What actually happened
+ - Output of `yiyan-agent --debug "your task"` if relevant
+
+ ---
+
+ ## ⚠️ Disclaimer
+
+ This project automates a web browser to interact with yiyan.baidu.com. Automating web UIs may violate the terms of service of the website being automated. Use this tool for **personal and development purposes only**. The authors take no responsibility for account suspensions or other consequences of use.
+
+ ---
+
+ ## 📄 License
+
+ MIT — see [LICENSE](./LICENSE) for details.
 
- 1. On first run, the package copies your Chrome profile to a temporary directory
- 2. Launches Chrome in headless mode with the copied profile
- 3. Navigates to Doubao chat page
- 4. Sends your question and waits for response
- 5. Extracts the response and closes the browser
+ ---
 
- ## Supported Platforms
+ <div align="center">
 
- - Windows (Chrome paths: `C:/Program Files/Google/Chrome/Application/chrome.exe`)
- - macOS (Chrome paths: `/Applications/Google Chrome.app/Contents/MacOS/Google Chrome`)
- - Linux (Chrome paths: `/usr/bin/google-chrome`, `/usr/bin/chrome`)
+ **Built with Playwright · Powered by Yiyan (文心一言) · Free forever**
 
- ## License
+ If this project helped you, consider giving it a ⭐ on GitHub!
 
- MIT
+ </div>
package/package.json CHANGED
@@ -1,47 +1,45 @@
  {
    "name": "yiyan-browser-agent",
-   "version": "1.0.0",
-   "description": "NPM package for interacting with Yiyan (文心一言) web version via Playwright",
-   "type": "module",
+   "version": "1.0.2",
+   "description": "AI coding agent powered by Yiyan (文心一言) via browser automation — no API key needed",
+   "main": "src/index.js",
    "bin": {
-     "yiyan-agent": "./dist/cli.js"
+     "yiyan-agent": "src/index.js",
+     "ya": "src/index.js"
    },
-   "main": "./dist/index.js",
-   "exports": {
-     ".": {
-       "import": "./dist/index.js",
-       "types": "./dist/index.d.ts"
-     }
+   "scripts": {
+     "start": "node src/index.js",
+     "postinstall": "node src/postinstall.js",
+     "debug": "node src/index.js --debug",
+     "calibrate": "node src/calibrate.js"
    },
    "files": [
-     "dist",
+     "src/",
      "README.md",
      "LICENSE"
    ],
-   "scripts": {
-     "build": "tsup",
-     "test": "vitest run",
-     "test:watch": "vitest",
-     "typecheck": "tsc --noEmit",
-     "prepublishOnly": "npm run build && npm run test"
-   },
    "dependencies": {
-     "playwright-core": "^1.40.0"
+     "playwright": "^1.45.0"
    },
-   "devDependencies": {
-     "typescript": "^5.0.0",
-     "vitest": "^3.2.4",
-     "tsup": "^8.0.0",
-     "@types/node": "^20.0.0"
+   "engines": {
+     "node": ">=18.0.0"
    },
-   "keywords": ["yiyan", "文心一言", "baidu", "playwright", "browser", "agent", "ai", "chatbot"],
+   "keywords": [
+     "ai",
+     "agent",
+     "yiyan",
+     "wenxin",
+     "baidu",
+     "browser-automation",
+     "coding-agent",
+     "cli",
+     "llm"
+   ],
+   "author": "",
    "license": "MIT",
    "repository": {
      "type": "git",
-     "url": "https://github.com/picha/yiyan-browser-agent"
-   },
-   "bugs": {
-     "url": "https://github.com/picha/yiyan-browser-agent/issues"
+     "url": "https://github.com/YOUR_USERNAME/yiyan-browser-agent"
    },
-   "homepage": "https://github.com/picha/yiyan-browser-agent#readme"
+   "homepage": "https://github.com/YOUR_USERNAME/yiyan-browser-agent#readme"
  }