yiyan-browser-agent 1.4.10 → 1.5.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -11,7 +11,7 @@

  It drives a real browser to talk to [Yiyan (文心一言)](https://yiyan.baidu.com), giving you a Claude Code / Cursor-style coding agent powered by Baidu's AI models at zero cost.

- [Installation](#-installation) · [Quick Start](#-quick-start) · [Usage](#-usage) · [Configuration](#-configuration) · [Tools](#-available-tools) · [Contributing](#-contributing)
+ [Installation](#-installation) · [Quick Start](#-quick-start) · [Usage](#-usage) · [HTTP API](#-http-api) · [Configuration](#-configuration) · [Tools](#-available-tools) · [Contributing](#-contributing)

  ---

@@ -47,30 +47,76 @@ Your Terminal

  ## 📦 Installation

- ```bash
+ ### Windows
+
+ ```powershell
+ # Install Node.js (if not already installed)
+ # Download it from https://nodejs.org, or use winget:
+ winget install OpenJS.NodeJS.LTS
+
+ # Install yiyan-browser-agent globally
  npm install -g yiyan-browser-agent
+
+ # Install the Chromium browser (runs automatically after the first install; about 150 MB)
+ npx playwright install chromium
  ```

- > Chromium downloads automatically after install (~150 MB, one time only).
+ ### Ubuntu / Linux
+
+ ```bash
+ # Install Node.js (if not already installed)
+ curl -fsSL https://deb.nodesource.com/setup_20.x | sudo -E bash -
+ sudo apt install -y nodejs
+
+ # Install yiyan-browser-agent globally
+ npm install -g yiyan-browser-agent
+
+ # Install Chromium and its dependencies
+ npx playwright install chromium
+ npx playwright install-deps chromium  # installs the required system libraries
+ ```

- **Requirements:** Node.js ≥ 18
+ > **Requirements:** Node.js ≥ 18

  ---

  ## 🚀 Quick Start

- **1. First run — log in to Yiyan:**
+ ### Windows
+
+ **1. On first run, log in to Yiyan (文心一言):**
+ ```powershell
+ yiyan-agent -i
+ ```
+ When the browser window opens, log in to your Baidu account, then return to the terminal and press Enter. The session is saved, so you only need to log in once.
+
+ **2. Give it a task:**
+ ```powershell
+ yiyan-agent "build an Express REST API with user authentication"
+ ```
+
+ **3. Send a task to an already-running server:**
+ ```powershell
+ # Terminal 1: start interactive mode (runs as the HTTP server)
+ yiyan-agent -i
+
+ # Terminal 2: send a task (forwarded to the server; no new browser is launched)
+ yiyan-agent "Shanghai weather, in 20 characters"
+ ```
+
+ ### Ubuntu / Linux
+
+ **1. On first run, log in to Yiyan (文心一言):**
  ```bash
- yiyan-agent --interactive
+ yiyan-agent -i
  ```
- A browser window opens. Log in to your Yiyan (文心一言) account, then come back to the terminal and press **Enter**. Your session is saved — you only do this once.

- **2. Give it a task:**
+ **2. Give it a task:**
  ```bash
  yiyan-agent "build a REST API in Express with user authentication"
  ```

- **3. Use the short alias `ya` from any project folder:**
+ **3. Use the short alias `ya` from any directory:**
  ```bash
  cd ~/my-project
  ya "add input validation to all my API routes"
@@ -84,11 +130,10 @@ ya "add input validation to all my API routes"
  yiyan-agent [OPTIONS] [TASK]

  -t, --task <task>     Task to run (or just type it as the last argument)
- -i, --interactive     Keep browser open, run multiple tasks in a session
+ -i, --interactive     Keep browser open, run multiple tasks (starts HTTP server)
  -d, --dir <path>      Set working directory (default: current directory)
  --debug               Print raw AI responses to the terminal
- --headless            Run browser invisibly (requires prior login)
- --save-log            Save full session log to ~/.yiyan-agent/logs/
+ --show-browser        Show browser window (non-interactive mode)
  --calibrate           Auto-detect DOM selectors (run if agent breaks)
  -h, --help            Show help

@@ -102,38 +147,116 @@ Aliases:
  # Single task — runs and exits
  yiyan-agent "create a Python script that scrapes Hacker News"

- # Interactive mode — keeps browser open between tasks
- yiyan-agent --interactive
+ # Interactive mode — keeps browser open, starts HTTP server on port 9527
+ yiyan-agent -i

  # Run on a specific project
  ya --dir ~/projects/my-app "refactor all callbacks to async/await"

- ### Process Communication (v1.4.8+)
+ # Debug mode (shows what Yiyan is actually outputting)
+ ya --debug "build a calculator"

- When interactive mode (`-i`) is running, other `yiyan-agent` processes automatically detect and forward tasks to it, avoiding repeated browser startup:
+ # In interactive mode, type 'quit' or 'q' to exit:
+ ❯ quit
+ ```
+
+ ---
+
+ ## 🌐 HTTP API (v1.5.0+)
+
+ When interactive mode (`-i`) is running, an HTTP server starts on port **9527**, allowing external services to send tasks.
+
+ ### Process Communication

  ```bash
- # Terminal 1: Start interactive mode (acts as server)
+ # Terminal 1: Start interactive mode (HTTP server)
  yiyan-agent -i
  # → Server listening on port 9527

  # Terminal 2: Send task (forwarded to server, no new browser)
  yiyan-agent "Beijing weather, in 15 characters"
- # → Found running server, forwarding task...
+ # → Found running server on port 9527, forwarding task...
  ```

- Lock file: `~/.yiyan-agent/server.lock`
+ ### HTTP POST API

- # Debug mode (shows what Yiyan is actually outputting)
- ya --debug "build a calculator"
+ **Endpoint:** `POST http://localhost:9527/task`
+
+ **Request Body:**
+ ```jsonc
+ {
+   "task": "your task description",
+   "newChat": true // optional: whether to start a new chat (default: true)
+ }
+ ```
+
+ **Response:**
+ ```json
+ {
+   "question": "Shanghai weather, in 20 characters",
+   "answer": "Shanghai is sunny today, 25°C...",
+   "duration": 5234,
+   "status": "success"
+ }
+ ```
+
+ ### Windows CMD (curl)
+
+ ```cmd
+ curl -X POST http://localhost:9527/task -H "Content-Type: application/json" -d "{\"task\":\"Shanghai weather, in 20 characters\"}"
+ ```

- # Headless mode (faster — browser runs in background)
- ya --headless "write unit tests for utils.js"
+ ### PowerShell

- # In interactive mode, type 'new' to start a fresh chat:
- Task: new
+ ```powershell
+ Invoke-RestMethod -Uri "http://localhost:9527/task" -Method POST -Body '{"task":"Shanghai weather"}' -ContentType "application/json"
+ ```
+
+ ### Ubuntu / Linux (curl)
+
+ ```bash
+ curl -X POST http://localhost:9527/task -H "Content-Type: application/json" -d '{"task":"Shanghai weather, in 20 characters"}'
  ```

+ ### From Other Programming Languages
+
+ **Python:**
+ ```python
+ import requests
+ import json
+
+ response = requests.post(
+     'http://localhost:9527/task',
+     json={'task': 'Shanghai weather, in 20 characters'}
+ )
+ result = response.json()
+ print(result)
+ ```
+
+ **Node.js:**
+ ```javascript
+ const http = require('http');
+
+ const body = JSON.stringify({ task: 'Shanghai weather, in 20 characters' });
+
+ const req = http.request({
+   hostname: 'localhost',
+   port: 9527,
+   path: '/task',
+   method: 'POST',
+   headers: { 'Content-Type': 'application/json' }
+ }, (res) => {
+   let data = '';
+   res.on('data', chunk => data += chunk);
+   res.on('end', () => console.log(JSON.parse(data)));
+ });
+
+ req.write(body);
+ req.end();
+ ```
+
+ **Lock file:** `~/.yiyan-agent/server.lock`
+
  ---

  ## ⚙️ Configuration
@@ -209,6 +332,7 @@ Everything lives in `~/.yiyan-agent/` in your home directory:
  ~/.yiyan-agent/
  ├── session/      ← Browser cookies (login once, runs forever)
  ├── logs/         ← Session logs (only saved with --save-log)
+ ├── server.lock   ← HTTP server process lock
  └── config.json   ← Your global settings
  ```

@@ -235,8 +359,15 @@ yiyan-agent --interactive
  ```

  ### Chromium didn't download automatically
+ **Windows:**
+ ```powershell
+ npx playwright install chromium
+ ```
+
+ **Ubuntu / Linux:**
  ```bash
  npx playwright install chromium
+ npx playwright install-deps chromium
  ```

  ### Response times out on long tasks
@@ -245,6 +376,20 @@ Increase the timeout in your config:
  { "RESPONSE_TIMEOUT": 300000, "STABLE_DELAY": 4000 }
  ```

+ ### HTTP server not detected by other processes
+ The lock file may be stale. Kill the old process and restart:
+ ```bash
+ # Inspect the lock file (it records the server's PID)
+ cat ~/.yiyan-agent/server.lock
+
+ # Kill the old process if it is still running
+ kill <PID>                 # Linux
+ taskkill /PID <PID> /F     # Windows
+
+ # Restart
+ yiyan-agent -i
+ ```
+
  ---

  ## 🗂️ Project Structure
@@ -255,6 +400,8 @@ yiyan-browser-agent/
  │   ├── index.js    ← CLI entry point and argument parsing
  │   ├── agent.js    ← Core agent loop (send → wait → parse → execute)
  │   ├── browser.js  ← Playwright controller for yiyan.baidu.com
+ │   ├── server.js   ← HTTP server for process communication (v1.5.0+)
+ │   ├── client.js   ← HTTP client to forward tasks (v1.5.0+)
  │   ├── tools.js    ← All 15 filesystem and shell tools
  │   ├── parser.js   ← Extracts tool calls from AI responses (6 strategies)
  │   ├── prompt.js   ← System prompt and conversation history manager
@@ -289,7 +436,6 @@ node src/index.js --interactive
  - 🎨 **UI selector resilience** — Yiyan updates their UI occasionally; better selector strategies are welcome
  - 🔌 **More tools** — image generation, browser control, database tools, etc.
  - 🌐 **Other AI frontends** — adapting the browser layer to work with other free AI chats
- - 📦 **Windows support** — currently tested on Linux; Windows path handling may need fixes
  - 📝 **Better error messages** — making failures easier to diagnose

  ### How to contribute
@@ -329,4 +475,4 @@ MIT — see [LICENSE](./LICENSE) for details.

  If this project helped you, consider giving it a ⭐ on GitHub!

- </div>
+ </div>
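The README's request body makes `newChat` optional, with a default of `true`. In this release, `client.js` builds the flag as `options.newChat !== false`. As a result, only an explicit boolean `false` keeps the current chat, and any other value, including the string `"false"`, starts a new one. The sketch below applies the same rule on the receiving side. `normalizeTaskRequest` is a hypothetical helper and is not part of the package. It also assumes that a missing or non-string `task` should produce the same `No task provided` error that `server.js` returns.

```javascript
'use strict';

// Hypothetical helper (not part of yiyan-browser-agent): normalizes a parsed
// POST /task body using the same newChat rule as client.js in v1.5.0.
function normalizeTaskRequest(body) {
  // Assumption: anything other than a non-empty string counts as "no task".
  const task = body && typeof body.task === 'string' ? body.task : '';
  if (task === '') {
    // Same error shape server.js returns when no task is provided.
    return { ok: false, error: { status: 'error', answer: 'No task provided' } };
  }
  // Only an explicit boolean false keeps the current chat;
  // an omitted flag (or any other value) starts a new chat.
  return { ok: true, request: { task, newChat: body.newChat !== false } };
}
```

For example, `{ "task": "hello" }` and `{ "task": "hello", "newChat": "false" }` both normalize to `newChat: true`, which may surprise callers who send the flag as a string.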
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "yiyan-browser-agent",
-   "version": "1.4.10",
+   "version": "1.5.0",
    "description": "AI coding agent powered by Yiyan (文心一言) via browser automation — no API key needed",
    "main": "src/index.js",
    "bin": {
package/src/client.js CHANGED
@@ -1,8 +1,8 @@
  #!/usr/bin/env node
- // src/client.js — TCP client for sending tasks to running server
+ // src/client.js — HTTP client for sending tasks to running server
  'use strict';

- const net = require('net');
+ const http = require('http');
  const fs = require('fs');
  const path = require('path');
  const os = require('os');
@@ -46,45 +46,50 @@ class AgentClient {
      return false;
    }

-   // Send a task to the server
+   // Send a task to the server (HTTP POST)
    async sendTask(task, options = {}) {
      return new Promise((resolve, reject) => {
-       const socket = net.connect(this.port, 'localhost');
        const timeout = options.timeout || 180000; // 3 minutes

-       let data = '';
-       let timer = setTimeout(() => {
-         socket.destroy();
-         reject(new Error('Connection timeout'));
-       }, timeout);
+       const requestBody = JSON.stringify({
+         task,
+         newChat: options.newChat !== false
+       });

-       socket.on('connect', () => {
-         clearTimeout(timer);
-         // Send the task
-         const request = JSON.stringify({
-           task,
-           newChat: options.newChat !== false
+       const req = http.request({
+         hostname: 'localhost',
+         port: this.port,
+         path: '/task',
+         method: 'POST',
+         headers: {
+           'Content-Type': 'application/json',
+           'Content-Length': Buffer.byteLength(requestBody)
+         },
+         timeout
+       }, (res) => {
+         let data = '';
+         res.on('data', chunk => { data += chunk.toString(); });
+         res.on('end', () => {
+           try {
+             const result = JSON.parse(data);
+             resolve(result);
+           } catch (err) {
+             reject(new Error(`Invalid response: ${data}`));
+           }
          });
-         socket.end(request);
        });

-       socket.on('data', (chunk) => {
-         data += chunk.toString();
+       req.on('error', (err) => {
+         reject(err);
        });

-       socket.on('end', () => {
-         try {
-           const result = JSON.parse(data);
-           resolve(result);
-         } catch (err) {
-           reject(new Error(`Invalid response: ${data}`));
-         }
+       req.on('timeout', () => {
+         req.destroy();
+         reject(new Error('Request timeout'));
        });

-       socket.on('error', (err) => {
-         clearTimeout(timer);
-         reject(err);
-       });
+       req.write(requestBody);
+       req.end();
      });
    }
  }
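`sendTask` is only useful once a client knows a server is running, and the README's troubleshooting entry warns that `~/.yiyan-agent/server.lock` can go stale. The package's own detection logic is not part of this diff, so what follows is a minimal sketch rather than the released code. It makes three assumptions:

- The lock file is JSON with a `pid` field, which is how `server.js` reads it (`lock.pid`).
- The existence probe is `process.kill(pid, 0)`, the Unix approach mentioned in a removed `server.js` comment. This release uses `tasklist` on Windows instead.
- `isProcessAlive`, `readLiveLock` and `LOCK_FILE` here are hypothetical names.

```javascript
'use strict';
const fs = require('fs');
const os = require('os');
const path = require('path');

// Assumed location, matching the README ("Lock file: ~/.yiyan-agent/server.lock").
const LOCK_FILE = path.join(os.homedir(), '.yiyan-agent', 'server.lock');

// Returns true if a process with this PID exists.
// Signal 0 is never delivered; process.kill only checks that the PID is valid.
function isProcessAlive(pid) {
  // Guard: pid 0 or negative PIDs would address process groups, not one process.
  if (!Number.isInteger(pid) || pid <= 0) return false;
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    // EPERM means the process exists but belongs to another user.
    return err.code === 'EPERM';
  }
}

// Hypothetical helper: returns the parsed lock if its owner is alive;
// otherwise removes the stale lock file and returns null.
function readLiveLock(lockFile = LOCK_FILE) {
  let lock;
  try {
    lock = JSON.parse(fs.readFileSync(lockFile, 'utf8'));
  } catch {
    return null; // missing or unreadable lock file
  }
  if (lock && isProcessAlive(lock.pid)) return lock;
  try { fs.unlinkSync(lockFile); } catch {}
  return null;
}
```

A client could call `readLiveLock()` before forwarding a task, and launch its own browser when the call returns `null`.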
package/src/server.js CHANGED
@@ -1,8 +1,8 @@
  #!/usr/bin/env node
- // src/server.js — TCP server for accepting tasks from other processes
+ // src/server.js — HTTP server for accepting tasks from other processes/services
  'use strict';

- const net = require('net');
+ const http = require('http');
  const fs = require('fs');
  const path = require('path');
  const os = require('os');
@@ -23,15 +23,13 @@ class AgentServer {
        throw new Error(`Server already running on port ${this.port}`);
      }

-     this.server = net.createServer((socket) => {
-       this._handleConnection(socket);
+     this.server = http.createServer((req, res) => {
+       this._handleRequest(req, res);
      });

      await new Promise((resolve, reject) => {
-       this.server.listen(this.port, (err) => {
-         if (err) reject(err);
-         else resolve();
-       });
+       this.server.on('error', reject);
+       this.server.listen(this.port, resolve);
      });

      // Write the lock file
@@ -56,7 +54,6 @@ class AgentServer {
        const lock = JSON.parse(fs.readFileSync(LOCK_FILE, 'utf8'));
        // Check whether the process is still alive (Windows-compatible)
        try {
-         // Windows: check with tasklist; Unix: process.kill(0)
          if (process.platform === 'win32') {
            const result = require('child_process').execSync(
              `tasklist /FI "PID eq ${lock.pid}" /NH`,
@@ -99,29 +96,31 @@ class AgentServer {
      } catch {}
    }

-   async _handleConnection(socket) {
-     let data = '';
+   async _handleRequest(req, res) {
+     const logger = require('./logger');

-     socket.on('data', (chunk) => {
-       data += chunk.toString();
-     });
+     // Accept only POST /task
+     if (req.method !== 'POST') {
+       res.writeHead(405, { 'Content-Type': 'application/json' });
+       res.end(JSON.stringify({ status: 'error', answer: 'Method not allowed, use POST' }));
+       return;
+     }

-     socket.on('end', async () => {
+     // Read the request body
+     let body = '';
+     req.on('data', chunk => { body += chunk.toString(); });
+     req.on('end', async () => {
        try {
-         const request = JSON.parse(data);
+         const request = JSON.parse(body);
          const result = await this._executeTask(request);
-         socket.end(JSON.stringify(result));
+         res.writeHead(200, { 'Content-Type': 'application/json' });
+         res.end(JSON.stringify(result));
        } catch (err) {
-         socket.end(JSON.stringify({
-           status: 'error',
-           answer: `Server error: ${err.message}`
-         }));
+         logger.error(`[HTTP Error] ${err.message}`);
+         res.writeHead(400, { 'Content-Type': 'application/json' });
+         res.end(JSON.stringify({ status: 'error', answer: `Bad request: ${err.message}` }));
        }
      });
-
-     socket.on('error', (err) => {
-       // Connection error; ignore
-     });
    }

    async _executeTask(request) {
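The comment in `_handleRequest` says the server accepts only `POST /task`, and the README documents that endpoint. However, the handler as released checks only `req.method`, so a POST to any path reaches the task runner. The sketch below shows a stricter check. It assumes that unknown paths should get a 404, and that a wrong method on `/task` should get a 405 with an `Allow` header. It reuses the `{ status, answer }` error shape the server already sends. `routeStatus` and `rejectUnroutable` are hypothetical names, not part of the package.

```javascript
'use strict';
const http = require('http');

// Hypothetical helper: pick an HTTP status for a method/URL pair.
// The URL is resolved against a dummy base so query strings are ignored.
function routeStatus(method, url) {
  const { pathname } = new URL(url, 'http://localhost');
  if (pathname !== '/task') return 404; // unknown route
  if (method !== 'POST') return 405;    // known route, wrong method
  return 200;
}

// Hypothetical guard for the top of _handleRequest: writes an error
// response and returns true when the request should not be processed.
function rejectUnroutable(req, res) {
  const status = routeStatus(req.method, req.url);
  if (status === 200) return false;
  const headers = { 'Content-Type': 'application/json' };
  if (status === 405) headers.Allow = 'POST';
  res.writeHead(status, headers);
  // Same { status, answer } shape the server uses for its other errors.
  res.end(JSON.stringify({ status: 'error', answer: http.STATUS_CODES[status] }));
  return true;
}
```

Inside `_handleRequest`, `if (rejectUnroutable(req, res)) return;` could take the place of the method-only check.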
@@ -131,6 +130,9 @@ class AgentServer {
        return { status: 'error', answer: 'No task provided' };
      }

+     const logger = require('./logger');
+     logger.info(`[Remote Task] ${task}`);
+
      try {
        // Optional: start a new chat
        if (newChat) {
@@ -138,8 +140,10 @@ class AgentServer {
        }

        const result = await this.agent.run(task);
+       logger.info(`[Remote Task Done] ${result.status} (${result.duration}ms)`);
        return result;
      } catch (err) {
+       logger.error(`[Remote Task Error] ${err.message}`);
        return {
          question: task,
          answer: `Error: ${err.message}`,