yiyan-browser-agent 1.4.11 → 1.5.1

package/README.md CHANGED
@@ -11,7 +11,7 @@
11
11
 
12
12
  It drives a real browser to talk to [Yiyan (文心一言)](https://yiyan.baidu.com), giving you a Claude Code / Cursor-style coding agent powered by Baidu's AI models at zero cost.
13
13
 
14
- [Installation](#-installation) · [Quick Start](#-quick-start) · [Usage](#-usage) · [Configuration](#-configuration) · [Tools](#-available-tools) · [Contributing](#-contributing)
14
+ [Installation](#-installation) · [Quick Start](#-quick-start) · [Usage](#-usage) · [HTTP API](#-http-api) · [Configuration](#-configuration) · [Tools](#-available-tools) · [Contributing](#-contributing)
15
15
 
16
16
  ---
17
17
 
@@ -47,30 +47,76 @@ Your Terminal
47
47
 
48
48
  ## 📦 Installation
49
49
 
50
- ```bash
50
+ ### Windows
51
+
52
+ ```powershell
53
+ # Install Node.js (if not already installed)
54
+ # Download the installer from https://nodejs.org, or use winget:
55
+ winget install OpenJS.NodeJS.LTS
56
+
57
+ # Install yiyan-browser-agent globally
51
58
  npm install -g yiyan-browser-agent
59
+
60
+ # Install the Chromium browser (runs automatically after the first install, about 150 MB)
61
+ npx playwright install chromium
52
62
  ```
53
63
 
54
- > Chromium downloads automatically after install (~150 MB, one time only).
64
+ ### Ubuntu / Linux
65
+
66
+ ```bash
67
+ # Install Node.js (if not already installed)
68
+ curl -fsSL https://deb.nodesource.com/setup_20.x | sudo -E bash -
69
+ sudo apt install -y nodejs
70
+
71
+ # Install yiyan-browser-agent globally
72
+ npm install -g yiyan-browser-agent
73
+
74
+ # Install Chromium and its dependencies
75
+ npx playwright install chromium
76
+ npx playwright install-deps chromium # Install system dependencies
77
+ ```
55
78
 
56
- **Requirements:** Node.js ≥ 18
79
+ > **Requirements:** Node.js ≥ 18
57
80
 
58
81
  ---
59
82
 
60
83
  ## 🚀 Quick Start
61
84
 
62
- **1. First run — log in to Yiyan:**
85
+ ### Windows
86
+
87
+ **1. First run (log in to Yiyan):**
88
+ ```powershell
89
+ yiyan-agent -i
90
+ ```
91
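+ When the browser window opens, log in to your Baidu account, then return to the terminal and press Enter. The session is saved, so you only need to log in once.
92
+
93
+ **2. Give it a task:**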
94
+ ```powershell
95
+ yiyan-agent "创建一个 Express REST API,带用户认证"
96
+ ```
97
+
98
+ **3. Send a task to an already running server:**
99
+ ```powershell
100
+ # Terminal 1: start interactive mode (acts as the HTTP server)
101
+ yiyan-agent -i
102
+
103
+ # Terminal 2: send a task (forwarded to the server, no new browser is launched)
104
+ yiyan-agent "上海天气,20个字"
105
+ ```
106
+
107
+ ### Ubuntu / Linux
108
+
109
+ **1. First run (log in to Yiyan):**
63
110
  ```bash
64
- yiyan-agent --interactive
111
+ yiyan-agent -i
65
112
  ```
66
- A browser window opens. Log in to your Yiyan (文心一言) account, then come back to the terminal and press **Enter**. Your session is saved — you only do this once.
67
113
 
68
- **2. Give it a task:**
114
+ **2. Give it a task:**
69
115
  ```bash
70
116
  yiyan-agent "build a REST API in Express with user authentication"
71
117
  ```
72
118
 
73
- **3. Use the short alias `ya` from any project folder:**
119
+ **3. Use the short alias `ya` from any directory:**
74
120
  ```bash
75
121
  cd ~/my-project
76
122
  ya "add input validation to all my API routes"
@@ -84,11 +130,10 @@ ya "add input validation to all my API routes"
84
130
  yiyan-agent [OPTIONS] [TASK]
85
131
 
86
132
  -t, --task <task> Task to run (or just type it as the last argument)
87
- -i, --interactive Keep browser open, run multiple tasks in a session
133
+ -i, --interactive Keep browser open, run multiple tasks (starts HTTP server)
88
134
  -d, --dir <path> Set working directory (default: current directory)
89
135
  --debug Print raw AI responses to the terminal
90
- --headless Run browser invisibly (requires prior login)
91
- --save-log Save full session log to ~/.yiyan-agent/logs/
136
+ --show-browser Show browser window (non-interactive mode)
92
137
  --calibrate Auto-detect DOM selectors (run if agent breaks)
93
138
  -h, --help Show help
94
139
 
@@ -102,38 +147,155 @@ Aliases:
102
147
  # Single task — runs and exits
103
148
  yiyan-agent "create a Python script that scrapes Hacker News"
104
149
 
105
- # Interactive mode — keeps browser open between tasks
106
- yiyan-agent --interactive
150
+ # Interactive mode — keeps browser open, starts HTTP server on port 9527
151
+ yiyan-agent -i
107
152
 
108
153
  # Run on a specific project
109
154
  ya --dir ~/projects/my-app "refactor all callbacks to async/await"
110
155
 
111
- ### Process Communication (v1.4.8+)
156
+ # Debug mode (shows what Yiyan is actually outputting)
157
+ ya --debug "build a calculator"
158
+
159
+ # In interactive mode, type 'quit' or 'q' to exit:
160
+ ❯ quit
161
+ ```
162
+
163
+ ---
112
164
 
113
- When interactive mode (`-i`) is running, other `yiyan-agent` processes automatically detect and forward tasks to it, avoiding repeated browser startup:
165
+ ## 🌐 HTTP API (v1.5.0+)
166
+
167
+ When interactive mode (`-i`) is running, an HTTP server starts on port **9527**, allowing external services to send tasks.
168
+
169
+ ### Process Communication
114
170
 
115
171
  ```bash
116
- # Terminal 1: Start interactive mode (acts as server)
172
+ # Terminal 1: Start interactive mode (HTTP server)
117
173
  yiyan-agent -i
118
174
  # → Server listening on port 9527
119
175
 
120
176
  # Terminal 2: Send task (forwarded to server, no new browser)
121
177
  yiyan-agent "北京天气,15个字"
122
- # → Found running server, forwarding task...
178
+ # → Found running server on port 9527, forwarding task...
123
179
  ```
124
180
 
125
- Lock file: `~/.yiyan-agent/server.lock`
181
+ ### HTTP POST API
126
182
 
127
- # Debug mode (shows what Yiyan is actually outputting)
128
- ya --debug "build a calculator"
183
+ **Endpoint:** `POST http://localhost:9527/task`
184
+
185
+ **Request Body:**
186
+ ```json
187
+ {
188
+ "task": "your task description",
189
+ "newChat": true // optional: whether to start a new chat (default: true)
190
+ }
191
+ ```
192
+
193
+ **The `newChat` parameter:**
129
194
 
130
- # Headless mode (faster browser runs in background)
131
- ya --headless "write unit tests for utils.js"
195
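+ | Value | Behavior | When to use |
196
+ |---|---|---|
197
+ | `true` (default) | Clicks the "New chat" button and starts a fresh conversation; the AI has no memory of earlier messages | New tasks, standalone questions |
198
+ | `false` | Continues the current conversation; the AI remembers earlier messages | Multi-turn conversations, context-dependent tasks |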
199
+
200
+ **Response:**
201
+ ```json
202
+ {
203
+ "question": "上海天气,20个字",
204
+ "answer": "上海今日晴,气温25°C...",
205
+ "duration": 5234,
206
+ "status": "success"
207
+ }
208
+ ```
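A minimal sketch of calling this endpoint from Node.js 18+ with the built-in `fetch`, assuming the server is running in interactive mode on the default port. `buildTaskRequest`, `postTask`, and the `fetchImpl` parameter are hypothetical names introduced for illustration; treating every non-`success` status as a failure is also an assumption. The request and response fields are the ones documented above.

```javascript
'use strict';

// Hypothetical helper: builds the URL and fetch options for POST /task.
function buildTaskRequest(task, newChat = true, baseUrl = 'http://localhost:9527') {
  return {
    url: `${baseUrl}/task`,
    init: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ task, newChat }),
    },
  };
}

// Hypothetical helper: sends the task and returns the parsed reply.
// `fetchImpl` is injectable so the function can be tried without a server.
async function postTask(task, { newChat = true, fetchImpl = fetch } = {}) {
  const { url, init } = buildTaskRequest(task, newChat);
  const res = await fetchImpl(url, init);
  const result = await res.json(); // { question, answer, duration, status }
  // Assumption: anything other than status "success" is treated as a failure.
  if (result.status !== 'success') {
    throw new Error(result.answer || 'Task failed');
  }
  return result;
}
```

A call such as `postTask('上海天气,20个字').then(r => console.log(r.answer))` would print the answer text; passing `{ newChat: false }` continues the current conversation instead of starting a new one.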
209
+
210
+ ### `newChat` usage example
211
+
212
+ To verify the effect of `newChat` (continuing a conversation vs. starting a new one):
213
+
214
+ **Windows CMD:**
215
+ ```cmd
216
+ REM First request: tell the AI your name
217
+ curl -X POST http://localhost:9527/task -H "Content-Type: application/json" -d "{\"task\":\"我叫小明\"}"
218
+
219
+ REM Second request: newChat=false, so the AI should remember that your name is 小明
220
+ curl -X POST http://localhost:9527/task -H "Content-Type: application/json" -d "{\"task\":\"我叫什么名字\",\"newChat\":false}"
221
+
222
+ REM Third request: newChat=true, so the AI should not remember your name (new chat)
223
+ curl -X POST http://localhost:9527/task -H "Content-Type: application/json" -d "{\"task\":\"我叫什么名字\",\"newChat\":true}"
224
+ ```
225
+
226
+ **Ubuntu / Linux:**
227
+ ```bash
228
+ # First request: tell the AI your name
229
+ curl -X POST http://localhost:9527/task -H "Content-Type: application/json" -d '{"task":"我叫小明"}'
230
+
231
+ # Second request: newChat=false, so the AI should remember that your name is 小明
232
+ curl -X POST http://localhost:9527/task -H "Content-Type: application/json" -d '{"task":"我叫什么名字","newChat":false}'
233
+
234
+ # Third request: newChat=true, so the AI should not remember your name (new chat)
235
+ curl -X POST http://localhost:9527/task -H "Content-Type: application/json" -d '{"task":"我叫什么名字","newChat":true}'
236
+ ```
237
+
238
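+ **Expected results:**
239
+ - Second request (`newChat=false`): the AI answers "小明" (the name given in the first request)
240
+ - Third request (`newChat=true`): the AI answers something like "I don't know" or "You haven't told me"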
241
+
242
+ ### Windows CMD (curl)
243
+
244
+ ```cmd
245
+ curl -X POST http://localhost:9527/task -H "Content-Type: application/json" -d "{\"task\":\"上海天气,20个字\"}"
246
+ ```
247
+
248
+ ### PowerShell
132
249
 
133
- # In interactive mode, type 'new' to start a fresh chat:
134
- Task: new
250
+ ```powershell
251
+ Invoke-RestMethod -Uri "http://localhost:9527/task" -Method POST -Body '{"task":"上海天气"}' -ContentType "application/json"
135
252
  ```
136
253
 
254
+ ### Ubuntu / Linux (curl)
255
+
256
+ ```bash
257
+ curl -X POST http://localhost:9527/task -H "Content-Type: application/json" -d '{"task":"上海天气,20个字"}'
258
+ ```
259
+
260
+ ### From Other Programming Languages
261
+
262
+ **Python:**
263
+ ```python
264
+ import requests
265
+ import json
266
+
267
+ response = requests.post(
268
+ 'http://localhost:9527/task',
269
+ json={'task': '上海天气,20个字'}
270
+ )
271
+ result = response.json()
272
+ print(result)
273
+ ```
274
+
275
+ **Node.js:**
276
+ ```javascript
277
+ const http = require('http');
278
+
279
+ const body = JSON.stringify({ task: '上海天气,20个字' });
280
+
281
+ const req = http.request({
282
+ hostname: 'localhost',
283
+ port: 9527,
284
+ path: '/task',
285
+ method: 'POST',
286
+ headers: { 'Content-Type': 'application/json' }
287
+ }, (res) => {
288
+ let data = '';
289
+ res.on('data', chunk => data += chunk);
290
+ res.on('end', () => console.log(JSON.parse(data)));
291
+ });
292
+
293
+ req.write(body);
294
+ req.end();
295
+ ```
296
+
297
+ **Lock file:** `~/.yiyan-agent/server.lock`
298
+
137
299
  ---
138
300
 
139
301
  ## ⚙️ Configuration
@@ -209,6 +371,7 @@ Everything lives in `~/.yiyan-agent/` in your home directory:
209
371
  ~/.yiyan-agent/
210
372
  ├── session/ ← Browser cookies (login once, runs forever)
211
373
  ├── logs/ ← Session logs (only saved with --save-log)
374
+ ├── server.lock ← HTTP server process lock
212
375
  └── config.json ← Your global settings
213
376
  ```
214
377
 
@@ -235,8 +398,15 @@ yiyan-agent --interactive
235
398
  ```
236
399
 
237
400
  ### Chromium didn't download automatically
401
+ **Windows:**
402
+ ```powershell
403
+ npx playwright install chromium
404
+ ```
405
+
406
+ **Ubuntu / Linux:**
238
407
  ```bash
239
408
  npx playwright install chromium
409
+ npx playwright install-deps chromium
240
410
  ```
241
411
 
242
412
  ### Response times out on long tasks
@@ -245,6 +415,20 @@ Increase the timeout in your config:
245
415
  { "RESPONSE_TIMEOUT": 300000, "STABLE_DELAY": 4000 }
246
416
  ```
247
417
 
418
+ ### HTTP server not detected by other processes
419
+ The lock file may be stale. Kill the old process and restart:
420
+ ```bash
421
+ # Check process
422
+ cat ~/.yiyan-agent/server.lock
423
+
424
+ # Kill if needed
425
+ kill <PID> # Linux
426
+ taskkill /PID <PID> /F # Windows
427
+
428
+ # Restart
429
+ yiyan-agent -i
430
+ ```
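The stale-lock check above can also be scripted. The sketch below assumes the lock file is JSON with a numeric `pid` field (the server code reads `lock.pid`; no other field is assumed). `readLockPid`, `isProcessAlive`, and `isLockStale` are hypothetical helpers, and probing with signal `0` is a swapped-in technique that differs from the `tasklist` check the package uses on Windows.

```javascript
'use strict';
const fs = require('fs');
const os = require('os');
const path = require('path');

// Lock file location as documented in the README.
const LOCK_FILE = path.join(os.homedir(), '.yiyan-agent', 'server.lock');

// Returns the PID recorded in the lock file, or null if the file is
// missing, unreadable, or has no integer `pid` field.
function readLockPid(lockPath = LOCK_FILE) {
  try {
    const lock = JSON.parse(fs.readFileSync(lockPath, 'utf8'));
    return Number.isInteger(lock.pid) ? lock.pid : null;
  } catch {
    return null;
  }
}

// Signal 0 delivers nothing; it only tests whether the process exists.
function isProcessAlive(pid) {
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    // EPERM means the process exists but belongs to another user.
    return err.code === 'EPERM';
  }
}

// True only when a lock file exists and its recorded process is gone.
function isLockStale(lockPath = LOCK_FILE) {
  const pid = readLockPid(lockPath);
  return pid === null ? false : !isProcessAlive(pid);
}
```

If `isLockStale()` returns `true`, the recorded process is no longer running, so there is nothing to kill and restarting `yiyan-agent -i` is the remaining step.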
431
+
248
432
  ---
249
433
 
250
434
  ## 🗂️ Project Structure
@@ -255,6 +439,8 @@ yiyan-browser-agent/
255
439
  │ ├── index.js ← CLI entry point and argument parsing
256
440
  │ ├── agent.js ← Core agent loop (send → wait → parse → execute)
257
441
  │ ├── browser.js ← Playwright controller for yiyan.baidu.com
442
+ │ ├── server.js ← HTTP server for process communication (v1.5.0+)
443
+ │ ├── client.js ← HTTP client to forward tasks (v1.5.0+)
258
444
  │ ├── tools.js ← All 15 filesystem and shell tools
259
445
  │ ├── parser.js ← Extracts tool calls from AI responses (6 strategies)
260
446
  │ ├── prompt.js ← System prompt and conversation history manager
@@ -289,7 +475,6 @@ node src/index.js --interactive
289
475
  - 🎨 **UI selector resilience** — Yiyan updates their UI occasionally; better selector strategies are welcome
290
476
  - 🔌 **More tools** — image generation, browser control, database tools, etc.
291
477
  - 🌐 **Other AI frontends** — adapting the browser layer to work with other free AI chats
292
- - 📦 **Windows support** — currently tested on Linux; Windows path handling may need fixes
293
478
  - 📝 **Better error messages** — making failures easier to diagnose
294
479
 
295
480
  ### How to contribute
@@ -329,4 +514,4 @@ MIT — see [LICENSE](./LICENSE) for details.
329
514
 
330
515
  If this project helped you, consider giving it a ⭐ on GitHub!
331
516
 
332
- </div>
517
+ </div>
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "yiyan-browser-agent",
3
- "version": "1.4.11",
3
+ "version": "1.5.1",
4
4
  "description": "AI coding agent powered by Yiyan (文心一言) via browser automation — no API key needed",
5
5
  "main": "src/index.js",
6
6
  "bin": {
package/src/client.js CHANGED
@@ -1,8 +1,8 @@
1
1
  #!/usr/bin/env node
2
- // src/client.js — TCP client for sending tasks to running server
2
+ // src/client.js — HTTP client for sending tasks to running server
3
3
  'use strict';
4
4
 
5
- const net = require('net');
5
+ const http = require('http');
6
6
  const fs = require('fs');
7
7
  const path = require('path');
8
8
  const os = require('os');
@@ -46,45 +46,50 @@ class AgentClient {
46
46
  return false;
47
47
  }
48
48
 
49
- // Send a task to the server
49
+ // Send a task to the server (HTTP POST)
50
50
  async sendTask(task, options = {}) {
51
51
  return new Promise((resolve, reject) => {
52
- const socket = net.connect(this.port, 'localhost');
53
52
  const timeout = options.timeout || 180000; // 3 minutes
54
53
 
55
- let data = '';
56
- let timer = setTimeout(() => {
57
- socket.destroy();
58
- reject(new Error('Connection timeout'));
59
- }, timeout);
54
+ const requestBody = JSON.stringify({
55
+ task,
56
+ newChat: options.newChat !== false
57
+ });
60
58
 
61
- socket.on('connect', () => {
62
- clearTimeout(timer);
63
- // Send the task
64
- const request = JSON.stringify({
65
- task,
66
- newChat: options.newChat !== false
59
+ const req = http.request({
60
+ hostname: 'localhost',
61
+ port: this.port,
62
+ path: '/task',
63
+ method: 'POST',
64
+ headers: {
65
+ 'Content-Type': 'application/json',
66
+ 'Content-Length': Buffer.byteLength(requestBody)
67
+ },
68
+ timeout
69
+ }, (res) => {
70
+ let data = '';
71
+ res.on('data', chunk => { data += chunk.toString(); });
72
+ res.on('end', () => {
73
+ try {
74
+ const result = JSON.parse(data);
75
+ resolve(result);
76
+ } catch (err) {
77
+ reject(new Error(`Invalid response: ${data}`));
78
+ }
67
79
  });
68
- socket.end(request);
69
80
  });
70
81
 
71
- socket.on('data', (chunk) => {
72
- data += chunk.toString();
82
+ req.on('error', (err) => {
83
+ reject(err);
73
84
  });
74
85
 
75
- socket.on('end', () => {
76
- try {
77
- const result = JSON.parse(data);
78
- resolve(result);
79
- } catch (err) {
80
- reject(new Error(`Invalid response: ${data}`));
81
- }
86
+ req.on('timeout', () => {
87
+ req.destroy();
88
+ reject(new Error('Request timeout'));
82
89
  });
83
90
 
84
- socket.on('error', (err) => {
85
- clearTimeout(timer);
86
- reject(err);
87
- });
91
+ req.write(requestBody);
92
+ req.end();
88
93
  });
89
94
  }
90
95
  }
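The new `sendTask` sets `Content-Length` with `Buffer.byteLength` rather than the string length, which matters because task text is often Chinese and therefore multi-byte in UTF-8. The sketch below isolates that detail together with the `newChat !== false` default; `buildRequest` is a hypothetical helper written for illustration, not part of the package.

```javascript
'use strict';

// Hypothetical helper mirroring how the diff assembles the request.
function buildRequest(task, options = {}, port = 9527) {
  // Anything other than an explicit `false` (including undefined) means "new chat".
  const requestBody = JSON.stringify({ task, newChat: options.newChat !== false });
  return {
    requestBody,
    requestOptions: {
      hostname: 'localhost',
      port,
      path: '/task',
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        // Byte length, not character count: each of these CJK characters is 3 bytes in UTF-8.
        'Content-Length': Buffer.byteLength(requestBody),
      },
    },
  };
}

const sample = buildRequest('上海天气');
// The body has 30 characters but occupies 38 bytes.
console.log(sample.requestBody.length, sample.requestOptions.headers['Content-Length']); // prints: 30 38
```

Using the character count here would make the header 8 bytes too small for this body, so the receiver would see truncated JSON.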
package/src/server.js CHANGED
@@ -1,8 +1,8 @@
1
1
  #!/usr/bin/env node
2
- // src/server.js — TCP server for accepting tasks from other processes
2
+ // src/server.js — HTTP server for accepting tasks from other processes/services
3
3
  'use strict';
4
4
 
5
- const net = require('net');
5
+ const http = require('http');
6
6
  const fs = require('fs');
7
7
  const path = require('path');
8
8
  const os = require('os');
@@ -23,8 +23,8 @@ class AgentServer {
23
23
  throw new Error(`Server already running on port ${this.port}`);
24
24
  }
25
25
 
26
- this.server = net.createServer((socket) => {
27
- this._handleConnection(socket);
26
+ this.server = http.createServer((req, res) => {
27
+ this._handleRequest(req, res);
28
28
  });
29
29
 
30
30
  await new Promise((resolve, reject) => {
@@ -54,7 +54,6 @@ class AgentServer {
54
54
  const lock = JSON.parse(fs.readFileSync(LOCK_FILE, 'utf8'));
55
55
  // Check whether the process is still alive (Windows-compatible)
56
56
  try {
57
- // Windows: check via tasklist; Unix: process.kill(0)
58
57
  if (process.platform === 'win32') {
59
58
  const result = require('child_process').execSync(
60
59
  `tasklist /FI "PID eq ${lock.pid}" /NH`,
@@ -97,29 +96,31 @@ class AgentServer {
97
96
  } catch {}
98
97
  }
99
98
 
100
- async _handleConnection(socket) {
101
- let data = '';
99
+ async _handleRequest(req, res) {
100
+ const logger = require('./logger');
102
101
 
103
- socket.on('data', (chunk) => {
104
- data += chunk.toString();
105
- });
102
+ // Only accept POST /task
103
+ if (req.method !== 'POST') {
104
+ res.writeHead(405, { 'Content-Type': 'application/json' });
105
+ res.end(JSON.stringify({ status: 'error', answer: 'Method not allowed, use POST' }));
106
+ return;
107
+ }
106
108
 
107
- socket.on('end', async () => {
109
+ // Read the request body
110
+ let body = '';
111
+ req.on('data', chunk => { body += chunk.toString(); });
112
+ req.on('end', async () => {
108
113
  try {
109
- const request = JSON.parse(data);
114
+ const request = JSON.parse(body);
110
115
  const result = await this._executeTask(request);
111
- socket.end(JSON.stringify(result));
116
+ res.writeHead(200, { 'Content-Type': 'application/json' });
117
+ res.end(JSON.stringify(result));
112
118
  } catch (err) {
113
- socket.end(JSON.stringify({
114
- status: 'error',
115
- answer: `Server error: ${err.message}`
116
- }));
119
+ logger.error(`[HTTP Error] ${err.message}`);
120
+ res.writeHead(400, { 'Content-Type': 'application/json' });
121
+ res.end(JSON.stringify({ status: 'error', answer: `Bad request: ${err.message}` }));
117
122
  }
118
123
  });
119
-
120
- socket.on('error', (err) => {
121
- // Connection error, ignored
122
- });
123
124
  }
124
125
 
125
126
  async _executeTask(request) {
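The comment in the new handler says only `POST /task` is accepted, while the check visible in this hunk inspects `req.method` alone. A minimal sketch of a handler that also checks the path is shown below. `classifyRequest`, `handle`, and the 404 response for unknown paths are illustrative assumptions, not the package's code.

```javascript
'use strict';

// Hypothetical routing helper: returns null when the request should be
// processed, or { statusCode, payload } describing why it is rejected.
function classifyRequest(method, url) {
  // Parse against a dummy base so any query string is ignored.
  const pathname = new URL(url, 'http://localhost').pathname;
  if (pathname !== '/task') {
    return { statusCode: 404, payload: { status: 'error', answer: 'Not found, use POST /task' } };
  }
  if (method !== 'POST') {
    return { statusCode: 405, payload: { status: 'error', answer: 'Method not allowed, use POST' } };
  }
  return null;
}

// Example wiring for an http.createServer callback; `onTask` stands in
// for the body-reading and task-execution logic.
function handle(req, res, onTask) {
  const rejection = classifyRequest(req.method, req.url);
  if (rejection) {
    res.writeHead(rejection.statusCode, { 'Content-Type': 'application/json' });
    res.end(JSON.stringify(rejection.payload));
    return;
  }
  onTask(req, res);
}
```

Keeping the routing decision in a pure function makes it easy to test without opening a socket.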
@@ -129,6 +130,9 @@ class AgentServer {
129
130
  return { status: 'error', answer: 'No task provided' };
130
131
  }
131
132
 
133
+ const logger = require('./logger');
134
+ logger.info(`[Remote Task] ${task}`);
135
+
132
136
  try {
133
137
  // Optional: start a new chat
134
138
  if (newChat) {
@@ -136,8 +140,10 @@ class AgentServer {
136
140
  }
137
141
 
138
142
  const result = await this.agent.run(task);
143
+ logger.info(`[Remote Task Done] ${result.status} (${result.duration}ms)`);
139
144
  return result;
140
145
  } catch (err) {
146
+ logger.error(`[Remote Task Error] ${err.message}`);
141
147
  return {
142
148
  question: task,
143
149
  answer: `Error: ${err.message}`,