agentpage 0.0.15 → 0.0.16

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -148,6 +148,7 @@ AI 每一轮不是“凭记忆猜页面”,而是基于最新快照选择可
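  - The model may return in its text:
  - `REMAINING: <remaining content>`: part of the task still remains
  - `REMAINING: DONE`: no task remains
+ - Note: in a `tool_calls` round the model may return an empty `content`; this does not mean the task is finished.
 
  ### 3) Batch execution, but no chaining across DOM changes
 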
@@ -180,6 +181,7 @@ AI 每一轮不是“凭记忆猜页面”,而是基于最新快照选择可
 
  - `Current remaining instruction` (the task that still remains)
  - `Previous round planned task array` (tasks executed in the previous round)
+ - `Previous round model output (normalized)` (normalized summary of the previous round's model output)
  - `Latest DOM snapshot` (the current snapshot)
 
  Notes:
@@ -195,6 +197,9 @@ AI 每一轮不是“凭记忆猜页面”,而是基于最新快照选择可
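  - `REMAINING: <new remaining instruction>`
  - or `REMAINING: DONE`
 
+ Implementation detail:
+ - If a round returns `tool_calls` with an empty `content`, the loop still advances state from the tool execution results and does not treat the empty text as a completion signal.
+
  ### 3) Per-round execution and state progression
 
  The loop processes this round's response as follows: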
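@@ -205,7 +210,11 @@ The loop processes this round's response as follows:
  4. Refresh the snapshot and enter the next round
  5. Update the task text for the next round:
  - Prefer `REMAINING`
- - If `REMAINING` is missing, keep the current task without advancing (protocol fallback)
+ - If `REMAINING` is missing and actions were executed this round: advance heuristically by removing executed steps from the linear task (so the whole original task is not repeated)
+ - If `REMAINING` is missing and nothing progressed this round: keep the current task without advancing (protocol fallback)
+ 6. If the remaining task is unfinished and there are no tool calls:
+ - Do not end immediately
+ - Inject a strict `Protocol violation` hint into the next round, requiring either executable tool calls or exactly `REMAINING: DONE`
 
  ### 3.1) Element-not-found retry flow (Not-found Retry Dialogue)
 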
@@ -223,7 +232,7 @@ The loop processes this round's response as follows:
 
  ### 4) Stop conditions
 
- - No tool calls
+ - No tool calls and the remaining task is complete (or an explicit `REMAINING: DONE`)
  - Natural convergence after `REMAINING: DONE`
  - The repeated-batch anti-spin guard fires
  - `maxRounds` is reached
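
The revised first stop condition amounts to a small decision for any round that returns no tool calls. The sketch below is only an illustration under assumptions: `decideNoToolCallRound` and its `"repair"` / `"finish"` labels are hypothetical names, and it assumes that `REMAINING: DONE` has already been reduced to an empty remaining string. The condition itself mirrors the `remainingInstruction.trim().length > 0 && round < maxRounds - 1` check in the `dist/index.mjs` part of this diff.

```javascript
// Hypothetical helper (not part of the package): decide what a round with
// no tool calls should do.
//   "repair" = inject a Protocol violation hint and run another round
//   "finish" = accept the model text as the final reply and stop
function decideNoToolCallRound(remainingInstruction, round, maxRounds) {
  const hasRemaining = remainingInstruction.trim().length > 0;
  const hasRoundsLeft = round < maxRounds - 1;
  return hasRemaining && hasRoundsLeft ? "repair" : "finish";
}

console.log(decideNoToolCallRound("fill the form", 0, 5)); // repair
console.log(decideNoToolCallRound("", 0, 5));              // finish
console.log(decideNoToolCallRound("fill the form", 4, 5)); // finish
```

The last call shows that the repair round is skipped on the final allowed round, so the loop still terminates at `maxRounds` even if the model never follows the protocol.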
@@ -274,6 +283,7 @@ The loop processes this round's response as follows:
  - `Current remaining instruction`
  - `Done steps (do NOT repeat)`
  - `Previous round planned task array`
+ - `Previous round model output (normalized)`
  - `Latest DOM snapshot`
 
  This layer is the dynamic context that changes every round.
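@@ -285,7 +295,8 @@ The loop processes this round's response as follows:
  - The first round uses the `initialSnapshot` injected by the frontend
  - Refresh the snapshot after each round's execution
  - Advance `remainingInstruction`
- - When `REMAINING` is missing, do not advance the task (keep the current remaining)
+ - When `REMAINING` is missing and actions were executed this round: advance heuristically by removing executed steps from the linear task
+ - When `REMAINING` is missing and nothing progressed this round: keep the current remaining
  - Guard against idle rounds, repetition, and infinite loops
  - DOM-changing actions force a round break (wait for the next round's fresh snapshot)
 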
@@ -384,7 +395,7 @@ sequenceDiagram
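  The main flow lives in `src/core/agent-loop/index.ts`:
 
  1. Ensure the current snapshot is available
- 2. Build compact messages (original goal + done steps + latest snapshot)
+ 2. Build compact messages (remaining + execution history + previous round's model output + latest snapshot)
  3. Call the AI
  4. Execute tool calls and record the trace
  5. Run the guard mechanisms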
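
One of the guards in step 5 is the repeated-batch check named in the stop conditions. The following is a minimal sketch of how such a guard could work, not the package's actual code: `makeBatchKey` and `createRepeatGuard` are hypothetical names, and the only detail taken from the bundled source is that the batch key is a `JSON.stringify` of each tool call's `name` and `input`.

```javascript
// Hypothetical sketch of a repeated-batch guard.
function makeBatchKey(toolCalls) {
  // Only name and input take part in the comparison.
  return JSON.stringify(toolCalls.map((tc) => ({ name: tc.name, input: tc.input })));
}

function createRepeatGuard() {
  let lastKey = "";
  // Returns true when this batch is identical to the previous one
  // and the previous round finished without an error.
  return function shouldStop(toolCalls, lastRoundHadError) {
    const key = makeBatchKey(toolCalls);
    const repeated = key === lastKey && !lastRoundHadError;
    lastKey = key;
    return repeated;
  };
}

const shouldStop = createRepeatGuard();
const batch = [{ name: "dom", input: { action: "click", selector: "#a1b2c" } }];
console.log(shouldStop(batch, false)); // false: first time this batch is seen
console.log(shouldStop(batch, false)); // true: same batch twice in a row, no error
```

Because the key is a plain `JSON.stringify`, two inputs with the same fields in a different property order would count as different batches. Exempting rounds that follow an error keeps a legitimate retry of the same batch from being cut off.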
@@ -392,20 +403,24 @@ sequenceDiagram
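 
  ### Progressive execution state (new)
 
- `src/core/agent-loop/index.ts` maintains 3 key pieces of state internally:
+ `src/core/agent-loop/index.ts` maintains 5 key pieces of state internally:
  - `remainingInstruction`: the text still to be consumed in the current round (initially the user's original input)
  - `previousRoundTasks`: the array of tasks executed in the previous round
+ - `previousRoundPlannedTasks`: the batch the model planned in the previous round (before execution)
+ - `previousRoundModelOutput`: normalized summary of the previous round's model output (produced after execution, fed into the next round)
  - `lastPlannedBatchKey`: used to detect whether two consecutive rounds produced exactly the same task batch
 
  Stop rules:
- - If the model returns no tool calls → end immediately
+ - If the model returns no tool calls and remaining is unfinished → do not end; enter a protocol repair round
+ - If the model returns no tool calls and remaining is complete (or `REMAINING: DONE`) → end
  - If two consecutive rounds plan the same task batch and the previous round had no error → terminate automatically to prevent spinning
  - If the model text contains `REMAINING: DONE`, the next round usually settles into a "summary with no tool calls" and ends
 
  ### Compact message structure
 
  Built by `messages.ts`; core semantics:
- - Master goal: the user's original task (always kept)
+ - Round 0: the user's original task + first-round snapshot
+ - Round 1+: remaining task + done steps + previous round's planned batch + previous round's normalized model output + latest snapshot
  - Done steps: completed actions (to avoid repetition)
  - Execution context + latest snapshot: the currently executable scope
 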
@@ -458,6 +473,14 @@ sequenceDiagram
 
  Exposed to the model uniformly through `ToolRegistry`; execution results are returned in a standardized form.
 
+ ### Playwright alignment notes (current implementation)
+
+ - `dom.click`: uses a more complete click event chain (`pointerdown/mousedown/pointerup/mouseup/click`).
+ - `dom.select_option`: supports `value/label/index`; the result explicitly returns `value + label`.
+ - `dom.fill`: not allowed on incompatible input types such as `checkbox/radio/file/button/submit/reset`.
+ - `wait.wait_for_selector`: supports `state=attached|visible|hidden|detached` (default `attached`).
+ - Snapshot runtime-state enhancements: `select val`, `option selected`, `checked`, `disabled`, and `readonly` are now visible, reducing repeated operations.
+
  ---
 
  ## Extensions and customization
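
The four `state` values listed for `wait.wait_for_selector` in the Playwright alignment notes can be read as a predicate over two facts about the element. The sketch below assumes agentpage follows Playwright's documented meanings, where `hidden` is satisfied by an element that is either detached or not visible. `matchesWaitState` is a hypothetical name, and the `attached` / `visible` flags are assumed to be computed by the caller.

```javascript
// Hypothetical predicate (not the package's API): does the element's current
// status satisfy the requested wait state?
function matchesWaitState({ attached, visible }, state = "attached") {
  switch (state) {
    case "attached":
      return attached;
    case "detached":
      return !attached;
    case "visible":
      return attached && visible;
    case "hidden":
      // Assumed Playwright semantics: detached also counts as hidden.
      return !attached || !visible;
    default:
      throw new Error(`Unknown state: ${state}`);
  }
}

console.log(matchesWaitState({ attached: true, visible: false }));            // true (default "attached")
console.log(matchesWaitState({ attached: true, visible: false }, "visible")); // false
console.log(matchesWaitState({ attached: false, visible: false }, "hidden")); // true
```

With `attached` as the default, a wait succeeds as soon as the node exists in the DOM, even if it is not yet visible. A caller that needs to click the element would pass `state=visible` explicitly.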
package/dist/index.mjs CHANGED
@@ -162,7 +162,7 @@ function formatToolResultBrief(result) {
  * - `previousRoundTasks`: array of tasks already executed in the previous round, to avoid planning them again.
  * - The message asks the model to output `REMAINING: ...` or `REMAINING: DONE` for the next round to consume.
  */
- function buildCompactMessages(userMessage, trace, latestSnapshot, currentUrl, history, remainingInstruction, previousRoundTasks) {
+ function buildCompactMessages(userMessage, trace, latestSnapshot, currentUrl, history, remainingInstruction, previousRoundTasks, previousRoundModelOutput, previousRoundPlannedTasks, protocolViolationHint) {
  const messages = history ? [...history] : [];
  const allowAgentUiInteraction = isExplicitAgentUiRequest(userMessage);
  const activeInstruction = remainingInstruction && remainingInstruction.trim() ? remainingInstruction.trim() : userMessage;
@@ -176,6 +176,7 @@ function buildCompactMessages(userMessage, trace, latestSnapshot, currentUrl, hi
  ];
  if (currentUrl) parts.push("", `URL: ${currentUrl}`);
  if (latestSnapshot) parts.push("", "## Current page snapshot", "Apply task-reduction model directly from this snapshot. Do NOT restate the task.", "Use hash IDs (e.g. #a1b2c) from the snapshot as selector params.", "Do NOT call page_info (get_url/get_title/query_all/snapshot).", "Batch independent visible actions in one round.", "If action changes DOM (open modal/navigate), stop that batch and continue next round.", "For dropdown/select fields, use dom with action=select_option (or fill on a select).", allowAgentUiInteraction ? "User explicitly asked to operate AutoPilot UI. You may interact with chat input/send/dock only as requested." : "Do NOT interact with any AI chat UI elements (chat input, send button, dock). Only operate on the actual page content.", "Output one line: REMAINING: <new remaining task after this round> or REMAINING: DONE", wrapSnapshot(latestSnapshot));
+ if (protocolViolationHint) parts.push("", protocolViolationHint);
  messages.push({
  role: "user",
  content: parts.join("\n")
@@ -215,6 +216,8 @@ function buildCompactMessages(userMessage, trace, latestSnapshot, currentUrl, hi
  if (hasErrors) contextParts.push("", "The last step failed. Retry with a different approach, or skip and continue with other visible targets.");
  else contextParts.push("", "If the goal is fully done, reply with a short summary (no tool calls).");
  if (previousRoundTasks && previousRoundTasks.length > 0) contextParts.push("", "Previous round planned task array (already executed):", ...previousRoundTasks.map((task, index) => `${index + 1}. ${task}`));
+ if (previousRoundPlannedTasks && previousRoundPlannedTasks.length > 0) contextParts.push("", "Previous round model planned task array (before execution):", ...previousRoundPlannedTasks.map((task, index) => `${index + 1}. ${task}`));
+ if (previousRoundModelOutput) contextParts.push("", "Previous round model output (normalized, for task reduction input):", previousRoundModelOutput);
  contextParts.push("", "After this round, include one plain text line:", "REMAINING: <new remaining instruction after this-round actions>", "or REMAINING: DONE");
  const lastEntry = trace[trace.length - 1];
  if (hasToolError(lastEntry.result)) {
@@ -222,6 +225,7 @@ function buildCompactMessages(userMessage, trace, latestSnapshot, currentUrl, hi
  if (stripped && stripped.length < 300) contextParts.push("", "Last error: " + stripped);
  }
  if (currentUrl) contextParts.push("", `URL: ${currentUrl}`);
+ if (protocolViolationHint) contextParts.push("", protocolViolationHint);
  if (latestSnapshot) contextParts.push("", "## Latest DOM snapshot", "Use hash IDs from this snapshot. Do NOT call page_info — this is already the latest.", wrapSnapshot(latestSnapshot));
  messages.push({
  role: "user",
@@ -385,9 +389,12 @@ async function executeAgentLoop(params) {
  let outputTokens = 0;
  let remainingInstruction = message.trim();
  let previousRoundTasks = [];
+ let previousRoundPlannedTasks = [];
+ let previousRoundModelOutput = "";
  let lastPlannedBatchKey = "";
  let consecutiveSamePlannedBatch = 0;
  let lastRoundHadError = false;
+ let protocolViolationHint;
  let recoveryCount = 0;
  let redundantInterceptCount = 0;
  let pendingNotFoundRetry;
@@ -449,6 +456,20 @@ async function executeAgentLoop(params) {
  return `${tc.name}:${inputText}`;
  });
  /**
+ * 规范化模型文本输出(中)/ Normalize model text for next-round input (EN).
+ *
+ * 优先保留 REMAINING 行;否则截断首段文本,避免长篇规划污染下一轮输入。
+ * Prefer REMAINING line; otherwise keep a short excerpt to avoid long planning spillover.
+ */
+ const normalizeModelOutput = (text) => {
+ if (!text) return "";
+ const trimmed = text.trim();
+ if (!trimmed) return "";
+ const remainingMatch = trimmed.match(/REMAINING\s*:\s*([\s\S]*)$/i);
+ if (remainingMatch) return `REMAINING: ${remainingMatch[1].trim()}`;
+ return (trimmed.split(/\n\s*\n/)[0]?.trim() ?? trimmed).slice(0, 220);
+ };
+ /**
  * 判定动作是否会触发 DOM 结构变化(中)/ Whether action may cause DOM-shape change (EN).
  *
  * When triggered, force a round break and wait for the next round's fresh snapshot.
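
To make the behavior of the added `normalizeModelOutput` concrete, the block below copies the function from the added lines above so it runs standalone, then calls it on a few inputs. The sample strings are made up for illustration.

```javascript
// Copied from the added lines in dist/index.mjs so the examples run standalone.
const normalizeModelOutput = (text) => {
  if (!text) return "";
  const trimmed = text.trim();
  if (!trimmed) return "";
  // Keep everything from the first "REMAINING:" to the end of the text.
  const remainingMatch = trimmed.match(/REMAINING\s*:\s*([\s\S]*)$/i);
  if (remainingMatch) return `REMAINING: ${remainingMatch[1].trim()}`;
  // Otherwise keep the first paragraph, capped at 220 characters.
  return (trimmed.split(/\n\s*\n/)[0]?.trim() ?? trimmed).slice(0, 220);
};

console.log(normalizeModelOutput("Clicked the login button.\nREMAINING: fill in the username"));
// REMAINING: fill in the username
console.log(normalizeModelOutput("I will first open the menu.\n\nThen I will pick an item."));
// I will first open the menu.
console.log(normalizeModelOutput("x".repeat(500)).length); // 220
console.log(normalizeModelOutput(undefined) === "");       // true
```

Two properties follow from the regular expression: the match is case-insensitive, and the capture runs to the end of the text, so any lines after the `REMAINING:` marker are carried into the next round as well.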
@@ -490,8 +511,8 @@ async function executeAgentLoop(params) {
  /**
  * 推进下一轮描述(中)/ Derive next-round instruction from model text (EN).
  *
- * 优先 REMAINING 协议;若未提供,则把本轮 content 视为“更新后的任务描述”。
- * Priority: REMAINING protocol first; otherwise treat current content as updated instruction.
+ * 优先 REMAINING 协议;若未提供,则保持当前 remaining 不变。
+ * Priority: REMAINING protocol first; otherwise keep current remaining instruction unchanged.
  */
  const deriveNextInstruction = (text, currentInstruction) => {
  const parsed = parseRemainingInstruction(text);
@@ -504,12 +525,26 @@ async function executeAgentLoop(params) {
  hasRemainingProtocol: false
  };
  };
+ /**
+ * 启发式任务剔除(中)/ Heuristic remaining reduction for linear instructions (EN).
+ *
+ * 在 REMAINING 缺失但本轮有执行动作时,按“线性片段”剔除已执行步数,避免下一轮继续携带整段原任务。
+ * When REMAINING is missing but actions were executed, drop executed step count from a linearized instruction.
+ */
+ const reduceRemainingHeuristically = (currentInstruction, executedCount) => {
+ if (!currentInstruction.trim() || executedCount <= 0) return currentInstruction;
+ const parts = currentInstruction.replace(/\s+/g, " ").replace(/(->|=>|→)/g, " 然后 ").replace(/[,,。;;]/g, " 然后 ").split(/\s*(?:然后|再|并且|并|接着|随后|之后)\s*/g).map((part) => part.trim()).filter(Boolean);
+ if (parts.length <= 1) return currentInstruction;
+ const nextParts = parts.slice(Math.min(executedCount, parts.length));
+ if (nextParts.length === 0) return "";
+ return nextParts.join(" -> ");
+ };
  for (let round = 0; round < maxRounds; round++) {
  callbacks?.onRound?.(round);
  usedRounds = round + 1;
  if (!pageContext.latestSnapshot) await refreshSnapshot();
  const effectivePrompt = stripSnapshotFromPrompt(systemPrompt);
- const chatMessages = buildCompactMessages(message, fullToolTrace, pageContext.latestSnapshot, pageContext.currentUrl, history, remainingInstruction, previousRoundTasks);
+ const chatMessages = buildCompactMessages(message, fullToolTrace, pageContext.latestSnapshot, pageContext.currentUrl, history, remainingInstruction, previousRoundTasks, previousRoundModelOutput, previousRoundPlannedTasks, protocolViolationHint);
  if (pendingNotFoundRetry && pendingNotFoundRetry.tasks.length > 0) chatMessages.push({
  role: "user",
  content: [
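
The heuristic added in this hunk is easiest to follow on concrete inputs. The block below copies `reduceRemainingHeuristically` from the added lines, with the method chain broken across lines for readability and the behavior unchanged, and runs it on made-up instructions.

```javascript
// Copied from the added lines in dist/index.mjs; only the line breaks differ.
const reduceRemainingHeuristically = (currentInstruction, executedCount) => {
  if (!currentInstruction.trim() || executedCount <= 0) return currentInstruction;
  const parts = currentInstruction
    .replace(/\s+/g, " ")                  // collapse whitespace
    .replace(/(->|=>|→)/g, " 然后 ")        // arrows become the "then" connector
    .replace(/[,,。;;]/g, " 然后 ")        // so do commas, periods, semicolons
    .split(/\s*(?:然后|再|并且|并|接着|随后|之后)\s*/g)
    .map((part) => part.trim())
    .filter(Boolean);
  if (parts.length <= 1) return currentInstruction; // not a linear task: leave untouched
  const nextParts = parts.slice(Math.min(executedCount, parts.length));
  if (nextParts.length === 0) return "";
  return nextParts.join(" -> ");
};

console.log(reduceRemainingHeuristically("click login, fill username, submit", 1));
// fill username -> submit
console.log(reduceRemainingHeuristically("打开设置 然后 切换主题", 1));
// 切换主题
console.log(reduceRemainingHeuristically("a -> b -> c", 3) === "");  // true
console.log(reduceRemainingHeuristically("click login", 2));
// click login
```

The function drops one segment per executed tool call, so it implicitly assumes that each segment of the instruction maps to exactly one call. A step that needs two calls would remove two segments, which is presumably why the loop uses it only as a fallback when the `REMAINING` line is missing.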
@@ -528,8 +563,7 @@ async function executeAgentLoop(params) {
  });
  inputTokens += response.usage?.inputTokens ?? 0;
  outputTokens += response.usage?.outputTokens ?? 0;
- const nextInstructionState = deriveNextInstruction(response.text, remainingInstruction);
- remainingInstruction = nextInstructionState.nextInstruction;
+ const parsedInstructionState = deriveNextInstruction(response.text, remainingInstruction);
  if (!response.toolCalls || response.toolCalls.length === 0) {
  if (pendingNotFoundRetry) {
  const unresolvedHint = response.text?.toLowerCase() ?? "";
@@ -545,10 +579,29 @@ async function executeAgentLoop(params) {
  }
  pendingNotFoundRetry = void 0;
  }
+ if (parsedInstructionState.hasRemainingProtocol) remainingInstruction = parsedInstructionState.nextInstruction;
+ if (remainingInstruction.trim().length > 0 && round < maxRounds - 1) {
+ protocolViolationHint = [
+ "Protocol violation in previous round:",
+ "- Remaining task is not DONE, but no tool calls were returned.",
+ "This round MUST do one of:",
+ "1) Return actionable tool calls for visible targets; or",
+ "2) If truly complete, return a short summary and EXACTLY `REMAINING: DONE`.",
+ "Do NOT output planning/explaining text."
+ ].join("\n");
+ lastRoundHadError = true;
+ await refreshSnapshot();
+ continue;
+ }
  finalReply = response.text ?? "";
  if (finalReply) callbacks?.onText?.(finalReply);
  break;
  }
+ protocolViolationHint = void 0;
+ const plannedTasksCurrentRound = buildTaskArray(response.toolCalls.map((tc) => ({
+ name: tc.name,
+ input: tc.input
+ })));
  const plannedBatchKey = JSON.stringify(response.toolCalls.map((tc) => ({
  name: tc.name,
  input: tc.input
@@ -617,9 +670,16 @@ async function executeAgentLoop(params) {
  tasks: roundMissingTasks
  };
  else pendingNotFoundRetry = void 0;
- if (!nextInstructionState.hasRemainingProtocol) roundHasError = true;
+ if (parsedInstructionState.hasRemainingProtocol) remainingInstruction = parsedInstructionState.nextInstruction;
+ else {
+ const nextByHeuristic = reduceRemainingHeuristically(remainingInstruction, executedTaskCalls.length);
+ if (nextByHeuristic !== remainingInstruction) remainingInstruction = nextByHeuristic;
+ else roundHasError = true;
+ }
+ previousRoundModelOutput = parsedInstructionState.hasRemainingProtocol ? normalizeModelOutput(response.text) : `REMAINING: ${remainingInstruction || "DONE"}`;
  lastRoundHadError = roundHasError;
  previousRoundTasks = buildTaskArray(executedTaskCalls);
+ previousRoundPlannedTasks = plannedTasksCurrentRound;
  const idleResult = detectIdleLoop(executedTaskCalls.map((tc) => tc.name), consecutiveReadOnlyRounds);
  if (idleResult === -1) {
  finalReply = response.text || "任务已完成。";