explorbot 0.1.5 → 0.1.7

@@ -1,5 +1,12 @@
1
1
  Detect new valid paths that previous tests missed. Prioritize mining experience and research together before inventing abstract scenarios.
2
2
 
3
+ Rank every scenario you build by the **strength of its outcome**, from strongest to weakest:
4
+ 1. **Data change** — the backend, storage, or persisted state registers a difference (a record is created, edited, or deleted; a setting is persisted; a message is sent; a job is triggered; an item is shared or exported).
5
+ 2. **State change** — the application moves to a different addressable or remembered state (route or URL change, a filter or sort actually applied to real data, a mode or auth change that the application remembers, the page showing a different underlying dataset).
6
+ 3. **UI change only** — a control opens, closes, is cancelled, is dismissed, is hovered, is toggled for display only, or the view expands/collapses without the application registering anything new.
7
+
8
+ Prefer scenarios whose ending falls into category 1. Propose a category 2 scenario when no category 1 outcome is reachable for the control under test. Propose a category 3 scenario last, and only when the UI-only behaviour itself has a verifiable side effect worth checking (a warning prompt, a persisted draft, a state rollback, a badge appearing). A page may expose several paths that reach a data or state change — different buttons, different menus, different keyboard shortcuts, different confirmation flows. Pick whichever path reaches category 1 or 2; do not assume a single "primary action" exists.
9
+
3
10
  When <previously_tested_flows> is present, treat it as the ground truth for what already worked:
4
11
  - List items under Successful Flow describe the path that was executed
5
12
  - Lines in blockquotes (lines starting with >) are discoveries: extra fields, side panels, conditional UI, inputs called out during that run
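The outcome ranking added in this hunk (data change, then state change, then UI change only) can be read as a sort key. Below is a minimal sketch, assuming each scenario carries a hypothetical `outcome` label of `data`, `state`, or `ui`; the prompt itself only describes the three categories in prose and does not define these names.

```javascript
// Hypothetical labels for the three outcome categories described in the prompt.
const OUTCOME_RANK = { data: 1, state: 2, ui: 3 };

// Return a new array sorted strongest outcome first; unknown labels sort last.
function rankScenarios(scenarios) {
  return [...scenarios].sort(
    (a, b) => (OUTCOME_RANK[a.outcome] ?? 99) - (OUTCOME_RANK[b.outcome] ?? 99)
  );
}

const ranked = rankScenarios([
  { title: 'Open and close the filter menu', outcome: 'ui' },
  { title: 'Create a record with all fields', outcome: 'data' },
  { title: 'Apply a sort to the list', outcome: 'state' },
]);
```

The input array is copied before sorting so the caller's ordering is left untouched.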
@@ -11,7 +18,7 @@ When <previously_tested_flows> is NOT present, use <tested_scenarios> as the gro
11
18
  Read the step lines for each test to understand which controls were actually interacted with.
12
19
  Identify elements from <page_research> that appear in NO test steps — these are coverage gaps.
13
20
 
14
- Cross-read with <page_research>: for each form and Extended Research subsection, compare against those flows. Which text inputs, selects, checkboxes, toggles, and side controls were skipped or touched once with a single value? Prefer filling those gaps over repeating the same path.
21
+ Cross-read with <page_research>: for each section and Extended Research subsection, compare against those flows. Which text inputs, selects, checkboxes, toggles, and side controls were skipped or touched once with a single value? Prefer filling those gaps over repeating the same path.
15
22
 
16
23
  The Type column in <page_research> tables shows the ARIA role of each element.
17
24
  Cross-reference these types with the steps listed in <tested_scenarios> or <previously_tested_flows>:
@@ -24,16 +31,22 @@ Coverage gaps to look for:
24
31
  - Action buttons that were never clicked as part of a complete workflow
25
32
  - Dependent UI: controls that appear or change based on another control's value
26
33
 
27
- When proposing tests for forms, prefer filling ALL visible fields — not just required ones.
34
+ A coverage gap for an untested control is only **closed** when the scenario built around it reaches a data change or state change. A scenario that exercises the untested control but ends in a UI-only outcome does not close the gap: the application never registered the variation, so nothing distinguishes that scenario from not running it at all.
35
+
36
+ Exercising an untested control and testing a UI-only dismissal (cancel, close, navigate away, discard) are **two different categories of scenario**. Do not merge them by appending a dismissal ending to a variation scenario — the variation loses its value because the system never receives it. A dismissal or UI-only ending deserves its own dedicated scenario only when that dismissal itself has a verifiable side effect.
37
+
38
+ When multiple inputs or configurable controls contribute to the same outcome, prefer scenarios that configure **several of them together** before triggering the data or state change, rather than touching one control in isolation and ending there.
28
39
  Vary input strategies: try short values, multi-word values, edge-of-valid values.
29
- When a form has sections, tabs, or conditional panels, propose tests that exercise each section.
30
- If a control has downstream effects (e.g., selecting a type reveals extra fields), build a test around that interaction chain.
40
+ When sections, tabs, or conditional panels exist, exercise each section.
41
+ When a control has downstream effects (selecting one option reveals extra fields, toggling one setting enables another), build the scenario around that interaction chain — and still end it in a data or state change.
31
42
 
32
43
  Combinatorial coverage (valid data only):
33
44
  - For each select or equivalent, ensure each option is exercised in at least one scenario, or one scenario whose steps walk through distinct options in sequence if that fits the task constraints better
34
45
  - Exercise each checkbox or binary control in both states when behavior can differ
35
46
  - Combine checkboxes and related toggles in small sets (pairs or triples) when they plausibly change validation, visible sections, or outcomes — avoid exploding into huge Cartesian products
36
47
 
37
- When heavy forms are not the focus, still pursue: unvisited state transitions, follow-ups after creates (share, export, duplicate), alternative routes to the same goal, preconditions that unlock UI, and visible controls never clicked.
48
+ Each proposed combination must be exercised in a scenario that reaches a data change or state change. Combinations that only change the UI and never reach a registerable outcome do not count as coverage — the system never distinguishes them from each other.
49
+
50
+ When the page is not heavy on inputs, still pursue: unvisited state transitions, follow-ups after data-changing operations (share, export, duplicate, re-open), alternative paths to the same data change, preconditions that unlock new data-changing actions, and visible controls never clicked. Again, prioritise scenarios whose ending falls into category 1 or 2.
38
51
 
39
52
  Skip the Menu/Navigation section — we are testing THIS page.
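The combinatorial guidance in this hunk asks for small sets (pairs or triples) rather than a full Cartesian product. A minimal sketch of the pair case follows; the toggle names are hypothetical and only illustrate the size difference.

```javascript
// Generate every unordered pair from a list of controls.
function pairs(items) {
  const out = [];
  for (let i = 0; i < items.length; i++) {
    for (let j = i + 1; j < items.length; j++) {
      out.push([items[i], items[j]]);
    }
  }
  return out;
}

// Hypothetical binary controls on a page.
const toggles = ['notify', 'public', 'archived', 'pinned'];
const combos = pairs(toggles);
```

Four toggles give six pairs, compared with sixteen on/off combinations for the full product. Per the prompt, each pair would still need to sit in a scenario that ends in a data or state change to count as coverage.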
@@ -2,18 +2,18 @@ Study the page and figure out its business purpose. What is this page FOR? What
2
2
 
3
3
  Based on the page type, propose tests for COMPLETE user workflows:
4
4
  - If this is a data page (lists, tables): test CRUD operations end-to-end (create item → verify in list, edit item → verify changes saved, delete item → verify removed)
5
- - If this is a form page: test full submission flow, not just "form appears"
5
+ - If the page has inputs to fill in: test the full commit flow, not just that the controls render
6
6
  - If this has filters and search: test filtering AND verify results change, not just "filter tab clicked"
7
7
  - If this has modals/dropdowns: test the ACTION inside them, not just opening/closing them
8
8
 
9
- Each test should end with the application in a different state than it started.
9
+ Each test should end in a **data change** (record created/edited/deleted, setting persisted, message sent) or a **state change** (route change, filter applied to real data, mode change the app remembers). Tests ending in UI-only outcomes (open, close, hover, expand) are the weakest and should be rare.
10
10
 
11
11
  IMPORTANT: Distribute tests across DIFFERENT feature areas from the research.
12
12
  Do not propose more than 2 tests for the same feature area.
13
13
  Every Extended Research section (modal, dropdown, panel) with actionable features deserves at least one test.
14
- Tests that change application data MUST come first — create, update, delete records before testing filters, search, or pagination. You are aiming to change application state.
14
+ Tests that change application data MUST come first — create, update, delete records before testing filters, search, or pagination.
15
15
  If the research shows multiple ways to create or modify data (different types, forms, or options), propose a separate test for each.
16
- View only tests (tab switching, pagination, view toggles) should be proposed only after data-changing interactions are covered.
16
+ UI-only tests (tab switching, pagination, view toggles) should be proposed only after data-changing and state-changing interactions are covered.
17
17
 
18
18
  Skip the Menu/Navigation section — we are testing THIS page.
19
19
 
@@ -1,14 +1,17 @@
1
- Stress-test the page by filling invalid, empty, and extreme values into every input.
1
+ Stress-test the page by feeding invalid, empty, or extreme values to its controls and committing.
2
2
 
3
- Focus on:
4
- - Empty states: submit forms with no data, clear required fields, remove default values
5
- - Long values: paste 10000 characters into inputs, use extremely long names and descriptions
6
- - Boundary values: zero, negative numbers, special characters, unicode, HTML tags in text fields
7
- - Invalid formats: wrong email formats, letters in number fields, SQL injection strings, script tags
8
- - Invalid combinations: select incompatible options, mix conflicting settings
9
- - Combining states: apply multiple filters at once, use conflicting form values together
10
- - Out-of-range values: dates in the past/future, quantities beyond limits, prices with too many decimals
3
+ **Match attack breadth to controls reachable.** If only ONE control is reachable, attack it alone. If several are reachable, attack **all of them in the same scenario** — each with a different strange value. Never stress one while leaving the rest untouched: attacking one-at-a-time hides interaction bugs and wastes plan budget.
11
4
 
12
- Push every input to its limits. Find what breaks when the form receives unexpected data.
5
+ Do not produce multiple scenarios that each isolate one control of the same section. Fold those attacks into fewer scenarios that push every reachable control strangely at once. Vary the **mix** between scenarios — which control receives SQL, which receives 10000 chars, which receives unicode, which receives a conflicting combination — not the single control under attack.
13
6
 
14
- Skip the Menu/Navigation section we are testing THIS page.
7
+ **Attack categories** (combine across controls, not one-per-scenario):
8
+ empty • very long (10000+ chars) • boundary (zero, negative, unicode, HTML, special chars) • invalid formats (malformed email/url/number, SQL, script tags) • invalid combinations (mutually exclusive toggles together, conflicting modes) • out-of-range (far dates, quantities beyond limits, excess decimals) • dependent-UI stress (flip a control that reveals more, attack those too).
9
+
10
+ **Prefer scenarios that:**
11
+ - Push every reachable control to a different bad-data category, then commit
12
+ - Trigger a conditional section, attack revealed and base controls together, then commit
13
+ - Combine mutually exclusive control states with invalid values, then commit
14
+
15
+ End each scenario with the state **committed** (saved, applied, sent, triggered). A scenario that enters bad data then cancels or navigates away reveals nothing — the application never received the payload.
16
+
17
+ Skip the Menu/Navigation section — we are testing THIS page.
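The rewritten stress prompt asks for every reachable control to receive a different bad-data category, with the mix varied between scenarios. One way to picture that is a simple rotation, sketched below. The control names and the shortened category list are assumptions for illustration; the prompt does not prescribe any particular assignment scheme.

```javascript
// A shortened, hypothetical list of attack categories taken from the prompt text.
const ATTACKS = ['empty', 'very long', 'boundary', 'invalid format', 'out of range'];

// Give each control a different category, shifting the mix by scenario index.
function attackMix(controls, scenarioIndex) {
  return controls.map((control, i) => ({
    control,
    attack: ATTACKS[(i + scenarioIndex) % ATTACKS.length],
  }));
}

const controls = ['name', 'email', 'quantity'];
const first = attackMix(controls, 0);
const second = attackMix(controls, 1);
```

As long as there are no more controls than categories, each control in a scenario gets a distinct category, and the same control gets a different category in the next scenario.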
@@ -1,6 +1,7 @@
1
1
  import { tool } from 'ai';
2
2
  import dedent from 'dedent';
3
3
  import { z } from 'zod';
4
+ import { ActionResult } from "../../action-result.js";
4
5
  import { actionRule, locatorRule, sectionContextRule } from "../rules.js";
5
6
  import { createAgentTools, createCodeceptJSTools } from "../tools.js";
6
7
  import { debugLog } from "./mixin.js";
@@ -8,13 +9,19 @@ export function WithWebMode(Base) {
8
9
  return class extends Base {
9
10
  webModeTools(ctx) {
10
11
  const explorer = ctx.explorBot.getExplorer();
12
+ const stateManager = explorer.getStateManager();
11
13
  const codeceptTools = createCodeceptJSTools(explorer, ctx.task);
12
14
  const agentTools = createAgentTools({
13
15
  explorer,
14
16
  researcher: ctx.explorBot.agentResearcher(),
15
17
  navigator: ctx.explorBot.agentNavigator(),
18
+ experienceTracker: stateManager.getExperienceTracker(),
19
+ getState: () => {
20
+ const state = stateManager.getCurrentState();
21
+ return state ? ActionResult.fromState(state) : null;
22
+ },
16
23
  });
17
- const { see, context, visualClick } = agentTools;
24
+ const { see, context, visualClick, learn_experience } = agentTools;
18
25
  return {
19
26
  navigate: tool({
20
27
  description: 'Navigate to a URL or page description using AI-powered navigation.',
@@ -91,6 +98,7 @@ export function WithWebMode(Base) {
91
98
  see,
92
99
  context,
93
100
  visualClick,
101
+ learn_experience,
94
102
  };
95
103
  }
96
104
  webModePrompt() {
@@ -16,12 +16,16 @@ export class Historian {
16
16
  experienceTracker;
17
17
  reporter;
18
18
  stateManager;
19
+ savedFiles = new Set();
19
20
  constructor(provider, experienceTracker, reporter, stateManager) {
20
21
  this.provider = provider;
21
22
  this.experienceTracker = experienceTracker || new ExperienceTracker();
22
23
  this.reporter = reporter;
23
24
  this.stateManager = stateManager;
24
25
  }
26
+ getSavedFiles() {
27
+ return [...this.savedFiles];
28
+ }
25
29
  async saveSession(task, initialState, conversation) {
26
30
  debugLog('Saving session experience');
27
31
  const result = this.determineResult(task);
@@ -363,6 +367,7 @@ export class Historian {
363
367
  const filename = plan.title.replace(/[^a-zA-Z0-9]/g, '_').toLowerCase();
364
368
  const filePath = join(testsDir, `${filename}.js`);
365
369
  writeFileSync(filePath, lines.join('\n'));
370
+ this.savedFiles.add(filePath);
366
371
  tag('substep').log(`Saved plan tests to: ${filePath}`);
367
372
  return filePath;
368
373
  }
@@ -374,6 +379,7 @@ export class Historian {
374
379
  content = content.replace(step.original, step.healed);
375
380
  }
376
381
  writeFileSync(filePath, content);
382
+ this.savedFiles.add(filePath);
377
383
  tag('substep').log(`Updated test file with healed steps: ${filePath}`);
378
384
  }
379
385
  getExecutionLabel(exec, fallback) {
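The `savedFiles` changes in `Historian` rely on two properties: a `Set` ignores repeated additions of the same path, and `getSavedFiles()` returns a copy rather than the set itself. A standalone sketch of that pattern is shown below; `SavedFileLog` and `record` are hypothetical names, not part of the package.

```javascript
// Hypothetical stand-in for the savedFiles bookkeeping added to Historian.
class SavedFileLog {
  savedFiles = new Set();

  // Mirrors the this.savedFiles.add(filePath) calls after writeFileSync.
  record(filePath) {
    this.savedFiles.add(filePath);
  }

  // Returns a fresh array so callers cannot mutate the internal set.
  getSavedFiles() {
    return [...this.savedFiles];
  }
}

const log = new SavedFileLog();
log.record('tests/create_item.js'); // first write
log.record('tests/create_item.js'); // healed steps rewrite the same file
const files = log.getSavedFiles();
files.push('tampered.js'); // only changes the copy
```

This matters here because the same file can be written once when the plan is saved and again when healed steps are applied.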
@@ -1,7 +1,9 @@
1
1
  import { tool } from 'ai';
2
2
  import dedent from 'dedent';
3
3
  import { z } from 'zod';
4
+ import { ActionResult } from "../action-result.js";
4
5
  import { ConfigParser } from "../config.js";
6
+ import { renderExperienceToc } from "../experience-tracker.js";
5
7
  import { TestResult } from "../test-plan.js";
6
8
  import { collectInteractiveNodes, detectFocusArea, extractFocusedElement } from "../utils/aria.js";
7
9
  import { createDebug, tag } from "../utils/logger.js";
@@ -18,11 +20,13 @@ export class Pilot {
18
20
  researcher;
19
21
  explorer;
20
22
  fisherman = null;
21
- constructor(provider, agentTools, researcher, explorer) {
23
+ experienceTracker;
24
+ constructor(provider, agentTools, researcher, explorer, experienceTracker) {
22
25
  this.provider = provider;
23
26
  this.agentTools = agentTools;
24
27
  this.researcher = researcher;
25
28
  this.explorer = explorer;
29
+ this.experienceTracker = experienceTracker || null;
26
30
  }
27
31
  setFisherman(fisherman) {
28
32
  this.fisherman = fisherman;
@@ -317,7 +321,14 @@ export class Pilot {
317
321
  }
318
322
  async sendToPilot(userText, functionId, opts = {}) {
319
323
  debugLog(`sendToPilot: ${functionId}, tools: ${!!opts.tools}, roundtrips: ${opts.maxToolRoundtrips ?? 0}`);
320
- this.conversation.addUserText(userText);
324
+ let finalUserText = userText;
325
+ if (opts.tools) {
326
+ const tocBlock = this.getExperienceToc();
327
+ if (tocBlock) {
328
+ finalUserText = `${tocBlock}\n\n${userText}`;
329
+ }
330
+ }
331
+ this.conversation.addUserText(finalUserText);
321
332
  let tools = opts.tools ? this.agentTools : undefined;
322
333
  if (opts.tools && opts.task) {
323
334
  tools = { ...tools, ...this.buildPreconditionTool(opts.task) };
@@ -329,6 +340,16 @@ export class Pilot {
329
340
  });
330
341
  return result?.response?.text || '';
331
342
  }
343
+ getExperienceToc() {
344
+ if (!this.experienceTracker)
345
+ return '';
346
+ const state = this.explorer.getStateManager().getCurrentState();
347
+ if (!state)
348
+ return '';
349
+ const actionResult = ActionResult.fromState(state);
350
+ const toc = this.experienceTracker.getExperienceTableOfContents(actionResult);
351
+ return renderExperienceToc(toc);
352
+ }
332
353
  buildPreconditionTool(task) {
333
354
  return {
334
355
  precondition: tool({
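The change to `sendToPilot` prepends the experience table of contents only when tools are enabled and the rendered block is non-empty. The sketch below isolates that decision as a pure function; `withExperienceToc` is a hypothetical helper name, and the TOC string is a made-up placeholder because the output format of `renderExperienceToc` is not shown in this diff.

```javascript
// Hypothetical pure version of the text assembly inside sendToPilot.
function withExperienceToc(userText, tocBlock, toolsEnabled) {
  if (!toolsEnabled || !tocBlock) return userText;
  return `${tocBlock}\n\n${userText}`;
}

const toc = '<experience_toc>placeholder</experience_toc>'; // assumed shape
const withTools = withExperienceToc('Click Save', toc, true);
const withoutTools = withExperienceToc('Click Save', toc, false);
const emptyToc = withExperienceToc('Click Save', '', true);
```

Returning an empty string from `getExperienceToc()` when there is no tracker or no current state is what makes the `if (tocBlock)` check sufficient.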
@@ -1,8 +1,10 @@
1
1
  import dedent from 'dedent';
2
2
  import { ActionResult } from '../../action-result.js';
3
- import { diffAriaSnapshots } from "../../utils/aria.js";
3
+ import { detectFocusArea, diffAriaSnapshots } from "../../utils/aria.js";
4
4
  import { executionController } from "../../execution-controller.js";
5
5
  import { tag } from '../../utils/logger.js';
6
+ import { mdq } from "../../utils/markdown-query.js";
7
+ import { getCachedResearch, saveResearch } from "./cache.js";
6
8
  import { debugLog } from "./mixin.js";
7
9
  import { parseResearchSections } from "./parser.js";
8
10
  const DEFAULT_MAX_EXPANDABLE_CLICKS = 10;
@@ -53,6 +55,48 @@ export function WithDeepAnalysis(Base) {
53
55
  result.text += `\n\n## Navigation Links\n\n${links}`;
54
56
  }
55
57
  }
58
+ async researchOverlay(current, previous, pageStateHash) {
59
+ const focusArea = detectFocusArea(current.ariaSnapshot);
60
+ if (!focusArea.detected || !focusArea.name)
61
+ return null;
62
+ if (focusArea.type !== 'dialog' && focusArea.type !== 'modal')
63
+ return null;
64
+ const cached = getCachedResearch(pageStateHash);
65
+ if (!cached)
66
+ return null;
67
+ const escaped = focusArea.name.replace(/"/g, '\\"');
68
+ if (mdq(cached).query(`section3(~"${escaped}")`).count() > 0) {
69
+ debugLog(`Overlay "${focusArea.name}" already in cached research, skipping`);
70
+ return null;
71
+ }
72
+ const diff = await current.diff(previous);
73
+ await diff.calculate();
74
+ if (!diff.ariaChanged && diff.htmlParts.length === 0) {
75
+ debugLog(`No diff between current and previous state for overlay "${focusArea.name}"`);
76
+ return null;
77
+ }
78
+ const alreadyExpanded = this._summarizeExpanded(parseResearchSections(cached)
79
+ .filter((s) => s.elements.length > 0)
80
+ .map((s) => s.rawMarkdown));
81
+ tag('substep').log(`Researching overlay: ${focusArea.name}`);
82
+ const sectionMarkdown = await this._analyzeExpandedAction('', focusArea.name, diff, alreadyExpanded);
83
+ if (!sectionMarkdown) {
84
+ debugLog(`Overlay "${focusArea.name}" produced no meaningful expansion`);
85
+ return null;
86
+ }
87
+ const extQuery = mdq(cached).query('section1(~"Extended Research")');
88
+ let updated;
89
+ if (extQuery.count() > 0) {
90
+ const existing = extQuery.text().trimEnd();
91
+ updated = extQuery.replace(`${existing}\n\n${sectionMarkdown}\n`);
92
+ }
93
+ else {
94
+ updated = `${cached.trimEnd()}\n\n# Extended Research\n\n${sectionMarkdown}\n`;
95
+ }
96
+ saveResearch(pageStateHash, updated);
97
+ tag('substep').log(`Overlay research appended: ${focusArea.name}`);
98
+ return sectionMarkdown;
99
+ }
56
100
  async _discoverExpandables(researchText) {
57
101
  const allElements = new Map();
58
102
  for (const section of parseResearchSections(researchText)) {
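The last step of `researchOverlay` either appends the new section to an existing "Extended Research" block or creates that block at the end of the cached research. The sketch below reproduces only that branching with plain string operations. It does not use `mdq`, whose API is not shown in this diff, and it assumes "Extended Research" is the last top-level section so that appending to the end of the text is equivalent.

```javascript
// Hypothetical plain-string stand-in for the append-or-create step.
function appendOverlaySection(cached, sectionMarkdown) {
  const hasExtended = /^# Extended Research\s*$/m.test(cached);
  const base = cached.trimEnd();
  if (hasExtended) {
    // Assumption: the Extended Research block is the final top-level section.
    return `${base}\n\n${sectionMarkdown}\n`;
  }
  return `${base}\n\n# Extended Research\n\n${sectionMarkdown}\n`;
}

const once = appendOverlaySection('## Main\n\ntext\n', '### Dialog\n\nbody');
const twice = appendOverlaySection(once, '### Second');
```

Trimming trailing whitespace before appending keeps the spacing between sections stable no matter how many overlays are added.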
@@ -272,8 +316,26 @@ export function WithDeepAnalysis(Base) {
272
316
  }
273
317
  async _analyzeExpandedAction(code, description, diff, alreadyExpanded) {
274
318
  const alreadyHint = alreadyExpanded.length > 0 ? `\nAlready expanded sections:\n${alreadyExpanded.join('\n')}` : '';
319
+ let intro;
320
+ if (code) {
321
+ intro = `An action on "${description}" (\`${code}\`) revealed new UI content.`;
322
+ }
323
+ else {
324
+ intro = `An overlay "${description}" appeared on the page.`;
325
+ }
326
+ let actionBlock = '';
327
+ if (code) {
328
+ actionBlock = dedent `
329
+ Action:
330
+
331
+ \`\`\`js
332
+ ${code}
333
+ \`\`\`
334
+
335
+ `;
336
+ }
275
337
  const prompt = dedent `
276
- An action on "${description}" (\`${code}\`) revealed new UI content.
338
+ ${intro}
277
339
  Analyze the changes and produce a UI map section.
278
340
 
279
341
  ARIA changes:
@@ -287,13 +349,7 @@ export function WithDeepAnalysis(Base) {
287
349
 
288
350
  ### <Short descriptive name>
289
351
 
290
- Action:
291
-
292
- \`\`\`js
293
- ${code}
294
- \`\`\`
295
-
296
- <One sentence: what appeared — dropdown menu, modal, tab content, expanded panel, etc.>
352
+ ${actionBlock}<One sentence: what appeared — dropdown menu, modal, tab content, expanded panel, etc.>
297
353
 
298
354
  | Element | ARIA | CSS |
299
355
  |---------|------|-----|
@@ -0,0 +1,103 @@
1
+ import dedent from 'dedent';
2
+ import { executionController } from "../../execution-controller.js";
3
+ import { tag } from '../../utils/logger.js';
4
+ import { RulesLoader } from "../../utils/rules-loader.js";
5
+ import { locatorRule as generalLocatorRuleText } from '../rules.js';
6
+ export function WithSections(Base) {
7
+ return class extends Base {
8
+ async researchBySections() {
9
+ const ariaSnapshot = this.actionResult?.getCompactARIA() || '';
10
+ const configured = this.getConfiguredSections();
11
+ const focusCss = await this._detectFocusCss();
12
+ let targets;
13
+ if (focusCss) {
14
+ targets = [['Focus', `element bounded by CSS container '${focusCss}'`]];
15
+ tag('info').log(`Focus element detected via selector '${focusCss}', researching focused area only`);
16
+ }
17
+ else {
18
+ targets = Object.entries(configured);
19
+ tag('info').log(`Splitting research into ${targets.length} per-section requests`);
20
+ }
21
+ const parts = [];
22
+ for (const [name, description] of targets) {
23
+ if (executionController.isInterrupted())
24
+ break;
25
+ const text = await this._researchSingleSection(name, description, ariaSnapshot, focusCss);
26
+ if (!text)
27
+ continue;
28
+ const trimmed = text.trim();
29
+ if (trimmed === 'NOT_PRESENT' || trimmed.startsWith('NOT_PRESENT'))
30
+ continue;
31
+ parts.push(trimmed);
32
+ }
33
+ if (parts.length === 0) {
34
+ throw new Error('Per-section research produced no sections — AI responses all empty or NOT_PRESENT');
35
+ }
36
+ let merged = parts.join('\n\n');
37
+ if (focusCss)
38
+ merged += '\n\n> Focused: Focus';
39
+ return merged;
40
+ }
41
+ async _detectFocusCss() {
42
+ const focusSections = this.explorer.getConfig().ai?.agents?.researcher?.focusSections;
43
+ if (!focusSections?.length)
44
+ return null;
45
+ for (const css of focusSections) {
46
+ const count = await this.explorer.playwrightLocatorCount((page) => page.locator(css)).catch(() => 0);
47
+ if (count > 0)
48
+ return css;
49
+ }
50
+ return null;
51
+ }
52
+ async _researchSingleSection(name, description, ariaSnapshot, focusCss) {
53
+ const currentUrl = this.stateManager.getCurrentState()?.url || '';
54
+ const rules = RulesLoader.loadRules('researcher', ['ui-map-table', 'list-element', 'container-rules'], currentUrl);
55
+ const url = this.actionResult?.url || 'Unknown';
56
+ const title = this.actionResult?.title || 'Unknown';
57
+ let focusHint = '';
58
+ if (focusCss) {
59
+ focusHint = dedent `
60
+ The user's focus is the element matching CSS '${focusCss}'.
61
+ Use that CSS as the Container for this section.
62
+ `;
63
+ }
64
+ const prompt = dedent `
65
+ <task>
66
+ Identify the "${name}" section on this page: ${description}
67
+ If this section is NOT present on the page, respond with ONLY: NOT_PRESENT
68
+ Otherwise output only this single section in the format below.
69
+ ${focusHint}
70
+ </task>
71
+
72
+ <section_format>
73
+ ## ${name}
74
+
75
+ > Container: '.semantic-container-selector'
76
+
77
+ | Element | ARIA | CSS | eidx |
78
+ </section_format>
79
+
80
+ <rules>
81
+ - Every element with eidx MUST appear in the table.
82
+ - Every row needs CSS; ARIA may be "-" for icon-only buttons.
83
+ - ARIA locator JSON uses keys "role" and "text" (NOT "name").
84
+ </rules>
85
+
86
+ ${generalLocatorRuleText}
87
+
88
+ ${rules}
89
+
90
+ URL: ${url}
91
+ Title: ${title}
92
+
93
+ <aria>
94
+ ${ariaSnapshot}
95
+ </aria>
96
+ `;
97
+ const conversation = this.provider.startConversation(this.getSystemMessage(), 'researcher');
98
+ conversation.addUserText(prompt);
99
+ const result = await this.provider.invokeConversation(conversation, undefined, { agentName: 'researcher' });
100
+ return result?.response.text || '';
101
+ }
102
+ };
103
+ }
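The loop in `researchBySections` drops empty responses and anything starting with `NOT_PRESENT`, then joins what remains with blank lines. A minimal sketch of that filtering, detached from the provider calls, is below; `mergeSectionResponses` is a hypothetical name and the error message is shortened.

```javascript
// Hypothetical pure version of the filter-and-merge loop in researchBySections.
function mergeSectionResponses(responses) {
  const parts = [];
  for (const text of responses) {
    if (!text) continue;
    const trimmed = text.trim();
    // startsWith also covers the exact-match case checked separately in the diff.
    if (trimmed.startsWith('NOT_PRESENT')) continue;
    parts.push(trimmed);
  }
  if (parts.length === 0) {
    throw new Error('Per-section research produced no sections');
  }
  return parts.join('\n\n');
}

const merged = mergeSectionResponses(['## Content\n', '', ' NOT_PRESENT ', '## Menu']);
```

Throwing when nothing survives, instead of returning an empty string, keeps a failed fallback from reaching the later locator stages.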
@@ -20,6 +20,7 @@ import { detectFocusFromAria, hasFocusedSection, markSectionAsFocused, pickDefau
20
20
  import { WithLocators } from "./researcher/locators.js";
21
21
  import { extractValidContainers, formatResearchSummary, parseResearchSections } from "./researcher/parser.js";
22
22
  import { ResearchResult } from "./researcher/research-result.js";
23
+ import { WithSections } from "./researcher/sections.js";
23
24
  import { locatorRule as generalLocatorRuleText } from './rules.js';
24
25
  import { RulesLoader } from "../utils/rules-loader.js";
25
26
  import { TaskAgent } from "./task-agent.js";
@@ -33,7 +34,7 @@ export const POSSIBLE_SECTIONS = {
33
34
  menu: 'page menu (toolbar, context actions, filters, dropdowns)',
34
35
  navigation: 'main navigation (top bar, sidebar, breadcrumbs)',
35
36
  };
36
- const ResearcherBase = WithDeepAnalysis(WithCoordinates(WithLocators(TaskAgent)));
37
+ const ResearcherBase = WithSections(WithDeepAnalysis(WithCoordinates(WithLocators(TaskAgent))));
37
38
  export class Researcher extends ResearcherBase {
38
39
  ACTION_TOOLS = ['click'];
39
40
  emoji = '🔍';
@@ -130,10 +131,13 @@ export class Researcher extends ResearcherBase {
130
131
  const conversation = this.provider.startConversation(this.getSystemMessage(), 'researcher');
131
132
  const prompt = await this.buildResearchPrompt();
132
133
  conversation.addUserText(prompt);
133
- let invocationResult;
134
+ let researchText;
134
135
  let activeConversation = conversation;
135
136
  try {
136
- invocationResult = await this.provider.invokeConversation(conversation, undefined, { agentName: 'researcher' });
137
+ const invocationResult = await this.provider.invokeConversation(conversation, undefined, { agentName: 'researcher' });
138
+ if (!invocationResult)
139
+ throw new Error('Failed to get response from provider');
140
+ researchText = invocationResult.response.text;
137
141
  }
138
142
  catch (error) {
139
143
  if (!(error instanceof ContextLengthError) || retriesLeft <= 0) {
@@ -142,15 +146,11 @@ export class Researcher extends ResearcherBase {
142
146
  }
143
147
  throw error;
144
148
  }
145
- tag('warning').log('Output truncated, retrying with fresh focused conversation (ARIA only)...');
146
149
  retriesLeft = 0;
150
+ researchText = await this.researchBySections();
147
151
  activeConversation = this.provider.startConversation(this.getSystemMessage(), 'researcher');
148
- activeConversation.addUserText(this.buildFocusedRetryPrompt());
149
- invocationResult = await this.provider.invokeConversation(activeConversation, undefined, { agentName: 'researcher' });
150
152
  }
151
- if (!invocationResult)
152
- throw new Error('Failed to get response from provider');
153
- const result = new ResearchResult(invocationResult.response.text, state.url);
153
+ const result = new ResearchResult(researchText, state.url);
154
154
  debugLog(`Original research response length: ${result.text.length} chars`);
155
155
  const interrupted = () => executionController.isInterrupted();
156
156
  // Stage 2: Test containers + locators
@@ -469,43 +469,6 @@ export class Researcher extends ResearcherBase {
469
469
  </output_rules>
470
470
 
471
471
 
472
- `;
473
- }
474
- buildFocusedRetryPrompt() {
475
- const currentUrl = this.stateManager.getCurrentState()?.url || '';
476
- const example = RulesLoader.loadRules('researcher', ['section-example'], currentUrl);
477
- const uiMapTable = RulesLoader.loadRules('researcher', ['ui-map-table'], currentUrl);
478
- const url = this.actionResult?.url || 'Unknown';
479
- const title = this.actionResult?.title || 'Unknown';
480
- const aria = this.actionResult?.getCompactARIA() || '';
481
- return dedent `
482
- Previous response was truncated. Restart with a minimal output.
483
-
484
- <task>
485
- Output a UI map for ONE section only — the main interactive area of this page.
486
- Skip navigation, sidebar, and footer. Max 15 elements.
487
- Every element with an eidx MUST appear. Every row needs CSS; ARIA may be "-" for icon-only.
488
- End with a single line: \`> Focused: <section name>\`.
489
- </task>
490
-
491
- <section_format>
492
- ## Section Name
493
-
494
- > Container: '.container-css-selector'
495
-
496
- | Element | ARIA | CSS | eidx |
497
- </section_format>
498
-
499
- ${example}
500
-
501
- ${uiMapTable}
502
-
503
- URL: ${url}
504
- Title: ${title}
505
-
506
- <aria>
507
- ${aria}
508
- </aria>
509
472
  `;
510
473
  }
511
474
  async textContent(state) {
@@ -131,7 +131,15 @@ export const fileUploadRule = dedent `
131
131
  export const protectionRule = dedent `
132
132
  <important>
133
133
  Do not sign out current user of the application.
134
- Do not change current user account settings
134
+ Do not change current user account settings.
135
+
136
+ Pre-existing data on the page belongs to the application, not the test.
137
+ Items that were not created inside the current test scenario must not be deleted, removed, emptied, reset, archived, or otherwise destroyed.
138
+ If a scenario needs to verify destructive behaviour, the same scenario must first create a disposable target and then destroy that specific target — never operate on data that was already there when the test started.
139
+
140
+ The resource that the current page URL represents is "under test".
141
+ The test must not destroy the resource it is running against — doing so invalidates every subsequent scenario that starts on the same URL.
142
+ Do not propose or perform delete/remove/archive actions on the entity that owns the current URL; propose such actions only on disposable children created within the scenario itself.
135
143
  </important>
136
144
  `;
137
145
  export const focusedElementRule = dedent `
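The expanded `protectionRule` states two conditions for destructive actions: the target must have been created inside the current scenario, and it must not be the resource the page URL represents. The rule is plain prompt text, and the package does not enforce it in code in this diff. The sketch below is purely illustrative, with hypothetical identifiers, to show how the two conditions combine.

```javascript
// Hypothetical check expressing the two conditions from protectionRule.
function canDestroy(targetId, createdInScenario, pageResourceId) {
  // Never destroy the entity that owns the current URL.
  if (targetId === pageResourceId) return false;
  // Only disposable items created within this scenario are fair game.
  return createdInScenario.has(targetId);
}

const created = new Set(['item-42']); // created earlier in the same scenario
const pageResource = 'project-7';     // the resource under test
```

With these values, `canDestroy('item-42', created, pageResource)` is allowed, while a pre-existing item or the project itself is not.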
@@ -1,4 +1,5 @@
1
1
  import dedent from 'dedent';
2
+ import { renderExperienceToc } from '../experience-tracker.js';
2
3
  import { createDebug, pluralize, tag } from '../utils/logger.js';
3
4
  const debugLog = createDebug('explorbot:task-agent');
4
5
  export function isInteractive() {
@@ -34,34 +35,13 @@ export class TaskAgent {
34
35
  }
35
36
  getExperience(actionResult) {
36
37
  const tracker = this.getExperienceTracker();
37
- const relevantExperience = tracker.getRelevantExperience(actionResult);
38
- if (relevantExperience.length === 0)
38
+ const toc = tracker.getExperienceTableOfContents(actionResult);
39
+ if (toc.length === 0)
39
40
  return '';
40
- const allContent = relevantExperience
41
- .map((e) => e.content)
42
- .filter((e) => !!e)
43
- .join('\n\n---\n\n');
44
- const totalChars = allContent.length;
45
- let experienceContent;
46
- if (totalChars <= 10_000) {
47
- debugLog(`injecting all experience (${Math.round(totalChars / 1000)}k chars)`);
48
- experienceContent = allContent;
49
- }
50
- else {
51
- experienceContent = tracker.getSuccessfulExperience(actionResult).join('\n\n---\n\n');
52
- debugLog(`injecting success-only experience (${Math.round(experienceContent.length / 1000)}k chars, filtered from ${Math.round(totalChars / 1000)}k)`);
53
- }
54
- if (!experienceContent)
55
- return '';
56
- tag('substep').log(`Found ${relevantExperience.length} experience ${pluralize(relevantExperience.length, 'file')}`);
57
- return dedent `
58
- <experience>
59
- Here is past experience of interacting with this page.
60
- Use successful solutions first. Avoid repeating failed actions.
61
-
62
- ${experienceContent}
63
- </experience>
64
- `;
41
+ const totalSections = toc.reduce((sum, entry) => sum + entry.sections.length, 0);
42
+ debugLog(`injecting experience TOC (${toc.length} files, ${totalSections} sections)`);
43
+ tag('substep').log(`Found ${toc.length} experience ${pluralize(toc.length, 'file')} (${totalSections} sections)`);
44
+ return renderExperienceToc(toc);
65
45
  }
66
46
  setHistorian(historian) {
67
47
  this._historian = historian;
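The new `getExperience` logs a section count computed with `reduce` over the table of contents. The sketch below shows that computation on sample data. Only the `sections` array on each entry is confirmed by the diff; the `file` field and the sample values are assumptions.

```javascript
// Sample TOC entries; only the `sections` array is confirmed by the diff.
const toc = [
  { file: 'login.md', sections: ['Sign in', 'Reset password'] },
  { file: 'orders.md', sections: ['Create order'] },
  { file: 'empty.md', sections: [] },
];

// Same reduction as in getExperience: sum section counts across all entries.
const totalSections = toc.reduce((sum, entry) => sum + entry.sections.length, 0);
```

The initial value `0` keeps the reduction well defined for an empty TOC, although `getExperience` returns early in that case.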