testchimp-runner-core 0.0.33 → 0.0.35
- package/dist/execution-service.d.ts +1 -4
- package/dist/execution-service.d.ts.map +1 -1
- package/dist/execution-service.js +155 -468
- package/dist/execution-service.js.map +1 -1
- package/dist/index.d.ts +3 -1
- package/dist/index.d.ts.map +1 -1
- package/dist/index.js +11 -1
- package/dist/index.js.map +1 -1
- package/dist/llm-facade.d.ts.map +1 -1
- package/dist/llm-facade.js +7 -7
- package/dist/llm-facade.js.map +1 -1
- package/dist/llm-provider.d.ts +9 -0
- package/dist/llm-provider.d.ts.map +1 -1
- package/dist/model-constants.d.ts +16 -5
- package/dist/model-constants.d.ts.map +1 -1
- package/dist/model-constants.js +17 -6
- package/dist/model-constants.js.map +1 -1
- package/dist/orchestrator/decision-parser.d.ts +18 -0
- package/dist/orchestrator/decision-parser.d.ts.map +1 -0
- package/dist/orchestrator/decision-parser.js +127 -0
- package/dist/orchestrator/decision-parser.js.map +1 -0
- package/dist/orchestrator/index.d.ts +4 -2
- package/dist/orchestrator/index.d.ts.map +1 -1
- package/dist/orchestrator/index.js +15 -2
- package/dist/orchestrator/index.js.map +1 -1
- package/dist/orchestrator/orchestrator-agent.d.ts +17 -22
- package/dist/orchestrator/orchestrator-agent.d.ts.map +1 -1
- package/dist/orchestrator/orchestrator-agent.js +708 -577
- package/dist/orchestrator/orchestrator-agent.js.map +1 -1
- package/dist/orchestrator/orchestrator-prompts.d.ts +32 -0
- package/dist/orchestrator/orchestrator-prompts.d.ts.map +1 -0
- package/dist/orchestrator/orchestrator-prompts.js +737 -0
- package/dist/orchestrator/orchestrator-prompts.js.map +1 -0
- package/dist/orchestrator/page-som-handler.d.ts +106 -0
- package/dist/orchestrator/page-som-handler.d.ts.map +1 -0
- package/dist/orchestrator/page-som-handler.js +1353 -0
- package/dist/orchestrator/page-som-handler.js.map +1 -0
- package/dist/orchestrator/som-types.d.ts +149 -0
- package/dist/orchestrator/som-types.d.ts.map +1 -0
- package/dist/orchestrator/som-types.js +87 -0
- package/dist/orchestrator/som-types.js.map +1 -0
- package/dist/orchestrator/tool-registry.d.ts +2 -0
- package/dist/orchestrator/tool-registry.d.ts.map +1 -1
- package/dist/orchestrator/tool-registry.js.map +1 -1
- package/dist/orchestrator/tools/index.d.ts +5 -1
- package/dist/orchestrator/tools/index.d.ts.map +1 -1
- package/dist/orchestrator/tools/index.js +9 -2
- package/dist/orchestrator/tools/index.js.map +1 -1
- package/dist/orchestrator/tools/refresh-som-markers.d.ts +12 -0
- package/dist/orchestrator/tools/refresh-som-markers.d.ts.map +1 -0
- package/dist/orchestrator/tools/refresh-som-markers.js +64 -0
- package/dist/orchestrator/tools/refresh-som-markers.js.map +1 -0
- package/dist/orchestrator/tools/verify-action-result.d.ts +17 -0
- package/dist/orchestrator/tools/verify-action-result.d.ts.map +1 -0
- package/dist/orchestrator/tools/verify-action-result.js +140 -0
- package/dist/orchestrator/tools/verify-action-result.js.map +1 -0
- package/dist/orchestrator/tools/view-previous-screenshot.d.ts +15 -0
- package/dist/orchestrator/tools/view-previous-screenshot.d.ts.map +1 -0
- package/dist/orchestrator/tools/view-previous-screenshot.js +92 -0
- package/dist/orchestrator/tools/view-previous-screenshot.js.map +1 -0
- package/dist/orchestrator/types.d.ts +49 -1
- package/dist/orchestrator/types.d.ts.map +1 -1
- package/dist/orchestrator/types.js +11 -1
- package/dist/orchestrator/types.js.map +1 -1
- package/dist/prompts.d.ts.map +1 -1
- package/dist/prompts.js +40 -34
- package/dist/prompts.js.map +1 -1
- package/dist/scenario-service.d.ts +5 -0
- package/dist/scenario-service.d.ts.map +1 -1
- package/dist/scenario-service.js +17 -0
- package/dist/scenario-service.js.map +1 -1
- package/dist/scenario-worker-class.d.ts +4 -0
- package/dist/scenario-worker-class.d.ts.map +1 -1
- package/dist/scenario-worker-class.js +21 -3
- package/dist/scenario-worker-class.js.map +1 -1
- package/dist/testing/agent-tester.d.ts +35 -0
- package/dist/testing/agent-tester.d.ts.map +1 -0
- package/dist/testing/agent-tester.js +84 -0
- package/dist/testing/agent-tester.js.map +1 -0
- package/dist/testing/ref-translator-tester.d.ts +44 -0
- package/dist/testing/ref-translator-tester.d.ts.map +1 -0
- package/dist/testing/ref-translator-tester.js +104 -0
- package/dist/testing/ref-translator-tester.js.map +1 -0
- package/dist/utils/coordinate-converter.d.ts +32 -0
- package/dist/utils/coordinate-converter.d.ts.map +1 -0
- package/dist/utils/coordinate-converter.js +130 -0
- package/dist/utils/coordinate-converter.js.map +1 -0
- package/dist/utils/hierarchical-selector.d.ts +47 -0
- package/dist/utils/hierarchical-selector.d.ts.map +1 -0
- package/dist/utils/hierarchical-selector.js +212 -0
- package/dist/utils/hierarchical-selector.js.map +1 -0
- package/dist/utils/page-info-retry.d.ts +14 -0
- package/dist/utils/page-info-retry.d.ts.map +1 -0
- package/dist/utils/page-info-retry.js +60 -0
- package/dist/utils/page-info-retry.js.map +1 -0
- package/dist/utils/page-info-utils.d.ts +1 -0
- package/dist/utils/page-info-utils.d.ts.map +1 -1
- package/dist/utils/page-info-utils.js +46 -18
- package/dist/utils/page-info-utils.js.map +1 -1
- package/dist/utils/ref-attacher.d.ts +21 -0
- package/dist/utils/ref-attacher.d.ts.map +1 -0
- package/dist/utils/ref-attacher.js +149 -0
- package/dist/utils/ref-attacher.js.map +1 -0
- package/dist/utils/ref-translator.d.ts +49 -0
- package/dist/utils/ref-translator.d.ts.map +1 -0
- package/dist/utils/ref-translator.js +276 -0
- package/dist/utils/ref-translator.js.map +1 -0
- package/package.json +1 -1
- package/plandocs/BEFORE_AFTER_VERIFICATION.md +148 -0
- package/plandocs/COORDINATE_MODE_DIAGNOSIS.md +144 -0
- package/plandocs/IMPLEMENTATION_STATUS.md +108 -0
- package/plandocs/PHASE_1_COMPLETE.md +165 -0
- package/plandocs/PHASE_1_SUMMARY.md +184 -0
- package/plandocs/PROMPT_OPTIMIZATION_ANALYSIS.md +120 -0
- package/plandocs/PROMPT_SANITY_CHECK.md +120 -0
- package/plandocs/SESSION_SUMMARY_v0.0.33.md +151 -0
- package/plandocs/TROUBLESHOOTING_SESSION.md +72 -0
- package/plandocs/VISUAL_AGENT_EVOLUTION_PLAN.md +396 -0
- package/plandocs/WHATS_NEW_v0.0.33.md +183 -0
- package/plandocs/exploratory-mode-support-v2.plan.md +953 -0
- package/plandocs/exploratory-mode-support.plan.md +928 -0
- package/plandocs/journey-id-tracking-addendum.md +227 -0
- package/src/execution-service.ts +179 -596
- package/src/index.ts +10 -0
- package/src/llm-facade.ts +8 -8
- package/src/llm-provider.ts +11 -1
- package/src/model-constants.ts +17 -5
- package/src/orchestrator/decision-parser.ts +139 -0
- package/src/orchestrator/index.ts +27 -2
- package/src/orchestrator/orchestrator-agent.ts +868 -623
- package/src/orchestrator/orchestrator-prompts.ts +786 -0
- package/src/orchestrator/page-som-handler.ts +1565 -0
- package/src/orchestrator/som-types.ts +188 -0
- package/src/orchestrator/tool-registry.ts +2 -0
- package/src/orchestrator/tools/index.ts +5 -1
- package/src/orchestrator/tools/refresh-som-markers.ts +69 -0
- package/src/orchestrator/tools/verify-action-result.ts +159 -0
- package/src/orchestrator/tools/view-previous-screenshot.ts +103 -0
- package/src/orchestrator/types.ts +95 -4
- package/src/prompts.ts +40 -34
- package/src/scenario-service.ts +20 -0
- package/src/scenario-worker-class.ts +30 -4
- package/src/utils/coordinate-converter.ts +162 -0
- package/src/utils/page-info-retry.ts +65 -0
- package/src/utils/page-info-utils.ts +53 -18
- package/testchimp-runner-core-0.0.35.tgz +0 -0
- /package/{CREDIT_CALLBACK_ARCHITECTURE.md → plandocs/CREDIT_CALLBACK_ARCHITECTURE.md} +0 -0
- /package/{INTEGRATION_COMPLETE.md → plandocs/INTEGRATION_COMPLETE.md} +0 -0
- /package/{VISION_DIAGNOSTICS_IMPROVEMENTS.md → plandocs/VISION_DIAGNOSTICS_IMPROVEMENTS.md} +0 -0
- /package/{RELEASE_0.0.26.md → releasenotes/RELEASE_0.0.26.md} +0 -0
- /package/{RELEASE_0.0.27.md → releasenotes/RELEASE_0.0.27.md} +0 -0
- /package/{RELEASE_0.0.28.md → releasenotes/RELEASE_0.0.28.md} +0 -0
|
@@ -0,0 +1,396 @@
|
|
|
1
|
+
# Runner-Core Visual Agent Evolution - Complete Implementation Plan
|
|
2
|
+
|
|
3
|
+
## Overview
|
|
4
|
+
|
|
5
|
+
A pragmatic two-phase evolution that avoids a major architecture overhaul:
|
|
6
|
+
- **Phase 1 (Week 1-2):** Percentage coordinates + Free-form notes
|
|
7
|
+
- **Phase 2 (Week 3-5):** Numbered element system with three-tier fallback
|
|
8
|
+
|
|
9
|
+
---
|
|
10
|
+
|
|
11
|
+
## PHASE 1: Tactical Improvements
|
|
12
|
+
|
|
13
|
+
### 1A: Free-Form "Note to Future Self"
|
|
14
|
+
|
|
15
|
+
**Why:** Agent needs tactical memory between iterations of the SAME step.
|
|
16
|
+
|
|
17
|
+
**Type:**
|
|
18
|
+
```typescript
|
|
19
|
+
interface NoteToFutureSelf {
|
|
20
|
+
fromIteration: number;
|
|
21
|
+
content: string; // FREE-FORM - agent writes whatever it wants
|
|
22
|
+
}
|
|
23
|
+
```
|
|
24
|
+
|
|
25
|
+
**Examples:**
|
|
26
|
+
- "Tried #sidebar-toggle, failed. Will try SVG child next."
|
|
27
|
+
- "Plan: Hover over menu first to reveal dropdown, then click Settings."
|
|
28
|
+
- "Cookie banner blocking. Next: dismiss it, then retry main action."
|
|
29
|
+
|
|
30
|
+
**vs Current Learnings:**
|
|
31
|
+
- Learnings = App-wide patterns ("App uses getByRole")
|
|
32
|
+
- Note to self = Iteration-specific tactics ("Just tried X, will try Y next")
|
|
33
|
+
|
|
34
|
+
**Keep BOTH!**
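
A minimal sketch of how the two kinds of memory might sit side by side. Only the `NoteToFutureSelf` shape comes from the plan; the `StepMemory` class and its method names are hypothetical and are not taken from the package.

```typescript
// Shape taken from the plan above.
interface NoteToFutureSelf {
  fromIteration: number;
  content: string; // free-form text written by the agent
}

// Hypothetical holder: app-wide learnings accumulate, the tactical note is overwritten.
class StepMemory {
  private learnings: string[] = [];
  private latestNote: NoteToFutureSelf | null = null;

  addLearning(learning: string): void {
    this.learnings.push(learning);
  }

  // Only the most recent note is kept, since it describes the next tactic to try.
  setNote(fromIteration: number, content: string): void {
    this.latestNote = { fromIteration, content };
  }

  // Render both memories as plain text for the next LLM call.
  buildMemoryPrompt(): string {
    const lines: string[] = [];
    if (this.learnings.length > 0) {
      lines.push("Learnings:");
      for (const learning of this.learnings) lines.push(`- ${learning}`);
    }
    if (this.latestNote !== null) {
      lines.push(`Note from iteration ${this.latestNote.fromIteration}: ${this.latestNote.content}`);
    }
    return lines.join("\n");
  }
}
```

Keeping the note as a single overwritten slot reflects its tactical role; whether the real agent keeps a history of notes is not stated in the plan.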
|
|
35
|
+
|
|
36
|
+
### 1B: Percentage-Based Coordinates
|
|
37
|
+
|
|
38
|
+
**LLM outputs percentages:**
|
|
39
|
+
```jsonc
|
|
40
|
+
{
|
|
41
|
+
"coordinateAction": {
|
|
42
|
+
"action": "click|fill|drag|hover|scroll",
|
|
43
|
+
"xPercent": 15.75, // 2 decimal precision for accuracy
|
|
44
|
+
"yPercent": 8.50,
|
|
45
|
+
|
|
46
|
+
// For drag:
|
|
47
|
+
"toXPercent": 45.25,
|
|
48
|
+
"toYPercent": 8.50,
|
|
49
|
+
|
|
50
|
+
// For fill:
|
|
51
|
+
"value": "alice@example.com"
|
|
52
|
+
}
|
|
53
|
+
}
|
|
54
|
+
```
|
|
55
|
+
|
|
56
|
+
**Code converts to pixels:**
|
|
57
|
+
```typescript
|
|
58
|
+
const viewport = await page.evaluate(() => ({width: window.innerWidth, height: window.innerHeight}));
|
|
59
|
+
const x = Math.round((xPercent / 100) * viewport.width);
|
|
60
|
+
const y = Math.round((yPercent / 100) * viewport.height);
|
|
61
|
+
```
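
The conversion above can also be written as a pure function, which makes it testable without a browser. This is a sketch under assumptions: the name `percentToPixels` and the clamping to the viewport bounds are additions for illustration, not something the plan specifies.

```typescript
interface Viewport {
  width: number;
  height: number;
}

interface PixelPoint {
  x: number;
  y: number;
}

// Convert a percentage pair (0-100) into integer pixel coordinates.
// Clamping to [0, size - 1] is an added safety assumption so that an
// out-of-range LLM value still lands inside the viewport.
function percentToPixels(xPercent: number, yPercent: number, viewport: Viewport): PixelPoint {
  if (!Number.isFinite(xPercent) || !Number.isFinite(yPercent)) {
    throw new RangeError("Percent coordinates must be finite numbers");
  }
  const toPx = (percent: number, size: number): number =>
    Math.max(0, Math.min(size - 1, Math.round((percent / 100) * size)));
  return {
    x: toPx(xPercent, viewport.width),
    y: toPx(yPercent, viewport.height),
  };
}
```

With a 1280x720 viewport, `percentToPixels(15.75, 8.5, viewport)` yields `{ x: 202, y: 61 }`, which could then be passed to `page.mouse.click(x, y)`.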
|
|
62
|
+
|
|
63
|
+
### 1C: Code-Controlled Fallback
|
|
64
|
+
|
|
65
|
+
**Tier 1 (iteration 1-3):** Normal Playwright selectors
|
|
66
|
+
**Tier 2 (iteration 4+):** Percentage coordinates (auto-triggered after 3 failures)
|
|
67
|
+
|
|
68
|
+
```typescript
|
|
69
|
+
// In callAgent():
|
|
70
|
+
if (consecutiveFailures >= 3) { // Phase 1: Only 2 tiers
|
|
71
|
+
// Auto-use coordinate-specific system prompt
|
|
72
|
+
// LLM outputs percentages
|
|
73
|
+
// Phase 2 will add tier 2 (indexed elements) between selectors and coordinates
|
|
74
|
+
}
|
|
75
|
+
```
|
|
76
|
+
|
|
77
|
+
**Files:**
|
|
78
|
+
- `utils/coordinate-converter.ts` (NEW)
|
|
79
|
+
- `orchestrator/orchestrator-agent.ts` (UPDATE)
|
|
80
|
+
- `orchestrator/types.ts` (UPDATE - add CoordinateAction, NoteToFutureSelf)
|
|
81
|
+
|
|
82
|
+
---
|
|
83
|
+
|
|
84
|
+
## PHASE 2: Numbered Element System
|
|
85
|
+
|
|
86
|
+
### Architecture
|
|
87
|
+
|
|
88
|
+
```
|
|
89
|
+
Three-Tier Fallback:
|
|
90
|
+
|
|
91
|
+
Tier 1 (iteration 1 ONLY): Playwright Selector Mode - ONE SHOT
|
|
92
|
+
├─ Agent generates: await page.getByRole('button', {name: 'Login'}).click()
|
|
93
|
+
├─ Direct execution
|
|
94
|
+
└─ 70% of tasks finish here (simple/medium complexity)
|
|
95
|
+
|
|
96
|
+
Tier 2 (iterations 2-3): Index Command Mode - TWO ATTEMPTS
|
|
97
|
+
├─ Inject numbered markers [1], [2], [3] → screenshot
|
|
98
|
+
├─ Agent outputs: CLICK[3], FILL[5, "alice@example.com"]
|
|
99
|
+
├─ Execution: Use data-testchimp-el="[3]" (reliable targeting)
|
|
100
|
+
├─ Script output: Translate to NATIVE selector (getByRole, #id, etc.)
|
|
101
|
+
└─ 25% of tasks finish here (complex UIs, icons, shadow DOM)
|
|
102
|
+
|
|
103
|
+
Tier 3 (iterations 4+): Percentage Coordinate Mode - LAST RESORT
|
|
104
|
+
├─ Agent outputs: {xPercent: 15.755, yPercent: 8.500}
|
|
105
|
+
├─ CoordinateConverter: % → pixels
|
|
106
|
+
├─ Execute: mouse.click(x, y)
|
|
107
|
+
└─ <5% of tasks need this (extreme edge cases)
|
|
108
|
+
```
|
|
109
|
+
|
|
110
|
+
### 2.1: Reusable Utility: ElementDetector
|
|
111
|
+
|
|
112
|
+
**File:** `runner-core/src/utils/element-detector.ts`
|
|
113
|
+
|
|
114
|
+
**Purpose:** Detect ALL interactive elements with z-index and occlusion awareness.
|
|
115
|
+
|
|
116
|
+
**Key Features:**
|
|
117
|
+
- Comprehensive element queries (buttons, links, inputs, SVGs, clickable divs)
|
|
118
|
+
- Z-index calculation via `getComputedStyle()`
|
|
119
|
+
- Occlusion detection via `elementFromPoint()` at center
|
|
120
|
+
- Context tags: `[header|nav|sidebar|main|modal]`
|
|
121
|
+
- Spatial tags: `[top|bottom|left|right|center]`
|
|
122
|
+
- Center position as percentage (2 decimal precision)
|
|
123
|
+
|
|
124
|
+
**Output:**
|
|
125
|
+
```typescript
|
|
126
|
+
interface DetectedElement {
|
|
127
|
+
index: number; // [1], [2], [3]
|
|
128
|
+
tag: string;
|
|
129
|
+
text: string;
|
|
130
|
+
role: string;
|
|
131
|
+
ariaLabel: string;
|
|
132
|
+
bbox: {x: number; y: number; width: number; height: number};
|
|
133
|
+
centerPercent: {x: number; y: number}; // As percentage, e.g. {x: 15.75, y: 8.50}
|
|
134
|
+
context: string[]; // ['header', 'nav', 'top']
|
|
135
|
+
zIndex: number;
|
|
136
|
+
isVisible: boolean; // Not occluded by higher z-index
|
|
137
|
+
selectors: {
|
|
138
|
+
dataAttribute: string; // [data-testchimp-el="[3]"]
|
|
139
|
+
semantic: string[]; // [getByRole(...), getByLabel(...)]
|
|
140
|
+
cssId: string | null;
|
|
141
|
+
cssClass: string | null;
|
|
142
|
+
};
|
|
143
|
+
}
|
|
144
|
+
```
|
|
145
|
+
|
|
146
|
+
**Implementation Highlights:**
|
|
147
|
+
- Queries: `button`, `a[href]`, `input`, `svg`, `[onclick]`, `[role=...]`, `[data-testid]`, clickable divs/spans
|
|
148
|
+
- Z-index check: `elementFromPoint(centerX, centerY)` must return element or child
|
|
149
|
+
- Filter: Only return `isVisible: true` elements
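
The percentage center and the visibility filter can be sketched without a DOM, assuming the bounding boxes and occlusion flags have already been collected in the page. The helper names `centerPercent` and `numberVisible` are hypothetical; the 2-decimal rounding and 1-based indices follow the plan.

```typescript
interface BBox {
  x: number;
  y: number;
  width: number;
  height: number;
}

interface Viewport {
  width: number;
  height: number;
}

// Round to two decimals, matching the "2 decimal precision" stated above.
function round2(value: number): number {
  return Math.round(value * 100) / 100;
}

// Center of a bounding box expressed as a percentage of the viewport.
function centerPercent(bbox: BBox, viewport: Viewport): { x: number; y: number } {
  const cx = bbox.x + bbox.width / 2;
  const cy = bbox.y + bbox.height / 2;
  return {
    x: round2((cx / viewport.width) * 100),
    y: round2((cy / viewport.height) * 100),
  };
}

// Keep only non-occluded candidates and number them from 1,
// mirroring the [1], [2], [3] markers.
function numberVisible<T extends { isVisible: boolean }>(candidates: T[]): Array<T & { index: number }> {
  return candidates
    .filter((c) => c.isVisible)
    .map((c, i) => ({ ...c, index: i + 1 }));
}
```

Numbering after filtering means an occluded element never consumes an index, so the markers in the screenshot stay contiguous.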
|
|
150
|
+
|
|
151
|
+
### 2.2: Reusable Utility: SelectorResolver
|
|
152
|
+
|
|
153
|
+
**File:** `runner-core/src/utils/selector-resolver.ts`
|
|
154
|
+
|
|
155
|
+
**Purpose:** Given element index, return most reliable Playwright selector FOR GENERATED SCRIPTS.
|
|
156
|
+
|
|
157
|
+
**CRITICAL DISTINCTION:**
|
|
158
|
+
|
|
159
|
+
A. **During Execution (Internal):**
|
|
160
|
+
- Agent can click using `data-testchimp-el="[N]"` (we inject it temporarily)
|
|
161
|
+
- This ensures agent clicks the exact right element
|
|
162
|
+
|
|
163
|
+
B. **For Generated Script (Output):**
|
|
164
|
+
- Must use NATIVE selectors that work without our attributes
|
|
165
|
+
- Script will run on real application without data-testchimp-el
|
|
166
|
+
|
|
167
|
+
**Selector Resolution for Script Output (in order):**
|
|
168
|
+
1. Semantic selectors - `getByRole()`, `getByLabel()` (BEST - maintainable)
|
|
169
|
+
2. CSS ID - `#element-id` (GOOD - stable)
|
|
170
|
+
3. CSS class - `.button-primary` (scoped to context if ambiguous)
|
|
171
|
+
4. Contextual selector - `header .menu-toggle svg` (LAST RESORT)
|
|
172
|
+
|
|
173
|
+
**For each selector, validate:**
|
|
174
|
+
- Element exists on page
|
|
175
|
+
- Not covered by higher z-index (using elementFromPoint)
|
|
176
|
+
- Position matches expected (±5% tolerance)
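
The resolution order and the position check can be sketched as two small pure functions. Several things here are assumptions: the `contextual` field and the `pickSelector` name are hypothetical, the tolerance is read as 5 percentage points per axis, and the existence and occlusion checks are left to a caller-supplied `isValid` callback.

```typescript
interface CandidateSelectors {
  semantic: string[]; // e.g. "getByRole('button', {name: 'Menu'})"
  cssId: string | null; // e.g. "#sidebar-toggle"
  cssClass: string | null; // e.g. ".button-primary"
  contextual: string | null; // hypothetical field, e.g. "header .menu-toggle svg"
}

type Strategy = "semantic" | "cssId" | "cssClass" | "contextual";

// Position check: each axis must be within the tolerance (in percentage points).
function withinTolerance(
  actual: { x: number; y: number },
  expected: { x: number; y: number },
  tolerance = 5
): boolean {
  return Math.abs(actual.x - expected.x) <= tolerance && Math.abs(actual.y - expected.y) <= tolerance;
}

// Walk the strategies in the documented order and return the first
// selector the validator accepts, or null when none qualifies.
function pickSelector(
  selectors: CandidateSelectors,
  isValid: (selector: string) => boolean
): { selector: string; strategy: Strategy } | null {
  const ordered: Array<[Strategy, string[]]> = [
    ["semantic", selectors.semantic],
    ["cssId", selectors.cssId ? [selectors.cssId] : []],
    ["cssClass", selectors.cssClass ? [selectors.cssClass] : []],
    ["contextual", selectors.contextual ? [selectors.contextual] : []],
  ];
  for (const [strategy, list] of ordered) {
    for (const selector of list) {
      if (isValid(selector)) return { selector, strategy };
    }
  }
  return null;
}
```

Returning the strategy alongside the selector makes it possible to measure how often the less maintainable fallbacks are used.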
|
|
177
|
+
|
|
178
|
+
**Method:**
|
|
179
|
+
```typescript
|
|
180
|
+
static async resolveIndexToSelector(
|
|
181
|
+
index: number,
|
|
182
|
+
elements: DetectedElement[],
|
|
183
|
+
page: Page
|
|
184
|
+
): Promise<{selector: string, strategy: string}>
|
|
185
|
+
```
|
|
186
|
+
|
|
187
|
+
**Validation:**
|
|
188
|
+
```typescript
|
|
189
|
+
private static async validateInteractable(
|
|
190
|
+
page: Page,
|
|
191
|
+
selector: string,
|
|
192
|
+
expectedCenterPercent: {x: number; y: number}
|
|
193
|
+
): Promise<boolean> {
|
|
194
|
+
// 1. Element exists
|
|
195
|
+
// 2. Non-zero dimensions
|
|
196
|
+
// 3. elementFromPoint(center) === element (z-index check)
|
|
197
|
+
// 4. Position within ±5% tolerance
|
|
198
|
+
}
|
|
199
|
+
```
|
|
200
|
+
|
|
201
|
+
### 2.3: Reusable Utility: VisualMarkerInjector
|
|
202
|
+
|
|
203
|
+
**File:** `runner-core/src/utils/visual-marker-injector.ts`
|
|
204
|
+
|
|
205
|
+
**Purpose:** Inject visual numbered labels on page.
|
|
206
|
+
|
|
207
|
+
**Methods:**
|
|
208
|
+
```typescript
|
|
209
|
+
static async injectNumberedMarkers(page: Page): Promise<DetectedElement[]>
|
|
210
|
+
static async removeMarkers(page: Page): Promise<void>
|
|
211
|
+
static async captureMarkedScreenshot(page: Page, elements: DetectedElement[]): Promise<string>
|
|
212
|
+
```
|
|
213
|
+
|
|
214
|
+
**Visual Markers:**
|
|
215
|
+
- Red gradient background `[1]`, `[2]`, `[3]`
|
|
216
|
+
- Positioned at top-left of element
|
|
217
|
+
- z-index: 999999 (always visible)
|
|
218
|
+
- Inject `data-testchimp-el` attribute on actual element (TEMPORARY - for agent execution only)
|
|
219
|
+
* Used internally to ensure agent clicks correct element
|
|
220
|
+
* Removed after step completion
|
|
221
|
+
* NEVER appears in generated script output
|
|
222
|
+
|
|
223
|
+
### 2.4: Reusable Utility: IndexCommandTranslator
|
|
224
|
+
|
|
225
|
+
**File:** `runner-core/src/utils/index-command-translator.ts`
|
|
226
|
+
|
|
227
|
+
**Purpose:** Translate index commands to Playwright commands with NATIVE selectors (for script generation).
|
|
228
|
+
|
|
229
|
+
**Input:**
|
|
230
|
+
```typescript
|
|
231
|
+
{ action: "CLICK", index: 3 }
|
|
232
|
+
{ action: "FILL", index: 5, value: "alice@example.com" }
|
|
233
|
+
```
|
|
234
|
+
|
|
235
|
+
**Output (MUST use native selectors):**
|
|
236
|
+
```typescript
|
|
237
|
+
"await page.getByRole('button', {name: 'Menu'}).click();"
|
|
238
|
+
"await page.locator('#username').fill('alice@example.com');"
|
|
239
|
+
// OR
|
|
240
|
+
"await page.locator('#sidebar-toggle svg').click();"
|
|
241
|
+
```
|
|
242
|
+
|
|
243
|
+
**NOT this (won't work in generated script):**
|
|
244
|
+
```typescript
|
|
245
|
+
❌ "await page.locator('[data-testchimp-el=\"[3]\"]').click();"
|
|
246
|
+
❌ "await page.locator('[data-testchimp-el=\"[5]\"]').fill('alice@example.com');"
|
|
247
|
+
```
|
|
248
|
+
|
|
249
|
+
**Process:**
|
|
250
|
+
1. **During execution:** Click element using `data-testchimp-el="[index]"` (reliable)
|
|
251
|
+
2. **For script output:** Use SelectorResolver to get NATIVE selector (semantic/id/class)
|
|
252
|
+
3. Generate Playwright command with native selector
|
|
253
|
+
4. Return command string that works on real application
|
|
254
|
+
|
|
255
|
+
**Critical Distinction:**
|
|
256
|
+
- `data-testchimp-el` = Internal execution helper (temporary)
|
|
257
|
+
- Script output = Native selectors (permanent, works standalone)
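
The translation step can be sketched as a parser for the textual commands plus a formatter for the script line. This assumes the agent emits the `CLICK[3]` and `FILL[5, "text"]` forms shown in the architecture diagram and that the native locator expression has already been resolved; `parseIndexCommand` and `toScriptLine` are hypothetical names, and escaped quotes inside the fill value are not handled.

```typescript
type IndexCommand =
  | { action: "CLICK"; index: number }
  | { action: "FILL"; index: number; value: string };

// Parse CLICK[3] and FILL[5, "text"]; anything else returns null.
function parseIndexCommand(text: string): IndexCommand | null {
  const trimmed = text.trim();
  const click = /^CLICK\[(\d+)\]$/.exec(trimmed);
  if (click) return { action: "CLICK", index: Number(click[1]) };
  const fill = /^FILL\[(\d+),\s*"([^"]*)"\]$/.exec(trimmed);
  if (fill) return { action: "FILL", index: Number(fill[1]), value: fill[2] ?? "" };
  return null;
}

// Build the script line from a native locator expression such as
// "getByRole('button', {name: 'Menu'})" or "locator('#username')".
// The temporary data attribute is rejected because it does not exist
// outside the agent's own session.
function toScriptLine(command: IndexCommand, nativeLocator: string): string {
  if (nativeLocator.includes("data-testchimp-el")) {
    throw new Error("Generated scripts must not reference data-testchimp-el");
  }
  if (command.action === "CLICK") {
    return `await page.${nativeLocator}.click();`;
  }
  // JSON.stringify yields a double-quoted, escaped string literal.
  return `await page.${nativeLocator}.fill(${JSON.stringify(command.value)});`;
}
```

Rejecting the data attribute at this boundary is one way to enforce the distinction between the execution helper and the script output.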
|
|
258
|
+
|
|
259
|
+
### 2.5: Integration - Three-Tier System (Optimized Escalation)
|
|
260
|
+
|
|
261
|
+
**File:** `orchestrator/orchestrator-agent.ts`
|
|
262
|
+
|
|
263
|
+
**Optimized Strategy:** Escalate quickly to avoid wasting time on difficult tasks
|
|
264
|
+
|
|
265
|
+
**Mode Determination:**
|
|
266
|
+
```typescript
|
|
267
|
+
let tier: 1 | 2 | 3;
|
|
268
|
+
|
|
269
|
+
// Tier 1: Try normal selectors ONCE (iteration 1)
|
|
270
|
+
// Tier 2: Use indexed elements TWICE (iterations 2-3)
|
|
271
|
+
// Tier 3: Use percentage coords (iterations 4+)
|
|
272
|
+
|
|
273
|
+
if (iteration >= 4) {
|
|
274
|
+
tier = 3; // Coordinate mode
|
|
275
|
+
} else if (iteration >= 2) {
|
|
276
|
+
tier = 2; // Index command mode
|
|
277
|
+
} else {
|
|
278
|
+
tier = 1; // Normal Playwright selector mode
|
|
279
|
+
}
|
|
280
|
+
```
|
|
281
|
+
|
|
282
|
+
**Rationale:**
|
|
283
|
+
- Simple tasks: Succeed in Tier 1 (iteration 1) - fast!
|
|
284
|
+
- Medium tasks: Tier 2 gives 2 attempts with reliable index system
|
|
285
|
+
- Hard tasks: Tier 3 coordinates as absolute fallback
|
|
286
|
+
- No wasted iterations on difficult element detection
|
|
287
|
+
|
|
288
|
+
**Tier Preparation:**
|
|
289
|
+
```typescript
|
|
290
|
+
if (tier === 2) await this.prepareIndexMode(context, page);
|
|
291
|
+
if (tier === 3) await this.prepareCoordinateMode(context, page);
|
|
292
|
+
```
|
|
293
|
+
|
|
294
|
+
**System Prompt Selection:**
|
|
295
|
+
```typescript
|
|
296
|
+
const systemPrompt =
|
|
297
|
+
tier === 3 ? this.buildCoordinateSystemPrompt() :
|
|
298
|
+
tier === 2 ? this.buildIndexSystemPrompt() :
|
|
299
|
+
this.buildSystemPrompt();
|
|
300
|
+
```
|
|
301
|
+
|
|
302
|
+
**Execution Flow:**
|
|
303
|
+
```typescript
|
|
304
|
+
// Tier 1 (iteration 1):
|
|
305
|
+
if (decision.commands) {
|
|
306
|
+
// Execute normal Playwright command
|
|
307
|
+
// If fails → iteration++, move to Tier 2
|
|
308
|
+
}
|
|
309
|
+
|
|
310
|
+
// Tier 2 (iterations 2-3):
|
|
311
|
+
if (decision.indexCommand) {
|
|
312
|
+
// Step A: Click using data-testchimp-el="[N]" (execution)
|
|
313
|
+
await page.locator('[data-testchimp-el="[3]"]').click();
|
|
314
|
+
|
|
315
|
+
// Step B: Resolve to native selector (script generation)
|
|
316
|
+
const { selector: nativeSelector } = await SelectorResolver.resolveIndexToSelector(3, elements, page);
|
|
317
|
+
// → Returns: "getByRole('button', {name: 'Menu'})"
|
|
318
|
+
|
|
319
|
+
// Step C: Add to generated script
|
|
320
|
+
commandsExecuted.push(`await page.${nativeSelector}.click();`);
|
|
321
|
+
|
|
322
|
+
// If fails after 2 attempts → iteration++, move to Tier 3
|
|
323
|
+
}
|
|
324
|
+
|
|
325
|
+
// Tier 3 (iterations 4+):
|
|
326
|
+
if (decision.coordinateAction) {
|
|
327
|
+
// Convert % to pixels and execute
|
|
328
|
+
// Add coordinate commands to script (acceptable for edge cases)
|
|
329
|
+
}
|
|
330
|
+
```
|
|
331
|
+
|
|
332
|
+
---
|
|
333
|
+
|
|
334
|
+
## Utilities Summary
|
|
335
|
+
|
|
336
|
+
All utilities are **stateless and reusable**:
|
|
337
|
+
|
|
338
|
+
| Utility | Purpose | Reusable For |
|
|
339
|
+
|---------|---------|--------------|
|
|
340
|
+
| ElementDetector | Find interactive elements | Accessibility audits, page analysis |
|
|
341
|
+
| SelectorResolver | Index → selector with validation | Any numbered system |
|
|
342
|
+
| VisualMarkerInjector | Add visual labels | Manual testing, debugging |
|
|
343
|
+
| IndexCommandTranslator | Index command → Playwright | Any index-based automation |
|
|
344
|
+
| CoordinateConverter | Percentage → pixels | Any coordinate system |
|
|
345
|
+
|
|
346
|
+
---
|
|
347
|
+
|
|
348
|
+
## Implementation Timeline
|
|
349
|
+
|
|
350
|
+
### Week 1: Phase 1 Core
|
|
351
|
+
- [ ] NoteToFutureSelf type and tracking
|
|
352
|
+
- [ ] CoordinateAction with percentages
|
|
353
|
+
- [ ] CoordinateConverter utility
|
|
354
|
+
- [ ] Coordinate mode switching (tier 3)
|
|
355
|
+
|
|
356
|
+
### Week 2: Phase 1 Testing
|
|
357
|
+
- [ ] Test note-to-self on 10 scenarios
|
|
358
|
+
- [ ] Test percentage coordinates at multiple viewport sizes
|
|
359
|
+
- [ ] Verify all coordinate actions (click, fill, drag, scroll, hover)
|
|
360
|
+
|
|
361
|
+
### Week 3: Phase 2 Utilities
|
|
362
|
+
- [ ] ElementDetector with z-index awareness
|
|
363
|
+
- [ ] SelectorResolver with occlusion validation
|
|
364
|
+
- [ ] Test utilities standalone on complex pages
|
|
365
|
+
|
|
366
|
+
### Week 4: Phase 2 Integration
|
|
367
|
+
- [ ] VisualMarkerInjector
|
|
368
|
+
- [ ] IndexCommandTranslator (TWO-STAGE: execution via data-attr, script via native selector)
|
|
369
|
+
- [ ] Index mode (tier 2) integration with iteration-based switching
|
|
370
|
+
- [ ] Optimized escalation: iteration 1 → tier 1, iteration 2-3 → tier 2, iteration 4+ → tier 3
|
|
371
|
+
- [ ] Test PeopleHR with tier 2 (should succeed in iteration 2-3)
|
|
372
|
+
|
|
373
|
+
### Week 5: Phase 2 Testing
|
|
374
|
+
- [ ] Three-tier end-to-end testing
|
|
375
|
+
- [ ] Measure tier distribution (target: 70/25/5)
|
|
376
|
+
- [ ] A/B test vs current implementation
|
|
377
|
+
- [ ] Performance optimization
|
|
378
|
+
|
|
379
|
+
## Success Metrics
|
|
380
|
+
|
|
381
|
+
**Phase 1:**
|
|
382
|
+
- 20-30% reduction in average iterations per step
|
|
383
|
+
- Note-to-self prevents 40%+ of repeated selector failures
|
|
384
|
+
- Coordinates used in < 5% of scenarios
|
|
385
|
+
|
|
386
|
+
**Phase 2:**
|
|
387
|
+
- 70% scenarios complete in Tier 1 (iteration 1) - simple cases
|
|
388
|
+
- 25% scenarios use Tier 2 (iterations 2-3) - complex UIs with icons/shadows
|
|
389
|
+
- < 5% scenarios escalate to Tier 3 (iterations 4+) - impossible selector cases
|
|
390
|
+
- **PeopleHR hamburger menu:** Succeeds in Tier 2 iteration 2 with CLICK[N]
|
|
391
|
+
- **Average iterations per step:** Should decrease from ~4 to ~1.5
|
|
392
|
+
|
|
393
|
+
## Total Effort: 4-5 weeks
|
|
394
|
+
|
|
395
|
+
**Ready to implement?**
|
|
396
|
+
|
|
@@ -0,0 +1,183 @@
|
|
|
1
|
+
# What's New in Runner-Core v0.0.33
|
|
2
|
+
|
|
3
|
+
## Phase 1: Tactical Improvements - COMPLETE ✅
|
|
4
|
+
|
|
5
|
+
---
|
|
6
|
+
|
|
7
|
+
## 1. Note to Future Self (Cross-Step Memory)
|
|
8
|
+
|
|
9
|
+
**The agent can now leave notes that persist across the entire scenario journey.**
|
|
10
|
+
|
|
11
|
+
### How it works:
|
|
12
|
+
```
|
|
13
|
+
// Step 1 - Login
|
|
14
|
+
Agent: "Cookie modal appears after 2s. Dismiss it before interacting."
|
|
15
|
+
→ Stored in memory.latestNote
|
|
16
|
+
|
|
17
|
+
// Step 2 - Navigate to Dashboard
|
|
18
|
+
Agent reads note from Step 1
|
|
19
|
+
Agent: "Waiting 2s for cookie modal..."
|
|
20
|
+
→ Dismisses modal proactively
|
|
21
|
+
```
|
|
22
|
+
|
|
23
|
+
### Scope:
|
|
24
|
+
- ✅ Across iterations (within same step)
|
|
25
|
+
- ✅ Across steps (entire scenario)
|
|
26
|
+
- ✅ Free-form text (agent decides what's important)
|
|
27
|
+
|
|
28
|
+
### Example notes:
|
|
29
|
+
- **Tactical:** "Tried #menu, failed. Try SVG child next."
|
|
30
|
+
- **Strategic:** "This app uses shadow DOM. Prefer CSS selectors over getByRole."
|
|
31
|
+
- **Behavioral:** "Modals load after 2s delay. Wait before clicking."
|
|
32
|
+
|
|
33
|
+
---
|
|
34
|
+
|
|
35
|
+
## 2. 🎯 Percentage-Based Coordinate Fallback
|
|
36
|
+
|
|
37
|
+
**When selectors fail, use visual positioning as last resort.**
|
|
38
|
+
|
|
39
|
+
### Precision:
|
|
40
|
+
- 3 decimal places (e.g., 15.755%, 8.500%)
|
|
41
|
+
- ~1 pixel accuracy on most screens
|
|
42
|
+
- Resolution-independent
|
|
43
|
+
|
|
44
|
+
### Supported Actions:
|
|
45
|
+
- **Click:** `{action: "click", xPercent: 15.755, yPercent: 8.500}`
|
|
46
|
+
- **Fill:** `{action: "fill", xPercent: 30.000, yPercent: 25.000, value: "text"}`
|
|
47
|
+
- **Drag:** `{action: "drag", xPercent: 10.000, yPercent: 50.000, toXPercent: 60.000, toYPercent: 50.000}`
|
|
48
|
+
- **Hover, RightClick, DoubleClick, Scroll**
|
|
49
|
+
|
|
50
|
+
### Auto-Activation:
|
|
51
|
+
- Triggers after 3 consecutive selector failures
|
|
52
|
+
- Limited to 2 coordinate attempts
|
|
53
|
+
- Then gives up (stuck)
|
|
54
|
+
|
|
55
|
+
---
|
|
56
|
+
|
|
57
|
+
## 3. ⚡ Optimized Iteration Budget
|
|
58
|
+
|
|
59
|
+
**Maximum 5 iterations per step** (down from 8)
|
|
60
|
+
|
|
61
|
+
```
|
|
62
|
+
Iterations 1-3: Playwright selectors (3 attempts)
|
|
63
|
+
with note-to-self between each
|
|
64
|
+
|
|
65
|
+
Iterations 4-5: Coordinates (2 attempts max)
|
|
66
|
+
If both fail → stuck
|
|
67
|
+
```
|
|
68
|
+
|
|
69
|
+
**Why:** Coordinates either work or don't - no point retrying 5+ times.
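
The budget above can be expressed as a small lookup from iteration number to mode. This is a sketch under the assumption that every earlier iteration failed, so iteration count and consecutive failures coincide; the function and constant names are hypothetical.

```typescript
type Mode = "selector" | "coordinate" | "stuck";

// Numbers taken from the budget above; the constant names are hypothetical.
const SELECTOR_ATTEMPTS = 3;
const COORDINATE_ATTEMPTS = 2;

// Map a 1-based iteration number to the mode the agent should run in.
function modeForIteration(iteration: number): Mode {
  if (!Number.isInteger(iteration) || iteration < 1) {
    throw new RangeError("iteration must be a positive integer");
  }
  if (iteration <= SELECTOR_ATTEMPTS) return "selector";
  if (iteration <= SELECTOR_ATTEMPTS + COORDINATE_ATTEMPTS) return "coordinate";
  return "stuck";
}
```

For example, `modeForIteration(4)` returns `"coordinate"`, and any iteration past 5 returns `"stuck"`.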
|
|
70
|
+
|
|
71
|
+
---
|
|
72
|
+
|
|
73
|
+
## 4. Smart Timeout Handling (Earlier Fix)
|
|
74
|
+
|
|
75
|
+
**Navigation operations now have appropriate timeouts:**
|
|
76
|
+
- `waitForLoadState()`: 30 seconds (was 5s)
|
|
77
|
+
- `goto()`: 30 seconds
|
|
78
|
+
- Element operations: 5 seconds (unchanged)
|
|
79
|
+
|
|
80
|
+
**Automatic detection:** The code scans each command for navigation keywords.
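
A minimal sketch of keyword-based timeout selection. The keyword list is limited to the two calls named above (`waitForLoadState` and `goto`); the actual list used by the package is not shown here, and the function and constant names are hypothetical.

```typescript
const NAVIGATION_TIMEOUT_MS = 30000;
const ELEMENT_TIMEOUT_MS = 5000;

// Only the two calls named in this section; a real list may be longer.
const NAVIGATION_KEYWORDS = ["waitForLoadState", "goto"];

// Pick a timeout by checking whether the command text calls a navigation method.
function timeoutForCommand(command: string): number {
  const isNavigation = NAVIGATION_KEYWORDS.some((keyword) => command.includes(`.${keyword}(`));
  return isNavigation ? NAVIGATION_TIMEOUT_MS : ELEMENT_TIMEOUT_MS;
}
```

Matching on `.keyword(` rather than the bare word avoids treating a selector that merely contains the text "goto" as a navigation call.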
|
|
81
|
+
|
|
82
|
+
---
|
|
83
|
+
|
|
84
|
+
## How Notes Work Across Steps
|
|
85
|
+
|
|
86
|
+
### Example Scenario:
|
|
87
|
+
|
|
88
|
+
```
|
|
89
|
+
Step 1: Login
|
|
90
|
+
Iteration 1: Fill username → Success
|
|
91
|
+
Iteration 2: Fill password → Success
|
|
92
|
+
Iteration 3: Click login → Success
|
|
93
|
+
Agent note: "Login redirects to dashboard. Cookie modal appears after 2s."
|
|
94
|
+
|
|
95
|
+
Step 2: Navigate to Settings
|
|
96
|
+
Reads note from Step 1: "Cookie modal appears after 2s"
|
|
97
|
+
Iteration 1:
|
|
98
|
+
- Wait 2s
|
|
99
|
+
- Dismiss modal
|
|
100
|
+
- Click Settings
|
|
101
|
+
→ Success in 1 iteration! (note prevented wasted attempts)
|
|
102
|
+
```
|
|
103
|
+
|
|
104
|
+
**Benefit:** Agent builds up knowledge about the application and uses it in future steps.
|
|
105
|
+
|
|
106
|
+
---
|
|
107
|
+
|
|
108
|
+
## Comparison: Before vs After
|
|
109
|
+
|
|
110
|
+
| Aspect | Before (v0.0.32) | After (v0.0.33) |
|
|
111
|
+
|--------|------------------|------------------|
|
|
112
|
+
| Iteration memory | None | Note to self (cross-step) |
|
|
113
|
+
| Selector fails | Give up or loop | Coordinate fallback |
|
|
114
|
+
| Max iterations | 8 per step | 5 per step |
|
|
115
|
+
| Timeout handling | 5s for all | 30s for navigation |
|
|
116
|
+
| Coordinate support | None | Full (click, fill, drag, etc.) |
|
|
117
|
+
| Average iterations | ~4 per step | ~2.5 per step (estimated) |
|
|
118
|
+
|
|
119
|
+
---
|
|
120
|
+
|
|
121
|
+
## Testing Recommendations
|
|
122
|
+
|
|
123
|
+
### Test 1: Note Continuity
|
|
124
|
+
Create a scenario with repeated patterns:
|
|
125
|
+
```
|
|
126
|
+
- Login
|
|
127
|
+
- Go to page A → encounter modal
|
|
128
|
+
- Go to page B → should handle modal proactively
|
|
129
|
+
```
|
|
130
|
+
|
|
131
|
+
**Expected:** The page B step handles the modal proactively, using the note written during the page A step.
|
|
132
|
+
|
|
133
|
+
### Test 2: Coordinate Fallback
|
|
134
|
+
Run PeopleHR scenario:
|
|
135
|
+
```
|
|
136
|
+
- Click hamburger menu (SVG icon)
|
|
137
|
+
```
|
|
138
|
+
|
|
139
|
+
**Expected:**
|
|
140
|
+
- Iterations 1-3: Try selectors (may fail)
|
|
141
|
+
- Iteration 4: Coordinates → succeeds
|
|
142
|
+
- Generated script: `await page.mouse.click(x, y);`
|
|
143
|
+
|
|
144
|
+
### Test 3: Timeout Fix
|
|
145
|
+
Any scenario with:
|
|
146
|
+
```
|
|
147
|
+
- await page.waitForLoadState('networkidle');
|
|
148
|
+
```
|
|
149
|
+
|
|
150
|
+
**Expected:** No more 5s timeout errors.
|
|
151
|
+
|
|
152
|
+
---
|
|
153
|
+
|
|
154
|
+
## Migration
|
|
155
|
+
|
|
156
|
+
**No code changes needed!** Existing code works as-is with improvements.
|
|
157
|
+
|
|
158
|
+
**New response fields** (optional):
|
|
159
|
+
- `noteToFutureSelf`: string (agent can optionally include)
|
|
160
|
+
- `coordinateAction`: object (only when coordinate mode active)
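
A consumer that wants to read the two optional fields defensively might do something like the following. The field names come from the list above and from the coordinate examples earlier in this document; the `readOptionalFields` helper and the exact validation rules are assumptions, not the package's decision parser.

```typescript
interface CoordinateAction {
  action: string;
  xPercent: number;
  yPercent: number;
  toXPercent?: number;
  toYPercent?: number;
  value?: string;
}

interface OptionalDecisionFields {
  noteToFutureSelf?: string;
  coordinateAction?: CoordinateAction;
}

// Extract the optional fields from an untyped response, ignoring anything malformed.
function readOptionalFields(raw: unknown): OptionalDecisionFields {
  const result: OptionalDecisionFields = {};
  if (typeof raw !== "object" || raw === null) return result;
  const obj = raw as Record<string, unknown>;

  const note = obj["noteToFutureSelf"];
  if (typeof note === "string") result.noteToFutureSelf = note;

  const candidate = obj["coordinateAction"];
  if (typeof candidate === "object" && candidate !== null) {
    const c = candidate as Record<string, unknown>;
    const action = c["action"];
    const x = c["xPercent"];
    const y = c["yPercent"];
    if (typeof action === "string" && typeof x === "number" && typeof y === "number") {
      const parsed: CoordinateAction = { action, xPercent: x, yPercent: y };
      const toX = c["toXPercent"];
      const toY = c["toYPercent"];
      const value = c["value"];
      if (typeof toX === "number") parsed.toXPercent = toX;
      if (typeof toY === "number") parsed.toYPercent = toY;
      if (typeof value === "string") parsed.value = value;
      result.coordinateAction = parsed;
    }
  }
  return result;
}
```

Because both fields are optional, a response that omits them produces an empty result, which matches the statement that existing code keeps working unchanged.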
|
|
161
|
+
|
|
162
|
+
---
|
|
163
|
+
|
|
164
|
+
## What's Next: Phase 2
|
|
165
|
+
|
|
166
|
+
Phase 2 will add numbered element system for even better reliability:
|
|
167
|
+
- Iteration 1: Playwright selector (1 attempt)
|
|
168
|
+
- Iterations 2-3: Index commands CLICK[3] (2 attempts)
|
|
169
|
+
- Iterations 4-5: Coordinates (2 attempts)
|
|
170
|
+
|
|
171
|
+
**Target:** ~1.5 average iterations per step
|
|
172
|
+
|
|
173
|
+
---
|
|
174
|
+
|
|
175
|
+
## Status
|
|
176
|
+
|
|
177
|
+
✅ **Built and Ready**
|
|
178
|
+
📦 **Version:** v0.0.33
|
|
179
|
+
🧪 **Status:** Ready for testing
|
|
180
|
+
**Expected Impact:** 30-40% reduction in iterations
|
|
181
|
+
|
|
182
|
+
**Test now to validate improvements before Phase 2!**
|
|
183
|
+
|