testchimp-runner-core 0.0.33 → 0.0.35

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (152)
  1. package/dist/execution-service.d.ts +1 -4
  2. package/dist/execution-service.d.ts.map +1 -1
  3. package/dist/execution-service.js +155 -468
  4. package/dist/execution-service.js.map +1 -1
  5. package/dist/index.d.ts +3 -1
  6. package/dist/index.d.ts.map +1 -1
  7. package/dist/index.js +11 -1
  8. package/dist/index.js.map +1 -1
  9. package/dist/llm-facade.d.ts.map +1 -1
  10. package/dist/llm-facade.js +7 -7
  11. package/dist/llm-facade.js.map +1 -1
  12. package/dist/llm-provider.d.ts +9 -0
  13. package/dist/llm-provider.d.ts.map +1 -1
  14. package/dist/model-constants.d.ts +16 -5
  15. package/dist/model-constants.d.ts.map +1 -1
  16. package/dist/model-constants.js +17 -6
  17. package/dist/model-constants.js.map +1 -1
  18. package/dist/orchestrator/decision-parser.d.ts +18 -0
  19. package/dist/orchestrator/decision-parser.d.ts.map +1 -0
  20. package/dist/orchestrator/decision-parser.js +127 -0
  21. package/dist/orchestrator/decision-parser.js.map +1 -0
  22. package/dist/orchestrator/index.d.ts +4 -2
  23. package/dist/orchestrator/index.d.ts.map +1 -1
  24. package/dist/orchestrator/index.js +15 -2
  25. package/dist/orchestrator/index.js.map +1 -1
  26. package/dist/orchestrator/orchestrator-agent.d.ts +17 -22
  27. package/dist/orchestrator/orchestrator-agent.d.ts.map +1 -1
  28. package/dist/orchestrator/orchestrator-agent.js +708 -577
  29. package/dist/orchestrator/orchestrator-agent.js.map +1 -1
  30. package/dist/orchestrator/orchestrator-prompts.d.ts +32 -0
  31. package/dist/orchestrator/orchestrator-prompts.d.ts.map +1 -0
  32. package/dist/orchestrator/orchestrator-prompts.js +737 -0
  33. package/dist/orchestrator/orchestrator-prompts.js.map +1 -0
  34. package/dist/orchestrator/page-som-handler.d.ts +106 -0
  35. package/dist/orchestrator/page-som-handler.d.ts.map +1 -0
  36. package/dist/orchestrator/page-som-handler.js +1353 -0
  37. package/dist/orchestrator/page-som-handler.js.map +1 -0
  38. package/dist/orchestrator/som-types.d.ts +149 -0
  39. package/dist/orchestrator/som-types.d.ts.map +1 -0
  40. package/dist/orchestrator/som-types.js +87 -0
  41. package/dist/orchestrator/som-types.js.map +1 -0
  42. package/dist/orchestrator/tool-registry.d.ts +2 -0
  43. package/dist/orchestrator/tool-registry.d.ts.map +1 -1
  44. package/dist/orchestrator/tool-registry.js.map +1 -1
  45. package/dist/orchestrator/tools/index.d.ts +5 -1
  46. package/dist/orchestrator/tools/index.d.ts.map +1 -1
  47. package/dist/orchestrator/tools/index.js +9 -2
  48. package/dist/orchestrator/tools/index.js.map +1 -1
  49. package/dist/orchestrator/tools/refresh-som-markers.d.ts +12 -0
  50. package/dist/orchestrator/tools/refresh-som-markers.d.ts.map +1 -0
  51. package/dist/orchestrator/tools/refresh-som-markers.js +64 -0
  52. package/dist/orchestrator/tools/refresh-som-markers.js.map +1 -0
  53. package/dist/orchestrator/tools/verify-action-result.d.ts +17 -0
  54. package/dist/orchestrator/tools/verify-action-result.d.ts.map +1 -0
  55. package/dist/orchestrator/tools/verify-action-result.js +140 -0
  56. package/dist/orchestrator/tools/verify-action-result.js.map +1 -0
  57. package/dist/orchestrator/tools/view-previous-screenshot.d.ts +15 -0
  58. package/dist/orchestrator/tools/view-previous-screenshot.d.ts.map +1 -0
  59. package/dist/orchestrator/tools/view-previous-screenshot.js +92 -0
  60. package/dist/orchestrator/tools/view-previous-screenshot.js.map +1 -0
  61. package/dist/orchestrator/types.d.ts +49 -1
  62. package/dist/orchestrator/types.d.ts.map +1 -1
  63. package/dist/orchestrator/types.js +11 -1
  64. package/dist/orchestrator/types.js.map +1 -1
  65. package/dist/prompts.d.ts.map +1 -1
  66. package/dist/prompts.js +40 -34
  67. package/dist/prompts.js.map +1 -1
  68. package/dist/scenario-service.d.ts +5 -0
  69. package/dist/scenario-service.d.ts.map +1 -1
  70. package/dist/scenario-service.js +17 -0
  71. package/dist/scenario-service.js.map +1 -1
  72. package/dist/scenario-worker-class.d.ts +4 -0
  73. package/dist/scenario-worker-class.d.ts.map +1 -1
  74. package/dist/scenario-worker-class.js +21 -3
  75. package/dist/scenario-worker-class.js.map +1 -1
  76. package/dist/testing/agent-tester.d.ts +35 -0
  77. package/dist/testing/agent-tester.d.ts.map +1 -0
  78. package/dist/testing/agent-tester.js +84 -0
  79. package/dist/testing/agent-tester.js.map +1 -0
  80. package/dist/testing/ref-translator-tester.d.ts +44 -0
  81. package/dist/testing/ref-translator-tester.d.ts.map +1 -0
  82. package/dist/testing/ref-translator-tester.js +104 -0
  83. package/dist/testing/ref-translator-tester.js.map +1 -0
  84. package/dist/utils/coordinate-converter.d.ts +32 -0
  85. package/dist/utils/coordinate-converter.d.ts.map +1 -0
  86. package/dist/utils/coordinate-converter.js +130 -0
  87. package/dist/utils/coordinate-converter.js.map +1 -0
  88. package/dist/utils/hierarchical-selector.d.ts +47 -0
  89. package/dist/utils/hierarchical-selector.d.ts.map +1 -0
  90. package/dist/utils/hierarchical-selector.js +212 -0
  91. package/dist/utils/hierarchical-selector.js.map +1 -0
  92. package/dist/utils/page-info-retry.d.ts +14 -0
  93. package/dist/utils/page-info-retry.d.ts.map +1 -0
  94. package/dist/utils/page-info-retry.js +60 -0
  95. package/dist/utils/page-info-retry.js.map +1 -0
  96. package/dist/utils/page-info-utils.d.ts +1 -0
  97. package/dist/utils/page-info-utils.d.ts.map +1 -1
  98. package/dist/utils/page-info-utils.js +46 -18
  99. package/dist/utils/page-info-utils.js.map +1 -1
  100. package/dist/utils/ref-attacher.d.ts +21 -0
  101. package/dist/utils/ref-attacher.d.ts.map +1 -0
  102. package/dist/utils/ref-attacher.js +149 -0
  103. package/dist/utils/ref-attacher.js.map +1 -0
  104. package/dist/utils/ref-translator.d.ts +49 -0
  105. package/dist/utils/ref-translator.d.ts.map +1 -0
  106. package/dist/utils/ref-translator.js +276 -0
  107. package/dist/utils/ref-translator.js.map +1 -0
  108. package/package.json +1 -1
  109. package/plandocs/BEFORE_AFTER_VERIFICATION.md +148 -0
  110. package/plandocs/COORDINATE_MODE_DIAGNOSIS.md +144 -0
  111. package/plandocs/IMPLEMENTATION_STATUS.md +108 -0
  112. package/plandocs/PHASE_1_COMPLETE.md +165 -0
  113. package/plandocs/PHASE_1_SUMMARY.md +184 -0
  114. package/plandocs/PROMPT_OPTIMIZATION_ANALYSIS.md +120 -0
  115. package/plandocs/PROMPT_SANITY_CHECK.md +120 -0
  116. package/plandocs/SESSION_SUMMARY_v0.0.33.md +151 -0
  117. package/plandocs/TROUBLESHOOTING_SESSION.md +72 -0
  118. package/plandocs/VISUAL_AGENT_EVOLUTION_PLAN.md +396 -0
  119. package/plandocs/WHATS_NEW_v0.0.33.md +183 -0
  120. package/plandocs/exploratory-mode-support-v2.plan.md +953 -0
  121. package/plandocs/exploratory-mode-support.plan.md +928 -0
  122. package/plandocs/journey-id-tracking-addendum.md +227 -0
  123. package/src/execution-service.ts +179 -596
  124. package/src/index.ts +10 -0
  125. package/src/llm-facade.ts +8 -8
  126. package/src/llm-provider.ts +11 -1
  127. package/src/model-constants.ts +17 -5
  128. package/src/orchestrator/decision-parser.ts +139 -0
  129. package/src/orchestrator/index.ts +27 -2
  130. package/src/orchestrator/orchestrator-agent.ts +868 -623
  131. package/src/orchestrator/orchestrator-prompts.ts +786 -0
  132. package/src/orchestrator/page-som-handler.ts +1565 -0
  133. package/src/orchestrator/som-types.ts +188 -0
  134. package/src/orchestrator/tool-registry.ts +2 -0
  135. package/src/orchestrator/tools/index.ts +5 -1
  136. package/src/orchestrator/tools/refresh-som-markers.ts +69 -0
  137. package/src/orchestrator/tools/verify-action-result.ts +159 -0
  138. package/src/orchestrator/tools/view-previous-screenshot.ts +103 -0
  139. package/src/orchestrator/types.ts +95 -4
  140. package/src/prompts.ts +40 -34
  141. package/src/scenario-service.ts +20 -0
  142. package/src/scenario-worker-class.ts +30 -4
  143. package/src/utils/coordinate-converter.ts +162 -0
  144. package/src/utils/page-info-retry.ts +65 -0
  145. package/src/utils/page-info-utils.ts +53 -18
  146. package/testchimp-runner-core-0.0.35.tgz +0 -0
  147. /package/{CREDIT_CALLBACK_ARCHITECTURE.md → plandocs/CREDIT_CALLBACK_ARCHITECTURE.md} +0 -0
  148. /package/{INTEGRATION_COMPLETE.md → plandocs/INTEGRATION_COMPLETE.md} +0 -0
  149. /package/{VISION_DIAGNOSTICS_IMPROVEMENTS.md → plandocs/VISION_DIAGNOSTICS_IMPROVEMENTS.md} +0 -0
  150. /package/{RELEASE_0.0.26.md → releasenotes/RELEASE_0.0.26.md} +0 -0
  151. /package/{RELEASE_0.0.27.md → releasenotes/RELEASE_0.0.27.md} +0 -0
  152. /package/{RELEASE_0.0.28.md → releasenotes/RELEASE_0.0.28.md} +0 -0
@@ -0,0 +1,396 @@
# Runner-Core Visual Agent Evolution - Complete Implementation Plan

## Overview

Two-phase pragmatic evolution without major architecture overhaul:
- **Phase 1 (Weeks 1-2):** Percentage coordinates + Free-form notes
- **Phase 2 (Weeks 3-5):** Numbered element system with three-tier fallback

---

## PHASE 1: Tactical Improvements

### 1A: Free-Form "Note to Future Self"

**Why:** The agent needs tactical memory between iterations of the SAME step.

**Type:**
```typescript
interface NoteToFutureSelf {
  fromIteration: number;
  content: string; // FREE-FORM - agent writes whatever it wants
}
```

**Examples:**
- "Tried #sidebar-toggle, failed. Will try SVG child next."
- "Plan: Hover over menu first to reveal dropdown, then click Settings."
- "Cookie banner blocking. Next: dismiss it, then retry main action."

**vs. Current Learnings:**
- Learnings = App-wide patterns ("App uses getByRole")
- Note to self = Iteration-specific tactics ("Just tried X, will try Y next")

**Keep BOTH!**

### 1B: Percentage-Based Coordinates

**LLM outputs percentages:**
```jsonc
{
  "coordinateAction": {
    "action": "click|fill|drag|hover|scroll",
    "xPercent": 15.75, // 2 decimal precision for accuracy
    "yPercent": 8.50,

    // For drag:
    "toXPercent": 45.25,
    "toYPercent": 8.50,

    // For fill:
    "value": "alice@example.com"
  }
}
```

**Code converts to pixels:**
```typescript
const viewport = await page.evaluate(() => ({ width: window.innerWidth, height: window.innerHeight }));
const x = Math.round((xPercent / 100) * viewport.width);
const y = Math.round((yPercent / 100) * viewport.height);
```
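
A minimal sketch of what `utils/coordinate-converter.ts` could contain, assuming the percentages are relative to the visible viewport. The `Viewport`, `PercentPoint`, and `PixelPoint` types and the `percentToPixels` name are hypothetical, and clamping out-of-range answers to the viewport edge is a design assumption rather than something the plan specifies.

```typescript
// Hypothetical types; the package's real CoordinateConverter API may differ.
interface Viewport {
  width: number;  // CSS pixels, e.g. window.innerWidth
  height: number; // CSS pixels, e.g. window.innerHeight
}

interface PercentPoint {
  xPercent: number; // 0-100, from the left edge of the viewport
  yPercent: number; // 0-100, from the top edge of the viewport
}

interface PixelPoint {
  x: number;
  y: number;
}

// Clamp to [0, 100] so a slightly out-of-range model answer still lands on screen.
function clampPercent(value: number): number {
  if (!isFinite(value)) {
    throw new Error(`Invalid percentage: ${value}`);
  }
  return Math.min(100, Math.max(0, value));
}

// Convert a percentage point into integer pixel coordinates inside the viewport.
function percentToPixels(point: PercentPoint, viewport: Viewport): PixelPoint {
  const x = Math.round((clampPercent(point.xPercent) / 100) * viewport.width);
  const y = Math.round((clampPercent(point.yPercent) / 100) * viewport.height);
  // 100% maps to x === width, one pixel past the last column, so pull it back in.
  return {
    x: Math.min(x, Math.max(0, viewport.width - 1)),
    y: Math.min(y, Math.max(0, viewport.height - 1)),
  };
}
```

On a 1280×720 viewport, `{ xPercent: 15.75, yPercent: 8.5 }` converts to `{ x: 202, y: 61 }`.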

### 1C: Code-Controlled Fallback

- **Tier 1 (iterations 1-3):** Normal Playwright selectors
- **Tier 2 (iterations 4+):** Percentage coordinates (auto-triggered after 3 failures)

```typescript
// In callAgent():
if (consecutiveFailures >= 3) { // Phase 1: only 2 tiers
  // Auto-use the coordinate-specific system prompt
  // LLM outputs percentages
  // Phase 2 will insert indexed elements (the new tier 2) between selectors and coordinates
}
```

**Files:**
- `utils/coordinate-converter.ts` (NEW)
- `orchestrator/orchestrator-agent.ts` (UPDATE)
- `orchestrator/types.ts` (UPDATE - add CoordinateAction, NoteToFutureSelf)
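
The Phase 1 switch can be sketched as a pure function of the failure streak, which makes it testable apart from `callAgent()`. `AgentMode`, `selectPhase1Mode`, and `FailureTracker` are hypothetical names; the threshold of 3 comes from the plan, while resetting the streak on any success is an assumption about what "consecutive" means here.

```typescript
// Hypothetical names for the two Phase 1 tiers.
type AgentMode = "selector" | "coordinate";

// From the plan: switch to coordinates after 3 consecutive selector failures.
const COORDINATE_THRESHOLD = 3;

function selectPhase1Mode(consecutiveFailures: number): AgentMode {
  return consecutiveFailures >= COORDINATE_THRESHOLD ? "coordinate" : "selector";
}

// Hypothetical tracker: a success resets the streak, a failure extends it.
class FailureTracker {
  private streak = 0;

  record(succeeded: boolean): void {
    this.streak = succeeded ? 0 : this.streak + 1;
  }

  get consecutiveFailures(): number {
    return this.streak;
  }

  get mode(): AgentMode {
    return selectPhase1Mode(this.streak);
  }
}
```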

---

## PHASE 2: Numbered Element System

### Architecture

```
Three-Tier Fallback:

Tier 1 (iteration 1 ONLY): Playwright Selector Mode - ONE SHOT
├─ Agent generates: await page.getByRole('button', {name: 'Login'}).click()
├─ Direct execution
└─ Target: ~70% of tasks finish here (simple/medium complexity)

Tier 2 (iterations 2-3): Index Command Mode - TWO ATTEMPTS
├─ Inject numbered markers [1], [2], [3] → screenshot
├─ Agent outputs: CLICK[3], FILL[5, "alice@example.com"]
├─ Execution: Use data-testchimp-el="[3]" (reliable targeting)
├─ Script output: Translate to NATIVE selector (getByRole, #id, etc.)
└─ Target: ~25% of tasks finish here (complex UIs, icons, shadow DOM)

Tier 3 (iterations 4+): Percentage Coordinate Mode - LAST RESORT
├─ Agent outputs: {xPercent: 15.755, yPercent: 8.500}
├─ CoordinateConverter: % → pixels
├─ Execute: mouse.click(x, y)
└─ Target: <5% of tasks need this (extreme edge cases)
```

### 2.1: Reusable Utility: ElementDetector

**File:** `runner-core/src/utils/element-detector.ts`

**Purpose:** Detect ALL interactive elements with z-index and occlusion awareness.

**Key Features:**
- Comprehensive element queries (buttons, links, inputs, SVGs, clickable divs)
- Z-index calculation via `getComputedStyle()`
- Occlusion detection via `elementFromPoint()` at center
- Context tags: `[header|nav|sidebar|main|modal]`
- Spatial tags: `[top|bottom|left|right|center]`
- Center position as percentage (2 decimal precision)

**Output:**
```typescript
interface DetectedElement {
  index: number; // [1], [2], [3]
  tag: string;
  text: string;
  role: string;
  ariaLabel: string;
  bbox: { x: number; y: number; width: number; height: number };
  centerPercent: { x: number; y: number }; // As percentage, e.g. { x: 15.75, y: 8.50 }
  context: string[]; // ['header', 'nav', 'top']
  zIndex: number;
  isVisible: boolean; // Not occluded by higher z-index
  selectors: {
    dataAttribute: string; // [data-testchimp-el="[3]"]
    semantic: string[]; // [getByRole(...), getByLabel(...)]
    cssId: string | null;
    cssClass: string | null;
  };
}
```

**Implementation Highlights:**
- Queries: `button`, `a[href]`, `input`, `svg`, `[onclick]`, `[role=...]`, `[data-testid]`, clickable divs/spans
- Z-index check: `elementFromPoint(centerX, centerY)` must return element or child
- Filter: Only return `isVisible: true` elements
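
The center-percentage and spatial-tag features reduce to arithmetic on the bounding box, so they can be sketched without a browser. The function name `describePosition` and the thresholds (the outer thirds of the viewport count as `top`/`bottom`/`left`/`right`, anything else as `center`) are assumptions for illustration; the plan does not specify them.

```typescript
interface BBox {
  x: number;      // left edge, CSS pixels from the viewport origin
  y: number;      // top edge, CSS pixels from the viewport origin
  width: number;
  height: number;
}

interface PositionInfo {
  centerPercent: { x: number; y: number }; // rounded to 2 decimals
  spatial: string[];                        // subset of top/bottom/left/right/center
}

// Round to 2 decimal places, matching the plan's stated precision.
function round2(value: number): number {
  return Math.round(value * 100) / 100;
}

// Hypothetical helper: derive the percentage center and coarse spatial tags.
function describePosition(
  bbox: BBox,
  viewport: { width: number; height: number },
): PositionInfo {
  const cx = ((bbox.x + bbox.width / 2) / viewport.width) * 100;
  const cy = ((bbox.y + bbox.height / 2) / viewport.height) * 100;

  const spatial: string[] = [];
  if (cy < 100 / 3) spatial.push("top");
  else if (cy > 200 / 3) spatial.push("bottom");
  if (cx < 100 / 3) spatial.push("left");
  else if (cx > 200 / 3) spatial.push("right");
  if (spatial.length === 0) spatial.push("center");

  return { centerPercent: { x: round2(cx), y: round2(cy) }, spatial };
}
```

For a 40×20 icon at (180, 40) on a 1280×720 viewport this gives `centerPercent` `{ x: 15.63, y: 6.94 }` with tags `["top", "left"]`.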

### 2.2: Reusable Utility: SelectorResolver

**File:** `runner-core/src/utils/selector-resolver.ts`

**Purpose:** Given an element index, return the most reliable Playwright selector FOR GENERATED SCRIPTS.

**CRITICAL DISTINCTION:**

A. **During Execution (Internal):**
   - Agent can click using `data-testchimp-el="[N]"` (we inject it temporarily)
   - This ensures the agent clicks exactly the right element

B. **For Generated Script (Output):**
   - Must use NATIVE selectors that work without our attributes
   - Script will run on the real application without data-testchimp-el

**Selector Resolution for Script Output (in order):**
1. Semantic selectors - `getByRole()`, `getByLabel()` (BEST - maintainable)
2. CSS ID - `#element-id` (GOOD - stable)
3. CSS class - `.button-primary` (scoped to context if ambiguous)
4. Contextual selector - `header .menu-toggle svg` (LAST RESORT)

**For each selector, validate:**
- Element exists on page
- Not covered by higher z-index (using elementFromPoint)
- Position matches expected (±5% tolerance)

**Method:**
```typescript
static async resolveIndexToSelector(
  index: number,
  elements: DetectedElement[],
  page: Page
): Promise<{ selector: string; strategy: string }>
```

**Validation:**
```typescript
private static async validateInteractable(
  page: Page,
  selector: string,
  expectedCenterPercent: { x: number; y: number }
): Promise<boolean> {
  // 1. Element exists
  // 2. Non-zero dimensions
  // 3. elementFromPoint(center) returns the element or a descendant (z-index check)
  // 4. Position within ±5% tolerance
}
```
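
The resolution order and the validation checks can be sketched as pure logic, with the browser-side work (existence, size, `elementFromPoint`) hidden behind a probe callback. `Candidate`, `ProbeResult`, `resolveNativeSelector`, and the probe itself are hypothetical; in the real resolver the probe would query the page through Playwright. The tolerance is read here as ±5 percentage points per axis, which is one interpretation of "±5%".

```typescript
type Strategy = "semantic" | "cssId" | "cssClass" | "contextual";

interface Candidate {
  selector: string;
  strategy: Strategy;
}

// What a browser-side probe might report for one selector (hypothetical shape).
interface ProbeResult {
  exists: boolean;
  width: number;
  height: number;
  topmostAtCenter: boolean; // elementFromPoint(center) is the element or a descendant
  centerPercent: { x: number; y: number };
}

const TOLERANCE_PERCENT = 5;
const PRIORITY: Strategy[] = ["semantic", "cssId", "cssClass", "contextual"];

function withinTolerance(expected: { x: number; y: number }, actual: { x: number; y: number }): boolean {
  return (
    Math.abs(expected.x - actual.x) <= TOLERANCE_PERCENT &&
    Math.abs(expected.y - actual.y) <= TOLERANCE_PERCENT
  );
}

// Mirrors the four validation steps listed above.
function isInteractable(probe: ProbeResult, expected: { x: number; y: number }): boolean {
  return (
    probe.exists &&
    probe.width > 0 &&
    probe.height > 0 &&
    probe.topmostAtCenter &&
    withinTolerance(expected, probe.centerPercent)
  );
}

// Return the first candidate, in priority order, that passes validation.
async function resolveNativeSelector(
  candidates: Candidate[],
  expected: { x: number; y: number },
  probe: (selector: string) => Promise<ProbeResult>,
): Promise<Candidate | null> {
  const ordered = [...candidates].sort(
    (a, b) => PRIORITY.indexOf(a.strategy) - PRIORITY.indexOf(b.strategy),
  );
  for (const candidate of ordered) {
    if (isInteractable(await probe(candidate.selector), expected)) {
      return candidate;
    }
  }
  return null; // caller could fall back to Tier 3 coordinates
}
```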

### 2.3: Reusable Utility: VisualMarkerInjector

**File:** `runner-core/src/utils/visual-marker-injector.ts`

**Purpose:** Inject visual numbered labels on the page.

**Methods:**
```typescript
static async injectNumberedMarkers(page: Page): Promise<DetectedElement[]>
static async removeMarkers(page: Page): Promise<void>
static async captureMarkedScreenshot(page: Page, elements: DetectedElement[]): Promise<string>
```

**Visual Markers:**
- Red gradient background `[1]`, `[2]`, `[3]`
- Positioned at top-left of element
- z-index: 999999 (always visible)
- Inject `data-testchimp-el` attribute on actual element (TEMPORARY - for agent execution only)
  * Used internally to ensure agent clicks correct element
  * Removed after step completion
  * NEVER appears in generated script output

### 2.4: Reusable Utility: IndexCommandTranslator

**File:** `runner-core/src/utils/index-command-translator.ts`

**Purpose:** Translate index commands to Playwright commands with NATIVE selectors (for script generation).

**Input:**
```typescript
const inputs = [
  { action: "CLICK", index: 3 },
  { action: "FILL", index: 5, value: "alice@example.com" },
];
```

**Output (MUST use native selectors):**
```typescript
"await page.getByRole('button', {name: 'Menu'}).click();"
"await page.locator('#username').fill('alice@example.com');"
// OR
"await page.locator('#sidebar-toggle svg').click();"
```

**NOT this (won't work in generated script):**
```typescript
❌ "await page.locator('[data-testchimp-el=\"[3]\"]').click();"
❌ "await page.locator('[data-testchimp-el=\"[5]\"]').fill('alice@example.com');"
```

**Process:**
1. **During execution:** Click element using `data-testchimp-el="[index]"` (reliable)
2. **For script output:** Use SelectorResolver to get NATIVE selector (semantic/id/class)
3. Generate Playwright command with native selector
4. Return command string that works on the real application

**Critical Distinction:**
- `data-testchimp-el` = Internal execution helper (temporary)
- Script output = Native selectors (permanent, works standalone)
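
A sketch of the translator's two halves, under stated assumptions: parsing the agent's `CLICK[N]` / `FILL[N, "value"]` text into a structured command, and emitting a script line from a native locator that `SelectorResolver` has already chosen. The accepted grammar, the `IndexCommand` type, and the names `parseIndexCommand`, `executionSelector`, and `toScriptLine` are hypothetical; the package's actual parser may accept more forms.

```typescript
type IndexCommand =
  | { action: "CLICK"; index: number }
  | { action: "FILL"; index: number; value: string };

// Assumed grammar: CLICK[3] and FILL[5, "alice@example.com"].
const CLICK_RE = /^CLICK\[(\d+)\]$/;
const FILL_RE = /^FILL\[(\d+),\s*"((?:[^"\\]|\\.)*)"\]$/;

function parseIndexCommand(text: string): IndexCommand {
  const trimmed = text.trim();
  const click = CLICK_RE.exec(trimmed);
  if (click) {
    return { action: "CLICK", index: Number(click[1]) };
  }
  const fill = FILL_RE.exec(trimmed);
  if (fill) {
    // Reuse JSON string rules to unescape \" and \\ inside the value.
    return { action: "FILL", index: Number(fill[1]), value: JSON.parse(`"${fill[2]}"`) };
  }
  throw new Error(`Unrecognized index command: ${text}`);
}

// Used only during execution; never written to the generated script.
function executionSelector(index: number): string {
  return `[data-testchimp-el="[${index}]"]`;
}

// Quote a value as a single-quoted JS string literal for the generated script.
function jsString(value: string): string {
  return `'${value.replace(/\\/g, "\\\\").replace(/'/g, "\\'")}'`;
}

// nativeLocator is an expression relative to `page`, e.g. "locator('#username')".
function toScriptLine(command: IndexCommand, nativeLocator: string): string {
  if (command.action === "CLICK") {
    return `await page.${nativeLocator}.click();`;
  }
  return `await page.${nativeLocator}.fill(${jsString(command.value)});`;
}
```

In the Tier 2 flow, `executionSelector(index)` would drive the actual click, while `toScriptLine` produces the line appended to the generated script.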

### 2.5: Integration - Three-Tier System (Optimized Escalation)

**File:** `orchestrator/orchestrator-agent.ts`

**Optimized Strategy:** Escalate quickly to avoid wasting time on difficult tasks

**Mode Determination:**
```typescript
let tier: 1 | 2 | 3;

// Tier 1: Try normal selectors ONCE (iteration 1)
// Tier 2: Use indexed elements TWICE (iterations 2-3)
// Tier 3: Use percentage coords (iterations 4+)

if (iteration >= 4) {
  tier = 3; // Coordinate mode
} else if (iteration >= 2) {
  tier = 2; // Index command mode
} else {
  tier = 1; // Normal Playwright selector mode
}
```

**Rationale:**
- Simple tasks: Succeed in Tier 1 (iteration 1) - fast!
- Medium tasks: Tier 2 gives 2 attempts with the reliable index system
- Hard tasks: Tier 3 coordinates as absolute fallback
- No wasted iterations on difficult element detection

**Tier Preparation:**
```typescript
if (tier === 2) await this.prepareIndexMode(context, page);
if (tier === 3) await this.prepareCoordinateMode(context, page);
```

**System Prompt Selection:**
```typescript
const systemPrompt =
  tier === 3 ? this.buildCoordinateSystemPrompt() :
  tier === 2 ? this.buildIndexSystemPrompt() :
  this.buildSystemPrompt();
```

**Execution Flow:**
```typescript
// Tier 1 (iteration 1):
if (decision.commands) {
  // Execute normal Playwright command
  // If fails → iteration++, move to Tier 2
}

// Tier 2 (iterations 2-3):
if (decision.indexCommand) {
  const { index } = decision.indexCommand;

  // Step A: Click using data-testchimp-el="[N]" (execution)
  await page.locator(`[data-testchimp-el="[${index}]"]`).click();

  // Step B: Resolve to native selector (script generation)
  const { selector: nativeSelector } =
    await SelectorResolver.resolveIndexToSelector(index, elements, page);
  // → e.g. "getByRole('button', {name: 'Menu'})"

  // Step C: Add to generated script
  commandsExecuted.push(`await page.${nativeSelector}.click();`);

  // If fails after 2 attempts → iteration++, move to Tier 3
}

// Tier 3 (iterations 4+):
if (decision.coordinateAction) {
  // Convert % to pixels and execute
  // Add coordinate commands to script (acceptable for edge cases)
}
```

---

## Utilities Summary

All utilities are **stateless and reusable**:

| Utility | Purpose | Reusable For |
|---------|---------|--------------|
| ElementDetector | Find interactive elements | Accessibility audits, page analysis |
| SelectorResolver | Index → selector with validation | Any numbered system |
| VisualMarkerInjector | Add visual labels | Manual testing, debugging |
| IndexCommandTranslator | Index command → Playwright | Any index-based automation |
| CoordinateConverter | Percentage → pixels | Any coordinate system |

---

## Implementation Timeline

### Week 1: Phase 1 Core
- [ ] NoteToFutureSelf type and tracking
- [ ] CoordinateAction with percentages
- [ ] CoordinateConverter utility
- [ ] Coordinate mode switching (tier 3)

### Week 2: Phase 1 Testing
- [ ] Test note-to-self on 10 scenarios
- [ ] Test percentage coordinates at multiple viewport sizes
- [ ] Verify all coordinate actions (click, fill, drag, scroll, hover)

### Week 3: Phase 2 Utilities
- [ ] ElementDetector with z-index awareness
- [ ] SelectorResolver with occlusion validation
- [ ] Test utilities standalone on complex pages

### Week 4: Phase 2 Integration
- [ ] VisualMarkerInjector
- [ ] IndexCommandTranslator (TWO-STAGE: execution via data-attr, script via native selector)
- [ ] Index mode (tier 2) integration with iteration-based switching
- [ ] Optimized escalation: iteration 1 → tier 1, iterations 2-3 → tier 2, iterations 4+ → tier 3
- [ ] Test PeopleHR with tier 2 (should succeed in iteration 2 or 3)

### Week 5: Phase 2 Testing
- [ ] Three-tier end-to-end testing
- [ ] Measure tier distribution (target: 70/25/5)
- [ ] A/B test vs current implementation
- [ ] Performance optimization

## Success Metrics

**Phase 1:**
- 20-30% reduction in average iterations per step
- Note-to-self prevents 40%+ of repeated selector failures
- Coordinates used in < 5% of scenarios

**Phase 2:**
- 70% of scenarios complete in Tier 1 (iteration 1) - simple cases
- 25% of scenarios use Tier 2 (iterations 2-3) - complex UIs with icons/shadow DOM
- < 5% of scenarios escalate to Tier 3 (iterations 4+) - impossible selector cases
- **PeopleHR hamburger menu:** Succeeds in Tier 2 iteration 2 with CLICK[N]
- **Average iterations per step:** Should decrease from ~4 to ~1.5
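
As a rough consistency check on the ~1.5 target, the expected iterations per step can be computed from the 70/25/5 split. The per-tier iteration counts are assumptions for illustration: 1 for Tier 1, 2.5 for Tier 2 (midpoint of iterations 2-3), and 4.5 for Tier 3 (midpoint of iterations 4-5, taking the 5-iteration cap from v0.0.33). `TierShare` and `expectedIterations` are hypothetical names.

```typescript
interface TierShare {
  tier: 1 | 2 | 3;
  share: number;          // fraction of steps resolved in this tier
  iterationsUsed: number; // assumed iterations spent by a step resolved here
}

// Target split from the plan, with assumed per-tier iteration counts.
const TARGET_SPLIT: TierShare[] = [
  { tier: 1, share: 0.7, iterationsUsed: 1 },
  { tier: 2, share: 0.25, iterationsUsed: 2.5 },
  { tier: 3, share: 0.05, iterationsUsed: 4.5 },
];

// Weighted average of iterations, after checking the shares form a distribution.
function expectedIterations(split: TierShare[]): number {
  const total = split.reduce((sum, t) => sum + t.share, 0);
  if (Math.abs(total - 1) > 1e-9) {
    throw new Error(`Shares must sum to 1, got ${total}`);
  }
  return split.reduce((sum, t) => sum + t.share * t.iterationsUsed, 0);
}

console.log(expectedIterations(TARGET_SPLIT).toFixed(2)); // prints 1.55
```

Under these assumptions the split gives about 1.55 iterations per step (0.70 + 0.625 + 0.225), in line with the ~1.5 target; steps that end stuck are not modeled.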

## Total Effort: 4-5 weeks

**Ready to implement?**
@@ -0,0 +1,183 @@
# What's New in Runner-Core v0.0.33

## Phase 1: Tactical Improvements - COMPLETE ✅

---

## 1. 📝 Note to Future Self (Cross-Step Memory)

**The agent can now leave notes that persist across the entire scenario journey.**

### How it works:
```
// Step 1 - Login
Agent: "Cookie modal appears after 2s. Dismiss it before interacting."
→ Stored in memory.latestNote

// Step 2 - Navigate to Dashboard
Agent reads note from Step 1
Agent: "Waiting 2s for cookie modal..."
→ Dismisses modal proactively
```

### Scope:
- ✅ Across iterations (within same step)
- ✅ Across steps (entire scenario)
- ✅ Free-form text (agent decides what's important)

### Example notes:
- **Tactical:** "Tried #menu, failed. Try SVG child next."
- **Strategic:** "This app uses shadow DOM. Prefer CSS selectors over getByRole."
- **Behavioral:** "Modals load after 2s delay. Wait before clicking."

---

## 2. 🎯 Percentage-Based Coordinate Fallback

**When selectors fail, use visual positioning as a last resort.**

### Precision:
- 3 decimal places (e.g., 15.755%, 8.500%)
- Positional precision finer than 1 pixel on common screen sizes
- Resolution-independent
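
The precision figure follows from simple arithmetic: one unit in the last decimal place of a percentage corresponds to `viewportPx × 10^-decimals / 100` pixels. The helper below only illustrates that formula (the name `pixelStep` is hypothetical); it says nothing about how accurately the model itself can locate a target.

```typescript
// Pixels represented by one unit in the last decimal place of a percentage.
function pixelStep(viewportPx: number, decimals: number): number {
  return (viewportPx * Math.pow(10, -decimals)) / 100;
}

for (const width of [1280, 1920, 3840]) {
  const twoDp = pixelStep(width, 2).toFixed(4);
  const threeDp = pixelStep(width, 3).toFixed(4);
  console.log(`${width}px: 0.01% = ${twoDp}px, 0.001% = ${threeDp}px`);
}
```

Even on a 3840 px wide viewport a 0.01% step is about 0.38 px, so the third decimal place adds headroom rather than being strictly required.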

### Supported Actions:
- **Click:** `{action: "click", xPercent: 15.755, yPercent: 8.500}`
- **Fill:** `{action: "fill", xPercent: 30.000, yPercent: 25.000, value: "text"}`
- **Drag:** `{action: "drag", xPercent: 10.000, yPercent: 50.000, toXPercent: 60.000, toYPercent: 50.000}`
- **Hover, RightClick, DoubleClick, Scroll**
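
A sketch of how a `coordinateAction` might be dispatched to Playwright's `page.mouse` and `page.keyboard`. To keep it runnable without a browser, `MouseLike` and `KeyboardLike` are hypothetical structural interfaces that declare only the subset of Playwright's `Mouse` and `Keyboard` methods used here, so a real `page.mouse` / `page.keyboard` should satisfy them. The camel-case action names, the `scrollDelta` field with its default of 300, and filling by click-then-type are assumptions, not the package's confirmed behavior.

```typescript
interface MouseLike {
  click(x: number, y: number, options?: { button?: "left" | "right" | "middle" }): Promise<void>;
  dblclick(x: number, y: number): Promise<void>;
  move(x: number, y: number, options?: { steps?: number }): Promise<void>;
  down(): Promise<void>;
  up(): Promise<void>;
  wheel(deltaX: number, deltaY: number): Promise<void>;
}

interface KeyboardLike {
  type(text: string): Promise<void>;
}

interface CoordinateAction {
  action: "click" | "doubleClick" | "rightClick" | "hover" | "fill" | "drag" | "scroll";
  xPercent: number;
  yPercent: number;
  toXPercent?: number;
  toYPercent?: number;
  value?: string;
  scrollDelta?: number; // hypothetical: vertical wheel delta in pixels
}

// Percentage (clamped to 0-100) to a whole pixel along one axis.
function toPx(percent: number, size: number): number {
  return Math.round((Math.min(100, Math.max(0, percent)) / 100) * size);
}

async function executeCoordinateAction(
  a: CoordinateAction,
  viewport: { width: number; height: number },
  mouse: MouseLike,
  keyboard: KeyboardLike,
): Promise<void> {
  const x = toPx(a.xPercent, viewport.width);
  const y = toPx(a.yPercent, viewport.height);
  switch (a.action) {
    case "click":
      return mouse.click(x, y);
    case "doubleClick":
      return mouse.dblclick(x, y);
    case "rightClick":
      return mouse.click(x, y, { button: "right" });
    case "hover":
      return mouse.move(x, y);
    case "fill":
      if (a.value === undefined) throw new Error("fill requires value");
      await mouse.click(x, y); // focus the field first
      return keyboard.type(a.value);
    case "drag": {
      if (a.toXPercent === undefined || a.toYPercent === undefined) {
        throw new Error("drag requires toXPercent/toYPercent");
      }
      await mouse.move(x, y);
      await mouse.down();
      // Intermediate mousemove events help apps that track the drag path.
      await mouse.move(toPx(a.toXPercent, viewport.width), toPx(a.toYPercent, viewport.height), { steps: 10 });
      return mouse.up();
    }
    case "scroll":
      await mouse.move(x, y); // wheel events target the element under the pointer
      return mouse.wheel(0, a.scrollDelta ?? 300);
  }
}
```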

### Auto-Activation:
- Triggers after 3 consecutive selector failures
- Limited to 2 coordinate attempts
- Then gives up and marks the step as stuck

---

## 3. ⚡ Optimized Iteration Budget

**Maximum 5 iterations per step** (down from 8)

```
Iterations 1-3: Playwright selectors (3 attempts)
                with note-to-self between each

Iterations 4-5: Coordinates (2 attempts max)
                If both fail → stuck
```

**Why:** Coordinates either work or don't; there is no point retrying them 5+ times.
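
The budget can be sketched as a loop with a hard cap of 5 iterations, where coordinate mode switches on after 3 consecutive failures. `runStep`, `IterationResult`, and `StepOutcome` are hypothetical names, and the `iterate` callback stands in for one agent iteration (prompting plus execution). Distinguishing "progress" (a sub-action worked, as in the Login example later in these notes) from "failed", and resetting the failure streak on progress, are assumptions.

```typescript
type Mode = "selector" | "coordinate";
type IterationResult = "done" | "progress" | "failed";

interface StepOutcome {
  status: "success" | "stuck";
  iterations: number;
  modes: Mode[];
}

const MAX_ITERATIONS = 5;         // down from 8
const FAILURES_BEFORE_COORDS = 3; // consecutive selector failures

// With a 5-iteration cap and a 3-failure trigger, at most 2 coordinate
// attempts can run (iterations 4 and 5), matching "2 attempts max".
async function runStep(
  iterate: (mode: Mode, iteration: number) => Promise<IterationResult>,
): Promise<StepOutcome> {
  const modes: Mode[] = [];
  let consecutiveFailures = 0;

  for (let iteration = 1; iteration <= MAX_ITERATIONS; iteration++) {
    const mode: Mode =
      consecutiveFailures >= FAILURES_BEFORE_COORDS ? "coordinate" : "selector";
    modes.push(mode);

    const result = await iterate(mode, iteration);
    if (result === "done") {
      return { status: "success", iterations: iteration, modes };
    }
    // Assumption: a sub-action that worked ("progress") resets the streak.
    consecutiveFailures = result === "failed" ? consecutiveFailures + 1 : 0;
  }
  return { status: "stuck", iterations: MAX_ITERATIONS, modes };
}
```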

---

## 4. 🕐 Smart Timeout Handling (Earlier Fix)

**Navigation operations now have appropriate timeouts:**
- `waitForLoadState()`: 30 seconds (was 5s)
- `goto()`: 30 seconds
- Element operations: 5 seconds (unchanged)

**Detects automatically:** The runner scans each command for navigation keywords and applies the 30-second timeout when it finds one.
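
The keyword scan could look like the function below. The release notes name only `waitForLoadState()` and `goto()`; the other entries (`waitForURL`, `waitForNavigation`, `reload`, `goBack`, `goForward`) are real Playwright `Page` methods added here as an assumption, and `timeoutForCommand` is a hypothetical name.

```typescript
const NAVIGATION_TIMEOUT_MS = 30_000;
const ELEMENT_TIMEOUT_MS = 5_000;

// Playwright Page methods that typically wait on a navigation or page load.
const NAVIGATION_KEYWORDS = [
  "goto(",
  "waitForLoadState(",
  "waitForURL(",
  "waitForNavigation(",
  "reload(",
  "goBack(",
  "goForward(",
];

// Pick a timeout for one generated Playwright command string.
function timeoutForCommand(command: string): number {
  // Require a leading "." so that, e.g., "myreload(" is not treated as navigation.
  const isNavigation = NAVIGATION_KEYWORDS.some((kw) => command.indexOf(`.${kw}`) !== -1);
  return isNavigation ? NAVIGATION_TIMEOUT_MS : ELEMENT_TIMEOUT_MS;
}
```

The chosen value could then be passed as the `timeout` option that Playwright's navigation and locator action methods accept.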

---

## How Notes Work Across Steps

### Example Scenario:

```
Step 1: Login
  Iteration 1: Fill username → Success
  Iteration 2: Fill password → Success
  Iteration 3: Click login → Success
  Agent note: "Login redirects to dashboard. Cookie modal appears after 2s."

Step 2: Navigate to Settings
  Reads note from Step 1: "Cookie modal appears after 2s"
  Iteration 1:
    - Wait 2s
    - Dismiss modal
    - Click Settings
  → Success in 1 iteration! (note prevented wasted attempts)
```

**Benefit:** Agent builds up knowledge about the application and uses it in future steps.
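
A minimal sketch of scenario-scoped note memory, assuming the agent returns `noteToFutureSelf` as an optional string (the new response field listed under Migration) and the runner keeps the most recent one as `memory.latestNote`. The `ScenarioMemory` class, `recordDecision`, `noteContext`, and the `fromStep` field are hypothetical; keeping only the latest note, tagged with where it came from, is a design assumption.

```typescript
interface NoteToFutureSelf {
  fromStep: number;      // hypothetical: step that wrote the note
  fromIteration: number;
  content: string;       // free-form, written by the agent
}

// The part of an agent response that matters here (assumed shape).
interface AgentDecision {
  noteToFutureSelf?: string;
}

class ScenarioMemory {
  latestNote: NoteToFutureSelf | null = null;

  // Keep the newest non-empty note; a missing note leaves the previous one in place.
  recordDecision(decision: AgentDecision, step: number, iteration: number): void {
    const content = decision.noteToFutureSelf?.trim();
    if (content) {
      this.latestNote = { fromStep: step, fromIteration: iteration, content };
    }
  }

  // Text to include in the next prompt, whether in the same step or a later one.
  noteContext(): string {
    if (!this.latestNote) return "";
    const { fromStep, fromIteration, content } = this.latestNote;
    return `Note from step ${fromStep}, iteration ${fromIteration}: ${content}`;
  }
}
```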

---

## Comparison: Before vs After

| Aspect | Before (v0.0.32) | After (v0.0.33) |
|--------|------------------|------------------|
| Iteration memory | None | Note to self (cross-step) |
| Selector fails | Give up or loop | Coordinate fallback |
| Max iterations | 8 per step | 5 per step |
| Timeout handling | 5s for all | 30s for navigation |
| Coordinate support | None | Full (click, fill, drag, etc.) |
| Average iterations | ~4 per step | ~2.5 per step (estimated) |

---

## Testing Recommendations

### Test 1: Note Continuity
Create a scenario with repeated patterns:
```
- Login
- Go to page A → encounter modal
- Go to page B → should handle modal proactively
```

**Expected:** The page B step handles the modal proactively, using the note left while on page A.

### Test 2: Coordinate Fallback
Run the PeopleHR scenario:
```
- Click hamburger menu (SVG icon)
```

**Expected:**
- Iterations 1-3: Try selectors (may fail)
- Iteration 4: Coordinates → succeeds
- Generated script: `await page.mouse.click(x, y);`

### Test 3: Timeout Fix
Any scenario with:
```
- await page.waitForLoadState('networkidle');
```

**Expected:** No more 5s timeout errors.

---

## Migration

**No code changes needed!** Existing code works as-is with the improvements.

**New response fields** (optional):
- `noteToFutureSelf`: string (agent can optionally include)
- `coordinateAction`: object (only when coordinate mode is active)

---

## What's Next: Phase 2

Phase 2 will add a numbered element system for even better reliability:
- Iteration 1: Playwright selector (1 attempt)
- Iterations 2-3: Index commands such as CLICK[3] (2 attempts)
- Iterations 4-5: Coordinates (2 attempts)

**Target:** ~1.5 average iterations per step

---

## Status

- ✅ **Built and Ready**
- 📦 **Version:** v0.0.33
- 🧪 **Status:** Ready for testing
- 📊 **Expected Impact:** 30-40% reduction in iterations

**Test now to validate improvements before Phase 2!**