testchimp-runner-core 0.0.32 → 0.0.34

Files changed (71)
  1. package/dist/llm-facade.d.ts.map +1 -1
  2. package/dist/llm-facade.js +7 -7
  3. package/dist/llm-facade.js.map +1 -1
  4. package/dist/llm-provider.d.ts +9 -0
  5. package/dist/llm-provider.d.ts.map +1 -1
  6. package/dist/model-constants.d.ts +16 -5
  7. package/dist/model-constants.d.ts.map +1 -1
  8. package/dist/model-constants.js +17 -6
  9. package/dist/model-constants.js.map +1 -1
  10. package/dist/orchestrator/index.d.ts +1 -1
  11. package/dist/orchestrator/index.d.ts.map +1 -1
  12. package/dist/orchestrator/index.js +3 -2
  13. package/dist/orchestrator/index.js.map +1 -1
  14. package/dist/orchestrator/orchestrator-agent.d.ts +0 -8
  15. package/dist/orchestrator/orchestrator-agent.d.ts.map +1 -1
  16. package/dist/orchestrator/orchestrator-agent.js +206 -405
  17. package/dist/orchestrator/orchestrator-agent.js.map +1 -1
  18. package/dist/orchestrator/orchestrator-prompts.d.ts +20 -0
  19. package/dist/orchestrator/orchestrator-prompts.d.ts.map +1 -0
  20. package/dist/orchestrator/orchestrator-prompts.js +455 -0
  21. package/dist/orchestrator/orchestrator-prompts.js.map +1 -0
  22. package/dist/orchestrator/tools/index.d.ts +2 -1
  23. package/dist/orchestrator/tools/index.d.ts.map +1 -1
  24. package/dist/orchestrator/tools/index.js +4 -2
  25. package/dist/orchestrator/tools/index.js.map +1 -1
  26. package/dist/orchestrator/tools/verify-action-result.d.ts +17 -0
  27. package/dist/orchestrator/tools/verify-action-result.d.ts.map +1 -0
  28. package/dist/orchestrator/tools/verify-action-result.js +140 -0
  29. package/dist/orchestrator/tools/verify-action-result.js.map +1 -0
  30. package/dist/orchestrator/types.d.ts +26 -0
  31. package/dist/orchestrator/types.d.ts.map +1 -1
  32. package/dist/orchestrator/types.js.map +1 -1
  33. package/dist/prompts.d.ts.map +1 -1
  34. package/dist/prompts.js +87 -37
  35. package/dist/prompts.js.map +1 -1
  36. package/dist/scenario-worker-class.d.ts.map +1 -1
  37. package/dist/scenario-worker-class.js +4 -1
  38. package/dist/scenario-worker-class.js.map +1 -1
  39. package/dist/utils/coordinate-converter.d.ts +32 -0
  40. package/dist/utils/coordinate-converter.d.ts.map +1 -0
  41. package/dist/utils/coordinate-converter.js +130 -0
  42. package/dist/utils/coordinate-converter.js.map +1 -0
  43. package/package.json +1 -1
  44. package/plandocs/BEFORE_AFTER_VERIFICATION.md +148 -0
  45. package/plandocs/COORDINATE_MODE_DIAGNOSIS.md +144 -0
  46. package/plandocs/IMPLEMENTATION_STATUS.md +108 -0
  47. package/plandocs/PHASE_1_COMPLETE.md +165 -0
  48. package/plandocs/PHASE_1_SUMMARY.md +184 -0
  49. package/plandocs/PROMPT_OPTIMIZATION_ANALYSIS.md +120 -0
  50. package/plandocs/PROMPT_SANITY_CHECK.md +120 -0
  51. package/plandocs/SESSION_SUMMARY_v0.0.33.md +151 -0
  52. package/plandocs/TROUBLESHOOTING_SESSION.md +72 -0
  53. package/plandocs/VISUAL_AGENT_EVOLUTION_PLAN.md +396 -0
  54. package/plandocs/WHATS_NEW_v0.0.33.md +183 -0
  55. package/src/llm-facade.ts +8 -8
  56. package/src/llm-provider.ts +11 -1
  57. package/src/model-constants.ts +17 -5
  58. package/src/orchestrator/index.ts +3 -2
  59. package/src/orchestrator/orchestrator-agent.ts +249 -424
  60. package/src/orchestrator/orchestrator-agent.ts.backup +1386 -0
  61. package/src/orchestrator/orchestrator-prompts.ts +474 -0
  62. package/src/orchestrator/tools/index.ts +2 -1
  63. package/src/orchestrator/tools/verify-action-result.ts +159 -0
  64. package/src/orchestrator/types.ts +48 -0
  65. package/src/prompts.ts +87 -37
  66. package/src/scenario-worker-class.ts +7 -2
  67. package/src/utils/coordinate-converter.ts +162 -0
  68. package/testchimp-runner-core-0.0.33.tgz +0 -0
  69. /package/{CREDIT_CALLBACK_ARCHITECTURE.md → plandocs/CREDIT_CALLBACK_ARCHITECTURE.md} +0 -0
  70. /package/{INTEGRATION_COMPLETE.md → plandocs/INTEGRATION_COMPLETE.md} +0 -0
  71. /package/{VISION_DIAGNOSTICS_IMPROVEMENTS.md → plandocs/VISION_DIAGNOSTICS_IMPROVEMENTS.md} +0 -0
@@ -0,0 +1,151 @@
1
+ # Runner-Core v0.0.33 - Session Summary
2
+
3
+ ## Date: October 15, 2025
4
+
5
+ ## Major Accomplishments:
6
+
7
+ ### 1. ✅ **Coordinate Fallback System** (Phase 1 Complete)
8
+ - Percentage-based coordinates (0-100%, 3 decimal precision)
9
+ - Activates after 3 selector failures
10
+ - 2 coordinate attempts before giving up
11
+ - Resolution-independent positioning
12
+
13
+ ### 2. ✅ **Note to Future Self** (Tactical Memory)
14
+ - Free-form notes persist across iterations AND steps
15
+ - Enables strategic planning across agent decisions
16
+ - Helps maintain context: "Tried X, will try Y next"
17
+
18
+ ### 3. ✅ **Visual Verification Tool** (NEW)
19
+ - `verify_action_result` - Before/after screenshot comparison
20
+ - Agent-callable (decides when to use)
21
+ - JPEG 60% quality (85-90% smaller than PNG)
22
+ - Multi-image LLM interface support
23
+
24
+ ### 4. ✅ **Critical Bug Fixes**
25
+ - **Coordinate mode never activated**: Changed forced stuck from >= 3 to >= 5 failures
26
+ - **Missing required fields**: Made parser flexible (accepts reasoning OR statusReasoning)
27
+ - **Navigation timeouts**: Added 30s timeout guidance for page.goto()
28
+ - **Strict mode violations**: Added scoping guidance (locator('#parent').getByText())
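The "flexible parser" fix above can be pictured roughly as follows. This is a minimal sketch, not the package's actual parser: only the field names `reasoning` and `statusReasoning` come from the notes, while `AgentDecision`, `parseAgentDecision`, and the `status` field are hypothetical names used for illustration.

```typescript
// Hypothetical decision shape; only `reasoning` / `statusReasoning` are named in the notes.
interface AgentDecision {
  status: string;
  reasoning: string;
}

function parseAgentDecision(raw: string): AgentDecision {
  const parsed: unknown = JSON.parse(raw);
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error(`Agent response is not a JSON object: ${raw}`);
  }
  const fields = parsed as Record<string, unknown>;

  // Accept either field name instead of rejecting the whole response.
  const reasoning = fields.reasoning ?? fields.statusReasoning;
  if (typeof reasoning !== "string") {
    // Report which fields were present to make parsing failures easier to diagnose.
    throw new Error(
      `Missing reasoning/statusReasoning. Fields present: ${Object.keys(fields).join(", ")}`
    );
  }

  const status = typeof fields.status === "string" ? fields.status : "unknown";
  return { status, reasoning };
}
```

Normalizing both spellings into one field at the parsing boundary keeps the rest of the agent loop unaware of which name the model happened to emit.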
29
+
30
+ ### 5. ✅ **Prompt Optimizations**
31
+ - **59% reduction**: 17,573 chars → 7,287 chars in system prompt
32
+ - **Cache-optimized**: Static content first, dynamic last
33
+ - **Cost savings**: ~40% overall with model tiering
34
+ - **Focused on cognition**: Removed bloat, kept decision-making guidance
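To illustrate why "static content first, dynamic last" helps caching: provider-side prompt caches generally match on a shared prefix, so anything that changes per step should come after the parts that never change. The sketch below is an assumption about how such a prompt could be assembled; `PromptParts`, `assemblePrompt`, and `sharedPrefixLength` are hypothetical names, not identifiers from the package.

```typescript
// Hypothetical prompt sections: the first two are static, the last two change every step.
interface PromptParts {
  systemRules: string;
  toolDescriptions: string;
  stepGoal: string;
  pageState: string;
}

// Static sections first, dynamic sections last, so consecutive prompts share a long prefix.
function assemblePrompt(parts: PromptParts): string {
  return [parts.systemRules, parts.toolDescriptions, parts.stepGoal, parts.pageState].join("\n\n");
}

// Length of the common prefix of two prompts: a rough proxy for what a prefix cache can reuse.
function sharedPrefixLength(a: string, b: string): number {
  let i = 0;
  while (i < a.length && i < b.length && a[i] === b[i]) i++;
  return i;
}
```

Comparing two consecutive prompts with `sharedPrefixLength` is a quick way to check that a prompt change did not accidentally move dynamic content ahead of the static block.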
35
+
36
+ ### 6. ✅ **Model Optimization**
37
+ - **gpt-5-mini**: Complex tasks (4 operations)
38
+ - Command generation
39
+ - Goal completion checks
40
+ - Repair suggestions
41
+ - Agent orchestration
42
+ - **gpt-4o-mini**: Simple tasks (7 operations)
43
+ - Scenario breakdown
44
+ - Screenshot need assessment
45
+ - Repair confidence
46
+ - Test name generation
47
+ - Hashtag generation
48
+ - Script parsing
49
+ - Final script merging
50
+ - **Est. 25-30% cost reduction**
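The tiering above amounts to a lookup from operation to model. A minimal sketch, assuming the split listed in this summary: `DEFAULT_SIMPLER_MODEL` is named in the file list below, but its value here, the constant `DEFAULT_MODEL`, and all operation keys are assumptions made for illustration.

```typescript
// Assumed values; only the name DEFAULT_SIMPLER_MODEL appears in the summary.
const DEFAULT_MODEL = "gpt-5-mini";
const DEFAULT_SIMPLER_MODEL = "gpt-4o-mini";

// Hypothetical operation keys mirroring the two lists above (4 complex, 7 simple).
const COMPLEX_OPERATIONS = [
  "commandGeneration",
  "goalCompletionCheck",
  "repairSuggestion",
  "agentOrchestration",
] as const;

const SIMPLE_OPERATIONS = [
  "scenarioBreakdown",
  "screenshotNeedAssessment",
  "repairConfidence",
  "testNameGeneration",
  "hashtagGeneration",
  "scriptParsing",
  "finalScriptMerging",
] as const;

type Operation = (typeof COMPLEX_OPERATIONS)[number] | (typeof SIMPLE_OPERATIONS)[number];

// Complex operations get the stronger model; everything else falls back to the cheaper one.
function modelFor(operation: Operation): string {
  const isComplex = (COMPLEX_OPERATIONS as readonly string[]).indexOf(operation) !== -1;
  return isComplex ? DEFAULT_MODEL : DEFAULT_SIMPLER_MODEL;
}
```

Defaulting unknown or new operations to the cheaper model is a design choice in this sketch; the opposite default would trade cost for safety.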
51
+
52
+ ### 7. ✅ **Code Cleanup**
53
+ - Removed V1 SmartTestRunnerCore (V2 is stable)
54
+ - Removed backup files (.bak, .tmp)
55
+ - Consolidated types into V2
56
+ - Removed PeopleHR-specific examples from prompts
57
+
58
+ ### 8. ✅ **Enhanced Logging**
59
+ - Prompt length metrics (chars + estimated tokens)
60
+ - Full LLM response on parsing errors
61
+ - Field presence diagnostics
62
+ - Retry logging for 500 errors
63
+
64
+ ### 9. ✅ **Retry Logic**
65
+ - Automatic retry for OpenAI 500 errors
66
+ - Exponential backoff (1s, 2s, 4s)
67
+ - Up to 3 attempts before failing
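A rough sketch of the retry behaviour described above. It assumes the provider error exposes an HTTP `status` property, which the summary does not state; `withRetry`, `backoffDelayMs`, and `isServerError` are hypothetical names. Note that with three total attempts only the 1s and 2s delays are reached; the 4s delay would apply if "3 attempts" is read as three retries.

```typescript
// Delay before retry number `retryIndex` (0-based): 1s, 2s, 4s, ...
function backoffDelayMs(retryIndex: number, baseMs = 1000): number {
  return baseMs * 2 ** retryIndex;
}

// Assumption: the thrown error carries an HTTP status code in `status`.
function isServerError(err: unknown): boolean {
  return typeof err === "object" && err !== null && (err as { status?: unknown }).status === 500;
}

async function withRetry<T>(
  call: () => Promise<T>,
  maxAttempts = 3,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms))
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await call();
    } catch (err) {
      // Only 500s are retried; other errors and the final attempt propagate immediately.
      if (!isServerError(err) || attempt >= maxAttempts) throw err;
      console.warn(`LLM call returned 500 (attempt ${attempt}/${maxAttempts}), retrying`);
      await sleep(backoffDelayMs(attempt - 1));
    }
  }
}
```

Passing `sleep` as a parameter keeps the helper testable without real delays.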
68
+
69
+ ### 10. ✅ **Headed Mode for Local Testing**
70
+ - All browser instances use headless: true → headless: false for local dev
71
+ - Visual debugging enabled
72
+
73
+ ## Files Modified:
74
+
75
+ ### Runner-Core:
76
+ 1. `src/orchestrator/orchestrator-agent.ts` - Main agent logic
77
+ 2. `src/orchestrator/types.ts` - NoteToFutureSelf, CoordinateAction
78
+ 3. `src/utils/coordinate-converter.ts` - NEW - Coordinate to Playwright conversion
79
+ 4. `src/orchestrator/tools/verify-action-result.ts` - NEW - Visual verification tool
80
+ 5. `src/llm-provider.ts` - Added LabeledImage, multi-image support
81
+ 6. `src/llm-facade.ts` - Model optimization
82
+ 7. `src/model-constants.ts` - Added DEFAULT_SIMPLER_MODEL
83
+ 8. `src/scenario-worker-class.ts` - Tool registration
84
+ 9. `src/orchestrator/index.ts` - Exports
85
+ 10. `src/orchestrator/tools/index.ts` - Tool exports
86
+
87
+ ### Scriptservice:
88
+ 1. `providers/scriptservice-llm-provider.ts` - Multi-image handling, retry logic
89
+ 2. `smart-test-runner-core-v2.ts` - Type definitions, V1 removal
90
+ 3. `smart-test-execution-handler.ts` - V1 removal
91
+ 4. `workers/test-based-explorer.ts` - V1 removal
92
+ 5. `script-generation-handlers.ts` - Headed mode
93
+ 6. `script-generation/script-generation-service.ts` - Headed mode
94
+ 7. `smart-test-execution-handler.ts` - Headed mode
95
+
96
+ ### Documentation:
97
+ 1. `WHATS_NEW_v0.0.33.md`
98
+ 2. `PHASE_1_COMPLETE.md`
99
+ 3. `PHASE_1_SUMMARY.md`
100
+ 4. `IMPLEMENTATION_STATUS.md`
101
+ 5. `VISUAL_AGENT_EVOLUTION_PLAN.md`
102
+ 6. `PROMPT_SANITY_CHECK.md`
103
+ 7. `PROMPT_OPTIMIZATION_ANALYSIS.md`
104
+ 8. `COORDINATE_MODE_DIAGNOSIS.md`
105
+ 9. `BEFORE_AFTER_VERIFICATION.md`
106
+ 10. `TROUBLESHOOTING_SESSION.md`
107
+
108
+ ## Live Test Status:
109
+
110
+ **Job**: `71b88c60-52f5-4343-aef8-c44ebb07f3e9`
111
+ **Status**: Running (check browser + logs)
112
+ **Watch For**:
113
+ - Step 5 (Employee Information) - Previously problematic
114
+ - Coordinate mode activation
115
+ - verify_action_result tool usage
116
+ - Overall completion
117
+
118
+ ## Key Metrics:
119
+
120
+ **Cost Optimization:**
121
+ - Prompt size: 59% reduction
122
+ - Model tiering: 7/11 tasks on cheaper model
123
+ - JPEG compression: 85-90% smaller screenshots
124
+ - **Total savings: ~40% cost reduction**
125
+
126
+ ## Next Steps After Test Completes:
127
+
128
+ 1. Check if Step 5 completes successfully
129
+ 2. Verify coordinate mode activated if needed
130
+ 3. Check if verify_action_result tool was used
131
+ 4. Analyze any remaining failures
132
+ 5. Iterate on prompts/logic based on results
133
+
134
+ ## Known Issues to Monitor:
135
+
136
+ 1. **Step 5 False Positive**: Clicking menu item vs navigating to page
137
+ 2. **Coordinate Loop**: Agent not knowing when coordinate clicks succeed
138
+ 3. **Vision verification usage**: Will agent call it proactively?
139
+
140
+ ## Success Criteria:
141
+
142
+ ✅ All 7 steps complete
143
+ ✅ Coordinate fallback used when selectors fail
144
+ ✅ Visual verification validates goal achievement
145
+ ✅ No infinite loops or stuck states
146
+ ✅ Generated script is accurate
147
+
148
+ ---
149
+
150
+ **Check your browser window and /tmp/scriptservice-test.log for live execution!**
151
+
@@ -0,0 +1,72 @@
1
+ # Troubleshooting Session: All Modules Icon Click Failure
2
+
3
+ ## Objective:
4
+ Understand why the orchestrator agent gets stuck on "Click on the all Modules menu item (top menu icon)" while manual Playwright MCP navigation succeeded.
5
+
6
+ ## What I Need to See:
7
+
8
+ ### 1. Full Agent Logs for the Failing Step
9
+ Please provide the complete logs showing:
10
+ - What iteration attempts were made (iteration 1, 2, 3...)
11
+ - What selectors the agent tried each time
12
+ - What errors it encountered
13
+ - What the DOM snapshot showed
14
+ - Whether it took screenshots
15
+ - What notes it left to future self
16
+
17
+ ### 2. The DOM Context It Saw
18
+ - Interactive elements list
19
+ - ARIA tree snapshot
20
+ - Whether the hamburger icon was visible in the list
21
+
22
+ ## What Worked (My Manual MCP Session):
23
+
24
+ From earlier successful navigation:
25
+ ```
26
+ ✅ Step 1: Clicked hamburger menu
27
+ Selector: #sidebar-toggle > span > svg
28
+
29
+ ✅ Step 2: Clicked "Core HR"
30
+ Selector: getByText('Core HR')
31
+
32
+ ✅ Step 3: Clicked "Employee Information"
33
+ Selector: getByText('Employee Information')
34
+ ```
35
+
36
+ ## Hypotheses for Why the Agent Fails:
37
+
38
+ ### Possible Issue 1: Wrong Selector Strategy
39
+ - Agent might be trying: `getByText('All Modules')` (strict mode violation)
40
+ - Or: `#MenuToggle` (wrong ID)
41
+ - Or: `#sidebar-toggle-menu` (doesn't exist)
42
+ - Instead of: `#sidebar-toggle > span > svg` (actual selector)
43
+
44
+ ### Possible Issue 2: Missing Icon Detection
45
+ - Hamburger icons are often SVG elements without accessible text
46
+ - Agent might not recognize this pattern
47
+ - Prompt doesn't explicitly guide on icon/SVG selector strategy
48
+
49
+ ### Possible Issue 3: DOM List Incomplete
50
+ - Interactive elements might not include the SVG icon
51
+ - If icon isn't in the list, agent won't know it exists
52
+ - Need to check if `getEnhancedPageInfo` captures SVG icons
53
+
54
+ ### Possible Issue 4: Ambiguous Text
55
+ - "All Modules" might appear in multiple places (menu button + modal title)
56
+ - Agent tries `getByText('All Modules')` → strict mode violation
57
+ - Should scope to parent: `locator('#sidebar-toggle').getByText('All Modules')`
58
+
59
+ ## Next Steps:
60
+
61
+ 1. **Get full logs** from your failing run
62
+ 2. **Compare** what agent saw vs what I saw
63
+ 3. **Identify** the gap (prompt, DOM extraction, or selector logic)
64
+ 4. **Plan fixes**:
65
+ - Prompt improvements (icon/SVG guidance)
66
+ - DOM extraction improvements (ensure icons are captured)
67
+ - Selector strategy improvements (parent scoping for icons)
68
+ - Example-based learning (hamburger menu pattern)
69
+
70
+ ## Waiting For:
71
+ Please paste the full logs from the failing step showing all iteration attempts and what the agent tried.
72
+
@@ -0,0 +1,396 @@
1
+ # Runner-Core Visual Agent Evolution - Complete Implementation Plan
2
+
3
+ ## Overview
4
+
5
+ Two-phase pragmatic evolution without major architecture overhaul:
6
+ - **Phase 1 (Week 1-2):** Percentage coordinates + Free-form notes
7
+ - **Phase 2 (Week 3-5):** Numbered element system with three-tier fallback
8
+
9
+ ---
10
+
11
+ ## PHASE 1: Tactical Improvements
12
+
13
+ ### 1A: Free-Form "Note to Future Self"
14
+
15
+ **Why:** Agent needs tactical memory between iterations of the SAME step.
16
+
17
+ **Type:**
18
+ ```typescript
19
+ interface NoteToFutureSelf {
20
+ fromIteration: number;
21
+ content: string; // FREE-FORM - agent writes whatever it wants
22
+ }
23
+ ```
24
+
25
+ **Examples:**
26
+ - "Tried #sidebar-toggle, failed. Will try SVG child next."
27
+ - "Plan: Hover over menu first to reveal dropdown, then click Settings."
28
+ - "Cookie banner blocking. Next: dismiss it, then retry main action."
29
+
30
+ **vs Current Learnings:**
31
+ - Learnings = App-wide patterns ("App uses getByRole")
32
+ - Note to self = Iteration-specific tactics ("Just tried X, will try Y next")
33
+
34
+ **Keep BOTH!**
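One way to keep both kinds of memory separate when building the next prompt is sketched below. The `NoteToFutureSelf` interface is copied from the plan above; `buildMemoryContext` and the section headings it emits are hypothetical and only illustrate the idea.

```typescript
interface NoteToFutureSelf {
  fromIteration: number;
  content: string; // free-form text written by the agent
}

// Hypothetical helper: renders app-wide learnings and the latest tactical note
// as two distinct prompt sections so neither overwrites the other.
function buildMemoryContext(learnings: string[], note?: NoteToFutureSelf): string {
  const sections: string[] = [];
  if (learnings.length > 0) {
    sections.push("APP LEARNINGS:\n" + learnings.map((l) => `- ${l}`).join("\n"));
  }
  if (note && note.content.trim() !== "") {
    sections.push(`NOTE FROM ITERATION ${note.fromIteration}:\n${note.content.trim()}`);
  }
  return sections.join("\n\n");
}
```

For example, `buildMemoryContext(["App uses getByRole"], { fromIteration: 2, content: "Tried X, will try Y next" })` yields a learnings section followed by a note section.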
35
+
36
+ ### 1B: Percentage-Based Coordinates
37
+
38
+ **LLM outputs percentages:**
39
+ ```jsonc
40
+ {
41
+ "coordinateAction": {
42
+ "action": "click|fill|drag|hover|scroll",
43
+ "xPercent": 15.75, // 2 decimal precision for accuracy
44
+ "yPercent": 8.50,
45
+
46
+ // For drag:
47
+ "toXPercent": 45.25,
48
+ "toYPercent": 8.50,
49
+
50
+ // For fill:
51
+ "value": "alice@example.com"
52
+ }
53
+ }
54
+ ```
55
+
56
+ **Code converts to pixels:**
57
+ ```typescript
58
+ const viewport = await page.evaluate(() => ({width: window.innerWidth, height: window.innerHeight}));
59
+ const x = Math.round((xPercent / 100) * viewport.width);
60
+ const y = Math.round((yPercent / 100) * viewport.height);
61
+ ```
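The conversion above can be factored into a pure function that is easy to unit-test at several viewport sizes. This is a minimal sketch: `percentToPixels`, `Viewport`, and `PixelPoint` are hypothetical names, and the range check and clamping at the viewport edge are added assumptions, not behaviour described in the plan.

```typescript
interface Viewport {
  width: number;
  height: number;
}

interface PixelPoint {
  x: number;
  y: number;
}

function percentToPixels(xPercent: number, yPercent: number, viewport: Viewport): PixelPoint {
  for (const p of [xPercent, yPercent]) {
    if (!Number.isFinite(p) || p < 0 || p > 100) {
      throw new RangeError(`Percentage out of range 0-100: ${p}`);
    }
  }
  // Assumption: clamp to the last addressable pixel so 100% stays inside the viewport.
  const x = Math.min(Math.round((xPercent / 100) * viewport.width), viewport.width - 1);
  const y = Math.min(Math.round((yPercent / 100) * viewport.height), viewport.height - 1);
  return { x, y };
}
```

The same percentages map to different pixels as the viewport changes, which is what makes the output resolution-independent: 50%/50% is (640, 360) at 1280×720 and (960, 540) at 1920×1080.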
62
+
63
+ ### 1C: Code-Controlled Fallback
64
+
65
+ **Tier 1 (iteration 1-3):** Normal Playwright selectors
66
+ **Tier 2 (iteration 4+):** Percentage coordinates (auto-triggered after 3 failures)
67
+
68
+ ```typescript
69
+ // In callAgent():
70
+ if (consecutiveFailures >= 3) { // Phase 1: Only 2 tiers
71
+ // Auto-use coordinate-specific system prompt
72
+ // LLM outputs percentages
73
+ // Phase 2 will add tier 2 (indexed elements) between selectors and coordinates
74
+ }
75
+ ```
76
+
77
+ **Files:**
78
+ - `utils/coordinate-converter.ts` (NEW)
79
+ - `orchestrator/orchestrator-agent.ts` (UPDATE)
80
+ - `orchestrator/types.ts` (UPDATE - add CoordinateAction, NoteToFutureSelf)
81
+
82
+ ---
83
+
84
+ ## PHASE 2: Numbered Element System
85
+
86
+ ### Architecture
87
+
88
+ ```
89
+ Three-Tier Fallback:
90
+
91
+ Tier 1 (iteration 1 ONLY): Playwright Selector Mode - ONE SHOT
92
+ ├─ Agent generates: await page.getByRole('button', {name: 'Login'}).click()
93
+ ├─ Direct execution
94
+ └─ 70% of tasks finish here (simple/medium complexity)
95
+
96
+ Tier 2 (iterations 2-3): Index Command Mode - TWO ATTEMPTS
97
+ ├─ Inject numbered markers [1], [2], [3] → screenshot
98
+ ├─ Agent outputs: CLICK[3], FILL[5, "alice@example.com"]
99
+ ├─ Execution: Use data-testchimp-el="[3]" (reliable targeting)
100
+ ├─ Script output: Translate to NATIVE selector (getByRole, #id, etc.)
101
+ └─ 25% of tasks finish here (complex UIs, icons, shadow DOM)
102
+
103
+ Tier 3 (iterations 4+): Percentage Coordinate Mode - LAST RESORT
104
+ ├─ Agent outputs: {xPercent: 15.755, yPercent: 8.500}
105
+ ├─ CoordinateConverter: % → pixels
106
+ ├─ Execute: mouse.click(x, y)
107
+ └─ <5% of tasks need this (extreme edge cases)
108
+ ```
109
+
110
+ ### 2.1: Reusable Utility: ElementDetector
111
+
112
+ **File:** `runner-core/src/utils/element-detector.ts`
113
+
114
+ **Purpose:** Detect ALL interactive elements with z-index and occlusion awareness.
115
+
116
+ **Key Features:**
117
+ - Comprehensive element queries (buttons, links, inputs, SVGs, clickable divs)
118
+ - Z-index calculation via `getComputedStyle()`
119
+ - Occlusion detection via `elementFromPoint()` at center
120
+ - Context tags: `[header|nav|sidebar|main|modal]`
121
+ - Spatial tags: `[top|bottom|left|right|center]`
122
+ - Center position as percentage (2 decimal precision)
123
+
124
+ **Output:**
125
+ ```typescript
126
+ interface DetectedElement {
127
+ index: number; // [1], [2], [3]
128
+ tag: string;
129
+ text: string;
130
+ role: string;
131
+ ariaLabel: string;
132
+ bbox: {x: number; y: number; width: number; height: number};
132
+ centerPercent: {x: number; y: number}; // As percentage, e.g. {x: 15.75, y: 8.50}
134
+ context: string[]; // ['header', 'nav', 'top']
135
+ zIndex: number;
136
+ isVisible: boolean; // Not occluded by higher z-index
137
+ selectors: {
138
+ dataAttribute: string; // [data-testchimp-el="[3]"]
139
+ semantic: string[]; // [getByRole(...), getByLabel(...)]
140
+ cssId: string | null;
141
+ cssClass: string | null;
142
+ };
143
+ }
144
+ ```
145
+
146
+ **Implementation Highlights:**
147
+ - Queries: `button`, `a[href]`, `input`, `svg`, `[onclick]`, `[role=...]`, `[data-testid]`, clickable divs/spans
148
+ - Z-index check: `elementFromPoint(centerX, centerY)` must return element or child
149
+ - Filter: Only return `isVisible: true` elements
150
+
151
+ ### 2.2: Reusable Utility: SelectorResolver
152
+
153
+ **File:** `runner-core/src/utils/selector-resolver.ts`
154
+
155
+ **Purpose:** Given element index, return most reliable Playwright selector FOR GENERATED SCRIPTS.
156
+
157
+ **CRITICAL DISTINCTION:**
158
+
159
+ A. **During Execution (Internal):**
160
+ - Agent can click using `data-testchimp-el="[N]"` (we inject it temporarily)
161
+ - This ensures agent clicks the exact right element
162
+
163
+ B. **For Generated Script (Output):**
164
+ - Must use NATIVE selectors that work without our attributes
165
+ - Script will run on real application without data-testchimp-el
166
+
167
+ **Selector Resolution for Script Output (in order):**
168
+ 1. Semantic selectors - `getByRole()`, `getByLabel()` (BEST - maintainable)
169
+ 2. CSS ID - `#element-id` (GOOD - stable)
170
+ 3. CSS class - `.button-primary` (scoped to context if ambiguous)
171
+ 4. Contextual selector - `header .menu-toggle svg` (LAST RESORT)
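The priority order above could be applied as a simple first-match search. The sketch below is an assumption about how that might look: `pickSelector`, `CandidateSelectors`, and the `contextual` field are hypothetical (the `selectors` shape in `DetectedElement` has no contextual entry), and `isUsable` stands in for the page-level validation described next.

```typescript
type Strategy = "semantic" | "css-id" | "css-class" | "contextual";

interface ResolvedSelector {
  selector: string;
  strategy: Strategy;
}

interface CandidateSelectors {
  semantic: string[];        // e.g. "getByRole('button', {name: 'Menu'})"
  cssId: string | null;      // e.g. "#element-id"
  cssClass: string | null;   // e.g. ".button-primary"
  contextual: string | null; // hypothetical field, e.g. "header .menu-toggle svg"
}

// Returns the first candidate, in priority order, that passes the usability check.
function pickSelector(
  candidates: CandidateSelectors,
  isUsable: (selector: string) => boolean = () => true
): ResolvedSelector | null {
  const ordered: ResolvedSelector[] = candidates.semantic.map((selector) => ({
    selector,
    strategy: "semantic" as Strategy,
  }));
  if (candidates.cssId) ordered.push({ selector: candidates.cssId, strategy: "css-id" });
  if (candidates.cssClass) ordered.push({ selector: candidates.cssClass, strategy: "css-class" });
  if (candidates.contextual) ordered.push({ selector: candidates.contextual, strategy: "contextual" });

  for (const candidate of ordered) {
    if (isUsable(candidate.selector)) return candidate;
  }
  return null;
}
```

Returning the strategy alongside the selector makes it possible to log how often scripts fall back to the less maintainable options.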
172
+
173
+ **For each selector, validate:**
174
+ - Element exists on page
175
+ - Not covered by higher z-index (using elementFromPoint)
176
+ - Position matches expected (±5% tolerance)
177
+
178
+ **Method:**
179
+ ```typescript
180
+ static async resolveIndexToSelector(
181
+ index: number,
182
+ elements: DetectedElement[],
183
+ page: Page
184
+ ): Promise<{selector: string, strategy: string}>
185
+ ```
186
+
187
+ **Validation:**
188
+ ```typescript
189
+ private static async validateInteractable(
190
+ page: Page,
191
+ selector: string,
192
+ expectedCenterPercent: {x: number; y: number}
193
+ ): Promise<boolean> {
194
+ // 1. Element exists
195
+ // 2. Non-zero dimensions
196
+ // 3. elementFromPoint(center) === element (z-index check)
197
+ // 4. Position within ±5% tolerance
198
+ }
199
+ ```
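Step 4 of the validation (position within ±5% tolerance) is the only part that needs no page access, so it can be sketched as a pure function. This assumes the tolerance means 5 percentage points of the viewport on each axis, which the plan does not spell out; `positionMatches` and `PercentPoint` are hypothetical names.

```typescript
interface PercentPoint {
  x: number; // 0-100, percentage of viewport width
  y: number; // 0-100, percentage of viewport height
}

// Assumption: tolerance is measured in percentage points per axis, not relative error.
function positionMatches(
  actual: PercentPoint,
  expected: PercentPoint,
  tolerancePoints = 5
): boolean {
  return (
    Math.abs(actual.x - expected.x) <= tolerancePoints &&
    Math.abs(actual.y - expected.y) <= tolerancePoints
  );
}
```

A candidate selector that resolves to an element far from the expected center (for example, a second "Menu" button in the footer) would be rejected by this check even if it exists and is visible.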
200
+
201
+ ### 2.3: Reusable Utility: VisualMarkerInjector
202
+
203
+ **File:** `runner-core/src/utils/visual-marker-injector.ts`
204
+
205
+ **Purpose:** Inject visual numbered labels on page.
206
+
207
+ **Methods:**
208
+ ```typescript
209
+ static async injectNumberedMarkers(page): Promise<DetectedElement[]>
210
+ static async removeMarkers(page): Promise<void>
211
+ static async captureMarkedScreenshot(page, elements): Promise<string>
212
+ ```
213
+
214
+ **Visual Markers:**
215
+ - Red gradient background `[1]`, `[2]`, `[3]`
216
+ - Positioned at top-left of element
217
+ - z-index: 999999 (always visible)
218
+ - Inject `data-testchimp-el` attribute on actual element (TEMPORARY - for agent execution only)
219
+ * Used internally to ensure agent clicks correct element
220
+ * Removed after step completion
221
+ * NEVER appears in generated script output
222
+
223
+ ### 2.4: Reusable Utility: IndexCommandTranslator
224
+
225
+ **File:** `runner-core/src/utils/index-command-translator.ts`
226
+
227
+ **Purpose:** Translate index commands to Playwright commands with NATIVE selectors (for script generation).
228
+
229
+ **Input:**
230
+ ```typescript
231
+ { action: "CLICK", index: 3 }
232
+ { action: "FILL", index: 5, value: "alice@example.com" }
233
+ ```
234
+
235
+ **Output (MUST use native selectors):**
236
+ ```typescript
237
+ "await page.getByRole('button', {name: 'Menu'}).click();"
238
+ "await page.locator('#username').fill('alice@example.com');"
239
+ // OR
240
+ "await page.locator('#sidebar-toggle svg').click();"
241
+ ```
242
+
243
+ **NOT this (won't work in generated script):**
244
+ ```typescript
245
+ // ❌ "await page.locator('[data-testchimp-el=\"[3]\"]').click();"
246
+ // ❌ "await page.locator('[data-testchimp-el=\"[5]\"]').fill('alice@example.com');"
247
+ ```
248
+
249
+ **Process:**
250
+ 1. **During execution:** Click element using `data-testchimp-el="[index]"` (reliable)
251
+ 2. **For script output:** Use SelectorResolver to get NATIVE selector (semantic/id/class)
252
+ 3. Generate Playwright command with native selector
253
+ 4. Return command string that works on real application
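Steps 2 to 4 of the process above reduce to string generation once a native selector has been chosen. A minimal sketch under that assumption: `toPlaywrightCommand`, `IndexCommand`, and `quote` are hypothetical names, and the guard against `data-testchimp-el` is an illustrative way to enforce the "never in script output" rule.

```typescript
type IndexCommand =
  | { action: "CLICK"; index: number }
  | { action: "FILL"; index: number; value: string };

// Escapes a value for use inside a single-quoted JavaScript string literal.
function quote(value: string): string {
  return "'" + value.replace(/\\/g, "\\\\").replace(/'/g, "\\'").replace(/\n/g, "\\n") + "'";
}

// `nativeSelector` is a locator expression such as "getByRole('button', {name: 'Menu'})".
function toPlaywrightCommand(command: IndexCommand, nativeSelector: string): string {
  if (nativeSelector.indexOf("data-testchimp-el") !== -1) {
    throw new Error("Temporary data-testchimp-el selectors must not reach the generated script");
  }
  if (command.action === "CLICK") {
    return `await page.${nativeSelector}.click();`;
  }
  return `await page.${nativeSelector}.fill(${quote(command.value)});`;
}
```

Keeping the translator free of page access means the execution step (clicking via the temporary attribute) and the script-output step can be tested independently.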
254
+
255
+ **Critical Distinction:**
256
+ - `data-testchimp-el` = Internal execution helper (temporary)
257
+ - Script output = Native selectors (permanent, works standalone)
258
+
259
+ ### 2.5: Integration - Three-Tier System (Optimized Escalation)
260
+
261
+ **File:** `orchestrator/orchestrator-agent.ts`
262
+
263
+ **Optimized Strategy:** Escalate quickly to avoid wasting time on difficult tasks
264
+
265
+ **Mode Determination:**
266
+ ```typescript
267
+ let tier: 1 | 2 | 3;
268
+
269
+ // Tier 1: Try normal selectors ONCE (iteration 1)
270
+ // Tier 2: Use indexed elements TWICE (iterations 2-3)
271
+ // Tier 3: Use percentage coords (iterations 4+)
272
+
273
+ if (iteration >= 4) {
274
+ tier = 3; // Coordinate mode
275
+ } else if (iteration >= 2) {
276
+ tier = 2; // Index command mode
277
+ } else {
278
+ tier = 1; // Normal Playwright selector mode
279
+ }
280
+ ```
281
+
282
+ **Rationale:**
283
+ - Simple tasks: Succeed in Tier 1 (iteration 1) - fast!
284
+ - Medium tasks: Tier 2 gives 2 attempts with reliable index system
285
+ - Hard tasks: Tier 3 coordinates as absolute fallback
286
+ - No wasted iterations on difficult element detection
287
+
288
+ **Tier Preparation:**
289
+ ```typescript
290
+ if (tier === 2) await this.prepareIndexMode(context, page);
291
+ if (tier === 3) await this.prepareCoordinateMode(context, page);
292
+ ```
293
+
294
+ **System Prompt Selection:**
295
+ ```typescript
296
+ const systemPrompt =
297
+ tier === 3 ? this.buildCoordinateSystemPrompt() :
298
+ tier === 2 ? this.buildIndexSystemPrompt() :
299
+ this.buildSystemPrompt();
300
+ ```
301
+
302
+ **Execution Flow:**
303
+ ```typescript
304
+ // Tier 1 (iteration 1):
305
+ if (decision.commands) {
306
+ // Execute normal Playwright command
307
+ // If fails → iteration++, move to Tier 2
308
+ }
309
+
310
+ // Tier 2 (iterations 2-3):
311
+ if (decision.indexCommand) {
312
+ // Step A: Click using data-testchimp-el="[N]" (execution)
313
+ await page.locator('[data-testchimp-el="[3]"]').click();
314
+
315
+ // Step B: Resolve to native selector (script generation)
316
+ const {selector: nativeSelector} = await SelectorResolver.resolveIndexToSelector(3, elements, page);
317
+ // → Returns: "getByRole('button', {name: 'Menu'})"
318
+
319
+ // Step C: Add to generated script
320
+ commandsExecuted.push(`await page.${nativeSelector}.click();`);
321
+
322
+ // If fails after 2 attempts → iteration++, move to Tier 3
323
+ }
324
+
325
+ // Tier 3 (iterations 4+):
326
+ if (decision.coordinateAction) {
327
+ // Convert % to pixels and execute
328
+ // Add coordinate commands to script (acceptable for edge cases)
329
+ }
330
+ ```
331
+
332
+ ---
333
+
334
+ ## Utilities Summary
335
+
336
+ All utilities are **stateless and reusable**:
337
+
338
+ | Utility | Purpose | Reusable For |
339
+ |---------|---------|--------------|
340
+ | ElementDetector | Find interactive elements | Accessibility audits, page analysis |
341
+ | SelectorResolver | Index → selector with validation | Any numbered system |
342
+ | VisualMarkerInjector | Add visual labels | Manual testing, debugging |
343
+ | IndexCommandTranslator | Index command → Playwright | Any index-based automation |
344
+ | CoordinateConverter | Percentage → pixels | Any coordinate system |
345
+
346
+ ---
347
+
348
+ ## Implementation Timeline
349
+
350
+ ### Week 1: Phase 1 Core
351
+ - [ ] NoteToFutureSelf type and tracking
352
+ - [ ] CoordinateAction with percentages
353
+ - [ ] CoordinateConverter utility
354
+ - [ ] Coordinate mode switching (tier 3)
355
+
356
+ ### Week 2: Phase 1 Testing
357
+ - [ ] Test note-to-self on 10 scenarios
358
+ - [ ] Test percentage coordinates at multiple viewport sizes
359
+ - [ ] Verify all coordinate actions (click, fill, drag, scroll, hover)
360
+
361
+ ### Week 3: Phase 2 Utilities
362
+ - [ ] ElementDetector with z-index awareness
363
+ - [ ] SelectorResolver with occlusion validation
364
+ - [ ] Test utilities standalone on complex pages
365
+
366
+ ### Week 4: Phase 2 Integration
367
+ - [ ] VisualMarkerInjector
368
+ - [ ] IndexCommandTranslator (TWO-STAGE: execution via data-attr, script via native selector)
369
+ - [ ] Index mode (tier 2) integration with iteration-based switching
370
+ - [ ] Optimized escalation: iteration 1 → tier 1, iteration 2-3 → tier 2, iteration 4+ → tier 3
371
+ - [ ] Test PeopleHR with tier 2 (should succeed in iteration 2-3)
372
+
373
+ ### Week 5: Phase 2 Testing
374
+ - [ ] Three-tier end-to-end testing
375
+ - [ ] Measure tier distribution (target: 70/25/5)
376
+ - [ ] A/B test vs current implementation
377
+ - [ ] Performance optimization
378
+
379
+ ## Success Metrics
380
+
381
+ **Phase 1:**
382
+ - 20-30% reduction in average iterations per step
383
+ - Note-to-self prevents 40%+ of repeated selector failures
384
+ - Coordinates used in < 5% of scenarios
385
+
386
+ **Phase 2:**
387
+ - 70% scenarios complete in Tier 1 (iteration 1) - simple cases
388
+ - 25% scenarios use Tier 2 (iterations 2-3) - complex UIs with icons/shadow DOM
389
+ - < 5% scenarios escalate to Tier 3 (iterations 4+) - impossible selector cases
390
+ - **PeopleHR hamburger menu:** Succeeds in Tier 2 iteration 2 with CLICK[N]
391
+ - **Average iterations per step:** Should decrease from ~4 to ~1.5
392
+
393
+ ## Total Effort: 4-5 weeks
394
+
395
+ **Ready to implement?**
396
+