testchimp-runner-core 0.0.33 → 0.0.35

Files changed (152)
  1. package/dist/execution-service.d.ts +1 -4
  2. package/dist/execution-service.d.ts.map +1 -1
  3. package/dist/execution-service.js +155 -468
  4. package/dist/execution-service.js.map +1 -1
  5. package/dist/index.d.ts +3 -1
  6. package/dist/index.d.ts.map +1 -1
  7. package/dist/index.js +11 -1
  8. package/dist/index.js.map +1 -1
  9. package/dist/llm-facade.d.ts.map +1 -1
  10. package/dist/llm-facade.js +7 -7
  11. package/dist/llm-facade.js.map +1 -1
  12. package/dist/llm-provider.d.ts +9 -0
  13. package/dist/llm-provider.d.ts.map +1 -1
  14. package/dist/model-constants.d.ts +16 -5
  15. package/dist/model-constants.d.ts.map +1 -1
  16. package/dist/model-constants.js +17 -6
  17. package/dist/model-constants.js.map +1 -1
  18. package/dist/orchestrator/decision-parser.d.ts +18 -0
  19. package/dist/orchestrator/decision-parser.d.ts.map +1 -0
  20. package/dist/orchestrator/decision-parser.js +127 -0
  21. package/dist/orchestrator/decision-parser.js.map +1 -0
  22. package/dist/orchestrator/index.d.ts +4 -2
  23. package/dist/orchestrator/index.d.ts.map +1 -1
  24. package/dist/orchestrator/index.js +15 -2
  25. package/dist/orchestrator/index.js.map +1 -1
  26. package/dist/orchestrator/orchestrator-agent.d.ts +17 -22
  27. package/dist/orchestrator/orchestrator-agent.d.ts.map +1 -1
  28. package/dist/orchestrator/orchestrator-agent.js +708 -577
  29. package/dist/orchestrator/orchestrator-agent.js.map +1 -1
  30. package/dist/orchestrator/orchestrator-prompts.d.ts +32 -0
  31. package/dist/orchestrator/orchestrator-prompts.d.ts.map +1 -0
  32. package/dist/orchestrator/orchestrator-prompts.js +737 -0
  33. package/dist/orchestrator/orchestrator-prompts.js.map +1 -0
  34. package/dist/orchestrator/page-som-handler.d.ts +106 -0
  35. package/dist/orchestrator/page-som-handler.d.ts.map +1 -0
  36. package/dist/orchestrator/page-som-handler.js +1353 -0
  37. package/dist/orchestrator/page-som-handler.js.map +1 -0
  38. package/dist/orchestrator/som-types.d.ts +149 -0
  39. package/dist/orchestrator/som-types.d.ts.map +1 -0
  40. package/dist/orchestrator/som-types.js +87 -0
  41. package/dist/orchestrator/som-types.js.map +1 -0
  42. package/dist/orchestrator/tool-registry.d.ts +2 -0
  43. package/dist/orchestrator/tool-registry.d.ts.map +1 -1
  44. package/dist/orchestrator/tool-registry.js.map +1 -1
  45. package/dist/orchestrator/tools/index.d.ts +5 -1
  46. package/dist/orchestrator/tools/index.d.ts.map +1 -1
  47. package/dist/orchestrator/tools/index.js +9 -2
  48. package/dist/orchestrator/tools/index.js.map +1 -1
  49. package/dist/orchestrator/tools/refresh-som-markers.d.ts +12 -0
  50. package/dist/orchestrator/tools/refresh-som-markers.d.ts.map +1 -0
  51. package/dist/orchestrator/tools/refresh-som-markers.js +64 -0
  52. package/dist/orchestrator/tools/refresh-som-markers.js.map +1 -0
  53. package/dist/orchestrator/tools/verify-action-result.d.ts +17 -0
  54. package/dist/orchestrator/tools/verify-action-result.d.ts.map +1 -0
  55. package/dist/orchestrator/tools/verify-action-result.js +140 -0
  56. package/dist/orchestrator/tools/verify-action-result.js.map +1 -0
  57. package/dist/orchestrator/tools/view-previous-screenshot.d.ts +15 -0
  58. package/dist/orchestrator/tools/view-previous-screenshot.d.ts.map +1 -0
  59. package/dist/orchestrator/tools/view-previous-screenshot.js +92 -0
  60. package/dist/orchestrator/tools/view-previous-screenshot.js.map +1 -0
  61. package/dist/orchestrator/types.d.ts +49 -1
  62. package/dist/orchestrator/types.d.ts.map +1 -1
  63. package/dist/orchestrator/types.js +11 -1
  64. package/dist/orchestrator/types.js.map +1 -1
  65. package/dist/prompts.d.ts.map +1 -1
  66. package/dist/prompts.js +40 -34
  67. package/dist/prompts.js.map +1 -1
  68. package/dist/scenario-service.d.ts +5 -0
  69. package/dist/scenario-service.d.ts.map +1 -1
  70. package/dist/scenario-service.js +17 -0
  71. package/dist/scenario-service.js.map +1 -1
  72. package/dist/scenario-worker-class.d.ts +4 -0
  73. package/dist/scenario-worker-class.d.ts.map +1 -1
  74. package/dist/scenario-worker-class.js +21 -3
  75. package/dist/scenario-worker-class.js.map +1 -1
  76. package/dist/testing/agent-tester.d.ts +35 -0
  77. package/dist/testing/agent-tester.d.ts.map +1 -0
  78. package/dist/testing/agent-tester.js +84 -0
  79. package/dist/testing/agent-tester.js.map +1 -0
  80. package/dist/testing/ref-translator-tester.d.ts +44 -0
  81. package/dist/testing/ref-translator-tester.d.ts.map +1 -0
  82. package/dist/testing/ref-translator-tester.js +104 -0
  83. package/dist/testing/ref-translator-tester.js.map +1 -0
  84. package/dist/utils/coordinate-converter.d.ts +32 -0
  85. package/dist/utils/coordinate-converter.d.ts.map +1 -0
  86. package/dist/utils/coordinate-converter.js +130 -0
  87. package/dist/utils/coordinate-converter.js.map +1 -0
  88. package/dist/utils/hierarchical-selector.d.ts +47 -0
  89. package/dist/utils/hierarchical-selector.d.ts.map +1 -0
  90. package/dist/utils/hierarchical-selector.js +212 -0
  91. package/dist/utils/hierarchical-selector.js.map +1 -0
  92. package/dist/utils/page-info-retry.d.ts +14 -0
  93. package/dist/utils/page-info-retry.d.ts.map +1 -0
  94. package/dist/utils/page-info-retry.js +60 -0
  95. package/dist/utils/page-info-retry.js.map +1 -0
  96. package/dist/utils/page-info-utils.d.ts +1 -0
  97. package/dist/utils/page-info-utils.d.ts.map +1 -1
  98. package/dist/utils/page-info-utils.js +46 -18
  99. package/dist/utils/page-info-utils.js.map +1 -1
  100. package/dist/utils/ref-attacher.d.ts +21 -0
  101. package/dist/utils/ref-attacher.d.ts.map +1 -0
  102. package/dist/utils/ref-attacher.js +149 -0
  103. package/dist/utils/ref-attacher.js.map +1 -0
  104. package/dist/utils/ref-translator.d.ts +49 -0
  105. package/dist/utils/ref-translator.d.ts.map +1 -0
  106. package/dist/utils/ref-translator.js +276 -0
  107. package/dist/utils/ref-translator.js.map +1 -0
  108. package/package.json +1 -1
  109. package/plandocs/BEFORE_AFTER_VERIFICATION.md +148 -0
  110. package/plandocs/COORDINATE_MODE_DIAGNOSIS.md +144 -0
  111. package/plandocs/IMPLEMENTATION_STATUS.md +108 -0
  112. package/plandocs/PHASE_1_COMPLETE.md +165 -0
  113. package/plandocs/PHASE_1_SUMMARY.md +184 -0
  114. package/plandocs/PROMPT_OPTIMIZATION_ANALYSIS.md +120 -0
  115. package/plandocs/PROMPT_SANITY_CHECK.md +120 -0
  116. package/plandocs/SESSION_SUMMARY_v0.0.33.md +151 -0
  117. package/plandocs/TROUBLESHOOTING_SESSION.md +72 -0
  118. package/plandocs/VISUAL_AGENT_EVOLUTION_PLAN.md +396 -0
  119. package/plandocs/WHATS_NEW_v0.0.33.md +183 -0
  120. package/plandocs/exploratory-mode-support-v2.plan.md +953 -0
  121. package/plandocs/exploratory-mode-support.plan.md +928 -0
  122. package/plandocs/journey-id-tracking-addendum.md +227 -0
  123. package/src/execution-service.ts +179 -596
  124. package/src/index.ts +10 -0
  125. package/src/llm-facade.ts +8 -8
  126. package/src/llm-provider.ts +11 -1
  127. package/src/model-constants.ts +17 -5
  128. package/src/orchestrator/decision-parser.ts +139 -0
  129. package/src/orchestrator/index.ts +27 -2
  130. package/src/orchestrator/orchestrator-agent.ts +868 -623
  131. package/src/orchestrator/orchestrator-prompts.ts +786 -0
  132. package/src/orchestrator/page-som-handler.ts +1565 -0
  133. package/src/orchestrator/som-types.ts +188 -0
  134. package/src/orchestrator/tool-registry.ts +2 -0
  135. package/src/orchestrator/tools/index.ts +5 -1
  136. package/src/orchestrator/tools/refresh-som-markers.ts +69 -0
  137. package/src/orchestrator/tools/verify-action-result.ts +159 -0
  138. package/src/orchestrator/tools/view-previous-screenshot.ts +103 -0
  139. package/src/orchestrator/types.ts +95 -4
  140. package/src/prompts.ts +40 -34
  141. package/src/scenario-service.ts +20 -0
  142. package/src/scenario-worker-class.ts +30 -4
  143. package/src/utils/coordinate-converter.ts +162 -0
  144. package/src/utils/page-info-retry.ts +65 -0
  145. package/src/utils/page-info-utils.ts +53 -18
  146. package/testchimp-runner-core-0.0.35.tgz +0 -0
  147. /package/{CREDIT_CALLBACK_ARCHITECTURE.md → plandocs/CREDIT_CALLBACK_ARCHITECTURE.md} +0 -0
  148. /package/{INTEGRATION_COMPLETE.md → plandocs/INTEGRATION_COMPLETE.md} +0 -0
  149. /package/{VISION_DIAGNOSTICS_IMPROVEMENTS.md → plandocs/VISION_DIAGNOSTICS_IMPROVEMENTS.md} +0 -0
  150. /package/{RELEASE_0.0.26.md → releasenotes/RELEASE_0.0.26.md} +0 -0
  151. /package/{RELEASE_0.0.27.md → releasenotes/RELEASE_0.0.27.md} +0 -0
  152. /package/{RELEASE_0.0.28.md → releasenotes/RELEASE_0.0.28.md} +0 -0
@@ -0,0 +1,148 @@
1
+ # Before/After Screenshot Verification
2
+
3
+ ## Feature: Visual Goal Verification for Coordinate Actions
4
+
5
+ ### Problem Solved:
6
+ When using coordinate-based actions (clicking at x%, y%), the agent has no way to know whether the click achieved the goal:
7
+ - No element reference to check state
8
+ - No selector feedback
9
+ - Can't verify if expected page loaded or modal opened
10
+
11
+ This led to:
12
+ - False positives (click succeeded but goal not achieved)
13
+ - Infinite loops (agent keeps clicking, unsure if it worked)
14
+
15
+ ### Solution:
16
+ Automatic before/after screenshot comparison after coordinate clicks.
17
+
18
+ ## How It Works:
19
+
20
+ ### 1. **Automatic Trigger** (No Agent Action Required)
21
+ When agent uses coordinate action:
22
+ ```typescript
23
+ Iteration 4: 🎯 Coordinate mode activated
24
+ Step 1: Capture BEFORE screenshot
25
+ Step 2: Execute coordinate click (x%, y%)
26
+ Step 3: Wait 1000ms for UI to settle
27
+ Step 4: Capture AFTER screenshot
28
+ Step 5: Call LLM with both images (labeled "BEFORE", "AFTER")
29
+ Step 6: LLM responds: { goalAchieved: true/false, reasoning: "..." }
30
+ Step 7a: If TRUE → Mark complete, exit step ✅
31
+ Step 7b: If FALSE → Continue to next iteration, try different coordinates
32
+ ```
33
+
34
+ ### 2. **LLM Prompt for Verification**
35
+ ```
36
+ Goal: [Current step goal]
37
+
38
+ Compare the BEFORE and AFTER screenshots.
39
+
40
+ Did the action achieve the goal? Respond with JSON:
41
+ {
42
+ "goalAchieved": boolean,
43
+ "reasoning": "What changed (or didn't change)",
44
+ "visibleChanges": ["List of UI changes observed"]
45
+ }
46
+
47
+ Focus on:
48
+ - Did expected elements appear/disappear?
49
+ - Did page navigate or content change?
50
+ - Visual indicators of success (new panels, forms, highlights)?
51
+
52
+ Be strict: Only return true if you clearly see the expected change.
53
+ ```
54
+
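The JSON reply requested by the prompt above still has to be parsed defensively, because models often wrap JSON in extra prose. The following is a minimal sketch, not the package's actual parser: `VerificationResult` and `parseVerification` are hypothetical names, and treating any malformed reply as "goal not achieved" is an assumption that matches the prompt's "be strict" instruction.

```typescript
// Hypothetical shape mirroring the JSON the verification prompt asks for.
interface VerificationResult {
  goalAchieved: boolean;
  reasoning: string;
  visibleChanges: string[];
}

// Parse the raw LLM reply; anything malformed counts as "not achieved".
function parseVerification(raw: string): VerificationResult {
  const fallback: VerificationResult = {
    goalAchieved: false,
    reasoning: 'Unparseable verification response',
    visibleChanges: [],
  };
  // Take the outermost {...} span in case the model added surrounding text.
  const start = raw.indexOf('{');
  const end = raw.lastIndexOf('}');
  if (start === -1 || end <= start) return fallback;
  try {
    const parsed: unknown = JSON.parse(raw.slice(start, end + 1));
    if (typeof parsed !== 'object' || parsed === null) return fallback;
    const obj = parsed as Record<string, unknown>;
    return {
      // Only a literal boolean true counts; the string "true" does not.
      goalAchieved: obj.goalAchieved === true,
      reasoning: typeof obj.reasoning === 'string' ? obj.reasoning : '',
      visibleChanges: Array.isArray(obj.visibleChanges)
        ? obj.visibleChanges.filter((c): c is string => typeof c === 'string')
        : [],
    };
  } catch {
    return fallback;
  }
}
```

Defaulting to `goalAchieved: false` means a garbled reply costs one extra iteration rather than a false positive.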
55
+ ### 3. **Multi-Image LLM Interface**
56
+ ```typescript
57
+ // NEW: LabeledImage interface
58
+ export interface LabeledImage {
59
+ label: string; // "Before", "After", etc.
60
+ dataUrl: string; // Base64 data URL
61
+ }
62
+
63
+ // UPDATED: LLMRequest
64
+ export interface LLMRequest {
65
+ imageUrl?: string; // Backward compatible (single image)
66
+ images?: LabeledImage[]; // NEW - multi-image support
67
+ }
68
+ ```
69
+
70
+ ### 4. **Provider Implementation** (scriptservice-llm-provider.ts)
71
+ ```typescript
72
+ if (request.images && request.images.length > 0) {
73
+ for (const img of request.images) {
74
+ contentParts.push({ type: 'text', text: `\n[${img.label}]:` });
75
+ contentParts.push({ type: 'image_url', image_url: { url: img.dataUrl } });
76
+ }
77
+ // Sends: [BEFORE]: <image1>, [AFTER]: <image2>
78
+ }
79
+ ```
80
+
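The loop above can be exercised in isolation. This is a self-contained sketch under stated assumptions: `ContentPart` and `buildImageParts` are illustrative names, and giving `images` precedence over the legacy `imageUrl` field is an assumption, not confirmed behaviour of the provider.

```typescript
interface LabeledImage {
  label: string;   // "BEFORE", "AFTER", etc.
  dataUrl: string; // Base64 data URL
}

// Hypothetical content-part union, modelled on the snippet above.
type ContentPart =
  | { type: 'text'; text: string }
  | { type: 'image_url'; image_url: { url: string } };

// Build the image portion of a chat message.
// Assumption: `images` wins over the single `imageUrl` when both are set.
function buildImageParts(request: { imageUrl?: string; images?: LabeledImage[] }): ContentPart[] {
  const parts: ContentPart[] = [];
  if (request.images && request.images.length > 0) {
    for (const img of request.images) {
      // Each image is preceded by a text part carrying its label.
      parts.push({ type: 'text', text: `\n[${img.label}]:` });
      parts.push({ type: 'image_url', image_url: { url: img.dataUrl } });
    }
  } else if (request.imageUrl) {
    parts.push({ type: 'image_url', image_url: { url: request.imageUrl } });
  }
  return parts;
}
```

Emitting the label as a separate text part immediately before each image keeps the BEFORE/AFTER pairing unambiguous.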
81
+ ## When Verification Happens:
82
+
83
+ ✅ **Always**: After first coordinate action attempt
84
+ ❌ **Never**: After selector-based actions (have element state to check)
85
+ ⚠️ **Conditional**: Can add for other scenarios where goal verification is unclear
86
+
87
+ ## Cost Considerations:
88
+
89
+ **Per verification call:**
90
+ - 2 viewport screenshots (~50-100KB each)
91
+ - Vision model (gpt-5-mini): ~$0.001 per call
92
+ - Used only when coordinate mode activates (after 3 selector failures)
93
+
94
+ **Typical scenario:**
95
+ - Steps 1-10: Regular selectors wherever they work → No verification cost
96
+ - Step 5 gets stuck → Coordinate mode → 1 verification call → $0.001
97
+ - Overall impact: Minimal, used sparingly
98
+
99
+ ## Example Flow:
100
+
101
+ **Step 5: "Select Employee Information"**
102
+ ```
103
+ Iteration 1: getByText('Employee Information') → Strict mode ❌
104
+ Iteration 2: locator('#collapse-1').getByText('Employee Information') → Click succeeds ✅
105
+ BUT: Didn't navigate to Employee Information page (false positive)
106
+
107
+ Iteration 3: Selector fails again
108
+ Iteration 4: 🎯 Coordinate mode
109
+ → BEFORE: Homepage with sidebar
110
+ → Click at (19.3%, 22.9%)
111
+ → Wait 1s
112
+ → AFTER: Check screenshot
113
+ → LLM: "goalAchieved": true, "reasoning": "Employee Information page loaded with form"
114
+ → ✅ Mark complete, exit
115
+ ```
116
+
117
+ ## Backward Compatibility:
118
+
119
+ ✅ **Single image still works:**
120
+ ```typescript
121
+ const request = {
122
+ imageUrl: 'data:image/png;base64,...' // Old way
123
+ };
124
+ ```
125
+
126
+ ✅ **Multi-image NEW:**
127
+ ```typescript
128
+ const request = {
129
+ images: [
130
+ { label: 'BEFORE', dataUrl: '...' },
131
+ { label: 'AFTER', dataUrl: '...' }
132
+ ]
133
+ };
134
+ ```
135
+
136
+ ## Files Modified:
137
+
138
+ 1. `runner-core/src/llm-provider.ts` - Added LabeledImage interface and images field
139
+ 2. `scriptservice/providers/scriptservice-llm-provider.ts` - Handle multiple images in OpenAI API
140
+ 3. `runner-core/src/orchestrator/orchestrator-agent.ts` - Added verifyGoalWithScreenshotComparison method
141
+ 4. Automatic trigger after coordinate actions
142
+
143
+ ## Next Steps:
144
+
145
+ - ✅ Infrastructure ready
146
+ - ⏳ Need to test with real scenario
147
+ - 🔮 Future: Could expose as agent-callable tool if needed
148
+
@@ -0,0 +1,144 @@
1
+ # Coordinate Mode Diagnosis - Live Test Results
2
+
3
+ ## Test Scenario: PeopleHR Employee Information Flow
4
+
5
+ ### ✅ What Worked:
6
+
7
+ 1. **Coordinate fallback DID activate** (after fix from >= 3 to >= 5)
8
+ 2. **Agent successfully used coordinates** at (87.5%, 23.438%)
9
+ 3. **Physical clicks succeeded** - page.mouse.click(1120, 169)
10
+ 4. **Agent learned** to stick with coordinates after selectors failed
11
+
12
+ ### ❌ What Didn't Work:
13
+
14
+ **Agent hit max iterations (8) without marking "complete"**
15
+
16
+ ## Detailed Step 6 Flow:
17
+
18
+ ```
19
+ Iteration 1: Selector attempt → Timeout ❌
20
+ Iteration 2: Selector attempt → Timeout ❌
21
+ Iteration 3: Selector attempt → Timeout ❌
22
+ Iteration 4: 🎯 COORDINATE MODE → Click (87.5%, 23.438%) → ✅ Success
23
+ Iteration 5: Repeat coordinate → ✅ Success
24
+ Iteration 6: Repeat coordinate → ✅ Success (?)
25
+ Iteration 7: Repeat coordinate → ✅ Success
26
+ Iteration 8: Repeat coordinate → ✅ Success
27
+ Result: ⚠️ Max iterations → system_limit
28
+ ```
29
+
30
+ ## Root Cause Analysis:
31
+
32
+ ### Problem: **No Goal Verification After Coordinate Success**
33
+
34
+ **With selectors:**
35
+ ```typescript
36
+ await page.getByRole('button').click();
37
+ // Can verify: await expect(button).toHaveAttribute('aria-pressed', 'true')
38
+ // Can check: New elements appeared, URL changed, etc.
39
+ ```
40
+
41
+ **With coordinates:**
42
+ ```typescript
43
+ await page.mouse.click(1120, 169);
44
+ // ❓ Did it work? No element reference!
45
+ // ❓ How to verify? Can't check button state
46
+ // ❓ What changed? Need to inspect DOM/screenshot
47
+ ```
48
+
49
+ ### Why Agent Kept Retrying:
50
+
51
+ **Agent's reasoning (iterations 5-8):**
52
+ - "Coordinate click succeeded (executed without error)"
53
+ - "But I don't know if goal was achieved"
54
+ - "Step says 'Click on New' - did the New form open?"
55
+ - "I should try again to be sure..."
56
+ - → **Loops until max iterations**
57
+
58
+ ## Solutions to Consider:
59
+
60
+ ### Option 1: **Trust Coordinate Success** (Simple)
61
+ After coordinate click succeeds:
62
+ - Wait 500ms for UI response
63
+ - Mark status="complete" automatically
64
+ - Assume click worked (trust the coordinates)
65
+
66
+ ```typescript
67
+ if (coordinateAction && coordResult.allSucceeded) {
68
+ await page.waitForTimeout(500); // Let UI respond
69
+ return { status: 'complete', reasoning: 'Coordinate click succeeded' };
70
+ }
71
+ ```
72
+
73
+ **Pros**: Simple, fast
74
+ **Cons**: No verification of actual goal achievement
75
+
76
+ ### Option 2: **Visual Verification** (Better)
77
+ After coordinate click:
78
+ - Wait 500ms
79
+ - Take screenshot
80
+ - Compare before/after
81
+ - If changed → complete, else → retry with different coords
82
+
83
+ ```typescript
84
+ const beforeScreenshot = await page.screenshot();
85
+ await page.mouse.click(x, y);
86
+ await page.waitForTimeout(500);
87
+ const afterScreenshot = await page.screenshot();
88
+ if (screenshotsAreDifferent(beforeScreenshot, afterScreenshot)) {
89
+ return { status: 'complete' };
90
+ }
91
+ ```
92
+
93
+ **Pros**: Validates something changed
94
+ **Cons**: Slower, more LLM calls
95
+
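Option 2 leaves `screenshotsAreDifferent` undefined. One possible reading, sketched here as an assumption rather than the shipped implementation, is a plain byte comparison of the two encoded screenshots (Playwright's `page.screenshot()` resolves to a Node `Buffer`, which is a `Uint8Array`).

```typescript
// Hypothetical helper: true when the two encoded screenshots differ at all.
// This compares encoded bytes, not rendered pixels, so it is deliberately crude.
function screenshotsAreDifferent(before: Uint8Array, after: Uint8Array): boolean {
  if (before.length !== after.length) return true;
  for (let i = 0; i < before.length; i++) {
    if (before[i] !== after[i]) return true;
  }
  return false;
}
```

A check like this only shows that something changed (a blinking cursor or animation would also trigger it); it cannot tell whether the step's goal was reached.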
96
+ ### Option 3: **DOM Change Detection** (Balanced)
97
+ After coordinate click:
98
+ - Capture DOM snapshot before
99
+ - Click coordinates
100
+ - Capture DOM snapshot after
101
+ - If new elements/navigation → complete
102
+
103
+ ```typescript
104
+ const beforeUrl = page.url();
105
+ const beforeElements = await getEnhancedPageInfo(page);
106
+ await page.mouse.click(x, y);
107
+ await page.waitForTimeout(500);
108
+ const afterUrl = page.url();
109
+ const afterElements = await getEnhancedPageInfo(page);
110
+
111
+ if (afterUrl !== beforeUrl || afterElements.count !== beforeElements.count) {
112
+ return { status: 'complete', reasoning: 'Page state changed after coordinate click' };
113
+ }
114
+ ```
115
+
116
+ **Pros**: Fast, objective verification
117
+ **Cons**: Might miss subtle changes (modal opens without URL/element count change)
118
+
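The weakness noted under Option 3 (a modal that swaps content without changing the URL or element count) can be reduced by comparing element identities rather than counts. This is a sketch under assumptions: `PageSnapshot`, `elementKeys`, and `diffSnapshots` are hypothetical, and how keys are derived from `getEnhancedPageInfo` output (for example role plus accessible name) is left open.

```typescript
// Hypothetical snapshot: URL plus one string key per interactive element.
interface PageSnapshot {
  url: string;
  elementKeys: string[];
}

interface SnapshotDiff {
  changed: boolean;
  urlChanged: boolean;
  added: string[];
  removed: string[];
}

// Compare two snapshots by URL and by set membership of element keys.
function diffSnapshots(before: PageSnapshot, after: PageSnapshot): SnapshotDiff {
  const beforeSet = new Set(before.elementKeys);
  const afterSet = new Set(after.elementKeys);
  const added = after.elementKeys.filter((k) => !beforeSet.has(k));
  const removed = before.elementKeys.filter((k) => !afterSet.has(k));
  const urlChanged = before.url !== after.url;
  return {
    changed: urlChanged || added.length > 0 || removed.length > 0,
    urlChanged,
    added,
    removed,
  };
}
```

The `added` and `removed` lists could also be passed to the agent as evidence of what the click did.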
119
+ ### Option 4: **Prompt Guidance** (Immediate)
120
+ Update prompt to tell agent:
121
+ "After coordinate click succeeds, mark status='complete' unless you can clearly verify it failed"
122
+
123
+ **Pros**: No code changes
124
+ **Cons**: Relies on LLM judgment
125
+
126
+ ## Recommendation:
127
+
128
+ **Hybrid approach:**
129
+ 1. **Immediate** (Prompt): Tell agent to trust coordinate success
130
+ 2. **Phase 2** (Code): Add DOM change detection for validation
131
+
132
+ ## Current Status:
133
+
134
+ - ✅ Coordinate fallback works technically
135
+ - ✅ Physical clicks succeed
136
+ - ❌ Agent doesn't know when to stop
137
+ - 🔧 Need completion detection logic
138
+
139
+ ## Test Results Summary:
140
+
141
+ **Steps 1-5**: ✅ All completed successfully
142
+ **Step 6**: ⚠️ Coordinates worked but hit max iterations (no completion detection)
143
+ **Overall**: Coordinate mode is functional but needs completion logic
144
+
@@ -0,0 +1,108 @@
1
+ # Runner-Core Visual Agent Implementation Status
2
+
3
+ ## Phase 1: ✅ COMPLETE (v0.0.33)
4
+
5
+ ### Implemented Features:
6
+
7
+ 1. **Note to Future Self** - Tactical iteration memory
8
+ 2. **Percentage-Based Coordinates** - Last-resort fallback with 3-decimal precision
9
+ 3. **Two-Tier Auto-Escalation** - Code-controlled mode switching
10
+
11
+ ### Current Behavior (Phase 1):
12
+
13
+ ```
14
+ Iteration 1-3: Normal Playwright selectors + note-to-self (3 attempts)
15
+ ↓ (after 3 failures)
16
+ Iteration 4-5: Percentage coordinates (2 attempts max)
17
+ ↓ (if both coordinate attempts fail)
18
+ Give up - mark as stuck
19
+
20
+ Total: Maximum 5 iterations per step
21
+ ```
22
+
23
+ ---
24
+
25
+ ## Phase 2: 📋 PLANNED (Not Started)
26
+
27
+ ### Will Add:
28
+
29
+ 1. **ElementDetector** - Detect interactive elements with z-index awareness
30
+ 2. **VisualMarkerInjector** - Number elements [1], [2], [3] on screenshot
31
+ 3. **SelectorResolver** - Translate index → native Playwright selector
32
+ 4. **IndexCommandTranslator** - Convert CLICK[3] → native Playwright command
33
+
34
+ ### Future Behavior (Phase 2):
35
+
36
+ ```
37
+ Iteration 1: Playwright selector (1 attempt) → 70% success
38
+ ↓ (on first failure)
39
+ Iteration 2-3: Index commands CLICK[3] (2 attempts) → 25% success
40
+ ↓ (after 3 total failures)
41
+ Iteration 4-5: Percentage coordinates (2 attempts max) → 5% success
42
+ ↓ (if all fail)
43
+ Give up - mark as stuck
44
+
45
+ Total: Maximum 5 iterations per step (down from 8)
46
+ Average: ~1.5 iterations per step (fast!)
47
+ ```
48
+
49
+ ### Key Design Principle for Phase 2:
50
+
51
+ **During Execution:**
52
+ - Agent clicks using `data-testchimp-el="[3]"` (reliable, we inject it)
53
+
54
+ **In Generated Script:**
55
+ - Translator outputs NATIVE selector: `getByRole('button', {name: 'Menu'})`
56
+ - Script works standalone without data-testchimp-el
57
+
58
+ **Why Two-Stage:**
59
+ 1. Agent needs reliability during exploration → use data attribute
60
+ 2. Generated script must be portable → use native selectors
61
+ 3. Best of both worlds: reliable execution + maintainable output
62
+
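The two-stage idea can be sketched as follows. Phase 2 is not implemented, so every name here (`MarkedElement`, `executionSelector`, `nativeSelector`, `translateClick`) is hypothetical, and the only facts taken from the plan are the `data-testchimp-el="[3]"` attribute, the `CLICK[3]` command form, and the `getByRole` output style.

```typescript
// Hypothetical record for one numbered element on the page.
interface MarkedElement {
  index: number; // the [n] marker drawn on the screenshot
  role: string;  // ARIA role, e.g. 'button'
  name: string;  // accessible name, e.g. 'Menu'
}

// Stage 1: selector used while the agent is exploring (injected attribute).
function executionSelector(el: MarkedElement): string {
  return `[data-testchimp-el="[${el.index}]"]`;
}

// Stage 2: portable selector written into the generated script.
function nativeSelector(el: MarkedElement): string {
  // Escape backslashes and single quotes so the emitted string literal stays valid.
  const safeName = el.name.replace(/\\/g, '\\\\').replace(/'/g, "\\'");
  return `getByRole('${el.role}', { name: '${safeName}' })`;
}

// Translate an index command such as CLICK[3] into a native Playwright line.
// Returns null for unknown commands or indices.
function translateClick(command: string, elements: MarkedElement[]): string | null {
  const match = /^CLICK\[(\d+)\]$/.exec(command.trim());
  if (!match) return null;
  const index = Number(match[1]);
  const el = elements.find((e) => e.index === index);
  return el ? `await page.${nativeSelector(el)}.click();` : null;
}
```

Keeping the two selector builders separate mirrors the design principle: the injected attribute never appears in the generated script.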
63
+ ---
64
+
65
+ ## Optimizations vs Original Plan
66
+
67
+ ### Original Plan:
68
+ - Tier 1: iterations 1-2
69
+ - Tier 2: iterations 3-4
70
+ - Tier 3: iterations 5+
71
+ - Average: ~4 iterations per step
72
+
73
+ ### Optimized Plan (Current):
74
+ - Tier 1: iteration 1 ONLY (fast path)
75
+ - Tier 2: iterations 2-3 (reliable fallback)
76
+ - Tier 3: iterations 4+ (absolute last resort)
77
+ - **Target: ~1.5 average iterations per step**
78
+
79
+ **Rationale:** Don't waste time! Simple tasks finish in 1 iteration, complex tasks escalate quickly to more reliable methods.
80
+
81
+ ---
82
+
83
+ ## Testing Checklist
84
+
85
+ ### Phase 1 (Ready Now):
86
+ - [ ] Run PeopleHR scenario - verify note-to-self helps
87
+ - [ ] Test coordinate fallback on deliberately difficult case
88
+ - [ ] Measure iteration reduction (expect 20-30%)
89
+ - [ ] Verify timeout fixes for waitForLoadState
90
+
91
+ ### Phase 2 (When Implemented):
92
+ - [ ] Test ElementDetector on modals/overlays
93
+ - [ ] Verify z-index occlusion detection
94
+ - [ ] Validate native selector generation (no data-testchimp-el in output)
95
+ - [ ] Run generated scripts standalone - must work!
96
+ - [ ] Measure tier distribution: 70/25/5
97
+
98
+ ---
99
+
100
+ ## Current Version
101
+
102
+ **Runner-Core:** v0.0.33
103
+ **Status:** Built and ready to test
104
+ **Phase 1:** ✅ Complete
105
+ **Phase 2:** 📋 Planned but not started
106
+
107
+ **Next Step:** Test Phase 1 with PeopleHR scenario to validate improvements before implementing Phase 2.
108
+
@@ -0,0 +1,165 @@
1
+ # Phase 1 Implementation - COMPLETE ✅
2
+
3
+ ## Version: runner-core v0.0.33
4
+
5
+ ## What's Been Implemented
6
+
7
+ ### 1. Free-Form "Note to Future Self"
8
+ **Purpose:** Tactical memory - agent leaves notes that persist across iterations AND steps.
9
+
10
+ **Type:**
11
+ ```typescript
12
+ interface NoteToFutureSelf {
13
+ fromIteration: number;
14
+ content: string; // FREE-FORM - agent writes whatever it wants
15
+ }
16
+ ```
17
+
18
+ **How it works:**
19
+ - Agent includes `"noteToFutureSelf": "..."` in response
20
+ - System stores it in `memory.latestNote` (persists across steps!)
21
+ - Passed to next iteration AND next step
22
+ - Displayed prominently at top of prompt
23
+ - Agent reads it FIRST before making decision
24
+
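The storage and display steps listed above can be sketched like this. Only `NoteToFutureSelf`, `noteToFutureSelf`, and `memory.latestNote` come from the document; `JourneyMemory`, `recordNote`, `buildPrompt`, and the header wording are illustrative, and keeping the previous note when an iteration writes none is an assumption.

```typescript
interface NoteToFutureSelf {
  fromIteration: number;
  content: string; // FREE-FORM
}

// Hypothetical memory container; the document only names `memory.latestNote`.
interface JourneyMemory {
  latestNote?: NoteToFutureSelf;
}

// Store the note from an agent response.
// Assumption: a response without a note leaves the previous note in place.
function recordNote(memory: JourneyMemory, iteration: number, note?: string): void {
  const trimmed = note?.trim();
  if (trimmed) {
    memory.latestNote = { fromIteration: iteration, content: trimmed };
  }
}

// Put the note at the very top of the prompt so the agent reads it first.
function buildPrompt(memory: JourneyMemory, body: string): string {
  const note = memory.latestNote;
  if (!note) return body;
  return `NOTE FROM YOUR PAST SELF (iteration ${note.fromIteration}):\n${note.content}\n\n${body}`;
}
```

Because the memory object outlives a single step, the same note is still at the top of the prompt when the next step starts.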
25
+ **Scope:** Entire scenario journey (not just current step)
26
+
27
+ **Example notes:**
28
+
29
+ *Iteration-specific:*
30
+ - "Tried #sidebar-toggle, failed with 'not clickable'. Will try child SVG element next."
31
+
32
+ *Step-spanning:*
33
+ - "This app has slow-loading modals. Always wait 2s after page load before clicking."
34
+ - "Cookie consent appears on every page. Check for and dismiss it first."
35
+ - "Sidebar only visible on desktop viewport (>1024px width)."
36
+
37
+ ### 2. Percentage-Based Coordinate Fallback
38
+ **Purpose:** Last-resort mechanism when selector generation repeatedly fails.
39
+
40
+ **Type:**
41
+ ```typescript
42
+ interface CoordinateAction {
43
+ type: 'coordinate';
44
+ action: 'click' | 'doubleClick' | 'rightClick' | 'hover' | 'drag' | 'fill' | 'scroll';
45
+ xPercent: number; // 0-100, 3 decimal precision
46
+ yPercent: number;
47
+ toXPercent?: number; // For drag
48
+ toYPercent?: number;
49
+ value?: string; // For fill
50
+ scrollAmount?: number; // For scroll
51
+ }
52
+ ```
53
+
54
+ **How it works:**
55
+ - LLM outputs percentages: `{xPercent: 15.755, yPercent: 8.500}`
56
+ - CoordinateConverter converts to pixels: `15.755% → 252px`
57
+ - Generates Playwright command: `await page.mouse.click(252, 68);`
58
+
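The conversion above is simple arithmetic and can be sketched in a few lines. The signatures below are assumptions and may not match `CoordinateConverter`; the 1600×800 viewport is inferred from the example numbers (15.755% of 1600 ≈ 252, 8.5% of 800 = 68), not stated in the source.

```typescript
interface Viewport {
  width: number;
  height: number;
}

// Convert viewport percentages (0-100) to integer pixel coordinates.
// Out-of-range inputs are clamped to the viewport.
function percentToPixels(xPercent: number, yPercent: number, viewport: Viewport): { x: number; y: number } {
  const clamp = (p: number): number => Math.min(100, Math.max(0, p));
  return {
    x: Math.round((clamp(xPercent) / 100) * viewport.width),
    y: Math.round((clamp(yPercent) / 100) * viewport.height),
  };
}

// Emit the Playwright line that would be written into the script.
function toClickCommand(xPercent: number, yPercent: number, viewport: Viewport): string {
  const { x, y } = percentToPixels(xPercent, yPercent, viewport);
  return `await page.mouse.click(${x}, ${y});`;
}
```

The same arithmetic reproduces the click from the coordinate-mode diagnosis: (87.5%, 23.438%) on an assumed 1280×720 viewport gives `page.mouse.click(1120, 169)`.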
59
+ **Supported actions:**
60
+ - click, doubleClick, rightClick, hover
61
+ - fill (clicks then types value)
62
+ - drag (from x%,y% to toX%,toY%)
63
+ - scroll (at position, by amount)
64
+
65
+ ### 3. Two-Tier Auto-Escalation
66
+ **Trigger:** Code-controlled (not LLM-decided)
67
+
68
+ ```
69
+ Tier 1 (iterations 1-3): Playwright Selector Mode
70
+ ├─ Normal buildSystemPrompt()
71
+ ├─ Agent generates: await page.getByRole(...).click()
72
+ ├─ Leaves noteToFutureSelf for continuity
73
+ └─ 3 attempts, then escalate
74
+
75
+ Tier 2 (iterations 4-5): Coordinate Mode
76
+ ├─ Auto-activates when consecutiveFailures >= 3
77
+ ├─ Uses buildCoordinateSystemPrompt()
78
+ ├─ Agent outputs: {xPercent: 15.755, yPercent: 8.500}
79
+ ├─ CoordinateConverter → mouse.click(x, y)
80
+ └─ 2 attempts max, then give up
81
+
82
+ Total: Maximum 5 iterations per step
83
+ ```
84
+
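The code-controlled escalation above reduces to a threshold check on the failure counter. This is a minimal sketch: `consecutiveFailures` and the 3/2 attempt split come from the diagram, while `Mode`, `selectMode`, and the constant names are hypothetical.

```typescript
type Mode = 'selector' | 'coordinate' | 'stuck';

const SELECTOR_ATTEMPTS = 3;   // Tier 1: iterations 1-3
const COORDINATE_ATTEMPTS = 2; // Tier 2: iterations 4-5

// Pick the mode for the next iteration from the number of failures so far.
function selectMode(consecutiveFailures: number): Mode {
  if (consecutiveFailures < SELECTOR_ATTEMPTS) return 'selector';
  if (consecutiveFailures < SELECTOR_ATTEMPTS + COORDINATE_ATTEMPTS) return 'coordinate';
  return 'stuck';
}
```

Deciding the mode in code rather than in the prompt keeps the five-iteration cap enforceable regardless of what the LLM outputs.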
85
+ ### 4. Precision & Accuracy
86
+ - **3 decimal precision** for coordinates (0.001% steps, so the converted position is accurate to about 1px after rounding)
87
+ - **Resolution-independent** - works on any viewport size
88
+ - **Percentage reference:**
89
+ - Top-left: (0, 0)
90
+ - Top-right: (100, 0)
91
+ - Center: (50, 50)
92
+ - Bottom-right: (100, 100)
93
+
94
+ ## Files Modified
95
+
96
+ 1. **orchestrator/types.ts**
97
+ - Added `NoteToFutureSelf` interface
98
+ - Added `CoordinateAction` interface
99
+ - Updated `AgentDecision` with new fields
100
+ - Updated `AgentContext` with noteFromPreviousIteration
101
+
102
+ 2. **orchestrator/orchestrator-agent.ts**
103
+ - Added note tracking in executeStep()
104
+ - Added coordinate action execution
105
+ - Added buildCoordinateSystemPrompt()
106
+ - Updated buildUserPrompt() to display notes
107
+ - Added mode switching in callAgent()
108
+ - Updated response format documentation
109
+
110
+ 3. **utils/coordinate-converter.ts** (NEW)
111
+ - percentToPixels() - Convert % to pixels
112
+ - getViewportSize() - Get current viewport dimensions
113
+ - generateCommands() - Create Playwright commands from percentages
114
+ - executeAction() - Direct execution helper
115
+
116
+ 4. **scenario-worker-class.ts** (Earlier fix)
117
+ - Smart timeout handling for waitForLoadState
118
+
119
+ 5. **execution-service.ts** (Earlier fix)
120
+ - Smart timeout handling for navigation commands
121
+
122
+ ## How to Use
123
+
124
+ **No code changes needed!** The features activate automatically:
125
+
126
+ 1. **Note to self:** Agent can optionally include `noteToFutureSelf` in any iteration
127
+ 2. **Coordinates:** Auto-activate at iteration 4 if selectors keep failing
128
+
129
+ ## Testing Phase 1
130
+
131
+ To validate the implementation:
132
+
133
+ 1. **Run PeopleHR scenario** (previously failed on hamburger menu)
134
+ - Should now succeed with note guidance
135
+ - May use coordinates if SVG selector still fails
136
+
137
+ 2. **Check logs for:**
138
+ - `📝 Note to self: ...` (agent leaving tactical notes)
139
+ - `🎯 COORDINATE MODE ACTIVATED` (tier 2 triggered)
140
+ - `🎯 Coordinate Action: click at (X%, Y%)` (using fallback)
141
+
142
+ 3. **Expected improvements:**
143
+ - 20-30% fewer iterations per step (thanks to notes)
144
+ - < 5% of scenarios need coordinate fallback
145
+ - Coordinates work when everything else fails
146
+
147
+ ## Phase 2 Preview (Not Yet Implemented)
148
+
149
+ When Phase 2 is added, it will become a **three-tier** system:
150
+ - Tier 1 (iterations 1-2): Playwright selectors
151
+ - Tier 2 (iterations 3-4): Numbered elements (CLICK[3])
152
+ - Tier 3 (iterations 5+): Percentage coordinates
153
+
154
+ Phase 2 adds visual markers [1], [2], [3] on elements with structured commands.
155
+
156
+ ---
157
+
158
+ ## Status: ✅ READY FOR TESTING
159
+
160
+ Runner-core v0.0.33 is built and ready. Test it with:
161
+ - VS Code extension "Run Test" on peoplehr-corrected.smart.spec.ts
162
+ - Or generate new script from peoplehr.txt scenario
163
+
164
+ **Next:** Validate Phase 1 works before starting Phase 2.
165
+