testchimp-runner-core 0.0.33 → 0.0.35

Files changed (152)
  1. package/dist/execution-service.d.ts +1 -4
  2. package/dist/execution-service.d.ts.map +1 -1
  3. package/dist/execution-service.js +155 -468
  4. package/dist/execution-service.js.map +1 -1
  5. package/dist/index.d.ts +3 -1
  6. package/dist/index.d.ts.map +1 -1
  7. package/dist/index.js +11 -1
  8. package/dist/index.js.map +1 -1
  9. package/dist/llm-facade.d.ts.map +1 -1
  10. package/dist/llm-facade.js +7 -7
  11. package/dist/llm-facade.js.map +1 -1
  12. package/dist/llm-provider.d.ts +9 -0
  13. package/dist/llm-provider.d.ts.map +1 -1
  14. package/dist/model-constants.d.ts +16 -5
  15. package/dist/model-constants.d.ts.map +1 -1
  16. package/dist/model-constants.js +17 -6
  17. package/dist/model-constants.js.map +1 -1
  18. package/dist/orchestrator/decision-parser.d.ts +18 -0
  19. package/dist/orchestrator/decision-parser.d.ts.map +1 -0
  20. package/dist/orchestrator/decision-parser.js +127 -0
  21. package/dist/orchestrator/decision-parser.js.map +1 -0
  22. package/dist/orchestrator/index.d.ts +4 -2
  23. package/dist/orchestrator/index.d.ts.map +1 -1
  24. package/dist/orchestrator/index.js +15 -2
  25. package/dist/orchestrator/index.js.map +1 -1
  26. package/dist/orchestrator/orchestrator-agent.d.ts +17 -22
  27. package/dist/orchestrator/orchestrator-agent.d.ts.map +1 -1
  28. package/dist/orchestrator/orchestrator-agent.js +708 -577
  29. package/dist/orchestrator/orchestrator-agent.js.map +1 -1
  30. package/dist/orchestrator/orchestrator-prompts.d.ts +32 -0
  31. package/dist/orchestrator/orchestrator-prompts.d.ts.map +1 -0
  32. package/dist/orchestrator/orchestrator-prompts.js +737 -0
  33. package/dist/orchestrator/orchestrator-prompts.js.map +1 -0
  34. package/dist/orchestrator/page-som-handler.d.ts +106 -0
  35. package/dist/orchestrator/page-som-handler.d.ts.map +1 -0
  36. package/dist/orchestrator/page-som-handler.js +1353 -0
  37. package/dist/orchestrator/page-som-handler.js.map +1 -0
  38. package/dist/orchestrator/som-types.d.ts +149 -0
  39. package/dist/orchestrator/som-types.d.ts.map +1 -0
  40. package/dist/orchestrator/som-types.js +87 -0
  41. package/dist/orchestrator/som-types.js.map +1 -0
  42. package/dist/orchestrator/tool-registry.d.ts +2 -0
  43. package/dist/orchestrator/tool-registry.d.ts.map +1 -1
  44. package/dist/orchestrator/tool-registry.js.map +1 -1
  45. package/dist/orchestrator/tools/index.d.ts +5 -1
  46. package/dist/orchestrator/tools/index.d.ts.map +1 -1
  47. package/dist/orchestrator/tools/index.js +9 -2
  48. package/dist/orchestrator/tools/index.js.map +1 -1
  49. package/dist/orchestrator/tools/refresh-som-markers.d.ts +12 -0
  50. package/dist/orchestrator/tools/refresh-som-markers.d.ts.map +1 -0
  51. package/dist/orchestrator/tools/refresh-som-markers.js +64 -0
  52. package/dist/orchestrator/tools/refresh-som-markers.js.map +1 -0
  53. package/dist/orchestrator/tools/verify-action-result.d.ts +17 -0
  54. package/dist/orchestrator/tools/verify-action-result.d.ts.map +1 -0
  55. package/dist/orchestrator/tools/verify-action-result.js +140 -0
  56. package/dist/orchestrator/tools/verify-action-result.js.map +1 -0
  57. package/dist/orchestrator/tools/view-previous-screenshot.d.ts +15 -0
  58. package/dist/orchestrator/tools/view-previous-screenshot.d.ts.map +1 -0
  59. package/dist/orchestrator/tools/view-previous-screenshot.js +92 -0
  60. package/dist/orchestrator/tools/view-previous-screenshot.js.map +1 -0
  61. package/dist/orchestrator/types.d.ts +49 -1
  62. package/dist/orchestrator/types.d.ts.map +1 -1
  63. package/dist/orchestrator/types.js +11 -1
  64. package/dist/orchestrator/types.js.map +1 -1
  65. package/dist/prompts.d.ts.map +1 -1
  66. package/dist/prompts.js +40 -34
  67. package/dist/prompts.js.map +1 -1
  68. package/dist/scenario-service.d.ts +5 -0
  69. package/dist/scenario-service.d.ts.map +1 -1
  70. package/dist/scenario-service.js +17 -0
  71. package/dist/scenario-service.js.map +1 -1
  72. package/dist/scenario-worker-class.d.ts +4 -0
  73. package/dist/scenario-worker-class.d.ts.map +1 -1
  74. package/dist/scenario-worker-class.js +21 -3
  75. package/dist/scenario-worker-class.js.map +1 -1
  76. package/dist/testing/agent-tester.d.ts +35 -0
  77. package/dist/testing/agent-tester.d.ts.map +1 -0
  78. package/dist/testing/agent-tester.js +84 -0
  79. package/dist/testing/agent-tester.js.map +1 -0
  80. package/dist/testing/ref-translator-tester.d.ts +44 -0
  81. package/dist/testing/ref-translator-tester.d.ts.map +1 -0
  82. package/dist/testing/ref-translator-tester.js +104 -0
  83. package/dist/testing/ref-translator-tester.js.map +1 -0
  84. package/dist/utils/coordinate-converter.d.ts +32 -0
  85. package/dist/utils/coordinate-converter.d.ts.map +1 -0
  86. package/dist/utils/coordinate-converter.js +130 -0
  87. package/dist/utils/coordinate-converter.js.map +1 -0
  88. package/dist/utils/hierarchical-selector.d.ts +47 -0
  89. package/dist/utils/hierarchical-selector.d.ts.map +1 -0
  90. package/dist/utils/hierarchical-selector.js +212 -0
  91. package/dist/utils/hierarchical-selector.js.map +1 -0
  92. package/dist/utils/page-info-retry.d.ts +14 -0
  93. package/dist/utils/page-info-retry.d.ts.map +1 -0
  94. package/dist/utils/page-info-retry.js +60 -0
  95. package/dist/utils/page-info-retry.js.map +1 -0
  96. package/dist/utils/page-info-utils.d.ts +1 -0
  97. package/dist/utils/page-info-utils.d.ts.map +1 -1
  98. package/dist/utils/page-info-utils.js +46 -18
  99. package/dist/utils/page-info-utils.js.map +1 -1
  100. package/dist/utils/ref-attacher.d.ts +21 -0
  101. package/dist/utils/ref-attacher.d.ts.map +1 -0
  102. package/dist/utils/ref-attacher.js +149 -0
  103. package/dist/utils/ref-attacher.js.map +1 -0
  104. package/dist/utils/ref-translator.d.ts +49 -0
  105. package/dist/utils/ref-translator.d.ts.map +1 -0
  106. package/dist/utils/ref-translator.js +276 -0
  107. package/dist/utils/ref-translator.js.map +1 -0
  108. package/package.json +1 -1
  109. package/plandocs/BEFORE_AFTER_VERIFICATION.md +148 -0
  110. package/plandocs/COORDINATE_MODE_DIAGNOSIS.md +144 -0
  111. package/plandocs/IMPLEMENTATION_STATUS.md +108 -0
  112. package/plandocs/PHASE_1_COMPLETE.md +165 -0
  113. package/plandocs/PHASE_1_SUMMARY.md +184 -0
  114. package/plandocs/PROMPT_OPTIMIZATION_ANALYSIS.md +120 -0
  115. package/plandocs/PROMPT_SANITY_CHECK.md +120 -0
  116. package/plandocs/SESSION_SUMMARY_v0.0.33.md +151 -0
  117. package/plandocs/TROUBLESHOOTING_SESSION.md +72 -0
  118. package/plandocs/VISUAL_AGENT_EVOLUTION_PLAN.md +396 -0
  119. package/plandocs/WHATS_NEW_v0.0.33.md +183 -0
  120. package/plandocs/exploratory-mode-support-v2.plan.md +953 -0
  121. package/plandocs/exploratory-mode-support.plan.md +928 -0
  122. package/plandocs/journey-id-tracking-addendum.md +227 -0
  123. package/src/execution-service.ts +179 -596
  124. package/src/index.ts +10 -0
  125. package/src/llm-facade.ts +8 -8
  126. package/src/llm-provider.ts +11 -1
  127. package/src/model-constants.ts +17 -5
  128. package/src/orchestrator/decision-parser.ts +139 -0
  129. package/src/orchestrator/index.ts +27 -2
  130. package/src/orchestrator/orchestrator-agent.ts +868 -623
  131. package/src/orchestrator/orchestrator-prompts.ts +786 -0
  132. package/src/orchestrator/page-som-handler.ts +1565 -0
  133. package/src/orchestrator/som-types.ts +188 -0
  134. package/src/orchestrator/tool-registry.ts +2 -0
  135. package/src/orchestrator/tools/index.ts +5 -1
  136. package/src/orchestrator/tools/refresh-som-markers.ts +69 -0
  137. package/src/orchestrator/tools/verify-action-result.ts +159 -0
  138. package/src/orchestrator/tools/view-previous-screenshot.ts +103 -0
  139. package/src/orchestrator/types.ts +95 -4
  140. package/src/prompts.ts +40 -34
  141. package/src/scenario-service.ts +20 -0
  142. package/src/scenario-worker-class.ts +30 -4
  143. package/src/utils/coordinate-converter.ts +162 -0
  144. package/src/utils/page-info-retry.ts +65 -0
  145. package/src/utils/page-info-utils.ts +53 -18
  146. package/testchimp-runner-core-0.0.35.tgz +0 -0
  147. /package/{CREDIT_CALLBACK_ARCHITECTURE.md → plandocs/CREDIT_CALLBACK_ARCHITECTURE.md} +0 -0
  148. /package/{INTEGRATION_COMPLETE.md → plandocs/INTEGRATION_COMPLETE.md} +0 -0
  149. /package/{VISION_DIAGNOSTICS_IMPROVEMENTS.md → plandocs/VISION_DIAGNOSTICS_IMPROVEMENTS.md} +0 -0
  150. /package/{RELEASE_0.0.26.md → releasenotes/RELEASE_0.0.26.md} +0 -0
  151. /package/{RELEASE_0.0.27.md → releasenotes/RELEASE_0.0.27.md} +0 -0
  152. /package/{RELEASE_0.0.28.md → releasenotes/RELEASE_0.0.28.md} +0 -0
@@ -0,0 +1,184 @@
1
+ # Phase 1 Complete - Summary & Testing Guide
2
+
3
+ ## Version: runner-core v0.0.33 ✅
4
+
5
+ ---
6
+
7
+ ## Implementation Complete
8
+
9
+ ### What's New:
10
+
11
+ 1. **📝 Note to Future Self**
12
+ - Free-form tactical memory between iterations
13
+ - Agent writes: "Tried X, failed. Will try Y next."
14
+ - Prevents repeated mistakes
15
+
16
+ 2. **🎯 Percentage-Based Coordinates**
17
+ - Last-resort fallback (3-decimal precision)
18
+ - Resolution-independent (works at any viewport size)
19
+ - Supports: click, fill, drag, hover, scroll
20
+
21
+ 3. **⚡ Optimized Iteration Limits**
22
+ - Max 5 iterations per step (down from 8)
23
+ - 2 coordinate attempts max (coordinates work or they don't)
24
+ - Faster feedback on stuck scenarios
25
+
26
+ ---
27
+
28
+ ## Current Behavior (Phase 1)
29
+
30
+ ```
31
+ ┌─────────────────────────────────────────────────────┐
32
+ │ Iteration 1: Playwright selector │
33
+ │ Try: await page.getByRole('button'...).click() │
34
+ │ Note: "If this fails, try #id selector" │
35
+ │ │
36
+ │ Iteration 2: Playwright selector │
37
+ │ Read note from iteration 1 │
38
+ │ Try: await page.locator('#sidebar-toggle').click()│
39
+ │ Note: "If this fails, try SVG child" │
40
+ │ │
41
+ │ Iteration 3: Playwright selector │
42
+ │ Read note from iteration 2 │
43
+ │ Try: await page.locator('#sidebar-toggle svg') │
44
+ │ → Fails again │
45
+ │ │
46
+ │ 🎯 COORDINATE MODE ACTIVATED 🎯 │
47
+ │ │
48
+ │ Iteration 4: Coordinate action │
49
+ │ Agent outputs: {xPercent: 5.500, yPercent: 8.250}│
50
+ │ Execute: page.mouse.click(88, 66) │
51
+ │ → Success! │
52
+ │ │
53
+ │ OR if fails... │
54
+ │ │
55
+ │ Iteration 5: Coordinate action (2nd attempt) │
56
+ │ Try slightly adjusted coordinates │
57
+ │ → If fails: GIVE UP (stuck) │
58
+ │ │
59
+ │ Total: Max 5 iterations │
60
+ └─────────────────────────────────────────────────────┘
61
+ ```
62
+
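Reviewer sketch (not part of the package): the diagram's step from `{xPercent: 5.500, yPercent: 8.250}` to `page.mouse.click(88, 66)` is a percentage-to-pixel conversion. The helper below is a minimal illustration of that arithmetic. The names `percentToPixels`, `Viewport`, and `PercentPoint` are hypothetical, and the 1600x800 viewport is an assumption inferred from the diagram's numbers; the real `coordinate-converter.ts` may work differently.

```typescript
// Hypothetical helper; the package's real coordinate-converter.ts may differ.
interface Viewport {
  width: number;
  height: number;
}

interface PercentPoint {
  xPercent: number; // 0-100, up to 3 decimals
  yPercent: number; // 0-100, up to 3 decimals
}

function percentToPixels(point: PercentPoint, viewport: Viewport): { x: number; y: number } {
  // Reject values outside the 0-100 range before converting.
  for (const value of [point.xPercent, point.yPercent]) {
    if (!isFinite(value) || value < 0 || value > 100) {
      throw new RangeError(`percentage out of range: ${value}`);
    }
  }
  // Scale each percentage by the matching viewport dimension, then round to whole pixels.
  return {
    x: Math.round((point.xPercent / 100) * viewport.width),
    y: Math.round((point.yPercent / 100) * viewport.height),
  };
}

// Assumed 1600x800 viewport: the size at which the diagram's numbers line up.
const target = percentToPixels({ xPercent: 5.5, yPercent: 8.25 }, { width: 1600, height: 800 });
console.log(target);
```

Under that assumed viewport the result is `{ x: 88, y: 66 }`, matching the diagram. At another size, say 1280x720, the same percentages would give a different pixel position, which is what makes the percentages resolution-independent.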
63
+ ---
64
+
65
+ ## Testing Phase 1
66
+
67
+ ### Test 1: PeopleHR Scenario (Previously Failed)
68
+
69
+ **Expected outcome:**
70
+ - Iteration 1-2: Try text/ID selectors → fail
71
+ - Iteration 3: Note says "try SVG child" → succeeds!
72
+ - OR Iteration 4: Coordinates → succeeds!
73
+
74
+ **Run:**
75
+ ```bash
76
+ # Via VS Code extension: "Generate Script" on peoplehr.txt
77
+ # Or "Run Test" on peoplehr-corrected.smart.spec.ts
78
+ ```
79
+
80
+ **Look for in logs:**
81
+ ```
82
+ 📝 Note to self: ...
83
+ 🎯 COORDINATE MODE ACTIVATED
84
+ 🎯 Coordinate Action (attempt 1/2): click at (5.500%, 8.250%)
85
+ ```
86
+
87
+ ### Test 2: Simple Scenario (Should Still Be Fast)
88
+
89
+ Create test: `simple-login.txt`
90
+ ```
91
+ - go to https://example.com/login
92
+ - fill username with "alice"
93
+ - fill password with "password123"
94
+ - click login button
95
+ ```
96
+
97
+ **Expected:**
98
+ - Each step: 1 iteration (Tier 1 success)
99
+ - No coordinates needed
100
+ - Fast execution
101
+
102
+ ### Test 3: Coordinate Fallback
103
+
104
+ **Deliberately difficult scenario:**
105
+ ```
106
+ - go to https://some-app-with-shadow-dom.com
107
+ - click on custom web component icon
108
+ ```
109
+
110
+ **Expected:**
111
+ - Iterations 1-3: Selectors fail
112
+ - Iteration 4: Coordinates succeed
113
+ - Generated script contains: `await page.mouse.click(x, y);`
114
+
115
+ ---
116
+
117
+ ## Expected Improvements
118
+
119
+ ### Metrics to Track:
120
+
121
+ 1. **Iteration Efficiency**
122
+ - Before: ~4 average iterations per step
123
+ - After: ~2.5 average iterations per step (30-40% reduction)
124
+
125
+ 2. **Success Rate**
126
+ - Before: Stuck on complex UIs (hamburgers, icons, shadow DOM)
127
+ - After: Coordinates provide escape hatch
128
+
129
+ 3. **Coordinate Usage**
130
+ - Target: < 10% of scenarios use coordinates
131
+ - Most scenarios still succeed with selectors
132
+
133
+ ---
134
+
135
+ ## Files Changed
136
+
137
+ **New:**
138
+ - `src/utils/coordinate-converter.ts` - Percentage conversion utility
139
+ - `VISUAL_AGENT_EVOLUTION_PLAN.md` - Complete plan
140
+ - `PHASE_1_COMPLETE.md` - Feature documentation
141
+ - `IMPLEMENTATION_STATUS.md` - Current status
142
+ - `PHASE_1_SUMMARY.md` - This file
143
+
144
+ **Modified:**
145
+ - `src/orchestrator/types.ts` - Added NoteToFutureSelf, CoordinateAction
146
+ - `src/orchestrator/orchestrator-agent.ts` - Note tracking, coordinate handling, mode switching
147
+ - `src/scenario-worker-class.ts` - Timeout handling (earlier fix)
148
+ - `src/execution-service.ts` - Timeout handling (earlier fix)
149
+
150
+ ---
151
+
152
+ ## Iteration Budget (Max 5 per Step)
153
+
154
+ **Phase 1 (Current):**
155
+ ```
156
+ Iterations 1-3: Playwright selectors (3 attempts)
157
+ Iterations 4-5: Coordinates (2 attempts)
158
+ ```
159
+
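Reviewer sketch (not part of the package): the Phase 1 budget above amounts to a small mapping from iteration number to strategy. The function below is one way to express it, assuming 1-based iteration numbers as in the doc; `strategyForIteration` and the constant names are hypothetical, and the real mode switch in `orchestrator-agent.ts` may be keyed on failure counts instead.

```typescript
type Strategy = "selector" | "coordinate" | "stuck";

// Budget taken from the doc: 3 selector attempts, then 2 coordinate attempts.
const MAX_SELECTOR_ATTEMPTS = 3;
const MAX_COORDINATE_ATTEMPTS = 2;

// Maps a 1-based iteration number to the strategy the agent should use.
function strategyForIteration(iteration: number): Strategy {
  if (iteration < 1 || Math.floor(iteration) !== iteration) {
    throw new RangeError(`iteration must be a positive integer, got ${iteration}`);
  }
  if (iteration <= MAX_SELECTOR_ATTEMPTS) return "selector";
  if (iteration <= MAX_SELECTOR_ATTEMPTS + MAX_COORDINATE_ATTEMPTS) return "coordinate";
  return "stuck"; // budget exhausted after 5 iterations
}

console.log([1, 2, 3, 4, 5, 6].map(strategyForIteration));
```

Keeping both limits as named constants makes the Phase 2 split (1 selector, 2 index, 2 coordinate) a matter of adding a tier rather than rewriting the check.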
160
+ **Phase 2 (Future - Optimized):**
161
+ ```
162
+ Iteration 1: Playwright selector (1 attempt) - fast path
163
+ Iterations 2-3: Index commands (2 attempts) - reliable fallback
164
+ Iterations 4-5: Coordinates (2 attempts) - last resort
165
+ ```
166
+
167
+ **Benefit of Phase 2:**
168
+ - Most scenarios finish in iteration 1 (fast!)
169
+ - Complex scenarios use iterations 2-3 (index system)
170
+ - Only extreme cases reach iterations 4-5 (coordinates)
171
+
172
+ ---
173
+
174
+ ## Ready to Test!
175
+
176
+ **Current version** (runner-core v0.0.33) is built and ready.
177
+
178
+ **Test with:**
179
+ 1. VS Code extension "Generate Script" on `peoplehr.txt`
180
+ 2. Or "Run Test" on any existing smart test
181
+ 3. Check logs for note-to-self and coordinate usage
182
+
183
+ **After validating that Phase 1 works well, proceed to Phase 2 for the numbered element system.**
184
+
@@ -0,0 +1,120 @@
1
+ # System Prompt Optimization Analysis
2
+
3
+ ## Current Stats:
4
+ - **System Prompt**: 17,573 chars (346 lines)
5
+ - **With Tool Descriptions**: 19,613 chars (~4,903 tokens)
6
+ - **Cost per call**: ~$0.0007 (gpt-5-mini input tokens)
7
+
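Reviewer sketch (not part of the package): the token figures in these stats match a simple characters-divided-by-four estimate. That ratio is an assumption inferred from the numbers (19,613 / 4 ≈ 4,903), not a documented tokenizer property, and `estimateTokens` is a hypothetical name.

```typescript
// Assumed heuristic: roughly 4 characters per token for English prompt text.
const CHARS_PER_TOKEN = 4;

function estimateTokens(charCount: number): number {
  return Math.round(charCount / CHARS_PER_TOKEN);
}

console.log(estimateTokens(19613)); // system prompt plus tool descriptions
console.log(estimateTokens(17573)); // system prompt alone
```

A real tokenizer will give somewhat different counts, so these values are best read as rough budgeting figures.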
8
+ ## Optimization Opportunities:
9
+
10
+ ### 1. **Duplicate Examples** (Save ~30%)
11
+ **Current**: Multiple example sections with ❌/✅ pairs
12
+ - Lines 633-644: Examples section with goto, fill, click examples
13
+ - Lines 621-626: Ambiguous text handling examples
14
+ - Lines 603-607: DOM snapshot examples
15
+ - Lines 615-619: Selector preference list
16
+
17
+ **Optimization**: Consolidate into ONE examples section
18
+ **Savings**: ~2,000 chars
19
+
20
+ ### 2. **Verbose Selector Section** (Save ~20%)
21
+ **Current**: Lines 602-644 (43 lines, ~1,800 chars)
22
+ - Lists all selector types with emoji
23
+ - Detailed examples for each
24
+ - Repetitive "Good/Bad" patterns
25
+
26
+ **Optimization**: Create compact reference table
27
+ ```
28
+ SELECTORS (preference order):
29
+ 1. getByRole/Label/Placeholder (semantic, stable)
30
+ 2. getByText (scope to parent if ambiguous!)
31
+ 3. CSS IDs (avoid auto-generated)
32
+
33
+ Common mistakes: Missing goto timeout, unscoped getByText, auto-generated IDs
34
+ ```
35
+ **Savings**: ~1,200 chars
36
+
37
+ ### 3. **Emoji Overuse** (Save ~5%)
38
+ **Current**: Heavy use of ⚠️, ❌, ✅, 🏆, etc.
39
+
40
+ **Optimization**: Use sparingly (only for critical warnings)
41
+ **Savings**: ~500 chars
42
+
43
+ ### 4. **Redundant "WHY" Explanations** (Save ~10%)
44
+ **Current**: Multiple "WHY:" sections explaining rationale
45
+ - Lines 642-644: WHY semantic selectors
46
+ - Similar explanations scattered throughout
47
+
48
+ **Optimization**: Remove or consolidate
49
+ **Savings**: ~800 chars
50
+
51
+ ### 5. **Tool Instructions Redundancy** (Save ~10%)
52
+ **Current**: Tools described twice:
53
+ - In tool registry (dynamic)
54
+ - In prompt rules (static)
55
+
56
+ **Optimization**: Rely more on tool registry descriptions
57
+ **Savings**: ~600 chars
58
+
59
+ ### 6. **Status Rules Repetition** (Save ~5%)
60
+ **Current**: Lines 468-486 - Status rules explained multiple times
61
+
62
+ **Optimization**: Single concise statement
63
+ **Savings**: ~400 chars
64
+
65
+ ## Proposed Condensed Structure:
66
+
67
+ ```markdown
68
+ # System Prompt (Optimized)
69
+
70
+ ## Agent Role & Tools
71
+ [Tool descriptions from registry]
72
+
73
+ ## Response Format (JSON)
74
+ {required fields} - minimal format, no extensive comments
75
+
76
+ ## Core Rules (Prioritized)
77
+ 1. Status decisions (complete/continue/stuck)
78
+ 2. Selector strategy (semantic > text > CSS)
79
+ 3. Common errors (goto timeout, strict mode, auto-IDs)
80
+ 4. When to use tools vs commands
81
+ 5. Note to future self usage
82
+
83
+ ## Examples (Consolidated)
84
+ - Navigation: goto with 30s timeout
85
+ - Selectors: Scoped getByText, semantic selectors
86
+ - Coordinates: When and how
87
+
88
+ ## Advanced Features
89
+ - Blocker detection
90
+ - Step re-evaluation
91
+ - Coordinate fallback
92
+ ```
93
+
94
+ ## Total Potential Savings:
95
+
96
+ - **Before**: 17,573 chars (~4,393 tokens)
97
+ - **After**: ~12,000 chars (~3,000 tokens)
98
+ - **Reduction**: ~32% reduction in system prompt
99
+ - **Cost savings**: ~$0.0002 per call (~30% per call)
100
+ - **Overall impact**: With 7 of the 11 tasks already on gpt-4o-mini, only the remaining 4 tasks will benefit
101
+ - **Est. total savings**: ~5-8% additional cost reduction
102
+
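Reviewer sketch (not part of the package): the "After" figure can be cross-checked by summing the per-item character savings listed in sections 1-6 of this analysis. The item labels below are shortened section titles; only the numbers come from the doc.

```typescript
const CURRENT_CHARS = 17573;

// Character savings as listed under each optimization opportunity.
const savings = [
  { item: "duplicate examples", chars: 2000 },
  { item: "verbose selector section", chars: 1200 },
  { item: "emoji overuse", chars: 500 },
  { item: "redundant WHY explanations", chars: 800 },
  { item: "tool instructions redundancy", chars: 600 },
  { item: "status rules repetition", chars: 400 },
];

const totalSaved = savings.reduce((sum, entry) => sum + entry.chars, 0);
const remainingChars = CURRENT_CHARS - totalSaved;
const reductionPercent = (totalSaved / CURRENT_CHARS) * 100;

console.log(totalSaved, remainingChars, reductionPercent.toFixed(1));
```

The items sum to 5,500 chars, leaving 12,073 chars, a reduction of about 31.3%. That agrees with the ~12,000 chars and ~32% quoted above. The per-section "Save ~N%" headings do not add up to the same total, so they appear to be measured against a different base.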
103
+ ## Recommendation:
104
+
105
+ **Optimize if:**
106
+ - You're seeing consistent 500 errors (less likely now with retry)
107
+ - You want to maximize caching efficiency
108
+ - You're running high-volume scenarios (1000+ per day)
109
+
110
+ **Skip if:**
111
+ - Current cost is acceptable
112
+ - Prompt clarity is more important than 5-8% savings
113
+ - Risk of quality degradation concerns you
114
+
115
+ ## Action Items (if optimizing):
116
+
117
+ 1. ✅ Keep: Critical decision logic, JSON format, coordinate mode
118
+ 2. ⚠️ Condense: Selector examples, error responses, WHY sections
119
+ 3. ❌ Remove: Duplicate examples, excessive emojis, redundant explanations
120
+
@@ -0,0 +1,120 @@
1
+ # Prompt Sanity Check - Runner-Core v0.0.33
2
+
3
+ ## ✅ STRENGTHS
4
+
5
+ ### System Prompt (`buildSystemPrompt`)
6
+ - ✅ Required fields clearly marked at top (status, reasoning, statusReasoning)
7
+ - ✅ Comprehensive JSON format with examples
8
+ - ✅ Clear status decision rules
9
+ - ✅ Good blocker detection guidance
10
+ - ✅ Semantic selector preference clearly explained with examples
11
+ - ✅ Tool vs command distinction is clear
12
+ - ✅ Coordinate fallback documented
13
+
14
+ ### User Prompt (`buildUserPrompt`)
15
+ - ✅ Static content first (cache-optimized)
16
+ - ✅ Dynamic content last (current state, page info)
17
+ - ✅ Notes from previous iteration shown prominently
18
+ - ✅ Clear warnings for consecutive failures
19
+ - ✅ Coordinate mode trigger clear
20
+
21
+ ## ⚠️ ISSUES FOUND
22
+
23
+ ### 1. **Duplication/Redundancy**
24
+ - ❌ "Use semantic selectors" mentioned in:
25
+ - System prompt (line ~605: "SELECTOR PREFERENCE")
26
+ - User prompt (line ~860: "SELECTOR STRATEGY")
27
+ - **FIX**: Remove from user prompt, keep in system prompt only
28
+
29
+ ### 2. **Length Concerns**
30
+ - ⚠️ System prompt is ~325 lines (very long)
31
+ - ⚠️ May cause LLM to miss critical details in the middle
32
+ - **SUGGESTION**: Consider breaking into sections or condensing
33
+
34
+ ### 3. **Conflicting Guidance**
35
+ - ⚠️ Line ~469: "stuck: Tried 3+ iterations"
36
+ - But coordinate mode triggers at 3 failures (line ~904)
37
+ - **FIX**: Clarify: stuck = 5 attempts total (3 regular + 2 coordinate)
38
+
39
+ ### 4. **Unclear Iteration Count**
40
+ - ❌ Line ~714: "When iteration count reaches 4+"
41
+ - ❌ Line ~748: "iteration 4+"
42
+ - ✅ But code triggers at 3 failures
43
+ - **FIX**: State the trigger explicitly in the prompt: coordinate mode starts at iteration 4, i.e. after 3 failed attempts (zero-based iterations 0, 1, 2 fail; the next one, #3, is the 4th iteration)
44
+
45
+ ### 5. **Missing Information**
46
+ - ❌ Max iterations per step not mentioned (code has 5)
47
+ - **FIX**: Add to system prompt: "MAX 5 iterations per step"
48
+
49
+ ### 6. **Verbosity**
50
+ - ⚠️ Examples section (lines ~617-628) is great but long
51
+ - ⚠️ Multiple emoji warnings (⚠️⚠️⚠️) can be reduced to single ⚠️
52
+ - **SUGGESTION**: Keep examples, reduce emoji spam
53
+
54
+ ## 🔧 RECOMMENDED FIXES
55
+
56
+ ### Priority 1 (Critical):
57
+ 1. Remove duplicate selector strategy from user prompt
58
+ 2. Clarify max iterations (5 total)
59
+ 3. Fix coordinate mode iteration number (4th iteration = after 3 failures)
60
+
61
+ ### Priority 2 (Nice to have):
62
+ 4. Condense system prompt if possible (target: 250 lines)
63
+ 5. Reduce emoji overuse
64
+ 6. Add section headers in system prompt for clarity
65
+
66
+ ## 📊 PROMPT STRUCTURE ANALYSIS
67
+
68
+ ### System Prompt Sections:
69
+ 1. Introduction (1 line)
70
+ 2. Tool descriptions (dynamic, from registry)
71
+ 3. JSON format (40 lines) ✅
72
+ 4. Status rules (15 lines) ✅
73
+ 5. Step re-evaluation (20 lines) ✅
74
+ 6. Blocker detection (25 lines) ✅
75
+ 7. Experiences (25 lines) ✅
76
+ 8. Critical rules (200 lines) ⚠️ TOO LONG
77
+ 9. Coordinate actions (45 lines) ✅
78
+
79
+ **TOTAL**: ~370 lines (with tool descriptions)
80
+
81
+ ### User Prompt Sections:
82
+ 1. Static instructions (20 lines) - **Cache-friendly** ✅
83
+ 2. Dynamic context marker (1 line) ✅
84
+ 3. Notes from previous iteration (5 lines) ✅
85
+ 4. Warnings for failures (15 lines) ✅
86
+ 5. Coordinate mode trigger (8 lines) ✅
87
+ 6. Current step goal (10 lines) ✅
88
+ 7. Page state (50-100 lines, variable) ✅
89
+ 8. Recent steps (20-50 lines, variable) ✅
90
+ 9. Experiences (10 lines) ✅
91
+
92
+ **TOTAL**: ~140-200 lines per call
93
+
94
+ ## 🎯 RECOMMENDATION SUMMARY
95
+
96
+ **Keep as-is:**
97
+ - JSON structure
98
+ - Semantic selector examples
99
+ - Blocker detection
100
+ - Note to future self
101
+ - Coordinate fallback
102
+ - Cache optimization
103
+
104
+ **Fix:**
105
+ - Remove selector duplication in user prompt
106
+ - Clarify iteration counts
107
+ - Add max iteration limit
108
+ - Reduce emoji spam
109
+
110
+ **Consider:**
111
+ - Condensing "Critical Rules" section (currently 200 lines)
112
+ - Moving some examples to external docs
113
+ - Breaking long sections with clear headers
114
+
115
+ ## Overall Assessment: **8/10**
116
+ - Prompts are comprehensive and well-structured
117
+ - Main issues are length and minor redundancies
118
+ - Cache optimization is excellent
119
+ - A few clarity fixes needed for iteration counts
120
+
@@ -0,0 +1,151 @@
1
+ # Runner-Core v0.0.33 - Session Summary
2
+
3
+ ## Date: October 15, 2025
4
+
5
+ ## Major Accomplishments:
6
+
7
+ ### 1. ✅ **Coordinate Fallback System** (Phase 1 Complete)
8
+ - Percentage-based coordinates (0-100%, 3 decimal precision)
9
+ - Activates after 3 selector failures
10
+ - 2 coordinate attempts before giving up
11
+ - Resolution-independent positioning
12
+
13
+ ### 2. ✅ **Note to Future Self** (Tactical Memory)
14
+ - Free-form notes persist across iterations AND steps
15
+ - Enables strategic planning across agent decisions
16
+ - Helps maintain context: "Tried X, will try Y next"
17
+
18
+ ### 3. ✅ **Visual Verification Tool** (NEW)
19
+ - `verify_action_result` - Before/after screenshot comparison
20
+ - Agent-callable (decides when to use)
21
+ - JPEG 60% quality (85-90% smaller than PNG)
22
+ - Multi-image LLM interface support
23
+
24
+ ### 4. ✅ **Critical Bug Fixes**
25
+ - **Coordinate mode never activated**: Changed forced stuck from >= 3 to >= 5 failures
26
+ - **Missing required fields**: Made parser flexible (accepts reasoning OR statusReasoning)
27
+ - **Navigation timeouts**: Added 30s timeout guidance for page.goto()
28
+ - **Strict mode violations**: Added scoping guidance (locator('#parent').getByText())
29
+
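Reviewer sketch (not part of the package): the "missing required fields" fix describes a parser that accepts either `reasoning` or `statusReasoning`. A minimal version of that leniency could look like the following. The `AgentDecision` shape and `parseDecision` name are hypothetical, and the status values are taken from the prompt docs (complete/continue/stuck); the real `decision-parser.ts` is likely richer.

```typescript
type Status = "complete" | "continue" | "stuck";

interface AgentDecision {
  status: Status;
  reasoning: string;
}

function isStatus(value: unknown): value is Status {
  return value === "complete" || value === "continue" || value === "stuck";
}

function isNonEmptyString(value: unknown): value is string {
  return typeof value === "string" && value.trim() !== "";
}

// Parses the LLM's JSON reply, accepting reasoning OR statusReasoning.
function parseDecision(raw: string): AgentDecision {
  const data: unknown = JSON.parse(raw);
  if (typeof data !== "object" || data === null) {
    throw new Error("decision must be a JSON object");
  }
  const fields = data as { [key: string]: unknown };
  const status = fields["status"];
  if (!isStatus(status)) {
    throw new Error("missing or invalid status");
  }
  // Prefer reasoning; fall back to statusReasoning when it is absent or blank.
  const reasoning = fields["reasoning"];
  const fallback = fields["statusReasoning"];
  if (isNonEmptyString(reasoning)) return { status, reasoning };
  if (isNonEmptyString(fallback)) return { status, reasoning: fallback };
  throw new Error("expected reasoning or statusReasoning");
}

console.log(parseDecision('{"status":"continue","statusReasoning":"Selector failed, try #id next"}'));
```

Normalizing to a single `reasoning` field at the parse boundary means downstream code does not need to know which of the two keys the model happened to emit.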
30
+ ### 5. ✅ **Prompt Optimizations**
31
+ - **59% reduction**: 17,573 chars → 7,287 chars in system prompt
32
+ - **Cache-optimized**: Static content first, dynamic last
33
+ - **Cost savings**: ~40% overall with model tiering
34
+ - **Focused on cognition**: Removed bloat, kept decision-making guidance
35
+
36
+ ### 6. ✅ **Model Optimization**
37
+ - **gpt-5-mini**: Complex tasks (4 operations)
38
+ - Command generation
39
+ - Goal completion checks
40
+ - Repair suggestions
41
+ - Agent orchestration
42
+ - **gpt-4o-mini**: Simple tasks (7 operations)
43
+ - Scenario breakdown
44
+ - Screenshot need assessment
45
+ - Repair confidence
46
+ - Test name generation
47
+ - Hashtag generation
48
+ - Script parsing
49
+ - Final script merging
50
+ - **Est. 25-30% cost reduction**
51
+
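Reviewer sketch (not part of the package): the tiering above is effectively a lookup from task to model. The table below restates it as data; the task keys are hypothetical identifiers derived from the bullet labels, and the real constants in `model-constants.ts` (such as `DEFAULT_SIMPLER_MODEL`) may be organized differently.

```typescript
type Model = "gpt-5-mini" | "gpt-4o-mini";

// Task keys are illustrative names for the operations listed in the doc.
const MODEL_BY_TASK: { [task: string]: Model } = {
  // Complex tasks
  commandGeneration: "gpt-5-mini",
  goalCompletionCheck: "gpt-5-mini",
  repairSuggestion: "gpt-5-mini",
  agentOrchestration: "gpt-5-mini",
  // Simple tasks
  scenarioBreakdown: "gpt-4o-mini",
  screenshotNeedAssessment: "gpt-4o-mini",
  repairConfidence: "gpt-4o-mini",
  testNameGeneration: "gpt-4o-mini",
  hashtagGeneration: "gpt-4o-mini",
  scriptParsing: "gpt-4o-mini",
  finalScriptMerging: "gpt-4o-mini",
};

function countTasksOn(model: Model): number {
  return Object.keys(MODEL_BY_TASK).filter((task) => MODEL_BY_TASK[task] === model).length;
}

console.log(countTasksOn("gpt-5-mini"), countTasksOn("gpt-4o-mini"));
```

The counts come out to 4 and 7, which is the 7-of-11 split quoted in the key metrics.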
52
+ ### 7. ✅ **Code Cleanup**
53
+ - Removed V1 SmartTestRunnerCore (V2 is stable)
54
+ - Removed backup files (.bak, .tmp)
55
+ - Consolidated types into V2
56
+ - Removed PeopleHR-specific examples from prompts
57
+
58
+ ### 8. ✅ **Enhanced Logging**
59
+ - Prompt length metrics (chars + estimated tokens)
60
+ - Full LLM response on parsing errors
61
+ - Field presence diagnostics
62
+ - Retry logging for 500 errors
63
+
64
+ ### 9. ✅ **Retry Logic**
65
+ - Automatic retry for OpenAI 500 errors
66
+ - Exponential backoff (1s, 2s, 4s)
67
+ - Up to 3 attempts before failing
68
+
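Reviewer sketch (not part of the package): retry with exponential backoff for server errors can be written as below. The doc lists three delays (1s, 2s, 4s) alongside "up to 3 attempts", so it is unclear whether the first call counts as an attempt; this sketch assumes up to 3 retries after the initial call. `withRetry`, `backoffDelays`, and the `status` field on the error are all hypothetical, and the real logic lives in the scriptservice provider, which is not shown here.

```typescript
type Sleep = (ms: number) => Promise<void>;

const realSleep: Sleep = (ms) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Delay before each retry: baseMs, 2*baseMs, 4*baseMs, ...
function backoffDelays(retries: number, baseMs = 1000): number[] {
  const delays: number[] = [];
  for (let i = 0; i < retries; i++) delays.push(baseMs * Math.pow(2, i));
  return delays;
}

// Assumption: the thrown error carries a numeric HTTP status.
function isServerError(error: unknown): boolean {
  if (typeof error !== "object" || error === null) return false;
  const status = (error as { status?: unknown }).status;
  return typeof status === "number" && status >= 500 && status < 600;
}

async function withRetry<T>(call: () => Promise<T>, retries = 3, sleep: Sleep = realSleep): Promise<T> {
  const delays = backoffDelays(retries);
  for (let attempt = 0; ; attempt++) {
    try {
      return await call();
    } catch (error) {
      // Give up on non-5xx errors or once every delay has been used.
      if (!isServerError(error) || attempt >= delays.length) throw error;
      await sleep(delays[attempt]);
    }
  }
}
```

The `sleep` parameter is injectable so the schedule can be exercised in tests without real waiting, for example `withRetry(callLlm, 3, async () => {})`, where `callLlm` stands in for whatever function issues the request.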
69
+ ### 10. ✅ **Headed Mode for Local Testing**
70
+ - All browser instances now run headed (`headless: false`) for local dev
71
+ - Visual debugging enabled
72
+
73
+ ## Files Modified:
74
+
75
+ ### Runner-Core:
76
+ 1. `src/orchestrator/orchestrator-agent.ts` - Main agent logic
77
+ 2. `src/orchestrator/types.ts` - NoteToFutureSelf, CoordinateAction
78
+ 3. `src/utils/coordinate-converter.ts` - NEW - Coordinate to Playwright conversion
79
+ 4. `src/orchestrator/tools/verify-action-result.ts` - NEW - Visual verification tool
80
+ 5. `src/llm-provider.ts` - Added LabeledImage, multi-image support
81
+ 6. `src/llm-facade.ts` - Model optimization
82
+ 7. `src/model-constants.ts` - Added DEFAULT_SIMPLER_MODEL
83
+ 8. `src/scenario-worker-class.ts` - Tool registration
84
+ 9. `src/orchestrator/index.ts` - Exports
85
+ 10. `src/orchestrator/tools/index.ts` - Tool exports
86
+
87
+ ### Scriptservice:
88
+ 1. `providers/scriptservice-llm-provider.ts` - Multi-image handling, retry logic
89
+ 2. `smart-test-runner-core-v2.ts` - Type definitions, V1 removal
90
+ 3. `smart-test-execution-handler.ts` - V1 removal
91
+ 4. `workers/test-based-explorer.ts` - V1 removal
92
+ 5. `script-generation-handlers.ts` - Headed mode
93
+ 6. `script-generation/script-generation-service.ts` - Headed mode
94
+ 7. `smart-test-execution-handler.ts` - Headed mode
95
+
96
+ ### Documentation:
97
+ 1. `WHATS_NEW_v0.0.33.md`
98
+ 2. `PHASE_1_COMPLETE.md`
99
+ 3. `PHASE_1_SUMMARY.md`
100
+ 4. `IMPLEMENTATION_STATUS.md`
101
+ 5. `VISUAL_AGENT_EVOLUTION_PLAN.md`
102
+ 6. `PROMPT_SANITY_CHECK.md`
103
+ 7. `PROMPT_OPTIMIZATION_ANALYSIS.md`
104
+ 8. `COORDINATE_MODE_DIAGNOSIS.md`
105
+ 9. `BEFORE_AFTER_VERIFICATION.md`
106
+ 10. `TROUBLESHOOTING_SESSION.md`
107
+
108
+ ## Live Test Status:
109
+
110
+ **Job**: `71b88c60-52f5-4343-aef8-c44ebb07f3e9`
111
+ **Status**: Running (check browser + logs)
112
+ **Watch For**:
113
+ - Step 5 (Employee Information) - Previously problematic
114
+ - Coordinate mode activation
115
+ - verify_action_result tool usage
116
+ - Overall completion
117
+
118
+ ## Key Metrics:
119
+
120
+ **Cost Optimization:**
121
+ - Prompt size: 59% reduction
122
+ - Model tiering: 7/11 tasks on cheaper model
123
+ - JPEG compression: 85-90% smaller screenshots
124
+ - **Total savings: ~40% cost reduction**
125
+
126
+ ## Next Steps After Test Completes:
127
+
128
+ 1. Check if Step 5 completes successfully
129
+ 2. Verify coordinate mode activated if needed
130
+ 3. Check if verify_action_result tool was used
131
+ 4. Analyze any remaining failures
132
+ 5. Iterate on prompts/logic based on results
133
+
134
+ ## Known Issues to Monitor:
135
+
136
+ 1. **Step 5 False Positive**: Clicking menu item vs navigating to page
137
+ 2. **Coordinate Loop**: Agent not knowing when coordinate clicks succeed
138
+ 3. **Vision verification usage**: Will agent call it proactively?
139
+
140
+ ## Success Criteria:
141
+
142
+ ✅ All 7 steps complete
143
+ ✅ Coordinate fallback used when selectors fail
144
+ ✅ Visual verification validates goal achievement
145
+ ✅ No infinite loops or stuck states
146
+ ✅ Generated script is accurate
147
+
148
+ ---
149
+
150
+ **Check your browser window and /tmp/scriptservice-test.log for live execution!**
151
+
@@ -0,0 +1,72 @@
1
+ # Troubleshooting Session: All Modules Icon Click Failure
2
+
3
+ ## Objective:
4
+ Understand why the orchestrator agent gets stuck on "Click on the all Modules menu item (top menu icon)" while manual Playwright MCP navigation succeeded.
5
+
6
+ ## What I Need to See:
7
+
8
+ ### 1. Full Agent Logs for the Failing Step
9
+ Please provide the complete logs showing:
10
+ - What iteration attempts were made (iteration 1, 2, 3...)
11
+ - What selectors the agent tried each time
12
+ - What errors it encountered
13
+ - What the DOM snapshot showed
14
+ - Whether it took screenshots
15
+ - What notes it left to future self
16
+
17
+ ### 2. The DOM Context It Saw
18
+ - Interactive elements list
19
+ - ARIA tree snapshot
20
+ - Whether the hamburger icon was visible in the list
21
+
22
+ ## What Worked (My Manual MCP Session):
23
+
24
+ From earlier successful navigation:
25
+ ```
26
+ ✅ Step 1: Clicked hamburger menu
27
+ Selector: #sidebar-toggle > span > svg
28
+
29
+ ✅ Step 2: Clicked "Core HR"
30
+ Selector: getByText('Core HR')
31
+
32
+ ✅ Step 3: Clicked "Employee Information"
33
+ Selector: getByText('Employee Information')
34
+ ```
35
+
36
+ ## Hypotheses for Why the Agent Fails:
37
+
38
+ ### Possible Issue 1: Wrong Selector Strategy
39
+ - Agent might be trying: `getByText('All Modules')` (strict mode violation)
40
+ - Or: `#MenuToggle` (wrong ID)
41
+ - Or: `#sidebar-toggle-menu` (doesn't exist)
42
+ - Instead of: `#sidebar-toggle > span > svg` (actual selector)
43
+
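Reviewer sketch (not part of the package): one way to test the wrong-selector hypothesis is to try the candidate selectors in order and record which one actually clicks. The helper below is written against a minimal `PageLike` interface so it runs without a browser; a Playwright `Page` should be structurally compatible, since `page.locator(selector).click({ timeout })` is a real Playwright call. `clickFirstWorking` and `PageLike` are hypothetical names, not the agent's implementation.

```typescript
interface ClickableLocator {
  click(options?: { timeout?: number }): Promise<void>;
}

interface PageLike {
  locator(selector: string): ClickableLocator;
}

// Tries each selector in order; returns the first one whose click succeeds.
async function clickFirstWorking(page: PageLike, selectors: string[], timeoutMs = 2000): Promise<string> {
  const failures: string[] = [];
  for (const selector of selectors) {
    try {
      await page.locator(selector).click({ timeout: timeoutMs });
      return selector;
    } catch (error) {
      const message = error instanceof Error ? error.message : String(error);
      failures.push(`${selector}: ${message}`);
    }
  }
  throw new Error(`no selector worked:\n${failures.join("\n")}`);
}

// Candidates from the hypothesis, ending with the selector that worked manually.
const candidates = ["#MenuToggle", "#sidebar-toggle-menu", "#sidebar-toggle > span > svg"];
```

The collected failure messages double as the per-iteration evidence this troubleshooting doc asks for: which selectors were tried and how each one failed.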
44
+ ### Possible Issue 2: Missing Icon Detection
45
+ - Hamburger icons are often SVG elements without accessible text
46
+ - Agent might not recognize this pattern
47
+ - Prompt doesn't explicitly guide on icon/SVG selector strategy
48
+
49
+ ### Possible Issue 3: DOM List Incomplete
50
+ - Interactive elements might not include the SVG icon
51
+ - If icon isn't in the list, agent won't know it exists
52
+ - Need to check if `getEnhancedPageInfo` captures SVG icons
53
+
54
+ ### Possible Issue 4: Ambiguous Text
55
+ - "All Modules" might appear in multiple places (menu button + modal title)
56
+ - Agent tries `getByText('All Modules')` → strict mode violation
57
+ - Should scope to parent: `locator('#sidebar-toggle').getByText('All Modules')`
58
+
59
+ ## Next Steps:
60
+
61
+ 1. **Get full logs** from your failing run
62
+ 2. **Compare** what agent saw vs what I saw
63
+ 3. **Identify** the gap (prompt, DOM extraction, or selector logic)
64
+ 4. **Plan fixes**:
65
+ - Prompt improvements (icon/SVG guidance)
66
+ - DOM extraction improvements (ensure icons are captured)
67
+ - Selector strategy improvements (parent scoping for icons)
68
+ - Example-based learning (hamburger menu pattern)
69
+
70
+ ## Waiting For:
71
+ Please paste the full logs from the failing step showing all iteration attempts and what the agent tried.
72
+