testchimp-runner-core 0.0.33 → 0.0.35
- package/dist/execution-service.d.ts +1 -4
- package/dist/execution-service.d.ts.map +1 -1
- package/dist/execution-service.js +155 -468
- package/dist/execution-service.js.map +1 -1
- package/dist/index.d.ts +3 -1
- package/dist/index.d.ts.map +1 -1
- package/dist/index.js +11 -1
- package/dist/index.js.map +1 -1
- package/dist/llm-facade.d.ts.map +1 -1
- package/dist/llm-facade.js +7 -7
- package/dist/llm-facade.js.map +1 -1
- package/dist/llm-provider.d.ts +9 -0
- package/dist/llm-provider.d.ts.map +1 -1
- package/dist/model-constants.d.ts +16 -5
- package/dist/model-constants.d.ts.map +1 -1
- package/dist/model-constants.js +17 -6
- package/dist/model-constants.js.map +1 -1
- package/dist/orchestrator/decision-parser.d.ts +18 -0
- package/dist/orchestrator/decision-parser.d.ts.map +1 -0
- package/dist/orchestrator/decision-parser.js +127 -0
- package/dist/orchestrator/decision-parser.js.map +1 -0
- package/dist/orchestrator/index.d.ts +4 -2
- package/dist/orchestrator/index.d.ts.map +1 -1
- package/dist/orchestrator/index.js +15 -2
- package/dist/orchestrator/index.js.map +1 -1
- package/dist/orchestrator/orchestrator-agent.d.ts +17 -22
- package/dist/orchestrator/orchestrator-agent.d.ts.map +1 -1
- package/dist/orchestrator/orchestrator-agent.js +708 -577
- package/dist/orchestrator/orchestrator-agent.js.map +1 -1
- package/dist/orchestrator/orchestrator-prompts.d.ts +32 -0
- package/dist/orchestrator/orchestrator-prompts.d.ts.map +1 -0
- package/dist/orchestrator/orchestrator-prompts.js +737 -0
- package/dist/orchestrator/orchestrator-prompts.js.map +1 -0
- package/dist/orchestrator/page-som-handler.d.ts +106 -0
- package/dist/orchestrator/page-som-handler.d.ts.map +1 -0
- package/dist/orchestrator/page-som-handler.js +1353 -0
- package/dist/orchestrator/page-som-handler.js.map +1 -0
- package/dist/orchestrator/som-types.d.ts +149 -0
- package/dist/orchestrator/som-types.d.ts.map +1 -0
- package/dist/orchestrator/som-types.js +87 -0
- package/dist/orchestrator/som-types.js.map +1 -0
- package/dist/orchestrator/tool-registry.d.ts +2 -0
- package/dist/orchestrator/tool-registry.d.ts.map +1 -1
- package/dist/orchestrator/tool-registry.js.map +1 -1
- package/dist/orchestrator/tools/index.d.ts +5 -1
- package/dist/orchestrator/tools/index.d.ts.map +1 -1
- package/dist/orchestrator/tools/index.js +9 -2
- package/dist/orchestrator/tools/index.js.map +1 -1
- package/dist/orchestrator/tools/refresh-som-markers.d.ts +12 -0
- package/dist/orchestrator/tools/refresh-som-markers.d.ts.map +1 -0
- package/dist/orchestrator/tools/refresh-som-markers.js +64 -0
- package/dist/orchestrator/tools/refresh-som-markers.js.map +1 -0
- package/dist/orchestrator/tools/verify-action-result.d.ts +17 -0
- package/dist/orchestrator/tools/verify-action-result.d.ts.map +1 -0
- package/dist/orchestrator/tools/verify-action-result.js +140 -0
- package/dist/orchestrator/tools/verify-action-result.js.map +1 -0
- package/dist/orchestrator/tools/view-previous-screenshot.d.ts +15 -0
- package/dist/orchestrator/tools/view-previous-screenshot.d.ts.map +1 -0
- package/dist/orchestrator/tools/view-previous-screenshot.js +92 -0
- package/dist/orchestrator/tools/view-previous-screenshot.js.map +1 -0
- package/dist/orchestrator/types.d.ts +49 -1
- package/dist/orchestrator/types.d.ts.map +1 -1
- package/dist/orchestrator/types.js +11 -1
- package/dist/orchestrator/types.js.map +1 -1
- package/dist/prompts.d.ts.map +1 -1
- package/dist/prompts.js +40 -34
- package/dist/prompts.js.map +1 -1
- package/dist/scenario-service.d.ts +5 -0
- package/dist/scenario-service.d.ts.map +1 -1
- package/dist/scenario-service.js +17 -0
- package/dist/scenario-service.js.map +1 -1
- package/dist/scenario-worker-class.d.ts +4 -0
- package/dist/scenario-worker-class.d.ts.map +1 -1
- package/dist/scenario-worker-class.js +21 -3
- package/dist/scenario-worker-class.js.map +1 -1
- package/dist/testing/agent-tester.d.ts +35 -0
- package/dist/testing/agent-tester.d.ts.map +1 -0
- package/dist/testing/agent-tester.js +84 -0
- package/dist/testing/agent-tester.js.map +1 -0
- package/dist/testing/ref-translator-tester.d.ts +44 -0
- package/dist/testing/ref-translator-tester.d.ts.map +1 -0
- package/dist/testing/ref-translator-tester.js +104 -0
- package/dist/testing/ref-translator-tester.js.map +1 -0
- package/dist/utils/coordinate-converter.d.ts +32 -0
- package/dist/utils/coordinate-converter.d.ts.map +1 -0
- package/dist/utils/coordinate-converter.js +130 -0
- package/dist/utils/coordinate-converter.js.map +1 -0
- package/dist/utils/hierarchical-selector.d.ts +47 -0
- package/dist/utils/hierarchical-selector.d.ts.map +1 -0
- package/dist/utils/hierarchical-selector.js +212 -0
- package/dist/utils/hierarchical-selector.js.map +1 -0
- package/dist/utils/page-info-retry.d.ts +14 -0
- package/dist/utils/page-info-retry.d.ts.map +1 -0
- package/dist/utils/page-info-retry.js +60 -0
- package/dist/utils/page-info-retry.js.map +1 -0
- package/dist/utils/page-info-utils.d.ts +1 -0
- package/dist/utils/page-info-utils.d.ts.map +1 -1
- package/dist/utils/page-info-utils.js +46 -18
- package/dist/utils/page-info-utils.js.map +1 -1
- package/dist/utils/ref-attacher.d.ts +21 -0
- package/dist/utils/ref-attacher.d.ts.map +1 -0
- package/dist/utils/ref-attacher.js +149 -0
- package/dist/utils/ref-attacher.js.map +1 -0
- package/dist/utils/ref-translator.d.ts +49 -0
- package/dist/utils/ref-translator.d.ts.map +1 -0
- package/dist/utils/ref-translator.js +276 -0
- package/dist/utils/ref-translator.js.map +1 -0
- package/package.json +1 -1
- package/plandocs/BEFORE_AFTER_VERIFICATION.md +148 -0
- package/plandocs/COORDINATE_MODE_DIAGNOSIS.md +144 -0
- package/plandocs/IMPLEMENTATION_STATUS.md +108 -0
- package/plandocs/PHASE_1_COMPLETE.md +165 -0
- package/plandocs/PHASE_1_SUMMARY.md +184 -0
- package/plandocs/PROMPT_OPTIMIZATION_ANALYSIS.md +120 -0
- package/plandocs/PROMPT_SANITY_CHECK.md +120 -0
- package/plandocs/SESSION_SUMMARY_v0.0.33.md +151 -0
- package/plandocs/TROUBLESHOOTING_SESSION.md +72 -0
- package/plandocs/VISUAL_AGENT_EVOLUTION_PLAN.md +396 -0
- package/plandocs/WHATS_NEW_v0.0.33.md +183 -0
- package/plandocs/exploratory-mode-support-v2.plan.md +953 -0
- package/plandocs/exploratory-mode-support.plan.md +928 -0
- package/plandocs/journey-id-tracking-addendum.md +227 -0
- package/src/execution-service.ts +179 -596
- package/src/index.ts +10 -0
- package/src/llm-facade.ts +8 -8
- package/src/llm-provider.ts +11 -1
- package/src/model-constants.ts +17 -5
- package/src/orchestrator/decision-parser.ts +139 -0
- package/src/orchestrator/index.ts +27 -2
- package/src/orchestrator/orchestrator-agent.ts +868 -623
- package/src/orchestrator/orchestrator-prompts.ts +786 -0
- package/src/orchestrator/page-som-handler.ts +1565 -0
- package/src/orchestrator/som-types.ts +188 -0
- package/src/orchestrator/tool-registry.ts +2 -0
- package/src/orchestrator/tools/index.ts +5 -1
- package/src/orchestrator/tools/refresh-som-markers.ts +69 -0
- package/src/orchestrator/tools/verify-action-result.ts +159 -0
- package/src/orchestrator/tools/view-previous-screenshot.ts +103 -0
- package/src/orchestrator/types.ts +95 -4
- package/src/prompts.ts +40 -34
- package/src/scenario-service.ts +20 -0
- package/src/scenario-worker-class.ts +30 -4
- package/src/utils/coordinate-converter.ts +162 -0
- package/src/utils/page-info-retry.ts +65 -0
- package/src/utils/page-info-utils.ts +53 -18
- package/testchimp-runner-core-0.0.35.tgz +0 -0
- /package/{CREDIT_CALLBACK_ARCHITECTURE.md → plandocs/CREDIT_CALLBACK_ARCHITECTURE.md} +0 -0
- /package/{INTEGRATION_COMPLETE.md → plandocs/INTEGRATION_COMPLETE.md} +0 -0
- /package/{VISION_DIAGNOSTICS_IMPROVEMENTS.md → plandocs/VISION_DIAGNOSTICS_IMPROVEMENTS.md} +0 -0
- /package/{RELEASE_0.0.26.md → releasenotes/RELEASE_0.0.26.md} +0 -0
- /package/{RELEASE_0.0.27.md → releasenotes/RELEASE_0.0.27.md} +0 -0
- /package/{RELEASE_0.0.28.md → releasenotes/RELEASE_0.0.28.md} +0 -0

@@ -0,0 +1,184 @@

# Phase 1 Complete - Summary & Testing Guide

## Version: runner-core v0.0.33 ✅

---

## Implementation Complete

### What's New:

1. **📝 Note to Future Self**
   - Free-form tactical memory between iterations
   - Agent writes: "Tried X, failed. Will try Y next."
   - Prevents repeated mistakes

2. **🎯 Percentage-Based Coordinates**
   - Last-resort fallback (3-decimal precision)
   - Resolution-independent (works at any viewport size)
   - Supports: click, fill, drag, hover, scroll

3. **⚡ Optimized Iteration Limits**
   - Max 5 iterations per step (down from 8)
   - Max 2 coordinate attempts (coordinates either work or they don't)
   - Faster feedback on stuck scenarios
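
A minimal sketch of the percentage-to-pixel conversion, assuming a 1600×800 viewport (the size implied by the Current Behavior diagram, where (5.500%, 8.250%) becomes `page.mouse.click(88, 66)`). `PercentPoint`, `percentToPixels`, and `pixelsToPercent` are hypothetical names; the actual `src/utils/coordinate-converter.ts` may be structured differently.

```typescript
// Sketch only: the real converter in src/utils/coordinate-converter.ts may differ.
// PercentPoint mirrors the {xPercent, yPercent} output shown in the diagram.
interface PercentPoint {
  xPercent: number; // 0-100, relative to viewport width
  yPercent: number; // 0-100, relative to viewport height
}

interface Viewport {
  width: number;
  height: number;
}

// Keep 3 decimal places, matching the precision the agent is asked to output.
function roundTo3(value: number): number {
  return Math.round(value * 1000) / 1000;
}

// Clamp so a slightly-off answer cannot target a point outside the viewport.
function clampPercent(value: number): number {
  return Math.min(100, Math.max(0, value));
}

// Percentages -> integer CSS pixels for the current viewport.
function percentToPixels(point: PercentPoint, viewport: Viewport): { x: number; y: number } {
  const xp = roundTo3(clampPercent(point.xPercent));
  const yp = roundTo3(clampPercent(point.yPercent));
  return {
    x: Math.round((xp * viewport.width) / 100),
    y: Math.round((yp * viewport.height) / 100),
  };
}

// Pixels -> percentages, e.g. for logging a click in resolution-independent form.
function pixelsToPercent(x: number, y: number, viewport: Viewport): PercentPoint {
  return {
    xPercent: roundTo3((x / viewport.width) * 100),
    yPercent: roundTo3((y / viewport.height) * 100),
  };
}

// Assumed 1600x800 viewport, which is what (5.500%, 8.250%) -> (88, 66) implies.
const viewport: Viewport = { width: 1600, height: 800 };
console.log(percentToPixels({ xPercent: 5.5, yPercent: 8.25 }, viewport)); // { x: 88, y: 66 }
```

In a Playwright run, the viewport would come from `page.viewportSize()` (which can return `null`), and the resulting pixels would feed `page.mouse.click(x, y)`. Storing percentages rather than pixels is what keeps the recorded action valid when the viewport size changes.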

---

## Current Behavior (Phase 1)

```
┌──────────────────────────────────────────────────────┐
│ Iteration 1: Playwright selector                     │
│   Try: await page.getByRole('button'...).click()     │
│   Note: "If this fails, try #id selector"            │
│                                                      │
│ Iteration 2: Playwright selector                     │
│   Read note from iteration 1                         │
│   Try: await page.locator('#sidebar-toggle').click() │
│   Note: "If this fails, try SVG child"               │
│                                                      │
│ Iteration 3: Playwright selector                     │
│   Read note from iteration 2                         │
│   Try: await page.locator('#sidebar-toggle svg')     │
│   → Fails again                                      │
│                                                      │
│           🎯 COORDINATE MODE ACTIVATED 🎯            │
│                                                      │
│ Iteration 4: Coordinate action                       │
│   Agent outputs: {xPercent: 5.500, yPercent: 8.250}  │
│   Execute: page.mouse.click(88, 66)                  │
│   → Success!                                         │
│                                                      │
│ OR if fails...                                       │
│                                                      │
│ Iteration 5: Coordinate action (2nd attempt)         │
│   Try slightly adjusted coordinates                  │
│   → If fails: GIVE UP (stuck)                        │
│                                                      │
│ Total: Max 5 iterations                              │
└──────────────────────────────────────────────────────┘
```
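
The flow in the diagram can be sketched as a small loop in which each iteration reads only the note left by the previous one. This is a hypothetical illustration: `runStep`, `Decision`, and the `decide`/`execute` hooks stand in for the LLM call and Playwright, and the note is scoped to a single step for brevity (the v0.0.33 notes say notes also persist across steps).

```typescript
// Sketch of the Phase 1 loop; "decide" stands in for the LLM call and "execute"
// for Playwright. Both hooks, and these type shapes, are hypothetical.
type Mode = "selector" | "coordinate";

interface Decision {
  action: string;
  noteToFutureSelf?: string; // free-form tactical memory for the next iteration
}

interface Attempt {
  iteration: number; // 1-based
  mode: Mode;
  action: string;
  noteRead?: string; // note written by the previous iteration, if any
  succeeded: boolean;
}

const MAX_ITERATIONS = 5;
const SELECTOR_ATTEMPTS = 3; // iterations 1-3; iterations 4-5 use coordinates

function runStep(
  decide: (iteration: number, mode: Mode, previousNote?: string) => Decision,
  execute: (action: string, mode: Mode) => boolean,
): { status: "complete" | "stuck"; attempts: Attempt[] } {
  const attempts: Attempt[] = [];
  let note: string | undefined;
  for (let iteration = 1; iteration <= MAX_ITERATIONS; iteration++) {
    const mode: Mode = iteration <= SELECTOR_ATTEMPTS ? "selector" : "coordinate";
    const decision = decide(iteration, mode, note);
    const succeeded = execute(decision.action, mode);
    attempts.push({ iteration, mode, action: decision.action, noteRead: note, succeeded });
    if (succeeded) return { status: "complete", attempts };
    note = decision.noteToFutureSelf; // only the latest note is carried forward
  }
  return { status: "stuck", attempts };
}

// Replaying the diagram: three selector attempts fail, the first coordinate click works.
const scripted: Decision[] = [
  { action: "page.getByRole('button', ...)", noteToFutureSelf: "If this fails, try #id selector" },
  { action: "page.locator('#sidebar-toggle')", noteToFutureSelf: "If this fails, try SVG child" },
  { action: "page.locator('#sidebar-toggle svg')" },
  { action: "click at (5.500%, 8.250%)" },
];
const result = runStep(
  (iteration) => scripted[iteration - 1],
  (_action, mode) => mode === "coordinate",
);
console.log(result.status, result.attempts.length); // complete 4
```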

---

## Testing Phase 1

### Test 1: PeopleHR Scenario (Previously Failed)

**Expected outcome:**
- Iterations 1-2: Try text/ID selectors → fail
- Iteration 3: Note says "try SVG child" → succeeds!
- OR Iteration 4: Coordinates → succeeds!

**Run:**
```bash
# Via VS Code extension: "Generate Script" on peoplehr.txt
# Or "Run Test" on peoplehr-corrected.smart.spec.ts
```

**Look for in the logs:**
```
📝 Note to self: ...
🎯 COORDINATE MODE ACTIVATED
🎯 Coordinate Action (attempt 1/2): click at (5.500%, 8.250%)
```

### Test 2: Simple Scenario (Should Still Be Fast)

Create test: `simple-login.txt`
```
- go to https://example.com/login
- fill username with "alice"
- fill password with "password123"
- click login button
```

**Expected:**
- Each step: 1 iteration (Tier 1 success)
- No coordinates needed
- Fast execution

### Test 3: Coordinate Fallback

**Deliberately difficult scenario:**
```
- go to https://some-app-with-shadow-dom.com
- click on custom web component icon
```

**Expected:**
- Iterations 1-3: Selectors fail
- Iteration 4: Coordinates succeed
- Generated script contains: `await page.mouse.click(x, y);`

---

## Expected Improvements

### Metrics to Track:

1. **Iteration Efficiency**
   - Before: ~4 average iterations per step
   - After: ~2.5 average iterations per step (30-40% reduction)

2. **Success Rate**
   - Before: Stuck on complex UIs (hamburger menus, icons, shadow DOM)
   - After: Coordinates provide an escape hatch

3. **Coordinate Usage**
   - Target: < 10% of scenarios use coordinates
   - Most scenarios still succeed with selectors
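
A minimal sketch of how the three metrics could be computed from run records. The `StepRecord`/`ScenarioRecord` shapes are assumptions for illustration only; runner-core does not necessarily log these fields.

```typescript
// Hypothetical run records; field names are illustrative, not runner-core's.
interface StepRecord {
  iterations: number;
  usedCoordinates: boolean;
}

interface ScenarioRecord {
  steps: StepRecord[];
}

// Metric 1: mean iterations per step across all scenarios.
function averageIterationsPerStep(scenarios: ScenarioRecord[]): number {
  let steps = 0;
  let iterations = 0;
  for (const scenario of scenarios) {
    for (const step of scenario.steps) {
      steps += 1;
      iterations += step.iterations;
    }
  }
  return steps === 0 ? 0 : iterations / steps;
}

// Metric 3: fraction of scenarios where at least one step fell back to coordinates.
function coordinateUsageRate(scenarios: ScenarioRecord[]): number {
  if (scenarios.length === 0) return 0;
  const withCoordinates = scenarios.filter((s) => s.steps.some((step) => step.usedCoordinates));
  return withCoordinates.length / scenarios.length;
}

function percentReduction(before: number, after: number): number {
  return ((before - after) / before) * 100;
}

console.log(percentReduction(4, 2.5)); // 37.5, inside the 30-40% band above
```

Note that the coordinate target is phrased per scenario, so `coordinateUsageRate` counts a scenario once even if several of its steps used coordinates.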

---

## Files Changed

**New:**
- `src/utils/coordinate-converter.ts` - Percentage conversion utility
- `VISUAL_AGENT_EVOLUTION_PLAN.md` - Complete plan
- `PHASE_1_COMPLETE.md` - Feature documentation
- `IMPLEMENTATION_STATUS.md` - Current status
- `PHASE_1_SUMMARY.md` - This file

**Modified:**
- `src/orchestrator/types.ts` - Added NoteToFutureSelf, CoordinateAction
- `src/orchestrator/orchestrator-agent.ts` - Note tracking, coordinate handling, mode switching
- `src/scenario-worker-class.ts` - Timeout handling (earlier fix)
- `src/execution-service.ts` - Timeout handling (earlier fix)

---

## Iteration Budget (Max 5 per Step)

**Phase 1 (Current):**
```
Iterations 1-3: Playwright selectors (3 attempts)
Iterations 4-5: Coordinates (2 attempts)
```

**Phase 2 (Future - Optimized):**
```
Iteration 1: Playwright selector (1 attempt) - fast path
Iterations 2-3: Index commands (2 attempts) - reliable fallback
Iterations 4-5: Coordinates (2 attempts) - last resort
```
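
Both budgets can be expressed as a per-iteration lookup table, which makes the mode switch and the per-tier attempt counts (used in log lines such as "attempt 1/2") explicit. The tables are transcribed from the two blocks above; `tierForIteration` and `attemptsPerTier` are hypothetical helper names.

```typescript
// Budgets transcribed from the Phase 1 and Phase 2 tables; the lookup itself is a sketch.
type Tier = "selector" | "index" | "coordinate";
type Phase = "phase1" | "phase2";

const BUDGETS: Record<Phase, Tier[]> = {
  phase1: ["selector", "selector", "selector", "coordinate", "coordinate"],
  phase2: ["selector", "index", "index", "coordinate", "coordinate"],
};

// Tier for a 1-based iteration, or null once the 5-iteration budget is spent.
function tierForIteration(phase: Phase, iteration: number): Tier | null {
  const budget = BUDGETS[phase];
  if (!Number.isInteger(iteration) || iteration < 1 || iteration > budget.length) return null;
  return budget[iteration - 1];
}

// How many attempts each tier gets in a phase, e.g. to log "attempt 1/2".
function attemptsPerTier(phase: Phase): Record<Tier, number> {
  const counts: Record<Tier, number> = { selector: 0, index: 0, coordinate: 0 };
  for (const tier of BUDGETS[phase]) counts[tier] += 1;
  return counts;
}

console.log(attemptsPerTier("phase1")); // { selector: 3, index: 0, coordinate: 2 }
console.log(attemptsPerTier("phase2")); // { selector: 1, index: 2, coordinate: 2 }
```

Keeping the budget as data rather than nested `if` thresholds means moving from Phase 1 to Phase 2 is a table change, not a logic change.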

**Benefit of Phase 2:**
- Most scenarios finish in iteration 1 (fast!)
- Complex scenarios use iterations 2-3 (index system)
- Only extreme cases reach iterations 4-5 (coordinates)

---

## Ready to Test!

**Current version** (runner-core v0.0.33) is built and ready.

**Test with:**
1. VS Code extension "Generate Script" on `peoplehr.txt`
2. Or "Run Test" on any existing smart test
3. Check logs for note-to-self and coordinate usage

**After validating that Phase 1 works well, proceed to Phase 2 (numbered element system).**

@@ -0,0 +1,120 @@

# System Prompt Optimization Analysis

## Current Stats:
- **System Prompt**: 17,573 chars (346 lines)
- **With Tool Descriptions**: 19,613 chars (~4,903 tokens)
- **Cost per call**: ~$0.0007 (gpt-5-mini input tokens)

## Optimization Opportunities:

### 1. **Duplicate Examples** (Save ~30%)
**Current**: Multiple example sections with ❌/✅ pairs
- Lines 633-644: Examples section with goto, fill, click examples
- Lines 621-626: Ambiguous text handling examples
- Lines 603-607: DOM snapshot examples
- Lines 615-619: Selector preference list

**Optimization**: Consolidate into ONE examples section
**Savings**: ~2,000 chars

### 2. **Verbose Selector Section** (Save ~20%)
**Current**: Lines 602-644 (42 lines, ~1,800 chars)
- Lists all selector types with emoji
- Detailed examples for each
- Repetitive "Good/Bad" patterns

**Optimization**: Create compact reference table
```
SELECTORS (preference order):
1. getByRole/Label/Placeholder (semantic, stable)
2. getByText (scope to parent if ambiguous!)
3. CSS IDs (avoid auto-generated)

Common mistakes: Missing goto timeout, unscoped getByText, auto-generated IDs
```
**Savings**: ~1,200 chars
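
The compact preference order can also be read as a ranking rule. A hypothetical sketch follows: the `Candidate` shape and especially the `looksAutoGenerated` heuristic (long digit runs, long hex-like runs, colon-wrapped IDs such as `:r1:`) are assumptions, not rules taken from runner-core.

```typescript
// Sketch of the preference order above applied to candidate selectors.
// Candidate shapes and the auto-generated-ID heuristic are assumptions.
type SelectorKind = "role" | "label" | "placeholder" | "text" | "css-id";

interface Candidate {
  kind: SelectorKind;
  value: string;
}

const PREFERENCE: Record<SelectorKind, number> = {
  role: 1, // semantic, stable
  label: 1,
  placeholder: 1,
  text: 2, // scope to a parent if ambiguous
  "css-id": 3, // only if not auto-generated
};

// Heuristic: long digit runs, long hex-like runs, or colon-wrapped IDs such as ":r1:".
function looksAutoGenerated(id: string): boolean {
  return /\d{4,}/.test(id) || /[0-9a-f]{8,}/i.test(id) || /^#?:[a-z0-9]+:$/i.test(id);
}

function rankCandidates(candidates: Candidate[]): Candidate[] {
  return candidates
    .filter((c) => !(c.kind === "css-id" && looksAutoGenerated(c.value)))
    .map((c, index) => ({ c, index }))
    // Lower preference number first; original order breaks ties.
    .sort((a, b) => PREFERENCE[a.c.kind] - PREFERENCE[b.c.kind] || a.index - b.index)
    .map((entry) => entry.c);
}

const ranked = rankCandidates([
  { kind: "css-id", value: "#sidebar-toggle" },
  { kind: "text", value: "All Modules" },
  { kind: "css-id", value: "#btn-83412" },
  { kind: "role", value: "button" },
]);
console.log(ranked.map((c) => c.value).join(", ")); // button, All Modules, #sidebar-toggle
```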

### 3. **Emoji Overuse** (Save ~5%)
**Current**: Heavy use of ⚠️, ❌, ✅, 🏆, etc.

**Optimization**: Use sparingly (only for critical warnings)
**Savings**: ~500 chars

### 4. **Redundant "WHY" Explanations** (Save ~10%)
**Current**: Multiple "WHY:" sections explaining rationale
- Lines 642-644: WHY semantic selectors
- Similar explanations scattered throughout

**Optimization**: Remove or consolidate
**Savings**: ~800 chars

### 5. **Tool Instructions Redundancy** (Save ~10%)
**Current**: Tools described twice:
- In tool registry (dynamic)
- In prompt rules (static)

**Optimization**: Rely more on tool registry descriptions
**Savings**: ~600 chars

### 6. **Status Rules Repetition** (Save ~5%)
**Current**: Lines 468-486 - Status rules explained multiple times

**Optimization**: Single concise statement
**Savings**: ~400 chars

## Proposed Condensed Structure:

```markdown
# System Prompt (Optimized)

## Agent Role & Tools
[Tool descriptions from registry]

## Response Format (JSON)
{required fields} - minimal format, no extensive comments

## Core Rules (Prioritized)
1. Status decisions (complete/continue/stuck)
2. Selector strategy (semantic > text > CSS)
3. Common errors (goto timeout, strict mode, auto-IDs)
4. When to use tools vs commands
5. Note to future self usage

## Examples (Consolidated)
- Navigation: goto with 30s timeout
- Selectors: Scoped getByText, semantic selectors
- Coordinates: When and how

## Advanced Features
- Blocker detection
- Step re-evaluation
- Coordinate fallback
```

## Total Potential Savings:

- **Before**: 17,573 chars (~4,393 tokens)
- **After**: ~12,000 chars (~3,000 tokens)
- **Reduction**: ~32% smaller system prompt
- **Cost savings**: ~$0.0002 per call (~30% per call)
- **Overall impact**: With 7 tasks using gpt-4o-mini, only 4 tasks will benefit
- **Est. total savings**: ~5-8% additional cost reduction
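
The token figures above are consistent with a rule of thumb of about 4 characters per token. A sketch of that arithmetic follows; it is a heuristic, not a tokenizer, and the per-million-token price in `inputCostUsd` is a caller-supplied assumption rather than a quoted rate.

```typescript
// Rule of thumb, not a tokenizer: ~4 characters per token. The figures in this
// analysis follow it (17,573 chars ≈ 4,393 tokens; 19,613 ≈ 4,903).
const CHARS_PER_TOKEN = 4;

function estimateTokens(chars: number): number {
  return Math.round(chars / CHARS_PER_TOKEN);
}

function reductionPercent(before: number, after: number): number {
  return ((before - after) / before) * 100;
}

// Input cost for one call; the per-million-token price is a caller-supplied
// assumption, not a quoted rate.
function inputCostUsd(tokens: number, usdPerMillionTokens: number): number {
  return (tokens / 1e6) * usdPerMillionTokens;
}

const beforeChars = 17573;
const afterChars = 12000;
console.log(estimateTokens(beforeChars), estimateTokens(afterChars)); // 4393 3000
console.log(reductionPercent(beforeChars, afterChars).toFixed(1)); // 31.7
```

Exact token counts depend on the model's tokenizer, so the 4:1 ratio is only good enough for estimates like the ones in this section.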

## Recommendation:

**Optimize if:**
- You're seeing consistent 500 errors (less likely now with retry)
- You want to maximize caching efficiency
- You're running high-volume scenarios (1000+ per day)

**Skip if:**
- Current cost is acceptable
- Prompt clarity is more important than 5-8% savings
- Risk of quality degradation concerns you

## Action Items (if optimizing):

1. ✅ Keep: Critical decision logic, JSON format, coordinate mode
2. ⚠️ Condense: Selector examples, error responses, WHY sections
3. ❌ Remove: Duplicate examples, excessive emojis, redundant explanations

@@ -0,0 +1,120 @@

# Prompt Sanity Check - Runner-Core v0.0.33

## ✅ STRENGTHS

### System Prompt (`buildSystemPrompt`)
- ✅ Required fields clearly marked at top (status, reasoning, statusReasoning)
- ✅ Comprehensive JSON format with examples
- ✅ Clear status decision rules
- ✅ Good blocker detection guidance
- ✅ Semantic selector preference clearly explained with examples
- ✅ Tool vs command distinction is clear
- ✅ Coordinate fallback documented

### User Prompt (`buildUserPrompt`)
- ✅ Static content first (cache-optimized)
- ✅ Dynamic content last (current state, page info)
- ✅ Notes from previous iteration shown prominently
- ✅ Clear warnings for consecutive failures
- ✅ Coordinate mode trigger is clear

## ⚠️ ISSUES FOUND

### 1. **Duplication/Redundancy**
- ❌ "Use semantic selectors" mentioned in:
  - System prompt (line ~605: "SELECTOR PREFERENCE")
  - User prompt (line ~860: "SELECTOR STRATEGY")
- **FIX**: Remove from user prompt, keep in system prompt only

### 2. **Length Concerns**
- ⚠️ System prompt is ~325 lines (very long)
- ⚠️ May cause the LLM to miss critical details in the middle
- **SUGGESTION**: Consider breaking into sections or condensing

### 3. **Conflicting Guidance**
- ⚠️ Line ~469: "stuck: Tried 3+ iterations"
  - But coordinate mode triggers at 3 failures (line ~904)
- **FIX**: Clarify: stuck = 5 attempts total (3 regular + 2 coordinate)
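
Why the conflict matters can be shown with a hypothetical decision helper: if the forced-stuck check runs first and shares the coordinate threshold, coordinate mode is unreachable. `nextMove` and `schedule` are illustrative names; the check order is an assumption about how such logic is typically written.

```typescript
// Hypothetical decision helper. The forced-stuck check runs first, so it
// shadows coordinate mode unless its threshold is larger.
type NextMove = "selector" | "coordinate" | "stuck";

function nextMove(failures: number, coordinateAfter: number, forcedStuckAt: number): NextMove {
  if (failures >= forcedStuckAt) return "stuck";
  if (failures >= coordinateAfter) return "coordinate";
  return "selector";
}

// Move chosen for the 1st..5th iteration, i.e. after 0..4 consecutive failures.
function schedule(coordinateAfter: number, forcedStuckAt: number): string {
  const moves: NextMove[] = [];
  for (let failures = 0; failures < 5; failures++) {
    moves.push(nextMove(failures, coordinateAfter, forcedStuckAt));
  }
  return moves.join(", ");
}

console.log(schedule(3, 3)); // selector, selector, selector, stuck, stuck
console.log(schedule(3, 5)); // selector, selector, selector, coordinate, coordinate
```

With `forcedStuckAt = 5`, the step is declared stuck only after both coordinate attempts have also failed, which matches "3 regular + 2 coordinate".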

### 4. **Unclear Iteration Count**
- ❌ Line ~714: "When iteration count reaches 4+"
- ❌ Line ~748: "iteration 4+"
- ✅ But code triggers at 3 failures
- **FIX**: Make the prompt consistently say "iteration 4+" (zero-based iterations 0, 1, 2 = 3 failures; the next one, #3, is the 4th iteration)

### 5. **Missing Information**
- ❌ Max iterations per step not mentioned (code has 5)
- **FIX**: Add to system prompt: "MAX 5 iterations per step"

### 6. **Verbosity**
- ⚠️ Examples section (lines ~617-628) is great but long
- ⚠️ Multiple emoji warnings (⚠️⚠️⚠️) can be reduced to a single ⚠️
- **SUGGESTION**: Keep examples, reduce emoji spam

## 🔧 RECOMMENDED FIXES

### Priority 1 (Critical):
1. Remove duplicate selector strategy from user prompt
2. Clarify max iterations (5 total)
3. Fix coordinate mode iteration number (4th iteration = after 3 failures)

### Priority 2 (Nice to have):
4. Condense system prompt if possible (target: 250 lines)
5. Reduce emoji overuse
6. Add section headers in system prompt for clarity

## 📊 PROMPT STRUCTURE ANALYSIS

### System Prompt Sections:
1. Introduction (1 line)
2. Tool descriptions (dynamic, from registry)
3. JSON format (40 lines) ✅
4. Status rules (15 lines) ✅
5. Step re-evaluation (20 lines) ✅
6. Blocker detection (25 lines) ✅
7. Experiences (25 lines) ✅
8. Critical rules (200 lines) ⚠️ TOO LONG
9. Coordinate actions (45 lines) ✅

**TOTAL**: ~370 lines (with tool descriptions)

### User Prompt Sections:
1. Static instructions (20 lines) - **Cache-friendly** ✅
2. Dynamic context marker (1 line) ✅
3. Notes from previous iteration (5 lines) ✅
4. Warnings for failures (15 lines) ✅
5. Coordinate mode trigger (8 lines) ✅
6. Current step goal (10 lines) ✅
7. Page state (50-100 lines, variable) ✅
8. Recent steps (20-50 lines, variable) ✅
9. Experiences (10 lines) ✅

**TOTAL**: ~140-200 lines per call
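
Putting static instructions first matters because OpenAI-style prompt caching can only reuse an exact matching prefix. A hypothetical sketch of that ordering: the marker text, `DynamicContext` fields, and this `buildUserPrompt` signature are illustrative, not runner-core's actual API.

```typescript
// Hypothetical builder following the section order above; the marker text and
// context fields are illustrative, not runner-core's buildUserPrompt signature.
interface DynamicContext {
  previousNote?: string;
  consecutiveFailures: number;
  stepGoal: string;
  pageState: string;
}

const STATIC_INSTRUCTIONS = [
  "Decide the next browser action for the current test step.",
  "Respond with JSON only.",
].join("\n");

const DYNAMIC_MARKER = "=== DYNAMIC CONTEXT BELOW ===";

function buildUserPrompt(ctx: DynamicContext): string {
  // Static text first, so consecutive calls share a byte-identical prefix.
  const parts: string[] = [STATIC_INSTRUCTIONS, DYNAMIC_MARKER];
  if (ctx.previousNote) parts.push(`Note from previous iteration: ${ctx.previousNote}`);
  if (ctx.consecutiveFailures > 0) parts.push(`Consecutive failures: ${ctx.consecutiveFailures}`);
  parts.push(`Current step goal: ${ctx.stepGoal}`);
  parts.push(`Page state:\n${ctx.pageState}`);
  return parts.join("\n\n");
}

// Number of leading characters two prompts have in common.
function sharedPrefixLength(a: string, b: string): number {
  let i = 0;
  while (i < a.length && i < b.length && a[i] === b[i]) i++;
  return i;
}
```

Anything placed before the marker that changes per call (a timestamp, a failure count) would shrink the shared prefix and defeat the cache.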

## 🎯 RECOMMENDATION SUMMARY

**Keep as-is:**
- JSON structure
- Semantic selector examples
- Blocker detection
- Note to future self
- Coordinate fallback
- Cache optimization

**Fix:**
- Remove selector duplication in user prompt
- Clarify iteration counts
- Add max iteration limit
- Reduce emoji spam

**Consider:**
- Condensing "Critical Rules" section (currently 200 lines)
- Moving some examples to external docs
- Breaking up long sections with clear headers

## Overall Assessment: **8/10**
- Prompts are comprehensive and well-structured
- Main issues are length and minor redundancies
- Cache optimization is excellent
- A few clarity fixes needed for iteration counts

@@ -0,0 +1,151 @@

# Runner-Core v0.0.33 - Session Summary

## Date: October 15, 2025

## Major Accomplishments:

### 1. ✅ **Coordinate Fallback System** (Phase 1 Complete)
- Percentage-based coordinates (0-100%, 3-decimal precision)
- Activates after 3 selector failures
- 2 coordinate attempts before giving up
- Resolution-independent positioning

### 2. ✅ **Note to Future Self** (Tactical Memory)
- Free-form notes persist across iterations AND steps
- Enables strategic planning across agent decisions
- Helps maintain context: "Tried X, will try Y next"

### 3. ✅ **Visual Verification Tool** (NEW)
- `verify_action_result` - Before/after screenshot comparison
- Agent-callable (decides when to use)
- JPEG 60% quality (85-90% smaller than PNG)
- Multi-image LLM interface support
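
A sketch of what a before/after verification request might look like. `LabeledImage` is named in the files list later in this summary, but its fields here, along with `buildVerificationRequest` and `parseVerification`, are assumptions. In Playwright, a 60%-quality JPEG comes from `page.screenshot({ type: "jpeg", quality: 60 })`, whose Buffer can be base64-encoded with `.toString("base64")`.

```typescript
// Hypothetical shapes: runner-core's LabeledImage in src/llm-provider.ts may differ.
interface LabeledImage {
  label: string;
  base64: string;
  mimeType: "image/jpeg" | "image/png";
}

interface VerificationRequest {
  prompt: string;
  images: LabeledImage[];
}

function buildVerificationRequest(before: string, after: string, expectedOutcome: string): VerificationRequest {
  // Labels let the model refer to each image unambiguously in its answer.
  const images: LabeledImage[] = [
    { label: "BEFORE action", base64: before, mimeType: "image/jpeg" },
    { label: "AFTER action", base64: after, mimeType: "image/jpeg" },
  ];
  const prompt = [
    `Expected outcome: ${expectedOutcome}`,
    `Images, in order: ${images.map((img) => img.label).join(", ")}.`,
    'Compare them and answer with JSON: {"verified": boolean, "reason": string}.',
  ].join("\n");
  return { prompt, images };
}

// Anything malformed is treated as "not verified" rather than as success.
function parseVerification(raw: string): { verified: boolean; reason: string } {
  try {
    const data = JSON.parse(raw) as { verified?: unknown; reason?: unknown };
    return {
      verified: data.verified === true,
      reason: typeof data.reason === "string" ? data.reason : "",
    };
  } catch {
    return { verified: false, reason: "unparseable response" };
  }
}
```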

### 4. ✅ **Critical Bug Fixes**
- **Coordinate mode never activated**: Changed the forced-stuck threshold from >= 3 to >= 5 failures, so it no longer fires before coordinate mode (which starts after 3 failures)
- **Missing required fields**: Made parser flexible (accepts `reasoning` OR `statusReasoning`)
- **Navigation timeouts**: Added 30s timeout guidance for `page.goto()`
- **Strict mode violations**: Added scoping guidance (`locator('#parent').getByText()`)
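
A minimal sketch of the "accept either field" parser fix. The field names come from this summary and the sanity check (`status`, `reasoning`, `statusReasoning`); the brace-slicing tolerance for surrounding prose is an added assumption, and `decision-parser.ts` may work differently.

```typescript
// Hypothetical sketch; runner-core's decision-parser.ts may accept more shapes.
type Status = "complete" | "continue" | "stuck";

interface AgentDecision {
  status: Status;
  reasoning: string;
  statusReasoning: string;
}

const STATUSES: readonly string[] = ["complete", "continue", "stuck"];

function parseDecision(raw: string): AgentDecision {
  // Tolerate prose around the JSON by slicing from the first "{" to the last "}".
  const start = raw.indexOf("{");
  const end = raw.lastIndexOf("}");
  if (start === -1 || end <= start) throw new Error("No JSON object in response");
  const data = JSON.parse(raw.slice(start, end + 1)) as Record<string, unknown>;

  const status = data["status"];
  if (typeof status !== "string" || STATUSES.indexOf(status) === -1) {
    throw new Error(`Invalid or missing status: ${String(status)}`);
  }
  const rawReasoning = data["reasoning"];
  const rawStatusReasoning = data["statusReasoning"];
  const reasoning = typeof rawReasoning === "string" ? rawReasoning : undefined;
  const statusReasoning = typeof rawStatusReasoning === "string" ? rawStatusReasoning : undefined;

  // Accept either field and mirror it into the other instead of rejecting the response.
  const text = reasoning ?? statusReasoning;
  if (text === undefined) throw new Error("Missing both reasoning and statusReasoning");
  return {
    status: status as Status,
    reasoning: reasoning ?? text,
    statusReasoning: statusReasoning ?? text,
  };
}
```

`status` stays strictly required, since the loop cannot decide whether to continue without it.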

### 5. ✅ **Prompt Optimizations**
- **59% reduction**: 17,573 chars → 7,287 chars in system prompt
- **Cache-optimized**: Static content first, dynamic last
- **Cost savings**: ~40% overall with model tiering
- **Focused on cognition**: Removed bloat, kept decision-making guidance

### 6. ✅ **Model Optimization**
- **gpt-5-mini**: Complex tasks (4 operations)
  - Command generation
  - Goal completion checks
  - Repair suggestions
  - Agent orchestration
- **gpt-4o-mini**: Simple tasks (7 operations)
  - Scenario breakdown
  - Screenshot need assessment
  - Repair confidence
  - Test name generation
  - Hashtag generation
  - Script parsing
  - Final script merging
- **Est. 25-30% cost reduction**
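
The tiering amounts to a task-to-model table. In the sketch below, the model names and the 4/7 split are from the list above, and `DEFAULT_SIMPLER_MODEL` is named in this summary's files list; `DEFAULT_MODEL`, the `Task` keys, and `modelFor` are illustrative guesses.

```typescript
// Model names are from the list above. DEFAULT_SIMPLER_MODEL is named in this
// summary's files list; DEFAULT_MODEL and the task keys are illustrative guesses.
const DEFAULT_MODEL = "gpt-5-mini";
const DEFAULT_SIMPLER_MODEL = "gpt-4o-mini";

type Task =
  | "commandGeneration"
  | "goalCompletionCheck"
  | "repairSuggestion"
  | "agentOrchestration"
  | "scenarioBreakdown"
  | "screenshotNeedAssessment"
  | "repairConfidence"
  | "testNameGeneration"
  | "hashtagGeneration"
  | "scriptParsing"
  | "finalScriptMerging";

const MODEL_FOR_TASK: Record<Task, string> = {
  // Complex tasks
  commandGeneration: DEFAULT_MODEL,
  goalCompletionCheck: DEFAULT_MODEL,
  repairSuggestion: DEFAULT_MODEL,
  agentOrchestration: DEFAULT_MODEL,
  // Simple tasks
  scenarioBreakdown: DEFAULT_SIMPLER_MODEL,
  screenshotNeedAssessment: DEFAULT_SIMPLER_MODEL,
  repairConfidence: DEFAULT_SIMPLER_MODEL,
  testNameGeneration: DEFAULT_SIMPLER_MODEL,
  hashtagGeneration: DEFAULT_SIMPLER_MODEL,
  scriptParsing: DEFAULT_SIMPLER_MODEL,
  finalScriptMerging: DEFAULT_SIMPLER_MODEL,
};

function modelFor(task: Task): string {
  return MODEL_FOR_TASK[task];
}

// Tally tasks per model, e.g. to confirm the 4 / 7 split.
function countByModel(): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const task of Object.keys(MODEL_FOR_TASK) as Task[]) {
    const model = MODEL_FOR_TASK[task];
    counts[model] = (counts[model] ?? 0) + 1;
  }
  return counts;
}
```

Typing the table as `Record<Task, string>` makes the compiler flag any task that is added without a model assignment.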

### 7. ✅ **Code Cleanup**
- Removed V1 SmartTestRunnerCore (V2 is stable)
- Removed backup files (.bak, .tmp)
- Consolidated types into V2
- Removed PeopleHR-specific examples from prompts

### 8. ✅ **Enhanced Logging**
- Prompt length metrics (chars + estimated tokens)
- Full LLM response on parsing errors
- Field presence diagnostics
- Retry logging for 500 errors

### 9. ✅ **Retry Logic**
- Automatic retry for OpenAI 500 errors
- Exponential backoff (1s, 2s, 4s)
- Up to 3 attempts before failing
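
A sketch of the retry behavior. The notes leave it open whether "3 attempts" includes the first call, so `maxAttempts` is a parameter. With `maxAttempts = 3` only the 1s and 2s waits happen, and reaching the 4s wait would need a fourth attempt. Assumptions: errors expose an HTTP-style `status`, only 500 is retried, and the caller supplies `sleep`, for example `(ms) => new Promise((resolve) => setTimeout(resolve, ms))`.

```typescript
// Sketch, with assumptions: errors expose an HTTP-style `status`, only 500 is
// retried, and the caller supplies `sleep` (e.g. a setTimeout wrapper).
function isRetryable(error: unknown): boolean {
  return typeof error === "object" && error !== null && (error as { status?: unknown }).status === 500;
}

// Wait before the next attempt: 1s after the 1st failure, 2s after the 2nd, 4s after the 3rd.
function backoffDelayMs(failedAttempts: number, baseDelayMs = 1000): number {
  return baseDelayMs * Math.pow(2, failedAttempts - 1);
}

async function withRetry<T>(
  call: () => Promise<T>,
  sleep: (ms: number) => Promise<void>,
  maxAttempts = 3,
  baseDelayMs = 1000,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await call();
    } catch (error) {
      // Give up on non-500 errors, or once the attempt budget is used.
      if (!isRetryable(error) || attempt >= maxAttempts) throw error;
      await sleep(backoffDelayMs(attempt, baseDelayMs));
    }
  }
}
```

Injecting `sleep` keeps the backoff schedule testable without real delays.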

### 10. ✅ **Headed Mode for Local Testing**
- All browser instances switched from headless to headed mode for local dev
- Visual debugging enabled

## Files Modified:

### Runner-Core:
1. `src/orchestrator/orchestrator-agent.ts` - Main agent logic
2. `src/orchestrator/types.ts` - NoteToFutureSelf, CoordinateAction
3. `src/utils/coordinate-converter.ts` - NEW - Coordinate to Playwright conversion
4. `src/orchestrator/tools/verify-action-result.ts` - NEW - Visual verification tool
5. `src/llm-provider.ts` - Added LabeledImage, multi-image support
6. `src/llm-facade.ts` - Model optimization
7. `src/model-constants.ts` - Added DEFAULT_SIMPLER_MODEL
8. `src/scenario-worker-class.ts` - Tool registration
9. `src/orchestrator/index.ts` - Exports
10. `src/orchestrator/tools/index.ts` - Tool exports

### Scriptservice:
1. `providers/scriptservice-llm-provider.ts` - Multi-image handling, retry logic
2. `smart-test-runner-core-v2.ts` - Type definitions, V1 removal
3. `smart-test-execution-handler.ts` - V1 removal
4. `workers/test-based-explorer.ts` - V1 removal
5. `script-generation-handlers.ts` - Headed mode
6. `script-generation/script-generation-service.ts` - Headed mode
7. `smart-test-execution-handler.ts` - Headed mode

### Documentation:
1. `WHATS_NEW_v0.0.33.md`
2. `PHASE_1_COMPLETE.md`
3. `PHASE_1_SUMMARY.md`
4. `IMPLEMENTATION_STATUS.md`
5. `VISUAL_AGENT_EVOLUTION_PLAN.md`
6. `PROMPT_SANITY_CHECK.md`
7. `PROMPT_OPTIMIZATION_ANALYSIS.md`
8. `COORDINATE_MODE_DIAGNOSIS.md`
9. `BEFORE_AFTER_VERIFICATION.md`
10. `TROUBLESHOOTING_SESSION.md`

## Live Test Status:

**Job**: `71b88c60-52f5-4343-aef8-c44ebb07f3e9`
**Status**: Running (check browser + logs)
**Watch For**:
- Step 5 (Employee Information) - Previously problematic
- Coordinate mode activation
- `verify_action_result` tool usage
- Overall completion

## Key Metrics:

**Cost Optimization:**
- Prompt size: 59% reduction
- Model tiering: 7/11 tasks on cheaper model
- JPEG compression: 85-90% smaller screenshots
- **Total savings: ~40% cost reduction**

## Next Steps After Test Completes:

1. Check if Step 5 completes successfully
2. Verify coordinate mode activated if needed
3. Check if the `verify_action_result` tool was used
4. Analyze any remaining failures
5. Iterate on prompts/logic based on results

## Known Issues to Monitor:

1. **Step 5 False Positive**: Clicking the menu item is counted as navigating to the page
2. **Coordinate Loop**: Agent cannot tell when coordinate clicks succeed
3. **Vision verification usage**: Will the agent call it proactively?

## Success Criteria:

✅ All 7 steps complete
✅ Coordinate fallback used when selectors fail
✅ Visual verification validates goal achievement
✅ No infinite loops or stuck states
✅ Generated script is accurate

---

**Check your browser window and `/tmp/scriptservice-test.log` for live execution!**

@@ -0,0 +1,72 @@

# Troubleshooting Session: All Modules Icon Click Failure

## Objective:
Understand why the orchestrator agent gets stuck on "Click on the all Modules menu item (top menu icon)" while manual Playwright MCP navigation succeeded.

## What I Need to See:

### 1. Full Agent Logs for the Failing Step
Please provide the complete logs showing:
- What iteration attempts were made (iteration 1, 2, 3...)
- What selectors the agent tried each time
- What errors it encountered
- What the DOM snapshot showed
- Whether it took screenshots
- What notes it left for its future self

### 2. The DOM Context It Saw
- Interactive elements list
- ARIA tree snapshot
- Whether the hamburger icon was visible in the list

## What Worked (My Manual MCP Session):

From earlier successful navigation:
```
✅ Step 1: Clicked hamburger menu
   Selector: #sidebar-toggle > span > svg

✅ Step 2: Clicked "Core HR"
   Selector: getByText('Core HR')

✅ Step 3: Clicked "Employee Information"
   Selector: getByText('Employee Information')
```

## Hypotheses: Why the Agent Fails

### Possible Issue 1: Wrong Selector Strategy
- Agent might be trying: `getByText('All Modules')` (strict mode violation)
- Or: `#MenuToggle` (wrong ID)
- Or: `#sidebar-toggle-menu` (doesn't exist)
- Instead of: `#sidebar-toggle > span > svg` (actual selector)

### Possible Issue 2: Missing Icon Detection
- Hamburger icons are often SVG elements without accessible text
- Agent might not recognize this pattern
- Prompt doesn't explicitly guide on icon/SVG selector strategy

### Possible Issue 3: DOM List Incomplete
- Interactive elements might not include the SVG icon
- If the icon isn't in the list, the agent won't know it exists
- Need to check if `getEnhancedPageInfo` captures SVG icons

### Possible Issue 4: Ambiguous Text
- "All Modules" might appear in multiple places (menu button + modal title)
- Agent tries `getByText('All Modules')` → strict mode violation
- Should scope to parent: `locator('#sidebar-toggle').getByText('All Modules')`
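
Issue 4 can be illustrated with a toy DOM model. Playwright's strict mode makes an action fail when its locator resolves to more than one element, and scoping to a parent narrows the match set. `ToyNode`, `resolveStrict`, and `findById` are hypothetical, and this model uses exact text equality, unlike `getByText`'s default case-insensitive substring match.

```typescript
// A toy DOM model standing in for the page; Playwright's real locators work on
// the live DOM. Names and structure here are hypothetical.
interface ToyNode {
  id?: string;
  text?: string;
  children: ToyNode[];
}

// Depth-first collection of every node (including the root) matching the predicate.
function collect(root: ToyNode, predicate: (n: ToyNode) => boolean, out: ToyNode[] = []): ToyNode[] {
  if (predicate(root)) out.push(root);
  for (const child of root.children) collect(child, predicate, out);
  return out;
}

function findById(root: ToyNode, id: string): ToyNode | undefined {
  return collect(root, (n) => n.id === id)[0];
}

// Mimics strict-mode semantics: an action target must resolve to exactly one node.
function resolveStrict(scope: ToyNode, text: string): ToyNode {
  const matches = collect(scope, (n) => n.text === text);
  if (matches.length !== 1) {
    throw new Error(`strict mode violation: "${text}" resolved to ${matches.length} elements`);
  }
  return matches[0];
}
```

In Playwright terms, calling `resolveStrict` on the whole page corresponds to `page.getByText('All Modules')`, and calling it on the `#sidebar-toggle` node corresponds to `page.locator('#sidebar-toggle').getByText('All Modules')`.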

## Next Steps:

1. **Get full logs** from your failing run
2. **Compare** what the agent saw vs what I saw
3. **Identify** the gap (prompt, DOM extraction, or selector logic)
4. **Plan fixes**:
   - Prompt improvements (icon/SVG guidance)
   - DOM extraction improvements (ensure icons are captured)
   - Selector strategy improvements (parent scoping for icons)
   - Example-based learning (hamburger menu pattern)

## Waiting For:
Please paste the full logs from the failing step showing all iteration attempts and what the agent tried.