testchimp-runner-core 0.0.32 → 0.0.34
- package/dist/llm-facade.d.ts.map +1 -1
- package/dist/llm-facade.js +7 -7
- package/dist/llm-facade.js.map +1 -1
- package/dist/llm-provider.d.ts +9 -0
- package/dist/llm-provider.d.ts.map +1 -1
- package/dist/model-constants.d.ts +16 -5
- package/dist/model-constants.d.ts.map +1 -1
- package/dist/model-constants.js +17 -6
- package/dist/model-constants.js.map +1 -1
- package/dist/orchestrator/index.d.ts +1 -1
- package/dist/orchestrator/index.d.ts.map +1 -1
- package/dist/orchestrator/index.js +3 -2
- package/dist/orchestrator/index.js.map +1 -1
- package/dist/orchestrator/orchestrator-agent.d.ts +0 -8
- package/dist/orchestrator/orchestrator-agent.d.ts.map +1 -1
- package/dist/orchestrator/orchestrator-agent.js +206 -405
- package/dist/orchestrator/orchestrator-agent.js.map +1 -1
- package/dist/orchestrator/orchestrator-prompts.d.ts +20 -0
- package/dist/orchestrator/orchestrator-prompts.d.ts.map +1 -0
- package/dist/orchestrator/orchestrator-prompts.js +455 -0
- package/dist/orchestrator/orchestrator-prompts.js.map +1 -0
- package/dist/orchestrator/tools/index.d.ts +2 -1
- package/dist/orchestrator/tools/index.d.ts.map +1 -1
- package/dist/orchestrator/tools/index.js +4 -2
- package/dist/orchestrator/tools/index.js.map +1 -1
- package/dist/orchestrator/tools/verify-action-result.d.ts +17 -0
- package/dist/orchestrator/tools/verify-action-result.d.ts.map +1 -0
- package/dist/orchestrator/tools/verify-action-result.js +140 -0
- package/dist/orchestrator/tools/verify-action-result.js.map +1 -0
- package/dist/orchestrator/types.d.ts +26 -0
- package/dist/orchestrator/types.d.ts.map +1 -1
- package/dist/orchestrator/types.js.map +1 -1
- package/dist/prompts.d.ts.map +1 -1
- package/dist/prompts.js +87 -37
- package/dist/prompts.js.map +1 -1
- package/dist/scenario-worker-class.d.ts.map +1 -1
- package/dist/scenario-worker-class.js +4 -1
- package/dist/scenario-worker-class.js.map +1 -1
- package/dist/utils/coordinate-converter.d.ts +32 -0
- package/dist/utils/coordinate-converter.d.ts.map +1 -0
- package/dist/utils/coordinate-converter.js +130 -0
- package/dist/utils/coordinate-converter.js.map +1 -0
- package/package.json +1 -1
- package/plandocs/BEFORE_AFTER_VERIFICATION.md +148 -0
- package/plandocs/COORDINATE_MODE_DIAGNOSIS.md +144 -0
- package/plandocs/IMPLEMENTATION_STATUS.md +108 -0
- package/plandocs/PHASE_1_COMPLETE.md +165 -0
- package/plandocs/PHASE_1_SUMMARY.md +184 -0
- package/plandocs/PROMPT_OPTIMIZATION_ANALYSIS.md +120 -0
- package/plandocs/PROMPT_SANITY_CHECK.md +120 -0
- package/plandocs/SESSION_SUMMARY_v0.0.33.md +151 -0
- package/plandocs/TROUBLESHOOTING_SESSION.md +72 -0
- package/plandocs/VISUAL_AGENT_EVOLUTION_PLAN.md +396 -0
- package/plandocs/WHATS_NEW_v0.0.33.md +183 -0
- package/src/llm-facade.ts +8 -8
- package/src/llm-provider.ts +11 -1
- package/src/model-constants.ts +17 -5
- package/src/orchestrator/index.ts +3 -2
- package/src/orchestrator/orchestrator-agent.ts +249 -424
- package/src/orchestrator/orchestrator-agent.ts.backup +1386 -0
- package/src/orchestrator/orchestrator-prompts.ts +474 -0
- package/src/orchestrator/tools/index.ts +2 -1
- package/src/orchestrator/tools/verify-action-result.ts +159 -0
- package/src/orchestrator/types.ts +48 -0
- package/src/prompts.ts +87 -37
- package/src/scenario-worker-class.ts +7 -2
- package/src/utils/coordinate-converter.ts +162 -0
- package/testchimp-runner-core-0.0.33.tgz +0 -0
- /package/{CREDIT_CALLBACK_ARCHITECTURE.md → plandocs/CREDIT_CALLBACK_ARCHITECTURE.md} +0 -0
- /package/{INTEGRATION_COMPLETE.md → plandocs/INTEGRATION_COMPLETE.md} +0 -0
- /package/{VISION_DIAGNOSTICS_IMPROVEMENTS.md → plandocs/VISION_DIAGNOSTICS_IMPROVEMENTS.md} +0 -0
|
@@ -0,0 +1,165 @@
|
|
|
1
|
+
# Phase 1 Implementation - COMPLETE ✅
|
|
2
|
+
|
|
3
|
+
## Version: runner-core v0.0.33
|
|
4
|
+
|
|
5
|
+
## What's Been Implemented
|
|
6
|
+
|
|
7
|
+
### 1. Free-Form "Note to Future Self"
|
|
8
|
+
**Purpose:** Tactical memory - agent leaves notes that persist across iterations AND steps.
|
|
9
|
+
|
|
10
|
+
**Type:**
|
|
11
|
+
```typescript
|
|
12
|
+
interface NoteToFutureSelf {
|
|
13
|
+
fromIteration: number;
|
|
14
|
+
content: string; // FREE-FORM - agent writes whatever it wants
|
|
15
|
+
}
|
|
16
|
+
```
|
|
17
|
+
|
|
18
|
+
**How it works:**
|
|
19
|
+
- Agent includes `"noteToFutureSelf": "..."` in response
|
|
20
|
+
- System stores it in `memory.latestNote` (persists across steps!)
|
|
21
|
+
- Passed to next iteration AND next step
|
|
22
|
+
- Displayed prominently at top of prompt
|
|
23
|
+
- Agent reads it FIRST before making a decision
|
|
24
|
+
|
|
25
|
+
**Scope:** Entire scenario journey (not just current step)
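
The note flow described above could be sketched roughly as follows. This is a minimal illustration, not the package's code: `JourneyMemory`, `recordNote`, and `renderPrompt` are hypothetical names, and only `NoteToFutureSelf`, `noteToFutureSelf`, and `memory.latestNote` come from this document.

```typescript
interface NoteToFutureSelf {
  fromIteration: number;
  content: string; // free-form text written by the agent
}

// Hypothetical container; the document only names `memory.latestNote`.
interface JourneyMemory {
  latestNote?: NoteToFutureSelf;
}

// Store the note from an agent response, if one was provided.
// A response without a note leaves the previous note in place.
function recordNote(
  memory: JourneyMemory,
  iteration: number,
  response: { noteToFutureSelf?: string }
): void {
  const text = response.noteToFutureSelf?.trim();
  if (text) {
    memory.latestNote = { fromIteration: iteration, content: text };
  }
}

// Put the note at the top of the next prompt so the agent reads it first.
function renderPrompt(memory: JourneyMemory, body: string): string {
  const note = memory.latestNote;
  if (!note) return body;
  return `NOTE FROM ITERATION ${note.fromIteration}: ${note.content}\n\n${body}`;
}
```

Because the memory object lives outside any single step in this sketch, the same note is available to later iterations and later steps alike.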
|
|
26
|
+
|
|
27
|
+
**Example notes:**
|
|
28
|
+
|
|
29
|
+
*Iteration-specific:*
|
|
30
|
+
- "Tried #sidebar-toggle, failed with 'not clickable'. Will try child SVG element next."
|
|
31
|
+
|
|
32
|
+
*Step-spanning:*
|
|
33
|
+
- "This app has slow-loading modals. Always wait 2s after page load before clicking."
|
|
34
|
+
- "Cookie consent appears on every page. Check for and dismiss it first."
|
|
35
|
+
- "Sidebar only visible on desktop viewport (>1024px width)."
|
|
36
|
+
|
|
37
|
+
### 2. Percentage-Based Coordinate Fallback
|
|
38
|
+
**Purpose:** Last-resort mechanism when selector generation repeatedly fails.
|
|
39
|
+
|
|
40
|
+
**Type:**
|
|
41
|
+
```typescript
|
|
42
|
+
interface CoordinateAction {
|
|
43
|
+
type: 'coordinate';
|
|
44
|
+
action: 'click' | 'doubleClick' | 'rightClick' | 'hover' | 'drag' | 'fill' | 'scroll';
|
|
45
|
+
xPercent: number; // 0-100, 3 decimal precision
|
|
46
|
+
yPercent: number;
|
|
47
|
+
toXPercent?: number; // For drag
|
|
48
|
+
toYPercent?: number;
|
|
49
|
+
value?: string; // For fill
|
|
50
|
+
scrollAmount?: number; // For scroll
|
|
51
|
+
}
|
|
52
|
+
```
|
|
53
|
+
|
|
54
|
+
**How it works:**
|
|
55
|
+
- LLM outputs percentages: `{xPercent: 15.755, yPercent: 8.500}`
|
|
56
|
+
- CoordinateConverter converts to pixels: `15.755% → 252px`
|
|
57
|
+
- Generates Playwright command: `await page.mouse.click(252, 68);`
|
|
58
|
+
|
|
59
|
+
**Supported actions:**
|
|
60
|
+
- click, doubleClick, rightClick, hover
|
|
61
|
+
- fill (clicks then types value)
|
|
62
|
+
- drag (from x%,y% to toX%,toY%)
|
|
63
|
+
- scroll (at position, by amount)
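
As a rough sketch of the conversion step: the figures above (15.755% → 252px, 8.500% → 68px) are consistent with a 1600×800 viewport, which is an inference and not something the document states. The signature of `percentToPixels` below and the clamping to the 0-100 range are assumptions for illustration.

```typescript
interface Viewport {
  width: number;
  height: number;
}

interface PixelPoint {
  x: number;
  y: number;
}

// Convert viewport percentages to whole-pixel coordinates.
// Clamping to 0-100 is an assumption, not documented behavior.
function percentToPixels(
  xPercent: number,
  yPercent: number,
  viewport: Viewport
): PixelPoint {
  const clamp = (p: number): number => Math.min(100, Math.max(0, p));
  return {
    x: Math.round((clamp(xPercent) / 100) * viewport.width),
    y: Math.round((clamp(yPercent) / 100) * viewport.height),
  };
}

// Assumed 1600x800 viewport, chosen to match the numbers in the text.
const examplePoint = percentToPixels(15.755, 8.5, { width: 1600, height: 800 });
const exampleCommand = `await page.mouse.click(${examplePoint.x}, ${examplePoint.y});`;
```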
|
|
64
|
+
|
|
65
|
+
### 3. Two-Tier Auto-Escalation
|
|
66
|
+
**Trigger:** Code-controlled (not LLM-decided)
|
|
67
|
+
|
|
68
|
+
```
|
|
69
|
+
Tier 1 (iterations 1-3): Playwright Selector Mode
|
|
70
|
+
├─ Normal buildSystemPrompt()
|
|
71
|
+
├─ Agent generates: await page.getByRole(...).click()
|
|
72
|
+
├─ Leaves noteToFutureSelf for continuity
|
|
73
|
+
└─ 3 attempts, then escalate
|
|
74
|
+
|
|
75
|
+
Tier 2 (iterations 4-5): Coordinate Mode
|
|
76
|
+
├─ Auto-activates when consecutiveFailures >= 3
|
|
77
|
+
├─ Uses buildCoordinateSystemPrompt()
|
|
78
|
+
├─ Agent outputs: {xPercent: 15.755, yPercent: 8.500}
|
|
79
|
+
├─ CoordinateConverter → mouse.click(x, y)
|
|
80
|
+
└─ 2 attempts max, then give up
|
|
81
|
+
|
|
82
|
+
Total: Maximum 5 iterations per step
|
|
83
|
+
```
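
The code-controlled trigger in the diagram above might look something like this. The function name `selectMode` and the constants are hypothetical; only the thresholds (3 selector attempts, 2 coordinate attempts, 5 total) and the `consecutiveFailures >= 3` condition come from the document.

```typescript
type Mode = 'selector' | 'coordinate' | 'giveUp';

const SELECTOR_ATTEMPTS = 3;   // Tier 1: iterations 1-3
const COORDINATE_ATTEMPTS = 2; // Tier 2: iterations 4-5

// `consecutiveFailures` counts failed iterations so far in the current step.
// The mode is chosen by code, not by the LLM.
function selectMode(consecutiveFailures: number): Mode {
  if (consecutiveFailures < SELECTOR_ATTEMPTS) return 'selector';
  if (consecutiveFailures < SELECTOR_ATTEMPTS + COORDINATE_ATTEMPTS) return 'coordinate';
  return 'giveUp';
}
```

With zero failures the agent starts in selector mode; after the third failure the fourth iteration runs in coordinate mode; after the fifth failure the step is abandoned.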
|
|
84
|
+
|
|
85
|
+
### 4. Precision & Accuracy
|
|
86
|
+
- **3-decimal precision** for coordinates (a 0.001% step is well under 1px on typical viewports, giving ~1px accuracy after conversion)
|
|
87
|
+
- **Resolution-independent** - works on any viewport size
|
|
88
|
+
- **Percentage reference:**
|
|
89
|
+
- Top-left: (0, 0)
|
|
90
|
+
- Top-right: (100, 0)
|
|
91
|
+
- Center: (50, 50)
|
|
92
|
+
- Bottom-right: (100, 100)
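
To make the precision claim concrete, the pixel size of one percentage step can be computed directly. This is a small worked example; the 1920px width is just an illustrative viewport, and `pixelStep` is a hypothetical helper.

```typescript
// Size in pixels of the smallest representable percentage step
// for a given viewport dimension and number of decimals.
function pixelStep(viewportPx: number, decimals: number): number {
  const percentStep = Math.pow(10, -decimals); // 0.001 for 3 decimals
  return (percentStep / 100) * viewportPx;
}

// On a 1920px-wide viewport, 3 decimals give a step of about 0.0192px,
// so the percentage grid is far finer than one pixel.
const stepAt1920 = pixelStep(1920, 3);
```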
|
|
93
|
+
|
|
94
|
+
## Files Modified
|
|
95
|
+
|
|
96
|
+
1. **orchestrator/types.ts**
|
|
97
|
+
- Added `NoteToFutureSelf` interface
|
|
98
|
+
- Added `CoordinateAction` interface
|
|
99
|
+
- Updated `AgentDecision` with new fields
|
|
100
|
+
- Updated `AgentContext` with noteFromPreviousIteration
|
|
101
|
+
|
|
102
|
+
2. **orchestrator/orchestrator-agent.ts**
|
|
103
|
+
- Added note tracking in executeStep()
|
|
104
|
+
- Added coordinate action execution
|
|
105
|
+
- Added buildCoordinateSystemPrompt()
|
|
106
|
+
- Updated buildUserPrompt() to display notes
|
|
107
|
+
- Added mode switching in callAgent()
|
|
108
|
+
- Updated response format documentation
|
|
109
|
+
|
|
110
|
+
3. **utils/coordinate-converter.ts** (NEW)
|
|
111
|
+
- percentToPixels() - Convert % to pixels
|
|
112
|
+
- getViewportSize() - Get current viewport dimensions
|
|
113
|
+
- generateCommands() - Create Playwright commands from percentages
|
|
114
|
+
- executeAction() - Direct execution helper
|
|
115
|
+
|
|
116
|
+
4. **scenario-worker-class.ts** (Earlier fix)
|
|
117
|
+
- Smart timeout handling for waitForLoadState
|
|
118
|
+
|
|
119
|
+
5. **execution-service.ts** (Earlier fix)
|
|
120
|
+
- Smart timeout handling for navigation commands
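
For the `generateCommands()` helper listed under `coordinate-converter.ts`, one plausible mapping from coordinate actions to Playwright command strings is sketched below. The `PixelAction` shape and the exact mapping are assumptions; the Playwright calls themselves (`page.mouse.click`, `page.mouse.dblclick`, `page.mouse.move`, `page.mouse.down`, `page.mouse.up`, `page.mouse.wheel`, `page.keyboard.type`) are real APIs.

```typescript
type CoordinateActionKind =
  | 'click' | 'doubleClick' | 'rightClick' | 'hover' | 'drag' | 'fill' | 'scroll';

// Hypothetical input shape: percentages already converted to pixels.
interface PixelAction {
  action: CoordinateActionKind;
  x: number;
  y: number;
  toX?: number;          // drag target
  toY?: number;
  value?: string;        // text for fill
  scrollAmount?: number; // vertical wheel delta for scroll
}

// Returns Playwright statements as strings, suitable for a generated script.
function generateCommands(a: PixelAction): string[] {
  const at = `${a.x}, ${a.y}`;
  switch (a.action) {
    case 'click':
      return [`await page.mouse.click(${at});`];
    case 'doubleClick':
      return [`await page.mouse.dblclick(${at});`];
    case 'rightClick':
      return [`await page.mouse.click(${at}, { button: 'right' });`];
    case 'hover':
      return [`await page.mouse.move(${at});`];
    case 'fill':
      // Click to focus, then type the value.
      return [
        `await page.mouse.click(${at});`,
        `await page.keyboard.type(${JSON.stringify(a.value ?? '')});`,
      ];
    case 'drag':
      return [
        `await page.mouse.move(${at});`,
        `await page.mouse.down();`,
        `await page.mouse.move(${a.toX ?? a.x}, ${a.toY ?? a.y});`,
        `await page.mouse.up();`,
      ];
    case 'scroll':
      return [
        `await page.mouse.move(${at});`,
        `await page.mouse.wheel(0, ${a.scrollAmount ?? 0});`,
      ];
  }
}
```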
|
|
121
|
+
|
|
122
|
+
## How to Use
|
|
123
|
+
|
|
124
|
+
**No code changes needed!** The features activate automatically:
|
|
125
|
+
|
|
126
|
+
1. **Note to self:** Agent can optionally include `noteToFutureSelf` in any iteration
|
|
127
|
+
2. **Coordinates:** Auto-activate at iteration 4 if selectors keep failing
|
|
128
|
+
|
|
129
|
+
## Testing Phase 1
|
|
130
|
+
|
|
131
|
+
To validate the implementation:
|
|
132
|
+
|
|
133
|
+
1. **Run PeopleHR scenario** (previously failed on hamburger menu)
|
|
134
|
+
- Should now succeed with note guidance
|
|
135
|
+
- May use coordinates if SVG selector still fails
|
|
136
|
+
|
|
137
|
+
2. **Check logs for:**
|
|
138
|
+
- `📝 Note to self: ...` (agent leaving tactical notes)
|
|
139
|
+
- `🎯 COORDINATE MODE ACTIVATED` (tier 2 triggered)
|
|
140
|
+
- `🎯 Coordinate Action: click at (X%, Y%)` (using fallback)
|
|
141
|
+
|
|
142
|
+
3. **Expected improvements:**
|
|
143
|
+
- 20-30% fewer iterations per step (thanks to notes)
|
|
144
|
+
- < 5% of scenarios need coordinate fallback
|
|
145
|
+
- Coordinates work when everything else fails
|
|
146
|
+
|
|
147
|
+
## Phase 2 Preview (Not Yet Implemented)
|
|
148
|
+
|
|
149
|
+
When Phase 2 is added, it will become a **three-tier** system:
|
|
150
|
+
- Tier 1 (iterations 1-2): Playwright selectors
|
|
151
|
+
- Tier 2 (iterations 3-4): Numbered elements (CLICK[3])
|
|
152
|
+
- Tier 3 (iterations 5+): Percentage coordinates
|
|
153
|
+
|
|
154
|
+
Phase 2 adds visual markers [1], [2], [3] on elements with structured commands.
|
|
155
|
+
|
|
156
|
+
---
|
|
157
|
+
|
|
158
|
+
## Status: ✅ READY FOR TESTING
|
|
159
|
+
|
|
160
|
+
Runner-core v0.0.33 is built and ready. Test it with:
|
|
161
|
+
- VS Code extension "Run Test" on peoplehr-corrected.smart.spec.ts
|
|
162
|
+
- Or generate new script from peoplehr.txt scenario
|
|
163
|
+
|
|
164
|
+
**Next:** Validate Phase 1 works before starting Phase 2.
|
|
165
|
+
|
|
@@ -0,0 +1,184 @@
|
|
|
1
|
+
# Phase 1 Complete - Summary & Testing Guide
|
|
2
|
+
|
|
3
|
+
## Version: runner-core v0.0.33 ✅
|
|
4
|
+
|
|
5
|
+
---
|
|
6
|
+
|
|
7
|
+
## Implementation Complete
|
|
8
|
+
|
|
9
|
+
### What's New:
|
|
10
|
+
|
|
11
|
+
1. **📝 Note to Future Self**
|
|
12
|
+
- Free-form tactical memory between iterations
|
|
13
|
+
- Agent writes: "Tried X, failed. Will try Y next."
|
|
14
|
+
- Prevents repeated mistakes
|
|
15
|
+
|
|
16
|
+
2. **🎯 Percentage-Based Coordinates**
|
|
17
|
+
- Last-resort fallback (3-decimal precision)
|
|
18
|
+
- Resolution-independent (works on any viewport size)
|
|
19
|
+
- Supports: click, doubleClick, rightClick, hover, fill, drag, scroll
|
|
20
|
+
|
|
21
|
+
3. **⚡ Optimized Iteration Limits**
|
|
22
|
+
- Max 5 iterations per step (down from 8)
|
|
23
|
+
- 2 coordinate attempts max (coordinates work or they don't)
|
|
24
|
+
- Faster feedback on stuck scenarios
|
|
25
|
+
|
|
26
|
+
---
|
|
27
|
+
|
|
28
|
+
## Current Behavior (Phase 1)
|
|
29
|
+
|
|
30
|
+
```
|
|
31
|
+
┌─────────────────────────────────────────────────────┐
|
|
32
|
+
│ Iteration 1: Playwright selector │
|
|
33
|
+
│ Try: await page.getByRole('button'...).click() │
|
|
34
|
+
│ Note: "If this fails, try #id selector" │
|
|
35
|
+
│ │
|
|
36
|
+
│ Iteration 2: Playwright selector │
|
|
37
|
+
│ Read note from iteration 1 │
|
|
38
|
+
│ Try: await page.locator('#sidebar-toggle').click()│
|
|
39
|
+
│ Note: "If this fails, try SVG child" │
|
|
40
|
+
│ │
|
|
41
|
+
│ Iteration 3: Playwright selector │
|
|
42
|
+
│ Read note from iteration 2 │
|
|
43
|
+
│ Try: await page.locator('#sidebar-toggle svg') │
|
|
44
|
+
│ → Fails again │
|
|
45
|
+
│ │
|
|
46
|
+
│ 🎯 COORDINATE MODE ACTIVATED 🎯 │
|
|
47
|
+
│ │
|
|
48
|
+
│ Iteration 4: Coordinate action │
|
|
49
|
+
│ Agent outputs: {xPercent: 5.500, yPercent: 8.250}│
|
|
50
|
+
│ Execute: page.mouse.click(88, 66) │
|
|
51
|
+
│ → Success! │
|
|
52
|
+
│ │
|
|
53
|
+
│ OR if fails... │
|
|
54
|
+
│ │
|
|
55
|
+
│ Iteration 5: Coordinate action (2nd attempt) │
|
|
56
|
+
│ Try slightly adjusted coordinates │
|
|
57
|
+
│ → If fails: GIVE UP (stuck) │
|
|
58
|
+
│ │
|
|
59
|
+
│ Total: Max 5 iterations │
|
|
60
|
+
└─────────────────────────────────────────────────────┘
|
|
61
|
+
```
|
|
62
|
+
|
|
63
|
+
---
|
|
64
|
+
|
|
65
|
+
## Testing Phase 1
|
|
66
|
+
|
|
67
|
+
### Test 1: PeopleHR Scenario (Previously Failed)
|
|
68
|
+
|
|
69
|
+
**Expected outcome:**
|
|
70
|
+
- Iteration 1-2: Try text/ID selectors → fail
|
|
71
|
+
- Iteration 3: Note says "try SVG child" → succeeds!
|
|
72
|
+
- OR Iteration 4: Coordinates → succeeds!
|
|
73
|
+
|
|
74
|
+
**Run:**
|
|
75
|
+
```bash
|
|
76
|
+
# Via VS Code extension: "Generate Script" on peoplehr.txt
|
|
77
|
+
# Or "Run Test" on peoplehr-corrected.smart.spec.ts
|
|
78
|
+
```
|
|
79
|
+
|
|
80
|
+
**Look for in logs:**
|
|
81
|
+
```
|
|
82
|
+
📝 Note to self: ...
|
|
83
|
+
🎯 COORDINATE MODE ACTIVATED
|
|
84
|
+
🎯 Coordinate Action (attempt 1/2): click at (5.500%, 8.250%)
|
|
85
|
+
```
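
If the logs are captured as text, the markers above can be counted with a small helper. This is only a sketch: `summarizeLog` is a hypothetical function, and it assumes the log lines contain the marker strings exactly as shown.

```typescript
interface LogSummary {
  notes: number;
  coordinateModeActivations: number;
  coordinateActions: number;
}

// Count occurrences of the three log markers in a list of log lines.
function summarizeLog(lines: string[]): LogSummary {
  const summary: LogSummary = {
    notes: 0,
    coordinateModeActivations: 0,
    coordinateActions: 0,
  };
  for (const line of lines) {
    if (line.includes('📝 Note to self:')) summary.notes += 1;
    if (line.includes('🎯 COORDINATE MODE ACTIVATED')) summary.coordinateModeActivations += 1;
    if (line.includes('🎯 Coordinate Action')) summary.coordinateActions += 1;
  }
  return summary;
}
```

A run with notes but zero coordinate activations suggests the step was solved in Tier 1; any coordinate action count above 2 for a single step would contradict the documented budget.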
|
|
86
|
+
|
|
87
|
+
### Test 2: Simple Scenario (Should Still Be Fast)
|
|
88
|
+
|
|
89
|
+
Create test: `simple-login.txt`
|
|
90
|
+
```
|
|
91
|
+
- go to https://example.com/login
|
|
92
|
+
- fill username with "alice"
|
|
93
|
+
- fill password with "password123"
|
|
94
|
+
- click login button
|
|
95
|
+
```
|
|
96
|
+
|
|
97
|
+
**Expected:**
|
|
98
|
+
- Each step: 1 iteration (Tier 1 success)
|
|
99
|
+
- No coordinates needed
|
|
100
|
+
- Fast execution
|
|
101
|
+
|
|
102
|
+
### Test 3: Coordinate Fallback
|
|
103
|
+
|
|
104
|
+
**Deliberately difficult scenario:**
|
|
105
|
+
```
|
|
106
|
+
- go to https://some-app-with-shadow-dom.com
|
|
107
|
+
- click on custom web component icon
|
|
108
|
+
```
|
|
109
|
+
|
|
110
|
+
**Expected:**
|
|
111
|
+
- Iterations 1-3: Selectors fail
|
|
112
|
+
- Iteration 4: Coordinates succeed
|
|
113
|
+
- Generated script contains: `await page.mouse.click(x, y);`
|
|
114
|
+
|
|
115
|
+
---
|
|
116
|
+
|
|
117
|
+
## Expected Improvements
|
|
118
|
+
|
|
119
|
+
### Metrics to Track:
|
|
120
|
+
|
|
121
|
+
1. **Iteration Efficiency**
|
|
122
|
+
- Before: ~4 average iterations per step
|
|
123
|
+
- After: ~2.5 average iterations per step (30-40% reduction)
|
|
124
|
+
|
|
125
|
+
2. **Success Rate**
|
|
126
|
+
- Before: Stuck on complex UIs (hamburgers, icons, shadow DOM)
|
|
127
|
+
- After: Coordinates provide escape hatch
|
|
128
|
+
|
|
129
|
+
3. **Coordinate Usage**
|
|
130
|
+
- Target: < 10% of scenarios use coordinates
|
|
131
|
+
- Most scenarios still succeed with selectors
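
As a quick check on the iteration-efficiency target, the reduction implied by the before and after averages can be computed directly. `percentReduction` is a hypothetical helper used only for this arithmetic.

```typescript
// Percentage reduction from `before` to `after`.
function percentReduction(before: number, after: number): number {
  return ((before - after) / before) * 100;
}

// ~4 iterations per step before, ~2.5 after: a 37.5% reduction,
// which falls inside the stated 30-40% range.
const iterationReduction = percentReduction(4, 2.5);
```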
|
|
132
|
+
|
|
133
|
+
---
|
|
134
|
+
|
|
135
|
+
## Files Changed
|
|
136
|
+
|
|
137
|
+
**New:**
|
|
138
|
+
- `src/utils/coordinate-converter.ts` - Percentage conversion utility
|
|
139
|
+
- `VISUAL_AGENT_EVOLUTION_PLAN.md` - Complete plan
|
|
140
|
+
- `PHASE_1_COMPLETE.md` - Feature documentation
|
|
141
|
+
- `IMPLEMENTATION_STATUS.md` - Current status
|
|
142
|
+
- `PHASE_1_SUMMARY.md` - This file
|
|
143
|
+
|
|
144
|
+
**Modified:**
|
|
145
|
+
- `src/orchestrator/types.ts` - Added NoteToFutureSelf, CoordinateAction
|
|
146
|
+
- `src/orchestrator/orchestrator-agent.ts` - Note tracking, coordinate handling, mode switching
|
|
147
|
+
- `src/scenario-worker-class.ts` - Timeout handling (earlier fix)
|
|
148
|
+
- `src/execution-service.ts` - Timeout handling (earlier fix)
|
|
149
|
+
|
|
150
|
+
---
|
|
151
|
+
|
|
152
|
+
## Iteration Budget (Max 5 per Step)
|
|
153
|
+
|
|
154
|
+
**Phase 1 (Current):**
|
|
155
|
+
```
|
|
156
|
+
Iterations 1-3: Playwright selectors (3 attempts)
|
|
157
|
+
Iterations 4-5: Coordinates (2 attempts)
|
|
158
|
+
```
|
|
159
|
+
|
|
160
|
+
**Phase 2 (Future - Optimized):**
|
|
161
|
+
```
|
|
162
|
+
Iteration 1: Playwright selector (1 attempt) - fast path
|
|
163
|
+
Iterations 2-3: Index commands (2 attempts) - reliable fallback
|
|
164
|
+
Iterations 4-5: Coordinates (2 attempts) - last resort
|
|
165
|
+
```
|
|
166
|
+
|
|
167
|
+
**Benefit of Phase 2:**
|
|
168
|
+
- Most scenarios finish in iteration 1 (fast!)
|
|
169
|
+
- Complex scenarios use iterations 2-3 (index system)
|
|
170
|
+
- Only extreme cases reach iterations 4-5 (coordinates)
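
The two budgets can be expressed as data, which makes the Phase 1 to Phase 2 change a matter of swapping a table. The sketch below is illustrative only; `TierRange`, `PHASE_1`, `PHASE_2`, and `tierForIteration` are hypothetical names, and Phase 2 is described in this document as not yet implemented.

```typescript
interface TierRange {
  name: 'selector' | 'index' | 'coordinate';
  from: number; // first iteration (1-based, inclusive)
  to: number;   // last iteration (1-based, inclusive)
}

const PHASE_1: TierRange[] = [
  { name: 'selector', from: 1, to: 3 },
  { name: 'coordinate', from: 4, to: 5 },
];

const PHASE_2: TierRange[] = [
  { name: 'selector', from: 1, to: 1 },
  { name: 'index', from: 2, to: 3 },
  { name: 'coordinate', from: 4, to: 5 },
];

// Look up which tier handles a given iteration; undefined means the
// budget is exhausted and the step should be reported as stuck.
function tierForIteration(plan: TierRange[], iteration: number): string | undefined {
  return plan.find((t) => iteration >= t.from && iteration <= t.to)?.name;
}
```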
|
|
171
|
+
|
|
172
|
+
---
|
|
173
|
+
|
|
174
|
+
## Ready to Test!
|
|
175
|
+
|
|
176
|
+
**Current version** (runner-core v0.0.33) is built and ready.
|
|
177
|
+
|
|
178
|
+
**Test with:**
|
|
179
|
+
1. VS Code extension "Generate Script" on `peoplehr.txt`
|
|
180
|
+
2. Or "Run Test" on any existing smart test
|
|
181
|
+
3. Check logs for note-to-self and coordinate usage
|
|
182
|
+
|
|
183
|
+
**After validating Phase 1 works well, proceed to Phase 2 for numbered element system.**
|
|
184
|
+
|
|
@@ -0,0 +1,120 @@
|
|
|
1
|
+
# System Prompt Optimization Analysis
|
|
2
|
+
|
|
3
|
+
## Current Stats:
|
|
4
|
+
- **System Prompt**: 17,573 chars (346 lines)
|
|
5
|
+
- **With Tool Descriptions**: 19,613 chars (~4,903 tokens)
|
|
6
|
+
- **Cost per call**: ~$0.0007 (gpt-5-mini input tokens)
|
|
7
|
+
|
|
8
|
+
## Optimization Opportunities:
|
|
9
|
+
|
|
10
|
+
### 1. **Duplicate Examples** (Save ~30%)
|
|
11
|
+
**Current**: Multiple example sections with ❌/✅ pairs
|
|
12
|
+
- Lines 633-644: Examples section with goto, fill, click examples
|
|
13
|
+
- Lines 621-626: Ambiguous text handling examples
|
|
14
|
+
- Lines 603-607: DOM snapshot examples
|
|
15
|
+
- Lines 615-619: Selector preference list
|
|
16
|
+
|
|
17
|
+
**Optimization**: Consolidate into ONE examples section
|
|
18
|
+
**Savings**: ~2,000 chars
|
|
19
|
+
|
|
20
|
+
### 2. **Verbose Selector Section** (Save ~20%)
|
|
21
|
+
**Current**: Lines 602-644 (43 lines, ~1,800 chars)
|
|
22
|
+
- Lists all selector types with emoji
|
|
23
|
+
- Detailed examples for each
|
|
24
|
+
- Repetitive "Good/Bad" patterns
|
|
25
|
+
|
|
26
|
+
**Optimization**: Create compact reference table
|
|
27
|
+
```
|
|
28
|
+
SELECTORS (preference order):
|
|
29
|
+
1. getByRole/Label/Placeholder (semantic, stable)
|
|
30
|
+
2. getByText (scope to parent if ambiguous!)
|
|
31
|
+
3. CSS IDs (avoid auto-generated)
|
|
32
|
+
|
|
33
|
+
Common mistakes: Missing goto timeout, unscoped getByText, auto-generated IDs
|
|
34
|
+
```
|
|
35
|
+
**Savings**: ~1,200 chars
|
|
36
|
+
|
|
37
|
+
### 3. **Emoji Overuse** (Save ~5%)
|
|
38
|
+
**Current**: Heavy use of ⚠️, ❌, ✅, 🏆, etc.
|
|
39
|
+
|
|
40
|
+
**Optimization**: Use sparingly (only for critical warnings)
|
|
41
|
+
**Savings**: ~500 chars
|
|
42
|
+
|
|
43
|
+
### 4. **Redundant "WHY" Explanations** (Save ~10%)
|
|
44
|
+
**Current**: Multiple "WHY:" sections explaining rationale
|
|
45
|
+
- Lines 642-644: WHY semantic selectors
|
|
46
|
+
- Similar explanations scattered throughout
|
|
47
|
+
|
|
48
|
+
**Optimization**: Remove or consolidate
|
|
49
|
+
**Savings**: ~800 chars
|
|
50
|
+
|
|
51
|
+
### 5. **Tool Instructions Redundancy** (Save ~10%)
|
|
52
|
+
**Current**: Tools described twice:
|
|
53
|
+
- In tool registry (dynamic)
|
|
54
|
+
- In prompt rules (static)
|
|
55
|
+
|
|
56
|
+
**Optimization**: Rely more on tool registry descriptions
|
|
57
|
+
**Savings**: ~600 chars
|
|
58
|
+
|
|
59
|
+
### 6. **Status Rules Repetition** (Save ~5%)
|
|
60
|
+
**Current**: Lines 468-486 - Status rules explained multiple times
|
|
61
|
+
|
|
62
|
+
**Optimization**: Single concise statement
|
|
63
|
+
**Savings**: ~400 chars
|
|
64
|
+
|
|
65
|
+
## Proposed Condensed Structure:
|
|
66
|
+
|
|
67
|
+
```markdown
|
|
68
|
+
# System Prompt (Optimized)
|
|
69
|
+
|
|
70
|
+
## Agent Role & Tools
|
|
71
|
+
[Tool descriptions from registry]
|
|
72
|
+
|
|
73
|
+
## Response Format (JSON)
|
|
74
|
+
{required fields} - minimal format, no extensive comments
|
|
75
|
+
|
|
76
|
+
## Core Rules (Prioritized)
|
|
77
|
+
1. Status decisions (complete/continue/stuck)
|
|
78
|
+
2. Selector strategy (semantic > text > CSS)
|
|
79
|
+
3. Common errors (goto timeout, strict mode, auto-IDs)
|
|
80
|
+
4. When to use tools vs commands
|
|
81
|
+
5. Note to future self usage
|
|
82
|
+
|
|
83
|
+
## Examples (Consolidated)
|
|
84
|
+
- Navigation: goto with 30s timeout
|
|
85
|
+
- Selectors: Scoped getByText, semantic selectors
|
|
86
|
+
- Coordinates: When and how
|
|
87
|
+
|
|
88
|
+
## Advanced Features
|
|
89
|
+
- Blocker detection
|
|
90
|
+
- Step re-evaluation
|
|
91
|
+
- Coordinate fallback
|
|
92
|
+
```
|
|
93
|
+
|
|
94
|
+
## Total Potential Savings:
|
|
95
|
+
|
|
96
|
+
- **Before**: 17,573 chars (~4,393 tokens)
|
|
97
|
+
- **After**: ~12,000 chars (~3,000 tokens)
|
|
98
|
+
- **Reduction**: ~32% reduction in system prompt
|
|
99
|
+
- **Cost savings**: ~$0.0002 per call (~30% per call)
|
|
100
|
+
- **Overall impact**: With 7 tasks using gpt-4o-mini, only 4 tasks will benefit
|
|
101
|
+
- **Est. total savings**: ~5-8% additional cost reduction
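
The totals above can be cross-checked by summing the per-item character savings. The sketch below assumes roughly 4 characters per token, which is the ratio implied by the document's own figures (17,573 chars ≈ 4,393 tokens) and not an exact tokenizer count.

```typescript
const CHARS_PER_TOKEN = 4; // rough heuristic implied by the figures above

// Character savings listed for optimization items 1-6.
const itemizedSavings: Array<[string, number]> = [
  ['duplicate examples', 2000],
  ['verbose selector section', 1200],
  ['emoji overuse', 500],
  ['redundant WHY explanations', 800],
  ['tool instruction redundancy', 600],
  ['status rules repetition', 400],
];

const charsBefore = 17573;
const totalSaved = itemizedSavings.reduce((sum, [, chars]) => sum + chars, 0);
const charsAfter = charsBefore - totalSaved;
const reductionPercent = (totalSaved / charsBefore) * 100;
const tokensBefore = Math.round(charsBefore / CHARS_PER_TOKEN);
const tokensAfter = Math.round(charsAfter / CHARS_PER_TOKEN);
```

The itemized savings sum to 5,500 characters, leaving about 12,073, which lines up with the ~12,000 character target and a reduction of a little over 31%.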
|
|
102
|
+
|
|
103
|
+
## Recommendation:
|
|
104
|
+
|
|
105
|
+
**Optimize if:**
|
|
106
|
+
- You're seeing consistent 500 errors (less likely now with retry)
|
|
107
|
+
- Want to maximize caching efficiency
|
|
108
|
+
- Running high-volume scenarios (1000+ per day)
|
|
109
|
+
|
|
110
|
+
**Skip if:**
|
|
111
|
+
- Current cost is acceptable
|
|
112
|
+
- Prompt clarity is more important than 5-8% savings
|
|
113
|
+
- Risk of quality degradation concerns you
|
|
114
|
+
|
|
115
|
+
## Action Items (if optimizing):
|
|
116
|
+
|
|
117
|
+
1. ✅ Keep: Critical decision logic, JSON format, coordinate mode
|
|
118
|
+
2. ⚠️ Condense: Selector examples, error responses, WHY sections
|
|
119
|
+
3. ❌ Remove: Duplicate examples, excessive emojis, redundant explanations
|
|
120
|
+
|
|
@@ -0,0 +1,120 @@
|
|
|
1
|
+
# Prompt Sanity Check - Runner-Core v0.0.33
|
|
2
|
+
|
|
3
|
+
## ✅ STRENGTHS
|
|
4
|
+
|
|
5
|
+
### System Prompt (`buildSystemPrompt`)
|
|
6
|
+
- ✅ Required fields clearly marked at top (status, reasoning, statusReasoning)
|
|
7
|
+
- ✅ Comprehensive JSON format with examples
|
|
8
|
+
- ✅ Clear status decision rules
|
|
9
|
+
- ✅ Good blocker detection guidance
|
|
10
|
+
- ✅ Semantic selector preference clearly explained with examples
|
|
11
|
+
- ✅ Tool vs command distinction is clear
|
|
12
|
+
- ✅ Coordinate fallback documented
|
|
13
|
+
|
|
14
|
+
### User Prompt (`buildUserPrompt`)
|
|
15
|
+
- ✅ Static content first (cache-optimized)
|
|
16
|
+
- ✅ Dynamic content last (current state, page info)
|
|
17
|
+
- ✅ Notes from previous iteration shown prominently
|
|
18
|
+
- ✅ Clear warnings for consecutive failures
|
|
19
|
+
- ✅ Coordinate mode trigger clear
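
The static-first, dynamic-last ordering can be illustrated with a short sketch. The instruction text, the `DynamicContext` shape, and `buildUserPromptSketch` are all hypothetical; the point is only that every call shares an identical prefix, which is what prefix-based prompt caching (where a provider supports it) relies on.

```typescript
// Static block: identical on every call, so it forms a stable prefix.
const STATIC_INSTRUCTIONS: string = [
  'Respond with a single JSON object.',
  'Read any note from a previous iteration before deciding.',
].join('\n');

interface DynamicContext {
  note?: string;
  stepGoal: string;
  pageState: string;
}

// Dynamic content is appended after the static block and a marker line.
function buildUserPromptSketch(ctx: DynamicContext): string {
  const dynamicLines: string[] = [];
  if (ctx.note) dynamicLines.push(`NOTE FROM PREVIOUS ITERATION: ${ctx.note}`);
  dynamicLines.push(`CURRENT STEP GOAL: ${ctx.stepGoal}`);
  dynamicLines.push(`PAGE STATE: ${ctx.pageState}`);
  return [STATIC_INSTRUCTIONS, '--- DYNAMIC CONTEXT ---', ...dynamicLines].join('\n');
}
```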
|
|
20
|
+
|
|
21
|
+
## ⚠️ ISSUES FOUND
|
|
22
|
+
|
|
23
|
+
### 1. **Duplication/Redundancy**
|
|
24
|
+
- ❌ "Use semantic selectors" mentioned in:
|
|
25
|
+
- System prompt (line ~605: "SELECTOR PREFERENCE")
|
|
26
|
+
- User prompt (line ~860: "SELECTOR STRATEGY")
|
|
27
|
+
- **FIX**: Remove from user prompt, keep in system prompt only
|
|
28
|
+
|
|
29
|
+
### 2. **Length Concerns**
|
|
30
|
+
- ⚠️ System prompt is ~325 lines (very long)
|
|
31
|
+
- ⚠️ May cause LLM to miss critical details in the middle
|
|
32
|
+
- **SUGGESTION**: Consider breaking into sections or condensing
|
|
33
|
+
|
|
34
|
+
### 3. **Conflicting Guidance**
|
|
35
|
+
- ⚠️ Line ~469: "stuck: Tried 3+ iterations"
|
|
36
|
+
- But coordinate mode triggers at 3 failures (line ~904)
|
|
37
|
+
- **FIX**: Clarify: stuck = 5 attempts total (3 regular + 2 coordinate)
|
|
38
|
+
|
|
39
|
+
### 4. **Unclear Iteration Count**
|
|
40
|
+
- ❌ Line ~714: "When iteration count reaches 4+"
|
|
41
|
+
- ❌ Line ~748: "iteration 4+"
|
|
42
|
+
- ✅ But code triggers at 3 failures
|
|
43
|
+
- **FIX**: Keep "iteration 4+" in the prompt but state that it is 1-based: zero-based iterations 0, 1, 2 are the 3 failures, so index 3 is the 4th iteration
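
The off-by-one between the code's failure counter and the prompt's iteration number can be spelled out in a few lines. `coordinateModeStart` is a hypothetical helper; it assumes the code counts iterations from 0 while the prompt counts from 1, as the parenthetical above suggests.

```typescript
const FAILURE_THRESHOLD = 3; // coordinate mode triggers after 3 failures

// After `threshold` failed iterations (zero-based indices 0..threshold-1),
// the next iteration has zero-based index `threshold`, which is
// iteration number `threshold + 1` when counting from 1.
function coordinateModeStart(threshold: number): {
  zeroBasedIndex: number;
  oneBasedIteration: number;
} {
  return { zeroBasedIndex: threshold, oneBasedIteration: threshold + 1 };
}

const start = coordinateModeStart(FAILURE_THRESHOLD);
```

Under that assumption, "3 failures" in the code and "iteration 4+" in the prompt describe the same moment.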
|
|
44
|
+
|
|
45
|
+
### 5. **Missing Information**
|
|
46
|
+
- ❌ Max iterations per step not mentioned (code has 5)
|
|
47
|
+
- **FIX**: Add to system prompt: "MAX 5 iterations per step"
|
|
48
|
+
|
|
49
|
+
### 6. **Verbosity**
|
|
50
|
+
- ⚠️ Examples section (lines ~617-628) is great but long
|
|
51
|
+
- ⚠️ Multiple emoji warnings (⚠️⚠️⚠️) can be reduced to single ⚠️
|
|
52
|
+
- **SUGGESTION**: Keep examples, reduce emoji spam
|
|
53
|
+
|
|
54
|
+
## 🔧 RECOMMENDED FIXES
|
|
55
|
+
|
|
56
|
+
### Priority 1 (Critical):
|
|
57
|
+
1. Remove duplicate selector strategy from user prompt
|
|
58
|
+
2. Clarify max iterations (5 total)
|
|
59
|
+
3. Fix coordinate mode iteration number (4th iteration = after 3 failures)
|
|
60
|
+
|
|
61
|
+
### Priority 2 (Nice to have):
|
|
62
|
+
4. Condense system prompt if possible (target: 250 lines)
|
|
63
|
+
5. Reduce emoji overuse
|
|
64
|
+
6. Add section headers in system prompt for clarity
|
|
65
|
+
|
|
66
|
+
## 📊 PROMPT STRUCTURE ANALYSIS
|
|
67
|
+
|
|
68
|
+
### System Prompt Sections:
|
|
69
|
+
1. Introduction (1 line)
|
|
70
|
+
2. Tool descriptions (dynamic, from registry)
|
|
71
|
+
3. JSON format (40 lines) ✅
|
|
72
|
+
4. Status rules (15 lines) ✅
|
|
73
|
+
5. Step re-evaluation (20 lines) ✅
|
|
74
|
+
6. Blocker detection (25 lines) ✅
|
|
75
|
+
7. Experiences (25 lines) ✅
|
|
76
|
+
8. Critical rules (200 lines) ⚠️ TOO LONG
|
|
77
|
+
9. Coordinate actions (45 lines) ✅
|
|
78
|
+
|
|
79
|
+
**TOTAL**: ~370 lines (with tool descriptions)
|
|
80
|
+
|
|
81
|
+
### User Prompt Sections:
|
|
82
|
+
1. Static instructions (20 lines) - **Cache-friendly** ✅
|
|
83
|
+
2. Dynamic context marker (1 line) ✅
|
|
84
|
+
3. Notes from previous iteration (5 lines) ✅
|
|
85
|
+
4. Warnings for failures (15 lines) ✅
|
|
86
|
+
5. Coordinate mode trigger (8 lines) ✅
|
|
87
|
+
6. Current step goal (10 lines) ✅
|
|
88
|
+
7. Page state (50-100 lines, variable) ✅
|
|
89
|
+
8. Recent steps (20-50 lines, variable) ✅
|
|
90
|
+
9. Experiences (10 lines) ✅
|
|
91
|
+
|
|
92
|
+
**TOTAL**: ~140-220 lines per call
|
|
93
|
+
|
|
94
|
+
## 🎯 RECOMMENDATION SUMMARY
|
|
95
|
+
|
|
96
|
+
**Keep as-is:**
|
|
97
|
+
- JSON structure
|
|
98
|
+
- Semantic selector examples
|
|
99
|
+
- Blocker detection
|
|
100
|
+
- Note to future self
|
|
101
|
+
- Coordinate fallback
|
|
102
|
+
- Cache optimization
|
|
103
|
+
|
|
104
|
+
**Fix:**
|
|
105
|
+
- Remove selector duplication in user prompt
|
|
106
|
+
- Clarify iteration counts
|
|
107
|
+
- Add max iteration limit
|
|
108
|
+
- Reduce emoji spam
|
|
109
|
+
|
|
110
|
+
**Consider:**
|
|
111
|
+
- Condensing "Critical Rules" section (currently 200 lines)
|
|
112
|
+
- Moving some examples to external docs
|
|
113
|
+
- Breaking long sections with clear headers
|
|
114
|
+
|
|
115
|
+
## Overall Assessment: **8/10**
|
|
116
|
+
- Prompts are comprehensive and well-structured
|
|
117
|
+
- Main issues are length and minor redundancies
|
|
118
|
+
- Cache optimization is excellent
|
|
119
|
+
- A few clarity fixes needed for iteration counts
|
|
120
|
+
|