testchimp-runner-core 0.0.32 → 0.0.34
- package/dist/llm-facade.d.ts.map +1 -1
- package/dist/llm-facade.js +7 -7
- package/dist/llm-facade.js.map +1 -1
- package/dist/llm-provider.d.ts +9 -0
- package/dist/llm-provider.d.ts.map +1 -1
- package/dist/model-constants.d.ts +16 -5
- package/dist/model-constants.d.ts.map +1 -1
- package/dist/model-constants.js +17 -6
- package/dist/model-constants.js.map +1 -1
- package/dist/orchestrator/index.d.ts +1 -1
- package/dist/orchestrator/index.d.ts.map +1 -1
- package/dist/orchestrator/index.js +3 -2
- package/dist/orchestrator/index.js.map +1 -1
- package/dist/orchestrator/orchestrator-agent.d.ts +0 -8
- package/dist/orchestrator/orchestrator-agent.d.ts.map +1 -1
- package/dist/orchestrator/orchestrator-agent.js +206 -405
- package/dist/orchestrator/orchestrator-agent.js.map +1 -1
- package/dist/orchestrator/orchestrator-prompts.d.ts +20 -0
- package/dist/orchestrator/orchestrator-prompts.d.ts.map +1 -0
- package/dist/orchestrator/orchestrator-prompts.js +455 -0
- package/dist/orchestrator/orchestrator-prompts.js.map +1 -0
- package/dist/orchestrator/tools/index.d.ts +2 -1
- package/dist/orchestrator/tools/index.d.ts.map +1 -1
- package/dist/orchestrator/tools/index.js +4 -2
- package/dist/orchestrator/tools/index.js.map +1 -1
- package/dist/orchestrator/tools/verify-action-result.d.ts +17 -0
- package/dist/orchestrator/tools/verify-action-result.d.ts.map +1 -0
- package/dist/orchestrator/tools/verify-action-result.js +140 -0
- package/dist/orchestrator/tools/verify-action-result.js.map +1 -0
- package/dist/orchestrator/types.d.ts +26 -0
- package/dist/orchestrator/types.d.ts.map +1 -1
- package/dist/orchestrator/types.js.map +1 -1
- package/dist/prompts.d.ts.map +1 -1
- package/dist/prompts.js +87 -37
- package/dist/prompts.js.map +1 -1
- package/dist/scenario-worker-class.d.ts.map +1 -1
- package/dist/scenario-worker-class.js +4 -1
- package/dist/scenario-worker-class.js.map +1 -1
- package/dist/utils/coordinate-converter.d.ts +32 -0
- package/dist/utils/coordinate-converter.d.ts.map +1 -0
- package/dist/utils/coordinate-converter.js +130 -0
- package/dist/utils/coordinate-converter.js.map +1 -0
- package/package.json +1 -1
- package/plandocs/BEFORE_AFTER_VERIFICATION.md +148 -0
- package/plandocs/COORDINATE_MODE_DIAGNOSIS.md +144 -0
- package/plandocs/IMPLEMENTATION_STATUS.md +108 -0
- package/plandocs/PHASE_1_COMPLETE.md +165 -0
- package/plandocs/PHASE_1_SUMMARY.md +184 -0
- package/plandocs/PROMPT_OPTIMIZATION_ANALYSIS.md +120 -0
- package/plandocs/PROMPT_SANITY_CHECK.md +120 -0
- package/plandocs/SESSION_SUMMARY_v0.0.33.md +151 -0
- package/plandocs/TROUBLESHOOTING_SESSION.md +72 -0
- package/plandocs/VISUAL_AGENT_EVOLUTION_PLAN.md +396 -0
- package/plandocs/WHATS_NEW_v0.0.33.md +183 -0
- package/src/llm-facade.ts +8 -8
- package/src/llm-provider.ts +11 -1
- package/src/model-constants.ts +17 -5
- package/src/orchestrator/index.ts +3 -2
- package/src/orchestrator/orchestrator-agent.ts +249 -424
- package/src/orchestrator/orchestrator-agent.ts.backup +1386 -0
- package/src/orchestrator/orchestrator-prompts.ts +474 -0
- package/src/orchestrator/tools/index.ts +2 -1
- package/src/orchestrator/tools/verify-action-result.ts +159 -0
- package/src/orchestrator/types.ts +48 -0
- package/src/prompts.ts +87 -37
- package/src/scenario-worker-class.ts +7 -2
- package/src/utils/coordinate-converter.ts +162 -0
- package/testchimp-runner-core-0.0.33.tgz +0 -0
- /package/{CREDIT_CALLBACK_ARCHITECTURE.md → plandocs/CREDIT_CALLBACK_ARCHITECTURE.md} +0 -0
- /package/{INTEGRATION_COMPLETE.md → plandocs/INTEGRATION_COMPLETE.md} +0 -0
- /package/{VISION_DIAGNOSTICS_IMPROVEMENTS.md → plandocs/VISION_DIAGNOSTICS_IMPROVEMENTS.md} +0 -0
@@ -0,0 +1,151 @@
# Runner-Core v0.0.33 - Session Summary

## Date: October 15, 2025

## Major Accomplishments:

### 1. ✅ **Coordinate Fallback System** (Phase 1 Complete)
- Percentage-based coordinates (0-100%, 3 decimal precision)
- Activates after 3 selector failures
- 2 coordinate attempts before giving up
- Resolution-independent positioning

### 2. ✅ **Note to Future Self** (Tactical Memory)
- Free-form notes persist across iterations AND steps
- Enables strategic planning across agent decisions
- Helps maintain context: "Tried X, will try Y next"

### 3. ✅ **Visual Verification Tool** (NEW)
- `verify_action_result` - Before/after screenshot comparison
- Agent-callable (decides when to use)
- JPEG 60% quality (85-90% smaller than PNG)
- Multi-image LLM interface support

### 4. ✅ **Critical Bug Fixes**
- **Coordinate mode never activated**: Changed forced stuck from >= 3 to >= 5 failures
- **Missing required fields**: Made parser flexible (accepts reasoning OR statusReasoning)
- **Navigation timeouts**: Added 30s timeout guidance for page.goto()
- **Strict mode violations**: Added scoping guidance (locator('#parent').getByText())
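
The "missing required fields" fix can be pictured with a small sketch. This is not the package's actual parser: `parseAgentDecision` and `AgentDecisionSketch` are hypothetical names, and only the two field names `reasoning` and `statusReasoning` come from the notes above.

```typescript
// Hypothetical decision shape; only `reasoning` / `statusReasoning` are named in the notes.
interface AgentDecisionSketch {
  status: string;
  reasoning: string;
}

function parseAgentDecision(raw: string): AgentDecisionSketch {
  const parsed: unknown = JSON.parse(raw);
  if (typeof parsed !== 'object' || parsed === null) {
    throw new Error('Agent response is not a JSON object');
  }
  const fields = parsed as Record<string, unknown>;
  const statusValue = fields['status'];
  // Accept either field name instead of rejecting the whole response.
  const reasoningValue = fields['reasoning'] ?? fields['statusReasoning'];
  if (typeof statusValue !== 'string' || typeof reasoningValue !== 'string') {
    // Field presence diagnostics: report which keys actually arrived.
    throw new Error(`Missing required fields; got keys: ${Object.keys(fields).join(', ')}`);
  }
  return { status: statusValue, reasoning: reasoningValue };
}
```

Listing the keys that did arrive in the error message is one way to get the "field presence diagnostics" mentioned under Enhanced Logging.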

### 5. ✅ **Prompt Optimizations**
- **59% reduction**: 17,573 chars → 7,287 chars in system prompt
- **Cache-optimized**: Static content first, dynamic last
- **Cost savings**: ~40% overall with model tiering
- **Focused on cognition**: Removed bloat, kept decision-making guidance

### 6. ✅ **Model Optimization**
- **gpt-5-mini**: Complex tasks (4 operations)
  - Command generation
  - Goal completion checks
  - Repair suggestions
  - Agent orchestration
- **gpt-4o-mini**: Simple tasks (7 operations)
  - Scenario breakdown
  - Screenshot need assessment
  - Repair confidence
  - Test name generation
  - Hashtag generation
  - Script parsing
  - Final script merging
- **Est. 25-30% cost reduction**
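
One way to express this tiering is a lookup table from operation to model. The sketch below is illustrative only: the operation identifiers, `COMPLEX_MODEL`, `SIMPLER_MODEL`, and `modelForOperation` are hypothetical names paraphrasing the two lists above, and the package's own constants in `model-constants.ts` may be organised differently.

```typescript
// Operation identifiers are hypothetical paraphrases of the two lists above.
type LlmOperation =
  | 'commandGeneration'
  | 'goalCompletionCheck'
  | 'repairSuggestion'
  | 'agentOrchestration'
  | 'scenarioBreakdown'
  | 'screenshotNeedAssessment'
  | 'repairConfidence'
  | 'testNameGeneration'
  | 'hashtagGeneration'
  | 'scriptParsing'
  | 'finalScriptMerging';

const COMPLEX_MODEL = 'gpt-5-mini';
const SIMPLER_MODEL = 'gpt-4o-mini';

// A Record over the union makes the compiler flag any operation left untiered.
const MODEL_BY_OPERATION: Record<LlmOperation, string> = {
  commandGeneration: COMPLEX_MODEL,
  goalCompletionCheck: COMPLEX_MODEL,
  repairSuggestion: COMPLEX_MODEL,
  agentOrchestration: COMPLEX_MODEL,
  scenarioBreakdown: SIMPLER_MODEL,
  screenshotNeedAssessment: SIMPLER_MODEL,
  repairConfidence: SIMPLER_MODEL,
  testNameGeneration: SIMPLER_MODEL,
  hashtagGeneration: SIMPLER_MODEL,
  scriptParsing: SIMPLER_MODEL,
  finalScriptMerging: SIMPLER_MODEL,
};

function modelForOperation(operation: LlmOperation): string {
  return MODEL_BY_OPERATION[operation];
}
```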

### 7. ✅ **Code Cleanup**
- Removed V1 SmartTestRunnerCore (V2 is stable)
- Removed backup files (.bak, .tmp)
- Consolidated types into V2
- Removed PeopleHR-specific examples from prompts

### 8. ✅ **Enhanced Logging**
- Prompt length metrics (chars + estimated tokens)
- Full LLM response on parsing errors
- Field presence diagnostics
- Retry logging for 500 errors

### 9. ✅ **Retry Logic**
- Automatic retry for OpenAI 500 errors
- Exponential backoff (1s, 2s, 4s)
- Up to 3 attempts before failing
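
A minimal sketch of this retry policy follows. It makes two assumptions that the notes do not confirm: errors carry a numeric `status` field, and "up to 3 attempts" counts total calls, so only the 1s and 2s waits occur (a 4s wait would need `maxAttempts = 4`). `callWithRetry`, `isServerError`, and the injectable `sleep` parameter are hypothetical names.

```typescript
type Sleep = (ms: number) => Promise<void>;

const realSleep: Sleep = (ms) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Assumed error shape: an object with a numeric `status` holding the HTTP status code.
function isServerError(err: unknown): boolean {
  return typeof err === 'object' && err !== null && (err as { status?: unknown }).status === 500;
}

async function callWithRetry<T>(
  call: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 1000,
  sleep: Sleep = realSleep,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await call();
    } catch (err) {
      // Only 500s are retried; anything else, or the last attempt, propagates.
      if (!isServerError(err) || attempt >= maxAttempts) throw err;
      const delayMs = baseDelayMs * 2 ** (attempt - 1); // 1000, 2000, 4000, ...
      console.warn(`500 from LLM (attempt ${attempt}/${maxAttempts}), retrying in ${delayMs}ms`);
      await sleep(delayMs);
    }
  }
}
```

Passing `sleep` as a parameter keeps the backoff schedule testable without real waiting.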

### 10. ✅ **Headed Mode for Local Testing**
- All browser instances switched from `headless: true` to `headless: false` for local dev
- Visual debugging enabled

## Files Modified:

### Runner-Core:
1. `src/orchestrator/orchestrator-agent.ts` - Main agent logic
2. `src/orchestrator/types.ts` - NoteToFutureSelf, CoordinateAction
3. `src/utils/coordinate-converter.ts` - NEW - Coordinate to Playwright conversion
4. `src/orchestrator/tools/verify-action-result.ts` - NEW - Visual verification tool
5. `src/llm-provider.ts` - Added LabeledImage, multi-image support
6. `src/llm-facade.ts` - Model optimization
7. `src/model-constants.ts` - Added DEFAULT_SIMPLER_MODEL
8. `src/scenario-worker-class.ts` - Tool registration
9. `src/orchestrator/index.ts` - Exports
10. `src/orchestrator/tools/index.ts` - Tool exports

### Scriptservice:
1. `providers/scriptservice-llm-provider.ts` - Multi-image handling, retry logic
2. `smart-test-runner-core-v2.ts` - Type definitions, V1 removal
3. `smart-test-execution-handler.ts` - V1 removal
4. `workers/test-based-explorer.ts` - V1 removal
5. `script-generation-handlers.ts` - Headed mode
6. `script-generation/script-generation-service.ts` - Headed mode
7. `smart-test-execution-handler.ts` - Headed mode

### Documentation:
1. `WHATS_NEW_v0.0.33.md`
2. `PHASE_1_COMPLETE.md`
3. `PHASE_1_SUMMARY.md`
4. `IMPLEMENTATION_STATUS.md`
5. `VISUAL_AGENT_EVOLUTION_PLAN.md`
6. `PROMPT_SANITY_CHECK.md`
7. `PROMPT_OPTIMIZATION_ANALYSIS.md`
8. `COORDINATE_MODE_DIAGNOSIS.md`
9. `BEFORE_AFTER_VERIFICATION.md`
10. `TROUBLESHOOTING_SESSION.md`

## Live Test Status:

**Job**: `71b88c60-52f5-4343-aef8-c44ebb07f3e9`
**Status**: Running (check browser + logs)
**Watch For**:
- Step 5 (Employee Information) - Previously problematic
- Coordinate mode activation
- verify_action_result tool usage
- Overall completion

## Key Metrics:

**Cost Optimization:**
- Prompt size: 59% reduction
- Model tiering: 7/11 tasks on cheaper model
- JPEG compression: 85-90% smaller screenshots
- **Total savings: ~40% cost reduction**

## Next Steps After Test Completes:

1. Check if Step 5 completes successfully
2. Verify coordinate mode activated if needed
3. Check if verify_action_result tool was used
4. Analyze any remaining failures
5. Iterate on prompts/logic based on results

## Known Issues to Monitor:

1. **Step 5 False Positive**: Clicking menu item vs navigating to page
2. **Coordinate Loop**: Agent not knowing when coordinate clicks succeed
3. **Vision verification usage**: Will agent call it proactively?

## Success Criteria:

✅ All 7 steps complete
✅ Coordinate fallback used when selectors fail
✅ Visual verification validates goal achievement
✅ No infinite loops or stuck states
✅ Generated script is accurate

---

**Check your browser window and /tmp/scriptservice-test.log for live execution!**

@@ -0,0 +1,72 @@
# Troubleshooting Session: All Modules Icon Click Failure

## Objective:
Understand why the orchestrator agent gets stuck on "Click on the all Modules menu item (top menu icon)" while manual Playwright MCP navigation succeeded.

## What I Need to See:

### 1. Full Agent Logs for the Failing Step
Please provide the complete logs showing:
- What iteration attempts were made (iteration 1, 2, 3...)
- What selectors the agent tried each time
- What errors it encountered
- What the DOM snapshot showed
- Whether it took screenshots
- What notes it left to future self

### 2. The DOM Context It Saw
- Interactive elements list
- ARIA tree snapshot
- Whether the hamburger icon was visible in the list

## What Worked (My Manual MCP Session):

From earlier successful navigation:
```
✅ Step 1: Clicked hamburger menu
   Selector: #sidebar-toggle > span > svg

✅ Step 2: Clicked "Core HR"
   Selector: getByText('Core HR')

✅ Step 3: Clicked "Employee Information"
   Selector: getByText('Employee Information')
```

## Hypothesis of Why Agent Fails:

### Possible Issue 1: Wrong Selector Strategy
- Agent might be trying: `getByText('All Modules')` (strict mode violation)
- Or: `#MenuToggle` (wrong ID)
- Or: `#sidebar-toggle-menu` (doesn't exist)
- Instead of: `#sidebar-toggle > span > svg` (actual selector)

### Possible Issue 2: Missing Icon Detection
- Hamburger icons are often SVG elements without accessible text
- Agent might not recognize this pattern
- Prompt doesn't explicitly guide on icon/SVG selector strategy

### Possible Issue 3: DOM List Incomplete
- Interactive elements might not include the SVG icon
- If icon isn't in the list, agent won't know it exists
- Need to check if `getEnhancedPageInfo` captures SVG icons

### Possible Issue 4: Ambiguous Text
- "All Modules" might appear in multiple places (menu button + modal title)
- Agent tries `getByText('All Modules')` → strict mode violation
- Should scope to parent: `locator('#sidebar-toggle').getByText('All Modules')`

## Next Steps:

1. **Get full logs** from your failing run
2. **Compare** what agent saw vs what I saw
3. **Identify** the gap (prompt, DOM extraction, or selector logic)
4. **Plan fixes**:
   - Prompt improvements (icon/SVG guidance)
   - DOM extraction improvements (ensure icons are captured)
   - Selector strategy improvements (parent scoping for icons)
   - Example-based learning (hamburger menu pattern)

## Waiting For:
Please paste the full logs from the failing step showing all iteration attempts and what the agent tried.

@@ -0,0 +1,396 @@
# Runner-Core Visual Agent Evolution - Complete Implementation Plan

## Overview

Two-phase pragmatic evolution without major architecture overhaul:
- **Phase 1 (Week 1-2):** Percentage coordinates + Free-form notes
- **Phase 2 (Week 3-5):** Numbered element system with three-tier fallback

---

## PHASE 1: Tactical Improvements

### 1A: Free-Form "Note to Future Self"

**Why:** Agent needs tactical memory between iterations of the SAME step.

**Type:**
```typescript
interface NoteToFutureSelf {
  fromIteration: number;
  content: string;  // FREE-FORM - agent writes whatever it wants
}
```

**Examples:**
- "Tried #sidebar-toggle, failed. Will try SVG child next."
- "Plan: Hover over menu first to reveal dropdown, then click Settings."
- "Cookie banner blocking. Next: dismiss it, then retry main action."

**vs Current Learnings:**
- Learnings = App-wide patterns ("App uses getByRole")
- Note to self = Iteration-specific tactics ("Just tried X, will try Y next")

**Keep BOTH!**
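
As a rough illustration of how such a note might be carried from one iteration to the next, here is a minimal sketch. `TacticalMemorySketch`, `record`, and `toPromptSection` are hypothetical names, and the prompt wording is invented for the example; only the `NoteToFutureSelf` shape comes from the plan.

```typescript
interface NoteToFutureSelf {
  fromIteration: number;
  content: string; // free-form text written by the agent
}

// Hypothetical holder for the most recent note; each new non-empty note replaces the last.
class TacticalMemorySketch {
  private latest: NoteToFutureSelf | undefined;

  record(iteration: number, content: string | undefined): void {
    const trimmed = content?.trim();
    if (trimmed) {
      this.latest = { fromIteration: iteration, content: trimmed };
    }
  }

  // Text appended to the dynamic part of the next prompt; empty when no note exists.
  toPromptSection(): string {
    if (!this.latest) return '';
    return `NOTE FROM YOUR PAST SELF (iteration ${this.latest.fromIteration}):\n${this.latest.content}`;
  }
}
```

Keeping this separate from app-wide learnings matches the distinction drawn above: the note is overwritten freely, while learnings accumulate.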

### 1B: Percentage-Based Coordinates

**LLM outputs percentages:**
```json
{
  "coordinateAction": {
    "action": "click|fill|drag|hover|scroll",
    "xPercent": 15.75,  // 2 decimal precision for accuracy
    "yPercent": 8.50,

    // For drag:
    "toXPercent": 45.25,
    "toYPercent": 8.50,

    // For fill:
    "value": "alice@example.com"
  }
}
```

**Code converts to pixels:**
```typescript
const viewport = await page.evaluate(() => ({width: window.innerWidth, height: window.innerHeight}));
const x = Math.round((xPercent / 100) * viewport.width);
const y = Math.round((yPercent / 100) * viewport.height);
```
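
The same arithmetic can be isolated in a pure function so that it is testable without a browser. This is a sketch, not the contents of `coordinate-converter.ts`: `percentToPixels` and the clamping of out-of-range input are assumptions added here for illustration.

```typescript
interface ViewportSize {
  width: number;
  height: number;
}

interface PixelPoint {
  x: number;
  y: number;
}

// Converts viewport percentages (0-100) into pixel coordinates inside the viewport.
function percentToPixels(xPercent: number, yPercent: number, viewport: ViewportSize): PixelPoint {
  if (!Number.isFinite(xPercent) || !Number.isFinite(yPercent)) {
    throw new RangeError('Percent coordinates must be finite numbers');
  }
  // Assumption: out-of-range percentages are clamped rather than rejected.
  const clampPercent = (p: number): number => Math.min(100, Math.max(0, p));
  // The last addressable pixel is (size - 1), so 100% maps to the viewport edge, not past it.
  const toPixel = (p: number, size: number): number =>
    Math.min(size - 1, Math.round((clampPercent(p) / 100) * size));
  return { x: toPixel(xPercent, viewport.width), y: toPixel(yPercent, viewport.height) };
}
```

With the example values above, `percentToPixels(15.75, 8.5, { width: 1280, height: 720 })` gives `{ x: 202, y: 61 }`, and the same percentages at 1920x1080 give `{ x: 302, y: 92 }`, which is the resolution independence the plan is after.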

### 1C: Code-Controlled Fallback

**Tier 1 (iteration 1-3):** Normal Playwright selectors
**Tier 2 (iteration 4+):** Percentage coordinates (auto-triggered after 3 failures)

```typescript
// In callAgent():
if (consecutiveFailures >= 3) {  // Phase 1: Only 2 tiers
  // Auto-use coordinate-specific system prompt
  // LLM outputs percentages
  // Phase 2 will add tier 2 (indexed elements) between selectors and coordinates
}
```
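
A compact way to state the Phase 1 escalation is a function from failure count to mode. The sketch below combines the threshold shown above with the figures in the v0.0.33 session summary (coordinate mode after 3 selector failures, 2 coordinate attempts, forced stuck at 5 or more failures); `modeForFailures` and the constant names are hypothetical.

```typescript
type AgentMode = 'selector' | 'coordinate' | 'stuck';

// Thresholds as reported in the session summary; the constant names are hypothetical.
const COORDINATE_AFTER_FAILURES = 3;
const STUCK_AFTER_FAILURES = 5;

function modeForFailures(consecutiveFailures: number): AgentMode {
  if (consecutiveFailures >= STUCK_AFTER_FAILURES) return 'stuck';
  if (consecutiveFailures >= COORDINATE_AFTER_FAILURES) return 'coordinate';
  return 'selector';
}
```

If the stuck threshold were also 3, the first branch would always win and coordinate mode could never be reached, which appears to be the bug the summary describes as "coordinate mode never activated".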

**Files:**
- `utils/coordinate-converter.ts` (NEW)
- `orchestrator/orchestrator-agent.ts` (UPDATE)
- `orchestrator/types.ts` (UPDATE - add CoordinateAction, NoteToFutureSelf)

---

## PHASE 2: Numbered Element System

### Architecture

```
Three-Tier Fallback:

Tier 1 (iteration 1 ONLY): Playwright Selector Mode - ONE SHOT
├─ Agent generates: await page.getByRole('button', {name: 'Login'}).click()
├─ Direct execution
└─ 70% of tasks finish here (simple/medium complexity)

Tier 2 (iterations 2-3): Index Command Mode - TWO ATTEMPTS
├─ Inject numbered markers [1], [2], [3] → screenshot
├─ Agent outputs: CLICK[3], FILL[5, "alice@example.com"]
├─ Execution: Use data-testchimp-el="[3]" (reliable targeting)
├─ Script output: Translate to NATIVE selector (getByRole, #id, etc.)
└─ 25% of tasks finish here (complex UIs, icons, shadow DOM)

Tier 3 (iterations 4+): Percentage Coordinate Mode - LAST RESORT
├─ Agent outputs: {xPercent: 15.755, yPercent: 8.500}
├─ CoordinateConverter: % → pixels
├─ Execute: mouse.click(x, y)
└─ <5% of tasks need this (extreme edge cases)
```

### 2.1: Reusable Utility: ElementDetector

**File:** `runner-core/src/utils/element-detector.ts`

**Purpose:** Detect ALL interactive elements with z-index and occlusion awareness.

**Key Features:**
- Comprehensive element queries (buttons, links, inputs, SVGs, clickable divs)
- Z-index calculation via `getComputedStyle()`
- Occlusion detection via `elementFromPoint()` at center
- Context tags: `[header|nav|sidebar|main|modal]`
- Spatial tags: `[top|bottom|left|right|center]`
- Center position as percentage (2 decimal precision)

**Output:**
```typescript
interface DetectedElement {
  index: number;              // [1], [2], [3]
  tag: string;
  text: string;
  role: string;
  ariaLabel: string;
  bbox: {x, y, width, height};
  centerPercent: {x: 15.75, y: 8.50};  // As percentage
  context: string[];          // ['header', 'nav', 'top']
  zIndex: number;
  isVisible: boolean;         // Not occluded by higher z-index
  selectors: {
    dataAttribute: string;    // [data-testchimp-el="[3]"]
    semantic: string[];       // [getByRole(...), getByLabel(...)]
    cssId: string | null;
    cssClass: string | null;
  };
}
```

**Implementation Highlights:**
- Queries: `button`, `a[href]`, `input`, `svg`, `[onclick]`, `[role=...]`, `[data-testid]`, clickable divs/spans
- Z-index check: `elementFromPoint(centerX, centerY)` must return element or child
- Filter: Only return `isVisible: true` elements
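
The geometric part of the detector (center as a percentage, plus spatial tags) needs no DOM and can be sketched as pure functions. Everything here is illustrative: `centerPercentOf`, `spatialTagsFor`, and the choice of viewport thirds as tag boundaries are assumptions, since the plan does not say how the spatial tags are derived.

```typescript
interface ElementBox {
  x: number;
  y: number;
  width: number;
  height: number;
}

interface PercentPoint {
  x: number;
  y: number;
}

// Center of the bounding box as a viewport percentage, rounded to 2 decimals.
function centerPercentOf(box: ElementBox, viewport: { width: number; height: number }): PercentPoint {
  const round2 = (value: number): number => Math.round(value * 100) / 100;
  return {
    x: round2(((box.x + box.width / 2) / viewport.width) * 100),
    y: round2(((box.y + box.height / 2) / viewport.height) * 100),
  };
}

// Assumption: the viewport is split into thirds; a point in the middle third on both axes is 'center'.
function spatialTagsFor(center: PercentPoint): string[] {
  const tags: string[] = [];
  if (center.y < 100 / 3) tags.push('top');
  else if (center.y > 200 / 3) tags.push('bottom');
  if (center.x < 100 / 3) tags.push('left');
  else if (center.x > 200 / 3) tags.push('right');
  return tags.length > 0 ? tags : ['center'];
}
```

For a 44x44 icon at (180, 40) in a 1280x720 viewport this yields a center of `{ x: 15.78, y: 8.61 }` and the tags `['top', 'left']`.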

### 2.2: Reusable Utility: SelectorResolver

**File:** `runner-core/src/utils/selector-resolver.ts`

**Purpose:** Given element index, return most reliable Playwright selector FOR GENERATED SCRIPTS.

**CRITICAL DISTINCTION:**

A. **During Execution (Internal):**
   - Agent can click using `data-testchimp-el="[N]"` (we inject it temporarily)
   - This ensures agent clicks the exact right element

B. **For Generated Script (Output):**
   - Must use NATIVE selectors that work without our attributes
   - Script will run on real application without data-testchimp-el

**Selector Resolution for Script Output (in order):**
1. Semantic selectors - `getByRole()`, `getByLabel()` (BEST - maintainable)
2. CSS ID - `#element-id` (GOOD - stable)
3. CSS class - `.button-primary` (scoped to context if ambiguous)
4. Contextual selector - `header .menu-toggle svg` (LAST RESORT)

**For each selector, validate:**
- Element exists on page
- Not covered by higher z-index (using elementFromPoint)
- Position matches expected (±5% tolerance)

**Method:**
```typescript
static async resolveIndexToSelector(
  index: number,
  elements: DetectedElement[],
  page: Page
): Promise<{selector: string, strategy: string}>
```

**Validation:**
```typescript
private static async validateInteractable(
  page: Page,
  selector: string,
  expectedCenterPercent: {x, y}
): Promise<boolean> {
  // 1. Element exists
  // 2. Non-zero dimensions
  // 3. elementFromPoint(center) === element (z-index check)
  // 4. Position within ±5% tolerance
}
```
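
The priority order and the position check can be sketched independently of Playwright by injecting the validation step. This is an assumption-laden outline rather than the planned `SelectorResolver`: `pickNativeSelector`, `withinTolerance`, and the `contextual` field are hypothetical, and "±5%" is read here as 5 percentage points of the viewport on each axis.

```typescript
type SelectorStrategy = 'semantic' | 'cssId' | 'cssClass' | 'contextual';

interface NativeSelectorCandidates {
  semantic: string[];
  cssId: string | null;
  cssClass: string | null;
  contextual: string | null; // hypothetical; not part of DetectedElement.selectors in the plan
}

interface ResolvedSelector {
  selector: string;
  strategy: SelectorStrategy;
}

// Returns the first candidate, in priority order, that the injected validator accepts.
async function pickNativeSelector(
  candidates: NativeSelectorCandidates,
  validate: (selector: string) => Promise<boolean>,
): Promise<ResolvedSelector | null> {
  const ordered: ResolvedSelector[] = [];
  for (const selector of candidates.semantic) ordered.push({ selector, strategy: 'semantic' });
  if (candidates.cssId) ordered.push({ selector: candidates.cssId, strategy: 'cssId' });
  if (candidates.cssClass) ordered.push({ selector: candidates.cssClass, strategy: 'cssClass' });
  if (candidates.contextual) ordered.push({ selector: candidates.contextual, strategy: 'contextual' });

  for (const candidate of ordered) {
    if (await validate(candidate.selector)) return candidate;
  }
  return null;
}

// Position check: both axes must be within `tolerance` percentage points of the expected center.
function withinTolerance(
  actual: { x: number; y: number },
  expected: { x: number; y: number },
  tolerance = 5,
): boolean {
  return Math.abs(actual.x - expected.x) <= tolerance && Math.abs(actual.y - expected.y) <= tolerance;
}
```

In a real resolver the `validate` callback would wrap the existence, dimension, occlusion, and position checks listed above; injecting it keeps the ordering logic testable on its own.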

### 2.3: Reusable Utility: VisualMarkerInjector

**File:** `runner-core/src/utils/visual-marker-injector.ts`

**Purpose:** Inject visual numbered labels on page.

**Methods:**
```typescript
static async injectNumberedMarkers(page): Promise<DetectedElement[]>
static async removeMarkers(page): Promise<void>
static async captureMarkedScreenshot(page, elements): Promise<string>
```

**Visual Markers:**
- Red gradient background `[1]`, `[2]`, `[3]`
- Positioned at top-left of element
- z-index: 999999 (always visible)
- Inject `data-testchimp-el` attribute on actual element (TEMPORARY - for agent execution only)
  * Used internally to ensure agent clicks correct element
  * Removed after step completion
  * NEVER appears in generated script output

### 2.4: Reusable Utility: IndexCommandTranslator

**File:** `runner-core/src/utils/index-command-translator.ts`

**Purpose:** Translate index commands to Playwright commands with NATIVE selectors (for script generation).

**Input:**
```typescript
{ action: "CLICK", index: 3 }
{ action: "FILL", index: 5, value: "alice@example.com" }
```

**Output (MUST use native selectors):**
```typescript
"await page.getByRole('button', {name: 'Menu'}).click();"
"await page.locator('#username').fill('alice@example.com');"
// OR
"await page.locator('#sidebar-toggle svg').click();"
```

**NOT this (won't work in generated script):**
```typescript
❌ "await page.locator('[data-testchimp-el=\"[3]\"]').click();"
❌ "await page.locator('[data-testchimp-el=\"[5]\"]').fill('alice@example.com');"
```

**Process:**
1. **During execution:** Click element using `data-testchimp-el="[index]"` (reliable)
2. **For script output:** Use SelectorResolver to get NATIVE selector (semantic/id/class)
3. Generate Playwright command with native selector
4. Return command string that works on real application

**Critical Distinction:**
- `data-testchimp-el` = Internal execution helper (temporary)
- Script output = Native selectors (permanent, works standalone)
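
The translation step can be sketched as two small functions: one that parses the agent's textual command (the `CLICK[3]` / `FILL[5, "..."]` forms shown in the architecture diagram) and one that emits the script line from an already-resolved native locator. `parseIndexCommand` and `toPlaywrightCommand` are hypothetical names, and the parser deliberately handles only unescaped double-quoted values.

```typescript
type IndexCommand =
  | { action: 'CLICK'; index: number }
  | { action: 'FILL'; index: number; value: string };

// Parses 'CLICK[3]' or 'FILL[5, "text"]'; returns null for anything else.
function parseIndexCommand(text: string): IndexCommand | null {
  const trimmed = text.trim();
  const click = /^CLICK\[(\d+)\]$/.exec(trimmed);
  if (click) return { action: 'CLICK', index: Number(click[1]) };
  const fill = /^FILL\[(\d+),\s*"([^"]*)"\]$/.exec(trimmed);
  if (fill) return { action: 'FILL', index: Number(fill[1]), value: fill[2] };
  return null;
}

// `nativeLocator` is a Playwright locator expression such as "locator('#username')".
// The injected data-testchimp-el attribute never reaches this function.
function toPlaywrightCommand(command: IndexCommand, nativeLocator: string): string {
  if (command.action === 'CLICK') {
    return `await page.${nativeLocator}.click();`;
  }
  // JSON.stringify yields a safely quoted string literal for the generated script.
  return `await page.${nativeLocator}.fill(${JSON.stringify(command.value)});`;
}
```

Because the function only ever receives a native locator, the generated line cannot accidentally reference the temporary attribute.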

### 2.5: Integration - Three-Tier System (Optimized Escalation)

**File:** `orchestrator/orchestrator-agent.ts`

**Optimized Strategy:** Escalate quickly to avoid wasting time on difficult tasks

**Mode Determination:**
```typescript
let tier: 1 | 2 | 3;

// Tier 1: Try normal selectors ONCE (iteration 1)
// Tier 2: Use indexed elements TWICE (iterations 2-3)
// Tier 3: Use percentage coords (iterations 4+)

if (iteration >= 4) {
  tier = 3;  // Coordinate mode
} else if (iteration >= 2) {
  tier = 2;  // Index command mode
} else {
  tier = 1;  // Normal Playwright selector mode
}
```
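
Written as a function, the same mapping becomes easy to unit-test, and it can double as the basis for the tier-distribution measurement planned for Week 5. `tierForIteration` and `tierDistribution` are hypothetical helper names; the thresholds are the ones in the snippet above.

```typescript
type Tier = 1 | 2 | 3;

// Iteration 1 -> Tier 1, iterations 2-3 -> Tier 2, iterations 4+ -> Tier 3.
function tierForIteration(iteration: number): Tier {
  if (!Number.isInteger(iteration) || iteration < 1) {
    throw new RangeError(`iteration must be a positive integer, got ${iteration}`);
  }
  if (iteration >= 4) return 3;
  if (iteration >= 2) return 2;
  return 1;
}

// Share of steps (in percent) that finished in each tier, given the iteration each step ended on.
function tierDistribution(finalIterations: number[]): Record<Tier, number> {
  const counts: Record<Tier, number> = { 1: 0, 2: 0, 3: 0 };
  for (const iteration of finalIterations) counts[tierForIteration(iteration)]++;
  const total = finalIterations.length || 1; // avoid dividing by zero for an empty run
  return {
    1: (counts[1] / total) * 100,
    2: (counts[2] / total) * 100,
    3: (counts[3] / total) * 100,
  };
}
```

Comparing the output of `tierDistribution` against the 70/25/5 target gives a direct check on whether the escalation thresholds are doing their job.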

**Rationale:**
- Simple tasks: Succeed in Tier 1 (iteration 1) - fast!
- Medium tasks: Tier 2 gives 2 attempts with reliable index system
- Hard tasks: Tier 3 coordinates as absolute fallback
- No wasted iterations on difficult element detection

**Tier Preparation:**
```typescript
if (tier === 2) await this.prepareIndexMode(context, page);
if (tier === 3) await this.prepareCoordinateMode(context, page);
```

**System Prompt Selection:**
```typescript
const systemPrompt =
  tier === 3 ? this.buildCoordinateSystemPrompt() :
  tier === 2 ? this.buildIndexSystemPrompt() :
  this.buildSystemPrompt();
```

**Execution Flow:**
```typescript
// Tier 1 (iteration 1):
if (decision.commands) {
  // Execute normal Playwright command
  // If fails → iteration++, move to Tier 2
}

// Tier 2 (iterations 2-3):
if (decision.indexCommand) {
  // Step A: Click using data-testchimp-el="[N]" (execution)
  await page.locator('[data-testchimp-el="[3]"]').click();

  // Step B: Resolve to native selector (script generation)
  const { selector: nativeSelector } = await SelectorResolver.resolveIndexToSelector(3, elements, page);
  // → selector: "getByRole('button', {name: 'Menu'})"

  // Step C: Add to generated script
  commandsExecuted.push(`await page.${nativeSelector}.click();`);

  // If fails after 2 attempts → iteration++, move to Tier 3
}

// Tier 3 (iterations 4+):
if (decision.coordinateAction) {
  // Convert % to pixels and execute
  // Add coordinate commands to script (acceptable for edge cases)
}
```

---

## Utilities Summary

All utilities are **stateless and reusable**:

| Utility | Purpose | Reusable For |
|---------|---------|--------------|
| ElementDetector | Find interactive elements | Accessibility audits, page analysis |
| SelectorResolver | Index → selector with validation | Any numbered system |
| VisualMarkerInjector | Add visual labels | Manual testing, debugging |
| IndexCommandTranslator | Index command → Playwright | Any index-based automation |
| CoordinateConverter | Percentage → pixels | Any coordinate system |

---

## Implementation Timeline

### Week 1: Phase 1 Core
- [ ] NoteToFutureSelf type and tracking
- [ ] CoordinateAction with percentages
- [ ] CoordinateConverter utility
- [ ] Coordinate mode switching (tier 3)

### Week 2: Phase 1 Testing
- [ ] Test note-to-self on 10 scenarios
- [ ] Test percentage coordinates at multiple viewport sizes
- [ ] Verify all coordinate actions (click, fill, drag, scroll, hover)

### Week 3: Phase 2 Utilities
- [ ] ElementDetector with z-index awareness
- [ ] SelectorResolver with occlusion validation
- [ ] Test utilities standalone on complex pages

### Week 4: Phase 2 Integration
- [ ] VisualMarkerInjector
- [ ] IndexCommandTranslator (TWO-STAGE: execution via data-attr, script via native selector)
- [ ] Index mode (tier 2) integration with iteration-based switching
- [ ] Optimized escalation: iteration 1 → tier 1, iteration 2-3 → tier 2, iteration 4+ → tier 3
- [ ] Test PeopleHR with tier 2 (should succeed in iteration 2-3)

### Week 5: Phase 2 Testing
- [ ] Three-tier end-to-end testing
- [ ] Measure tier distribution (target: 70/25/5)
- [ ] A/B test vs current implementation
- [ ] Performance optimization

## Success Metrics

**Phase 1:**
- 20-30% reduction in average iterations per step
- Note-to-self prevents 40%+ of repeated selector failures
- Coordinates used in < 5% of scenarios

**Phase 2:**
- 70% scenarios complete in Tier 1 (iteration 1) - simple cases
- 25% scenarios use Tier 2 (iterations 2-3) - complex UIs with icons/shadows
- < 5% scenarios escalate to Tier 3 (iterations 4+) - impossible selector cases
- **PeopleHR hamburger menu:** Succeeds in Tier 2 iteration 2 with CLICK[N]
- **Average iterations per step:** Should decrease from ~4 to ~1.5

## Total Effort: 4-5 weeks

**Ready to implement?**
