testchimp-runner-core 0.0.33 → 0.0.35
- package/dist/execution-service.d.ts +1 -4
- package/dist/execution-service.d.ts.map +1 -1
- package/dist/execution-service.js +155 -468
- package/dist/execution-service.js.map +1 -1
- package/dist/index.d.ts +3 -1
- package/dist/index.d.ts.map +1 -1
- package/dist/index.js +11 -1
- package/dist/index.js.map +1 -1
- package/dist/llm-facade.d.ts.map +1 -1
- package/dist/llm-facade.js +7 -7
- package/dist/llm-facade.js.map +1 -1
- package/dist/llm-provider.d.ts +9 -0
- package/dist/llm-provider.d.ts.map +1 -1
- package/dist/model-constants.d.ts +16 -5
- package/dist/model-constants.d.ts.map +1 -1
- package/dist/model-constants.js +17 -6
- package/dist/model-constants.js.map +1 -1
- package/dist/orchestrator/decision-parser.d.ts +18 -0
- package/dist/orchestrator/decision-parser.d.ts.map +1 -0
- package/dist/orchestrator/decision-parser.js +127 -0
- package/dist/orchestrator/decision-parser.js.map +1 -0
- package/dist/orchestrator/index.d.ts +4 -2
- package/dist/orchestrator/index.d.ts.map +1 -1
- package/dist/orchestrator/index.js +15 -2
- package/dist/orchestrator/index.js.map +1 -1
- package/dist/orchestrator/orchestrator-agent.d.ts +17 -22
- package/dist/orchestrator/orchestrator-agent.d.ts.map +1 -1
- package/dist/orchestrator/orchestrator-agent.js +708 -577
- package/dist/orchestrator/orchestrator-agent.js.map +1 -1
- package/dist/orchestrator/orchestrator-prompts.d.ts +32 -0
- package/dist/orchestrator/orchestrator-prompts.d.ts.map +1 -0
- package/dist/orchestrator/orchestrator-prompts.js +737 -0
- package/dist/orchestrator/orchestrator-prompts.js.map +1 -0
- package/dist/orchestrator/page-som-handler.d.ts +106 -0
- package/dist/orchestrator/page-som-handler.d.ts.map +1 -0
- package/dist/orchestrator/page-som-handler.js +1353 -0
- package/dist/orchestrator/page-som-handler.js.map +1 -0
- package/dist/orchestrator/som-types.d.ts +149 -0
- package/dist/orchestrator/som-types.d.ts.map +1 -0
- package/dist/orchestrator/som-types.js +87 -0
- package/dist/orchestrator/som-types.js.map +1 -0
- package/dist/orchestrator/tool-registry.d.ts +2 -0
- package/dist/orchestrator/tool-registry.d.ts.map +1 -1
- package/dist/orchestrator/tool-registry.js.map +1 -1
- package/dist/orchestrator/tools/index.d.ts +5 -1
- package/dist/orchestrator/tools/index.d.ts.map +1 -1
- package/dist/orchestrator/tools/index.js +9 -2
- package/dist/orchestrator/tools/index.js.map +1 -1
- package/dist/orchestrator/tools/refresh-som-markers.d.ts +12 -0
- package/dist/orchestrator/tools/refresh-som-markers.d.ts.map +1 -0
- package/dist/orchestrator/tools/refresh-som-markers.js +64 -0
- package/dist/orchestrator/tools/refresh-som-markers.js.map +1 -0
- package/dist/orchestrator/tools/verify-action-result.d.ts +17 -0
- package/dist/orchestrator/tools/verify-action-result.d.ts.map +1 -0
- package/dist/orchestrator/tools/verify-action-result.js +140 -0
- package/dist/orchestrator/tools/verify-action-result.js.map +1 -0
- package/dist/orchestrator/tools/view-previous-screenshot.d.ts +15 -0
- package/dist/orchestrator/tools/view-previous-screenshot.d.ts.map +1 -0
- package/dist/orchestrator/tools/view-previous-screenshot.js +92 -0
- package/dist/orchestrator/tools/view-previous-screenshot.js.map +1 -0
- package/dist/orchestrator/types.d.ts +49 -1
- package/dist/orchestrator/types.d.ts.map +1 -1
- package/dist/orchestrator/types.js +11 -1
- package/dist/orchestrator/types.js.map +1 -1
- package/dist/prompts.d.ts.map +1 -1
- package/dist/prompts.js +40 -34
- package/dist/prompts.js.map +1 -1
- package/dist/scenario-service.d.ts +5 -0
- package/dist/scenario-service.d.ts.map +1 -1
- package/dist/scenario-service.js +17 -0
- package/dist/scenario-service.js.map +1 -1
- package/dist/scenario-worker-class.d.ts +4 -0
- package/dist/scenario-worker-class.d.ts.map +1 -1
- package/dist/scenario-worker-class.js +21 -3
- package/dist/scenario-worker-class.js.map +1 -1
- package/dist/testing/agent-tester.d.ts +35 -0
- package/dist/testing/agent-tester.d.ts.map +1 -0
- package/dist/testing/agent-tester.js +84 -0
- package/dist/testing/agent-tester.js.map +1 -0
- package/dist/testing/ref-translator-tester.d.ts +44 -0
- package/dist/testing/ref-translator-tester.d.ts.map +1 -0
- package/dist/testing/ref-translator-tester.js +104 -0
- package/dist/testing/ref-translator-tester.js.map +1 -0
- package/dist/utils/coordinate-converter.d.ts +32 -0
- package/dist/utils/coordinate-converter.d.ts.map +1 -0
- package/dist/utils/coordinate-converter.js +130 -0
- package/dist/utils/coordinate-converter.js.map +1 -0
- package/dist/utils/hierarchical-selector.d.ts +47 -0
- package/dist/utils/hierarchical-selector.d.ts.map +1 -0
- package/dist/utils/hierarchical-selector.js +212 -0
- package/dist/utils/hierarchical-selector.js.map +1 -0
- package/dist/utils/page-info-retry.d.ts +14 -0
- package/dist/utils/page-info-retry.d.ts.map +1 -0
- package/dist/utils/page-info-retry.js +60 -0
- package/dist/utils/page-info-retry.js.map +1 -0
- package/dist/utils/page-info-utils.d.ts +1 -0
- package/dist/utils/page-info-utils.d.ts.map +1 -1
- package/dist/utils/page-info-utils.js +46 -18
- package/dist/utils/page-info-utils.js.map +1 -1
- package/dist/utils/ref-attacher.d.ts +21 -0
- package/dist/utils/ref-attacher.d.ts.map +1 -0
- package/dist/utils/ref-attacher.js +149 -0
- package/dist/utils/ref-attacher.js.map +1 -0
- package/dist/utils/ref-translator.d.ts +49 -0
- package/dist/utils/ref-translator.d.ts.map +1 -0
- package/dist/utils/ref-translator.js +276 -0
- package/dist/utils/ref-translator.js.map +1 -0
- package/package.json +1 -1
- package/plandocs/BEFORE_AFTER_VERIFICATION.md +148 -0
- package/plandocs/COORDINATE_MODE_DIAGNOSIS.md +144 -0
- package/plandocs/IMPLEMENTATION_STATUS.md +108 -0
- package/plandocs/PHASE_1_COMPLETE.md +165 -0
- package/plandocs/PHASE_1_SUMMARY.md +184 -0
- package/plandocs/PROMPT_OPTIMIZATION_ANALYSIS.md +120 -0
- package/plandocs/PROMPT_SANITY_CHECK.md +120 -0
- package/plandocs/SESSION_SUMMARY_v0.0.33.md +151 -0
- package/plandocs/TROUBLESHOOTING_SESSION.md +72 -0
- package/plandocs/VISUAL_AGENT_EVOLUTION_PLAN.md +396 -0
- package/plandocs/WHATS_NEW_v0.0.33.md +183 -0
- package/plandocs/exploratory-mode-support-v2.plan.md +953 -0
- package/plandocs/exploratory-mode-support.plan.md +928 -0
- package/plandocs/journey-id-tracking-addendum.md +227 -0
- package/src/execution-service.ts +179 -596
- package/src/index.ts +10 -0
- package/src/llm-facade.ts +8 -8
- package/src/llm-provider.ts +11 -1
- package/src/model-constants.ts +17 -5
- package/src/orchestrator/decision-parser.ts +139 -0
- package/src/orchestrator/index.ts +27 -2
- package/src/orchestrator/orchestrator-agent.ts +868 -623
- package/src/orchestrator/orchestrator-prompts.ts +786 -0
- package/src/orchestrator/page-som-handler.ts +1565 -0
- package/src/orchestrator/som-types.ts +188 -0
- package/src/orchestrator/tool-registry.ts +2 -0
- package/src/orchestrator/tools/index.ts +5 -1
- package/src/orchestrator/tools/refresh-som-markers.ts +69 -0
- package/src/orchestrator/tools/verify-action-result.ts +159 -0
- package/src/orchestrator/tools/view-previous-screenshot.ts +103 -0
- package/src/orchestrator/types.ts +95 -4
- package/src/prompts.ts +40 -34
- package/src/scenario-service.ts +20 -0
- package/src/scenario-worker-class.ts +30 -4
- package/src/utils/coordinate-converter.ts +162 -0
- package/src/utils/page-info-retry.ts +65 -0
- package/src/utils/page-info-utils.ts +53 -18
- package/testchimp-runner-core-0.0.35.tgz +0 -0
- /package/{CREDIT_CALLBACK_ARCHITECTURE.md → plandocs/CREDIT_CALLBACK_ARCHITECTURE.md} +0 -0
- /package/{INTEGRATION_COMPLETE.md → plandocs/INTEGRATION_COMPLETE.md} +0 -0
- /package/{VISION_DIAGNOSTICS_IMPROVEMENTS.md → plandocs/VISION_DIAGNOSTICS_IMPROVEMENTS.md} +0 -0
- /package/{RELEASE_0.0.26.md → releasenotes/RELEASE_0.0.26.md} +0 -0
- /package/{RELEASE_0.0.27.md → releasenotes/RELEASE_0.0.27.md} +0 -0
- /package/{RELEASE_0.0.28.md → releasenotes/RELEASE_0.0.28.md} +0 -0
|
@@ -0,0 +1,148 @@
|
|
|
1
|
+
# Before/After Screenshot Verification
|
|
2
|
+
|
|
3
|
+
## Feature: Visual Goal Verification for Coordinate Actions
|
|
4
|
+
|
|
5
|
+
### Problem Solved:
|
|
6
|
+
When using coordinate-based actions (clicking at x,y%), the agent has no way to know if the click achieved the goal:
|
|
7
|
+
- No element reference to check state
|
|
8
|
+
- No selector feedback
|
|
9
|
+
- Can't verify if expected page loaded or modal opened
|
|
10
|
+
|
|
11
|
+
This led to:
|
|
12
|
+
- False positives (click succeeded but goal not achieved)
|
|
13
|
+
- Infinite loops (agent keeps clicking, unsure if it worked)
|
|
14
|
+
|
|
15
|
+
### Solution:
|
|
16
|
+
Automatic before/after screenshot comparison after coordinate clicks.
|
|
17
|
+
|
|
18
|
+
## How It Works:
|
|
19
|
+
|
|
20
|
+
### 1. **Automatic Trigger** (No Agent Action Required)
|
|
21
|
+
When agent uses coordinate action:
|
|
22
|
+
```
Iteration 4: 🎯 Coordinate mode activated
  Step 1: Capture BEFORE screenshot
  Step 2: Execute coordinate click (x%, y%)
  Step 3: Wait 1000ms for UI to settle
  Step 4: Capture AFTER screenshot
  Step 5: Call LLM with both images (labeled "BEFORE", "AFTER")
  Step 6: LLM responds: { goalAchieved: true/false, reasoning: "..." }
  Step 7a: If TRUE → Mark complete, exit step ✅
  Step 7b: If FALSE → Continue to next iteration, try different coordinates
```
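The steps above can be sketched as one function. This is a minimal illustration, not the package's `verifyGoalWithScreenshotComparison` method: `clickAndVerify`, `VerificationDeps`, and `VerificationResult` are hypothetical names, and the browser and LLM are passed in as plain functions so the flow can be read (and run) without Playwright.

```typescript
// Mirrors the LabeledImage shape described in this document.
interface LabeledImage { label: string; dataUrl: string; }

// Hypothetical result shape, limited to the two fields named in Step 6.
interface VerificationResult { goalAchieved: boolean; reasoning: string; }

// Hypothetical dependency bundle: each member stands in for a browser or LLM call.
interface VerificationDeps {
  captureScreenshot: () => Promise<string>; // resolves to a data URL
  click: (xPercent: number, yPercent: number) => Promise<void>;
  wait: (ms: number) => Promise<void>;
  askLlm: (goal: string, images: LabeledImage[]) => Promise<VerificationResult>;
}

async function clickAndVerify(
  goal: string,
  xPercent: number,
  yPercent: number,
  deps: VerificationDeps,
  settleMs: number = 1000,
): Promise<VerificationResult> {
  const before = await deps.captureScreenshot(); // Step 1
  await deps.click(xPercent, yPercent);          // Step 2
  await deps.wait(settleMs);                     // Step 3
  const after = await deps.captureScreenshot();  // Step 4
  // Steps 5-6: send both labeled images and return the model's verdict.
  return deps.askLlm(goal, [
    { label: 'BEFORE', dataUrl: before },
    { label: 'AFTER', dataUrl: after },
  ]);
}
```

The caller would then branch on `goalAchieved` (Steps 7a/7b): mark the step complete, or continue to the next iteration with different coordinates.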
|
|
33
|
+
|
|
34
|
+
### 2. **LLM Prompt for Verification**
|
|
35
|
+
```
Goal: [Current step goal]

Compare the BEFORE and AFTER screenshots.

Did the action achieve the goal? Respond with JSON:
{
  "goalAchieved": boolean,
  "reasoning": "What changed (or didn't change)",
  "visibleChanges": ["List of UI changes observed"]
}

Focus on:
- Did expected elements appear/disappear?
- Did page navigate or content change?
- Visual indicators of success (new panels, forms, highlights)?

Be strict: Only return true if you clearly see the expected change.
```
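A response to this prompt still has to be parsed defensively, since a model may wrap the JSON in prose or return the wrong types. The sketch below is one possible parser, assuming the three fields named in the prompt; `parseGoalVerification` and `GoalVerification` are hypothetical names, not identifiers from the package. In keeping with "Be strict", anything other than a literal `true` is treated as not achieved.

```typescript
interface GoalVerification {
  goalAchieved: boolean;
  reasoning: string;
  visibleChanges: string[];
}

function parseGoalVerification(raw: string): GoalVerification {
  const failure = (why: string): GoalVerification => ({
    goalAchieved: false,
    reasoning: why,
    visibleChanges: [],
  });

  // Take the outermost {...} span so surrounding prose or fences are ignored.
  const start = raw.indexOf('{');
  const end = raw.lastIndexOf('}');
  if (start === -1 || end <= start) return failure('No JSON object found in response');

  let parsed: unknown;
  try {
    parsed = JSON.parse(raw.slice(start, end + 1));
  } catch {
    return failure('Response was not valid JSON');
  }
  if (typeof parsed !== 'object' || parsed === null) return failure('Response was not a JSON object');

  const obj = parsed as Record<string, unknown>;
  const reasoning = obj['reasoning'];
  const changes = obj['visibleChanges'];
  return {
    goalAchieved: obj['goalAchieved'] === true, // strict: the string "true" does not count
    reasoning: typeof reasoning === 'string' ? reasoning : '',
    visibleChanges: Array.isArray(changes)
      ? changes.filter((c): c is string => typeof c === 'string')
      : [],
  };
}
```

Defaulting to `goalAchieved: false` on any parse problem means a malformed reply costs one extra iteration rather than producing a false positive.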
|
|
54
|
+
|
|
55
|
+
### 3. **Multi-Image LLM Interface**
|
|
56
|
+
```typescript
// NEW: LabeledImage interface
export interface LabeledImage {
  label: string;    // "Before", "After", etc.
  dataUrl: string;  // Base64 data URL
}

// UPDATED: LLMRequest
export interface LLMRequest {
  imageUrl?: string;        // Backward compatible (single image)
  images?: LabeledImage[];  // NEW - multi-image support
}
```
|
|
69
|
+
|
|
70
|
+
### 4. **Provider Implementation** (scriptservice-llm-provider.ts)
|
|
71
|
+
```typescript
if (request.images && request.images.length > 0) {
  for (const img of request.images) {
    contentParts.push({ type: 'text', text: `\n[${img.label}]:` });
    contentParts.push({ type: 'image_url', image_url: { url: img.dataUrl } });
  }
  // Sends: [BEFORE]: <image1>, [AFTER]: <image2>
}
```
|
|
80
|
+
|
|
81
|
+
## When Verification Happens:
|
|
82
|
+
|
|
83
|
+
✅ **Always**: After first coordinate action attempt
|
|
84
|
+
❌ **Never**: After selector-based actions (have element state to check)
|
|
85
|
+
⚠️ **Conditional**: Can add for other scenarios where goal verification is unclear
|
|
86
|
+
|
|
87
|
+
## Cost Considerations:
|
|
88
|
+
|
|
89
|
+
**Per verification call:**
|
|
90
|
+
- 2 viewport screenshots (~50-100KB each)
|
|
91
|
+
- Vision model (gpt-5-mini): ~$0.001 per call
|
|
92
|
+
- Used only when coordinate mode activates (after 3 selector failures)
|
|
93
|
+
|
|
94
|
+
**Typical scenario:**
|
|
95
|
+
- Steps 1-10: Regular selectors → No verification cost
|
|
96
|
+
- Step 5 gets stuck → Coordinate mode → 1 verification call → $0.001
|
|
97
|
+
- Overall impact: Minimal, used sparingly
|
|
98
|
+
|
|
99
|
+
## Example Flow:
|
|
100
|
+
|
|
101
|
+
**Step 5: "Select Employee Information"**
|
|
102
|
+
```
Iteration 1: getByText('Employee Information') → Strict mode ❌
Iteration 2: locator('#collapse-1').getByText('Employee Information') → Click succeeds ✅
             BUT: Didn't navigate to Employee Information page (false positive)

Iteration 3: Selector fails again
Iteration 4: 🎯 Coordinate mode
  → BEFORE: Homepage with sidebar
  → Click at (19.3%, 22.9%)
  → Wait 1s
  → AFTER: Check screenshot
  → LLM: "goalAchieved": true, "reasoning": "Employee Information page loaded with form"
  → ✅ Mark complete, exit
```
|
|
116
|
+
|
|
117
|
+
## Backward Compatibility:
|
|
118
|
+
|
|
119
|
+
✅ **Single image still works:**
|
|
120
|
+
```typescript
const request = {
  imageUrl: 'data:image/png;base64,...' // Old way
};
```
|
|
125
|
+
|
|
126
|
+
✅ **Multi-image NEW:**
|
|
127
|
+
```typescript
const request = {
  images: [
    { label: 'BEFORE', dataUrl: '...' },
    { label: 'AFTER', dataUrl: '...' }
  ]
};
```
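A provider that accepts both request shapes can normalize them into one list before building its payload. The following is a sketch under two assumptions that the document does not state: `images` takes precedence when both fields are set, and a legacy single image gets the placeholder label `'IMAGE'`. `collectImages` and `ImageFields` are hypothetical names.

```typescript
interface LabeledImage { label: string; dataUrl: string; }

// Only the two image-related request fields described in this document.
interface ImageFields {
  imageUrl?: string;
  images?: LabeledImage[];
}

function collectImages(request: ImageFields): LabeledImage[] {
  // Assumption: the multi-image field wins when both are present.
  if (request.images && request.images.length > 0) {
    return request.images;
  }
  // Legacy single-image path, wrapped with a placeholder label.
  if (request.imageUrl) {
    return [{ label: 'IMAGE', dataUrl: request.imageUrl }];
  }
  return [];
}
```

With this in place, the provider loop only ever iterates over a `LabeledImage[]`, whichever field the caller used.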
|
|
135
|
+
|
|
136
|
+
## Files Modified:
|
|
137
|
+
|
|
138
|
+
1. `runner-core/src/llm-provider.ts` - Added LabeledImage interface and images field
|
|
139
|
+
2. `scriptservice/providers/scriptservice-llm-provider.ts` - Handle multiple images in OpenAI API
|
|
140
|
+
3. `runner-core/src/orchestrator/orchestrator-agent.ts` - Added verifyGoalWithScreenshotComparison method
|
|
141
|
+
4. Automatic trigger after coordinate actions
|
|
142
|
+
|
|
143
|
+
## Next Steps:
|
|
144
|
+
|
|
145
|
+
- ✅ Infrastructure ready
|
|
146
|
+
- ⏳ Need to test with real scenario
|
|
147
|
+
- 🔮 Future: Could expose as agent-callable tool if needed
|
|
148
|
+
|
|
@@ -0,0 +1,144 @@
|
|
|
1
|
+
# Coordinate Mode Diagnosis - Live Test Results
|
|
2
|
+
|
|
3
|
+
## Test Scenario: PeopleHR Employee Information Flow
|
|
4
|
+
|
|
5
|
+
### ✅ What Worked:
|
|
6
|
+
|
|
7
|
+
1. **Coordinate fallback DID activate** (after fix from >= 3 to >= 5)
|
|
8
|
+
2. **Agent successfully used coordinates** at (87.5%, 23.438%)
|
|
9
|
+
3. **Physical clicks succeeded** - page.mouse.click(1120, 169)
|
|
10
|
+
4. **Agent learned** to stick with coordinates after selectors failed
|
|
11
|
+
|
|
12
|
+
### ❌ What Didn't Work:
|
|
13
|
+
|
|
14
|
+
**Agent hit max iterations (8) without marking "complete"**
|
|
15
|
+
|
|
16
|
+
## Detailed Step 6 Flow:
|
|
17
|
+
|
|
18
|
+
```
Iteration 1: Selector attempt → Timeout ❌
Iteration 2: Selector attempt → Timeout ❌
Iteration 3: Selector attempt → Timeout ❌
Iteration 4: 🎯 COORDINATE MODE → Click (87.5%, 23.438%) → ✅ Success
Iteration 5: Repeat coordinate → ✅ Success
Iteration 6: Repeat coordinate → ✅ Success (?)
Iteration 7: Repeat coordinate → ✅ Success
Iteration 8: Repeat coordinate → ✅ Success
Result: ⚠️ Max iterations → system_limit
```
|
|
29
|
+
|
|
30
|
+
## Root Cause Analysis:
|
|
31
|
+
|
|
32
|
+
### Problem: **No Goal Verification After Coordinate Success**
|
|
33
|
+
|
|
34
|
+
**With selectors:**
|
|
35
|
+
```typescript
await page.getByRole('button').click();
// Can verify: await expect(page.getByRole('button')).toHaveAttribute('aria-pressed', 'true')
// Can check: New elements appeared, URL changed, etc.
```
|
|
40
|
+
|
|
41
|
+
**With coordinates:**
|
|
42
|
+
```typescript
await page.mouse.click(1120, 169);
// ❓ Did it work? No element reference!
// ❓ How to verify? Can't check button state
// ❓ What changed? Need to inspect DOM/screenshot
```
|
|
48
|
+
|
|
49
|
+
### Why Agent Kept Retrying:
|
|
50
|
+
|
|
51
|
+
**Agent's reasoning (iterations 5-8):**
|
|
52
|
+
- "Coordinate click succeeded (executed without error)"
|
|
53
|
+
- "But I don't know if goal was achieved"
|
|
54
|
+
- "Step says 'Click on New' - did the New form open?"
|
|
55
|
+
- "I should try again to be sure..."
|
|
56
|
+
- → **Loops until max iterations**
|
|
57
|
+
|
|
58
|
+
## Solutions to Consider:
|
|
59
|
+
|
|
60
|
+
### Option 1: **Trust Coordinate Success** (Simple)
|
|
61
|
+
After coordinate click succeeds:
|
|
62
|
+
- Wait 500ms for UI response
|
|
63
|
+
- Mark status="complete" automatically
|
|
64
|
+
- Assume click worked (trust the coordinates)
|
|
65
|
+
|
|
66
|
+
```typescript
if (coordinateAction && coordResult.allSucceeded) {
  await page.waitForTimeout(500); // Let UI respond
  return { status: 'complete', reasoning: 'Coordinate click succeeded' };
}
```
|
|
72
|
+
|
|
73
|
+
**Pros**: Simple, fast
|
|
74
|
+
**Cons**: No verification of actual goal achievement
|
|
75
|
+
|
|
76
|
+
### Option 2: **Visual Verification** (Better)
|
|
77
|
+
After coordinate click:
|
|
78
|
+
- Wait 500ms
|
|
79
|
+
- Take screenshot
|
|
80
|
+
- Compare before/after
|
|
81
|
+
- If changed → complete, else → retry with different coords
|
|
82
|
+
|
|
83
|
+
```typescript
const beforeScreenshot = await page.screenshot();
await page.mouse.click(x, y);
await page.waitForTimeout(500);
const afterScreenshot = await page.screenshot();
if (screenshotsAreDifferent(beforeScreenshot, afterScreenshot)) {
  return { status: 'complete' };
}
```
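The snippet above calls `screenshotsAreDifferent` without defining it. A minimal sketch of such a helper, assuming a plain byte-for-byte comparison is acceptable, is shown here; it is not taken from the package. Playwright's `page.screenshot()` resolves to a Node `Buffer`, which is a `Uint8Array`, so the function accepts that type directly.

```typescript
// Returns true when the two encoded images differ in length or in any byte.
function screenshotsAreDifferent(before: Uint8Array, after: Uint8Array): boolean {
  if (before.length !== after.length) {
    return true;
  }
  for (let i = 0; i < before.length; i++) {
    if (before[i] !== after[i]) {
      return true;
    }
  }
  return false;
}
```

A byte comparison reports any change at all, including a blinking cursor or a running animation, so it can only show that something changed, not that the goal was achieved. That limitation is one reason to prefer an LLM comparison of the two images when the goal itself must be verified.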
|
|
92
|
+
|
|
93
|
+
**Pros**: Validates something changed
|
|
94
|
+
**Cons**: Slower, more LLM calls
|
|
95
|
+
|
|
96
|
+
### Option 3: **DOM Change Detection** (Balanced)
|
|
97
|
+
After coordinate click:
|
|
98
|
+
- Capture DOM snapshot before
|
|
99
|
+
- Click coordinates
|
|
100
|
+
- Capture DOM snapshot after
|
|
101
|
+
- If new elements/navigation → complete
|
|
102
|
+
|
|
103
|
+
```typescript
const beforeUrl = page.url();
const beforeElements = await getEnhancedPageInfo(page);
await page.mouse.click(x, y);
await page.waitForTimeout(500);
const afterUrl = page.url();
const afterElements = await getEnhancedPageInfo(page);

if (afterUrl !== beforeUrl || afterElements.count !== beforeElements.count) {
  return { status: 'complete', reasoning: 'Page state changed after coordinate click' };
}
```
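Comparing element counts alone misses the case where one element replaces another. One possible refinement, sketched below, compares sets of element identifiers instead. `PageSnapshot`, `PageDiff`, `diffSnapshots`, and the idea of a per-element key (for example role plus accessible name) are assumptions made for illustration; the document does not describe what `getEnhancedPageInfo` returns beyond `count`.

```typescript
// Hypothetical snapshot: the URL plus one string key per interactive element.
interface PageSnapshot {
  url: string;
  elementKeys: string[];
}

interface PageDiff {
  urlChanged: boolean;
  added: string[];
  removed: string[];
  changed: boolean;
}

function diffSnapshots(before: PageSnapshot, after: PageSnapshot): PageDiff {
  const beforeSet = new Set(before.elementKeys);
  const afterSet = new Set(after.elementKeys);
  // Keys present only after the click, and keys present only before it.
  const added = after.elementKeys.filter((key) => !beforeSet.has(key));
  const removed = before.elementKeys.filter((key) => !afterSet.has(key));
  const urlChanged = before.url !== after.url;
  return {
    urlChanged,
    added,
    removed,
    changed: urlChanged || added.length > 0 || removed.length > 0,
  };
}
```

The `added` and `removed` lists could also be fed back to the agent as evidence of what the click did, rather than only a yes/no signal.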
|
|
115
|
+
|
|
116
|
+
**Pros**: Fast, objective verification
|
|
117
|
+
**Cons**: Might miss subtle changes (modal opens without URL/element count change)
|
|
118
|
+
|
|
119
|
+
### Option 4: **Prompt Guidance** (Immediate)
|
|
120
|
+
Update prompt to tell agent:
|
|
121
|
+
"After coordinate click succeeds, mark status='complete' unless you can clearly verify it failed"
|
|
122
|
+
|
|
123
|
+
**Pros**: No code changes
|
|
124
|
+
**Cons**: Relies on LLM judgment
|
|
125
|
+
|
|
126
|
+
## Recommendation:
|
|
127
|
+
|
|
128
|
+
**Hybrid approach:**
|
|
129
|
+
1. **Immediate** (Prompt): Tell agent to trust coordinate success
|
|
130
|
+
2. **Phase 2** (Code): Add DOM change detection for validation
|
|
131
|
+
|
|
132
|
+
## Current Status:
|
|
133
|
+
|
|
134
|
+
- ✅ Coordinate fallback works technically
|
|
135
|
+
- ✅ Physical clicks succeed
|
|
136
|
+
- ❌ Agent doesn't know when to stop
|
|
137
|
+
- 🔧 Need completion detection logic
|
|
138
|
+
|
|
139
|
+
## Test Results Summary:
|
|
140
|
+
|
|
141
|
+
**Steps 1-5**: ✅ All completed successfully
|
|
142
|
+
**Step 6**: ⚠️ Coordinates worked but hit max iterations (no completion detection)
|
|
143
|
+
**Overall**: Coordinate mode is functional but needs completion logic
|
|
144
|
+
|
|
@@ -0,0 +1,108 @@
|
|
|
1
|
+
# Runner-Core Visual Agent Implementation Status
|
|
2
|
+
|
|
3
|
+
## Phase 1: ✅ COMPLETE (v0.0.33)
|
|
4
|
+
|
|
5
|
+
### Implemented Features:
|
|
6
|
+
|
|
7
|
+
1. **Note to Future Self** - Tactical iteration memory
|
|
8
|
+
2. **Percentage-Based Coordinates** - Last-resort fallback with 3-decimal precision
|
|
9
|
+
3. **Two-Tier Auto-Escalation** - Code-controlled mode switching
|
|
10
|
+
|
|
11
|
+
### Current Behavior (Phase 1):
|
|
12
|
+
|
|
13
|
+
```
Iteration 1-3: Normal Playwright selectors + note-to-self (3 attempts)
    ↓ (after 3 failures)
Iteration 4-5: Percentage coordinates (2 attempts max)
    ↓ (if both coordinate attempts fail)
Give up - mark as stuck

Total: Maximum 5 iterations per step
```
|
|
22
|
+
|
|
23
|
+
---
|
|
24
|
+
|
|
25
|
+
## Phase 2: 📋 PLANNED (Not Started)
|
|
26
|
+
|
|
27
|
+
### Will Add:
|
|
28
|
+
|
|
29
|
+
1. **ElementDetector** - Detect interactive elements with z-index awareness
|
|
30
|
+
2. **VisualMarkerInjector** - Number elements [1], [2], [3] on screenshot
|
|
31
|
+
3. **SelectorResolver** - Translate index → native Playwright selector
|
|
32
|
+
4. **IndexCommandTranslator** - Convert CLICK[3] → native Playwright command
|
|
33
|
+
|
|
34
|
+
### Future Behavior (Phase 2):
|
|
35
|
+
|
|
36
|
+
```
Iteration 1: Playwright selector (1 attempt) → 70% success
    ↓ (on first failure)
Iteration 2-3: Index commands CLICK[3] (2 attempts) → 25% success
    ↓ (after 3 total failures)
Iteration 4-5: Percentage coordinates (2 attempts max) → 5% success
    ↓ (if all fail)
Give up - mark as stuck

Total: Maximum 5 iterations per step (down from 8)
Average: ~1.5 iterations per step (fast!)
```
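The ~1.5 average can be checked against the 70/25/5 split with a rough bound. This assumes the percentages are the share of steps resolved in each tier and ignores steps that end up stuck; the plan does not state either point. If every step succeeds on the first iteration of its tier:

$$
E_{\min} = 0.70 \cdot 1 + 0.25 \cdot 2 + 0.05 \cdot 4 = 1.40
$$

If every step succeeds on the last iteration of its tier:

$$
E_{\max} = 0.70 \cdot 1 + 0.25 \cdot 3 + 0.05 \cdot 5 = 1.70
$$

The stated average of ~1.5 iterations per step falls inside the range $[1.40,\ 1.70]$, so it is consistent with the target distribution.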
|
|
48
|
+
|
|
49
|
+
### Key Design Principle for Phase 2:
|
|
50
|
+
|
|
51
|
+
**During Execution:**
|
|
52
|
+
- Agent clicks using `data-testchimp-el="[3]"` (reliable, we inject it)
|
|
53
|
+
|
|
54
|
+
**In Generated Script:**
|
|
55
|
+
- Translator outputs NATIVE selector: `getByRole('button', {name: 'Menu'})`
|
|
56
|
+
- Script works standalone without data-testchimp-el
|
|
57
|
+
|
|
58
|
+
**Why Two-Stage:**
|
|
59
|
+
1. Agent needs reliability during exploration → use data attribute
|
|
60
|
+
2. Generated script must be portable → use native selectors
|
|
61
|
+
3. Best of both worlds: reliable execution + maintainable output
|
|
62
|
+
|
|
63
|
+
---
|
|
64
|
+
|
|
65
|
+
## Optimizations vs Original Plan
|
|
66
|
+
|
|
67
|
+
### Original Plan:
|
|
68
|
+
- Tier 1: iterations 1-2
|
|
69
|
+
- Tier 2: iterations 3-4
|
|
70
|
+
- Tier 3: iterations 5+
|
|
71
|
+
- Average: ~4 iterations per step
|
|
72
|
+
|
|
73
|
+
### Optimized Plan (Current):
|
|
74
|
+
- Tier 1: iteration 1 ONLY (fast path)
|
|
75
|
+
- Tier 2: iterations 2-3 (reliable fallback)
|
|
76
|
+
- Tier 3: iterations 4+ (absolute last resort)
|
|
77
|
+
- **Target: ~1.5 average iterations per step**
|
|
78
|
+
|
|
79
|
+
**Rationale:** Don't waste time! Simple tasks finish in 1 iteration, complex tasks escalate quickly to more reliable methods.
|
|
80
|
+
|
|
81
|
+
---
|
|
82
|
+
|
|
83
|
+
## Testing Checklist
|
|
84
|
+
|
|
85
|
+
### Phase 1 (Ready Now):
|
|
86
|
+
- [ ] Run PeopleHR scenario - verify note-to-self helps
|
|
87
|
+
- [ ] Test coordinate fallback on deliberately difficult case
|
|
88
|
+
- [ ] Measure iteration reduction (expect 20-30%)
|
|
89
|
+
- [ ] Verify timeout fixes for waitForLoadState
|
|
90
|
+
|
|
91
|
+
### Phase 2 (When Implemented):
|
|
92
|
+
- [ ] Test ElementDetector on modals/overlays
|
|
93
|
+
- [ ] Verify z-index occlusion detection
|
|
94
|
+
- [ ] Validate native selector generation (no data-testchimp-el in output)
|
|
95
|
+
- [ ] Run generated scripts standalone - must work!
|
|
96
|
+
- [ ] Measure tier distribution: 70/25/5
|
|
97
|
+
|
|
98
|
+
---
|
|
99
|
+
|
|
100
|
+
## Current Version
|
|
101
|
+
|
|
102
|
+
**Runner-Core:** v0.0.33
|
|
103
|
+
**Status:** Built and ready to test
|
|
104
|
+
**Phase 1:** ✅ Complete
|
|
105
|
+
**Phase 2:** 📋 Planned but not started
|
|
106
|
+
|
|
107
|
+
**Next Step:** Test Phase 1 with PeopleHR scenario to validate improvements before implementing Phase 2.
|
|
108
|
+
|
|
@@ -0,0 +1,165 @@
|
|
|
1
|
+
# Phase 1 Implementation - COMPLETE ✅
|
|
2
|
+
|
|
3
|
+
## Version: runner-core v0.0.33
|
|
4
|
+
|
|
5
|
+
## What's Been Implemented
|
|
6
|
+
|
|
7
|
+
### 1. Free-Form "Note to Future Self"
|
|
8
|
+
**Purpose:** Tactical memory - agent leaves notes that persist across iterations AND steps.
|
|
9
|
+
|
|
10
|
+
**Type:**
|
|
11
|
+
```typescript
interface NoteToFutureSelf {
  fromIteration: number;
  content: string;  // FREE-FORM - agent writes whatever it wants
}
```
|
|
17
|
+
|
|
18
|
+
**How it works:**
|
|
19
|
+
- Agent includes `"noteToFutureSelf": "..."` in response
|
|
20
|
+
- System stores it in `memory.latestNote` (persists across steps!)
|
|
21
|
+
- Passed to next iteration AND next step
|
|
22
|
+
- Displayed prominently at top of prompt
|
|
23
|
+
- Agent reads it FIRST before making decision
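The store-then-display cycle described in this list can be sketched as two small functions. Only `NoteToFutureSelf` and the `latestNote` field come from the document; `JourneyMemory`, `recordNote`, `buildUserPromptWithNote`, and the banner wording are hypothetical.

```typescript
interface NoteToFutureSelf {
  fromIteration: number;
  content: string;
}

// Hypothetical memory object that outlives a single step.
interface JourneyMemory {
  latestNote?: NoteToFutureSelf;
}

// Store the note from an agent response; ignore missing or blank notes so an
// earlier useful note is not overwritten by nothing.
function recordNote(memory: JourneyMemory, iteration: number, note: unknown): void {
  if (typeof note === 'string' && note.trim().length > 0) {
    memory.latestNote = { fromIteration: iteration, content: note.trim() };
  }
}

// Put the note at the very top of the prompt so the agent reads it first.
function buildUserPromptWithNote(memory: JourneyMemory, body: string): string {
  if (!memory.latestNote) {
    return body;
  }
  const { fromIteration, content } = memory.latestNote;
  return `NOTE FROM YOUR PAST SELF (iteration ${fromIteration}):\n${content}\n\n${body}`;
}
```

Because the memory object is not reset between steps in this sketch, a note written in one step is still shown in the first iteration of the next.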
|
|
24
|
+
|
|
25
|
+
**Scope:** Entire scenario journey (not just current step)
|
|
26
|
+
|
|
27
|
+
**Example notes:**
|
|
28
|
+
|
|
29
|
+
*Iteration-specific:*
|
|
30
|
+
- "Tried #sidebar-toggle, failed with 'not clickable'. Will try child SVG element next."
|
|
31
|
+
|
|
32
|
+
*Step-spanning:*
|
|
33
|
+
- "This app has slow-loading modals. Always wait 2s after page load before clicking."
|
|
34
|
+
- "Cookie consent appears on every page. Check for and dismiss it first."
|
|
35
|
+
- "Sidebar only visible on desktop viewport (>1024px width)."
|
|
36
|
+
|
|
37
|
+
### 2. Percentage-Based Coordinate Fallback
|
|
38
|
+
**Purpose:** Last-resort mechanism when selector generation repeatedly fails.
|
|
39
|
+
|
|
40
|
+
**Type:**
|
|
41
|
+
```typescript
interface CoordinateAction {
  type: 'coordinate';
  action: 'click' | 'doubleClick' | 'rightClick' | 'hover' | 'drag' | 'fill' | 'scroll';
  xPercent: number;       // 0-100, 3 decimal precision
  yPercent: number;
  toXPercent?: number;    // For drag
  toYPercent?: number;
  value?: string;         // For fill
  scrollAmount?: number;  // For scroll
}
```
|
|
53
|
+
|
|
54
|
+
**How it works:**
|
|
55
|
+
- LLM outputs percentages: `{xPercent: 15.755, yPercent: 8.500}`
|
|
56
|
+
- CoordinateConverter converts to pixels: `15.755% → 252px`
|
|
57
|
+
- Generates Playwright command: `await page.mouse.click(252, 68);`
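The conversion step is simple arithmetic: scale each percentage by the matching viewport dimension and round to a whole pixel. The sketch below is an illustration with an assumed signature, not the package's `percentToPixels`; the clamping to 0-100 and the `ViewportSize` and `PixelPoint` types are additions. A 1600×800 viewport is assumed because it reproduces the example figures above.

```typescript
interface ViewportSize { width: number; height: number; }
interface PixelPoint { x: number; y: number; }

function percentToPixels(
  xPercent: number,
  yPercent: number,
  viewport: ViewportSize,
): PixelPoint {
  // Keep out-of-range model output inside the viewport.
  const clamp = (p: number): number => Math.min(100, Math.max(0, p));
  return {
    x: Math.round((clamp(xPercent) / 100) * viewport.width),
    y: Math.round((clamp(yPercent) / 100) * viewport.height),
  };
}

// 15.755% of 1600 is 252.08 and 8.5% of 800 is 68, giving the click at (252, 68).
const point = percentToPixels(15.755, 8.5, { width: 1600, height: 800 });
```

The same arithmetic on a 1280×720 viewport maps (87.5%, 23.438%) to (1120, 169).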
|
|
58
|
+
|
|
59
|
+
**Supported actions:**
|
|
60
|
+
- click, doubleClick, rightClick, hover
|
|
61
|
+
- fill (clicks then types value)
|
|
62
|
+
- drag (from x%,y% to toX%,toY%)
|
|
63
|
+
- scroll (at position, by amount)
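Each supported action can be mapped to Playwright mouse and keyboard calls. The following sketch shows one plausible mapping as generated command strings; it is not the output of the package's `generateCommands()`, and `sketchCommands`, `toPx`, and the fallbacks for missing optional fields are assumptions. The Playwright calls it emits (`page.mouse.click`, `dblclick`, `move`, `down`, `up`, `wheel`, and `page.keyboard.type`) are existing APIs.

```typescript
interface CoordinateAction {
  type: 'coordinate';
  action: 'click' | 'doubleClick' | 'rightClick' | 'hover' | 'drag' | 'fill' | 'scroll';
  xPercent: number;
  yPercent: number;
  toXPercent?: number;
  toYPercent?: number;
  value?: string;
  scrollAmount?: number;
}

interface ViewportSize { width: number; height: number; }

const toPx = (percent: number, size: number): number => Math.round((percent / 100) * size);

function sketchCommands(a: CoordinateAction, vp: ViewportSize): string[] {
  const x = toPx(a.xPercent, vp.width);
  const y = toPx(a.yPercent, vp.height);
  switch (a.action) {
    case 'click':
      return [`await page.mouse.click(${x}, ${y});`];
    case 'doubleClick':
      return [`await page.mouse.dblclick(${x}, ${y});`];
    case 'rightClick':
      return [`await page.mouse.click(${x}, ${y}, { button: 'right' });`];
    case 'hover':
      return [`await page.mouse.move(${x}, ${y});`];
    case 'fill':
      // Click to focus, then type; JSON.stringify escapes quotes in the value.
      return [
        `await page.mouse.click(${x}, ${y});`,
        `await page.keyboard.type(${JSON.stringify(a.value ?? '')});`,
      ];
    case 'drag': {
      // Assumption: a missing target falls back to the start point.
      const toX = toPx(a.toXPercent ?? a.xPercent, vp.width);
      const toY = toPx(a.toYPercent ?? a.yPercent, vp.height);
      return [
        `await page.mouse.move(${x}, ${y});`,
        `await page.mouse.down();`,
        `await page.mouse.move(${toX}, ${toY});`,
        `await page.mouse.up();`,
      ];
    }
    case 'scroll':
      // Move to the position first so the wheel event targets what is under it.
      return [
        `await page.mouse.move(${x}, ${y});`,
        `await page.mouse.wheel(0, ${a.scrollAmount ?? 0});`,
      ];
  }
}
```

Emitting strings rather than executing directly keeps the result usable in a generated script, at the cost of baking pixel values for one viewport size into that script.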
|
|
64
|
+
|
|
65
|
+
### 3. Two-Tier Auto-Escalation
|
|
66
|
+
**Trigger:** Code-controlled (not LLM-decided)
|
|
67
|
+
|
|
68
|
+
```
Tier 1 (iterations 1-3): Playwright Selector Mode
  ├─ Normal buildSystemPrompt()
  ├─ Agent generates: await page.getByRole(...).click()
  ├─ Leaves noteToFutureSelf for continuity
  └─ 3 attempts, then escalate

Tier 2 (iterations 4-5): Coordinate Mode
  ├─ Auto-activates when consecutiveFailures >= 3
  ├─ Uses buildCoordinateSystemPrompt()
  ├─ Agent outputs: {xPercent: 15.755, yPercent: 8.500}
  ├─ CoordinateConverter → mouse.click(x, y)
  └─ 2 attempts max, then give up

Total: Maximum 5 iterations per step
```
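Because the switch is decided in code rather than by the LLM, it reduces to a threshold check on the failure counter. A minimal sketch, assuming the counter is the `consecutiveFailures` value named above and that it resets when a step completes; `selectMode`, `AgentMode`, and the two constants are hypothetical names.

```typescript
type AgentMode = 'selector' | 'coordinate' | 'stuck';

const SELECTOR_ATTEMPTS = 3;   // Tier 1: iterations 1-3
const COORDINATE_ATTEMPTS = 2; // Tier 2: iterations 4-5

// Map the number of failures so far to the mode for the next iteration.
function selectMode(consecutiveFailures: number): AgentMode {
  if (consecutiveFailures < SELECTOR_ATTEMPTS) {
    return 'selector';
  }
  if (consecutiveFailures < SELECTOR_ATTEMPTS + COORDINATE_ATTEMPTS) {
    return 'coordinate';
  }
  return 'stuck';
}
```

The caller would pick `buildSystemPrompt()` or `buildCoordinateSystemPrompt()` based on the returned mode and stop iterating on `'stuck'`, which caps a step at five iterations.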
|
|
84
|
+
|
|
85
|
+
### 4. Precision & Accuracy
|
|
86
|
+
- **3 decimal precision** for coordinates (~1px accuracy on most screens)
|
|
87
|
+
- **Resolution-independent** - works on any viewport size
|
|
88
|
+
- **Percentage reference:**
|
|
89
|
+
- Top-left: (0, 0)
|
|
90
|
+
- Top-right: (100, 0)
|
|
91
|
+
- Center: (50, 50)
|
|
92
|
+
- Bottom-right: (100, 100)
|
|
93
|
+
|
|
94
|
+
## Files Modified
|
|
95
|
+
|
|
96
|
+
1. **orchestrator/types.ts**
|
|
97
|
+
- Added `NoteToFutureSelf` interface
|
|
98
|
+
- Added `CoordinateAction` interface
|
|
99
|
+
- Updated `AgentDecision` with new fields
|
|
100
|
+
- Updated `AgentContext` with noteFromPreviousIteration
|
|
101
|
+
|
|
102
|
+
2. **orchestrator/orchestrator-agent.ts**
|
|
103
|
+
- Added note tracking in executeStep()
|
|
104
|
+
- Added coordinate action execution
|
|
105
|
+
- Added buildCoordinateSystemPrompt()
|
|
106
|
+
- Updated buildUserPrompt() to display notes
|
|
107
|
+
- Added mode switching in callAgent()
|
|
108
|
+
- Updated response format documentation
|
|
109
|
+
|
|
110
|
+
3. **utils/coordinate-converter.ts** (NEW)
|
|
111
|
+
- percentToPixels() - Convert % to pixels
|
|
112
|
+
- getViewportSize() - Get current viewport dimensions
|
|
113
|
+
- generateCommands() - Create Playwright commands from percentages
|
|
114
|
+
- executeAction() - Direct execution helper
|
|
115
|
+
|
|
116
|
+
4. **scenario-worker-class.ts** (Earlier fix)
|
|
117
|
+
- Smart timeout handling for waitForLoadState
|
|
118
|
+
|
|
119
|
+
5. **execution-service.ts** (Earlier fix)
|
|
120
|
+
- Smart timeout handling for navigation commands
|
|
121
|
+
|
|
122
|
+
## How to Use
|
|
123
|
+
|
|
124
|
+
**No code changes needed!** The features activate automatically:
|
|
125
|
+
|
|
126
|
+
1. **Note to self:** Agent can optionally include `noteToFutureSelf` in any iteration
|
|
127
|
+
2. **Coordinates:** Auto-activate at iteration 4 if selectors keep failing
|
|
128
|
+
|
|
129
|
+
## Testing Phase 1
|
|
130
|
+
|
|
131
|
+
To validate the implementation:
|
|
132
|
+
|
|
133
|
+
1. **Run PeopleHR scenario** (previously failed on hamburger menu)
|
|
134
|
+
- Should now succeed with note guidance
|
|
135
|
+
- May use coordinates if SVG selector still fails
|
|
136
|
+
|
|
137
|
+
2. **Check logs for:**
|
|
138
|
+
- `📝 Note to self: ...` (agent leaving tactical notes)
|
|
139
|
+
- `🎯 COORDINATE MODE ACTIVATED` (tier 2 triggered)
|
|
140
|
+
- `🎯 Coordinate Action: click at (X%, Y%)` (using fallback)
|
|
141
|
+
|
|
142
|
+
3. **Expected improvements:**
|
|
143
|
+
- 20-30% fewer iterations per step (thanks to notes)
|
|
144
|
+
- < 5% scenarios need coordinate fallback
|
|
145
|
+
- Coordinates work when everything else fails
|
|
146
|
+
|
|
147
|
+
## Phase 2 Preview (Not Yet Implemented)
|
|
148
|
+
|
|
149
|
+
When Phase 2 is added, it will become a **three-tier** system:
|
|
150
|
+
- Tier 1 (iterations 1-2): Playwright selectors
|
|
151
|
+
- Tier 2 (iterations 3-4): Numbered elements (CLICK[3])
|
|
152
|
+
- Tier 3 (iterations 5+): Percentage coordinates
|
|
153
|
+
|
|
154
|
+
Phase 2 adds visual markers [1], [2], [3] on elements with structured commands.
|
|
155
|
+
|
|
156
|
+
---
|
|
157
|
+
|
|
158
|
+
## Status: ✅ READY FOR TESTING
|
|
159
|
+
|
|
160
|
+
Runner-core v0.0.33 is built and ready. Test it with:
|
|
161
|
+
- VS Code extension "Run Test" on peoplehr-corrected.smart.spec.ts
|
|
162
|
+
- Or generate new script from peoplehr.txt scenario
|
|
163
|
+
|
|
164
|
+
**Next:** Validate Phase 1 works before starting Phase 2.
|
|
165
|
+
|