testchimp-runner-core 0.0.35 → 0.0.36
- package/package.json +6 -1
- package/plandocs/BEFORE_AFTER_VERIFICATION.md +0 -148
- package/plandocs/COORDINATE_MODE_DIAGNOSIS.md +0 -144
- package/plandocs/CREDIT_CALLBACK_ARCHITECTURE.md +0 -253
- package/plandocs/HUMAN_LIKE_IMPROVEMENTS.md +0 -642
- package/plandocs/IMPLEMENTATION_STATUS.md +0 -108
- package/plandocs/INTEGRATION_COMPLETE.md +0 -322
- package/plandocs/MULTI_AGENT_ARCHITECTURE_REVIEW.md +0 -844
- package/plandocs/ORCHESTRATOR_MVP_SUMMARY.md +0 -539
- package/plandocs/PHASE1_ABSTRACTION_COMPLETE.md +0 -241
- package/plandocs/PHASE1_FINAL_STATUS.md +0 -210
- package/plandocs/PHASE_1_COMPLETE.md +0 -165
- package/plandocs/PHASE_1_SUMMARY.md +0 -184
- package/plandocs/PLANNING_SESSION_SUMMARY.md +0 -372
- package/plandocs/PROMPT_OPTIMIZATION_ANALYSIS.md +0 -120
- package/plandocs/PROMPT_SANITY_CHECK.md +0 -120
- package/plandocs/SCRIPT_CLEANUP_FEATURE.md +0 -201
- package/plandocs/SCRIPT_GENERATION_ARCHITECTURE.md +0 -364
- package/plandocs/SELECTOR_IMPROVEMENTS.md +0 -139
- package/plandocs/SESSION_SUMMARY_v0.0.33.md +0 -151
- package/plandocs/TROUBLESHOOTING_SESSION.md +0 -72
- package/plandocs/VISION_DIAGNOSTICS_IMPROVEMENTS.md +0 -336
- package/plandocs/VISUAL_AGENT_EVOLUTION_PLAN.md +0 -396
- package/plandocs/WHATS_NEW_v0.0.33.md +0 -183
- package/plandocs/exploratory-mode-support-v2.plan.md +0 -953
- package/plandocs/exploratory-mode-support.plan.md +0 -928
- package/plandocs/journey-id-tracking-addendum.md +0 -227
- package/releasenotes/RELEASE_0.0.26.md +0 -165
- package/releasenotes/RELEASE_0.0.27.md +0 -236
- package/releasenotes/RELEASE_0.0.28.md +0 -286
- package/src/auth-config.ts +0 -84
- package/src/credit-usage-service.ts +0 -188
- package/src/env-loader.ts +0 -103
- package/src/execution-service.ts +0 -996
- package/src/file-handler.ts +0 -104
- package/src/index.ts +0 -432
- package/src/llm-facade.ts +0 -821
- package/src/llm-provider.ts +0 -53
- package/src/model-constants.ts +0 -35
- package/src/orchestrator/decision-parser.ts +0 -139
- package/src/orchestrator/index.ts +0 -58
- package/src/orchestrator/orchestrator-agent.ts +0 -1282
- package/src/orchestrator/orchestrator-prompts.ts +0 -786
- package/src/orchestrator/page-som-handler.ts +0 -1565
- package/src/orchestrator/som-types.ts +0 -188
- package/src/orchestrator/tool-registry.ts +0 -184
- package/src/orchestrator/tools/check-page-ready.ts +0 -75
- package/src/orchestrator/tools/extract-data.ts +0 -92
- package/src/orchestrator/tools/index.ts +0 -15
- package/src/orchestrator/tools/inspect-page.ts +0 -42
- package/src/orchestrator/tools/recall-history.ts +0 -72
- package/src/orchestrator/tools/refresh-som-markers.ts +0 -69
- package/src/orchestrator/tools/take-screenshot.ts +0 -128
- package/src/orchestrator/tools/verify-action-result.ts +0 -159
- package/src/orchestrator/tools/view-previous-screenshot.ts +0 -103
- package/src/orchestrator/types.ts +0 -291
- package/src/playwright-mcp-service.ts +0 -224
- package/src/progress-reporter.ts +0 -144
- package/src/prompts.ts +0 -842
- package/src/providers/backend-proxy-llm-provider.ts +0 -91
- package/src/providers/local-llm-provider.ts +0 -38
- package/src/scenario-service.ts +0 -252
- package/src/scenario-worker-class.ts +0 -1110
- package/src/script-utils.ts +0 -203
- package/src/types.ts +0 -239
- package/src/utils/browser-utils.ts +0 -348
- package/src/utils/coordinate-converter.ts +0 -162
- package/src/utils/page-info-retry.ts +0 -65
- package/src/utils/page-info-utils.ts +0 -285
- package/testchimp-runner-core-0.0.35.tgz +0 -0
- package/tsconfig.json +0 -19
@@ -1,120 +0,0 @@
-# Prompt Sanity Check - Runner-Core v0.0.33
-
-## ✅ STRENGTHS
-
-### System Prompt (`buildSystemPrompt`)
-- ✅ Required fields clearly marked at top (status, reasoning, statusReasoning)
-- ✅ Comprehensive JSON format with examples
-- ✅ Clear status decision rules
-- ✅ Good blocker detection guidance
-- ✅ Semantic selector preference clearly explained with examples
-- ✅ Tool vs command distinction is clear
-- ✅ Coordinate fallback documented
-
-### User Prompt (`buildUserPrompt`)
-- ✅ Static content first (cache-optimized)
-- ✅ Dynamic content last (current state, page info)
-- ✅ Notes from previous iteration shown prominently
-- ✅ Clear warnings for consecutive failures
-- ✅ Coordinate mode trigger clear
-
-## ⚠️ ISSUES FOUND
-
-### 1. **Duplication/Redundancy**
-- ❌ "Use semantic selectors" mentioned in:
-  - System prompt (line ~605: "SELECTOR PREFERENCE")
-  - User prompt (line ~860: "SELECTOR STRATEGY")
-- **FIX**: Remove from user prompt, keep in system prompt only
-
-### 2. **Length Concerns**
-- ⚠️ System prompt is ~325 lines (very long)
-- ⚠️ May cause LLM to miss critical details in the middle
-- **SUGGESTION**: Consider breaking into sections or condensing
-
-### 3. **Conflicting Guidance**
-- ⚠️ Line ~469: "stuck: Tried 3+ iterations"
-- But coordinate mode triggers at 3 failures (line ~904)
-- **FIX**: Clarify: stuck = 5 attempts total (3 regular + 2 coordinate)
-
-### 4. **Unclear Iteration Count**
-- ❌ Line ~714: "When iteration count reaches 4+"
-- ❌ Line ~748: "iteration 4+"
-- ✅ But code triggers at 3 failures
-- **FIX**: Update prompt to say "iteration 4+" (0,1,2 = 3 failures, next is #3 which is 4th iteration)
-
-### 5. **Missing Information**
-- ❌ Max iterations per step not mentioned (code has 5)
-- **FIX**: Add to system prompt: "MAX 5 iterations per step"
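
The counting behind issues 3 to 5 can be made concrete with a small sketch. It assumes zero-based iteration indices, a coordinate-mode threshold of 3 failures and a cap of 5 iterations per step, as the notes above state. The names `MAX_ITERATIONS_PER_STEP`, `COORDINATE_MODE_AFTER_FAILURES` and `modeForIteration` are hypothetical and are not taken from runner-core.

```typescript
// Hypothetical constants mirroring the numbers quoted in the notes above.
const MAX_ITERATIONS_PER_STEP = 5;        // "code has 5"
const COORDINATE_MODE_AFTER_FAILURES = 3; // "code triggers at 3 failures"

type IterationMode = "selector" | "coordinate" | "stuck";

// iterationIndex is zero-based; failuresSoFar counts the failed iterations before it.
function modeForIteration(iterationIndex: number, failuresSoFar: number): IterationMode {
  if (iterationIndex >= MAX_ITERATIONS_PER_STEP) {
    return "stuck"; // budget exhausted: 3 regular + 2 coordinate attempts
  }
  if (failuresSoFar >= COORDINATE_MODE_AFTER_FAILURES) {
    return "coordinate"; // index 3 is the 4th iteration
  }
  return "selector";
}

// Walk one step in which every attempt fails.
const modes: IterationMode[] = [];
for (let i = 0; i <= MAX_ITERATIONS_PER_STEP; i++) {
  modes.push(modeForIteration(i, i));
}
console.log(modes.join(", "));
// selector, selector, selector, coordinate, coordinate, stuck
```

Under these assumptions "iteration 4+" and "after 3 failures" describe the same point, which is the reconciliation proposed in issue 4.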
-
-### 6. **Verbosity**
-- ⚠️ Examples section (lines ~617-628) is great but long
-- ⚠️ Multiple emoji warnings (⚠️⚠️⚠️) can be reduced to single ⚠️
-- **SUGGESTION**: Keep examples, reduce emoji spam
-
-## 🔧 RECOMMENDED FIXES
-
-### Priority 1 (Critical):
-1. Remove duplicate selector strategy from user prompt
-2. Clarify max iterations (5 total)
-3. Fix coordinate mode iteration number (4th iteration = after 3 failures)
-
-### Priority 2 (Nice to have):
-4. Condense system prompt if possible (target: 250 lines)
-5. Reduce emoji overuse
-6. Add section headers in system prompt for clarity
-
-## 📊 PROMPT STRUCTURE ANALYSIS
-
-### System Prompt Sections:
-1. Introduction (1 line)
-2. Tool descriptions (dynamic, from registry)
-3. JSON format (40 lines) ✅
-4. Status rules (15 lines) ✅
-5. Step re-evaluation (20 lines) ✅
-6. Blocker detection (25 lines) ✅
-7. Experiences (25 lines) ✅
-8. Critical rules (200 lines) ⚠️ TOO LONG
-9. Coordinate actions (45 lines) ✅
-
-**TOTAL**: ~370 lines (with tool descriptions)
-
-### User Prompt Sections:
-1. Static instructions (20 lines) - **Cache-friendly** ✅
-2. Dynamic context marker (1 line) ✅
-3. Notes from previous iteration (5 lines) ✅
-4. Warnings for failures (15 lines) ✅
-5. Coordinate mode trigger (8 lines) ✅
-6. Current step goal (10 lines) ✅
-7. Page state (50-100 lines, variable) ✅
-8. Recent steps (20-50 lines, variable) ✅
-9. Experiences (10 lines) ✅
-
-**TOTAL**: ~140-200 lines per call
-
-## 🎯 RECOMMENDATION SUMMARY
-
-**Keep as-is:**
-- JSON structure
-- Semantic selector examples
-- Blocker detection
-- Note to future self
-- Coordinate fallback
-- Cache optimization
-
-**Fix:**
-- Remove selector duplication in user prompt
-- Clarify iteration counts
-- Add max iteration limit
-- Reduce emoji spam
-
-**Consider:**
-- Condensing "Critical Rules" section (currently 200 lines)
-- Moving some examples to external docs
-- Breaking long sections with clear headers
-
-## Overall Assessment: **8/10**
-- Prompts are comprehensive and well-structured
-- Main issues are length and minor redundancies
-- Cache optimization is excellent
-- A few clarity fixes needed for iteration counts
-
@@ -1,201 +0,0 @@
-# Script Cleanup Feature
-
-## Summary
-Added a final cleanup step in the script generation pipeline that uses an LLM to make minor adjustments to the generated test script, removing redundancies and improving code quality without changing the core logic.
-
-## Purpose
-After the orchestrator generates test steps, there may be minor redundancies or formatting issues:
-- Duplicate expect() assertions
-- Redundant waits or checks
-- Inconsistent formatting
-- Orphaned step comments without code
-
-The cleanup step acts as a final sanity check to polish the generated script while preserving its core functionality.
-
-## Implementation
-
-### 1. New Prompt (`prompts.ts`)
-
-**SCRIPT_CLEANUP** prompt with clear guidelines:
-- **DO:** Remove duplicates, fix formatting, consolidate identical assertions
-- **DO NOT:** Change test logic, remove legitimate assertions, restructure code, change selectors, add new functionality
-
-**Examples in prompt:**
-```typescript
-// ❌ REMOVE redundancy:
-await expect(page.getByText('Hello')).toBeVisible();
-await expect(page.getByText('Hello')).toBeVisible(); // duplicate
-
-// ✅ KEEP legitimate checks:
-await expect(page.getByPlaceholder('Message...')).toBeEmpty();
-await page.getByPlaceholder('Message...').fill('Hello');
-await expect(page.getByPlaceholder('Message...')).toHaveValue('Hello'); // different checks
-```
-
-### 2. New Method in LLMFacade (`llm-facade.ts`)
-
-```typescript
-async cleanupScript(script: string, model?: string): Promise<{
-  script: string;
-  changes: string[];
-  skipped?: string;
-}>
-```
-
-**Behavior:**
-- Calls LLM with SCRIPT_CLEANUP prompt
-- Parses JSON response with cleaned script and list of changes
-- Returns original script on error (safe fallback)
-- Logs all changes made for transparency
-
-**Error Handling:**
-- Invalid JSON → return original script
-- Missing fields → return original script
-- LLM error → return original script
-- Never fails the generation process
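
The fallback rules above can be sketched as a small parsing helper. This is an illustration only: `parseCleanupResponse` and its validation rules are assumptions, and the actual `cleanupScript()` in `llm-facade.ts` may parse and validate the reply differently. The result shape is the one given in the method signature above.

```typescript
interface CleanupResult {
  script: string;
  changes: string[];
  skipped?: string;
}

// Hypothetical helper: turn a raw LLM reply into a CleanupResult, falling back
// to the original script whenever the reply cannot be trusted.
function parseCleanupResponse(originalScript: string, rawReply: string): CleanupResult {
  const fallback = (reason: string): CleanupResult => ({
    script: originalScript,
    changes: [],
    skipped: reason,
  });

  let parsed: unknown;
  try {
    parsed = JSON.parse(rawReply);
  } catch {
    return fallback("invalid JSON");
  }

  if (typeof parsed !== "object" || parsed === null) {
    return fallback("missing fields");
  }
  const candidate = parsed as { script?: unknown; changes?: unknown };
  const changesOk =
    Array.isArray(candidate.changes) &&
    candidate.changes.every((c) => typeof c === "string");
  if (typeof candidate.script !== "string" || candidate.script.trim() === "" || !changesOk) {
    return fallback("missing fields");
  }
  return { script: candidate.script, changes: candidate.changes as string[] };
}

const original = "await page.goto('/');";
console.log(parseCleanupResponse(original, "not json").skipped); // invalid JSON
```

Returning a value on every path, instead of throwing, is what lets the caller treat cleanup as optional: the worst case is the unmodified script plus a `skipped` reason to log.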
-
-### 3. Integration into Scenario Worker (`scenario-worker-class.ts`)
-
-Added cleanup step immediately after `generateTestScript()`:
-
-```typescript
-// Generate clean script with TestChimp comment and code
-generatedScript = generateTestScript(testName, steps, undefined, hashtags);
-
-// Perform final cleanup pass to remove redundancies and make minor adjustments
-this.log(`[ScenarioWorker] Performing final script cleanup...`);
-try {
-  const cleanupResult = await this.llmFacade.cleanupScript(generatedScript, job.model);
-
-  if (cleanupResult.changes && cleanupResult.changes.length > 0) {
-    this.log(`[ScenarioWorker] Cleanup made ${cleanupResult.changes.length} improvement(s):`);
-    cleanupResult.changes.forEach((change, i) => {
-      this.log(`[ScenarioWorker] ${i + 1}. ${change}`);
-    });
-    generatedScript = cleanupResult.script;
-  } else if (cleanupResult.skipped) {
-    this.log(`[ScenarioWorker] Cleanup skipped: ${cleanupResult.skipped}`);
-  } else {
-    this.log(`[ScenarioWorker] Cleanup completed - no changes needed`);
-  }
-} catch (error: any) {
-  this.log(`[ScenarioWorker] Cleanup failed, using original script: ${error.message}`);
-  // Continue with original script on error
-}
-```
-
-## What Gets Cleaned Up
-
-### ✅ Redundancies Removed
-1. **Duplicate assertions:**
-   ```typescript
-   // Before cleanup
-   await expect(page.getByText('Hello')).toBeVisible();
-   await expect(page.getByText('Hello')).toBeVisible();
-
-   // After cleanup
-   await expect(page.getByText('Hello')).toBeVisible();
-   ```
-
-2. **Redundant URL checks:**
-   ```typescript
-   // Before cleanup
-   await expect(page).toHaveURL(/\/messages/);
-   await expect(page).toHaveURL(/\/messages/);
-
-   // After cleanup
-   await expect(page).toHaveURL(/\/messages/);
-   ```
-
-3. **Duplicate comments without code** (already handled by script generation, but this is a safety net)
-
-### ✅ Minor Formatting Fixes
-- Inconsistent spacing
-- Alignment issues
-- Obvious formatting problems
-
-### ❌ Preserved (Not Changed)
-- Test logic and flow
-- Legitimate assertions (same locator, different expectations)
-- Important waits
-- Selectors
-- Test structure
-- Any functionality
-
-## Safety Features
-
-### 1. Conservative Approach
-- Only makes changes when confident they're safe
-- Prompt explicitly warns against major changes
-- Focuses on "obvious" redundancies only
-
-### 2. Transparency
-- Logs all changes made with descriptions
-- Makes it easy to see what was modified
-- Helps debug if cleanup causes issues
-
-### 3. Graceful Degradation
-- Any error → return original script
-- Invalid response → return original script
-- Never breaks the generation pipeline
-- Cleanup is an enhancement, not a requirement
-
-### 4. Idempotent
-- Running cleanup twice should produce the same result
-- No cumulative changes or drift
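
Idempotence is easiest to see with a deterministic pass rather than an LLM call. The sketch below is not the LLM-based cleanup described in this document. It is a hypothetical `dropRepeatedAssertions` helper, closer in spirit to the static-analysis idea, that drops an `await expect(...)` line only when it exactly repeats the previous statement, and it shows that a second run changes nothing. Non-assertion statements such as repeated clicks are deliberately left alone, since removing them would change test logic.

```typescript
// Hypothetical deterministic pass: remove an assertion only when it is an exact
// repeat of the statement directly before it (blank lines are ignored).
function dropRepeatedAssertions(script: string): string {
  const kept: string[] = [];
  let previousAssertion: string | null = null;
  for (const line of script.split("\n")) {
    const statement = line.trim();
    if (statement === "") {
      kept.push(line); // keep blank lines without resetting the comparison
      continue;
    }
    const isAssertion = statement.startsWith("await expect(");
    if (isAssertion && statement === previousAssertion) {
      continue; // exact duplicate assertion: drop it
    }
    kept.push(line);
    previousAssertion = isAssertion ? statement : null;
  }
  return kept.join("\n");
}

const generated = [
  "await expect(page.getByText('Hello')).toBeVisible();",
  "await expect(page.getByText('Hello')).toBeVisible();",
  "await page.getByPlaceholder('Message...').fill('Hello');",
].join("\n");

const once = dropRepeatedAssertions(generated);
const twice = dropRepeatedAssertions(once);
console.log(once === twice); // true
```

An LLM pass cannot guarantee this property, so a check of the form `cleanup(cleanup(s)) === cleanup(s)` on sample scripts is one way to watch for drift.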
-
-## Example Output
-
-**Console logs:**
-```
-[ScenarioWorker] Performing final script cleanup...
-[LLMFacade] Script cleanup completed. Changes: 2
-[LLMFacade] 1. Removed duplicate expect assertion for message visibility
-[LLMFacade] 2. Consolidated redundant URL checks into single assertion
-[ScenarioWorker] Cleanup made 2 improvement(s):
-[ScenarioWorker] 1. Removed duplicate expect assertion for message visibility
-[ScenarioWorker] 2. Consolidated redundant URL checks into single assertion
-```
-
-## Benefits
-
-1. **Cleaner Scripts:** Removes redundancies that can make tests harder to read
-2. **Reduced Token Usage:** Shorter scripts mean less tokens consumed by users
-3. **Better Maintainability:** Clean code is easier to understand and modify
-4. **Safety Net:** Catches issues that might slip through orchestrator logic
-5. **Zero Risk:** Fallback to original script on any error
-
-## Performance Impact
-
-- **Time:** Adds one LLM call at the end (~1-3 seconds)
-- **Cost:** One additional LLM call per script generation
-- **Benefit:** Catches redundancies that would otherwise be in production tests
-
-The small overhead is worthwhile for the quality improvement.
-
-## Future Enhancements
-
-Possible improvements:
-1. **Configurable:** Allow users to disable cleanup if they prefer
-2. **More Rules:** Add more specific cleanup patterns
-3. **Static Analysis:** Use AST parsing instead of LLM for some checks (faster, cheaper)
-4. **Metrics:** Track how often cleanup makes changes vs. no-ops
-
-## Files Modified
-
-1. `/src/prompts.ts` - Added SCRIPT_CLEANUP prompt
-2. `/src/llm-facade.ts` - Added cleanupScript() method
-3. `/src/scenario-worker-class.ts` - Integrated cleanup into generation pipeline
-
-## Testing
-
-The feature is safe to deploy because:
-- Falls back to original script on any error
-- Doesn't break existing functionality
-- Only makes conservative changes
-- Logs all modifications for review
-
-## Conclusion
-
-The script cleanup feature adds a lightweight final polish step to the generation pipeline, removing redundancies and improving code quality without risk to the core test logic. It's a safety net that catches issues the orchestrator might miss while maintaining backward compatibility and graceful error handling.
-
@@ -1,364 +0,0 @@
-# Script Generation Architecture & Work Plan
-
-## Overview
-AI-powered test script generation from natural language scenarios using LLM-guided Playwright automation with vision-based fallback diagnostics.
-
-## Architecture Flow
-
-```
-User Scenario (text file)
-↓
-1. Scenario Breakdown (LLM)
-↓
-2. Step-by-Step Execution
-↓
-3. Command Generation (LLM + DOM)
-↓
-4. Playwright Execution
-↓
-5. Goal Completion Check (LLM)
-↓
-6. Vision Fallback (if needed)
-↓
-7. Script Generation
-```
-
-## Components
-
-### 1. Scenario Breakdown
-**File:** `llm-facade.ts` → `breakdownScenario()`
-**Prompt:** `PROMPTS.SCENARIO_BREAKDOWN`
-
-**Input:** Natural language scenario
-```
-- Go to https://app.com
-- Login with credentials: admin, pass123
-- Click on settings
-```
-
-**Output:** Structured steps
-```json
-{
-  "steps": [
-    "Go to https://app.com",
-    "Login with credentials: admin, pass123",
-    "Click on settings"
-  ]
-}
-```
-
-**Key Principles:**
-- ✅ Preserve ALL specific values (credentials, names, amounts, etc.)
-- ✅ Keep steps semantic (no technical selectors)
-- ✅ One clear action per step
-- ❌ Never replace values with variables/placeholders
-
-### 2. Step Execution Loop
-**File:** `scenario-worker-class.ts` → `processScenarioJob()`
-
-**For each step:**
-1. Initialize step tracking
-2. Execute sub-actions until goal complete
-3. Track failures and successes
-4. Generate final script
-
-**Counters:**
-- `subActionCount`: Number of different commands tried for this step
-- `totalFailedAttemptsForStep`: Total failures across all sub-actions
-- `attempt`: Retry count within current sub-action (0-3)
-
-### 3. Command Generation
-**File:** `llm-facade.ts` → `generatePlaywrightCommand()`
-**Prompt:** `PROMPTS.PLAYWRIGHT_COMMAND`
-
-**Context Provided:**
-- Goal description
-- Current page state (DOM snapshot)
-- Previous commands in this step
-- Previous step history
-- Last error (if retry)
-
-**Key Principles:**
-1. **Extract specific values from goal** - Use exact credentials, names, amounts from goal description
-2. **Navigation handling** - Use `{ waitUntil: 'domcontentloaded', timeout: 10000 }` for redirects
-3. **Check current URL** - Don't retry navigation if already navigated (even if redirected)
-4. **Never hallucinate verification** - Only verify what goal explicitly asks for
-5. **Semantic action completion** - "Login" means fill + click, not just fill
-
-### 4. Goal Completion Assessment
-**File:** `llm-facade.ts` → `checkGoalCompletion()`
-**Prompt:** `PROMPTS.GOAL_COMPLETION_CHECK`
-
-**Decision Matrix:**
-
-| Goal Type | Completion Criteria | Example |
-|-----------|-------------------|---------|
-| Simple action | Action succeeded | "Click button" → complete after click |
-| Semantic action | All implicit steps done | "Login" → complete after fill + click |
-| Multi-part action | All parts done | "Fill all fields" → complete after all fields |
-| Verification | Assertion passed | "Verify message" → complete after assertion |
-
-**Semantic Action Recognition:**
-- **"Login with credentials"** → Fill username, fill password, click login button
-- **"Send message"** → Type message, click send button
-- **"Submit form"** → Fill fields, click submit button
-- **"Register/Signup"** → Fill registration, click register button
-
-Mark INCOMPLETE until the final implicit action completes.
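
The rule above can be illustrated with a lookup table, although the document assigns the actual decision to an LLM prompt (`PROMPTS.GOAL_COMPLETION_CHECK`). The sketch below is therefore only a hand-written approximation: `SEMANTIC_ACTIONS`, the goal keywords and `isGoalComplete` are hypothetical names, and the action sequences are a simplified reading of the list above.

```typescript
type ImplicitAction = "fill" | "type" | "click";

// Hypothetical table: the implicit actions each semantic goal expands to,
// in the order they are expected to happen.
const SEMANTIC_ACTIONS: Record<string, ImplicitAction[]> = {
  login: ["fill", "fill", "click"], // username, password, login button
  send: ["type", "click"],          // message, send button
  submit: ["fill", "click"],        // fields, submit button
  register: ["fill", "click"],      // registration fields, register button
};

// A goal is complete only once enough actions succeeded and the last one
// is the final implicit action (typically the button click).
function isGoalComplete(goalKeyword: string, succeeded: ImplicitAction[]): boolean {
  const required = SEMANTIC_ACTIONS[goalKeyword];
  if (required === undefined) {
    return succeeded.length > 0; // simple action: complete once it succeeded
  }
  const finalAction = required[required.length - 1];
  return succeeded.length >= required.length &&
    succeeded[succeeded.length - 1] === finalAction;
}

console.log(isGoalComplete("login", ["fill", "fill"]));          // false
console.log(isGoalComplete("login", ["fill", "fill", "click"])); // true
```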
-
-### 5. Vision-Based Fallback Diagnostics
-**File:** `scenario-worker-class.ts` (lines 215-272)
-**Prompts:** `SCREENSHOT_NEED_ASSESSMENT`, `VISION_DIAGNOSTIC_ANALYSIS`
-
-**Trigger Condition:**
-```typescript
-totalFailedAttemptsForStep >= 2 && !usedVisionMode && lastError
-```
-
-**When:** After 2+ total failures across all sub-actions
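
As a minimal sketch, the trigger condition can be wrapped in a predicate so the per-step (rather than per-sub-action) counting is explicit. The three fields come from the expression above; the `VisionTriggerState` interface, the `VISION_FAILURE_THRESHOLD` constant and the function name are assumptions made for illustration.

```typescript
interface VisionTriggerState {
  totalFailedAttemptsForStep: number; // failures summed over all sub-actions
  usedVisionMode: boolean;            // vision already used for this step
  lastError?: string;                 // most recent error message, if any
}

const VISION_FAILURE_THRESHOLD = 2; // hypothetical name for the literal 2 above

function shouldTriggerVision(state: VisionTriggerState): boolean {
  return state.totalFailedAttemptsForStep >= VISION_FAILURE_THRESHOLD &&
    !state.usedVisionMode &&
    Boolean(state.lastError);
}

// One failure in sub-action 1 plus one in sub-action 2 is enough.
console.log(shouldTriggerVision({
  totalFailedAttemptsForStep: 2,
  usedVisionMode: false,
  lastError: "Timeout 5000ms exceeded",
})); // true
```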
-
-**Two-Step Process:**
-
-**Step 1: Assess Screenshot Need** (gpt-4.1-mini)
-- Quick check: Would visual analysis help?
-- Conservative: Only recommend if DOM info insufficient
-- Returns: needsScreenshot (boolean) + reason
-
-**Step 2: Vision Diagnostics** (gpt-4o - only if assessment says yes)
-- Supervisor analyzes screenshot
-- Identifies: What's visible vs what was assumed
-- Diagnoses: Why previous attempts failed
-- Recommends: Better approach based on visual reality
-
-**Output:**
-- Visual analysis
-- Root cause of failures
-- Specific instructions for next attempt
-- Elements found/not found
-
-### 6. Script Generation
-**File:** `script-utils.ts` → `generateTestScript()`
-
-**Output Format:**
-```javascript
-/*
-This is a TestChimp Smart Test.
-Version: 1.0
-
-#login #coreHR #peopleHR
-*/
-
-import { test, expect } from '@playwright/test';
-test('testName', async ({ page, browser, context }) => {
-  // Step 1: Go to URL
-  await page.goto('https://...', { waitUntil: 'domcontentloaded', timeout: 10000 });
-
-  // Step 2: Login with credentials: Willy, Willy@1234
-  await page.fill('username', 'Willy');
-  await page.fill('password', 'Willy@1234');
-  await page.click('button[name="Login"]');
-
-  // Step 3: Click on All Modules [FAILED]
-  // Attempted: await page.getByText('All Modules').click();
-});
-```
-
-## Configuration & Timeouts
-
-**Default Timeout:** 5 seconds (fast feedback on wrong selectors)
-
-**Navigation Timeout:** 10 seconds explicit (handles redirects)
-```typescript
-await page.goto(url, { waitUntil: 'domcontentloaded', timeout: 10000 })
-```
-
-**Why:**
-- 5s for element operations → fast failure on wrong selectors (not 10s wait per wrong selector)
-- 10s explicit for navigation → handles redirects properly
-- Best of both: fast iteration + reliable navigation
-
-## Key Improvements
-
-### 1. Value Preservation Throughout Flow
-
-**Problem:** Losing specific values (credentials, amounts, etc.)
-
-**Solution:** Preserve at every stage
-
-| Stage | Before | After |
-|-------|--------|-------|
-| Breakdown | "Login with user/pass" | "Login with credentials: Willy, Willy@1234" |
-| Goal | "Complete login" | "Login with credentials: Willy, Willy@1234" |
-| Command | `process.env.USERNAME` | `'Willy'` |
-
-### 2. Semantic Action Understanding
-
-**Problem:** Marking "Login" complete after just filling fields
-
-**Solution:** Recognize implicit final actions
-
-- "Login" → fill + **click login button**
-- "Send" → type + **click send button**
-- "Submit" → fill + **click submit button**
-
-### 3. Navigation & Redirect Handling
-
-**Problem:** Retrying original URL after successful redirect
-
-**Solution:**
-- Check current URL after navigation errors
-- If URL changed from `about:blank` → navigation succeeded
-- Use `domcontentloaded` for redirects (more reliable than `load`)
-- Don't retry if already on a page
-
-### 4. Vision Diagnostics
-
-**Problem:** Vision never triggering (was checking per-sub-action attempt)
-
-**Solution:** Trigger on total failures across all sub-actions
-- Changed from `attempt === 2` → `totalFailedAttemptsForStep >= 2`
-- Now triggers after 2+ failures regardless of sub-action boundaries
-- Detailed logging shows when/why vision triggers or doesn't
-
-### 5. Enhanced Logging
-
-**Visibility:**
-- ✅ Console logs for Debug Console
-- ✅ outputChannel for Output panel
-- ✅ Timestamps on all logs
-- ✅ Version markers
-- ✅ Vision trigger decision logs
-
-**Format:**
-```
-[02:58:15.613] [ScenarioWorker] 🚀 RUNNER-CORE VERSION: v1.5.0-vision-preserve-values
-[02:58:15.614] [ScenarioWorker] Step 1 - Sub-action 1, Attempt 1: Go to URL
-[02:58:15.650] [ScenarioWorker] 🔍 Vision trigger check: subAction=1, attempt=0, totalFailed=0, usedVision=false
-[02:58:15.651] [ScenarioWorker] 📝 Using DOM-based approach (0 failures so far, need 2+)
-```
-
-## Retry & Failure Budget
-
-**Per Step Limits:**
-- `MAX_RETRIES_PER_STEP = 3` → 4 attempts per sub-action (0, 1, 2, 3)
-- `MAX_SUBACTIONS_PER_STEP = 5` → Max 5 different commands for one step
-- `MAX_FAILED_ATTEMPTS_PER_STEP = 12` → Hard limit on total failures
-
-**Early Termination:**
-- After 2 consecutive step failures → stop execution
-- Saves resources, prevents runaway costs
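
How the three limits interact is easier to see in code. The sketch below reuses the constant names and values quoted above; the `StepCounters` shape, the three predicate functions and `MAX_CONSECUTIVE_STEP_FAILURES` are hypothetical, and the real loop in `scenario-worker-class.ts` may apply the limits in a different order.

```typescript
const MAX_RETRIES_PER_STEP = 3;          // attempts 0..3 within one sub-action
const MAX_SUBACTIONS_PER_STEP = 5;       // different commands tried for one step
const MAX_FAILED_ATTEMPTS_PER_STEP = 12; // hard cap on failures for one step
const MAX_CONSECUTIVE_STEP_FAILURES = 2; // hypothetical name for the early-termination rule

interface StepCounters {
  subActionCount: number;             // sub-actions started so far
  attempt: number;                    // retry index within the current sub-action
  totalFailedAttemptsForStep: number; // failures summed over all sub-actions
}

// After a failed attempt: may the same sub-action be retried?
function canRetrySubAction(c: StepCounters): boolean {
  return c.attempt < MAX_RETRIES_PER_STEP &&
    c.totalFailedAttemptsForStep < MAX_FAILED_ATTEMPTS_PER_STEP;
}

// May another, different command be tried for this step?
function canStartSubAction(c: StepCounters): boolean {
  return c.subActionCount < MAX_SUBACTIONS_PER_STEP &&
    c.totalFailedAttemptsForStep < MAX_FAILED_ATTEMPTS_PER_STEP;
}

// Should the whole run stop after this many failed steps in a row?
function shouldStopRun(consecutiveStepFailures: number): boolean {
  return consecutiveStepFailures >= MAX_CONSECUTIVE_STEP_FAILURES;
}

// 5 sub-actions x 4 attempts would allow 20 failures, so the hard cap of 12 binds first.
const worstCaseAttempts = MAX_SUBACTIONS_PER_STEP * (MAX_RETRIES_PER_STEP + 1);
console.log(worstCaseAttempts > MAX_FAILED_ATTEMPTS_PER_STEP); // true
```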
-
-## Error Context Enhancement
-
-**Navigation errors now include current URL:**
-```
-Error: Timeout 10000ms exceeded | Current URL: https://redirected-url.com
-```
-
-This helps LLM understand:
-- Navigation succeeded but redirected
-- Don't retry original URL
-- Proceed with current page
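
A minimal sketch of this enrichment follows, assuming the error text and the page URL are already available as strings. `withCurrentUrl` and `navigationLikelySucceeded` are hypothetical helpers; the message format and the `about:blank` check are the ones described in this document.

```typescript
// Append the current URL to a navigation error, using the format shown above.
function withCurrentUrl(errorMessage: string, currentUrl: string): string {
  return `${errorMessage} | Current URL: ${currentUrl}`;
}

// If the page has left about:blank, the navigation itself went through,
// even when the wait condition timed out after a redirect.
function navigationLikelySucceeded(currentUrl: string): boolean {
  return currentUrl !== "" && currentUrl !== "about:blank";
}

const enriched = withCurrentUrl(
  "Error: Timeout 10000ms exceeded",
  "https://redirected-url.com",
);
console.log(enriched);
// Error: Timeout 10000ms exceeded | Current URL: https://redirected-url.com
console.log(navigationLikelySucceeded("https://redirected-url.com")); // true
```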
-
-## Workflow Example
-
-**Scenario:**
-```
-- Go to https://app.com/login
-- Login with credentials: admin, pass123
-- Click dashboard
-```
-
-**Execution:**
-
-**Step 1: Navigate**
-```
-Attempt 1: goto(url, {domcontentloaded, timeout: 10000}) → ✅ Success
-Goal check: COMPLETE (navigation is single-step action)
-```
-
-**Step 2: Login**
-```
-Sub-action 1, Attempt 1: fill(username, 'admin') → ✅ Success
-Goal check: INCOMPLETE (login needs username + password + click)
-nextSubGoal: "Enter password and click login"
-
-Sub-action 2, Attempt 1: fill(password, 'pass123') → ✅ Success
-Goal check: INCOMPLETE (still need to click login button)
-nextSubGoal: "Click login button to submit credentials"
-
-Sub-action 3, Attempt 1: click(login button) → ✅ Success
-Goal check: COMPLETE (all parts of login done)
-```
-
-**Step 3: Click dashboard**
-```
-Sub-action 1, Attempt 1: click(dashboard) → ❌ Fail (not visible)
-🔍 Vision check: totalFailed=1, need 2+
-📝 Using DOM (1 failure, need 2+)
-
-Sub-action 1, Attempt 2: waitFor + click → ❌ Fail (still not visible)
-🔍 Vision check: totalFailed=2, usedVision=false
-🎯 VISION TRIGGER: 2 total failures - assessing...
-💭 LLM: SCREENSHOT NEEDED ✅
-📸 Taking screenshot...
-👔 Supervisor analyzing...
-🔨 Generating vision-aided command...
-
-Sub-action 1, Attempt 3: [vision-aided command] → ✅ Success
-Goal check: COMPLETE
-```
-
-## Testing Checklist
-
-- [ ] Specific values preserved (credentials, names, amounts)
-- [ ] Semantic actions complete fully (login includes button click)
-- [ ] Navigation redirects handled (no URL retry loops)
-- [ ] Vision triggers after 2+ failures
-- [ ] Vision logs show decision reasoning
-- [ ] Timeouts appropriate (5s default, 10s navigation)
-- [ ] Error context includes current URL
-- [ ] Failed steps don't show previous step commands
-- [ ] Version marker visible in logs
-
-## Version Tracking
-
-**Current Version:** `v1.5.0-vision-preserve-values`
-
-**Version log location:**
-- During initialization: `[ScenarioWorker] 🚀 RUNNER-CORE VERSION: v1.5.0-vision-preserve-values`
-- Increment for each significant change
-
-## Build & Deploy
-
-**Local Development:**
-```bash
-cd /Users/nuwansam/IdeaProjects/AwareRepo/local/vs-ext
-./build_local.sh
-```
-
-**What it does:**
-1. Builds runner-core
-2. Packs runner-core (0.0.22)
-3. Installs in vs-ext
-4. Builds vs-ext for staging
-
-**Verification:**
-```bash
-grep "v1.5.0-vision-preserve-values" node_modules/testchimp-runner-core/dist/scenario-worker-class.js
-```
-
-## Related Documentation
-
-- `VISION_DIAGNOSTICS_IMPROVEMENTS.md` - Vision system details
-- `prompts.ts` - All LLM prompts and guidance
-- `types.ts` - Type definitions
-
-## Future Enhancements
-
-1. **Learn from vision insights** - Build library of common patterns
-2. **Optimize vision timing** - Better cost/benefit analysis
-3. **Cross-flow learning** - Share insights between generation and repair
-4. **Smarter goal parsing** - Better semantic action recognition
-5. **Dynamic timeout adjustment** - Based on operation type
-