testchimp-runner-core 0.0.35 → 0.0.36

Files changed (71)
  1. package/package.json +6 -1
  2. package/plandocs/BEFORE_AFTER_VERIFICATION.md +0 -148
  3. package/plandocs/COORDINATE_MODE_DIAGNOSIS.md +0 -144
  4. package/plandocs/CREDIT_CALLBACK_ARCHITECTURE.md +0 -253
  5. package/plandocs/HUMAN_LIKE_IMPROVEMENTS.md +0 -642
  6. package/plandocs/IMPLEMENTATION_STATUS.md +0 -108
  7. package/plandocs/INTEGRATION_COMPLETE.md +0 -322
  8. package/plandocs/MULTI_AGENT_ARCHITECTURE_REVIEW.md +0 -844
  9. package/plandocs/ORCHESTRATOR_MVP_SUMMARY.md +0 -539
  10. package/plandocs/PHASE1_ABSTRACTION_COMPLETE.md +0 -241
  11. package/plandocs/PHASE1_FINAL_STATUS.md +0 -210
  12. package/plandocs/PHASE_1_COMPLETE.md +0 -165
  13. package/plandocs/PHASE_1_SUMMARY.md +0 -184
  14. package/plandocs/PLANNING_SESSION_SUMMARY.md +0 -372
  15. package/plandocs/PROMPT_OPTIMIZATION_ANALYSIS.md +0 -120
  16. package/plandocs/PROMPT_SANITY_CHECK.md +0 -120
  17. package/plandocs/SCRIPT_CLEANUP_FEATURE.md +0 -201
  18. package/plandocs/SCRIPT_GENERATION_ARCHITECTURE.md +0 -364
  19. package/plandocs/SELECTOR_IMPROVEMENTS.md +0 -139
  20. package/plandocs/SESSION_SUMMARY_v0.0.33.md +0 -151
  21. package/plandocs/TROUBLESHOOTING_SESSION.md +0 -72
  22. package/plandocs/VISION_DIAGNOSTICS_IMPROVEMENTS.md +0 -336
  23. package/plandocs/VISUAL_AGENT_EVOLUTION_PLAN.md +0 -396
  24. package/plandocs/WHATS_NEW_v0.0.33.md +0 -183
  25. package/plandocs/exploratory-mode-support-v2.plan.md +0 -953
  26. package/plandocs/exploratory-mode-support.plan.md +0 -928
  27. package/plandocs/journey-id-tracking-addendum.md +0 -227
  28. package/releasenotes/RELEASE_0.0.26.md +0 -165
  29. package/releasenotes/RELEASE_0.0.27.md +0 -236
  30. package/releasenotes/RELEASE_0.0.28.md +0 -286
  31. package/src/auth-config.ts +0 -84
  32. package/src/credit-usage-service.ts +0 -188
  33. package/src/env-loader.ts +0 -103
  34. package/src/execution-service.ts +0 -996
  35. package/src/file-handler.ts +0 -104
  36. package/src/index.ts +0 -432
  37. package/src/llm-facade.ts +0 -821
  38. package/src/llm-provider.ts +0 -53
  39. package/src/model-constants.ts +0 -35
  40. package/src/orchestrator/decision-parser.ts +0 -139
  41. package/src/orchestrator/index.ts +0 -58
  42. package/src/orchestrator/orchestrator-agent.ts +0 -1282
  43. package/src/orchestrator/orchestrator-prompts.ts +0 -786
  44. package/src/orchestrator/page-som-handler.ts +0 -1565
  45. package/src/orchestrator/som-types.ts +0 -188
  46. package/src/orchestrator/tool-registry.ts +0 -184
  47. package/src/orchestrator/tools/check-page-ready.ts +0 -75
  48. package/src/orchestrator/tools/extract-data.ts +0 -92
  49. package/src/orchestrator/tools/index.ts +0 -15
  50. package/src/orchestrator/tools/inspect-page.ts +0 -42
  51. package/src/orchestrator/tools/recall-history.ts +0 -72
  52. package/src/orchestrator/tools/refresh-som-markers.ts +0 -69
  53. package/src/orchestrator/tools/take-screenshot.ts +0 -128
  54. package/src/orchestrator/tools/verify-action-result.ts +0 -159
  55. package/src/orchestrator/tools/view-previous-screenshot.ts +0 -103
  56. package/src/orchestrator/types.ts +0 -291
  57. package/src/playwright-mcp-service.ts +0 -224
  58. package/src/progress-reporter.ts +0 -144
  59. package/src/prompts.ts +0 -842
  60. package/src/providers/backend-proxy-llm-provider.ts +0 -91
  61. package/src/providers/local-llm-provider.ts +0 -38
  62. package/src/scenario-service.ts +0 -252
  63. package/src/scenario-worker-class.ts +0 -1110
  64. package/src/script-utils.ts +0 -203
  65. package/src/types.ts +0 -239
  66. package/src/utils/browser-utils.ts +0 -348
  67. package/src/utils/coordinate-converter.ts +0 -162
  68. package/src/utils/page-info-retry.ts +0 -65
  69. package/src/utils/page-info-utils.ts +0 -285
  70. package/testchimp-runner-core-0.0.35.tgz +0 -0
  71. package/tsconfig.json +0 -19
@@ -1,120 +0,0 @@
1
- # Prompt Sanity Check - Runner-Core v0.0.33
2
-
3
- ## ✅ STRENGTHS
4
-
5
- ### System Prompt (`buildSystemPrompt`)
6
- - ✅ Required fields clearly marked at top (status, reasoning, statusReasoning)
7
- - ✅ Comprehensive JSON format with examples
8
- - ✅ Clear status decision rules
9
- - ✅ Good blocker detection guidance
10
- - ✅ Semantic selector preference clearly explained with examples
11
- - ✅ Tool vs command distinction is clear
12
- - ✅ Coordinate fallback documented
13
-
14
- ### User Prompt (`buildUserPrompt`)
15
- - ✅ Static content first (cache-optimized)
16
- - ✅ Dynamic content last (current state, page info)
17
- - ✅ Notes from previous iteration shown prominently
18
- - ✅ Clear warnings for consecutive failures
19
- - ✅ Coordinate mode trigger clear
20
-
21
- ## ⚠️ ISSUES FOUND
22
-
23
- ### 1. **Duplication/Redundancy**
24
- - ❌ "Use semantic selectors" mentioned in:
25
- - System prompt (line ~605: "SELECTOR PREFERENCE")
26
- - User prompt (line ~860: "SELECTOR STRATEGY")
27
- - **FIX**: Remove from user prompt, keep in system prompt only
28
-
29
- ### 2. **Length Concerns**
30
- - ⚠️ System prompt is ~325 lines (very long)
31
- - ⚠️ May cause LLM to miss critical details in the middle
32
- - **SUGGESTION**: Consider breaking into sections or condensing
33
-
34
- ### 3. **Conflicting Guidance**
35
- - ⚠️ Line ~469: "stuck: Tried 3+ iterations"
36
- - But coordinate mode triggers at 3 failures (line ~904)
37
- - **FIX**: Clarify: stuck = 5 attempts total (3 regular + 2 coordinate)
38
-
39
- ### 4. **Unclear Iteration Count**
40
- - ❌ Line ~714: "When iteration count reaches 4+"
41
- - ❌ Line ~748: "iteration 4+"
42
- - ✅ But code triggers at 3 failures
43
- - **FIX**: Keep "iteration 4+" in the prompt, but state that it means "after 3 failures" (iterations 0, 1, 2 fail; the next one, index 3, is the 4th iteration)
44
-
45
- ### 5. **Missing Information**
46
- - ❌ Max iterations per step not mentioned (code has 5)
47
- - **FIX**: Add to system prompt: "MAX 5 iterations per step"
48
-
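The iteration thresholds discussed in issues 3 to 5 can be pinned down in a few lines. This is a minimal sketch, not code from the package: `modeForIteration` and both constant names are hypothetical, iterations are assumed to be zero-indexed, and every earlier iteration is assumed to have failed.

```typescript
// Hypothetical names; the numbers come from the notes above.
const COORDINATE_MODE_AFTER_FAILURES = 3; // iterations 0, 1, 2 fail first
const MAX_ITERATIONS_PER_STEP = 5;        // 3 regular + 2 coordinate

type IterationMode = "regular" | "coordinate" | "stuck";

// Maps a zero-indexed iteration number to the mode the prompt should describe,
// assuming all earlier iterations of the step failed.
function modeForIteration(iteration: number): IterationMode {
  if (iteration >= MAX_ITERATIONS_PER_STEP) return "stuck";
  if (iteration >= COORDINATE_MODE_AFTER_FAILURES) return "coordinate";
  return "regular";
}
```

Under these assumptions, index 3 (the 4th iteration) is the first coordinate-mode attempt, and "stuck" applies only once all 5 iterations are used up.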
49
- ### 6. **Verbosity**
50
- - ⚠️ Examples section (lines ~617-628) is great but long
51
- - ⚠️ Multiple emoji warnings (⚠️⚠️⚠️) can be reduced to a single ⚠️
52
- - **SUGGESTION**: Keep examples, reduce emoji spam
53
-
54
- ## 🔧 RECOMMENDED FIXES
55
-
56
- ### Priority 1 (Critical):
57
- 1. Remove duplicate selector strategy from user prompt
58
- 2. Clarify max iterations (5 total)
59
- 3. Fix coordinate mode iteration number (4th iteration = after 3 failures)
60
-
61
- ### Priority 2 (Nice to have):
62
- 4. Condense system prompt if possible (target: 250 lines)
63
- 5. Reduce emoji overuse
64
- 6. Add section headers in system prompt for clarity
65
-
66
- ## 📊 PROMPT STRUCTURE ANALYSIS
67
-
68
- ### System Prompt Sections:
69
- 1. Introduction (1 line)
70
- 2. Tool descriptions (dynamic, from registry)
71
- 3. JSON format (40 lines) ✅
72
- 4. Status rules (15 lines) ✅
73
- 5. Step re-evaluation (20 lines) ✅
74
- 6. Blocker detection (25 lines) ✅
75
- 7. Experiences (25 lines) ✅
76
- 8. Critical rules (200 lines) ⚠️ TOO LONG
77
- 9. Coordinate actions (45 lines) ✅
78
-
79
- **TOTAL**: ~370 lines (with tool descriptions)
80
-
81
- ### User Prompt Sections:
82
- 1. Static instructions (20 lines) - **Cache-friendly** ✅
83
- 2. Dynamic context marker (1 line) ✅
84
- 3. Notes from previous iteration (5 lines) ✅
85
- 4. Warnings for failures (15 lines) ✅
86
- 5. Coordinate mode trigger (8 lines) ✅
87
- 6. Current step goal (10 lines) ✅
88
- 7. Page state (50-100 lines, variable) ✅
89
- 8. Recent steps (20-50 lines, variable) ✅
90
- 9. Experiences (10 lines) ✅
91
-
92
- **TOTAL**: ~140-220 lines per call
93
-
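The "static first, dynamic last" ordering listed above is what makes the user prompt cache-friendly: consecutive calls share an identical prefix. A minimal sketch of that ordering follows. `assembleUserPrompt`, `DynamicContext`, the marker text, and the section labels are all hypothetical and are not taken from `buildUserPrompt`.

```typescript
// Hypothetical shape of the per-call context; field names are assumptions.
interface DynamicContext {
  notes: string[];
  stepGoal: string;
  pageState: string;
}

// Hypothetical marker separating the cacheable prefix from per-call content.
const DYNAMIC_MARKER = "=== DYNAMIC CONTEXT ===";

function assembleUserPrompt(staticInstructions: string, ctx: DynamicContext): string {
  const dynamicParts = [
    ctx.notes.length > 0 ? `NOTES FROM PREVIOUS ITERATION:\n${ctx.notes.join("\n")}` : "",
    `CURRENT STEP GOAL:\n${ctx.stepGoal}`,
    `PAGE STATE:\n${ctx.pageState}`,
  ].filter((part) => part !== "");

  // Static text always comes first so every call starts with the same prefix.
  return [staticInstructions, DYNAMIC_MARKER].concat(dynamicParts).join("\n\n");
}
```

Whether a given provider actually reuses the shared prefix depends on its caching rules, which this sketch does not model.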
94
- ## 🎯 RECOMMENDATION SUMMARY
95
-
96
- **Keep as-is:**
97
- - JSON structure
98
- - Semantic selector examples
99
- - Blocker detection
100
- - Note to future self
101
- - Coordinate fallback
102
- - Cache optimization
103
-
104
- **Fix:**
105
- - Remove selector duplication in user prompt
106
- - Clarify iteration counts
107
- - Add max iteration limit
108
- - Reduce emoji spam
109
-
110
- **Consider:**
111
- - Condensing "Critical Rules" section (currently 200 lines)
112
- - Moving some examples to external docs
113
- - Breaking long sections with clear headers
114
-
115
- ## Overall Assessment: **8/10**
116
- - Prompts are comprehensive and well-structured
117
- - Main issues are length and minor redundancies
118
- - Cache optimization is excellent
119
- - A few clarity fixes needed for iteration counts
120
-
@@ -1,201 +0,0 @@
1
- # Script Cleanup Feature
2
-
3
- ## Summary
4
- Added a final cleanup step in the script generation pipeline that uses an LLM to make minor adjustments to the generated test script, removing redundancies and improving code quality without changing the core logic.
5
-
6
- ## Purpose
7
- After the orchestrator generates test steps, there may be minor redundancies or formatting issues:
8
- - Duplicate expect() assertions
9
- - Redundant waits or checks
10
- - Inconsistent formatting
11
- - Orphaned step comments without code
12
-
13
- The cleanup step acts as a final sanity check to polish the generated script while preserving its core functionality.
14
-
15
- ## Implementation
16
-
17
- ### 1. New Prompt (`prompts.ts`)
18
-
19
- **SCRIPT_CLEANUP** prompt with clear guidelines:
20
- - **DO:** Remove duplicates, fix formatting, consolidate identical assertions
21
- - **DO NOT:** Change test logic, remove legitimate assertions, restructure code, change selectors, add new functionality
22
-
23
- **Examples in prompt:**
24
- ```typescript
25
- // ❌ REMOVE redundancy:
26
- await expect(page.getByText('Hello')).toBeVisible();
27
- await expect(page.getByText('Hello')).toBeVisible(); // duplicate
28
-
29
- // ✅ KEEP legitimate checks:
30
- await expect(page.getByPlaceholder('Message...')).toBeEmpty();
31
- await page.getByPlaceholder('Message...').fill('Hello');
32
- await expect(page.getByPlaceholder('Message...')).toHaveValue('Hello'); // different checks
33
- ```
34
-
35
- ### 2. New Method in LLMFacade (`llm-facade.ts`)
36
-
37
- ```typescript
38
- async cleanupScript(script: string, model?: string): Promise<{
39
- script: string;
40
- changes: string[];
41
- skipped?: string;
42
- }>
43
- ```
44
-
45
- **Behavior:**
46
- - Calls LLM with SCRIPT_CLEANUP prompt
47
- - Parses JSON response with cleaned script and list of changes
48
- - Returns original script on error (safe fallback)
49
- - Logs all changes made for transparency
50
-
51
- **Error Handling:**
52
- - Invalid JSON → return original script
53
- - Missing fields → return original script
54
- - LLM error → return original script
55
- - Never fails the generation process
56
-
57
- ### 3. Integration into Scenario Worker (`scenario-worker-class.ts`)
58
-
59
- Added cleanup step immediately after `generateTestScript()`:
60
-
61
- ```typescript
62
- // Generate clean script with TestChimp comment and code
63
- generatedScript = generateTestScript(testName, steps, undefined, hashtags);
64
-
65
- // Perform final cleanup pass to remove redundancies and make minor adjustments
66
- this.log(`[ScenarioWorker] Performing final script cleanup...`);
67
- try {
68
- const cleanupResult = await this.llmFacade.cleanupScript(generatedScript, job.model);
69
-
70
- if (cleanupResult.changes && cleanupResult.changes.length > 0) {
71
- this.log(`[ScenarioWorker] Cleanup made ${cleanupResult.changes.length} improvement(s):`);
72
- cleanupResult.changes.forEach((change, i) => {
73
- this.log(`[ScenarioWorker] ${i + 1}. ${change}`);
74
- });
75
- generatedScript = cleanupResult.script;
76
- } else if (cleanupResult.skipped) {
77
- this.log(`[ScenarioWorker] Cleanup skipped: ${cleanupResult.skipped}`);
78
- } else {
79
- this.log(`[ScenarioWorker] Cleanup completed - no changes needed`);
80
- }
81
- } catch (error: any) {
82
- this.log(`[ScenarioWorker] Cleanup failed, using original script: ${error.message}`);
83
- // Continue with original script on error
84
- }
85
- ```
86
-
87
- ## What Gets Cleaned Up
88
-
89
- ### ✅ Redundancies Removed
90
- 1. **Duplicate assertions:**
91
- ```typescript
92
- // Before cleanup
93
- await expect(page.getByText('Hello')).toBeVisible();
94
- await expect(page.getByText('Hello')).toBeVisible();
95
-
96
- // After cleanup
97
- await expect(page.getByText('Hello')).toBeVisible();
98
- ```
99
-
100
- 2. **Redundant URL checks:**
101
- ```typescript
102
- // Before cleanup
103
- await expect(page).toHaveURL(/\/messages/);
104
- await expect(page).toHaveURL(/\/messages/);
105
-
106
- // After cleanup
107
- await expect(page).toHaveURL(/\/messages/);
108
- ```
109
-
110
- 3. **Duplicate comments without code** (already handled by script generation, but this is a safety net)
111
-
112
- ### ✅ Minor Formatting Fixes
113
- - Inconsistent spacing
114
- - Alignment issues
115
- - Obvious formatting problems
116
-
117
- ### ❌ Preserved (Not Changed)
118
- - Test logic and flow
119
- - Legitimate assertions (same locator, different expectations)
120
- - Important waits
121
- - Selectors
122
- - Test structure
123
- - Any functionality
124
-
125
- ## Safety Features
126
-
127
- ### 1. Conservative Approach
128
- - Only makes changes when confident they're safe
129
- - Prompt explicitly warns against major changes
130
- - Focuses on "obvious" redundancies only
131
-
132
- ### 2. Transparency
133
- - Logs all changes made with descriptions
134
- - Makes it easy to see what was modified
135
- - Helps debug if cleanup causes issues
136
-
137
- ### 3. Graceful Degradation
138
- - Any error → return original script
139
- - Invalid response → return original script
140
- - Never breaks the generation pipeline
141
- - Cleanup is an enhancement, not a requirement
142
-
143
- ### 4. Idempotent
144
- - Running cleanup twice should produce the same result
145
- - No cumulative changes or drift
146
-
147
- ## Example Output
148
-
149
- **Console logs:**
150
- ```
151
- [ScenarioWorker] Performing final script cleanup...
152
- [LLMFacade] Script cleanup completed. Changes: 2
153
- [LLMFacade] 1. Removed duplicate expect assertion for message visibility
154
- [LLMFacade] 2. Consolidated redundant URL checks into single assertion
155
- [ScenarioWorker] Cleanup made 2 improvement(s):
156
- [ScenarioWorker] 1. Removed duplicate expect assertion for message visibility
157
- [ScenarioWorker] 2. Consolidated redundant URL checks into single assertion
158
- ```
159
-
160
- ## Benefits
161
-
162
- 1. **Cleaner Scripts:** Removes redundancies that can make tests harder to read
163
- 2. **Reduced Token Usage:** Shorter scripts mean fewer tokens consumed by users
164
- 3. **Better Maintainability:** Clean code is easier to understand and modify
165
- 4. **Safety Net:** Catches issues that might slip through orchestrator logic
166
- 5. **Low Risk:** Falls back to the original script on any error
167
-
168
- ## Performance Impact
169
-
170
- - **Time:** Adds one LLM call at the end (~1-3 seconds)
171
- - **Cost:** One additional LLM call per script generation
172
- - **Benefit:** Catches redundancies that would otherwise be in production tests
173
-
174
- The small overhead is worthwhile for the quality improvement.
175
-
176
- ## Future Enhancements
177
-
178
- Possible improvements:
179
- 1. **Configurable:** Allow users to disable cleanup if they prefer
180
- 2. **More Rules:** Add more specific cleanup patterns
181
- 3. **Static Analysis:** Use AST parsing instead of LLM for some checks (faster, cheaper)
182
- 4. **Metrics:** Track how often cleanup makes changes vs. no-ops
183
-
184
- ## Files Modified
185
-
186
- 1. `/src/prompts.ts` - Added SCRIPT_CLEANUP prompt
187
- 2. `/src/llm-facade.ts` - Added cleanupScript() method
188
- 3. `/src/scenario-worker-class.ts` - Integrated cleanup into generation pipeline
189
-
190
- ## Testing
191
-
192
- The feature is safe to deploy because:
193
- - Falls back to original script on any error
194
- - Doesn't break existing functionality
195
- - Only makes conservative changes
196
- - Logs all modifications for review
197
-
198
- ## Conclusion
199
-
200
- The script cleanup feature adds a lightweight final polish step to the generation pipeline, removing redundancies and improving code quality without risk to the core test logic. It's a safety net that catches issues the orchestrator might miss while maintaining backward compatibility and graceful error handling.
201
-
@@ -1,364 +0,0 @@
1
- # Script Generation Architecture & Work Plan
2
-
3
- ## Overview
4
- AI-powered test script generation from natural language scenarios using LLM-guided Playwright automation with vision-based fallback diagnostics.
5
-
6
- ## Architecture Flow
7
-
8
- ```
9
- User Scenario (text file)
10
-
11
- 1. Scenario Breakdown (LLM)
12
-
13
- 2. Step-by-Step Execution
14
-
15
- 3. Command Generation (LLM + DOM)
16
-
17
- 4. Playwright Execution
18
-
19
- 5. Goal Completion Check (LLM)
20
-
21
- 6. Vision Fallback (if needed)
22
-
23
- 7. Script Generation
24
- ```
25
-
26
- ## Components
27
-
28
- ### 1. Scenario Breakdown
29
- **File:** `llm-facade.ts` → `breakdownScenario()`
30
- **Prompt:** `PROMPTS.SCENARIO_BREAKDOWN`
31
-
32
- **Input:** Natural language scenario
33
- ```
34
- - Go to https://app.com
35
- - Login with credentials: admin, pass123
36
- - Click on settings
37
- ```
38
-
39
- **Output:** Structured steps
40
- ```json
41
- {
42
- "steps": [
43
- "Go to https://app.com",
44
- "Login with credentials: admin, pass123",
45
- "Click on settings"
46
- ]
47
- }
48
- ```
49
-
50
- **Key Principles:**
51
- - ✅ Preserve ALL specific values (credentials, names, amounts, etc.)
52
- - ✅ Keep steps semantic (no technical selectors)
53
- - ✅ One clear action per step
54
- - ❌ Never replace values with variables/placeholders
55
-
56
- ### 2. Step Execution Loop
57
- **File:** `scenario-worker-class.ts` → `processScenarioJob()`
58
-
59
- **For each step:**
60
- 1. Initialize step tracking
61
- 2. Execute sub-actions until goal complete
62
- 3. Track failures and successes
63
- 4. Generate final script
64
-
65
- **Counters:**
66
- - `subActionCount`: Number of different commands tried for this step
67
- - `totalFailedAttemptsForStep`: Total failures across all sub-actions
68
- - `attempt`: Retry count within current sub-action (0-3)
69
-
70
- ### 3. Command Generation
71
- **File:** `llm-facade.ts` → `generatePlaywrightCommand()`
72
- **Prompt:** `PROMPTS.PLAYWRIGHT_COMMAND`
73
-
74
- **Context Provided:**
75
- - Goal description
76
- - Current page state (DOM snapshot)
77
- - Previous commands in this step
78
- - Previous step history
79
- - Last error (if retry)
80
-
81
- **Key Principles:**
82
- 1. **Extract specific values from goal** - Use exact credentials, names, amounts from goal description
83
- 2. **Navigation handling** - Use `{ waitUntil: 'domcontentloaded', timeout: 10000 }` for redirects
84
- 3. **Check current URL** - Don't retry navigation if already navigated (even if redirected)
85
- 4. **Never hallucinate verification** - Only verify what goal explicitly asks for
86
- 5. **Semantic action completion** - "Login" means fill + click, not just fill
87
-
88
- ### 4. Goal Completion Assessment
89
- **File:** `llm-facade.ts` → `checkGoalCompletion()`
90
- **Prompt:** `PROMPTS.GOAL_COMPLETION_CHECK`
91
-
92
- **Decision Matrix:**
93
-
94
- | Goal Type | Completion Criteria | Example |
95
- |-----------|-------------------|---------|
96
- | Simple action | Action succeeded | "Click button" → complete after click |
97
- | Semantic action | All implicit steps done | "Login" → complete after fill + click |
98
- | Multi-part action | All parts done | "Fill all fields" → complete after all fields |
99
- | Verification | Assertion passed | "Verify message" → complete after assertion |
100
-
101
- **Semantic Action Recognition:**
102
- - **"Login with credentials"** → Fill username, fill password, click login button
103
- - **"Send message"** → Type message, click send button
104
- - **"Submit form"** → Fill fields, click submit button
105
- - **"Register/Signup"** → Fill registration, click register button
106
-
107
- Mark INCOMPLETE until the final implicit action completes.
108
-
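In the package this decision is made by the LLM through `checkGoalCompletion()`. The rule itself ("a semantic action is incomplete until its final implicit action has run") can still be illustrated deterministically. Everything in the sketch below is hypothetical: the `SubAction` type, the `IMPLICIT_STEPS` table, and `isSemanticGoalComplete` are illustrative names, and the required sequences are a simplified reading of the list above.

```typescript
type SubAction = "fill" | "type" | "click";

// Hypothetical table of implicit steps per semantic action.
const IMPLICIT_STEPS: Record<string, SubAction[]> = {
  login: ["fill", "fill", "click"], // username, password, login button
  send: ["type", "click"],          // message, send button
  submit: ["fill", "click"],        // fields, submit button
  register: ["fill", "click"],      // registration fields, register button
};

// Returns true once the performed actions contain the required steps in order.
// Goals not in the table are treated as simple actions: one action completes them.
function isSemanticGoalComplete(goalKind: string, performed: SubAction[]): boolean {
  const required = IMPLICIT_STEPS[goalKind];
  if (required === undefined) {
    return performed.length > 0;
  }
  let next = 0;
  for (const action of performed) {
    if (next < required.length && action === required[next]) {
      next++;
    }
  }
  return next === required.length;
}
```

With this table, a login that has only filled both fields is still incomplete, which is the failure mode the decision matrix is meant to prevent.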
109
- ### 5. Vision-Based Fallback Diagnostics
110
- **File:** `scenario-worker-class.ts` (lines 215-272)
111
- **Prompts:** `SCREENSHOT_NEED_ASSESSMENT`, `VISION_DIAGNOSTIC_ANALYSIS`
112
-
113
- **Trigger Condition:**
114
- ```typescript
115
- totalFailedAttemptsForStep >= 2 && !usedVisionMode && lastError
116
- ```
117
-
118
- **When:** After 2+ total failures across all sub-actions
119
-
120
- **Two-Step Process:**
121
-
122
- **Step 1: Assess Screenshot Need** (gpt-4.1-mini)
123
- - Quick check: Would visual analysis help?
124
- - Conservative: Only recommend if DOM info insufficient
125
- - Returns: needsScreenshot (boolean) + reason
126
-
127
- **Step 2: Vision Diagnostics** (gpt-4o - only if assessment says yes)
128
- - Supervisor analyzes screenshot
129
- - Identifies: What's visible vs what was assumed
130
- - Diagnoses: Why previous attempts failed
131
- - Recommends: Better approach based on visual reality
132
-
133
- **Output:**
134
- - Visual analysis
135
- - Root cause of failures
136
- - Specific instructions for next attempt
137
- - Elements found/not found
138
-
139
- ### 6. Script Generation
140
- **File:** `script-utils.ts` → `generateTestScript()`
141
-
142
- **Output Format:**
143
- ```javascript
144
- /*
145
- This is a TestChimp Smart Test.
146
- Version: 1.0
147
-
148
- #login #coreHR #peopleHR
149
- */
150
-
151
- import { test, expect } from '@playwright/test';
152
- test('testName', async ({ page, browser, context }) => {
153
- // Step 1: Go to URL
154
- await page.goto('https://...', { waitUntil: 'domcontentloaded', timeout: 10000 });
155
-
156
- // Step 2: Login with credentials: Willy, Willy@1234
157
- await page.fill('username', 'Willy');
158
- await page.fill('password', 'Willy@1234');
159
- await page.click('button[name="Login"]');
160
-
161
- // Step 3: Click on All Modules [FAILED]
162
- // Attempted: await page.getByText('All Modules').click();
163
- });
164
- ```
165
-
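The output format above can be produced by a small renderer. The sketch below is a simplified stand-in, not the real `generateTestScript()` from `script-utils.ts`: `renderTestScript`, `StepResult`, and the parameter list are hypothetical, and the test name is assumed to contain no single quotes.

```typescript
// Hypothetical per-step record; the real types live in the package's types.ts.
interface StepResult {
  description: string;
  commands: string[];
  success: boolean;
}

function renderTestScript(testName: string, steps: StepResult[], hashtags: string[]): string {
  const lines: string[] = [
    "/*",
    "This is a TestChimp Smart Test.",
    "Version: 1.0",
    "",
    hashtags.join(" "),
    "*/",
    "",
    "import { test, expect } from '@playwright/test';",
    `test('${testName}', async ({ page, browser, context }) => {`,
  ];

  steps.forEach((step, index) => {
    const suffix = step.success ? "" : " [FAILED]";
    lines.push(`  // Step ${index + 1}: ${step.description}${suffix}`);
    for (const command of step.commands) {
      // Failed steps keep their commands as comments so the script still runs.
      lines.push(step.success ? `  ${command}` : `  // Attempted: ${command}`);
    }
    lines.push("");
  });

  lines.push("});");
  return lines.join("\n");
}
```

Emitting failed commands as `// Attempted:` comments keeps the generated file syntactically valid while recording what was tried.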
166
- ## Configuration & Timeouts
167
-
168
- **Default Timeout:** 5 seconds (fast feedback on wrong selectors)
169
-
170
- **Navigation Timeout:** 10 seconds explicit (handles redirects)
171
- ```typescript
172
- await page.goto(url, { waitUntil: 'domcontentloaded', timeout: 10000 })
173
- ```
174
-
175
- **Why:**
176
- - 5s for element operations → fast failure on wrong selectors (not 10s wait per wrong selector)
177
- - 10s explicit for navigation → handles redirects properly
178
- - Best of both: fast iteration + reliable navigation
179
-
180
- ## Key Improvements
181
-
182
- ### 1. Value Preservation Throughout Flow
183
-
184
- **Problem:** Losing specific values (credentials, amounts, etc.)
185
-
186
- **Solution:** Preserve at every stage
187
-
188
- | Stage | Before | After |
189
- |-------|--------|-------|
190
- | Breakdown | "Login with user/pass" | "Login with credentials: Willy, Willy@1234" |
191
- | Goal | "Complete login" | "Login with credentials: Willy, Willy@1234" |
192
- | Command | `process.env.USERNAME` | `'Willy'` |
193
-
194
- ### 2. Semantic Action Understanding
195
-
196
- **Problem:** Marking "Login" complete after just filling fields
197
-
198
- **Solution:** Recognize implicit final actions
199
-
200
- - "Login" → fill + **click login button**
201
- - "Send" → type + **click send button**
202
- - "Submit" → fill + **click submit button**
203
-
204
- ### 3. Navigation & Redirect Handling
205
-
206
- **Problem:** Retrying original URL after successful redirect
207
-
208
- **Solution:**
209
- - Check current URL after navigation errors
210
- - If URL changed from `about:blank` → navigation succeeded
211
- - Use `domcontentloaded` for redirects (more reliable than `load`)
212
- - Don't retry if already on a page
213
-
214
- ### 4. Vision Diagnostics
215
-
216
- **Problem:** Vision never triggering (was checking per-sub-action attempt)
217
-
218
- **Solution:** Trigger on total failures across all sub-actions
219
- - Changed from `attempt === 2` → `totalFailedAttemptsForStep >= 2`
220
- - Now triggers after 2+ failures regardless of sub-action boundaries
221
- - Detailed logging shows when/why vision triggers or doesn't
222
-
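The difference between the old and new trigger is easiest to see side by side. The condition itself is quoted in the Vision-Based Fallback section; the `StepState` interface and both function names below are hypothetical wrappers added for illustration.

```typescript
interface StepState {
  attempt: number;                    // retry index within the current sub-action
  totalFailedAttemptsForStep: number; // failures across all sub-actions
  usedVisionMode: boolean;
  lastError: string | null;
}

// Old behaviour as described above: only fires on a specific retry index.
function oldVisionTrigger(state: StepState): boolean {
  return state.attempt === 2;
}

// New behaviour: fires once total failures reach 2, at most once per step,
// and only when there is an error to diagnose.
function shouldTriggerVision(state: StepState): boolean {
  return (
    state.totalFailedAttemptsForStep >= 2 &&
    !state.usedVisionMode &&
    state.lastError !== null
  );
}
```

For example, if two different sub-actions each fail on their first attempt, `attempt` is back at 0 for the next sub-action while the total is already 2, so only the new check fires.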
223
- ### 5. Enhanced Logging
224
-
225
- **Visibility:**
226
- - ✅ Console logs for Debug Console
227
- - ✅ outputChannel for Output panel
228
- - ✅ Timestamps on all logs
229
- - ✅ Version markers
230
- - ✅ Vision trigger decision logs
231
-
232
- **Format:**
233
- ```
234
- [02:58:15.613] [ScenarioWorker] 🚀 RUNNER-CORE VERSION: v1.5.0-vision-preserve-values
235
- [02:58:15.614] [ScenarioWorker] Step 1 - Sub-action 1, Attempt 1: Go to URL
236
- [02:58:15.650] [ScenarioWorker] 🔍 Vision trigger check: subAction=1, attempt=0, totalFailed=0, usedVision=false
237
- [02:58:15.651] [ScenarioWorker] 📝 Using DOM-based approach (0 failures so far, need 2+)
238
- ```
239
-
240
- ## Retry & Failure Budget
241
-
242
- **Per Step Limits:**
243
- - `MAX_RETRIES_PER_STEP = 3` → 4 attempts per sub-action (0, 1, 2, 3)
244
- - `MAX_SUBACTIONS_PER_STEP = 5` → Max 5 different commands for one step
245
- - `MAX_FAILED_ATTEMPTS_PER_STEP = 12` → Hard limit on total failures
246
-
247
- **Early Termination:**
248
- - After 2 consecutive step failures → stop execution
249
- - Saves resources, prevents runaway costs
250
-
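The limits above interact: 5 sub-actions times 4 attempts would allow 20 failures, so the cap of 12 is the one that binds first on a badly failing step. A minimal sketch of the budget checks follows. The three limit constants are quoted from the list; `MAX_CONSECUTIVE_STEP_FAILURES`, `StepCounters`, `canAttempt`, and `shouldStopRun` are hypothetical, and `subActionCount` is assumed to be 1-based as in the log output.

```typescript
const MAX_RETRIES_PER_STEP = 3;          // attempts 0, 1, 2, 3 per sub-action
const MAX_SUBACTIONS_PER_STEP = 5;
const MAX_FAILED_ATTEMPTS_PER_STEP = 12;
const MAX_CONSECUTIVE_STEP_FAILURES = 2; // hypothetical name for the early-termination rule

interface StepCounters {
  attempt: number;                    // zero-indexed retry within the sub-action
  subActionCount: number;             // assumed 1-based
  totalFailedAttemptsForStep: number;
}

// True while another attempt is still inside every per-step limit.
function canAttempt(counters: StepCounters): boolean {
  return (
    counters.attempt <= MAX_RETRIES_PER_STEP &&
    counters.subActionCount <= MAX_SUBACTIONS_PER_STEP &&
    counters.totalFailedAttemptsForStep < MAX_FAILED_ATTEMPTS_PER_STEP
  );
}

// True when the whole run should stop early.
function shouldStopRun(consecutiveStepFailures: number): boolean {
  return consecutiveStepFailures >= MAX_CONSECUTIVE_STEP_FAILURES;
}
```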
251
- ## Error Context Enhancement
252
-
253
- **Navigation errors now include current URL:**
254
- ```
255
- Error: Timeout 10000ms exceeded | Current URL: https://redirected-url.com
256
- ```
257
-
258
- This helps the LLM understand:
259
- - Navigation succeeded but redirected
260
- - Don't retry original URL
261
- - Proceed with current page
262
-
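The error format shown above, together with the `about:blank` check from the Navigation & Redirect Handling section, can be sketched as two small helpers. Both function names are hypothetical. In a Playwright context the current URL would come from `page.url()`, which is passed in as a plain string here so the sketch stays self-contained.

```typescript
// Appends the current URL to an error message in the format shown above.
function withCurrentUrl(errorMessage: string, currentUrl: string): string {
  return `${errorMessage} | Current URL: ${currentUrl}`;
}

// Heuristic from the notes: if the page has left about:blank, some navigation
// happened (possibly a redirect), so the original URL should not be retried.
function navigationLikelySucceeded(currentUrl: string): boolean {
  return currentUrl !== "" && currentUrl !== "about:blank";
}
```

The second helper is only a heuristic: it says the browser is on some page, not that the page is the intended one.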
263
- ## Workflow Example
264
-
265
- **Scenario:**
266
- ```
267
- - Go to https://app.com/login
268
- - Login with credentials: admin, pass123
269
- - Click dashboard
270
- ```
271
-
272
- **Execution:**
273
-
274
- **Step 1: Navigate**
275
- ```
276
- Attempt 1: goto(url, {domcontentloaded, timeout: 10000}) → ✅ Success
277
- Goal check: COMPLETE (navigation is single-step action)
278
- ```
279
-
280
- **Step 2: Login**
281
- ```
282
- Sub-action 1, Attempt 1: fill(username, 'admin') → ✅ Success
283
- Goal check: INCOMPLETE (login needs username + password + click)
284
- nextSubGoal: "Enter password and click login"
285
-
286
- Sub-action 2, Attempt 1: fill(password, 'pass123') → ✅ Success
287
- Goal check: INCOMPLETE (still need to click login button)
288
- nextSubGoal: "Click login button to submit credentials"
289
-
290
- Sub-action 3, Attempt 1: click(login button) → ✅ Success
291
- Goal check: COMPLETE (all parts of login done)
292
- ```
293
-
294
- **Step 3: Click dashboard**
295
- ```
296
- Sub-action 1, Attempt 1: click(dashboard) → ❌ Fail (not visible)
297
- 🔍 Vision check: totalFailed=1, need 2+
298
- 📝 Using DOM (1 failure, need 2+)
299
-
300
- Sub-action 1, Attempt 2: waitFor + click → ❌ Fail (still not visible)
301
- 🔍 Vision check: totalFailed=2, usedVision=false
302
- 🎯 VISION TRIGGER: 2 total failures - assessing...
303
- 💭 LLM: SCREENSHOT NEEDED ✅
304
- 📸 Taking screenshot...
305
- 👔 Supervisor analyzing...
306
- 🔨 Generating vision-aided command...
307
-
308
- Sub-action 1, Attempt 3: [vision-aided command] → ✅ Success
309
- Goal check: COMPLETE
310
- ```
311
-
312
- ## Testing Checklist
313
-
314
- - [ ] Specific values preserved (credentials, names, amounts)
315
- - [ ] Semantic actions complete fully (login includes button click)
316
- - [ ] Navigation redirects handled (no URL retry loops)
317
- - [ ] Vision triggers after 2+ failures
318
- - [ ] Vision logs show decision reasoning
319
- - [ ] Timeouts appropriate (5s default, 10s navigation)
320
- - [ ] Error context includes current URL
321
- - [ ] Failed steps don't show previous step commands
322
- - [ ] Version marker visible in logs
323
-
324
- ## Version Tracking
325
-
326
- **Current Version:** `v1.5.0-vision-preserve-values`
327
-
328
- **Version log location:**
329
- - During initialization: `[ScenarioWorker] 🚀 RUNNER-CORE VERSION: v1.5.0-vision-preserve-values`
330
- - Increment for each significant change
331
-
332
- ## Build & Deploy
333
-
334
- **Local Development:**
335
- ```bash
336
- cd /Users/nuwansam/IdeaProjects/AwareRepo/local/vs-ext
337
- ./build_local.sh
338
- ```
339
-
340
- **What it does:**
341
- 1. Builds runner-core
342
- 2. Packs runner-core (0.0.22)
343
- 3. Installs in vs-ext
344
- 4. Builds vs-ext for staging
345
-
346
- **Verification:**
347
- ```bash
348
- grep "v1.5.0-vision-preserve-values" node_modules/testchimp-runner-core/dist/scenario-worker-class.js
349
- ```
350
-
351
- ## Related Documentation
352
-
353
- - `VISION_DIAGNOSTICS_IMPROVEMENTS.md` - Vision system details
354
- - `prompts.ts` - All LLM prompts and guidance
355
- - `types.ts` - Type definitions
356
-
357
- ## Future Enhancements
358
-
359
- 1. **Learn from vision insights** - Build library of common patterns
360
- 2. **Optimize vision timing** - Better cost/benefit analysis
361
- 3. **Cross-flow learning** - Share insights between generation and repair
362
- 4. **Smarter goal parsing** - Better semantic action recognition
363
- 5. **Dynamic timeout adjustment** - Based on operation type
364
-