testchimp-runner-core 0.0.34 → 0.0.36

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (150)
  1. package/dist/execution-service.d.ts +1 -4
  2. package/dist/execution-service.d.ts.map +1 -1
  3. package/dist/execution-service.js +155 -468
  4. package/dist/execution-service.js.map +1 -1
  5. package/dist/index.d.ts +3 -1
  6. package/dist/index.d.ts.map +1 -1
  7. package/dist/index.js +11 -1
  8. package/dist/index.js.map +1 -1
  9. package/dist/orchestrator/decision-parser.d.ts +18 -0
  10. package/dist/orchestrator/decision-parser.d.ts.map +1 -0
  11. package/dist/orchestrator/decision-parser.js +127 -0
  12. package/dist/orchestrator/decision-parser.js.map +1 -0
  13. package/dist/orchestrator/index.d.ts +4 -2
  14. package/dist/orchestrator/index.d.ts.map +1 -1
  15. package/dist/orchestrator/index.js +14 -2
  16. package/dist/orchestrator/index.js.map +1 -1
  17. package/dist/orchestrator/orchestrator-agent.d.ts +17 -14
  18. package/dist/orchestrator/orchestrator-agent.d.ts.map +1 -1
  19. package/dist/orchestrator/orchestrator-agent.js +534 -204
  20. package/dist/orchestrator/orchestrator-agent.js.map +1 -1
  21. package/dist/orchestrator/orchestrator-prompts.d.ts +14 -2
  22. package/dist/orchestrator/orchestrator-prompts.d.ts.map +1 -1
  23. package/dist/orchestrator/orchestrator-prompts.js +529 -247
  24. package/dist/orchestrator/orchestrator-prompts.js.map +1 -1
  25. package/dist/orchestrator/page-som-handler.d.ts +106 -0
  26. package/dist/orchestrator/page-som-handler.d.ts.map +1 -0
  27. package/dist/orchestrator/page-som-handler.js +1353 -0
  28. package/dist/orchestrator/page-som-handler.js.map +1 -0
  29. package/dist/orchestrator/som-types.d.ts +149 -0
  30. package/dist/orchestrator/som-types.d.ts.map +1 -0
  31. package/dist/orchestrator/som-types.js +87 -0
  32. package/dist/orchestrator/som-types.js.map +1 -0
  33. package/dist/orchestrator/tool-registry.d.ts +2 -0
  34. package/dist/orchestrator/tool-registry.d.ts.map +1 -1
  35. package/dist/orchestrator/tool-registry.js.map +1 -1
  36. package/dist/orchestrator/tools/index.d.ts +4 -1
  37. package/dist/orchestrator/tools/index.d.ts.map +1 -1
  38. package/dist/orchestrator/tools/index.js +7 -2
  39. package/dist/orchestrator/tools/index.js.map +1 -1
  40. package/dist/orchestrator/tools/refresh-som-markers.d.ts +12 -0
  41. package/dist/orchestrator/tools/refresh-som-markers.d.ts.map +1 -0
  42. package/dist/orchestrator/tools/refresh-som-markers.js +64 -0
  43. package/dist/orchestrator/tools/refresh-som-markers.js.map +1 -0
  44. package/dist/orchestrator/tools/view-previous-screenshot.d.ts +15 -0
  45. package/dist/orchestrator/tools/view-previous-screenshot.d.ts.map +1 -0
  46. package/dist/orchestrator/tools/view-previous-screenshot.js +92 -0
  47. package/dist/orchestrator/tools/view-previous-screenshot.js.map +1 -0
  48. package/dist/orchestrator/types.d.ts +23 -1
  49. package/dist/orchestrator/types.d.ts.map +1 -1
  50. package/dist/orchestrator/types.js +11 -1
  51. package/dist/orchestrator/types.js.map +1 -1
  52. package/dist/scenario-service.d.ts +5 -0
  53. package/dist/scenario-service.d.ts.map +1 -1
  54. package/dist/scenario-service.js +17 -0
  55. package/dist/scenario-service.js.map +1 -1
  56. package/dist/scenario-worker-class.d.ts +4 -0
  57. package/dist/scenario-worker-class.d.ts.map +1 -1
  58. package/dist/scenario-worker-class.js +18 -3
  59. package/dist/scenario-worker-class.js.map +1 -1
  60. package/dist/testing/agent-tester.d.ts +35 -0
  61. package/dist/testing/agent-tester.d.ts.map +1 -0
  62. package/dist/testing/agent-tester.js +84 -0
  63. package/dist/testing/agent-tester.js.map +1 -0
  64. package/dist/testing/ref-translator-tester.d.ts +44 -0
  65. package/dist/testing/ref-translator-tester.d.ts.map +1 -0
  66. package/dist/testing/ref-translator-tester.js +104 -0
  67. package/dist/testing/ref-translator-tester.js.map +1 -0
  68. package/dist/utils/hierarchical-selector.d.ts +47 -0
  69. package/dist/utils/hierarchical-selector.d.ts.map +1 -0
  70. package/dist/utils/hierarchical-selector.js +212 -0
  71. package/dist/utils/hierarchical-selector.js.map +1 -0
  72. package/dist/utils/page-info-retry.d.ts +14 -0
  73. package/dist/utils/page-info-retry.d.ts.map +1 -0
  74. package/dist/utils/page-info-retry.js +60 -0
  75. package/dist/utils/page-info-retry.js.map +1 -0
  76. package/dist/utils/page-info-utils.d.ts +1 -0
  77. package/dist/utils/page-info-utils.d.ts.map +1 -1
  78. package/dist/utils/page-info-utils.js +46 -18
  79. package/dist/utils/page-info-utils.js.map +1 -1
  80. package/dist/utils/ref-attacher.d.ts +21 -0
  81. package/dist/utils/ref-attacher.d.ts.map +1 -0
  82. package/dist/utils/ref-attacher.js +149 -0
  83. package/dist/utils/ref-attacher.js.map +1 -0
  84. package/dist/utils/ref-translator.d.ts +49 -0
  85. package/dist/utils/ref-translator.d.ts.map +1 -0
  86. package/dist/utils/ref-translator.js +276 -0
  87. package/dist/utils/ref-translator.js.map +1 -0
  88. package/package.json +6 -1
  89. package/RELEASE_0.0.26.md +0 -165
  90. package/RELEASE_0.0.27.md +0 -236
  91. package/RELEASE_0.0.28.md +0 -286
  92. package/plandocs/BEFORE_AFTER_VERIFICATION.md +0 -148
  93. package/plandocs/COORDINATE_MODE_DIAGNOSIS.md +0 -144
  94. package/plandocs/CREDIT_CALLBACK_ARCHITECTURE.md +0 -253
  95. package/plandocs/HUMAN_LIKE_IMPROVEMENTS.md +0 -642
  96. package/plandocs/IMPLEMENTATION_STATUS.md +0 -108
  97. package/plandocs/INTEGRATION_COMPLETE.md +0 -322
  98. package/plandocs/MULTI_AGENT_ARCHITECTURE_REVIEW.md +0 -844
  99. package/plandocs/ORCHESTRATOR_MVP_SUMMARY.md +0 -539
  100. package/plandocs/PHASE1_ABSTRACTION_COMPLETE.md +0 -241
  101. package/plandocs/PHASE1_FINAL_STATUS.md +0 -210
  102. package/plandocs/PHASE_1_COMPLETE.md +0 -165
  103. package/plandocs/PHASE_1_SUMMARY.md +0 -184
  104. package/plandocs/PLANNING_SESSION_SUMMARY.md +0 -372
  105. package/plandocs/PROMPT_OPTIMIZATION_ANALYSIS.md +0 -120
  106. package/plandocs/PROMPT_SANITY_CHECK.md +0 -120
  107. package/plandocs/SCRIPT_CLEANUP_FEATURE.md +0 -201
  108. package/plandocs/SCRIPT_GENERATION_ARCHITECTURE.md +0 -364
  109. package/plandocs/SELECTOR_IMPROVEMENTS.md +0 -139
  110. package/plandocs/SESSION_SUMMARY_v0.0.33.md +0 -151
  111. package/plandocs/TROUBLESHOOTING_SESSION.md +0 -72
  112. package/plandocs/VISION_DIAGNOSTICS_IMPROVEMENTS.md +0 -336
  113. package/plandocs/VISUAL_AGENT_EVOLUTION_PLAN.md +0 -396
  114. package/plandocs/WHATS_NEW_v0.0.33.md +0 -183
  115. package/src/auth-config.ts +0 -84
  116. package/src/credit-usage-service.ts +0 -188
  117. package/src/env-loader.ts +0 -103
  118. package/src/execution-service.ts +0 -1413
  119. package/src/file-handler.ts +0 -104
  120. package/src/index.ts +0 -422
  121. package/src/llm-facade.ts +0 -821
  122. package/src/llm-provider.ts +0 -53
  123. package/src/model-constants.ts +0 -35
  124. package/src/orchestrator/index.ts +0 -34
  125. package/src/orchestrator/orchestrator-agent.ts +0 -862
  126. package/src/orchestrator/orchestrator-agent.ts.backup +0 -1386
  127. package/src/orchestrator/orchestrator-prompts.ts +0 -474
  128. package/src/orchestrator/tool-registry.ts +0 -182
  129. package/src/orchestrator/tools/check-page-ready.ts +0 -75
  130. package/src/orchestrator/tools/extract-data.ts +0 -92
  131. package/src/orchestrator/tools/index.ts +0 -12
  132. package/src/orchestrator/tools/inspect-page.ts +0 -42
  133. package/src/orchestrator/tools/recall-history.ts +0 -72
  134. package/src/orchestrator/tools/take-screenshot.ts +0 -128
  135. package/src/orchestrator/tools/verify-action-result.ts +0 -159
  136. package/src/orchestrator/types.ts +0 -248
  137. package/src/playwright-mcp-service.ts +0 -224
  138. package/src/progress-reporter.ts +0 -144
  139. package/src/prompts.ts +0 -842
  140. package/src/providers/backend-proxy-llm-provider.ts +0 -91
  141. package/src/providers/local-llm-provider.ts +0 -38
  142. package/src/scenario-service.ts +0 -232
  143. package/src/scenario-worker-class.ts +0 -1089
  144. package/src/script-utils.ts +0 -203
  145. package/src/types.ts +0 -239
  146. package/src/utils/browser-utils.ts +0 -348
  147. package/src/utils/coordinate-converter.ts +0 -162
  148. package/src/utils/page-info-utils.ts +0 -250
  149. package/testchimp-runner-core-0.0.33.tgz +0 -0
  150. package/tsconfig.json +0 -19
package/RELEASE_0.0.27.md DELETED
@@ -1,236 +0,0 @@
1
- # Release 0.0.27 - Clean Logs and Credit Callback Architecture
2
-
3
- ## Summary
4
- Major cleanup of logging architecture with timestamps moved to consumers, minimal initialization logs, credit callback support for server-side integration, and version read from package.json.
5
-
6
- ## Changes in 0.0.27
7
-
8
- ### 1. Timestamps at Consumer Level
9
- **Architecture Fix:**
10
- - ❌ Before: runner-core added timestamps
11
- - ✅ After: Consumer adds timestamps in their timezone
12
-
13
- **runner-core:**
14
- - Removed all timestamp formatting
15
- - Reports raw messages via callbacks
16
-
17
- **vs-extension:**
18
- - Added `formatLocalTimestamp()` utility function
19
- - Wraps outputChannel to add timestamps automatically
20
- - Format: `HH:MM:SS.mmm` in local timezone
21
-
22
- ### 2. Minimal Initialization Logs
23
- **Before:**
24
- ```
25
- 🤖 Initializing Orchestrator Mode
26
- ✓ Orchestrator initialized with 5 tools (DEBUG MODE)
27
- ═══════════════════════════════════════════════════════
28
- 🚀 RUNNER-CORE VERSION: v1.5.0-vision-preserve-values
29
- ═══════════════════════════════════════════════════════
30
- Initializing Scenario worker...
31
- Scenario worker initialized with session: scenario_worker_XXX
32
- Scenario worker initialized (Orchestrator Mode) with session...
33
- Scenario service initialized
34
- ```
35
-
36
- **After:**
37
- ```
38
- testchimp-runner-core v0.0.27
39
- 📋 Processing scenario: [scenario description]
40
- ```
41
-
42
- **Changes:**
43
- - Single version log (reads from package.json)
44
- - No internal initialization details
45
- - Clean, minimal output
46
-
47
- ### 3. Credit Callback Architecture
48
- **Purpose:** Allow server-side integration to update DB directly without axios calls
49
-
50
- **Implementation:**
51
- ```typescript
52
- export interface CreditUsage {
53
- credits: number;
54
- usageReason: CreditUsageReason;
55
- jobId?: string;
56
- timestamp: number;
57
- }
58
-
59
- export type CreditUsageCallback = (usage: CreditUsage) => void | Promise<void>;
60
- ```
61
-
62
- **Behavior:**
63
- 1. **If callback provided** (server-side):
64
- - ✅ Call callback → Direct DB update
65
- - ❌ NO axios calls made
66
-
67
- 2. **If NO callback but auth configured** (client-side):
68
- - ❌ No callback
69
- - ✅ Makes axios call to backend API
70
-
71
- 3. **If neither** (development):
72
- - Logs warning
73
- - Continues without tracking
74
-
75
- **Usage:**
76
- ```typescript
77
- // Server-side
78
- const service = new TestChimpService(
79
- fileHandler, undefined, backendUrl, maxWorkers,
80
- llmProvider, progressReporter, orchestratorOptions,
81
- async (creditUsage) => {
82
- await db.insertCreditUsage(creditUsage); // Direct DB
83
- }
84
- );
85
-
86
- // Client-side (vs-ext, github-action)
87
- const service = new TestChimpService(
88
- fileHandler, authConfig, backendUrl
89
- // No callback - uses axios
90
- );
91
- ```
92
-
93
- ### 4. Wrapped OutputChannel
94
- **Problem:** runner-core wrote directly to outputChannel without timestamps
95
-
96
- **Solution:** vs-extension wraps the outputChannel:
97
- ```typescript
98
- const wrappedChannel = {
99
- appendLine: (message) => {
100
- const timestamp = formatLocalTimestamp();
101
- const shouldLog = isDev || !isVerboseDebugLog;
102
- if (shouldLog) {
103
- this.outputChannel?.appendLine(`[${timestamp}] ${message}`);
104
- }
105
- }
106
- };
107
- service.setOutputChannel(wrappedChannel);
108
- ```
109
-
110
- **Result:**
111
- - ✅ All logs get timestamps
112
- - ✅ Filtering applied
113
- - ✅ Consumer controls presentation
114
-
115
- ### 5. Dynamic Version in build_local.sh
116
- **Before:**
117
- ```bash
118
- cp testchimp-runner-core-0.0.22.tgz . # Hardcoded!
119
- ```
120
-
121
- **After:**
122
- ```bash
123
- RUNNER_VERSION=$(node -p "require('./package.json').version")
124
- TARBALL_NAME="testchimp-runner-core-${RUNNER_VERSION}.tgz"
125
- cp "$PARENT_DIR/runner-core/${TARBALL_NAME}" .
126
- ```
127
-
128
- **Benefit:** Always uses correct version, no manual updates needed
129
-
130
- ## New Exports
131
-
132
- ```typescript
133
- export {
134
- // Credit usage types
135
- CreditUsageCallback,
136
- CreditUsage,
137
- CreditUsageReason,
138
-
139
- // Existing exports...
140
- TestChimpService,
141
- // ...
142
- }
143
- ```
144
-
145
- ## Files Modified
146
-
147
- ### runner-core
148
- 1. `/src/scenario-worker-class.ts` - Removed timestamps, minimal logs, version from package.json
149
- 2. `/src/scenario-service.ts` - Removed initialization logs
150
- 3. `/src/credit-usage-service.ts` - Added callback architecture, callback-first logic
151
- 4. `/src/index.ts` - Credit callback support, preserve across recreations
152
-
153
- ### vs-extension
154
- 1. `/src/embedded-service.ts` - `formatLocalTimestamp()` utility, wrapped outputChannel, consistent filtering
155
- 2. `/build_local.sh` - Dynamic version detection
156
- 3. `/package.json` - Updated to `^0.0.27`
157
-
158
- ### github-action
159
- 1. `/package.json` - Updated to `^0.0.27`
160
-
161
- ## Published to npm
162
-
163
- ```
164
- ✅ Published: testchimp-runner-core@0.0.27
165
- 📦 Package Size: ~245 kB
166
- 📋 Registry: https://registry.npmjs.org/
167
- ```
168
-
169
- ## Benefits
170
-
171
- ### Logging
172
- 1. **Local Timezone** - Timestamps match user's clock
173
- 2. **Clean Output** - Only essential information
174
- 3. **Consumer Control** - Consumer decides format and filtering
175
- 4. **No Verbose Init** - Single version log instead of 10+ lines
176
-
177
- ### Credit Tracking
178
- 1. **Server-Side** - Direct DB updates, no HTTP overhead
179
- 2. **Client-Side** - Existing axios behavior preserved
180
- 3. **Flexible** - Each consumer decides how to track
181
- 4. **Observable** - Callback provides visibility
182
-
183
- ### Architecture
184
- 1. **Separation of Concerns** - Library reports, consumer presents
185
- 2. **Environment Agnostic** - Library doesn't assume timezone/environment
186
- 3. **Testable** - Easy to mock callbacks
187
- 4. **Maintainable** - Clear boundaries
188
-
189
- ## Migration
190
-
191
- ### For All Consumers
192
- Update package.json:
193
- ```json
194
- {
195
- "dependencies": {
196
- "testchimp-runner-core": "^0.0.27"
197
- }
198
- }
199
- ```
200
-
201
- ### For Server-Side (Optional)
202
- Add credit callback:
203
- ```typescript
204
- const service = new TestChimpService(
205
- fileHandler, undefined, backendUrl, maxWorkers,
206
- llmProvider, progressReporter, orchestratorOptions,
207
- async (creditUsage) => {
208
- await creditRepository.insert(creditUsage);
209
- }
210
- );
211
- ```
212
-
213
- ## Backward Compatibility
214
-
215
- ✅ Fully backward compatible
216
- - All parameters optional
217
- - Existing behavior preserved
218
- - No breaking changes
219
-
220
- ## Complete Feature Set
221
-
222
- All improvements from today are now in 0.0.27:
223
- 1. ✅ Semantic selector preference
224
- 2. ✅ Playwright expect() assertions
225
- 3. ✅ Script cleanup feature
226
- 4. ✅ Fixed comment placement
227
- 5. ✅ Focused step execution
228
- 6. ✅ Orchestrator reasoning logs in output channel
229
- 7. ✅ Environment-aware log filtering (consumer-side)
230
- 8. ✅ Local timezone timestamps
231
- 9. ✅ Clean initialization logs
232
- 10. ✅ Credit callback architecture
233
- 11. ✅ Version from package.json
234
-
235
- Ready for production! 🚀
236
-
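The credit-tracking behavior described in RELEASE_0.0.27.md above (use the callback if one is provided, otherwise call the backend when auth is configured, otherwise warn and continue) can be sketched as below. This is a minimal illustration under stated assumptions, not the package's implementation: `CreditUsage` and `CreditUsageCallback` follow the release notes, while `reportCreditUsage`, `ReporterOptions`, `ReportRoute`, `postToBackend`, and the `string` alias for `CreditUsageReason` are hypothetical names introduced here.

```typescript
// Assumption: the real CreditUsageReason type is not shown in the release notes.
type CreditUsageReason = string;

interface CreditUsage {
  credits: number;
  usageReason: CreditUsageReason;
  jobId?: string;
  timestamp: number;
}

type CreditUsageCallback = (usage: CreditUsage) => void | Promise<void>;

// Hypothetical: which path handled the usage report.
type ReportRoute = "callback" | "http" | "skipped";

// Hypothetical options bag for this sketch.
interface ReporterOptions {
  // Server-side consumers pass this to write to their own store.
  callback?: CreditUsageCallback;
  // Stand-in for the HTTP call to the backend API; set only when auth is configured.
  postToBackend?: (usage: CreditUsage) => Promise<void>;
  // Where the "not tracked" warning goes; defaults to console.warn.
  warn?: (message: string) => void;
}

async function reportCreditUsage(
  usage: CreditUsage,
  options: ReporterOptions
): Promise<ReportRoute> {
  // 1. Callback provided: call it and make no HTTP request.
  if (options.callback) {
    await options.callback(usage);
    return "callback";
  }
  // 2. No callback, but a backend call is available.
  if (options.postToBackend) {
    await options.postToBackend(usage);
    return "http";
  }
  // 3. Neither: warn and continue without tracking.
  (options.warn ?? console.warn)(
    "No credit callback or auth configured; credit usage not tracked"
  );
  return "skipped";
}
```

Returning the chosen route is a design choice made for this sketch only; it makes the callback-first ordering easy to check in a unit test with mocked callbacks.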
package/RELEASE_0.0.28.md DELETED
@@ -1,286 +0,0 @@
1
- # Runner-Core 0.0.28 Release - Scriptservice Integration
2
-
3
- ## Overview
4
-
5
- Extended runner-core with lifecycle callbacks and created SmartTestRunnerCoreV2 wrapper to enable scriptservice integration. This release makes runner-core a drop-in replacement for scriptservice's duplicated SmartTestRunnerCore.
6
-
7
- ## Changes
8
-
9
- ### 1. Extended ProgressReporter with Lifecycle Callbacks
10
-
11
- **File:** `src/progress-reporter.ts`
12
-
13
- Added optional lifecycle callbacks to ProgressReporter interface:
14
-
15
- ```typescript
16
- export interface StepInfo {
17
- stepId?: string;
18
- stepNumber: number;
19
- description: string;
20
- code?: string;
21
- }
22
-
23
- export interface ProgressReporter {
24
- // ... existing callbacks ...
25
-
26
- // NEW: Lifecycle callbacks (used by scriptservice, ignored by local clients)
27
- beforeStartTest?(page: any, browser: any, context: any): Promise<void>;
28
- beforeStepStart?(step: StepInfo, page: any): Promise<void>;
29
- afterEndTest?(status: 'passed' | 'failed', error?: string, page?: any): Promise<void>;
30
- }
31
- ```
32
-
33
- **Purpose:**
34
- - `beforeStartTest`: Initialize browser context, set up DB records (scriptservice only)
35
- - `beforeStepStart`: Update step status to IN_PROGRESS in DB (scriptservice only)
36
- - `afterEndTest`: Write final status to DB, cleanup resources (scriptservice only)
37
- - Local clients (vs-ext, github-action) can ignore these - they use return values
38
-
39
- ### 2. Integrated Lifecycle Callbacks
40
-
41
- **Files:**
42
- - `src/execution-service.ts` - RUN_EXACTLY and AI_REPAIR modes
43
- - `src/scenario-worker-class.ts` - Orchestrator mode
44
-
45
- **Integration Points:**
46
-
47
- **ExecutionService (RUN_EXACTLY):**
48
- ```typescript
49
- // Before script execution
50
- if (this.progressReporter?.beforeStartTest) {
51
- await this.progressReporter.beforeStartTest(page, browser, context);
52
- }
53
-
54
- // After execution (success or failure)
55
- if (this.progressReporter?.afterEndTest) {
56
- await this.progressReporter.afterEndTest(
57
- success ? 'passed' : 'failed',
58
- error,
59
- page
60
- );
61
- }
62
- ```
63
-
64
- **ExecutionService (AI_REPAIR):**
65
- ```typescript
66
- // Before each step
67
- if (this.progressReporter?.beforeStepStart) {
68
- await this.progressReporter.beforeStepStart(
69
- { stepNumber, description, code },
70
- page
71
- );
72
- }
73
- ```
74
-
75
- **ScenarioWorker (Orchestrator):**
76
- ```typescript
77
- // After browser initialization
78
- if (this.progressReporter?.beforeStartTest) {
79
- await this.progressReporter.beforeStartTest(page, browser, context);
80
- }
81
-
82
- // Before each orchestrator step
83
- if (this.progressReporter?.beforeStepStart) {
84
- await this.progressReporter.beforeStepStart({ stepNumber, description }, page);
85
- }
86
-
87
- // In finally block (before browser close)
88
- if (this.progressReporter?.afterEndTest) {
89
- await this.progressReporter.afterEndTest(
90
- overallSuccess ? 'passed' : 'failed',
91
- error,
92
- page
93
- );
94
- }
95
- ```
96
-
97
- ### 3. Version Bump
98
-
99
- **File:** `package.json`
100
-
101
- Updated version from `0.0.27` to `0.0.28`
102
-
103
- ## Scriptservice Integration
104
-
105
- ### 1. Created ScriptserviceLLMProvider
106
-
107
- **File:** `services/scriptservice/providers/scriptservice-llm-provider.ts` (NEW)
108
-
109
- Implements `LLMProvider` interface for scriptservice:
110
- - Wraps scriptservice's existing OpenAI client
111
- - Supports both text and vision prompts
112
- - Uses scriptservice's OpenAI API key from ConfigService
113
- - No backend proxy - all calls are local
114
- - No token tracking needed (scriptservice doesn't use this interface for tracking)
115
-
116
- ### 2. Created SmartTestRunnerCoreV2
117
-
118
- **File:** `services/scriptservice/smart-test-runner-core-v2.ts` (NEW)
119
-
120
- Drop-in replacement for SmartTestRunnerCore:
121
- - **Same Interface:** Maintains exact same constructor and methods (runExactly, runWithRepair)
122
- - **Uses runner-core:** Delegates to TestChimpService internally
123
- - **Callback Mapping:** Maps scriptservice callbacks to runner-core ProgressReporter
124
- - **Lifecycle Support:** Properly calls beforeStartTest, beforeStepStart, afterEndTest
125
- - **No Breaking Changes:** Can replace SmartTestRunnerCore with zero code changes
126
-
127
- **Key Features:**
128
- ```typescript
129
- export class SmartTestRunnerCoreV2 {
130
- constructor(config: RunnerConfig) {
131
- // Create local LLM provider (no backend)
132
- const llmProvider = new ScriptserviceLLMProvider();
133
-
134
- // Map callbacks to progress reporter
135
- const progressReporter: ProgressReporter = {
136
- onStepProgress: async (stepProgress) => {
137
- // Call scriptservice's onStepComplete
138
- },
139
- beforeStartTest: config.callbacks?.beforeStartTest,
140
- beforeStepStart: config.callbacks?.beforeStepStart,
141
- afterEndTest: config.callbacks?.afterEndTest
142
- };
143
-
144
- // Initialize runner-core (no auth, no backend)
145
- this.runnerCore = new TestChimpService(
146
- undefined, // No file handler
147
- undefined, // No auth
148
- undefined, // No backend URL
149
- 1, // Single worker
150
- llmProvider,
151
- progressReporter
152
- );
153
- }
154
-
155
- async runExactly(script: string): Promise<RunExactlyResult> { /* ... */ }
156
- async runWithRepair(steps: TestStepWithId[]): Promise<RunWithRepairResult> { /* ... */ }
157
- }
158
- ```
159
-
160
- ### 3. Updated Scriptservice Consumers
161
-
162
- **Files Updated:**
163
- - `services/scriptservice/smart-test-execution-handler.ts`
164
- - `services/scriptservice/workers/test-based-explorer.ts`
165
- - `services/scriptservice/package.json` - Added `testchimp-runner-core: ^0.0.28`
166
-
167
- **Migration Strategy:**
168
- ```typescript
169
- // Environment flag for gradual rollout
170
- const useV2 = process.env.USE_RUNNER_CORE_V2 !== 'false'; // Default: true
171
- const runner = useV2
172
- ? new SmartTestRunnerCoreV2(runnerConfig)
173
- : new SmartTestRunnerCore(runnerConfig);
174
- ```
175
-
176
- **Rollout Plan:**
177
- 1. Deploy with `USE_RUNNER_CORE_V2=true` (default)
178
- 2. Monitor scriptservice execution logs
179
- 3. If issues occur, set `USE_RUNNER_CORE_V2=false` to rollback
180
- 4. Once validated, remove flag and delete old SmartTestRunnerCore
181
-
182
- ## Benefits
183
-
184
- ### For Scriptservice:
185
- 1. **Zero Axios Calls:** All LLM calls local, all DB writes via callbacks
186
- 2. **Auto-Updates:** Automatically get runner-core improvements (better prompts, vision, orchestrator)
187
- 3. **Code Deduplication:** Remove ~600 lines of duplicated execution/repair logic
188
- 4. **Consistency:** Same prompts and logic as vs-extension and github-action
189
-
190
- ### For Runner-Core:
191
- 1. **Universal Library:** Works for both client-side (vs-ext) and server-side (scriptservice)
192
- 2. **Flexible Architecture:** Callbacks allow different execution patterns
193
- 3. **Backward Compatible:** Existing consumers unaffected (callbacks are optional)
194
-
195
- ## Testing
196
-
197
- ### Scriptservice Testing:
198
- ```bash
199
- # Test with V2 (default)
200
- USE_RUNNER_CORE_V2=true npm start
201
-
202
- # Test with V1 (legacy fallback)
203
- USE_RUNNER_CORE_V2=false npm start
204
- ```
205
-
206
- ### Expected Behavior:
207
- - V2: Logs show "using SmartTestRunnerCoreV2 (runner-core)"
208
- - V1: Logs show "using SmartTestRunnerCore (legacy)"
209
- - All callbacks (beforeStartTest, beforeStepStart, afterEndTest, onStepComplete) should fire
210
- - DB writes should happen via callbacks
211
- - Screenshots should upload to GCS
212
- - Journey reporting should work
213
-
214
- ## Publishing
215
-
216
- ### Prerequisites:
217
- 1. ✅ Build runner-core: `npm run build`
218
- 2. ✅ No lint errors
219
- 3. ✅ Version bumped to 0.0.28
220
- 4. ⏸️ **Awaiting user permission to publish to npm**
221
-
222
- ### Publish Command (when approved):
223
- ```bash
224
- cd /Users/nuwansam/IdeaProjects/AwareRepo/local/runner-core
225
- npm publish
226
- ```
227
-
228
- ### Post-Publish:
229
- 1. Install in scriptservice: `npm install testchimp-runner-core@0.0.28`
230
- 2. Install in vs-ext: Update to 0.0.28
231
- 3. Install in github-action: Update to 0.0.28
232
-
233
- ## Migration Checklist
234
-
235
- ### Phase 1: Deploy with V2 (Default)
236
- - [x] Add lifecycle callbacks to ProgressReporter
237
- - [x] Integrate callbacks in ExecutionService and ScenarioService
238
- - [x] Create ScriptserviceLLMProvider
239
- - [x] Create SmartTestRunnerCoreV2
240
- - [x] Update scriptservice consumers with environment flag
241
- - [ ] Publish runner-core 0.0.28 (needs user permission)
242
- - [ ] Install 0.0.28 in scriptservice
243
- - [ ] Deploy scriptservice with USE_RUNNER_CORE_V2=true
244
- - [ ] Monitor logs and execution results
245
-
246
- ### Phase 2: Validate (1-2 weeks)
247
- - [ ] Verify all smart test executions work
248
- - [ ] Verify explorer mode works
249
- - [ ] Verify script generation works
250
- - [ ] Check DB writes happen correctly
251
- - [ ] Check screenshot uploads work
252
- - [ ] Monitor for any errors or regressions
253
-
254
- ### Phase 3: Full Migration
255
- - [ ] Remove environment flag check
256
- - [ ] Replace all `new SmartTestRunnerCore` with `new SmartTestRunnerCoreV2`
257
- - [ ] Delete old `smart-test-runner-core.ts` file
258
- - [ ] Optional: Rename V2 to just `SmartTestRunnerCore`
259
-
260
- ## Breaking Changes
261
-
262
- None. All changes are backward compatible.
263
-
264
- ## Files Changed
265
-
266
- ### Runner-Core:
267
- - `src/progress-reporter.ts` - Added lifecycle callbacks
268
- - `src/execution-service.ts` - Integrated callbacks
269
- - `src/scenario-worker-class.ts` - Integrated callbacks
270
- - `package.json` - Version bump to 0.0.28
271
-
272
- ### Scriptservice:
273
- - `providers/scriptservice-llm-provider.ts` - NEW
274
- - `smart-test-runner-core-v2.ts` - NEW
275
- - `smart-test-execution-handler.ts` - Added V2 usage with flag
276
- - `workers/test-based-explorer.ts` - Added V2 usage with flag
277
- - `package.json` - Added testchimp-runner-core dependency
278
-
279
- ## Notes
280
-
281
- - Lifecycle callbacks are **optional** - local clients (vs-ext, github-action) don't need them
282
- - Scriptservice uses callbacks for DB writes and resource management
283
- - V2 wrapper uses existing page/browser/context (doesn't create new ones)
284
- - LLM calls in V2 are all local (no backend proxy)
285
- - No authentication needed for scriptservice (already authenticated at service level)
286
-
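The lifecycle ordering described in RELEASE_0.0.28.md above (`beforeStartTest` once, `beforeStepStart` before each step, `afterEndTest` in a `finally` block) can be sketched as below. This is a hedged illustration rather than the runner-core code: `StepInfo` and the three callback names come from the release notes, while `LifecycleReporter`, `RunnableStep`, and `runWithLifecycle` are hypothetical, and `unknown` replaces the `any` parameters shown in the notes.

```typescript
interface StepInfo {
  stepId?: string;
  stepNumber: number;
  description: string;
  code?: string;
}

// Only the lifecycle part of ProgressReporter; all callbacks are optional.
interface LifecycleReporter {
  beforeStartTest?(page: unknown, browser: unknown, context: unknown): Promise<void>;
  beforeStepStart?(step: StepInfo, page: unknown): Promise<void>;
  afterEndTest?(status: "passed" | "failed", error?: string, page?: unknown): Promise<void>;
}

// Hypothetical: a step plus the function that executes it.
interface RunnableStep extends StepInfo {
  run: () => Promise<void>;
}

async function runWithLifecycle(
  steps: RunnableStep[],
  reporter: LifecycleReporter,
  page: unknown,
  browser: unknown,
  context: unknown
): Promise<"passed" | "failed"> {
  let status: "passed" | "failed" = "passed";
  let error: string | undefined;
  try {
    // Optional call: reporters without lifecycle callbacks are simply skipped.
    await reporter.beforeStartTest?.(page, browser, context);
    for (const step of steps) {
      await reporter.beforeStepStart?.(step, page);
      await step.run();
    }
  } catch (e) {
    // The first failing step stops the run and records its message.
    status = "failed";
    error = e instanceof Error ? e.message : String(e);
  } finally {
    // Runs on success and on failure, before the caller closes the browser.
    await reporter.afterEndTest?.(status, error, page);
  }
  return status;
}
```

Because every callback is invoked through optional chaining, a reporter that defines none of them behaves exactly like the pre-0.0.28 flow, which matches the backward-compatibility claim in the notes.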
package/plandocs/BEFORE_AFTER_VERIFICATION.md DELETED
@@ -1,148 +0,0 @@
1
- # Before/After Screenshot Verification
2
-
3
- ## Feature: Visual Goal Verification for Coordinate Actions
4
-
5
- ### Problem Solved:
6
- When using coordinate-based actions (clicking at x,y%), the agent has no way to know if the click achieved the goal:
7
- - No element reference to check state
8
- - No selector feedback
9
- - Can't verify if expected page loaded or modal opened
10
-
11
- This led to:
12
- - False positives (click succeeded but goal not achieved)
13
- - Infinite loops (agent keeps clicking, unsure if it worked)
14
-
15
- ### Solution:
16
- Automatic before/after screenshot comparison after coordinate clicks.
17
-
18
- ## How It Works:
19
-
20
- ### 1. **Automatic Trigger** (No Agent Action Required)
21
- When agent uses coordinate action:
22
- ```typescript
23
- Iteration 4: 🎯 Coordinate mode activated
24
- Step 1: Capture BEFORE screenshot
25
- Step 2: Execute coordinate click (x%, y%)
26
- Step 3: Wait 1000ms for UI to settle
27
- Step 4: Capture AFTER screenshot
28
- Step 5: Call LLM with both images (labeled "BEFORE", "AFTER")
29
- Step 6: LLM responds: { goalAchieved: true/false, reasoning: "..." }
30
- Step 7a: If TRUE → Mark complete, exit step ✅
31
- Step 7b: If FALSE → Continue to next iteration, try different coordinates
32
- ```
33
-
34
- ### 2. **LLM Prompt for Verification**
35
- ```
36
- Goal: [Current step goal]
37
-
38
- Compare the BEFORE and AFTER screenshots.
39
-
40
- Did the action achieve the goal? Respond with JSON:
41
- {
42
- "goalAchieved": boolean,
43
- "reasoning": "What changed (or didn't change)",
44
- "visibleChanges": ["List of UI changes observed"]
45
- }
46
-
47
- Focus on:
48
- - Did expected elements appear/disappear?
49
- - Did page navigate or content change?
50
- - Visual indicators of success (new panels, forms, highlights)?
51
-
52
- Be strict: Only return true if you clearly see the expected change.
53
- ```
54
-
55
- ### 3. **Multi-Image LLM Interface**
56
- ```typescript
57
- // NEW: LabeledImage interface
58
- export interface LabeledImage {
59
- label: string; // "Before", "After", etc.
60
- dataUrl: string; // Base64 data URL
61
- }
62
-
63
- // UPDATED: LLMRequest
64
- export interface LLMRequest {
65
- imageUrl?: string; // Backward compatible (single image)
66
- images?: LabeledImage[]; // NEW - multi-image support
67
- }
68
- ```
69
-
70
- ### 4. **Provider Implementation** (scriptservice-llm-provider.ts)
71
- ```typescript
72
- if (request.images && request.images.length > 0) {
73
- for (const img of request.images) {
74
- contentParts.push({ type: 'text', text: `\n[${img.label}]:` });
75
- contentParts.push({ type: 'image_url', image_url: { url: img.dataUrl } });
76
- }
77
- // Sends: [BEFORE]: <image1>, [AFTER]: <image2>
78
- }
79
- ```
80
-
81
- ## When Verification Happens:
82
-
83
- ✅ **Always**: After first coordinate action attempt
84
- ❌ **Never**: After selector-based actions (have element state to check)
85
- ⚠️ **Conditional**: Can add for other scenarios where goal verification is unclear
86
-
87
- ## Cost Considerations:
88
-
89
- **Per verification call:**
90
- - 2 viewport screenshots (~50-100KB each)
91
- - Vision model (gpt-5-mini): ~$0.001 per call
92
- - Used only when coordinate mode activates (after 3 selector failures)
93
-
94
- **Typical scenario:**
95
- - Steps 1-10: Regular selectors → No verification cost
96
- - Step 5 gets stuck → Coordinate mode → 1 verification call → $0.001
97
- - Overall impact: Minimal, used sparingly
98
-
99
- ## Example Flow:
100
-
101
- **Step 5: "Select Employee Information"**
102
- ```
103
- Iteration 1: getByText('Employee Information') → Strict mode ❌
104
- Iteration 2: locator('#collapse-1').getByText('Employee Information') → Click succeeds ✅
105
- BUT: Didn't navigate to Employee Information page (false positive)
106
-
107
- Iteration 3: Selector fails again
108
- Iteration 4: 🎯 Coordinate mode
109
- → BEFORE: Homepage with sidebar
110
- → Click at (19.3%, 22.9%)
111
- → Wait 1s
112
- → AFTER: Check screenshot
113
- → LLM: "goalAchieved": true, "reasoning": "Employee Information page loaded with form"
114
- → ✅ Mark complete, exit
115
- ```
116
-
117
- ## Backward Compatibility:
118
-
119
- ✅ **Single image still works:**
120
- ```typescript
121
- const request = {
122
- imageUrl: 'data:image/png;base64,...' // Old way
123
- };
124
- ```
125
-
126
- ✅ **Multi-image NEW:**
127
- ```typescript
128
- const request = {
129
- images: [
130
- { label: 'BEFORE', dataUrl: '...' },
131
- { label: 'AFTER', dataUrl: '...' }
132
- ]
133
- };
134
- ```
135
-
136
- ## Files Modified:
137
-
138
- 1. `runner-core/src/llm-provider.ts` - Added LabeledImage interface and images field
139
- 2. `scriptservice/providers/scriptservice-llm-provider.ts` - Handle multiple images in OpenAI API
140
- 3. `runner-core/src/orchestrator/orchestrator-agent.ts` - Added verifyGoalWithScreenshotComparison method
141
- 4. Automatic trigger after coordinate actions
142
-
143
- ## Next Steps:
144
-
145
- - ✅ Infrastructure ready
146
- - ⏳ Need to test with real scenario
147
- - 🔮 Future: Could expose as agent-callable tool if needed
148
-
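The before/after verification described in the plan document above sends labeled images to the vision model and expects a JSON verdict back. The sketch below shows one way the two ends could fit together; it is an assumption-laden illustration, not the package's code. `LabeledImage` and the `goalAchieved` / `reasoning` / `visibleChanges` fields come from the document, while `ContentPart`, `buildContentParts`, `GoalVerification`, and `parseGoalVerification` are hypothetical names, and treating an unparseable reply as "goal not achieved" is a choice made here to match the prompt's "be strict" instruction.

```typescript
interface LabeledImage {
  label: string;   // "BEFORE", "AFTER", etc.
  dataUrl: string; // Base64 data URL
}

// Hypothetical shape mirroring the text / image_url parts shown in the provider snippet.
type ContentPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } };

// Builds the message content: the prompt, then a label followed by each image.
function buildContentParts(prompt: string, images: LabeledImage[]): ContentPart[] {
  const parts: ContentPart[] = [{ type: "text", text: prompt }];
  for (const img of images) {
    parts.push({ type: "text", text: `\n[${img.label}]:` });
    parts.push({ type: "image_url", image_url: { url: img.dataUrl } });
  }
  return parts;
}

interface GoalVerification {
  goalAchieved: boolean;
  reasoning: string;
  visibleChanges: string[];
}

// Parses the model's JSON reply; anything malformed counts as "not achieved".
function parseGoalVerification(raw: string): GoalVerification {
  const fallback: GoalVerification = {
    goalAchieved: false,
    reasoning: "Unparseable verification response",
    visibleChanges: [],
  };
  try {
    const parsed: unknown = JSON.parse(raw);
    if (typeof parsed !== "object" || parsed === null) return fallback;
    const record = parsed as Record<string, unknown>;
    const changes = record["visibleChanges"];
    return {
      // Only a literal `true` counts as success.
      goalAchieved: record["goalAchieved"] === true,
      reasoning: typeof record["reasoning"] === "string" ? record["reasoning"] : "",
      visibleChanges: Array.isArray(changes)
        ? changes.filter((c): c is string => typeof c === "string")
        : [],
    };
  } catch {
    return fallback;
  }
}
```

With two images labeled `BEFORE` and `AFTER`, `buildContentParts` yields five parts (prompt, label, image, label, image), which corresponds to the `[BEFORE]: <image1>, [AFTER]: <image2>` layout noted in the provider snippet.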