testchimp-runner-core 0.0.35 → 0.0.37

Files changed (81)
  1. package/dist/orchestrator/orchestrator-agent.d.ts.map +1 -1
  2. package/dist/orchestrator/orchestrator-agent.js +7 -4
  3. package/dist/orchestrator/orchestrator-agent.js.map +1 -1
  4. package/dist/orchestrator/orchestrator-prompts.d.ts.map +1 -1
  5. package/dist/orchestrator/orchestrator-prompts.js +73 -15
  6. package/dist/orchestrator/orchestrator-prompts.js.map +1 -1
  7. package/dist/orchestrator/page-som-handler.d.ts +1 -2
  8. package/dist/orchestrator/page-som-handler.d.ts.map +1 -1
  9. package/dist/orchestrator/page-som-handler.js +51 -25
  10. package/dist/orchestrator/page-som-handler.js.map +1 -1
  11. package/package.json +6 -1
  12. package/plandocs/BEFORE_AFTER_VERIFICATION.md +0 -148
  13. package/plandocs/COORDINATE_MODE_DIAGNOSIS.md +0 -144
  14. package/plandocs/CREDIT_CALLBACK_ARCHITECTURE.md +0 -253
  15. package/plandocs/HUMAN_LIKE_IMPROVEMENTS.md +0 -642
  16. package/plandocs/IMPLEMENTATION_STATUS.md +0 -108
  17. package/plandocs/INTEGRATION_COMPLETE.md +0 -322
  18. package/plandocs/MULTI_AGENT_ARCHITECTURE_REVIEW.md +0 -844
  19. package/plandocs/ORCHESTRATOR_MVP_SUMMARY.md +0 -539
  20. package/plandocs/PHASE1_ABSTRACTION_COMPLETE.md +0 -241
  21. package/plandocs/PHASE1_FINAL_STATUS.md +0 -210
  22. package/plandocs/PHASE_1_COMPLETE.md +0 -165
  23. package/plandocs/PHASE_1_SUMMARY.md +0 -184
  24. package/plandocs/PLANNING_SESSION_SUMMARY.md +0 -372
  25. package/plandocs/PROMPT_OPTIMIZATION_ANALYSIS.md +0 -120
  26. package/plandocs/PROMPT_SANITY_CHECK.md +0 -120
  27. package/plandocs/SCRIPT_CLEANUP_FEATURE.md +0 -201
  28. package/plandocs/SCRIPT_GENERATION_ARCHITECTURE.md +0 -364
  29. package/plandocs/SELECTOR_IMPROVEMENTS.md +0 -139
  30. package/plandocs/SESSION_SUMMARY_v0.0.33.md +0 -151
  31. package/plandocs/TROUBLESHOOTING_SESSION.md +0 -72
  32. package/plandocs/VISION_DIAGNOSTICS_IMPROVEMENTS.md +0 -336
  33. package/plandocs/VISUAL_AGENT_EVOLUTION_PLAN.md +0 -396
  34. package/plandocs/WHATS_NEW_v0.0.33.md +0 -183
  35. package/plandocs/exploratory-mode-support-v2.plan.md +0 -953
  36. package/plandocs/exploratory-mode-support.plan.md +0 -928
  37. package/plandocs/journey-id-tracking-addendum.md +0 -227
  38. package/releasenotes/RELEASE_0.0.26.md +0 -165
  39. package/releasenotes/RELEASE_0.0.27.md +0 -236
  40. package/releasenotes/RELEASE_0.0.28.md +0 -286
  41. package/src/auth-config.ts +0 -84
  42. package/src/credit-usage-service.ts +0 -188
  43. package/src/env-loader.ts +0 -103
  44. package/src/execution-service.ts +0 -996
  45. package/src/file-handler.ts +0 -104
  46. package/src/index.ts +0 -432
  47. package/src/llm-facade.ts +0 -821
  48. package/src/llm-provider.ts +0 -53
  49. package/src/model-constants.ts +0 -35
  50. package/src/orchestrator/decision-parser.ts +0 -139
  51. package/src/orchestrator/index.ts +0 -58
  52. package/src/orchestrator/orchestrator-agent.ts +0 -1282
  53. package/src/orchestrator/orchestrator-prompts.ts +0 -786
  54. package/src/orchestrator/page-som-handler.ts +0 -1565
  55. package/src/orchestrator/som-types.ts +0 -188
  56. package/src/orchestrator/tool-registry.ts +0 -184
  57. package/src/orchestrator/tools/check-page-ready.ts +0 -75
  58. package/src/orchestrator/tools/extract-data.ts +0 -92
  59. package/src/orchestrator/tools/index.ts +0 -15
  60. package/src/orchestrator/tools/inspect-page.ts +0 -42
  61. package/src/orchestrator/tools/recall-history.ts +0 -72
  62. package/src/orchestrator/tools/refresh-som-markers.ts +0 -69
  63. package/src/orchestrator/tools/take-screenshot.ts +0 -128
  64. package/src/orchestrator/tools/verify-action-result.ts +0 -159
  65. package/src/orchestrator/tools/view-previous-screenshot.ts +0 -103
  66. package/src/orchestrator/types.ts +0 -291
  67. package/src/playwright-mcp-service.ts +0 -224
  68. package/src/progress-reporter.ts +0 -144
  69. package/src/prompts.ts +0 -842
  70. package/src/providers/backend-proxy-llm-provider.ts +0 -91
  71. package/src/providers/local-llm-provider.ts +0 -38
  72. package/src/scenario-service.ts +0 -252
  73. package/src/scenario-worker-class.ts +0 -1110
  74. package/src/script-utils.ts +0 -203
  75. package/src/types.ts +0 -239
  76. package/src/utils/browser-utils.ts +0 -348
  77. package/src/utils/coordinate-converter.ts +0 -162
  78. package/src/utils/page-info-retry.ts +0 -65
  79. package/src/utils/page-info-utils.ts +0 -285
  80. package/testchimp-runner-core-0.0.35.tgz +0 -0
  81. package/tsconfig.json +0 -19
@@ -1,108 +0,0 @@
1
- # Runner-Core Visual Agent Implementation Status
2
-
3
- ## Phase 1: ✅ COMPLETE (v0.0.33)
4
-
5
- ### Implemented Features:
6
-
7
- 1. **Note to Future Self** - Tactical iteration memory
8
- 2. **Percentage-Based Coordinates** - Last-resort fallback with 3-decimal precision
9
- 3. **Two-Tier Auto-Escalation** - Code-controlled mode switching
10
-
11
- ### Current Behavior (Phase 1):
12
-
13
- ```
14
- Iteration 1-3: Normal Playwright selectors + note-to-self (3 attempts)
15
- ↓ (after 3 failures)
16
- Iteration 4-5: Percentage coordinates (2 attempts max)
17
- ↓ (if both coordinate attempts fail)
18
- Give up - mark as stuck
19
-
20
- Total: Maximum 5 iterations per step
21
- ```
22
-
23
- ---
24
-
25
- ## Phase 2: 📋 PLANNED (Not Started)
26
-
27
- ### Will Add:
28
-
29
- 1. **ElementDetector** - Detect interactive elements with z-index awareness
30
- 2. **VisualMarkerInjector** - Number elements [1], [2], [3] on screenshot
31
- 3. **SelectorResolver** - Translate index → native Playwright selector
32
- 4. **IndexCommandTranslator** - Convert CLICK[3] → native Playwright command
33
-
34
- ### Future Behavior (Phase 2):
35
-
36
- ```
37
- Iteration 1: Playwright selector (1 attempt) → 70% success
38
- ↓ (on first failure)
39
- Iteration 2-3: Index commands CLICK[3] (2 attempts) → 25% success
40
- ↓ (after 3 total failures)
41
- Iteration 4-5: Percentage coordinates (2 attempts max) → 5% success
42
- ↓ (if all fail)
43
- Give up - mark as stuck
44
-
45
- Total: Maximum 5 iterations per step (down from 8)
46
- Average: ~1.5 iterations per step (fast!)
47
- ```
48
-
49
- ### Key Design Principle for Phase 2:
50
-
51
- **During Execution:**
52
- - Agent clicks using `data-testchimp-el="[3]"` (reliable, we inject it)
53
-
54
- **In Generated Script:**
55
- - Translator outputs NATIVE selector: `getByRole('button', {name: 'Menu'})`
56
- - Script works standalone without data-testchimp-el
57
-
58
- **Why Two-Stage:**
59
- 1. Agent needs reliability during exploration → use data attribute
60
- 2. Generated script must be portable → use native selectors
61
- 3. Best of both worlds: reliable execution + maintainable output
62
-
63
- ---
64
-
65
- ## Optimizations vs Original Plan
66
-
67
- ### Original Plan:
68
- - Tier 1: iterations 1-2
69
- - Tier 2: iterations 3-4
70
- - Tier 3: iterations 5+
71
- - Average: ~4 iterations per step
72
-
73
- ### Optimized Plan (Current):
74
- - Tier 1: iteration 1 ONLY (fast path)
75
- - Tier 2: iterations 2-3 (reliable fallback)
76
- - Tier 3: iterations 4+ (absolute last resort)
77
- - **Target: ~1.5 average iterations per step**
78
-
79
- **Rationale:** Don't waste time! Simple tasks finish in 1 iteration, complex tasks escalate quickly to more reliable methods.
80
-
81
- ---
82
-
83
- ## Testing Checklist
84
-
85
- ### Phase 1 (Ready Now):
86
- - [ ] Run PeopleHR scenario - verify note-to-self helps
87
- - [ ] Test coordinate fallback on deliberately difficult case
88
- - [ ] Measure iteration reduction (expect 20-30%)
89
- - [ ] Verify timeout fixes for waitForLoadState
90
-
91
- ### Phase 2 (When Implemented):
92
- - [ ] Test ElementDetector on modals/overlays
93
- - [ ] Verify z-index occlusion detection
94
- - [ ] Validate native selector generation (no data-testchimp-el in output)
95
- - [ ] Run generated scripts standalone - must work!
96
- - [ ] Measure tier distribution: 70/25/5
97
-
98
- ---
99
-
100
- ## Current Version
101
-
102
- **Runner-Core:** v0.0.33
103
- **Status:** Built and ready to test
104
- **Phase 1:** ✅ Complete
105
- **Phase 2:** 📋 Planned but not started
106
-
107
- **Next Step:** Test Phase 1 with PeopleHR scenario to validate improvements before implementing Phase 2.
108
-
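Reviewer note on the removed `plandocs/IMPLEMENTATION_STATUS.md`: the Phase 1 escalation flow it describes (Playwright selectors for iterations 1-3, percentage coordinates for iterations 4-5, then stuck) can be sketched as a small pure function. The names `Tier`, `tierForIteration`, and the two constants are hypothetical and are not taken from the package source. This only illustrates the thresholds stated in the document.

```typescript
// Hypothetical sketch of the Phase 1 two-tier escalation described above.
// None of these names come from testchimp-runner-core itself.
type Tier = 'selector' | 'coordinate' | 'stuck';

const SELECTOR_ATTEMPTS = 3;   // iterations 1-3: normal Playwright selectors
const COORDINATE_ATTEMPTS = 2; // iterations 4-5: percentage coordinates

function tierForIteration(iteration: number): Tier {
  if (!Number.isInteger(iteration) || iteration < 1) {
    throw new RangeError(`iteration must be a positive integer, got ${iteration}`);
  }
  if (iteration <= SELECTOR_ATTEMPTS) {
    return 'selector';
  }
  if (iteration <= SELECTOR_ATTEMPTS + COORDINATE_ATTEMPTS) {
    return 'coordinate';
  }
  // Past the 5-iteration cap the step is marked as stuck.
  return 'stuck';
}
```

Keeping the mode switch in code rather than in the prompt matches the document's "code-controlled mode switching" wording. The planned Phase 2 split (1 / 2-3 / 4-5) would need a third tier and different constants.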
@@ -1,322 +0,0 @@
1
- # Runner-Core 0.0.28 - Scriptservice Integration Complete
2
-
3
- ## ✅ Implementation Summary
4
-
5
- Successfully extended runner-core to support both local clients (vs-ext, github-action) and server-side usage (scriptservice) through:
6
- 1. **Lifecycle callbacks** for server-side DB writes and resource management
7
- 2. **Existing browser support** to reuse caller's browser instead of creating new ones
8
- 3. **Drop-in SmartTestRunnerCoreV2** wrapper for seamless migration
9
-
10
- ## Changes Made
11
-
12
- ### 1. Extended ProgressReporter with Lifecycle Callbacks
13
-
14
- **File:** `src/progress-reporter.ts`
15
-
16
- Added optional lifecycle callbacks (used by scriptservice, ignored by local clients):
17
-
18
- ```typescript
19
- export interface StepInfo {
20
- stepId?: string;
21
- stepNumber: number;
22
- description: string;
23
- code?: string;
24
- }
25
-
26
- export interface ProgressReporter {
27
- // Existing callbacks...
28
- onStepProgress?(): Promise<void>;
29
- onJobProgress?(): Promise<void>;
30
-
31
- // NEW: Lifecycle callbacks
32
- beforeStartTest?(page: any, browser: any, context: any): Promise<void>;
33
- beforeStepStart?(step: StepInfo, page: any): Promise<void>;
34
- afterEndTest?(status: 'passed' | 'failed', error?: string, page?: any): Promise<void>;
35
- }
36
- ```
37
-
38
- **Exported StepInfo:** Added to index.ts exports for consumer usage.
39
-
40
- ### 2. Integrated Lifecycle Callbacks
41
-
42
- **Files:**
43
- - `src/execution-service.ts` - RUN_EXACTLY and RUN_WITH_AI_REPAIR modes
44
- - `src/scenario-worker-class.ts` - Orchestrator mode
45
-
46
- **Integration Points:**
47
- - `beforeStartTest`: Called after browser initialization, before execution
48
- - `beforeStepStart`: Called before each step execution
49
- - `afterEndTest`: Called after all execution completes (success or failure)
50
-
51
- **Note:** All callbacks are optional - local clients don't provide them.
52
-
53
- ### 3. Added Existing Browser Support
54
-
55
- **File:** `src/types.ts`
56
-
57
- Extended `ScriptExecutionRequest` to accept existing browser:
58
-
59
- ```typescript
60
- export interface ScriptExecutionRequest {
61
- // ... existing fields ...
62
-
63
- // Optional: Provide existing browser/page/context (for server-side usage)
64
- // If not provided, runner-core will create its own
65
- existingBrowser?: any;
66
- existingContext?: any;
67
- existingPage?: any;
68
- }
69
- ```
70
-
71
- **File:** `src/execution-service.ts`
72
-
73
- Updated `runExactly` and `runWithAIRepair` to:
74
- - Check if existing browser is provided
75
- - Use existing browser if available (don't create new one)
76
- - Don't close browser if we didn't create it (caller owns it)
77
-
78
- **Benefits:**
79
- - **Local clients:** Continue creating their own browser (no change)
80
- - **Scriptservice:** Reuses its existing browser (no double initialization)
81
-
82
- ### 4. Created ScriptserviceLLMProvider
83
-
84
- **File:** `services/scriptservice/providers/scriptservice-llm-provider.ts` (NEW)
85
-
86
- Implements `LLMProvider` interface:
87
- - Wraps scriptservice's existing OpenAI client
88
- - Supports text and vision prompts
89
- - Uses scriptservice's OpenAI API key from ConfigService
90
- - No backend proxy - all calls are local
91
- - No auth needed (scriptservice is already authenticated)
92
-
93
- ```typescript
94
- export class ScriptserviceLLMProvider implements LLMProvider {
95
- async callLLM(request: LLMRequest): Promise<LLMResponse> {
96
- // Uses scriptservice's OpenAI client
97
- // Supports both text and vision (imageUrl)
98
- // Returns answer and optional usage
99
- }
100
- }
101
- ```
102
-
103
- ### 5. Created SmartTestRunnerCoreV2
104
-
105
- **File:** `services/scriptservice/smart-test-runner-core-v2.ts` (NEW)
106
-
107
- Drop-in replacement for SmartTestRunnerCore:
108
-
109
- **Same Interface:**
110
- ```typescript
111
- constructor(config: RunnerConfig)
112
- async runExactly(script: string): Promise<RunExactlyResult>
113
- async runWithRepair(steps: TestStepWithId[]): Promise<RunWithRepairResult>
114
- ```
115
-
116
- **Key Features:**
117
- - Uses `TestChimpService` (runner-core) internally
118
- - Maps scriptservice callbacks to ProgressReporter
119
- - Provides existing browser to runner-core (via `existingBrowser`, `existingContext`, `existingPage`)
120
- - Automatically gets runner-core improvements (better prompts, vision, semantic selectors)
121
-
122
- **Callback Mapping:**
123
- ```typescript
124
- const progressReporter: ProgressReporter = {
125
- onStepProgress: async (stepProgress) => {
126
- // Calls scriptservice's onStepComplete
127
- await config.callbacks.onStepComplete(...);
128
- },
129
- beforeStartTest: config.callbacks?.beforeStartTest,
130
- beforeStepStart: config.callbacks?.beforeStepStart,
131
- afterEndTest: config.callbacks?.afterEndTest
132
- };
133
- ```
134
-
135
- **runWithRepair Implementation:**
136
- - Converts steps to Playwright script
137
- - Calls `runnerCore.executeScript` with mode `RUN_WITH_AI_REPAIR`
138
- - Provides existing browser/page/context
139
- - Maps results back to scriptservice format
140
-
141
- ### 6. Updated Scriptservice Consumers
142
-
143
- **Files:**
144
- - `services/scriptservice/smart-test-execution-handler.ts`
145
- - `services/scriptservice/workers/test-based-explorer.ts`
146
- - `services/scriptservice/package.json` - Added `testchimp-runner-core: file:../../local/runner-core`
147
-
148
- **Migration Strategy:**
149
- ```typescript
150
- // Default to V2, can toggle with environment variable
151
- const useV2 = process.env.USE_RUNNER_CORE_V2 !== 'false'; // Default: true
152
- const runner = useV2
153
- ? new SmartTestRunnerCoreV2(runnerConfig)
154
- : new SmartTestRunnerCore(runnerConfig);
155
- ```
156
-
157
- **Rollback:**
158
- ```bash
159
- export USE_RUNNER_CORE_V2=false
160
- ```
161
-
162
- ## Testing Instructions
163
-
164
- ### 1. Install Local Runner-Core
165
-
166
- ```bash
167
- cd /Users/nuwansam/IdeaProjects/AwareRepo/services/scriptservice
168
-
169
- # Manual install (if npm install doesn't work)
170
- ./install-local-runner-core.sh
171
-
172
- # Or manually
173
- rm -rf node_modules/testchimp-runner-core
174
- mkdir -p node_modules/testchimp-runner-core
175
- cp -r ../../local/runner-core/dist node_modules/testchimp-runner-core/
176
- cp ../../local/runner-core/package.json node_modules/testchimp-runner-core/
177
- ```
178
-
179
- ### 2. Build Scriptservice
180
-
181
- ```bash
182
- cd /Users/nuwansam/IdeaProjects/AwareRepo/services/scriptservice
183
- rm -rf dist
184
- npx tsc
185
- ```
186
-
187
- ### 3. Run Scriptservice
188
-
189
- ```bash
190
- npm run start:staging
191
- # or
192
- npm run start:prod
193
- ```
194
-
195
- ### 4. Test Script Execution
196
-
197
- **Look for log message:**
198
- ```
199
- using SmartTestRunnerCoreV2 (runner-core)
200
- ```
201
-
202
- **Test JSON payload:**
203
- ```json
204
- {
205
- "playwrightConfig": "{}",
206
- "targetUrl": "https://studio--cafetime-afg2v.us-central1.hosted.app/",
207
- "scenario": "- Go to https://studio--cafetime-afg2v.us-central1.hosted.app/\n- Login with alice@example.com, TestPass123\n- Go to Messages tab\n- Send a message \"Hello\"\n- Verify conversation thread shows the sent message\n- Verify message input field is now empty",
208
- "projectId": "test-project"
209
- }
210
- ```
211
-
212
- ### 5. Verify Behaviors
213
-
214
- - ✅ DB writes happen via callbacks
215
- - ✅ Screenshots upload to GCS correctly
216
- - ✅ LLM calls work (all local, no backend proxy)
217
- - ✅ Smart test execution completes
218
- - ✅ Explorer mode works
219
- - ✅ Error handling and reporting work
220
- - ✅ Lifecycle callbacks fire (beforeStartTest, beforeStepStart, afterEndTest)
221
-
222
- ## Benefits
223
-
224
- ### For Scriptservice:
225
- 1. **Zero Code Duplication:** Removed ~600 lines of duplicated execution/repair logic
226
- 2. **Auto-Updates:** Automatically gets runner-core improvements
227
- 3. **Better Prompts:** Semantic selectors, improved vision, cleaner scripts
228
- 4. **Consistent Behavior:** Same logic as vs-extension and github-action
229
- 5. **No Breaking Changes:** Drop-in replacement with environment flag for safety
230
-
231
- ### For Runner-Core:
232
- 1. **Universal Library:** Works for both client-side and server-side
233
- 2. **Flexible Architecture:** Callbacks and browser injection allow different execution patterns
234
- 3. **Backward Compatible:** Existing consumers unaffected (all new features are optional)
235
-
236
- ## Architecture
237
-
238
- ```
239
- ┌─────────────────────────────────────────────────────────────┐
240
- │ SCRIPTSERVICE │
241
- │ │
242
- │ ┌────────────────────────────────────────────────────┐ │
243
- │ │ SmartTestRunnerCoreV2 (Thin Wrapper) │ │
244
- │ │ - Maps scriptservice callbacks to ProgressReporter│ │
245
- │ │ - Provides existing browser to runner-core │ │
246
- │ │ - Converts steps ↔ script format │ │
247
- │ └───────────────┬────────────────────────────────────┘ │
248
- │ │ │
249
- │ ▼ │
250
- │ ┌────────────────────────────────────────────────────┐ │
251
- │ │ ScriptserviceLLMProvider (Local OpenAI) │ │
252
- │ │ - Wraps scriptservice's OpenAI client │ │
253
- │ │ - No backend proxy, all calls local │ │
254
- │ └────────────────────────────────────────────────────┘ │
255
- │ │
256
- └─────────────────────────────────────────────────────────────┘
257
-
258
-
259
- ┌─────────────────────────────────────────────────────────────┐
260
- │ RUNNER-CORE (v0.0.28) │
261
- │ │
262
- │ ┌────────────────────────────────────────────────────┐ │
263
- │ │ ExecutionService │ │
264
- │ │ - Accepts existingBrowser/Page/Context (NEW) │ │
265
- │ │ - Calls lifecycle callbacks (NEW) │ │
266
- │ │ - RUN_EXACTLY and RUN_WITH_AI_REPAIR modes │ │
267
- │ └────────────────────────────────────────────────────┘ │
268
- │ │
269
- │ ┌────────────────────────────────────────────────────┐ │
270
- │ │ ProgressReporter Interface │ │
271
- │ │ - onStepProgress (existing) │ │
272
- │ │ - beforeStartTest (NEW) │ │
273
- │ │ - beforeStepStart (NEW) │ │
274
- │ │ - afterEndTest (NEW) │ │
275
- │ └────────────────────────────────────────────────────┘ │
276
- │ │
277
- └─────────────────────────────────────────────────────────────┘
278
- ```
279
-
280
- ## Files Changed
281
-
282
- ### Runner-Core:
283
- 1. `src/progress-reporter.ts` - Added StepInfo and lifecycle callbacks
284
- 2. `src/index.ts` - Export StepInfo
285
- 3. `src/types.ts` - Added existingBrowser/Context/Page fields
286
- 4. `src/execution-service.ts` - Support existing browser and lifecycle callbacks
287
- 5. `src/scenario-worker-class.ts` - Call lifecycle callbacks in orchestrator mode
288
- 6. `package.json` - Version 0.0.28
289
-
290
- ### Scriptservice:
291
- 7. `providers/scriptservice-llm-provider.ts` - NEW
292
- 8. `smart-test-runner-core-v2.ts` - NEW
293
- 9. `smart-test-execution-handler.ts` - Use V2 by default
294
- 10. `workers/test-based-explorer.ts` - Use V2 by default
295
- 11. `package.json` - Added testchimp-runner-core dependency
296
- 12. `install-local-runner-core.sh` - NEW (helper script for local testing)
297
-
298
- ## Next Steps
299
-
300
- ### Before Publishing to npm:
301
- 1. ✅ Test runExactly mode in scriptservice
302
- 2. ✅ Test runWithRepair mode in scriptservice
303
- 3. ✅ Verify DB writes and screenshots work
304
- 4. ✅ Monitor logs for errors
305
-
306
- ### After Validation:
307
- 1. Publish runner-core 0.0.28 to npm
308
- 2. Update scriptservice package.json to use `^0.0.28` instead of `file:...`
309
- 3. Deploy to staging
310
- 4. After 1-2 weeks, remove environment flag and delete old SmartTestRunnerCore
311
-
312
- ## Breaking Changes
313
-
314
- None. All changes are backward compatible and opt-in.
315
-
316
- ## Notes
317
-
318
- - Lifecycle callbacks are optional - local clients don't need them
319
- - Existing browser is optional - if not provided, runner-core creates its own
320
- - V2 defaults to enabled but can be disabled with `USE_RUNNER_CORE_V2=false`
321
- - Old SmartTestRunnerCore remains available as fallback
322
-
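Reviewer note on the removed `plandocs/INTEGRATION_COMPLETE.md`: the two behaviors it describes, optional lifecycle callbacks and "don't close the browser if we didn't create it", can be illustrated together. The sketch below is a minimal approximation under stated assumptions. `BrowserLike`, `RunOptions`, `launchBrowser`, `runStep`, and `runWithLifecycle` are hypothetical stand-ins, and the real `ExecutionService` code is not shown in this diff. Only the callback names and the `StepInfo` shape come from the document.

```typescript
// Stand-in for a Playwright Browser; only close() matters for this sketch.
interface BrowserLike { close(): Promise<void>; }

// Shape copied from the removed document.
interface StepInfo { stepId?: string; stepNumber: number; description: string; code?: string; }

// Lifecycle callbacks as listed in the document (all optional).
interface LifecycleCallbacks {
  beforeStartTest?(page: unknown, browser: unknown, context: unknown): Promise<void>;
  beforeStepStart?(step: StepInfo, page: unknown): Promise<void>;
  afterEndTest?(status: 'passed' | 'failed', error?: string, page?: unknown): Promise<void>;
}

// Hypothetical options object; launchBrowser and runStep are illustrative hooks.
interface RunOptions {
  existingBrowser?: BrowserLike;
  existingContext?: unknown;
  existingPage?: unknown;
  launchBrowser: () => Promise<BrowserLike>;
  steps: StepInfo[];
  runStep: (step: StepInfo) => Promise<void>;
  reporter?: LifecycleCallbacks;
}

async function runWithLifecycle(opts: RunOptions): Promise<'passed' | 'failed'> {
  // The runner owns the browser only when the caller did not supply one.
  const ownsBrowser = opts.existingBrowser === undefined;
  const browser = opts.existingBrowser ?? (await opts.launchBrowser());
  let status: 'passed' | 'failed' = 'passed';
  let error: string | undefined;
  try {
    await opts.reporter?.beforeStartTest?.(opts.existingPage, browser, opts.existingContext);
    for (const step of opts.steps) {
      await opts.reporter?.beforeStepStart?.(step, opts.existingPage);
      await opts.runStep(step);
    }
  } catch (e) {
    status = 'failed';
    error = e instanceof Error ? e.message : String(e);
  } finally {
    try {
      // Fires on both success and failure, as the document specifies.
      await opts.reporter?.afterEndTest?.(status, error, opts.existingPage);
    } finally {
      // A caller-provided browser is left open; the caller owns it.
      if (ownsBrowser) await browser.close();
    }
  }
  return status;
}
```

A local client would omit `existingBrowser` and `reporter`, so the runner launches and closes its own browser. A server-side caller would pass both and keep control of the browser's lifetime.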