npm - testchimp-runner-core - Versions diffs - 0.0.35 → 0.0.36 - Mend

testchimp-runner-core 0.0.35 → 0.0.36

Files changed (71)

package/package.json +6 -1
package/plandocs/BEFORE_AFTER_VERIFICATION.md +0 -148
package/plandocs/COORDINATE_MODE_DIAGNOSIS.md +0 -144
package/plandocs/CREDIT_CALLBACK_ARCHITECTURE.md +0 -253
package/plandocs/HUMAN_LIKE_IMPROVEMENTS.md +0 -642
package/plandocs/IMPLEMENTATION_STATUS.md +0 -108
package/plandocs/INTEGRATION_COMPLETE.md +0 -322
package/plandocs/MULTI_AGENT_ARCHITECTURE_REVIEW.md +0 -844
package/plandocs/ORCHESTRATOR_MVP_SUMMARY.md +0 -539
package/plandocs/PHASE1_ABSTRACTION_COMPLETE.md +0 -241
package/plandocs/PHASE1_FINAL_STATUS.md +0 -210
package/plandocs/PHASE_1_COMPLETE.md +0 -165
package/plandocs/PHASE_1_SUMMARY.md +0 -184
package/plandocs/PLANNING_SESSION_SUMMARY.md +0 -372
package/plandocs/PROMPT_OPTIMIZATION_ANALYSIS.md +0 -120
package/plandocs/PROMPT_SANITY_CHECK.md +0 -120
package/plandocs/SCRIPT_CLEANUP_FEATURE.md +0 -201
package/plandocs/SCRIPT_GENERATION_ARCHITECTURE.md +0 -364
package/plandocs/SELECTOR_IMPROVEMENTS.md +0 -139
package/plandocs/SESSION_SUMMARY_v0.0.33.md +0 -151
package/plandocs/TROUBLESHOOTING_SESSION.md +0 -72
package/plandocs/VISION_DIAGNOSTICS_IMPROVEMENTS.md +0 -336
package/plandocs/VISUAL_AGENT_EVOLUTION_PLAN.md +0 -396
package/plandocs/WHATS_NEW_v0.0.33.md +0 -183
package/plandocs/exploratory-mode-support-v2.plan.md +0 -953
package/plandocs/exploratory-mode-support.plan.md +0 -928
package/plandocs/journey-id-tracking-addendum.md +0 -227
package/releasenotes/RELEASE_0.0.26.md +0 -165
package/releasenotes/RELEASE_0.0.27.md +0 -236
package/releasenotes/RELEASE_0.0.28.md +0 -286
package/src/auth-config.ts +0 -84
package/src/credit-usage-service.ts +0 -188
package/src/env-loader.ts +0 -103
package/src/execution-service.ts +0 -996
package/src/file-handler.ts +0 -104
package/src/index.ts +0 -432
package/src/llm-facade.ts +0 -821
package/src/llm-provider.ts +0 -53
package/src/model-constants.ts +0 -35
package/src/orchestrator/decision-parser.ts +0 -139
package/src/orchestrator/index.ts +0 -58
package/src/orchestrator/orchestrator-agent.ts +0 -1282
package/src/orchestrator/orchestrator-prompts.ts +0 -786
package/src/orchestrator/page-som-handler.ts +0 -1565
package/src/orchestrator/som-types.ts +0 -188
package/src/orchestrator/tool-registry.ts +0 -184
package/src/orchestrator/tools/check-page-ready.ts +0 -75
package/src/orchestrator/tools/extract-data.ts +0 -92
package/src/orchestrator/tools/index.ts +0 -15
package/src/orchestrator/tools/inspect-page.ts +0 -42
package/src/orchestrator/tools/recall-history.ts +0 -72
package/src/orchestrator/tools/refresh-som-markers.ts +0 -69
package/src/orchestrator/tools/take-screenshot.ts +0 -128
package/src/orchestrator/tools/verify-action-result.ts +0 -159
package/src/orchestrator/tools/view-previous-screenshot.ts +0 -103
package/src/orchestrator/types.ts +0 -291
package/src/playwright-mcp-service.ts +0 -224
package/src/progress-reporter.ts +0 -144
package/src/prompts.ts +0 -842
package/src/providers/backend-proxy-llm-provider.ts +0 -91
package/src/providers/local-llm-provider.ts +0 -38
package/src/scenario-service.ts +0 -252
package/src/scenario-worker-class.ts +0 -1110
package/src/script-utils.ts +0 -203
package/src/types.ts +0 -239
package/src/utils/browser-utils.ts +0 -348
package/src/utils/coordinate-converter.ts +0 -162
package/src/utils/page-info-retry.ts +0 -65
package/src/utils/page-info-utils.ts +0 -285
package/testchimp-runner-core-0.0.35.tgz +0 -0
package/tsconfig.json +0 -19

package/package.json CHANGED

@@ -1,9 +1,14 @@
 {
   "name": "testchimp-runner-core",
-  "version": "0.0.35",
+  "version": "0.0.36",
   "description": "Core TestChimp functionality for test generation and AI repair",
   "main": "dist/index.js",
   "types": "dist/index.d.ts",
+  "files": [
+    "dist/",
+    "env.prod",
+    "env.staging"
+  ],
   "scripts": {
     "build": "tsc",
     "watch": "tsc --watch",

package/plandocs/BEFORE_AFTER_VERIFICATION.md DELETED

@@ -1,148 +0,0 @@
-# Before/After Screenshot Verification
-## Feature: Visual Goal Verification for Coordinate Actions
-### Problem Solved:
-When using coordinate-based actions (clicking at x,y%), the agent has no way to know if the click achieved the goal:
-- No element reference to check state
-- No selector feedback
-- Can't verify if expected page loaded or modal opened
-This led to:
-- False positives (click succeeded but goal not achieved)
-- Infinite loops (agent keeps clicking, unsure if it worked)
-### Solution:
-Automatic before/after screenshot comparison after coordinate clicks.
-## How It Works:
-### 1. **Automatic Trigger** (No Agent Action Required)
-When agent uses coordinate action:
-```
-Iteration 4: 🎯 Coordinate mode activated
-  Step 1: Capture BEFORE screenshot
-  Step 2: Execute coordinate click (x%, y%)
-  Step 3: Wait 1000ms for UI to settle
-  Step 4: Capture AFTER screenshot
-  Step 5: Call LLM with both images (labeled "BEFORE", "AFTER")
-  Step 6: LLM responds: { goalAchieved: true/false, reasoning: "..." }
-  Step 7a: If TRUE → Mark complete, exit step ✅
-  Step 7b: If FALSE → Continue to next iteration, try different coordinates
-```
-### 2. **LLM Prompt for Verification**
-```
-Goal: [Current step goal]
-Compare the BEFORE and AFTER screenshots.
-Did the action achieve the goal? Respond with JSON:
-{
-  "goalAchieved": boolean,
-  "reasoning": "What changed (or didn't change)",
-  "visibleChanges": ["List of UI changes observed"]
-}
-Focus on:
-- Did expected elements appear/disappear?
-- Did page navigate or content change?
-- Visual indicators of success (new panels, forms, highlights)?
-Be strict: Only return true if you clearly see the expected change.
-```
-### 3. **Multi-Image LLM Interface**
-```typescript
-// NEW: LabeledImage interface
-export interface LabeledImage {
-  label: string;      // "Before", "After", etc.
-  dataUrl: string;    // Base64 data URL
-}
-// UPDATED: LLMRequest
-export interface LLMRequest {
-  imageUrl?: string;         // Backward compatible (single image)
-  images?: LabeledImage[];   // NEW - multi-image support
-}
-```
-### 4. **Provider Implementation** (scriptservice-llm-provider.ts)
-```typescript
-if (request.images && request.images.length > 0) {
-  for (const img of request.images) {
-    contentParts.push({ type: 'text', text: `\n[${img.label}]:` });
-    contentParts.push({ type: 'image_url', image_url: { url: img.dataUrl } });
-  }
-  // Sends: [BEFORE]: <image1>, [AFTER]: <image2>
-}
-```
-## When Verification Happens:
-✅ **Always**: After first coordinate action attempt
-❌ **Never**: After selector-based actions (have element state to check)
-⚠️ **Conditional**: Can add for other scenarios where goal verification is unclear
-## Cost Considerations:
-**Per verification call:**
-- 2 viewport screenshots (~50-100KB each)
-- Vision model (gpt-5-mini): ~$0.001 per call
-- Used only when coordinate mode activates (after 3 selector failures)
-**Typical scenario:**
-- Steps 1-10: Regular selectors → No verification cost
-- Step 5 gets stuck → Coordinate mode → 1 verification call → $0.001
-- Overall impact: Minimal, used sparingly
-## Example Flow:
-**Step 5: "Select Employee Information"**
-```
-Iteration 1: getByText('Employee Information') → Strict mode ❌
-Iteration 2: locator('#collapse-1').getByText('Employee Information') → Click succeeds ✅
-           BUT: Didn't navigate to Employee Information page (false positive)
-Iteration 3: Selector fails again
-Iteration 4: 🎯 Coordinate mode
-  → BEFORE: Homepage with sidebar
-  → Click at (19.3%, 22.9%)
-  → Wait 1s
-  → AFTER: Check screenshot
-  → LLM: "goalAchieved": true, "reasoning": "Employee Information page loaded with form"
-  → ✅ Mark complete, exit
-```
-## Backward Compatibility:
-✅ **Single image still works:**
-```typescript
-const request = {
-  imageUrl: 'data:image/png;base64,...'  // Old way
-};
-```
-✅ **Multi-image NEW:**
-```typescript
-const request = {
-  images: [
-    { label: 'BEFORE', dataUrl: '...' },
-    { label: 'AFTER', dataUrl: '...' }
-  ]
-};
-```
-## Files Modified:
-1. `runner-core/src/llm-provider.ts` - Added LabeledImage interface and images field
-2. `scriptservice/providers/scriptservice-llm-provider.ts` - Handle multiple images in OpenAI API
-3. `runner-core/src/orchestrator/orchestrator-agent.ts` - Added verifyGoalWithScreenshotComparison method
-4. Automatic trigger after coordinate actions
-## Next Steps:
-- ✅ Infrastructure ready
-- ⏳ Need to test with real scenario
-- 🔮 Future: Could expose as agent-callable tool if needed
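
Reviewer note on the deleted document above: the multi-image request it describes (sections 3 and 4) can be sketched roughly as follows. This is a minimal illustration, not the package's actual code. `LabeledImage` and the `imageUrl` / `images` fields come from the document; `buildContentParts`, `parseVerification`, the `ContentPart` type, and the `VerificationResult` name are hypothetical and only mirror the shapes shown in the provider snippet and the verification prompt.

```typescript
// Shapes taken from the deleted document.
interface LabeledImage {
  label: string;   // e.g. "BEFORE", "AFTER"
  dataUrl: string; // base64 data URL
}

interface LLMRequest {
  imageUrl?: string;       // single-image form (backward compatible)
  images?: LabeledImage[]; // multi-image form
}

// Hypothetical content-part type, mirroring the text/image_url parts in the provider snippet.
type ContentPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } };

// Hypothetical helper: emit a text label before each image, or fall back to the single image.
function buildContentParts(request: LLMRequest): ContentPart[] {
  const parts: ContentPart[] = [];
  if (request.images && request.images.length > 0) {
    for (const img of request.images) {
      parts.push({ type: "text", text: `\n[${img.label}]:` });
      parts.push({ type: "image_url", image_url: { url: img.dataUrl } });
    }
  } else if (request.imageUrl) {
    parts.push({ type: "image_url", image_url: { url: request.imageUrl } });
  }
  return parts;
}

// JSON shape requested by the verification prompt in the document.
interface VerificationResult {
  goalAchieved: boolean;
  reasoning: string;
  visibleChanges: string[];
}

// Hypothetical strict parser: anything other than a literal `true` counts as "not achieved".
function parseVerification(raw: string): VerificationResult {
  let parsed: unknown = null;
  try {
    parsed = JSON.parse(raw);
  } catch {
    parsed = null; // malformed JSON is treated as a failed verification
  }
  const obj = (typeof parsed === "object" && parsed !== null ? parsed : {}) as Record<string, unknown>;
  const changes = Array.isArray(obj.visibleChanges) ? obj.visibleChanges : [];
  return {
    goalAchieved: obj.goalAchieved === true,
    reasoning: typeof obj.reasoning === "string" ? obj.reasoning : "",
    visibleChanges: changes.filter((c): c is string => typeof c === "string"),
  };
}
```

Treating a malformed or ambiguous reply as `goalAchieved: false` follows the prompt's "be strict" instruction; a real implementation might prefer to retry the call instead.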

package/plandocs/COORDINATE_MODE_DIAGNOSIS.md DELETED

@@ -1,144 +0,0 @@
-# Coordinate Mode Diagnosis - Live Test Results
-## Test Scenario: PeopleHR Employee Information Flow
-### ✅ What Worked:
-1. **Coordinate fallback DID activate** (after fix from >= 3 to >= 5)
-2. **Agent successfully used coordinates** at (87.5%, 23.438%)
-3. **Physical clicks succeeded** - page.mouse.click(1120, 169)
-4. **Agent learned** to stick with coordinates after selectors failed
-### ❌ What Didn't Work:
-**Agent hit max iterations (8) without marking "complete"**
-## Detailed Step 6 Flow:
-```
-Iteration 1: Selector attempt → Timeout ❌
-Iteration 2: Selector attempt → Timeout ❌
-Iteration 3: Selector attempt → Timeout ❌
-Iteration 4: 🎯 COORDINATE MODE → Click (87.5%, 23.438%) → ✅ Success
-Iteration 5: Repeat coordinate → ✅ Success
-Iteration 6: Repeat coordinate → ✅ Success (?)
-Iteration 7: Repeat coordinate → ✅ Success
-Iteration 8: Repeat coordinate → ✅ Success
-Result: ⚠️ Max iterations → system_limit
-```
-## Root Cause Analysis:
-### Problem: **No Goal Verification After Coordinate Success**
-**With selectors:**
-```typescript
-await page.getByRole('button').click();
-// Can verify: await expect(page.getByRole('button')).toHaveAttribute('aria-pressed', 'true')
-// Can check: New elements appeared, URL changed, etc.
-```
-**With coordinates:**
-```typescript
-await page.mouse.click(1120, 169);
-// ❓ Did it work? No element reference!
-// ❓ How to verify? Can't check button state
-// ❓ What changed? Need to inspect DOM/screenshot
-```
-### Why Agent Kept Retrying:
-**Agent's reasoning (iterations 5-8):**
-- "Coordinate click succeeded (executed without error)"
-- "But I don't know if goal was achieved"
-- "Step says 'Click on New' - did the New form open?"
-- "I should try again to be sure..."
-- → **Loops until max iterations**
-## Solutions to Consider:
-### Option 1: **Trust Coordinate Success** (Simple)
-After coordinate click succeeds:
-- Wait 500ms for UI response
-- Mark status="complete" automatically
-- Assume click worked (trust the coordinates)
-```typescript
-if (coordinateAction && coordResult.allSucceeded) {
-  await page.waitForTimeout(500); // Let UI respond
-  return { status: 'complete', reasoning: 'Coordinate click succeeded' };
-}
-```
-**Pros**: Simple, fast
-**Cons**: No verification of actual goal achievement
-### Option 2: **Visual Verification** (Better)
-After coordinate click:
-- Wait 500ms
-- Take screenshot
-- Compare before/after
-- If changed → complete, else → retry with different coords
-```typescript
-const beforeScreenshot = await page.screenshot();
-await page.mouse.click(x, y);
-await page.waitForTimeout(500);
-const afterScreenshot = await page.screenshot();
-if (screenshotsAreDifferent(beforeScreenshot, afterScreenshot)) {
-  return { status: 'complete' };
-}
-```
-**Pros**: Validates something changed
-**Cons**: Slower, more LLM calls
-### Option 3: **DOM Change Detection** (Balanced)
-After coordinate click:
-- Capture DOM snapshot before
-- Click coordinates
-- Capture DOM snapshot after
-- If new elements/navigation → complete
-```typescript
-const beforeUrl = page.url();
-const beforeElements = await getEnhancedPageInfo(page);
-await page.mouse.click(x, y);
-await page.waitForTimeout(500);
-const afterUrl = page.url();
-const afterElements = await getEnhancedPageInfo(page);
-if (afterUrl !== beforeUrl || afterElements.count !== beforeElements.count) {
-  return { status: 'complete', reasoning: 'Page state changed after coordinate click' };
-}
-```
-**Pros**: Fast, objective verification
-**Cons**: Might miss subtle changes (modal opens without URL/element count change)
-### Option 4: **Prompt Guidance** (Immediate)
-Update prompt to tell agent:
-"After coordinate click succeeds, mark status='complete' unless you can clearly verify it failed"
-**Pros**: No code changes
-**Cons**: Relies on LLM judgment
-## Recommendation:
-**Hybrid approach:**
-1. **Immediate** (Prompt): Tell agent to trust coordinate success
-2. **Phase 2** (Code): Add DOM change detection for validation
-## Current Status:
-- ✅ Coordinate fallback works technically
-- ✅ Physical clicks succeed
-- ❌ Agent doesn't know when to stop
-- 🔧 Need completion detection logic
-## Test Results Summary:
-**Steps 1-5**: ✅ All completed successfully
-**Step 6**: ⚠️ Coordinates worked but hit max iterations (no completion detection)
-**Overall**: Coordinate mode is functional but needs completion logic
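
Reviewer note on the deleted document above: the log pairs the percentage coordinates (87.5%, 23.438%) with `page.mouse.click(1120, 169)`. Those numbers are consistent with a 1280×720 viewport, but that size is an inference and is not stated in the document. The sketch below shows the conversion together with the Option 3 state comparison as pure functions. `percentToPixels`, `pageStateChanged`, and `PageSnapshot` are hypothetical names; the package's own `coordinate-converter.ts` may work differently.

```typescript
interface Viewport {
  width: number;
  height: number;
}

interface PixelPoint {
  x: number;
  y: number;
}

// Hypothetical helper: convert viewport-relative percentages to integer pixel coordinates.
function percentToPixels(xPercent: number, yPercent: number, viewport: Viewport): PixelPoint {
  // Clamp to the 0-100 range so a bad LLM answer cannot click outside the viewport.
  const clamp = (value: number): number => Math.min(100, Math.max(0, value));
  return {
    x: Math.round((clamp(xPercent) / 100) * viewport.width),
    y: Math.round((clamp(yPercent) / 100) * viewport.height),
  };
}

// Hypothetical snapshot of the signals Option 3 compares before and after the click.
interface PageSnapshot {
  url: string;
  elementCount: number;
}

// True when the URL or the element count differs. As the document notes, this
// misses changes that alter neither (for example, a modal replacing one element).
function pageStateChanged(before: PageSnapshot, after: PageSnapshot): boolean {
  return before.url !== after.url || before.elementCount !== after.elementCount;
}

// Assumed 1280x720 viewport; reproduces the logged click at (1120, 169).
const point = percentToPixels(87.5, 23.438, { width: 1280, height: 720 });
console.log(point);
```

In a real run the two snapshots would be captured around `page.mouse.click(point.x, point.y)`, for instance from `page.url()` and an element count, as the Option 3 snippet does.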

package/plandocs/CREDIT_CALLBACK_ARCHITECTURE.md DELETED

@@ -1,253 +0,0 @@
-# Credit Usage Callback Architecture
-## Summary
-Added callback-based credit usage reporting to allow server-side integration to update DB directly without axios calls, while client-side continues using axios calls to backend API.
-## Architecture
-### Callback-First Approach
-```typescript
-export interface CreditUsage {
-  credits: number;
-  usageReason: CreditUsageReason;
-  jobId?: string;
-  timestamp: number;
-}
-export type CreditUsageCallback = (usage: CreditUsage) => void | Promise<void>;
-```
-### Behavior
-```typescript
-async reportCreditUsage(credits, usageReason, jobId) {
-  const creditUsage = { credits, usageReason, jobId, timestamp: Date.now() };
-  // 1. If callback provided: Use callback (SERVER-SIDE)
-  if (this.creditUsageCallback) {
-    await this.creditUsageCallback(creditUsage);
-    return {}; // No axios call needed
-  }
-  // 2. No callback but auth configured: Use axios (CLIENT-SIDE)
-  if (this.authConfig) {
-    const response = await axios.post(`${backend}/localagent/insert_credit_usage`, ...);
-    return response.data;
-  }
-  // 3. No callback and no auth: Development mode
-  return {};
-}
-```
-## Usage Scenarios
-### Server-Side (scriptservice)
-```typescript
-import { TestChimpService, CreditUsage, CreditUsageReason } from 'testchimp-runner-core';
-const service = new TestChimpService(
-  fileHandler,
-  undefined, // NO auth config
-  backendUrl,
-  maxWorkers,
-  llmProvider,
-  progressReporter,
-  orchestratorOptions,
-  async (creditUsage: CreditUsage) => {
-    // Update DB directly - NO axios calls
-    await db.insertCreditUsage({
-      credits: creditUsage.credits,
-      usageReason: creditUsage.usageReason,
-      jobId: creditUsage.jobId,
-      timestamp: creditUsage.timestamp
-    });
-  }
-);
-```
-**Result:**
-- ✅ Callback called → DB updated directly
-- ❌ No axios calls (no auth configured)
-- ✅ Full control over DB updates
-### Client-Side (vs-extension, github-action)
-```typescript
-import { TestChimpService } from 'testchimp-runner-core';
-const service = new TestChimpService(
-  fileHandler,
-  authConfig, // Auth configured
-  backendUrl,
-  maxWorkers,
-  llmProvider,
-  progressReporter,
-  orchestratorOptions
-  // NO callback - uses axios
-);
-```
-**Result:**
-- ❌ No callback provided
-- ✅ Axios call made to backend API (because auth configured)
-- ✅ Backend handles DB update
-### Development Mode (local testing)
-```typescript
-const service = new TestChimpService(
-  fileHandler
-  // No auth, no callback
-);
-```
-**Result:**
-- ❌ No callback
-- ❌ No auth
-- ⚠️ Warning logged: "Credit usage not tracked"
-- ✅ Continues without error
-## API
-### Constructor
-```typescript
-new TestChimpService(
-  fileHandler?,
-  authConfig?,
-  backendUrl?,
-  maxWorkers?,
-  llmProvider?,
-  progressReporter?,
-  orchestratorOptions?,
-  creditUsageCallback? // NEW
-)
-```
-### Method
-```typescript
-service.setCreditUsageCallback((creditUsage) => {
-  // Handle credit usage in your system
-  console.log(`Used ${creditUsage.credits} credits for ${creditUsage.usageReason}`);
-});
-```
-## Exported Types
-```typescript
-export {
-  CreditUsageCallback,     // Type for callback function
-  CreditUsage,             // Interface for usage data
-  CreditUsageReason        // Enum: SCRIPT_GENERATE, TEST_REPAIR, etc.
-};
-```
-## Benefits
-### For Server-Side
-1. **No Network Calls** - Direct DB updates via callback
-2. **Full Control** - Custom logic for credit tracking
-3. **Performance** - No HTTP round-trip overhead
-4. **Reliability** - No network failures
-### For Client-Side
-1. **Backward Compatible** - Existing code works unchanged
-2. **Centralized** - Backend API handles all credit logic
-3. **Simple** - Just configure auth, no callback needed
-### For Both
-1. **Flexible** - Each consumer decides how to handle credits
-2. **Testable** - Can mock callbacks easily
-3. **Observable** - Callback provides visibility into credit usage
-## Implementation Notes
-### Preservation Across Service Recreations
-Credit callback is stored and reapplied when services are recreated:
-```typescript
-// Store in TestChimpService
-private creditUsageCallback?: CreditUsageCallback;
-// Pass to CreditUsageService on every recreation
-this.creditUsageService = new CreditUsageService(
-  this.authConfig,
-  this.backendUrl,
-  this.creditUsageCallback  // Always preserved
-);
-```
-### Error Handling
-**Callback-based (server-side):**
-- Callback error → Throws (critical for DB updates)
-**Axios-based (client-side):**
-- Axios error → Throws (critical for credit tracking)
-**Development mode:**
-- No tracking → Logs warning, continues
-## Example: Server-Side Integration
-```typescript
-// In scriptservice
-import { TestChimpService, CreditUsage, CreditUsageReason } from 'testchimp-runner-core';
-class ScriptService {
-  private testChimpService: TestChimpService;
-  constructor() {
-    this.testChimpService = new TestChimpService(
-      new CustomFileHandler(),
-      undefined, // No auth - server-side doesn't need it
-      'http://localhost:3000', // Internal backend URL
-      5,
-      new CustomLLMProvider(), // Server has its own LLM provider
-      customProgressReporter,
-      { useOrchestrator: true },
-      async (creditUsage: CreditUsage) => {
-        // Direct DB update - no HTTP calls
-        await this.creditRepository.insert({
-          userId: this.getCurrentUserId(),
-          credits: creditUsage.credits,
-          reason: creditUsage.usageReason,
-          jobId: creditUsage.jobId,
-          timestamp: new Date(creditUsage.timestamp)
-        });
-      }
-    );
-  }
-}
-```
-## Files Modified
-1. `/src/credit-usage-service.ts` - Added callback support, callback-first logic
-2. `/src/index.ts` - Accept credit callback in constructor, expose `setCreditUsageCallback()`, preserve across recreations
-3. Exported types: `CreditUsage`, `CreditUsageCallback`, `CreditUsageReason`
-## Testing
-### Server-Side
-1. Set credit callback
-2. Generate script
-3. Verify callback called with correct credit data
-4. Verify NO axios calls made
-### Client-Side
-1. Configure auth (no callback)
-2. Generate script
-3. Verify axios call made to backend
-4. Verify backend receives credit data
-## Backward Compatibility
-✅ Fully backward compatible
-- Existing consumers work unchanged
-- Optional callback parameter
-- Existing axios behavior preserved for client-side
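
Reviewer note on the deleted document above: the callback-first order it describes (callback, then authenticated HTTP, then untracked development mode) can be sketched without any network dependency. This is an illustration under stated assumptions, not the package's `CreditUsageService`. `selectReportingPath`, `createCreditReporter`, `HttpPost`, and `ReportingPath` are hypothetical names; `HttpPost` stands in for the `axios.post` call; and `CreditUsageReason` is reduced to the two values the document names.

```typescript
// The document describes an enum; a string union of the two named values is used here.
type CreditUsageReason = "SCRIPT_GENERATE" | "TEST_REPAIR";

interface CreditUsage {
  credits: number;
  usageReason: CreditUsageReason;
  jobId?: string;
  timestamp: number;
}

type CreditUsageCallback = (usage: CreditUsage) => void | Promise<void>;

// Hypothetical stand-in for axios.post so the sketch needs no HTTP client.
type HttpPost = (url: string, body: CreditUsage) => Promise<unknown>;

type ReportingPath = "callback" | "http" | "none";

// Hypothetical helper encoding the priority order from the document.
function selectReportingPath(hasCallback: boolean, hasAuth: boolean): ReportingPath {
  if (hasCallback) return "callback"; // server-side: direct DB update
  if (hasAuth) return "http";         // client-side: backend API call
  return "none";                      // development mode: not tracked
}

interface ReporterOptions {
  backendUrl: string;
  hasAuth: boolean;
  post: HttpPost;
  callback?: CreditUsageCallback;
}

// Hypothetical factory returning a reportCreditUsage function bound to the options.
function createCreditReporter(options: ReporterOptions) {
  return async function reportCreditUsage(
    credits: number,
    usageReason: CreditUsageReason,
    jobId?: string,
  ): Promise<ReportingPath> {
    const usage: CreditUsage = {
      credits,
      usageReason,
      timestamp: Date.now(),
      ...(jobId !== undefined ? { jobId } : {}),
    };
    const path = selectReportingPath(options.callback !== undefined, options.hasAuth);
    if (path === "callback" && options.callback) {
      await options.callback(usage); // errors propagate, as the document specifies
    } else if (path === "http") {
      await options.post(`${options.backendUrl}/localagent/insert_credit_usage`, usage);
    } else {
      console.warn("Credit usage not tracked");
    }
    return path;
  };
}
```

Keeping the priority decision in a separate pure function makes the three scenarios testable without mocking HTTP, which matches the "testable" benefit the document lists.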