testchimp-runner-core 0.0.35 → 0.0.36

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (71)
  1. package/package.json +6 -1
  2. package/plandocs/BEFORE_AFTER_VERIFICATION.md +0 -148
  3. package/plandocs/COORDINATE_MODE_DIAGNOSIS.md +0 -144
  4. package/plandocs/CREDIT_CALLBACK_ARCHITECTURE.md +0 -253
  5. package/plandocs/HUMAN_LIKE_IMPROVEMENTS.md +0 -642
  6. package/plandocs/IMPLEMENTATION_STATUS.md +0 -108
  7. package/plandocs/INTEGRATION_COMPLETE.md +0 -322
  8. package/plandocs/MULTI_AGENT_ARCHITECTURE_REVIEW.md +0 -844
  9. package/plandocs/ORCHESTRATOR_MVP_SUMMARY.md +0 -539
  10. package/plandocs/PHASE1_ABSTRACTION_COMPLETE.md +0 -241
  11. package/plandocs/PHASE1_FINAL_STATUS.md +0 -210
  12. package/plandocs/PHASE_1_COMPLETE.md +0 -165
  13. package/plandocs/PHASE_1_SUMMARY.md +0 -184
  14. package/plandocs/PLANNING_SESSION_SUMMARY.md +0 -372
  15. package/plandocs/PROMPT_OPTIMIZATION_ANALYSIS.md +0 -120
  16. package/plandocs/PROMPT_SANITY_CHECK.md +0 -120
  17. package/plandocs/SCRIPT_CLEANUP_FEATURE.md +0 -201
  18. package/plandocs/SCRIPT_GENERATION_ARCHITECTURE.md +0 -364
  19. package/plandocs/SELECTOR_IMPROVEMENTS.md +0 -139
  20. package/plandocs/SESSION_SUMMARY_v0.0.33.md +0 -151
  21. package/plandocs/TROUBLESHOOTING_SESSION.md +0 -72
  22. package/plandocs/VISION_DIAGNOSTICS_IMPROVEMENTS.md +0 -336
  23. package/plandocs/VISUAL_AGENT_EVOLUTION_PLAN.md +0 -396
  24. package/plandocs/WHATS_NEW_v0.0.33.md +0 -183
  25. package/plandocs/exploratory-mode-support-v2.plan.md +0 -953
  26. package/plandocs/exploratory-mode-support.plan.md +0 -928
  27. package/plandocs/journey-id-tracking-addendum.md +0 -227
  28. package/releasenotes/RELEASE_0.0.26.md +0 -165
  29. package/releasenotes/RELEASE_0.0.27.md +0 -236
  30. package/releasenotes/RELEASE_0.0.28.md +0 -286
  31. package/src/auth-config.ts +0 -84
  32. package/src/credit-usage-service.ts +0 -188
  33. package/src/env-loader.ts +0 -103
  34. package/src/execution-service.ts +0 -996
  35. package/src/file-handler.ts +0 -104
  36. package/src/index.ts +0 -432
  37. package/src/llm-facade.ts +0 -821
  38. package/src/llm-provider.ts +0 -53
  39. package/src/model-constants.ts +0 -35
  40. package/src/orchestrator/decision-parser.ts +0 -139
  41. package/src/orchestrator/index.ts +0 -58
  42. package/src/orchestrator/orchestrator-agent.ts +0 -1282
  43. package/src/orchestrator/orchestrator-prompts.ts +0 -786
  44. package/src/orchestrator/page-som-handler.ts +0 -1565
  45. package/src/orchestrator/som-types.ts +0 -188
  46. package/src/orchestrator/tool-registry.ts +0 -184
  47. package/src/orchestrator/tools/check-page-ready.ts +0 -75
  48. package/src/orchestrator/tools/extract-data.ts +0 -92
  49. package/src/orchestrator/tools/index.ts +0 -15
  50. package/src/orchestrator/tools/inspect-page.ts +0 -42
  51. package/src/orchestrator/tools/recall-history.ts +0 -72
  52. package/src/orchestrator/tools/refresh-som-markers.ts +0 -69
  53. package/src/orchestrator/tools/take-screenshot.ts +0 -128
  54. package/src/orchestrator/tools/verify-action-result.ts +0 -159
  55. package/src/orchestrator/tools/view-previous-screenshot.ts +0 -103
  56. package/src/orchestrator/types.ts +0 -291
  57. package/src/playwright-mcp-service.ts +0 -224
  58. package/src/progress-reporter.ts +0 -144
  59. package/src/prompts.ts +0 -842
  60. package/src/providers/backend-proxy-llm-provider.ts +0 -91
  61. package/src/providers/local-llm-provider.ts +0 -38
  62. package/src/scenario-service.ts +0 -252
  63. package/src/scenario-worker-class.ts +0 -1110
  64. package/src/script-utils.ts +0 -203
  65. package/src/types.ts +0 -239
  66. package/src/utils/browser-utils.ts +0 -348
  67. package/src/utils/coordinate-converter.ts +0 -162
  68. package/src/utils/page-info-retry.ts +0 -65
  69. package/src/utils/page-info-utils.ts +0 -285
  70. package/testchimp-runner-core-0.0.35.tgz +0 -0
  71. package/tsconfig.json +0 -19
@@ -1,539 +0,0 @@
1
- # Orchestrator Agent MVP: Final Specification
2
-
3
- ## What We're Building
4
-
5
- **Single orchestrator agent** that replaces the current reactive sub-action loop with a proactive, tool-using, memory-maintaining system.
6
-
7
- ---
8
-
9
- ## Core Features (All in MVP)
10
-
11
- ### 1. Always-Provided Context (Auto-Fetched Each Iteration)
12
-
13
- ```typescript
14
- {
15
- // WHERE AM I? (Journey tracking)
16
- overallGoal: "Full scenario text",
17
- currentStepGoal: "Login with alice@example.com, TestPass123",
18
- stepNumber: 3,
19
- totalSteps: 6,
20
- completedSteps: ["Go to URL", "Click login"],
21
- remainingSteps: ["Messages", "Send message", "Verify"],
22
-
23
- // WHAT DO I SEE? (Current state - fresh)
24
- currentPageInfo: getEnhancedPageInfo(page), // ARIA + IDs + data attrs (compact!)
25
- currentURL: page.url(),
26
-
27
- // WHAT DID I JUST DO? (Recent memory ~6-7 steps - TEXT ONLY, NO SCREENSHOTS)
28
- recentSteps: [
29
- {
30
- stepNumber: 2,
31
- action: "Navigated to login page",
32
- code: "await page.goto('https://...')",
33
- result: "success",
34
- observation: "Login form visible with email and password fields"
35
- },
36
- {
37
- stepNumber: 2,
38
- action: "Explored menu icon (hover)",
39
- code: "explore_element({action: 'hover', selector: 'nav button:nth-child(2)'})",
40
- result: "success",
41
- observation: "Tooltip appeared showing 'Dashboard', confirmed this is the Dashboard button"
42
- }
43
- ],
44
-
45
- // WHAT HAVE I LEARNED? (Experiences)
46
- experiences: [
47
- "Site uses #id selectors for forms",
48
- "Login redirects to /dashboard"
49
- ],
50
-
51
- // WHAT DO I KNOW? (Extracted data)
52
- extractedData: { userEmail: "alice@example.com" },
53
-
54
- // WHAT DID I TELL MYSELF? (Self-reflection from previous iteration)
55
- previousIterationGuidance: {
56
- guidanceForNext: "Check for redirect after submit",
57
- detectingLoop: false
58
- }
59
- }
60
- ```
61
-
62
- **Optimized for complex pages:**
63
- - DOM: Pre-truncated with increased limits
64
- - Interactive elements: top 50
65
- - IDs: top 50
66
- - Data attributes: top 50
67
- - Form fields: top 20
68
- - Page structure: top 10
69
- - General elements: top 50
70
- - Text: 30 chars max
71
- - Result: ~800-1,500 tokens
72
- - Steps: Recent 6-7 only
73
- - Experiences: Cap at 20
74
-
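A minimal sketch of how the caps listed above might be applied before each iteration. `RawContext`, `LIMITS`, `clip`, and `capContext` are hypothetical names; only the numeric limits come from the list, and keeping the most recent steps and experiences (rather than the oldest) is an assumption.

```typescript
// Hypothetical shapes; only the numeric caps are taken from the list above.
interface RawContext {
  interactiveElements: string[];
  formFields: string[];
  recentSteps: string[];
  experiences: string[];
}

const LIMITS = {
  interactiveElements: 50,
  formFields: 20,
  textChars: 30,
  recentSteps: 7,
  experiences: 20,
} as const;

// Truncate a label to the per-text character cap.
function clip(text: string, max: number = LIMITS.textChars): string {
  return text.length > max ? text.slice(0, max) : text;
}

function capContext(raw: RawContext): RawContext {
  return {
    // "Top N" lists keep the first N entries and shorten each label.
    interactiveElements: raw.interactiveElements
      .slice(0, LIMITS.interactiveElements)
      .map((t) => clip(t)),
    formFields: raw.formFields.slice(0, LIMITS.formFields).map((t) => clip(t)),
    // Assumption: for steps and experiences the newest entries matter most.
    recentSteps: raw.recentSteps.slice(-LIMITS.recentSteps),
    experiences: raw.experiences.slice(-LIMITS.experiences),
  };
}
```

Capping each list independently keeps the prompt size bounded even when a single category (for example, interactive elements on a dense dashboard) is very large.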
75
- ### 2. Optional Tools (Agent Requests)
76
-
77
- **8 core tools:**
78
-
79
- **Information Gathering:**
80
- 1. **take_screenshot({isFullPage})** - Visual context (use freely)
81
- 2. **recall_history({maxSteps, query})** - Deeper history
82
- 3. **inspect_page()** - Get DOM (possibly redundant because the DOM is always provided, but kept for extensibility)
83
- 4. **check_page_ready()** - Verify page loaded
84
-
85
- **Data Management:**
86
- 5. **extract_data({selector, dataName})** - Save data for later
87
-
88
- **Recovery Tools (Self-Unstuck):**
89
- 6. **navigate_back()** - Go back in browser history (if exploratory action had side effects)
90
- 7. **refresh_page()** - Reload current page (if page in bad state)
91
- 8. **navigate_to_url({url})** - Navigate to specific URL (validate it's within allowed domain)
92
-
93
- **Inquisitive Exploration (Phase 2):**
94
- 9. **explore_element({action, selector, purpose})** - Investigate ambiguous elements
95
- - Actions: "hover", "click_info", "click_menu", "focus"
96
- - Non-consequential only (no form submit, delete, etc.)
97
- - Returns: {success, screenshotTaken, observation}
98
-
99
- **Extensible**: Add new tools → auto-appear in prompt
100
-
101
- **Recovery scenarios:**
102
- - Agent realizes it navigated away from domain → `navigate_back()` or `navigate_to_url({url: baseUrl})`
103
- - Page got stuck/unresponsive → `refresh_page()`
104
- - Exploratory action opened modal/overlay → `navigate_back()` or `refresh_page()`
105
- - Lost context after redirect → `navigate_to_url()` to known good state
106
-
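The "add new tools → auto-appear in prompt" idea can be sketched as a registry whose prompt section is generated from whatever is registered. This is an illustration under assumptions, not the package's actual `tool-registry.ts`: `Tool`, `ToolRegistry`, `describeForPrompt`, and the synchronous `run` signature are all hypothetical.

```typescript
// Illustrative only: names and signatures are assumptions.
interface Tool {
  name: string;
  description: string;
  // Parameter name -> short type hint shown to the agent.
  params: { [key: string]: string };
  run(params: { [key: string]: unknown }): string;
}

class ToolRegistry {
  // Object.create(null) avoids accidental hits on inherited keys like "toString".
  private tools: { [name: string]: Tool } = Object.create(null);

  register(tool: Tool): void {
    if (this.tools[tool.name]) {
      throw new Error("Tool already registered: " + tool.name);
    }
    this.tools[tool.name] = tool;
  }

  // Builds the prompt section from the registered tools, so a newly
  // registered tool shows up without editing the prompt template.
  describeForPrompt(): string {
    return Object.keys(this.tools)
      .map((name) => {
        const tool = this.tools[name];
        const args = Object.keys(tool.params)
          .map((p) => p + ": " + tool.params[p])
          .join(", ");
        return "- " + tool.name + "({" + args + "}) - " + tool.description;
      })
      .join("\n");
  }

  execute(name: string, params: { [key: string]: unknown }): string {
    const tool = this.tools[name];
    if (!tool) {
      throw new Error("Unknown tool: " + name);
    }
    return tool.run(params);
  }
}

const registry = new ToolRegistry();
registry.register({
  name: "recall_history",
  description: "Deeper history",
  params: { maxSteps: "number", query: "string" },
  run: (params) => "history for last " + String(params.maxSteps) + " steps",
});
```

Because the prompt text is derived from the registry, enabling or removing a tool (for example, dropping `explore_element` once its budget is spent) only requires changing what is registered.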
107
- ### 3. Agent Decision (What Agent Outputs)
108
-
109
- ```typescript
110
- {
111
- // 1. Tool requests (multiple allowed)
112
- toolCalls: [
113
- {name: "take_screenshot", params: {isFullPage: false}},
114
- {name: "recall_history", params: {maxSteps: 5}}
115
- ],
116
- toolReasoning: "Need visual + history to understand pattern",
117
- needsToolResults: true, // Wait for tools before commands
118
-
119
- // 2. Command batch (executed sequentially)
120
- commands: [
121
- "await page.fill('#email', 'alice@example.com')",
122
- "await page.fill('#password', 'TestPass123')",
123
- "await page.click('button[type=\"submit\"]')"
124
- ],
125
- commandReasoning: "Batch entire login flow",
126
-
127
- // 3. Self-reflection (FREE-FORM guidance to next iteration)
128
- selfReflection: {
129
- guidanceForNext: "After submit, check URL changed to /dashboard",
130
- detectingLoop: false, // Agent signals if repeating same approach
131
- loopReasoning: null
132
- },
133
-
134
- // 4. Learnings (stored in experiences)
135
- experiences: [
136
- "Forms use #id selectors consistently",
137
- "Login redirects immediately after submit"
138
- ],
139
-
140
- // 5. Memory update (what to store)
141
- memoryUpdate: {
142
- action: "Filled login form and submitted",
143
- observation: "Page redirected to dashboard",
144
- extractedData: {userEmail: "alice@example.com"}
145
- },
146
-
147
- // 6. Termination decision
148
- status: "complete", // or "stuck" | "infeasible" | "continue"
149
- statusReasoning: "Login completed, dashboard visible",
150
- reasoning: "Overall iteration reasoning"
151
- }
152
- ```
153
-
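Since the decision arrives as model-generated text, it likely needs defensive parsing before use. The sketch below assumes the agent returns a JSON string with the field names shown above; `parseDecision` and `ParsedDecision` are hypothetical names, and falling back to `"continue"` for an unrecognized status is an assumption, not documented behavior.

```typescript
// Hypothetical parser; field names follow the decision shape above.
type AgentStatus = "complete" | "stuck" | "infeasible" | "continue";

interface ParsedDecision {
  toolCalls: { name: string; params: { [key: string]: unknown } }[];
  needsToolResults: boolean;
  commands: string[];
  status: AgentStatus;
  detectingLoop: boolean;
}

const VALID_STATUSES: AgentStatus[] = ["complete", "stuck", "infeasible", "continue"];

function parseDecision(raw: string): ParsedDecision {
  // JSON.parse throws on malformed input; callers decide whether to retry.
  const data = JSON.parse(raw);
  if (data === null || typeof data !== "object") {
    throw new Error("Decision must be a JSON object");
  }
  // Assumption: an unknown or missing status is treated as "continue".
  const status: AgentStatus =
    VALID_STATUSES.indexOf(data.status) >= 0 ? data.status : "continue";
  return {
    toolCalls: Array.isArray(data.toolCalls) ? data.toolCalls : [],
    needsToolResults: data.needsToolResults === true,
    commands: Array.isArray(data.commands) ? data.commands : [],
    status: status,
    detectingLoop: !!(data.selfReflection && data.selfReflection.detectingLoop),
  };
}
```

Defaulting every optional field means a partial response (for example, tool calls only, with no commands yet) still yields a usable decision object.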
154
- ### 4. Execution: Sequential with Early Stop
155
-
156
- **Agent plans batch, system executes sequentially:**
157
-
158
- ```
159
- Agent: commands = [cmd1, cmd2, cmd3]
160
-
161
- Execute:
162
- cmd1 → SUCCESS ✓ (record in history)
163
- cmd2 → SUCCESS ✓ (record in history)
164
- cmd3 → FAIL ✗ (record in history, STOP)
165
-
166
- Result: 2/3 succeeded, accurately tracked
167
- ```
168
-
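The early-stop behavior above can be sketched as a loop that records every attempted command and breaks on the first failure. `runBatch` and `CommandResult` are hypothetical names, and the executor is injected and synchronous only to keep the sketch self-contained; a real runner would presumably await each Playwright command instead.

```typescript
interface CommandResult {
  command: string;
  success: boolean;
  error?: string;
}

// `execute` is a stand-in for running one command against the page;
// it signals failure by throwing.
function runBatch(
  commands: string[],
  execute: (command: string) => void
): CommandResult[] {
  const history: CommandResult[] = [];
  for (const command of commands) {
    try {
      execute(command);
      history.push({ command: command, success: true });
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      history.push({ command: command, success: false, error: message });
      break; // early stop: later commands are never attempted
    }
  }
  return history;
}
```

The failed command is recorded before stopping, so the history reflects exactly what ran (two successes and one failure in the example above) and nothing that did not.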
169
- ### 5. Comprehensive Logging
170
-
171
- **Every iteration logs:**
172
- ```
173
- [Orchestrator] === Iteration 2/8 ===
174
- [Orchestrator] 🎯 Current Goal: Login with alice@example.com, TestPass123
175
- [Orchestrator] 📍 Progress: Step 2/6
176
- [Orchestrator] 💭 Reasoning: Form fields located, batching fill operations
177
- [Orchestrator] 🧠 Previous Guidance: Check for redirect after submit
178
- [Orchestrator] 🔧 Tools: [take_screenshot (viewport)]
179
- [Orchestrator] 📋 Tool Reasoning: Visual check for login button state
180
- [Orchestrator] ✓ Tools executed
181
- [Orchestrator] 📝 Commands (3): fill email, fill password, click submit
182
- [Orchestrator] 💡 Batch Reasoning: Can execute entire login in one go
183
- [Orchestrator] ▶ Executing sequentially:
184
- [Orchestrator] ✓ [1/3] await page.fill('#email', 'alice@example.com')
185
- [Orchestrator] ✓ [2/3] await page.fill('#password', 'TestPass123')
186
- [Orchestrator] ✓ [3/3] await page.click('button[type="submit"]')
187
- [Orchestrator] 📚 Experiences: Site uses #id for forms, Login redirects to /dashboard
188
- [Orchestrator] 🧠 Next Guidance: Verify dashboard loaded, check for user menu
189
- [Orchestrator] 🔄 Loop Detection: false
190
- [Orchestrator] 🎯 Status: continue
191
- [Orchestrator] 💭 Status Reasoning: Commands executed, need to verify navigation
192
- ```
193
-
194
- ---
195
-
196
- ## Inquisitive Exploration (Phase 2)
197
-
198
- ### Problem: Ambiguous UI Elements
199
-
200
- **Scenario**: Agent needs to click "Dashboard" but menu items are icon-only (no text, no clear ARIA labels)
201
-
202
- **Solution**: Agent investigates non-consequentially before committing to actions
203
-
204
- ### How It Works
205
-
206
- ```typescript
207
- // ITERATION N - Agent realizes it needs more info:
208
- {
209
- "toolCalls": [
210
- {
211
- "name": "explore_element",
212
- "params": {
213
- "action": "hover",
214
- "selector": "nav button:nth-child(2)",
215
- "purpose": "Check tooltip to see if this is Dashboard"
216
- }
217
- }
218
- ],
219
- "toolReasoning": "Menu items are icons without labels, need to hover to see tooltips",
220
- "needsToolResults": true // Agent waits for tool results before continuing
221
- }
222
-
223
- // System executes tool:
224
- 1. Hover over element
225
- 2. Wait 500ms for tooltip
226
- 3. Take screenshot
227
- 4. Call agent AGAIN with screenshot to analyze it
228
- 5. Agent responds with analysis: "Tooltip shows 'Dashboard' text"
229
- 6. System stores learning in history (TEXT, not screenshot):
230
- {
231
- action: "Explored menu icon (hover)",
232
- code: "explore_element(...)",
233
- result: "success",
234
- observation: "Tooltip appeared showing 'Dashboard', confirmed this is the Dashboard button"
235
- }
236
- 7. Tool returns to original agent call: {success: true, learning: "Tooltip says Dashboard"}
237
-
238
- // SAME ITERATION N - Agent receives tool result (TEXT learning, no screenshot):
239
- {
240
- "toolResults": {
241
- "explore_element": {
242
- "success": true,
243
- "learning": "Tooltip appeared showing 'Dashboard', confirmed this is the Dashboard button"
244
- }
245
- }
246
- }
247
-
248
- // Agent now has confidence to proceed with commands:
249
- {
250
- "commands": ["await page.click('nav button:nth-child(2)')"],
251
- "commandReasoning": "Exploration confirmed this is Dashboard button via tooltip"
252
- }
253
- ```
254
-
255
- **Key: Screenshot analyzed immediately, only TEXT learnings stored/passed forward**
256
-
257
- ### Allowed Exploration Actions
258
-
259
- **Non-consequential only:**
260
-
261
- 1. **hover** - Show tooltips, menus, dropdowns
262
- - Safe: Doesn't change state
263
- - Use: Reveal hidden info
264
-
265
- 2. **click_info** - Click info icons, help buttons
266
- - Safe: Usually opens modal/tooltip
267
- - Use: Get more context
268
- - Risk: Modal might block page (can navigate_back)
269
-
270
- 3. **click_menu** - Click menu headers to reveal items
271
- - Safe: Just expands menu
272
- - Use: See menu options
273
- - Risk: Menu might navigate (rare)
274
-
275
- 4. **focus** - Focus on input to see placeholder/validation
276
- - Safe: Just focuses element
277
- - Use: See input hints
278
-
279
- **NOT allowed:**
280
- - ❌ Submit forms
281
- - ❌ Delete/remove actions
282
- - ❌ Purchase/confirm buttons
283
- - ❌ Logout
284
- - ❌ File uploads
285
- - ❌ Navigation links (unless explicitly exploratory)
286
-
287
- ### Exploration Workflow
288
-
289
- ```
290
- Iteration N:
291
- Agent Decision 1: "Need to explore menu icons"
292
- Tool: explore_element(hover, selector)
293
-
294
- System executes:
295
- → Hover element
296
- → Take screenshot
297
- → Call agent with screenshot: "Analyze this, what do you see?"
298
-
299
- Agent Decision 2 (sub-call): "I see tooltip says 'Dashboard'"
300
-
301
- System stores:
302
- → History entry (TEXT): "Explored button, tooltip shows Dashboard"
303
- → No screenshot stored
304
-
305
- System returns to Agent Decision 1:
306
- → Tool result: {success: true, learning: "Tooltip shows Dashboard"}
307
-
308
- Agent Decision 1 continues: "Great! Now I can click it"
309
- Commands: [click that element]
310
-
311
- System executes commands sequentially
312
- ```
313
-
314
- **Key: Screenshot analyzed IMMEDIATELY within same iteration, only TEXT learning stored**
315
-
316
- **Benefits:**
317
- - No screenshots stored in memory (saves tokens)
318
- - Immediate feedback (no waiting for next iteration)
319
- - Structured learning extraction
320
- - Future iterations only see concise text observations
321
-
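The "analyze immediately, store only text" flow might look like the sketch below. The three callbacks in `ExplorationDeps` are hypothetical stand-ins for the browser and vision-model calls, kept synchronous so the example is self-contained; `exploreByHover` and `ExplorationEntry` are likewise illustrative names.

```typescript
// All three callbacks are hypothetical stand-ins for browser and LLM calls.
interface ExplorationDeps {
  hover(selector: string): void;
  takeScreenshot(): string; // e.g. base64 image data
  analyzeScreenshot(image: string, purpose: string): string; // vision sub-call, returns text
}

interface ExplorationEntry {
  action: string;
  code: string;
  result: string;
  observation: string;
}

function exploreByHover(
  selector: string,
  purpose: string,
  deps: ExplorationDeps,
  history: ExplorationEntry[]
): { success: boolean; learning: string } {
  deps.hover(selector);
  const image = deps.takeScreenshot();
  const learning = deps.analyzeScreenshot(image, purpose);
  // Only the text learning is stored; the image is discarded after analysis.
  history.push({
    action: "Explored element (hover)",
    code: "explore_element({action: 'hover', selector: '" + selector + "'})",
    result: "success",
    observation: learning,
  });
  return { success: true, learning: learning };
}
```

The screenshot never leaves the function, which is what keeps later iterations limited to concise text observations.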
322
- ### Guardrails
323
-
324
- ```typescript
325
- interface AgentConfig {
326
- // Per iteration
327
- maxExploratoryActionsPerIteration: 3, // Can explore up to 3 elements per iteration
328
-
329
- // Per step
330
- maxExploratoryActionsPerStep: 10, // Total exploration budget per step
331
-
332
- // Safety
333
- explorationTimeout: 2000, // Max wait for tooltip/menu
334
- allowedExplorationActions: ['hover', 'click_info', 'click_menu', 'focus']
335
- }
336
-
337
- // System enforces:
338
- if (explorationAction.action.startsWith('click') && selector.includes('submit')) {
339
- logger.error('SYSTEM: Exploratory click on submit button blocked');
340
- return {success: false, reason: 'unsafe_action'};
341
- }
342
-
343
- if (explorationCount > config.maxExploratoryActionsPerStep) {
344
- logger.warn('SYSTEM: Exploration budget exhausted');
345
- // Remove explore_element from available tools
346
- }
347
- ```
348
-
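A self-contained sketch of how these exploration guardrails could be combined into one check. The action names and the per-step budget come from the config above; `checkExploration`, `UNSAFE_HINTS`, and the substring heuristic on the selector are assumptions (a selector string is only a weak signal of what an element does).

```typescript
// Hypothetical guard; action names and budget follow the config above.
const ALLOWED_ACTIONS = ["hover", "click_info", "click_menu", "focus"];
// Assumption: keywords derived from the "NOT allowed" list.
const UNSAFE_HINTS = ["submit", "delete", "remove", "purchase", "confirm", "logout", "upload"];

interface ExplorationRequest {
  action: string;
  selector: string;
}

interface GuardResult {
  allowed: boolean;
  reason?: string;
}

function checkExploration(
  req: ExplorationRequest,
  usedThisStep: number,
  maxPerStep: number = 10
): GuardResult {
  if (usedThisStep >= maxPerStep) {
    return { allowed: false, reason: "budget_exhausted" };
  }
  if (ALLOWED_ACTIONS.indexOf(req.action) < 0) {
    return { allowed: false, reason: "unknown_action" };
  }
  // Only click-type explorations can trigger side effects such as a submit.
  const isClick = req.action.indexOf("click") === 0;
  const selector = req.selector.toLowerCase();
  for (const hint of UNSAFE_HINTS) {
    if (isClick && selector.indexOf(hint) >= 0) {
      return { allowed: false, reason: "unsafe_action" };
    }
  }
  return { allowed: true };
}
```

Checking the budget first means an exhausted step rejects every request cheaply, regardless of which action was asked for.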
349
- ### State Validation
350
-
351
- **Before exploration:**
352
- ```typescript
353
- const beforeState = {
354
- url: page.url(),
355
- modalCount: await page.locator('[role="dialog"]').count()
356
- };
357
- ```
358
-
359
- **After exploration:**
360
- ```typescript
361
- const afterState = {
362
- url: page.url(),
363
- modalCount: await page.locator('[role="dialog"]').count()
364
- };
365
-
366
- // Check for unexpected navigation
367
- if (beforeState.url !== afterState.url) {
368
- logger.warn('Exploration caused navigation, reverting');
369
- await page.goBack();
370
- }
371
-
372
- // Modal opened (might be intended)
373
- if (afterState.modalCount > beforeState.modalCount) {
374
- observation = "Modal opened (may need to close)";
375
- }
376
- ```
377
-
378
- ### When Agent Uses Exploration
379
-
380
- **Agent decides based on:**
381
- - ❓ DOM shows icons without text
382
- - ❓ Multiple similar elements, unclear which is correct
383
- - ❓ Need to see menu contents before deciding
384
- - ❓ Input field needs to show validation rules
385
-
386
- **Example agent reasoning:**
387
- ```
388
- "DOM shows 5 icon buttons without labels. Need to hover over each
389
- to see tooltips and identify which is Dashboard. Will explore
390
- buttons 1-3 this iteration."
391
- ```
392
-
393
- ### Why Phase 2 (Not MVP)
394
-
395
- **Reasons to defer:**
396
- 1. **Safety risk** - Need robust validation to prevent state changes
397
- 2. **Complexity** - Requires screenshot handling, state comparison
398
- 3. **Edge cases** - Modals, overlays, navigation need careful handling
399
- 4. **Testing needed** - Validate on multiple sites before including
400
-
401
- **MVP workaround:**
402
- - Agent uses `take_screenshot()` + DOM analysis
403
- - Makes best guess from available info
404
- - If wrong, retry with different approach
405
- - Less elegant but safer
406
-
407
- **Add in Phase 2 when:**
408
- - MVP validated and working
409
- - Identified common patterns where exploration helps
410
- - State validation logic battle-tested
411
-
412
- ---
413
-
414
- ## Guardrails
415
-
416
- ### System-Enforced (Hard Limits)
417
-
418
- ```typescript
419
- interface AgentConfig {
420
- // Per-step
421
- maxIterationsPerStep: 8,
422
- maxToolCallsPerIteration: 5,
423
- maxCommandsPerIteration: 5,
424
-
425
- // Scenario-wide
426
- maxConsecutiveStepFailures: 2,
427
- maxTotalIterations: 50, // Across all steps
428
-
429
- // Memory
430
- maxExperiences: 20,
431
- maxHistorySize: 100,
432
- recentStepsCount: 7 // How many in always-provided
433
- }
434
-
435
- // System checks BEFORE and AFTER agent call
436
- if (iteration > config.maxIterationsPerStep) {
437
- logger.warn('SYSTEM: Iteration limit reached');
438
- return {success: false, reason: 'system_limit'};
439
- }
440
- ```
441
-
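The hard limits could be checked in one place before each agent call, roughly as sketched here. `RunCounters`, `Limits`, and `violatedLimit` are hypothetical names; the counters are assumed to hold completed counts, which is why the comparison uses `>=` rather than the `>` on a 1-based iteration number shown above.

```typescript
interface Limits {
  maxIterationsPerStep: number;
  maxTotalIterations: number;
  maxConsecutiveStepFailures: number;
}

// Assumption: each counter holds the number already completed/observed.
interface RunCounters {
  iterationInStep: number;
  totalIterations: number;
  consecutiveStepFailures: number;
}

const DEFAULT_LIMITS: Limits = {
  maxIterationsPerStep: 8,
  maxTotalIterations: 50,
  maxConsecutiveStepFailures: 2,
};

// Returns the name of the first limit that blocks another iteration, or null.
function violatedLimit(
  counters: RunCounters,
  limits: Limits = DEFAULT_LIMITS
): string | null {
  if (counters.consecutiveStepFailures >= limits.maxConsecutiveStepFailures) {
    return "maxConsecutiveStepFailures";
  }
  if (counters.totalIterations >= limits.maxTotalIterations) {
    return "maxTotalIterations";
  }
  if (counters.iterationInStep >= limits.maxIterationsPerStep) {
    return "maxIterationsPerStep";
  }
  return null;
}
```

Returning the limit's name rather than a boolean lets the caller log which guardrail fired, in line with the `reason: 'system_limit'` style result above.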
442
- ### Agent Self-Awareness (Soft Signals)
443
-
444
- ```typescript
445
- // Agent can signal issues:
446
- {
447
- "status": "stuck",
448
- "statusReasoning": "Tried 4 different selectors, none work - element likely doesn't exist"
449
- }
450
-
451
- {
452
- "selfReflection": {
453
- "detectingLoop": true,
454
- "loopReasoning": "I've tried text-based selectors 3 times, need completely different approach"
455
- }
456
- }
457
-
458
- // System respects agent signals but also enforces hard limits
459
- ```
460
-
461
- **No screenshot budget** - Agent can use screenshots freely
462
-
463
- ---
464
-
465
- ## MVP Scope (Complete Feature Set)
466
-
467
- ### Included in MVP:
468
- - ✅ OrchestratorAgent with tool-use
469
- - ✅ Dynamic ToolRegistry
470
- - ✅ **8 core tools**:
471
- - Information: take_screenshot, recall_history, inspect_page, check_page_ready
472
- - Data: extract_data
473
- - Recovery: navigate_back, refresh_page, navigate_to_url
474
- - ✅ Journey memory (history + experiences + extracted data)
475
- - ✅ Always-provided context (goal + DOM + recent 7 steps + self-reflection)
476
- - ✅ **Free-form self-reflection** (train of thought continuity)
477
- - ✅ **Agent loop detection** (detectingLoop flag)
478
- - ✅ **Self-recovery** (navigate_back, refresh_page when stuck)
479
- - ✅ Batch command planning (max 3-5)
480
- - ✅ Sequential execution (stop on failure)
481
- - ✅ Experience accumulation
482
- - ✅ Configurable guardrails (per job)
483
- - ✅ **Token usage tracking** (input/output/image)
484
- - ✅ **Comprehensive logging** (all thoughts visible)
485
- - ✅ Works for generation AND repair modes
486
-
487
- ### Excluded (Phase 2):
488
- - ❌ Exploratory actions (explore_element tool - safety concerns)
489
- - ❌ Advanced optimizations (caching, adaptive limits)
490
- - ❌ Memory summarization
491
-
492
- ---
493
-
494
- ## Key Decisions
495
-
496
- 1. ✅ **Self-reflection in MVP** - Valuable for continuity, agent detects own loops
497
- 2. ✅ **No screenshot budget** - Use freely when helpful
498
- 3. ✅ **DOM always-provided** - Already compact via getEnhancedPageInfo
499
- 4. ✅ **Recent 6-7 steps** - Enough context without bloat
500
- 5. ✅ **Agent + system guardrails** - Agent signals, system enforces
501
- 6. ✅ **Sequential batch execution** - Plan together, execute one-by-one
502
- 7. ✅ **Repair mode support** - Script → Agent on failure → Script
503
-
504
- ---
505
-
506
- ## Why This Works
507
-
508
- **Agent maintains train of thought:**
509
- - Iteration 1: "Try #id selectors" → succeeds
510
- - Iteration 2 self-reflection: "IDs worked, continue using them"
511
- - Iteration 3: Uses IDs again → succeeds faster
512
-
513
- **Agent detects spirals:**
514
- - Iteration 1-2-3: Tries text selectors, all fail
515
- - Iteration 4 self-reflection: detectingLoop=true, "Text doesn't work, switching to IDs"
516
- - Breaks own loop before system limit
517
-
518
- **Human-like:**
519
- - Remember what just happened (recent steps)
520
- - Learn patterns (experiences)
521
- - Maintain train of thought (self-reflection)
522
- - Know when stuck (loop detection)
523
- - Use tools when needed (screenshot, history)
524
-
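The memory behavior described above (recent steps, accumulated experiences, deeper recall on request) can be sketched as a small class. `JourneyMemory` and its method names are hypothetical; the default sizes mirror `maxHistorySize`, `maxExperiences`, and `recentStepsCount` from the guardrails, and dropping the oldest entries first is an assumption.

```typescript
interface StepRecord {
  stepNumber: number;
  action: string;
  code: string;
  result: "success" | "failure";
  observation: string;
}

class JourneyMemory {
  private history: StepRecord[] = [];
  private experiences: string[] = [];
  private maxHistorySize: number;
  private maxExperiences: number;
  private recentStepsCount: number;

  constructor(maxHistorySize: number = 100, maxExperiences: number = 20, recentStepsCount: number = 7) {
    this.maxHistorySize = maxHistorySize;
    this.maxExperiences = maxExperiences;
    this.recentStepsCount = recentStepsCount;
  }

  recordStep(step: StepRecord): void {
    this.history.push(step);
    if (this.history.length > this.maxHistorySize) this.history.shift();
  }

  addExperiences(items: string[]): void {
    for (const item of items) {
      // Skip exact duplicates so repeated learnings do not use up the cap.
      if (this.experiences.indexOf(item) < 0) this.experiences.push(item);
    }
    // Assumption: the oldest experiences are dropped first.
    while (this.experiences.length > this.maxExperiences) this.experiences.shift();
  }

  // What the always-provided context would include.
  recentSteps(): StepRecord[] {
    return this.history.slice(-this.recentStepsCount);
  }

  // What a recall_history({maxSteps, query}) call might return.
  recall(maxSteps: number, query?: string): StepRecord[] {
    if (maxSteps <= 0) return [];
    const needle = query ? query.toLowerCase() : "";
    const matches = needle
      ? this.history.filter((s) => (s.action + " " + s.observation).toLowerCase().indexOf(needle) >= 0)
      : this.history;
    return matches.slice(-maxSteps);
  }

  getExperiences(): string[] {
    return this.experiences.slice();
  }
}
```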
525
- ---
526
-
527
- ## Ready to Implement
528
-
529
- **Full MVP specification complete** with:
530
- - All architecture decisions finalized
531
- - Self-reflection included with loop detection
532
- - No screenshot budgeting
533
- - DOM optimization validated
534
- - Comprehensive logging defined
535
- - Repair mode integration planned
536
- - Guardrails configured
537
-
538
- **Estimated implementation**: 2-3 weeks for complete MVP
539
-