@minded-ai/mindedjs 2.0.6 → 2.0.7-beta-2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (41) hide show
  1. package/dist/browserTask/README.md +419 -0
  2. package/dist/browserTask/browserAgent.py +632 -0
  3. package/dist/browserTask/captcha_isolated.png +0 -0
  4. package/dist/browserTask/executeBrowserTask.ts +79 -0
  5. package/dist/browserTask/requirements.txt +8 -0
  6. package/dist/browserTask/setup.sh +144 -0
  7. package/dist/cli/index.js +0 -0
  8. package/dist/index.d.ts +1 -1
  9. package/dist/index.d.ts.map +1 -1
  10. package/dist/index.js.map +1 -1
  11. package/dist/internalTools/retell.d.ts +12 -0
  12. package/dist/internalTools/retell.d.ts.map +1 -0
  13. package/dist/internalTools/retell.js +54 -0
  14. package/dist/internalTools/retell.js.map +1 -0
  15. package/dist/internalTools/sendPlaceholderMessage.d.ts +14 -0
  16. package/dist/internalTools/sendPlaceholderMessage.d.ts.map +1 -0
  17. package/dist/internalTools/sendPlaceholderMessage.js +61 -0
  18. package/dist/internalTools/sendPlaceholderMessage.js.map +1 -0
  19. package/dist/nodes/addRpaNode.d.ts +16 -0
  20. package/dist/nodes/addRpaNode.d.ts.map +1 -0
  21. package/dist/nodes/addRpaNode.js +177 -0
  22. package/dist/nodes/addRpaNode.js.map +1 -0
  23. package/dist/nodes/nodeFactory.d.ts.map +1 -1
  24. package/dist/nodes/nodeFactory.js +4 -0
  25. package/dist/nodes/nodeFactory.js.map +1 -1
  26. package/dist/platform/config.js +1 -1
  27. package/dist/platform/config.js.map +1 -1
  28. package/dist/types/Flows.types.d.ts +35 -2
  29. package/dist/types/Flows.types.d.ts.map +1 -1
  30. package/dist/types/Flows.types.js +13 -1
  31. package/dist/types/Flows.types.js.map +1 -1
  32. package/dist/utils/extractStateMemoryResponse.d.ts +5 -0
  33. package/dist/utils/extractStateMemoryResponse.d.ts.map +1 -0
  34. package/dist/utils/extractStateMemoryResponse.js +91 -0
  35. package/dist/utils/extractStateMemoryResponse.js.map +1 -0
  36. package/package.json +2 -1
  37. package/src/index.ts +3 -0
  38. package/src/nodes/addRpaNode.ts +205 -0
  39. package/src/nodes/nodeFactory.ts +4 -0
  40. package/src/platform/config.ts +1 -1
  41. package/src/types/Flows.types.ts +37 -1
@@ -0,0 +1,419 @@
1
+ # Browser Agent with CAPTCHA Bypass
2
+
3
+ This implementation replaces the CLI-based browser-use approach with the Python SDK, providing enhanced capabilities including automatic CAPTCHA detection and bypass functionality.
4
+
5
+ ## 🚀 Features
6
+
7
+ - **Python SDK Integration**: Uses browser-use Python SDK instead of CLI for better control
8
+ - **CAPTCHA Bypass**: Automatic detection and solving of text-based CAPTCHAs
9
+ - **Lifecycle Hooks**: Leverages browser-use lifecycle hooks for seamless operation
10
+ - **Advanced OCR**: Uses OpenCV and Tesseract for robust CAPTCHA text extraction
11
+ - **AI-Powered Solving**: Framework ready for LLM-based CAPTCHA solving (GPT-4V)
12
+ - **Error Handling**: Comprehensive error handling and logging
13
+ - **JSON Output**: Structured output format for better integration
14
+
15
+ ## 📁 File Structure
16
+
17
+ ```
18
+ browserTask/
19
+ ├── browserAgent.py # Main Python script with CAPTCHA bypass
20
+ ├── executeBrowserTask.ts # Updated TypeScript wrapper
21
+ ├── requirements.txt # Python dependencies
22
+ ├── setup.sh # Setup script for dependencies
23
+ └── README.md # This documentation
24
+ ```
25
+
26
+ ## 🛠️ Setup
27
+
28
+ ### 1. Run the Setup Script
29
+
30
+ The easiest way to get started is to run the setup script:
31
+
32
+ ```bash
33
+ cd mindedjs/src/browserTask
34
+ chmod +x setup.sh
35
+ ./setup.sh
36
+ ```
37
+
38
+ This will:
39
+
40
+ - Check Python 3.8+ installation
41
+ - Install Python dependencies
42
+ - Install Playwright browsers
43
+ - Install Tesseract OCR
44
+ - Make scripts executable
45
+
46
+ ### 2. Manual Setup (Alternative)
47
+
48
+ If you prefer manual setup:
49
+
50
+ ```bash
51
+ # Install uv (if not already installed)
52
+ curl -LsSf https://astral.sh/uv/install.sh | sh
53
+
54
+ # Create virtual environment
55
+ uv venv --python 3.11
56
+
57
+ # Activate virtual environment
58
+ source .venv/bin/activate
59
+
60
+ # Install Python dependencies
61
+ uv pip install -r requirements.txt
62
+
63
+ # Install Playwright browsers
64
+ uv run playwright install
65
+
66
+ # Install Tesseract OCR
67
+ # macOS:
68
+ brew install tesseract
69
+
70
+ # Ubuntu/Debian:
71
+ sudo apt-get install tesseract-ocr tesseract-ocr-eng
72
+
73
+ # CentOS/RHEL:
74
+ sudo yum install tesseract tesseract-langpack-eng
75
+ ```
76
+
77
+ ### 3. Configure API Keys
78
+
79
+ Set your OpenAI API key (required for LLM functionality):
80
+
81
+ ```bash
82
+ export OPENAI_API_KEY='your-api-key-here'
83
+ ```
84
+
85
+ ## 🎯 Usage
86
+
87
+ ### From TypeScript (MindedJS)
88
+
89
+ The `executeBrowserTask` function now uses the Python implementation automatically:
90
+
91
+ ```typescript
92
+ import { executeBrowserTask } from './browserTask/executeBrowserTask';
93
+
94
+ const result = await executeBrowserTask('Navigate to example.com and fill out the contact form');
95
+ console.log(result);
96
+ ```
97
+
98
+ ### Direct Python Usage
99
+
100
+ You can also run the Python script directly:
101
+
102
+ ```bash
103
+ # Activate virtual environment first
104
+ source .venv/bin/activate
105
+
106
+ # Basic usage
107
+ python browserAgent.py -p "Navigate to google.com and search for AI"
108
+ # Or with uv: uv run python browserAgent.py -p "Navigate to google.com and search for AI"
109
+
110
+ # Form with CAPTCHA - the agent will automatically handle the CAPTCHA
111
+ python browserAgent.py -p "Go to contact-form.com, fill out name as 'John Doe', email as 'john@example.com', message as 'Hello World', and submit the form"
112
+
113
+ # Registration with CAPTCHA
114
+ python browserAgent.py -p "Register on website.com with username 'testuser', email 'test@example.com', password 'securepass123', and complete the registration"
115
+
116
+ # With custom max steps
117
+ python browserAgent.py -p "Complete a multi-step process" --max-steps 50
118
+ ```
119
+
120
+ **Note**: In the above examples, if the forms have CAPTCHAs, they will be automatically detected, solved, and filled. The agent will then continue with filling the other fields and submitting as instructed.
121
+
122
+ ## 🔒 CAPTCHA Bypass Capabilities
123
+
124
+ ### Integration with Browser-Use
125
+
126
+ The CAPTCHA bypass is designed to work seamlessly with browser-use agents. When a CAPTCHA is detected:
127
+
128
+ 1. **CAPTCHA Detection**: Automatic detection on each page load
129
+ 2. **Silent Solving**: CAPTCHA is solved and filled in the background
130
+ 3. **Task Continuation**: Browser-use continues with your original prompt
131
+ 4. **No Interruption**: The user's task flow remains uninterrupted
132
+
133
+ **Perfect for forms like:**
134
+
135
+ - Registration forms with name + email + CAPTCHA
136
+ - Contact forms with message + CAPTCHA
137
+ - Login forms with credentials + CAPTCHA
138
+ - Multi-step forms with CAPTCHA verification
139
+
140
+ ### Supported CAPTCHA Types
141
+
142
+ 1. **Text CAPTCHAs**: Alphanumeric text challenges
143
+ 2. **Simple Math**: Basic arithmetic problems
144
+ 3. **Image-based**: Framework ready for image selection CAPTCHAs
145
+
146
+ ### How It Works
147
+
148
+ 1. **Automatic Detection**: Lifecycle hooks monitor each page for CAPTCHA indicators
149
+ 2. **Image Processing**: OpenCV extracts and preprocesses CAPTCHA images
150
+ 3. **OCR Recognition**: Tesseract performs text recognition
151
+ 4. **Smart Input**: Automatically finds and fills CAPTCHA input fields
152
+ 5. **Continue Task**: Lets browser-use continue with original instructions (other fields, submission, etc.)
153
+
154
+ ### CAPTCHA Detection Indicators
155
+
156
+ The system looks for these common CAPTCHA indicators:
157
+
158
+ - `captcha`
159
+ - `recaptcha`
160
+ - `hcaptcha`
161
+ - `verification`
162
+ - `security check`
163
+ - `prove you are human`
164
+
165
+ ## 🔧 Configuration
166
+
167
+ ### Environment Variables
168
+
169
+ ```bash
170
+ # Required
171
+ OPENAI_API_KEY=your-openai-api-key
172
+
173
+ # Optional
174
+ PYTHONPATH=/path/to/browserTask # Set automatically by TypeScript wrapper
175
+ ```
176
+
177
+ ### Python Script Arguments
178
+
179
+ ```bash
180
+ python3 browserAgent.py --help
181
+ ```
182
+
183
+ - `-p, --prompt`: Task prompt (required)
184
+ - `--max-steps`: Maximum number of steps (default: 30)
185
+ - `--output-format`: Output format (text|json, default: text)
186
+
187
+ ## 🏗️ Architecture
188
+
189
+ ### Lifecycle Hooks Integration
190
+
191
+ The implementation uses browser-use lifecycle hooks for seamless CAPTCHA handling:
192
+
193
+ ```python
194
+ async def captcha_detection_hook(agent):
195
+ """Runs before each step to check for CAPTCHAs"""
196
+ page = await agent.browser_session.get_current_page()
197
+
198
+ # 1. Check for CAPTCHA indicators on the page
199
+ # 2. If found, solve and fill the CAPTCHA field
200
+ # 3. Let browser-use continue with the original task
201
+ # 4. No interruption to the user's workflow
202
+ ```
203
+
204
+ **Flow Example:**
205
+
206
+ ```
207
+ User Task: "Fill out the contact form with my details and submit"
208
+
209
+ Step 1: Navigate to contact page
210
+ Step 2: [Hook detects CAPTCHA] → Solve CAPTCHA → Fill CAPTCHA field
211
+ Step 3: Fill name field (as per user instruction)
212
+ Step 4: Fill email field (as per user instruction)
213
+ Step 5: Fill message field (as per user instruction)
214
+ Step 6: Click submit button (as per user instruction)
215
+ ```
216
+
217
+ ### CAPTCHA Bypass Flow
218
+
219
+ ```mermaid
220
+ graph TD
221
+ A[Page Load] --> B[Lifecycle Hook Triggered]
222
+ B --> C{CAPTCHA Detected?}
223
+ C -->|No| D[Continue Normal Flow]
224
+ C -->|Yes| E[Take Screenshot]
225
+ E --> F[Extract CAPTCHA Region]
226
+ F --> G[OCR Text Recognition]
227
+ G --> H{Text Extracted?}
228
+ H -->|Yes| I[Fill CAPTCHA Field]
229
+ H -->|No| J[Try AI Solving]
230
+ J --> I
231
+ I --> K[Browser-use Continues Task]
232
+ ```
233
+
234
+ ### Tool Architecture
235
+
236
+ ```python
237
+ class CaptchaBypass(Tool):
238
+ """Advanced CAPTCHA detection and solving tool"""
239
+
240
+ async def use(self, page, input_text="") -> ActionResult:
241
+ # Main entry point for CAPTCHA solving
242
+
243
+ async def _detect_and_solve_captcha(self, cv_img, pil_img):
244
+ # OpenCV + OCR processing
245
+
246
+ async def _ai_solve_captcha(self, screenshot):
247
+ # AI-powered solving (extensible)
248
+
249
+ async def _fill_captcha_solution(self, page, solution):
250
+ # Find inputs and fill solution (no submission)
251
+ ```
252
+
253
+ ## 🧪 Testing
254
+
255
+ ### Test Basic Functionality
256
+
257
+ ```bash
258
+ # Activate virtual environment first
259
+ source .venv/bin/activate
260
+
261
+ # Test simple navigation
262
+ python browserAgent.py -p "Go to google.com"
263
+
264
+ # Test with potential CAPTCHA site
265
+ python browserAgent.py -p "Navigate to a form and fill it out"
266
+ ```
267
+
268
+ ### Test CAPTCHA Bypass
269
+
270
+ You can test the CAPTCHA bypass by:
271
+
272
+ 1. Finding a site with text CAPTCHAs
273
+ 2. Running the agent on that site
274
+ 3. Monitoring logs for CAPTCHA detection and solving
275
+
276
+ ### Debug Mode
277
+
278
+ Enable detailed logging by modifying the Python script:
279
+
280
+ ```python
281
+ logging.basicConfig(level=logging.DEBUG) # Change from INFO to DEBUG
282
+ ```
283
+
284
+ ## 🔍 Troubleshooting
285
+
286
+ ### Common Issues
287
+
288
+ 1. **Python Dependencies**
289
+
290
+ ```bash
291
+ # With virtual environment activated
292
+ uv pip install --upgrade browser-use opencv-python pytesseract pillow
293
+ # Or: pip install --upgrade browser-use opencv-python pytesseract pillow
294
+ ```
295
+
296
+ 2. **Tesseract Not Found**
297
+
298
+ ```bash
299
+ # Verify installation
300
+ tesseract --version
301
+
302
+ # Add to PATH if needed
303
+ export PATH="/usr/local/bin:$PATH"
304
+ ```
305
+
306
+ 3. **Playwright Browser Issues**
307
+
308
+ ```bash
309
+ uv run playwright install --with-deps
310
+ # Or with activated venv: playwright install --with-deps
311
+ ```
312
+
313
+ 4. **Permission Issues**
314
+ ```bash
315
+ chmod +x browserAgent.py
316
+ chmod +x setup.sh
317
+ ```
318
+
319
+ ### Debugging CAPTCHA Issues
320
+
321
+ 1. Check `captcha_isolated.png` for extracted CAPTCHA images
322
+ 2. Review logs for OCR results
323
+ 3. Verify Tesseract language packs: `tesseract --list-langs`
324
+
325
+ ## 🚀 Extending the Implementation
326
+
327
+ ### Adding New CAPTCHA Types
328
+
329
+ ```python
330
+ # Extend the CaptchaBypass class
331
+ class AdvancedCaptchaBypass(CaptchaBypass):
332
+ async def _solve_image_captcha(self, page):
333
+ # Implement image selection logic
334
+ pass
335
+
336
+ async def _solve_audio_captcha(self, page):
337
+ # Implement audio CAPTCHA solving
338
+ pass
339
+ ```
340
+
341
+ ### Custom Lifecycle Hooks
342
+
343
+ ```python
344
+ async def custom_hook(agent):
345
+ """Add custom logic to lifecycle hooks"""
346
+ page = await agent.browser_session.get_current_page()
347
+ # Your custom logic here
348
+
349
+ # Use in agent
350
+ agent.run(on_step_start=custom_hook)
351
+ ```
352
+
353
+ ### AI-Powered CAPTCHA Solving
354
+
355
+ The framework is ready for AI-powered CAPTCHA solving. Implement the `_ai_solve_captcha` method:
356
+
357
+ ```python
358
+ async def _ai_solve_captcha(self, screenshot: bytes) -> Dict[str, Any]:
359
+ # Convert screenshot to base64
360
+ screenshot_b64 = base64.b64encode(screenshot).decode('utf-8')
361
+
362
+ # Send to GPT-4V or similar vision model
363
+ response = await self.llm.ainvoke([
364
+ {"type": "text", "text": "Solve this CAPTCHA:"},
365
+ {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{screenshot_b64}"}}
366
+ ])
367
+
368
+ return {"success": True, "solution": response.content}
369
+ ```
370
+
371
+ ## 📝 Migration from CLI
372
+
373
+ The new implementation maintains the same interface as the original CLI version:
374
+
375
+ ### Before (CLI)
376
+
377
+ ```typescript
378
+ spawn('browser-use', ['-p', prompt]);
379
+ ```
380
+
381
+ ### After (Python SDK)
382
+
383
+ ```typescript
384
+ spawn('python3', [pythonScriptPath, '-p', prompt, '--output-format', 'json']);
385
+ ```
386
+
387
+ ### Benefits of Migration
388
+
389
+ 1. **Better Control**: Direct access to browser-use APIs
390
+ 2. **CAPTCHA Bypass**: Automatic CAPTCHA handling
391
+ 3. **Lifecycle Hooks**: Fine-grained control over browser behavior
392
+ 4. **Error Handling**: Structured error reporting
393
+ 5. **Extensibility**: Easy to add new features
394
+ 6. **Performance**: Reduced overhead compared to CLI spawning
395
+
396
+ ## 📄 License
397
+
398
+ This implementation follows the same license as the parent MindedJS project.
399
+
400
+ ## 🤝 Contributing
401
+
402
+ When contributing to the browser agent:
403
+
404
+ 1. Test with multiple CAPTCHA types
405
+ 2. Ensure proper error handling
406
+ 3. Add logging for debugging
407
+ 4. Update documentation
408
+ 5. Follow the existing code style
409
+
410
+ ## 🆘 Support
411
+
412
+ For issues and questions:
413
+
414
+ 1. Check the troubleshooting section
415
+ 2. Review logs for error details
416
+ 3. Test with the simplest possible case
417
+ 4. Verify all dependencies are installed
418
+
419
+ The browser agent with CAPTCHA bypass provides a robust foundation for automated web interactions while handling common anti-bot measures.