npm - @minded-ai/mindedjs - Versions diffs - 2.0.6 → 2.0.7-beta-2 - Mend

@minded-ai/mindedjs 2.0.6 → 2.0.7-beta-2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (41) hide show

package/dist/browserTask/README.md +419 -0
package/dist/browserTask/browserAgent.py +632 -0
package/dist/browserTask/captcha_isolated.png +0 -0
package/dist/browserTask/executeBrowserTask.ts +79 -0
package/dist/browserTask/requirements.txt +8 -0
package/dist/browserTask/setup.sh +144 -0
package/dist/cli/index.js +0 -0
package/dist/index.d.ts +1 -1
package/dist/index.d.ts.map +1 -1
package/dist/index.js.map +1 -1
package/dist/internalTools/retell.d.ts +12 -0
package/dist/internalTools/retell.d.ts.map +1 -0
package/dist/internalTools/retell.js +54 -0
package/dist/internalTools/retell.js.map +1 -0
package/dist/internalTools/sendPlaceholderMessage.d.ts +14 -0
package/dist/internalTools/sendPlaceholderMessage.d.ts.map +1 -0
package/dist/internalTools/sendPlaceholderMessage.js +61 -0
package/dist/internalTools/sendPlaceholderMessage.js.map +1 -0
package/dist/nodes/addRpaNode.d.ts +16 -0
package/dist/nodes/addRpaNode.d.ts.map +1 -0
package/dist/nodes/addRpaNode.js +177 -0
package/dist/nodes/addRpaNode.js.map +1 -0
package/dist/nodes/nodeFactory.d.ts.map +1 -1
package/dist/nodes/nodeFactory.js +4 -0
package/dist/nodes/nodeFactory.js.map +1 -1
package/dist/platform/config.js +1 -1
package/dist/platform/config.js.map +1 -1
package/dist/types/Flows.types.d.ts +35 -2
package/dist/types/Flows.types.d.ts.map +1 -1
package/dist/types/Flows.types.js +13 -1
package/dist/types/Flows.types.js.map +1 -1
package/dist/utils/extractStateMemoryResponse.d.ts +5 -0
package/dist/utils/extractStateMemoryResponse.d.ts.map +1 -0
package/dist/utils/extractStateMemoryResponse.js +91 -0
package/dist/utils/extractStateMemoryResponse.js.map +1 -0
package/package.json +2 -1
package/src/index.ts +3 -0
package/src/nodes/addRpaNode.ts +205 -0
package/src/nodes/nodeFactory.ts +4 -0
package/src/platform/config.ts +1 -1
package/src/types/Flows.types.ts +37 -1

package/dist/browserTask/README.md ADDED Viewed

@@ -0,0 +1,419 @@
+# Browser Agent with CAPTCHA Bypass
+This implementation replaces the CLI-based browser-use approach with the Python SDK, providing enhanced capabilities including automatic CAPTCHA detection and bypass functionality.
+## 🚀 Features
+- **Python SDK Integration**: Uses browser-use Python SDK instead of CLI for better control
+- **CAPTCHA Bypass**: Automatic detection and solving of text-based CAPTCHAs
+- **Lifecycle Hooks**: Leverages browser-use lifecycle hooks for seamless operation
+- **Advanced OCR**: Uses OpenCV and Tesseract for robust CAPTCHA text extraction
+- **AI-Powered Solving**: Framework ready for LLM-based CAPTCHA solving (GPT-4V)
+- **Error Handling**: Comprehensive error handling and logging
+- **JSON Output**: Structured output format for better integration
+## 📁 File Structure
+```
+browserTask/
+├── browserAgent.py          # Main Python script with CAPTCHA bypass
+├── executeBrowserTask.ts    # Updated TypeScript wrapper
+├── requirements.txt         # Python dependencies
+├── setup.sh                # Setup script for dependencies
+└── README.md               # This documentation
+```
+## 🛠️ Setup
+### 1. Run the Setup Script
+The easiest way to get started is to run the setup script:
+```bash
+cd mindedjs/src/browserTask
+chmod +x setup.sh
+./setup.sh
+```
+This will:
+- Check Python 3.8+ installation
+- Install Python dependencies
+- Install Playwright browsers
+- Install Tesseract OCR
+- Make scripts executable
+### 2. Manual Setup (Alternative)
+If you prefer manual setup:
+```bash
+# Install uv (if not already installed)
+curl -LsSf https://astral.sh/uv/install.sh | sh
+# Create virtual environment
+uv venv --python 3.11
+# Activate virtual environment
+source .venv/bin/activate
+# Install Python dependencies
+uv pip install -r requirements.txt
+# Install Playwright browsers
+uv run playwright install
+# Install Tesseract OCR
+# macOS:
+brew install tesseract
+# Ubuntu/Debian:
+sudo apt-get install tesseract-ocr tesseract-ocr-eng
+# CentOS/RHEL:
+sudo yum install tesseract tesseract-langpack-eng
+```
+### 3. Configure API Keys
+Set your OpenAI API key (required for LLM functionality):
+```bash
+export OPENAI_API_KEY='your-api-key-here'
+```
+## 🎯 Usage
+### From TypeScript (MindedJS)
+The `executeBrowserTask` function now uses the Python implementation automatically:
+```typescript
+import { executeBrowserTask } from './browserTask/executeBrowserTask';
+const result = await executeBrowserTask('Navigate to example.com and fill out the contact form');
+console.log(result);
+```
+### Direct Python Usage
+You can also run the Python script directly:
+```bash
+# Activate virtual environment first
+source .venv/bin/activate
+# Basic usage
+python browserAgent.py -p "Navigate to google.com and search for AI"
+# Or with uv: uv run python browserAgent.py -p "Navigate to google.com and search for AI"
+# Form with CAPTCHA - the agent will automatically handle the CAPTCHA
+python browserAgent.py -p "Go to contact-form.com, fill out name as 'John Doe', email as 'john@example.com', message as 'Hello World', and submit the form"
+# Registration with CAPTCHA
+python browserAgent.py -p "Register on website.com with username 'testuser', email 'test@example.com', password 'securepass123', and complete the registration"
+# With custom max steps
+python browserAgent.py -p "Complete a multi-step process" --max-steps 50
+```
+**Note**: In the above examples, if the forms have CAPTCHAs, they will be automatically detected, solved, and filled. The agent will then continue with filling the other fields and submitting as instructed.
+## 🔒 CAPTCHA Bypass Capabilities
+### Integration with Browser-Use
+The CAPTCHA bypass is designed to work seamlessly with browser-use agents. When a CAPTCHA is detected:
+1. **CAPTCHA Detection**: Automatic detection on each page load
+2. **Silent Solving**: CAPTCHA is solved and filled in the background
+3. **Task Continuation**: Browser-use continues with your original prompt
+4. **No Interruption**: The user's task flow remains uninterrupted
+**Perfect for forms like:**
+- Registration forms with name + email + CAPTCHA
+- Contact forms with message + CAPTCHA
+- Login forms with credentials + CAPTCHA
+- Multi-step forms with CAPTCHA verification
+### Supported CAPTCHA Types
+1. **Text CAPTCHAs**: Alphanumeric text challenges
+2. **Simple Math**: Basic arithmetic problems
+3. **Image-based**: Framework ready for image selection CAPTCHAs
+### How It Works
+1. **Automatic Detection**: Lifecycle hooks monitor each page for CAPTCHA indicators
+2. **Image Processing**: OpenCV extracts and preprocesses CAPTCHA images
+3. **OCR Recognition**: Tesseract performs text recognition
+4. **Smart Input**: Automatically finds and fills CAPTCHA input fields
+5. **Continue Task**: Lets browser-use continue with original instructions (other fields, submission, etc.)
+### CAPTCHA Detection Indicators
+The system looks for these common CAPTCHA indicators:
+- `captcha`
+- `recaptcha`
+- `hcaptcha`
+- `verification`
+- `security check`
+- `prove you are human`
+## 🔧 Configuration
+### Environment Variables
+```bash
+# Required
+OPENAI_API_KEY=your-openai-api-key
+# Optional
+PYTHONPATH=/path/to/browserTask  # Set automatically by TypeScript wrapper
+```
+### Python Script Arguments
+```bash
+python3 browserAgent.py --help
+```
+- `-p, --prompt`: Task prompt (required)
+- `--max-steps`: Maximum number of steps (default: 30)
+- `--output-format`: Output format (text|json, default: text)
+## 🏗️ Architecture
+### Lifecycle Hooks Integration
+The implementation uses browser-use lifecycle hooks for seamless CAPTCHA handling:
+```python
+async def captcha_detection_hook(agent):
+    """Runs before each step to check for CAPTCHAs"""
+    page = await agent.browser_session.get_current_page()
+    # 1. Check for CAPTCHA indicators on the page
+    # 2. If found, solve and fill the CAPTCHA field
+    # 3. Let browser-use continue with the original task
+    # 4. No interruption to the user's workflow
+```
+**Flow Example:**
+```
+User Task: "Fill out the contact form with my details and submit"
+Step 1: Navigate to contact page
+Step 2: [Hook detects CAPTCHA] → Solve CAPTCHA → Fill CAPTCHA field
+Step 3: Fill name field (as per user instruction)
+Step 4: Fill email field (as per user instruction)
+Step 5: Fill message field (as per user instruction)
+Step 6: Click submit button (as per user instruction)
+```
+### CAPTCHA Bypass Flow
+```mermaid
+graph TD
+    A[Page Load] --> B[Lifecycle Hook Triggered]
+    B --> C{CAPTCHA Detected?}
+    C -->|No| D[Continue Normal Flow]
+    C -->|Yes| E[Take Screenshot]
+    E --> F[Extract CAPTCHA Region]
+    F --> G[OCR Text Recognition]
+    G --> H{Text Extracted?}
+    H -->|Yes| I[Fill CAPTCHA Field]
+    H -->|No| J[Try AI Solving]
+    J --> I
+    I --> K[Browser-use Continues Task]
+```
+### Tool Architecture
+```python
+class CaptchaBypass(Tool):
+    """Advanced CAPTCHA detection and solving tool"""
+    async def use(self, page, input_text="") -> ActionResult:
+        # Main entry point for CAPTCHA solving
+    async def _detect_and_solve_captcha(self, cv_img, pil_img):
+        # OpenCV + OCR processing
+    async def _ai_solve_captcha(self, screenshot):
+        # AI-powered solving (extensible)
+    async def _fill_captcha_solution(self, page, solution):
+        # Find inputs and fill solution (no submission)
+```
+## 🧪 Testing
+### Test Basic Functionality
+```bash
+# Activate virtual environment first
+source .venv/bin/activate
+# Test simple navigation
+python browserAgent.py -p "Go to google.com"
+# Test with potential CAPTCHA site
+python browserAgent.py -p "Navigate to a form and fill it out"
+```
+### Test CAPTCHA Bypass
+You can test the CAPTCHA bypass by:
+1. Finding a site with text CAPTCHAs
+2. Running the agent on that site
+3. Monitoring logs for CAPTCHA detection and solving
+### Debug Mode
+Enable detailed logging by modifying the Python script:
+```python
+logging.basicConfig(level=logging.DEBUG)  # Change from INFO to DEBUG
+```
+## 🔍 Troubleshooting
+### Common Issues
+1. **Python Dependencies**
+   ```bash
+   # With virtual environment activated
+   uv pip install --upgrade browser-use opencv-python pytesseract pillow
+   # Or: pip install --upgrade browser-use opencv-python pytesseract pillow
+   ```
+2. **Tesseract Not Found**
+   ```bash
+   # Verify installation
+   tesseract --version
+   # Add to PATH if needed
+   export PATH="/usr/local/bin:$PATH"
+   ```
+3. **Playwright Browser Issues**
+   ```bash
+   uv run playwright install --with-deps
+   # Or with activated venv: playwright install --with-deps
+   ```
+4. **Permission Issues**
+   ```bash
+   chmod +x browserAgent.py
+   chmod +x setup.sh
+   ```
+### Debugging CAPTCHA Issues
+1. Check `captcha_isolated.png` for extracted CAPTCHA images
+2. Review logs for OCR results
+3. Verify Tesseract language packs: `tesseract --list-langs`
+## 🚀 Extending the Implementation
+### Adding New CAPTCHA Types
+```python
+# Extend the CaptchaBypass class
+class AdvancedCaptchaBypass(CaptchaBypass):
+    async def _solve_image_captcha(self, page):
+        # Implement image selection logic
+        pass
+    async def _solve_audio_captcha(self, page):
+        # Implement audio CAPTCHA solving
+        pass
+```
+### Custom Lifecycle Hooks
+```python
+async def custom_hook(agent):
+    """Add custom logic to lifecycle hooks"""
+    page = await agent.browser_session.get_current_page()
+    # Your custom logic here
+# Use in agent
+agent.run(on_step_start=custom_hook)
+```
+### AI-Powered CAPTCHA Solving
+The framework is ready for AI-powered CAPTCHA solving. Implement the `_ai_solve_captcha` method:
+```python
+async def _ai_solve_captcha(self, screenshot: bytes) -> Dict[str, Any]:
+    # Convert screenshot to base64
+    screenshot_b64 = base64.b64encode(screenshot).decode('utf-8')
+    # Send to GPT-4V or similar vision model
+    response = await self.llm.ainvoke([
+        {"type": "text", "text": "Solve this CAPTCHA:"},
+        {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{screenshot_b64}"}}
+    ])
+    return {"success": True, "solution": response.content}
+```
+## 📝 Migration from CLI
+The new implementation maintains the same interface as the original CLI version:
+### Before (CLI)
+```typescript
+spawn('browser-use', ['-p', prompt]);
+```
+### After (Python SDK)
+```typescript
+spawn('python3', [pythonScriptPath, '-p', prompt, '--output-format', 'json']);
+```
+### Benefits of Migration
+1. **Better Control**: Direct access to browser-use APIs
+2. **CAPTCHA Bypass**: Automatic CAPTCHA handling
+3. **Lifecycle Hooks**: Fine-grained control over browser behavior
+4. **Error Handling**: Structured error reporting
+5. **Extensibility**: Easy to add new features
+6. **Performance**: Reduced overhead compared to CLI spawning
+## 📄 License
+This implementation follows the same license as the parent MindedJS project.
+## 🤝 Contributing
+When contributing to the browser agent:
+1. Test with multiple CAPTCHA types
+2. Ensure proper error handling
+3. Add logging for debugging
+4. Update documentation
+5. Follow the existing code style
+## 🆘 Support
+For issues and questions:
+1. Check the troubleshooting section
+2. Review logs for error details
+3. Test with the simplest possible case
+4. Verify all dependencies are installed
+The browser agent with CAPTCHA bypass provides a robust foundation for automated web interactions while handling common anti-bot measures.