npm - browsemind - Versions diffs - 0.5.0 - Mend

browsemind 0.5.0

Files changed (4)

package/LICENSE ADDED

@@ -0,0 +1,30 @@
+GNU AFFERO GENERAL PUBLIC LICENSE
+Version 3, 19 November 2007
+Copyright (C) 2026 browsemind Contributors
+This program is free software: you can redistribute it and/or modify
+it under the terms of the GNU Affero General Public License as published
+by the Free Software Foundation, either version 3 of the License, or
+(at your option) any later version.
+This program is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU Affero General Public License for more details.
+You should have received a copy of the GNU Affero General Public License
+along with this program.  If not, see <https://www.gnu.org/licenses/>.
+Full license text: https://www.gnu.org/licenses/agpl-3.0.txt
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

package/README.md ADDED

@@ -0,0 +1,896 @@
+# BrowseMind 🤖 — LLM Browser Automation Agent
+> **Navigate, search, extract, and automate any website using natural language.**
+> Powered by Crawl4AI and LiteLLM — the agent decides everything, you just describe what you need.
+<div align="center">
+[![Python 3.11+](https://img.shields.io/badge/Python-3.11%2B-blue.svg)](https://www.python.org/downloads/)
+[![Version](https://img.shields.io/badge/version-0.5.0-green.svg)](https://github.com/prokopis3/browsemind/releases)
+[![License: AGPL v3](https://img.shields.io/badge/License-AGPLv3-blue.svg)](LICENSE)
+[![Crawl4AI](https://img.shields.io/badge/Crawl4AI-0.9.0%2B-orange)](https://github.com/unclecode/crawl4ai)
+[![SemVer](https://img.shields.io/badge/SemVer-0.5.0-ff69b4.svg)](https://semver.org/spec/v2.0.0.html)
+[![Conventional Commits](https://img.shields.io/badge/Conventional%20Commits-1.0.0-yellow.svg)](https://www.conventionalcommits.org/)
+[![Changelog](https://img.shields.io/badge/Changelog-Keep%20a%20Changelog-brightgreen.svg)](https://keepachangelog.com/en/1.1.0/)
+[![CI](https://github.com/prokopis3/browsemind/workflows/CI/badge.svg)](https://github.com/prokopis3/browsemind/actions)
+**[Installation](#installation)** • **[Quickstart](#rocket-quickstart)** • **[CLI Reference](#computer-cli-reference)** • **[Features](#sparkles-features)** • **[Architecture](#architecture)** • **[Configuration](#wrench-configuration)** • **[API](run_api.py)** • **[Roadmap](ROADMAP.md)**
+</div>
+---
+## 📋 Table of Contents
+- [Installation](#installation)
+- [Quickstart](#rocket-quickstart)
+- [CLI Reference](#computer-cli-reference)
+- [Features](#sparkles-features)
+- [Architecture](#architecture)
+- [Python API](#snake-python-api)
+- [Configuration](#wrench-configuration)
+- [Testing](#microscope-testing)
+- [Project Structure](#open_file_folder-project-structure)
+- [Contributing](#handshake-contributing)
+- [License](#page_with_curl-license)
+- [Disclaimer](#disclaimer)
+- [Citations](#-citations-1)
+---
+## Installation
+### Requirements
+- **Python 3.11+**
+- **Chrome/Chromium** — installed automatically by Crawl4AI on first run, or manually via [Chrome for Testing](https://developer.chrome.com/blog/chrome-for-testing/)
+- **Playwright** — installed via `playwright install chromium` (auto-setup with Crawl4AI)
+<details>
+  <summary><strong>Package Managers</strong></summary>
+### pip (recommended)
+```bash
+pip install browsemind
+browsemind doctor                     # Verify installation
+browsemind --task "Open example.com and screenshot it"
+```
+### uv
+```bash
+uv tool install browsemind
+browsemind doctor
+browsemind --task "Open example.com and screenshot it"
+```
+### npm
+```bash
+npm install -g browsemind
+browsemind doctor
+browsemind --task "Open example.com and screenshot it"
+```
+### Homebrew (macOS)
+```bash
+brew install browsemind/browsemind/browsemind
+browsemind doctor
+```
+### Chocolatey (Windows)
+```powershell
+choco install browsemind
+browsemind doctor
+```
+### Scoop (Windows)
+```powershell
+scoop bucket add browsemind https://github.com/prokopis3/scoop-browsemind
+scoop install browsemind
+browsemind doctor
+```
+</details>
+<details>
+  <summary><strong>Development & Quick Start</strong></summary>
+### From Source (development)
+```bash
+git clone https://github.com/prokopis3/browsemind.git
+cd browsemind
+# Create virtual environment
+python -m venv .venv
+source .venv/bin/activate      # Linux/macOS
+.venv\Scripts\activate         # Windows
+# Install
+uv pip install -e .
+# Install Playwright browsers
+playwright install chromium
+# Verify
+browsemind doctor
+```
+### Quick Start (no install)
+```bash
+npx browsemind doctor
+npx browsemind --task "Open example.com and screenshot it"
+```
+</details>
+<details>
+  <summary><strong>Updating & Diagnostics</strong></summary>
+### Updating
+Upgrade to the latest version:
+```bash
+browsemind upgrade
+```
+Auto-detects your installation method (pip, uv, npm, Homebrew, Chocolatey, Scoop) and runs the appropriate update command. Displays the version change on success.
+To check for updates without upgrading:
+```bash
+browsemind upgrade --check
+```
+### Doctor — Diagnose Your Installation
+Run `doctor` whenever something stops working unexpectedly or after upgrades:
+```bash
+browsemind doctor                     # Full diagnosis
+browsemind doctor --offline --quick   # Local-only, fastest
+browsemind doctor --fix               # Also run destructive repairs
+browsemind doctor --json              # Structured output for agents
+```
+Checks: environment, Chrome install, config files, security, network reachability, and a live headless launch test. Exit code `0` if all pass, `1` if any fail.
+</details>
+---
+## 🚀 Quickstart
+### 1. Set Your API Key
+```bash
+# Copy the example env file
+cp llm.env.example llm.env
+# Edit llm.env with at least one API key:
+#   DEEPSEEK_API_KEY=sk-your-deepseek-key
+#   OPENAI_API_KEY=sk-your-openai-key
+#   GROQ_API_KEY=gsk-your-groq-key
+```
+### 2. Run Your First Task
+```bash
+# Browse and screenshot
+browsemind --task "Navigate to example.com and take a screenshot"
+# Extract data
+browsemind --task "Go to quotes.toscrape.com and extract all quotes" --headless
+# Search + extract
+browsemind --task "Search for Python async programming and return the top 3 results"
+# Deep crawl a catalog
+browsemind --task "Extract all product titles and prices" --url https://books.toscrape.com --discover
+# Fast content extraction (Queen Reader — no LLM, no browser wait)
+browsemind --task "Read the text from example.com"
+```
+### 3. Try Command Mode
+```bash
+browsemind open https://example.com
+browsemind snapshot
+browsemind screenshot
+browsemind close
+```
+---
+## 💻 CLI Reference
+<details>
+  <summary><strong>Task Mode (AI Agent)</strong></summary>
+```bash
+browsemind --task "Extract all product names and prices" --url https://books.toscrape.com
+browsemind --task "Find the pricing page and return the prices" --url https://example.com
+browsemind --task "Extract all articles from the blog" --save json --output-dir ./output
+browsemind --task "Access the dashboard and download the report" --url https://example.com/login --login
+browsemind --task "Capture the full page" --url https://example.com --vision-tile
+```
+**Options:**
+| Flag | Description |
+|------|-------------|
+| `--task`, `-t` | Natural language task description |
+| `--url` | Starting URL |
+| `--headless` | Run browser in headless mode |
+| `--max-steps` | Maximum agent steps (default: 30) |
+| `--save json/csv/both` | Save extracted data to file |
+| `--screenshot` | Save screenshot after extraction |
+| `--pdf` | Generate PDF of the page |
+| `--markdown` | Generate markdown summary |
+| `--discover` | Deep crawl domain discovery |
+| `--discover-max-pages` | Max pages for discovery (default: 50) |
+| `--login` | Force interactive login mode |
+| `--json` | JSON output format |
+| `--verbose`, `-v` | Verbose output |
+</details>
+<details>
+  <summary><strong>Command Mode — Core & Interactions</strong></summary>
+#### Core
+```bash
+browsemind open <url>              # Navigate to URL
+browsemind fetch <url>             # Fetch webpage content
+browsemind snapshot                # Get page state (element refs, text, URL)
+browsemind screenshot [path]       # Take a screenshot
+browsemind close                   # Close browser
+```
+#### Interactions — use element refs from `snapshot`
+```bash
+browsemind click @e3               # Click element by ref
+browsemind fill @e2 "hello world"  # Clear and fill input
+browsemind type @e2 "text"         # Type into element
+browsemind press Enter             # Press keyboard key
+browsemind scroll down             # Scroll page
+browsemind highlight .product-card # Highlight element
+```
+</details>
+<details>
+  <summary><strong>Command Mode — Page Info, Tabs & Debug</strong></summary>
+#### Page Information
+```bash
+browsemind get title               # Page title
+browsemind get url                 # Current URL
+browsemind get text h1             # Element text
+browsemind get html .content       # Element HTML
+browsemind get count .product      # Element count
+browsemind is visible .loading     # Check visibility
+browsemind cookies list            # List cookies
+browsemind storage get key         # localStorage value
+```
+#### Tabs & Navigation
+```bash
+browsemind tab list                # List open tabs
+browsemind tab new <url>           # Open new tab
+browsemind tab switch 2            # Switch to tab
+browsemind back                    # Go back
+browsemind forward                 # Go forward
+browsemind reload                  # Reload page
+```
+#### CDP & Debug
+```bash
+browsemind cdp                     # Get CDP WebSocket URL
+browsemind console list            # View JS console messages
+browsemind errors list             # View page errors
+browsemind inspect                 # Open Chrome DevTools
+browsemind pdf page.pdf            # Save page as PDF
+browsemind trace start             # Start performance tracing
+browsemind wait 5                  # Wait N seconds
+```
+</details>
+<details>
+  <summary><strong>Command Mode — Maintenance, Plugins & Fetch</strong></summary>
+#### Setup & Maintenance
+```bash
+browsemind upgrade                 # Upgrade to latest version
+browsemind upgrade --check         # Check version without upgrading
+browsemind doctor                  # Diagnose installation
+browsemind doctor --fix            # Diagnose + repair
+browsemind doctor --json           # Structured JSON output
+```
+#### Skills, Plugins & MCP
+```bash
+browsemind skills list             # List installed skills
+browsemind skills add <source>     # Install skill from GitHub/URL
+browsemind skills get <name>       # Show skill details
+browsemind skills remove <name>    # Uninstall skill
+browsemind plugin list             # List installed plugins
+browsemind plugin add <name>       # Install plugin
+browsemind plugin run <name>       # Execute plugin
+browsemind mcp --tools all         # Start MCP server
+```
+#### Fetch Options
+```bash
+browsemind fetch <url> --format markdown   # Default
+browsemind fetch <url> --format json       # Structured JSON
+browsemind fetch <url> --format screenshot # Screenshot
+browsemind fetch --search "query"          # Search then fetch
+```
+</details>
+---
+## ✨ Features
+<details>
+  <summary><strong>🧠 LLM Decision Engine</strong></summary>
+`AgentBrain` uses LiteLLM to observe, think, decide, and verify, prioritizing accuracy over cost. The system prompt is a compact **97-line / ~959-token decision tree**, down from 580 lines / ~10K tokens (about 6x fewer lines and roughly 10x fewer tokens). The LLM chooses from **141 ActionType enum values** covering every pattern in the [Dynamic Data Extraction Guide](docs/DYNAMIC_DATA_EXTRACTION_GUIDE.md).
+</details>
+<details>
+  <summary><strong>🌐 Intelligent Browser Automation</strong></summary>
+Full Playwright integration via Crawl4AI with anti-bot evasion, stealth mode, managed browser profiles, and persistent sessions. The **PageNavigator** provides a 6-strategy escalation chain:
+```
+DomainGraph cache → smart goto → navigate_js → proxy escalation → undetected browser → fallback fetch
+```
+Cloudflare, Akamai, CAPTCHA, and WAF detection built in.
+</details>
+<details>
+  <summary><strong>📊 Multi-Cascade Extraction (18 Pipelines)</strong></summary>
+Cost-ascending extraction cascade, with the LLM-backed stages (LLM, then Vision) as the **LAST** resort:
+```
+Identity → QueenReader → Markdown → CSS → XPath → Regex → Cosine → LLM → Vision
+```
+| Pipeline | Cost | Method |
+|----------|------|--------|
+| **Identity** (pass-through) | $0 | `NoExtractionStrategy` — raw content |
+| **Queen Reader** | $0 | DOMContentLoaded fast path — no schema, no LLM, no vision |
+| **CSS Extraction** | $0 | `JsonCssExtractionStrategy` — structured lists |
+| **XPath Extraction** | $0 | `JsonXPathExtractionStrategy` — complex DOM |
+| **Regex Extraction** | $0 | `RegexExtractionStrategy` — 23 built-in patterns |
+| **Cosine Extraction** | $0 | `CosineStrategy` — semantic clustering |
+| **LLM Extraction** | $$ | `LLMExtractionStrategy` — unstructured content |
+| **Vision Extraction** | $$$ | Screenshot analysis via LLM |
+| **Deep Crawl** | Varies | `AdaptiveCrawler` — full site crawl |
+| **Batch** | Varies | `arun_many()` — parallel multi-URL |
+| **Scroll / Load-More** | $0 | Infinite scroll / AJAX load-more |
+| **Table / PDF / Video** | $0 | Specialized pipelines |
+| **Chunked LLM** | $$ | 3 chunking strategies (regex, sliding, overlapping) |
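A minimal sketch of the cost-ascending idea above, not BrowseMind's actual `StrategyPipeline`: stages are tried cheapest first and the first non-empty result wins, so a paid stage runs only when the free ones return nothing. The `run_cascade` name and the three stand-in stage functions are hypothetical.

```python
import re
from typing import Callable, Optional, Sequence

# Hypothetical stage signature: takes page HTML, returns a list of extracted items.
Extractor = Callable[[str], list]


def run_cascade(
    html: str, stages: Sequence[tuple[str, Extractor]]
) -> tuple[Optional[str], list]:
    """Try stages in the given (cheapest-first) order; return the first non-empty result."""
    for name, extract in stages:
        try:
            items = extract(html)
        except Exception:
            # A failing stage is skipped so the cascade can continue to the next one.
            continue
        if items:
            return name, items
    return None, []


def css_stage(html: str) -> list:
    # Stand-in for a selector-based stage that finds nothing on this page.
    return []


def regex_stage(html: str) -> list:
    # Stand-in for a pattern-based stage: pull prices such as "£51.77".
    return re.findall(r"£\d+\.\d{2}", html)


def llm_stage(html: str) -> list:
    # Stand-in for a paid stage; it is never reached when a free stage succeeds.
    raise RuntimeError("LLM stage should not run in this example")


winner, items = run_cascade(
    "<p class='price'>£51.77</p>",
    [("css", css_stage), ("regex", regex_stage), ("llm", llm_stage)],
)
print(winner, items)
```

Because the loop returns on the first non-empty result, the order of `stages` is the only place where cost policy is encoded.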
+</details>
+<details>
+  <summary><strong>📖 Queen Reader — Content-First Fast Extraction</strong></summary>
+Inspired by PixelRAG's approach. Navigates with `wait_until='domcontentloaded'` (not `networkidle`) — returns text content **before** the full page loads. No schema generation, no LLM cascade, no waiting for JS/images. Falls back to full load if content < 500 chars.
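The fallback rule above can be sketched as follows. This is an illustration under assumptions, not the code in `queen_reader.py`: the `read_fast` name and the `load(url, wait_until)` callable are hypothetical, and "full load" is modeled as waiting for network idle.

```python
from typing import Callable

# Hypothetical loader signature: (url, wait_until) -> visible page text.
Loader = Callable[[str, str], str]


def read_fast(url: str, load: Loader, min_chars: int = 500) -> str:
    """Return text from the early DOMContentLoaded pass, or retry with a full load if too short."""
    text = load(url, "domcontentloaded")
    if len(text) < min_chars:
        # Assumption: "full load" is modeled here as waiting for network idle.
        text = load(url, "networkidle")
    return text


# Fake loader for illustration: the early pass yields a short shell,
# the full load yields the article body.
calls: list[str] = []


def fake_load(url: str, wait_until: str) -> str:
    calls.append(wait_until)
    return "Loading..." if wait_until == "domcontentloaded" else "x" * 800


text = read_fast("https://example.com", fake_load)
print(len(text), calls)
```

With Playwright, `load` could wrap `page.goto(url, wait_until=...)` followed by `page.inner_text("body")`; passing the loader in keeps the threshold logic testable without a browser.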
+</details>
+<details>
+  <summary><strong>🔢 DOM Serializer & Element Highlighter</strong></summary>
+Indexed interactive-element mapping in the style of browser-use. It runs JS to find all interactive DOM elements, assigns stable **1-based indices**, and builds `selector_map {index → backendNodeId}` for CDP dispatch. The Element Highlighter draws numbered blue outline boxes on each element so the LLM can reference `click(index=N)` or `input(index=N, text="...")`.
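A small sketch of the index-to-node mapping described above. The function names and the sample `backendNodeId` values are made up for illustration; the real serializer collects the IDs from the page.

```python
def build_selector_map(backend_node_ids: list[int]) -> dict[int, int]:
    """Map 1-based element indices to CDP backendNodeId values, in document order."""
    return {index: node_id for index, node_id in enumerate(backend_node_ids, start=1)}


def resolve(selector_map: dict[int, int], index: int) -> int:
    """Look up the node behind an index the LLM referenced, e.g. click(index=2)."""
    if index not in selector_map:
        raise KeyError(f"no interactive element with index {index}")
    return selector_map[index]


selector_map = build_selector_map([41, 57, 88])  # made-up backendNodeId values
print(resolve(selector_map, 2))
```

Keeping the LLM-facing handle a small integer means the prompt never has to carry selectors or node IDs.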
+</details>
+<details>
+  <summary><strong>🔐 Login & Credential Management</strong></summary>
+Persisted browser profiles, encrypted credential vault, a11y snapshot form filling (works on React/Vue/Angular), multi-strategy login pipeline with wrong-credential retry. Dedicated `LOGIN` + `RECOVER_FROM_BLOCK` action handlers.
+</details>
+<details>
+  <summary><strong>⚡ Token Budget-Aware LLM Tiering</strong></summary>
+BrowseMind automatically selects the optimal LLM tier based on your remaining **prompt token budget**:
+| Tier | Budget | Behavior | Model |
+|------|--------|----------|-------|
+| **🟢 RULE** | &lt; 500 tokens | Zero-LLM — rule-based fallback only | No LLM call |
+| **🟡 CHEAP** | 500–2,000 tokens | Low-cost LLM for simple decisions | `deepseek/deepseek-chat` (or configured cheap provider) |
+| **🔴 FULL** | &gt; 2,000 tokens | Full LLM for complex reasoning | Primary model (DeepSeek/GPT-4/Claude) |
+Thresholds are configurable in `browsemind.yml`:
+```yaml
+llm:
+  cheap_threshold: 500      # Below this → RULE tier
+  full_threshold: 2000       # Above this → FULL tier
+  cheap_provider: "deepseek/deepseek-chat"      # $0.14/$0.28 per 1M tokens
+```
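A minimal sketch of how the two thresholds could map a remaining budget to a tier, following the table above (below 500 is RULE, 500 to 2,000 is CHEAP, above 2,000 is FULL). The `Tier` enum and `select_tier` function are hypothetical names, not the `LLMTierSelector` API.

```python
from enum import Enum


class Tier(Enum):
    RULE = "rule"    # no LLM call
    CHEAP = "cheap"  # low-cost provider
    FULL = "full"    # primary model


def select_tier(
    remaining_tokens: int,
    cheap_threshold: int = 500,
    full_threshold: int = 2000,
) -> Tier:
    """Pick a tier from the remaining prompt-token budget."""
    if remaining_tokens < cheap_threshold:
        return Tier.RULE
    if remaining_tokens <= full_threshold:
        return Tier.CHEAP
    return Tier.FULL


for budget in (499, 500, 2000, 2001):
    print(budget, select_tier(budget).name)
```

The boundary values are inclusive on the CHEAP side here because the table gives that tier the closed range 500 to 2,000.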
+**The stack:**
+- **TokenTracker** — Tracks prompt/completion tokens and total cost per session
+- **LLMTierSelector** — Picks RULE/CHEAP/FULL based on remaining budget
+- **TokenOptimizer** — 14-pattern RuleActionSuggester, FTS5LinkMatcher, BM25Ranker (~80% prompt reduction)
+- **ThresholdSupervisor** — Prevents budget overshoot
+- **Chain failure recovery** — Auto-cascade to next strategy on failure
+</details>
+<details>
+  <summary><strong>🔌 Plugin System</strong></summary>
+Three plugin types extend the agent:
+- **ToolPlugin** — Add new tools to ToolRegistry (Slack, email, etc.)
+- **SkillPlugin** — Prompt-level skills (constraints, context injection)
+- **MCPPlugin** — Wrap MCP servers as agent tools (filesystem, GitHub)
+</details>
+<details>
+  <summary><strong>🧰 38 Built-in Tool Wrappers</strong></summary>
+Discoverable tools via ToolRegistry with structured metadata, parameter validation, and failure/success pattern documentation for LLM use. Includes QueenReaderTool, DOMSerializerTool, ElementHighlighterTool.
+</details>
+<details>
+  <summary><strong>📦 28 Handler Files, 48 Handlers, 141 ActionTypes</strong></summary>
+Single `ActionDispatcher` routes all 141 action types through 48 specialized handler classes — no if/elif chains.
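Routing without if/elif chains is usually done with a lookup table. The sketch below shows that pattern under assumptions: the two-member `ActionType` enum, the `register` method, and the lambda handlers are illustrative, and the real `ActionDispatcher` in `handlers/__init__.py` may be structured differently.

```python
from enum import Enum
from typing import Any, Callable


class ActionType(Enum):
    # Two illustrative members; the real enum is described as having 141.
    CLICK = "click"
    NAVIGATE = "navigate"


class ActionDispatcher:
    """Route each action type to one registered handler via a dict lookup."""

    def __init__(self) -> None:
        self._handlers: dict[ActionType, Callable[..., Any]] = {}

    def register(self, action_type: ActionType, handler: Callable[..., Any]) -> None:
        self._handlers[action_type] = handler

    def dispatch(self, action_type: ActionType, **params: Any) -> Any:
        handler = self._handlers.get(action_type)
        if handler is None:
            raise ValueError(f"no handler registered for {action_type.name}")
        return handler(**params)


dispatcher = ActionDispatcher()
dispatcher.register(ActionType.CLICK, lambda index: f"clicked element {index}")
dispatcher.register(ActionType.NAVIGATE, lambda url: f"navigated to {url}")
print(dispatcher.dispatch(ActionType.CLICK, index=3))
```

Adding a new action type then means registering one more handler rather than editing a branch ladder.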
+</details>
+<details>
+  <summary><strong>🔎 4 Search Engine Support</strong></summary>
+Google (CSE API + browser fallback, sign-in redirect fixed), Bing, Brave, DuckDuckGo — auto-fallback orchestration.
+</details>
+<details>
+  <summary><strong>🧪 JS Browser-Level Test Suite</strong></summary>
+19 Playwright-based tests in `tests_js/` covering DOM serializer, element highlighter, form interaction, consent manager, human-like behavior, recovery engine, strategy pipeline, tile capture.
+</details>
+---
+## Architecture
+### Core Loop
+```
+User Prompt → Entry (CLI/API)
+    → AgentLoop.run()
+        → PageState.capture()        # OBSERVE
+        → AgentBrain.assess_progress() # THINK
+        → AgentBrain.decide()         # DECIDE (LLM)
+        → ActionDispatcher.dispatch() # ACT
+            → Specific Handler → BrowserAgent
+        → AgentBrain.verify()         # VERIFY
+    → Loop until DONE
+```
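The loop above can be sketched with plain callables. This is a simplified illustration, not `AgentLoop.run()`: it omits the THINK step, the `DONE` sentinel and the callable signatures are assumptions, and `max_steps=30` mirrors the `--max-steps` default from the CLI table.

```python
from typing import Callable

DONE = "DONE"  # hypothetical sentinel action that ends the loop


def run_loop(
    observe: Callable[[], str],
    decide: Callable[[str], str],
    act: Callable[[str], str],
    verify: Callable[[str, str], bool],
    max_steps: int = 30,
) -> list[str]:
    """Observe, decide, act, verify; stop on DONE or after max_steps iterations."""
    history: list[str] = []
    for _ in range(max_steps):
        state = observe()
        action = decide(state)
        if action == DONE:
            break
        result = act(action)
        if verify(action, result):
            # Only verified actions are recorded.
            history.append(action)
    return history


# Toy "page": the task is finished once it has been clicked twice.
page = {"clicks": 0}


def observe() -> str:
    return f"clicks={page['clicks']}"


def decide(state: str) -> str:
    return DONE if state == "clicks=2" else "click"


def act(action: str) -> str:
    page["clicks"] += 1
    return "ok"


def verify(action: str, result: str) -> bool:
    return result == "ok"


history = run_loop(observe, decide, act, verify)
print(history)
```

The step cap guarantees termination even when the decision function never emits the sentinel.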
+### Architecture Diagram (Mermaid)
+See [`docs/architecture.mmd`](docs/architecture.mmd) for the full interactive diagram. Key layers:
+| Layer | Description | Key Files |
+|-------|-------------|-----------|
+| **1. Entry** | CLI + FastAPI | `main.py`, `run_api.py`, `lib/cli/` |
+| **2. Agent Loop & Brain** | Observe→Think→Decide→Act→Verify | `agent_loop.py`, `agent_brain.py`, `page_state.py` |
+| **3. Tool Registry** | 38 AgentTool wrappers + Plugin System | `tools/tool_registry.py`, `tools/builtin_tools.py` |
+| **4. Action Dispatch** | Single dispatch authority | `handlers/__init__.py` |
+| **5. Handler Layer** | 48 handlers across 28 files | `handlers/interaction.py`, `handlers/extraction.py`, ... |
+| **6. Browser Layer** | Unified API (50+ methods) | `browser_agent.py`, `navigator.py` |
+| **7. Extraction Layer** | 18 pipelines, cost-ascending cascade | `strategy_pipeline.py`, `queen_reader.py` |
+| **8. Content Pipeline** | Queen Reader, DOM Serializer, Highlighter | `queen_reader.py`, `dom_serializer.py`, `element_highlighter.py` |
+| **9. Support & Infrastructure** | Cache, Memory, Proxy, Identity, Telemetry | `cache.py`, `memory/`, `proxy_manager.py`, `identity_manager.py` |
+| **10. Multi-Agent System** | Planner/Verifier/Supervisor (advisory) | `agents/planner.py`, `agents/verifier.py` |
+### Key Metrics
+| Metric | Value |
+|--------|-------|
+| Action types | **141** |
+| Handler files | **28** |
+| Concrete handlers | **48** |
+| AgentTool wrappers | **38** |
+| Extraction pipelines | **18** |
+| Search engines | **4** |
+| Cache tiers | **4** (L0–L3) |
+| Memory tiers | **3** (Working/Episodic/Semantic) |
+| Plugin types | **3** (Tool/Skill/MCP) |
+| Largest file | `strategy_pipeline.py` (~4,400 lines) |
+| System prompt | **97 lines / ~959 tokens** |
+| JS test suite | **19 files** in `tests_js/` |
+### Key Design Principles
+| Principle | Description |
+|-----------|-------------|
+| **LLM is Primary** | Accuracy over cost — LLM consulted first for every decision |
+| **Rules Suggest** | RuleActionSuggester provides hints, not commands |
+| **Zero-LLM by Default** | Extraction, navigation, link matching use 0 LLM tokens |
+| **Cost-Ascending Cascade** | Identity→CSS→...→LLM→Vision (LLM-backed stages come LAST) |
+| **No Hardcoded Names** | Element recognition uses HTML semantics + WAI-ARIA |
+| **Structural Overlays** | Consent dialogs detected by `role=dialog`, `position:fixed` |
+---
+## 🐍 Python API
+```python
+import asyncio
+from lib.agent_loop import AgentLoop
+from lib.config import AppConfig
+async def main():
+    config = AppConfig.load("browsemind.yml")
+    loop = AgentLoop(config)
+    result = await loop.run("Navigate to example.com and extract all headings")
+    print(f"Success: {result.success}")
+    print(f"Steps: {result.steps_taken}")
+    print(f"Data: {result.data}")
+    print(f"Cost: ${result.total_cost:.4f}")
+asyncio.run(main())
+```
+### Programmatic Extraction
+```python
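+import asyncio
+from crawl4ai import AsyncWebCrawler
+from lib.strategy_pipeline import StrategyPipeline
+from lib.crawl4ai_config import Crawl4AIConfigBuilder
+# Crawl configuration built with the fluent builder.
+config = (
+    Crawl4AIConfigBuilder()
+    .with_css_selector(".product")
+    .with_word_count_threshold(50)
+    .with_cache_mode("BYPASS")
+    .build()
+)
+async def main():
+    # Assumption: StrategyPipeline takes a Crawl4AI AsyncWebCrawler instance.
+    async with AsyncWebCrawler() as crawler:
+        pipeline = StrategyPipeline(crawler)
+        # CSS strategy with a schema: one item per ".product" element.
+        result = await pipeline.run(
+            url="https://books.toscrape.com",
+            strategy="css",
+            schema={
+                "baseSelector": ".product",
+                "fields": [
+                    {"name": "title", "selector": "h3 a", "type": "text"},
+                    {"name": "price", "selector": ".price", "type": "text"},
+                ],
+            },
+        )
+    print(result.extracted[:5])
+asyncio.run(main())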
+```
+---
+## 🔧 Configuration
+### LLM Providers (YAML-Driven)
+Configuration lives in a single file: [`browsemind.yml`](browsemind.yml). API keys go in `llm.env` (git-ignored).
+```yaml
+llm:
+  default_provider: "deepseek/deepseek-chat"
+  extraction_provider: "deepseek/deepseek-chat"
+  config:
+    temperature: 0.3
+    max_tokens: 4096
+```
+**llm.env:**
+```env
+DEEPSEEK_API_KEY=sk-your-deepseek-key
+# OPENAI_API_KEY=sk-your-openai-key
+# GROQ_API_KEY=gsk-your-groq-key
+# ANTHROPIC_API_KEY=sk-ant-your-anthropic-key
+```
+Supported providers (any LiteLLM-compatible):
+- `deepseek/deepseek-chat` — **default** ($0.14/$0.28 per 1M tokens, the cheapest option as of June 2026)
+- `deepseek/deepseek-v4-pro` — Premium tier ($0.435/$0.87 per 1M tokens)
+- `openai/gpt-4o-mini`, `openai/gpt-4o`
+- `groq/meta-llama/llama-4-scout-17b-16e-instruct`
+- `anthropic/claude-3-haiku`, `anthropic/claude-3-opus`
+- Any OpenAI-compatible endpoint
+### Extraction Configuration
+`ExtractionConfig` exposes 18 LLM-configurable parameters; the table lists 11 of them:
+| Parameter | Type | Description |
+|-----------|------|-------------|
+| `wait_for` | `str` | CSS/JS condition before extraction |
+| `scan_full_page` | `bool` | Auto-scroll for infinite content |
+| `word_count_threshold` | `int` | Min words per block (default: 200) |
+| `css_selector` | `str` | Focus on specific page section |
+| `excluded_tags` | `list` | Remove specific HTML tags |
+| `remove_overlay_elements` | `bool` | Remove popups/modals |
+| `simulate_user` | `bool` | Simulate mouse movements |
+| `magic` | `bool` | Auto-handle popups/consent |
+| `cache_mode` | `str` | `ENABLED/BYPASS/DISABLED/READ_ONLY/WRITE_ONLY` |
+| `process_iframes` | `bool` | Inline iframe content |
+| `flatten_shadow_dom` | `bool` | Flatten Shadow DOM |
+---
+## 🔬 Testing
+### Python Tests
+```bash
+# Run all unit tests
+pytest
+# Run specific test file
+pytest tests/test_improvements.py -v
+# Run with coverage
+pytest --cov=lib tests/
+```
+### JS Browser-Level Tests (19 Playwright tests)
+```bash
+cd tests_js
+node run_all.js
+```
+Individual test files in `tests_js/` include:
+- `test_dom_serializer.js` — Element indexing and selector maps
+- `test_element_highlighter.js` — Numbered overlay rendering
+- `test_consent_manager.js` — Consent dismissal with overlay detection
+- `test_form_interaction.js` — Form fill and submit flows
+- `test_human_like_behavior.js` — Human-like behavior simulation
+- `test_tile_capture.js` — Tile-based screenshot capture
+- `test_strategy_pipeline.js` — Pipeline cascade behavior
+- `test_recovery_engine.js` — Recovery strategy behavior
+### CDP & Navigation Diagnostics
+```bash
+python tests/test_diagnose_cdp.py
+python tests/test_diagnose_navigation.py
+```
+---
+## 📁 Project Structure
+```
+browsemind/
+├── main.py                    # CLI entry (task mode + command mode)
+├── run_api.py                 # FastAPI server entry
+├── pyproject.toml             # Project config + dependencies
+├── browsemind.yml             # YAML-driven configuration
+├── llm.env / llm.env.example  # API keys (git-ignored)
+│
+├── lib/                       # Core library
+│   ├── agent_loop.py          # Core iteration loop
+│   ├── agent_brain.py         # LLM decision engine (~1,771 lines)
+│   ├── page_state.py          # DOM observation (~620 lines)
+│   ├── actions.py             # 141 ActionTypes + dataclasses
+│   ├── browser_agent.py       # Unified browser API (~1,750 lines)
+│   ├── strategy_pipeline.py   # Single extraction authority (~4,400 lines)
+│   ├── queen_reader.py        # DOMContentLoaded fast path
+│   ├── dom_serializer.py      # browser-use style indexed elements
+│   ├── element_highlighter.py # Numbered blue overlays
+│   ├── navigator.py           # Anti-bot strategy chain
+│   ├── human_like_behavior.py # Mouse/keyboard emulation
+│   ├── consent_manager.py     # Overlay/modal removal
+│   ├── login_pipeline.py      # Automated login flows
+│   │
+│   ├── handlers/              # 28 files, 48 handlers, 141 action types
+│   │   ├── interaction.py     # Click, Input, Scroll, Hover, DblClick, Drag, Upload
+│   │   ├── extraction.py      # 11 extraction action types
+│   │   ├── navigation.py      # Navigate, session/proxy navigate
+│   │   ├── download.py        # Download, DownloadMany (yt-dlp)
+│   │   ├── search_site.py     # Intra-site page discovery
+│   │   ├── tabs.py            # Multi-tab + page history operations
+│   │   └── ...                # 22 more handler files
+│   │
+│   ├── tools/                 # Tool Registry + SERP integrations
+│   │   ├── tool_registry.py   # Central registry
+│   │   ├── builtin_tools.py   # 38 AgentTool wrappers
+│   │   ├── google_serp.py     # Google search (CSE + browser)
+│   │   ├── bing_serp.py       # Bing search
+│   │   ├── brave_serp.py      # Brave search
+│   │   └── duckduckgo_serp.py # DuckDuckGo search
+│   │
+│   ├── memory/                # 3-tier memory (SQLite-backed)
+│   │   ├── working.py         # Current session actions
+│   │   ├── episodic.py        # Past sessions
+│   │   └── semantic.py        # Cross-session patterns + FTS5
+│   │
+│   ├── agents/                # Multi-agent system (advisory)
+│   │   ├── planner.py         # Task decomposition
+│   │   ├── verifier.py        # Outcome verification
+│   │   └── supervisor.py      # Task supervision
+│   │
+│   ├── plugin/                # Plugin system
+│   │   ├── base.py            # Plugin ABCs
+│   │   ├── plugin_manager.py  # Lifecycle management
+│   │   └── mcp_plugin.py      # MCP server integration
+│   │
+│   ├── api/                   # FastAPI REST
+│   │   └── routes.py          # 8 endpoints
+│   │
+│   └── cli/                   # Unified CLI
+│       ├── args.py            # Argument parser
+│       └── crwl_cli.py        # Command implementations
+│
+├── archive/plans/             # Archived design documents
+├── docs/                      # Documentation
+│   ├── architecture.mmd       # Mermaid diagram
+│   ├── DYNAMIC_DATA_EXTRACTION_GUIDE.md  # 52 workflow states
+│   └── ...                    # Crawl4AI docs cache
+│
+├── tests/                     # Python test suite
+├── tests_js/                  # 19 Playwright-based browser tests
+├── tutorial/                  # Crawl4AI tutorial scripts
+└── output/                    # Agent output directory
+```
+---
+## 🤝 Contributing
+See [CONTRIBUTING.md](CONTRIBUTING.md) for detailed guidelines.
+**Quick start:**
+```bash
+git clone https://github.com/prokopis3/browsemind.git
+cd browsemind
+python -m venv .venv && source .venv/bin/activate
+pip install -e ".[dev]"
+playwright install chromium
+```
+**Commit conventions:** We follow [Conventional Commits](https://www.conventionalcommits.org/) with SemVer. See [commitlint.config.js](commitlint.config.js) for allowed scopes and types.
+**Code of Conduct:** [CODE_OF_CONDUCT.md](CODE_OF_CONDUCT.md)
+---
+## 📚 Documentation
+| Resource | Description |
+|----------|-------------|
+| [Architecture Deep Dive](ARCHITECTURE_DEEP_DIVE.md) | Comprehensive 16-section architecture analysis |
+| [Roadmap](ROADMAP.md) | Future priorities, milestones, and research areas |
+| [Changelog](CHANGELOG.md) | Full release history with categorized entries |
+| [Architecture Diagram](docs/architecture.mmd) | Interactive Mermaid architecture diagram |
+| [Dynamic Data Extraction Guide](docs/DYNAMIC_DATA_EXTRACTION_GUIDE.md) | 52 workflow state patterns for LLM agents |
+| [Crawl4AI API Reference](docs/CRAWL4AI_ARCHITECTURE_REFERENCE.md) | Complete crawl4ai API surface reference |
+| [Plugin Architecture](lib/plugin/ARCHITECTURE.md) | Plugin system design and lifecycle |
+| [Configuration File](browsemind.yml) | YAML configuration reference |
+| [Contributing Guidelines](CONTRIBUTING.md) | How to contribute to the project |
+| [Security Policy](SECURITY.md) | Guidelines for reporting vulnerabilities |
+| [Code of Conduct](CODE_OF_CONDUCT.md) | Community standards and expectations |
+---
+## 🙏 Acknowledgments
+- **[Crawl4AI](https://github.com/unclecode/crawl4ai)** — The foundation web crawling library
+- **[Browser-Use](https://github.com/browser-use/browser-use)** — Inspiration for DOM interaction patterns and CLI design
+- **[Page-Agent](https://github.com/alibaba/page-agent)** — Inspiration for interactive element detection and indexing
+- **[PixelRAG](https://github.com/StarTrail-org/PixelRAG)** — Inspiration for Queen Reader fast extraction and tile-based capture
+- **[LiteLLM](https://github.com/BerriAI/litellm)** — Multi-provider LLM interface
+- **[Playwright](https://playwright.dev/)** — Browser automation engine
+---
+## ⚖️ License
+Copyright (c) 2026 BrowseMind Contributors. Licensed under the **GNU AGPL v3** — see [LICENSE](LICENSE) for details.
+---
+## Disclaimer
+> [!CAUTION]
+>
+> This library is provided for **educational and research purposes only**. By using this library, you agree to comply with local and international data scraping and privacy laws. The authors and contributors are not responsible for any misuse of this software. Always respect the terms of service of websites and `robots.txt` files.
+---
+## 🎓 Citations
+If you have used this library for research purposes, please cite us with the following references:
+### BrowseMind
+**BibTeX:**
+```bibtex
+@misc{browsemind,
+    author = {Prokopis3},
+    title = {BrowseMind: LLM Browser Automation Agent},
+    year = {2026},
+    publisher = {GitHub},
+    journal = {GitHub Repository},
+    url = {https://github.com/prokopis3/browsemind},
+    note = {An LLM browser automation \& web scraping agent powered by Crawl4AI}
+}
+```
+**Text citation:**
+> Prokopis3 (2026). *BrowseMind: LLM Browser Automation Agent* [Computer software]. GitHub. https://github.com/prokopis3/browsemind
+### Crawl4AI (underlying engine)
+**BibTeX:**
+```bibtex
+@software{crawl4ai2024,
+  author = {UncleCode},
+  title = {Crawl4AI: Open-source LLM Friendly Web Crawler \& Scraper},
+  year = {2024},
+  publisher = {GitHub},
+  journal = {GitHub Repository},
+  howpublished = {\url{https://github.com/unclecode/crawl4ai}},
+  commit = {c66f3276fd355031c8632500911fe7041ad6fc14}
+}
+```
+---
+## ☕ Support
+If you find this project useful, consider buying me a coffee!
+<div align="center">
+<a href="https://www.buymeacoffee.com/prokopis" target="_blank"><img src="https://cdn.buymeacoffee.com/buttons/v2/default-yellow.png" alt="Buy Me a Coffee" style="height: 60px !important;width: 217px !important;" ></a>
+<br><br>
+<a href="https://giphy.com/stickers/buy-me-a-coffee-support-thanks-for-your-hXMGQqJFlIQMOjpsKC" target="_blank">
+  <img src="https://media1.giphy.com/media/v1.Y2lkPTc5MGI3NjExcWZpdGpuc2dkZjZoMmRoanRzNnhhcGtqcGl6eTU0dWcwd2dxcGk4MyZlcD12MV9pbnRlcm5hbF9naWZfYnlfaWQmY3Q9cw/hXMGQqJFlIQMOjpsKC/giphy.gif" alt="Thank you — your support means a lot!" width="320">
+</a>
+</div>

package/bin/browsemind.js ADDED

@@ -0,0 +1,135 @@
+#!/usr/bin/env node
+/**
+ * BrowseMind — Node.js CLI Wrapper
+ * ===================================
+ * Thin wrapper that spawns the Python CLI. Enables npm/bun distribution
+ * while the Python backend handles all actual logic.
+ *
+ * Usage:
+ *   browsemind <command> [options]
+ *   crwl <command> [options]
+ *
+ * Postinstall:
+ *   node bin/browsemind.js postinstall  — first-time setup
+ */
+const { spawn, execSync } = require("child_process");
+const path = require("path");
+const fs = require("fs");
+const PACKAGE_NAME = "browsemind";
+const CLI_NAME = "browsemind";
+function findPython() {
+    // Try uv-managed Python first, then system Python
+    const candidates = ["uv run python", "python3", "python", "py -3"];
+    for (const py of candidates) {
+        try {
+            const [cmd, ...args] = py.split(" ");
+            execSync(`${cmd} ${args.join(" ")} --version`, { stdio: "ignore" });
+            return py;
+        } catch {
+            continue;
+        }
+    }
+    return "python";
+}
+function getProjectRoot() {
+    // Resolve the project root: the nearest ancestor of bin/ that contains pyproject.toml
+    let dir = __dirname;
+    for (let i = 0; i < 5; i++) {
+        dir = path.resolve(dir, "..");
+        if (fs.existsSync(path.join(dir, "pyproject.toml"))) {
+            return dir;
+        }
+    }
+    // Fallback: assume we're in node_modules/.bin/ or bin/
+    return path.resolve(__dirname, "..");
+}
+function runPostinstall() {
+    const root = getProjectRoot();
+    console.log(`\n  📦 ${PACKAGE_NAME} v${require("../package.json").version}`);
+    console.log(`  Setting up Python environment...`);
+    const py = findPython();
+    const mainPy = path.join(root, "main.py");
+    if (!fs.existsSync(mainPy)) {
+        console.error(`  ❌ Could not find main.py at ${mainPy}`);
+        console.error(`  Please install from source: git clone <repo> && cd ${PACKAGE_NAME}`);
+        process.exit(1);
+    }
+    // Install Python dependencies
+    try {
+        console.log(`  Installing Python dependencies...`);
+        if (py.startsWith("uv")) {
+            execSync("uv sync", { cwd: root, stdio: "inherit" });
+        } else {
+            execSync(`${py} -m pip install -e .`, { cwd: root, stdio: "inherit" });
+        }
+    } catch (e) {
+        console.error(`  ⚠️  Dependency install failed: ${e.message}`);
+        console.log(`  Run '${py.startsWith("uv") ? "uv sync" : `${py} -m pip install -e .`}' manually.`);
+    }
+    // Install Playwright browsers
+    try {
+        console.log(`  Installing Playwright browser (Chromium)...`);
+        execSync(`${py} -m playwright install chromium`, { cwd: root, stdio: "inherit" });
+    } catch {
+        console.log(`  ⚠️  Playwright browser install skipped. Run 'playwright install chromium' if needed.`);
+    }
+    console.log(`\n  ✅ ${PACKAGE_NAME} ready!`);
+    console.log(`  Run 'browsemind doctor' to verify installation.`);
+    console.log(`  Run 'browsemind --help' to see available commands.\n`);
+}
+function runCLI() {
+    const args = process.argv.slice(2);
+    const root = getProjectRoot();
+    const mainPy = path.join(root, "main.py");
+    const py = findPython();
+    if (!fs.existsSync(mainPy)) {
+        console.error(`❌ Could not find main.py at ${mainPy}`);
+        process.exit(1);
+    }
+    // Build the command based on Python runtime
+    let cmd, cmdArgs;
+    if (py.startsWith("uv")) {
+        cmd = "uv";
+        cmdArgs = ["run", "python", mainPy, ...args];
+    } else {
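+        // Candidates like "py -3" carry arguments; spawn() without a shell needs them split out
+        [cmd, ...cmdArgs] = [...py.split(" "), mainPy, ...args];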
+    }
+    const proc = spawn(cmd, cmdArgs, {
+        stdio: "inherit",
+        cwd: root,
+        env: { ...process.env, PYTHONUNBUFFERED: "1" },
+    });
+    proc.on("close", (code, signal) => {
+        process.exit(code ?? (signal ? 1 : 0)); // code is null when the child was killed by a signal
+    });
+    proc.on("error", (err) => {
+        console.error(`❌ Failed to start: ${err.message}`);
+        process.exit(1);
+    });
+}
+// ── Entry ─────────────────────────────────────────────────────────────
+if (process.argv[2] === "postinstall") {
+    runPostinstall();
+} else {
+    runCLI();
+}
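
The wrapper above stores each interpreter candidate as a single string such as `uv run python` or `py -3`. Without `shell: true`, `child_process.spawn` treats its first argument as one executable name, so a multi-word candidate has to be split into a command and leading arguments first. The following is a minimal sketch of that split and is not part of the published package; the helper name `buildSpawnArgs` is hypothetical.

```javascript
// Hypothetical helper (not in browsemind): turns an interpreter candidate
// string into the command/argument pair that child_process.spawn expects.
function buildSpawnArgs(candidate, script, userArgs) {
    // Split on whitespace: the first token is the executable,
    // the remaining tokens are arguments that precede the script path.
    const [cmd, ...leading] = candidate.trim().split(/\s+/);
    return { cmd, args: [...leading, script, ...userArgs] };
}

// Example: the Windows launcher candidate with one CLI argument.
const example = buildSpawnArgs("py -3", "main.py", ["doctor"]);
console.log(example.cmd, example.args);
```

The returned pair can be passed straight to `spawn(cmd, args, options)`. A whitespace split is enough for the fixed candidate list used here, but it would not cope with executable paths that themselves contain spaces.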

package/package.json ADDED

@@ -0,0 +1,54 @@
+{
+  "name": "browsemind",
+  "version": "0.5.0",
+  "description": "LLM browser automation agent — crawl, navigate, extract, interact",
+  "keywords": [
+    "browser-automation",
+    "web-scraping",
+    "crawling",
+    "ai-agent",
+    "llm",
+    "crawl4ai",
+    "web-crawler"
+  ],
+  "homepage": "https://github.com/prokopis3/browsemind",
+  "repository": {
+    "type": "git",
+    "url": "git+https://github.com/prokopis3/browsemind.git"
+  },
+  "license": "AGPL-3.0-or-later",
+  "author": "BrowseMind Contributors",
+  "bin": {
+    "browsemind": "./bin/browsemind.js"
+  },
+  "files": [
+    "bin/",
+    "README.md",
+    "LICENSE"
+  ],
+  "engines": {
+    "node": ">=18.0.0"
+  },
+  "preferGlobal": true,
+  "scripts": {
+    "postinstall": "node bin/browsemind.js postinstall",
+    "postupdate": "node bin/browsemind.js postinstall",
+    "prepare": "husky"
+  },
+  "lint-staged": {
+    "*.{py,pyw}": [
+      "uv tool run ruff check --fix --unsafe-fixes",
+      "uv tool run ruff format"
+    ],
+    "*.{yml,yaml,json,md}": [
+      "npx --no prettier --write"
+    ]
+  },
+  "devDependencies": {
+    "@commitlint/cli": "^21.1.0",
+    "@commitlint/config-conventional": "^21.1.0",
+    "husky": "^9.1.7",
+    "lint-staged": "^17.0.8",
+    "prettier": "^3.9.3"
+  }
+}