PyPI - web-research-agent - Versions diffs - 1.0.0__tar.gz - Mend

web-research-agent 1.0.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (32)

web_research_agent-1.0.0/LICENSE +21 -0
web_research_agent-1.0.0/MANIFEST.in +8 -0
web_research_agent-1.0.0/PKG-INFO +259 -0
web_research_agent-1.0.0/README.md +218 -0
web_research_agent-1.0.0/agent/__init__.py +2 -0
web_research_agent-1.0.0/agent/agent.py +409 -0
web_research_agent-1.0.0/agent/comprehension.py +251 -0
web_research_agent-1.0.0/agent/memory.py +195 -0
web_research_agent-1.0.0/agent/planner.py +244 -0
web_research_agent-1.0.0/cli.py +508 -0
web_research_agent-1.0.0/config/__init__.py +3 -0
web_research_agent-1.0.0/config/config.py +80 -0
web_research_agent-1.0.0/config/config_manager.py +204 -0
web_research_agent-1.0.0/pyproject.toml +3 -0
web_research_agent-1.0.0/requirements.txt +0 -0
web_research_agent-1.0.0/setup.cfg +4 -0
web_research_agent-1.0.0/setup.py +47 -0
web_research_agent-1.0.0/tools/__init__.py +2 -0
web_research_agent-1.0.0/tools/browser.py +251 -0
web_research_agent-1.0.0/tools/code_generator.py +177 -0
web_research_agent-1.0.0/tools/presentation_tool.py +364 -0
web_research_agent-1.0.0/tools/search.py +133 -0
web_research_agent-1.0.0/tools/tool_registry.py +95 -0
web_research_agent-1.0.0/utils/console_ui.py +163 -0
web_research_agent-1.0.0/utils/formatters.py +201 -0
web_research_agent-1.0.0/utils/logger.py +88 -0
web_research_agent-1.0.0/web_research_agent.egg-info/PKG-INFO +259 -0
web_research_agent-1.0.0/web_research_agent.egg-info/SOURCES.txt +30 -0
web_research_agent-1.0.0/web_research_agent.egg-info/dependency_links.txt +1 -0
web_research_agent-1.0.0/web_research_agent.egg-info/entry_points.txt +2 -0
web_research_agent-1.0.0/web_research_agent.egg-info/requires.txt +8 -0
web_research_agent-1.0.0/web_research_agent.egg-info/top_level.txt +4 -0

web_research_agent-1.0.0/LICENSE ADDED

@@ -0,0 +1,21 @@
+MIT License
+Copyright (c) 2025 Victor Jotham Ashioya
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

web_research_agent-1.0.0/MANIFEST.in ADDED

@@ -0,0 +1,8 @@
+include README.md
+include LICENSE
+include requirements.txt
+recursive-include agent *.py
+recursive-include tools *.py
+recursive-include utils *.py
+recursive-include config *.py

web_research_agent-1.0.0/PKG-INFO ADDED

@@ -0,0 +1,259 @@
+Metadata-Version: 2.2
+Name: web-research-agent
+Version: 1.0.0
+Summary: An intelligent AI agent for web-based research tasks
+Home-page: https://github.com/ashioyajotham/web-research-agent
+Author: Victor Jotham Ashioya
+Author-email: victorashioya960@gmail.com
+Project-URL: Bug Tracker, https://github.com/ashioyajotham/web-research-agent/issues
+Project-URL: Documentation, https://github.com/ashioyajotham/web-research-agent#readme
+Project-URL: Source Code, https://github.com/ashioyajotham/web-research-agent
+Keywords: ai,research,web,agent,llm,search
+Classifier: Programming Language :: Python :: 3
+Classifier: License :: OSI Approved :: MIT License
+Classifier: Operating System :: OS Independent
+Classifier: Development Status :: 4 - Beta
+Classifier: Intended Audience :: Developers
+Classifier: Intended Audience :: Science/Research
+Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
+Requires-Python: >=3.9
+Description-Content-Type: text/markdown
+License-File: LICENSE
+Requires-Dist: click>=8.0.0
+Requires-Dist: requests>=2.25.0
+Requires-Dist: beautifulsoup4>=4.9.0
+Requires-Dist: html2text>=2020.1.16
+Requires-Dist: google-generativeai>=0.3.0
+Requires-Dist: python-dotenv>=0.19.0
+Requires-Dist: prompt_toolkit>=3.0.0
+Requires-Dist: rich>=10.0.0
+Dynamic: author
+Dynamic: author-email
+Dynamic: classifier
+Dynamic: description
+Dynamic: description-content-type
+Dynamic: home-page
+Dynamic: keywords
+Dynamic: project-url
+Dynamic: requires-dist
+Dynamic: requires-python
+Dynamic: summary
+# Web Research Agent
+An intelligent AI agent that can research complex topics by browsing the web, extracting relevant information, recognizing entities, and generating structured reports. The agent leverages a modern web browser, Google search, and AI language models to provide comprehensive answers to research questions.
+## Features
+- **Automated Web Research**: Search the web and browse pages to find information
+- **Entity Recognition**: Automatically identify people, organizations, roles, and other entities
+- **Adaptive Search**: Refine searches based on previously discovered information
+- **Information Synthesis**: Combine information from multiple sources
+- **Task Analysis**: Automatically determine the best approach to research tasks
+- **Structured Output**: Organize findings into well-formatted reports
+- **Code Generation**: Write code when required for data processing tasks
+## Architecture
+```mermaid
+graph TD
+    A[Main] --> B[WebResearchAgent]
+    B --> C1[Memory]
+    B --> C2[Planner]
+    B --> C3[Comprehension]
+    B --> C4[ToolRegistry]
+    C2 -->|Creates| D[Plan]
+    D -->|Contains| E[PlanSteps]
+    C4 -->|Registers| F1[SearchTool]
+    C4 -->|Registers| F2[BrowserTool]
+    C4 -->|Registers| F3[CodeGeneratorTool]
+    C4 -->|Registers| F4[PresentationTool]
+    C3 -->|Provides| G1[Task Analysis]
+    C3 -->|Extracts| G2[Entities]
+    C3 -->|Generates| G3[Summaries]
+    B -->|Executes| H[Tasks]
+    H -->|Produces| I[Results]
+    style B fill:#f9f,stroke:#333,stroke-width:2px
+    style C1 fill:#bbf,stroke:#333
+    style C2 fill:#bbf,stroke:#333
+    style C3 fill:#bbf,stroke:#333
+    style C4 fill:#bbf,stroke:#333
+    style F1 fill:#bfb,stroke:#333
+    style F2 fill:#bfb,stroke:#333
+    style F3 fill:#bfb,stroke:#333
+    style F4 fill:#bfb,stroke:#333
+```
+## Installation
+### Prerequisites
+- Python 3.9 or higher
+- pip (Python package installer)
+### Setup
+1. Clone the repository:
+   ```bash
+   git clone https://github.com/yourusername/web_research_agent.git
+   cd web_research_agent
+   ```
+2. Create a virtual environment:
+   ```bash
+   python -m venv venv
+   source venv/bin/activate  # On Windows: venv\Scripts\activate
+   ```
+3. Install dependencies:
+   ```bash
+   pip install -r requirements.txt
+   ```
+## Configuration
+The agent requires API keys to function properly:
+1. **Gemini API key**: For LLM services
+2. **Serper API key**: For Google search results
+### Setting up your API keys
+#### Option 1: .env file (Recommended)
+Create a `.env` file in the project root:
+```bash
+GEMINI_API_KEY=your_gemini_api_key
+SERPER_API_KEY=your_serper_api_key
+```
+The agent will automatically load this file.
+#### Option 2: Environment Variables
+```bash
+export GEMINI_API_KEY=your_gemini_api_key
+export SERPER_API_KEY=your_serper_api_key
+```
+#### Option 3: Programmatically
+```python
+from config.config_manager import init_config
+config = init_config()
+config.update('gemini_api_key', 'your_gemini_api_key')
+config.update('serper_api_key', 'your_serper_api_key')
+```
+### Additional Configuration Options
+| Config Key | Environment Variable | Description | Default |
+|------------|---------------------|-------------|---------|
+| gemini_api_key | GEMINI_API_KEY | API key for Google's Gemini LLM | - |
+| serper_api_key | SERPER_API_KEY | API key for Serper.dev search | - |
+| log_level | LOG_LEVEL | Logging level | INFO |
+| max_search_results | MAX_SEARCH_RESULTS | Maximum number of search results | 5 |
+| memory_limit | MEMORY_LIMIT | Number of items to keep in memory | 100 |
+| output_format | OUTPUT_FORMAT | Format for output (markdown, text, html) | markdown |
+| timeout | REQUEST_TIMEOUT | Default timeout for web requests (seconds) | 30 |
+## Usage
+### Basic Usage
+1. Create a text file with your research tasks, one per line:
+   ```
+   # tasks.txt
+   Find the name of the COO of the organization that mediated secret talks between US and Chinese AI companies in Geneva in 2023.
+   By what percentage did Volkswagen reduce their Scope 1 and Scope 2 greenhouse gas emissions in 2023 compared to 2021?
+   ```
+2. Run the agent:
+   ```bash
+   python main.py tasks.txt
+   ```
+3. Results will be saved to the `results/` directory as Markdown files.
+### Command Line Options
+```bash
+python main.py tasks.txt --output custom_output_dir
+```
+| Option | Description | Default |
+|--------|-------------|---------|
+| task_file | Path to text file containing tasks | (required) |
+| --output | Directory to store results | results/ |
+## Project Structure
+- **agent/**: Core agent components
+  - **agent.py**: Main agent class
+  - **comprehension.py**: Text understanding capabilities
+  - **memory.py**: Memory management
+  - **planner.py**: Plan creation and management
+- **tools/**: Tools used by the agent
+  - **browser.py**: Web browsing tool
+  - **search.py**: Web search tool
+  - **code_generator.py**: Code generation tool
+  - **presentation_tool.py**: Information formatting
+  - **tool_registry.py**: Tool registration system
+- **utils/**: Utility functions
+  - **console_ui.py**: Console interface
+  - **formatters.py**: Output formatting
+  - **logger.py**: Logging configuration
+- **config/**: Configuration management
+- **main.py**: Entry point
+## Advanced Usage
+### Entity Extraction
+The agent can automatically identify and extract entities from content:
+- **People**: Names of individuals
+- **Organizations**: Companies, agencies, groups
+- **Roles**: Job titles and organizational positions
+- **Locations**: Physical places
+- **Dates**: Temporal references
+This feature helps the agent refine searches and identify key information.
+### Custom Output Formats
+You can customize the output format by setting the `output_format` configuration:
+```python
+from config.config_manager import init_config
+config = init_config()
+config.update('output_format', 'html')  # Options: markdown, json, html
+```
+## Troubleshooting
+### Common Issues
+1. **URL Access Errors**: Some websites block automated access. Try using a different source.
+2. **API Rate Limiting**: If you receive rate limit errors, space out your requests or use a premium API plan.
+3. **Memory Issues**: For very large research tasks, you may need to increase your system's memory allocation.
+### Error Logs
+Logs are stored in the `logs/` directory for debugging.
+## Contributing
+Contributions are welcome! Please feel free to submit a Pull Request.
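
Reviewer note (not part of the package): the configuration table in the PKG-INFO description above maps environment variables to config keys and defaults. The sketch below shows one way such a mapping could be resolved. `load_settings` and `missing_keys` are hypothetical names introduced here for illustration; this is not the package's `config_manager` implementation, which is not shown in this part of the diff. Only the variable names, keys, and defaults are taken from the table.

```python
import os
from typing import Any, Dict, List, Mapping, Optional

# (environment variable, config key, default, caster), copied from the README table.
# The int casts are an assumption based on the documented defaults.
_SETTINGS = [
    ("GEMINI_API_KEY", "gemini_api_key", None, str),
    ("SERPER_API_KEY", "serper_api_key", None, str),
    ("LOG_LEVEL", "log_level", "INFO", str),
    ("MAX_SEARCH_RESULTS", "max_search_results", 5, int),
    ("MEMORY_LIMIT", "memory_limit", 100, int),
    ("OUTPUT_FORMAT", "output_format", "markdown", str),
    ("REQUEST_TIMEOUT", "timeout", 30, int),
]


def load_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, Any]:
    """Hypothetical helper: resolve settings from environment variables."""
    env = os.environ if env is None else env
    settings: Dict[str, Any] = {}
    for env_name, key, default, cast in _SETTINGS:
        raw = env.get(env_name)
        # Unset or empty variables fall back to the documented default.
        settings[key] = cast(raw) if raw not in (None, "") else default
    return settings


def missing_keys(settings: Mapping[str, Any]) -> List[str]:
    """Return the required API keys that are still unset."""
    return [k for k in ("gemini_api_key", "serper_api_key") if not settings.get(k)]
```

Calling `missing_keys(load_settings())` before starting a run would surface an absent Gemini or Serper key early, rather than at the first API call.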

web_research_agent-1.0.0/README.md ADDED

@@ -0,0 +1,218 @@
+# Web Research Agent
+An intelligent AI agent that can research complex topics by browsing the web, extracting relevant information, recognizing entities, and generating structured reports. The agent leverages a modern web browser, Google search, and AI language models to provide comprehensive answers to research questions.
+## Features
+- **Automated Web Research**: Search the web and browse pages to find information
+- **Entity Recognition**: Automatically identify people, organizations, roles, and other entities
+- **Adaptive Search**: Refine searches based on previously discovered information
+- **Information Synthesis**: Combine information from multiple sources
+- **Task Analysis**: Automatically determine the best approach to research tasks
+- **Structured Output**: Organize findings into well-formatted reports
+- **Code Generation**: Write code when required for data processing tasks
+## Architecture
+```mermaid
+graph TD
+    A[Main] --> B[WebResearchAgent]
+    B --> C1[Memory]
+    B --> C2[Planner]
+    B --> C3[Comprehension]
+    B --> C4[ToolRegistry]
+    C2 -->|Creates| D[Plan]
+    D -->|Contains| E[PlanSteps]
+    C4 -->|Registers| F1[SearchTool]
+    C4 -->|Registers| F2[BrowserTool]
+    C4 -->|Registers| F3[CodeGeneratorTool]
+    C4 -->|Registers| F4[PresentationTool]
+    C3 -->|Provides| G1[Task Analysis]
+    C3 -->|Extracts| G2[Entities]
+    C3 -->|Generates| G3[Summaries]
+    B -->|Executes| H[Tasks]
+    H -->|Produces| I[Results]
+    style B fill:#f9f,stroke:#333,stroke-width:2px
+    style C1 fill:#bbf,stroke:#333
+    style C2 fill:#bbf,stroke:#333
+    style C3 fill:#bbf,stroke:#333
+    style C4 fill:#bbf,stroke:#333
+    style F1 fill:#bfb,stroke:#333
+    style F2 fill:#bfb,stroke:#333
+    style F3 fill:#bfb,stroke:#333
+    style F4 fill:#bfb,stroke:#333
+```
+## Installation
+### Prerequisites
+- Python 3.9 or higher
+- pip (Python package installer)
+### Setup
+1. Clone the repository:
+   ```bash
+   git clone https://github.com/yourusername/web_research_agent.git
+   cd web_research_agent
+   ```
+2. Create a virtual environment:
+   ```bash
+   python -m venv venv
+   source venv/bin/activate  # On Windows: venv\Scripts\activate
+   ```
+3. Install dependencies:
+   ```bash
+   pip install -r requirements.txt
+   ```
+## Configuration
+The agent requires API keys to function properly:
+1. **Gemini API key**: For LLM services
+2. **Serper API key**: For Google search results
+### Setting up your API keys
+#### Option 1: .env file (Recommended)
+Create a `.env` file in the project root:
+```bash
+GEMINI_API_KEY=your_gemini_api_key
+SERPER_API_KEY=your_serper_api_key
+```
+The agent will automatically load this file.
+#### Option 2: Environment Variables
+```bash
+export GEMINI_API_KEY=your_gemini_api_key
+export SERPER_API_KEY=your_serper_api_key
+```
+#### Option 3: Programmatically
+```python
+from config.config_manager import init_config
+config = init_config()
+config.update('gemini_api_key', 'your_gemini_api_key')
+config.update('serper_api_key', 'your_serper_api_key')
+```
+### Additional Configuration Options
+| Config Key | Environment Variable | Description | Default |
+|------------|---------------------|-------------|---------|
+| gemini_api_key | GEMINI_API_KEY | API key for Google's Gemini LLM | - |
+| serper_api_key | SERPER_API_KEY | API key for Serper.dev search | - |
+| log_level | LOG_LEVEL | Logging level | INFO |
+| max_search_results | MAX_SEARCH_RESULTS | Maximum number of search results | 5 |
+| memory_limit | MEMORY_LIMIT | Number of items to keep in memory | 100 |
+| output_format | OUTPUT_FORMAT | Format for output (markdown, text, html) | markdown |
+| timeout | REQUEST_TIMEOUT | Default timeout for web requests (seconds) | 30 |
+## Usage
+### Basic Usage
+1. Create a text file with your research tasks, one per line:
+   ```
+   # tasks.txt
+   Find the name of the COO of the organization that mediated secret talks between US and Chinese AI companies in Geneva in 2023.
+   By what percentage did Volkswagen reduce their Scope 1 and Scope 2 greenhouse gas emissions in 2023 compared to 2021?
+   ```
+2. Run the agent:
+   ```bash
+   python main.py tasks.txt
+   ```
+3. Results will be saved to the `results/` directory as Markdown files.
+### Command Line Options
+```bash
+python main.py tasks.txt --output custom_output_dir
+```
+| Option | Description | Default |
+|--------|-------------|---------|
+| task_file | Path to text file containing tasks | (required) |
+| --output | Directory to store results | results/ |
+## Project Structure
+- **agent/**: Core agent components
+  - **agent.py**: Main agent class
+  - **comprehension.py**: Text understanding capabilities
+  - **memory.py**: Memory management
+  - **planner.py**: Plan creation and management
+- **tools/**: Tools used by the agent
+  - **browser.py**: Web browsing tool
+  - **search.py**: Web search tool
+  - **code_generator.py**: Code generation tool
+  - **presentation_tool.py**: Information formatting
+  - **tool_registry.py**: Tool registration system
+- **utils/**: Utility functions
+  - **console_ui.py**: Console interface
+  - **formatters.py**: Output formatting
+  - **logger.py**: Logging configuration
+- **config/**: Configuration management
+- **main.py**: Entry point
+## Advanced Usage
+### Entity Extraction
+The agent can automatically identify and extract entities from content:
+- **People**: Names of individuals
+- **Organizations**: Companies, agencies, groups
+- **Roles**: Job titles and organizational positions
+- **Locations**: Physical places
+- **Dates**: Temporal references
+This feature helps the agent refine searches and identify key information.
+### Custom Output Formats
+You can customize the output format by setting the `output_format` configuration:
+```python
+from config.config_manager import init_config
+config = init_config()
+config.update('output_format', 'html')  # Options: markdown, json, html
+```
+## Troubleshooting
+### Common Issues
+1. **URL Access Errors**: Some websites block automated access. Try using a different source.
+2. **API Rate Limiting**: If you receive rate limit errors, space out your requests or use a premium API plan.
+3. **Memory Issues**: For very large research tasks, you may need to increase your system's memory allocation.
+### Error Logs
+Logs are stored in the `logs/` directory for debugging.
+## Contributing
+Contributions are welcome! Please feel free to submit a Pull Request.
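
Reviewer note (not part of the package): the README's Basic Usage section above describes a task file with one research task per line. A minimal sketch of reading such a file follows. `parse_tasks` is a hypothetical name, and treating lines that start with `#` as comments is an assumption drawn from the `# tasks.txt` line in the README example. The package's own parsing logic is not shown in this part of the diff and may differ.

```python
from typing import Iterable, List


def parse_tasks(lines: Iterable[str]) -> List[str]:
    """Hypothetical helper: return one task per non-empty, non-comment line."""
    tasks: List[str] = []
    for line in lines:
        text = line.strip()
        # Assumption: blank lines and lines starting with '#' are not tasks.
        if not text or text.startswith("#"):
            continue
        tasks.append(text)
    return tasks


if __name__ == "__main__":
    sample = [
        "# tasks.txt\n",
        "\n",
        "Find the name of the COO of the organization.\n",
        "  By what percentage did emissions fall?  \n",
    ]
    for number, task in enumerate(parse_tasks(sample), start=1):
        print(f"{number}. {task}")
```

Because the function accepts any iterable of strings, an open file object can be passed directly, for example `parse_tasks(open("tasks.txt", encoding="utf-8"))`.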

web_research_agent-1.0.0/agent/__init__.py ADDED

@@ -0,0 +1,2 @@
+
+ # Agent module initialization