web-research-agent 1.0.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (32)
  1. web_research_agent-1.0.0/LICENSE +21 -0
  2. web_research_agent-1.0.0/MANIFEST.in +8 -0
  3. web_research_agent-1.0.0/PKG-INFO +259 -0
  4. web_research_agent-1.0.0/README.md +218 -0
  5. web_research_agent-1.0.0/agent/__init__.py +2 -0
  6. web_research_agent-1.0.0/agent/agent.py +409 -0
  7. web_research_agent-1.0.0/agent/comprehension.py +251 -0
  8. web_research_agent-1.0.0/agent/memory.py +195 -0
  9. web_research_agent-1.0.0/agent/planner.py +244 -0
  10. web_research_agent-1.0.0/cli.py +508 -0
  11. web_research_agent-1.0.0/config/__init__.py +3 -0
  12. web_research_agent-1.0.0/config/config.py +80 -0
  13. web_research_agent-1.0.0/config/config_manager.py +204 -0
  14. web_research_agent-1.0.0/pyproject.toml +3 -0
  15. web_research_agent-1.0.0/requirements.txt +0 -0
  16. web_research_agent-1.0.0/setup.cfg +4 -0
  17. web_research_agent-1.0.0/setup.py +47 -0
  18. web_research_agent-1.0.0/tools/__init__.py +2 -0
  19. web_research_agent-1.0.0/tools/browser.py +251 -0
  20. web_research_agent-1.0.0/tools/code_generator.py +177 -0
  21. web_research_agent-1.0.0/tools/presentation_tool.py +364 -0
  22. web_research_agent-1.0.0/tools/search.py +133 -0
  23. web_research_agent-1.0.0/tools/tool_registry.py +95 -0
  24. web_research_agent-1.0.0/utils/console_ui.py +163 -0
  25. web_research_agent-1.0.0/utils/formatters.py +201 -0
  26. web_research_agent-1.0.0/utils/logger.py +88 -0
  27. web_research_agent-1.0.0/web_research_agent.egg-info/PKG-INFO +259 -0
  28. web_research_agent-1.0.0/web_research_agent.egg-info/SOURCES.txt +30 -0
  29. web_research_agent-1.0.0/web_research_agent.egg-info/dependency_links.txt +1 -0
  30. web_research_agent-1.0.0/web_research_agent.egg-info/entry_points.txt +2 -0
  31. web_research_agent-1.0.0/web_research_agent.egg-info/requires.txt +8 -0
  32. web_research_agent-1.0.0/web_research_agent.egg-info/top_level.txt +4 -0
web_research_agent-1.0.0/LICENSE +21 -0
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2025 Victor Jotham Ashioya

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
web_research_agent-1.0.0/MANIFEST.in +8 -0
@@ -0,0 +1,8 @@

include README.md
include LICENSE
include requirements.txt
recursive-include agent *.py
recursive-include tools *.py
recursive-include utils *.py
recursive-include config *.py
web_research_agent-1.0.0/PKG-INFO +259 -0
@@ -0,0 +1,259 @@
Metadata-Version: 2.2
Name: web-research-agent
Version: 1.0.0
Summary: An intelligent AI agent for web-based research tasks
Home-page: https://github.com/ashioyajotham/web-research-agent
Author: Victor Jotham Ashioya
Author-email: victorashioya960@gmail.com
Project-URL: Bug Tracker, https://github.com/ashioyajotham/web-research-agent/issues
Project-URL: Documentation, https://github.com/ashioyajotham/web-research-agent#readme
Project-URL: Source Code, https://github.com/ashioyajotham/web-research-agent
Keywords: ai,research,web,agent,llm,search
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: click>=8.0.0
Requires-Dist: requests>=2.25.0
Requires-Dist: beautifulsoup4>=4.9.0
Requires-Dist: html2text>=2020.1.16
Requires-Dist: google-generativeai>=0.3.0
Requires-Dist: python-dotenv>=0.19.0
Requires-Dist: prompt_toolkit>=3.0.0
Requires-Dist: rich>=10.0.0
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: keywords
Dynamic: project-url
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# Web Research Agent

An intelligent AI agent that researches complex topics by searching and browsing the web, extracting relevant information, recognizing entities, and generating structured reports. It combines Google search results (via the Serper API), web page retrieval and parsing, and Google's Gemini language models to answer research questions.

## Features

- **Automated Web Research**: Search the web and browse pages to find information
- **Entity Recognition**: Automatically identify people, organizations, roles, and other entities
- **Adaptive Search**: Refine searches based on previously discovered information
- **Information Synthesis**: Combine information from multiple sources
- **Task Analysis**: Automatically determine the best approach to research tasks
- **Structured Output**: Organize findings into well-formatted reports
- **Code Generation**: Write code when required for data processing tasks

## Architecture

```mermaid
graph TD
A[Main] --> B[WebResearchAgent]
B --> C1[Memory]
B --> C2[Planner]
B --> C3[Comprehension]
B --> C4[ToolRegistry]

C2 -->|Creates| D[Plan]
D -->|Contains| E[PlanSteps]

C4 -->|Registers| F1[SearchTool]
C4 -->|Registers| F2[BrowserTool]
C4 -->|Registers| F3[CodeGeneratorTool]
C4 -->|Registers| F4[PresentationTool]

C3 -->|Provides| G1[Task Analysis]
C3 -->|Extracts| G2[Entities]
C3 -->|Generates| G3[Summaries]

B -->|Executes| H[Tasks]
H -->|Produces| I[Results]

style B fill:#f9f,stroke:#333,stroke-width:2px
style C1 fill:#bbf,stroke:#333
style C2 fill:#bbf,stroke:#333
style C3 fill:#bbf,stroke:#333
style C4 fill:#bbf,stroke:#333
style F1 fill:#bfb,stroke:#333
style F2 fill:#bfb,stroke:#333
style F3 fill:#bfb,stroke:#333
style F4 fill:#bfb,stroke:#333
```
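
The diagram shows `WebResearchAgent` relying on a `Planner` that creates a `Plan` of `PlanSteps` and on a `ToolRegistry` that registers the tools. The sketch below illustrates that registry-and-plan pattern in isolation. The class and function names (`ToolRegistry.register`, `PlanStep`, `execute_plan`) and the step fields are assumptions for illustration, not the package's actual API, and the tools are stand-in lambdas.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class PlanStep:
    """One step of a plan: which tool to call and with what parameters (hypothetical shape)."""
    description: str
    tool_name: str
    parameters: Dict[str, Any] = field(default_factory=dict)


class ToolRegistry:
    """Maps tool names to callables so the agent can look tools up per step."""

    def __init__(self) -> None:
        self._tools: Dict[str, Callable[..., Any]] = {}

    def register(self, name: str, tool: Callable[..., Any]) -> None:
        self._tools[name] = tool

    def get(self, name: str) -> Callable[..., Any]:
        if name not in self._tools:
            raise KeyError(f"Unknown tool: {name}")
        return self._tools[name]


def execute_plan(steps: List[PlanStep], registry: ToolRegistry) -> List[Any]:
    """Run each step in order, passing its parameters as keyword arguments."""
    results = []
    for step in steps:
        tool = registry.get(step.tool_name)
        results.append(tool(**step.parameters))
    return results


# Stand-in tools; a real agent would register search, browser, and presentation tools.
registry = ToolRegistry()
registry.register("search", lambda query: [f"result for {query}"])
registry.register("present", lambda text: text.upper())

plan = [
    PlanStep("Search the web", "search", {"query": "Volkswagen emissions 2023"}),
    PlanStep("Format the answer", "present", {"text": "done"}),
]
outputs = execute_plan(plan, registry)
```

Keeping tools behind a name-based registry lets the planner emit plain data (tool name plus parameters) while the agent decides how to run it.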

## Installation

### Prerequisites

- Python 3.9 or higher
- pip (Python package installer)

### Setup

1. Clone the repository:
   ```bash
   git clone https://github.com/ashioyajotham/web-research-agent.git
   cd web-research-agent
   ```

2. Create a virtual environment:
   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```

3. Install dependencies:
   ```bash
   pip install -r requirements.txt
   ```
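
After installing, you can check that the runtime dependencies declared in the `Requires-Dist` metadata are present. This sketch uses only the standard library's `importlib.metadata`; the helper name `installed_versions` is hypothetical, and it reports installed versions without checking them against the declared minimums.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional

# Distribution names taken from the Requires-Dist entries in PKG-INFO.
REQUIRED = [
    "click",
    "requests",
    "beautifulsoup4",
    "html2text",
    "google-generativeai",
    "python-dotenv",
    "prompt_toolkit",
    "rich",
]


def installed_versions(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Return the installed version of each distribution, or None if it is missing."""
    found: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions(REQUIRED).items():
        print(f"{name}: {ver or 'NOT INSTALLED'}")
```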

## Configuration

The agent requires two API keys:

1. **Gemini API key**: for Google's Gemini language models
2. **Serper API key**: for Google search results via Serper.dev

### Setting up your API keys

#### Option 1: .env file (Recommended)

Create a `.env` file in the project root:

```bash
GEMINI_API_KEY=your_gemini_api_key
SERPER_API_KEY=your_serper_api_key
```

The agent loads this file automatically.

#### Option 2: Environment Variables

```bash
export GEMINI_API_KEY=your_gemini_api_key
export SERPER_API_KEY=your_serper_api_key
```
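
With Options 1 and 2, the keys reach the agent as environment variables. A minimal fail-fast check is sketched below, assuming the `.env` file is read with `load_dotenv()` from `python-dotenv` (a declared dependency), which by default does not override variables already set in the environment. The helpers `require_api_keys` and `load_keys` are hypothetical.

```python
import os
from typing import Dict, Mapping, Optional

# Environment variables named in the Configuration section.
REQUIRED_KEYS = ("GEMINI_API_KEY", "SERPER_API_KEY")


def require_api_keys(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return the required keys, raising a clear error if any are missing or empty."""
    source = os.environ if env is None else env
    missing = [key for key in REQUIRED_KEYS if not source.get(key)]
    if missing:
        raise RuntimeError(f"Missing API keys: {', '.join(missing)}")
    return {key: source[key] for key in REQUIRED_KEYS}


def load_keys() -> Dict[str, str]:
    """Load a .env file if python-dotenv is installed, then validate the keys."""
    try:
        from dotenv import load_dotenv  # provided by python-dotenv
    except ImportError:
        pass  # rely on variables already exported in the shell
    else:
        load_dotenv()  # searches for a .env file; existing variables are not overridden
    return require_api_keys()
```

Calling `load_keys()` at startup surfaces a missing key immediately instead of as an authentication error deep inside a search or LLM call.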

#### Option 3: Programmatically

```python
from config.config_manager import init_config

config = init_config()
config.update('gemini_api_key', 'your_gemini_api_key')
config.update('serper_api_key', 'your_serper_api_key')
```

### Additional Configuration Options

| Config Key | Environment Variable | Description | Default |
|------------|---------------------|-------------|---------|
| gemini_api_key | GEMINI_API_KEY | API key for Google's Gemini LLM | - |
| serper_api_key | SERPER_API_KEY | API key for Serper.dev search | - |
| log_level | LOG_LEVEL | Logging level | INFO |
| max_search_results | MAX_SEARCH_RESULTS | Maximum number of search results | 5 |
| memory_limit | MEMORY_LIMIT | Number of items to keep in memory | 100 |
| output_format | OUTPUT_FORMAT | Format for output (markdown, text, html) | markdown |
| timeout | REQUEST_TIMEOUT | Default timeout for web requests (seconds) | 30 |
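
The table pairs each config key with an environment variable and a default. One way to resolve those values is sketched below, assuming environment variables override the defaults and numeric settings are parsed as integers. `DEFAULTS`, `ENV_NAMES`, and `resolve_settings` are illustrative names, not the package's `config_manager` API.

```python
import os
from typing import Any, Dict, Mapping, Optional

# Defaults copied from the configuration table; the API keys have no default.
DEFAULTS: Dict[str, Any] = {
    "gemini_api_key": None,
    "serper_api_key": None,
    "log_level": "INFO",
    "max_search_results": 5,
    "memory_limit": 100,
    "output_format": "markdown",
    "timeout": 30,
}

# Environment variable for each config key (note that timeout uses REQUEST_TIMEOUT).
ENV_NAMES: Dict[str, str] = {
    "gemini_api_key": "GEMINI_API_KEY",
    "serper_api_key": "SERPER_API_KEY",
    "log_level": "LOG_LEVEL",
    "max_search_results": "MAX_SEARCH_RESULTS",
    "memory_limit": "MEMORY_LIMIT",
    "output_format": "OUTPUT_FORMAT",
    "timeout": "REQUEST_TIMEOUT",
}


def resolve_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, Any]:
    """Start from the defaults and apply any environment overrides."""
    source = os.environ if env is None else env
    settings = dict(DEFAULTS)
    for key, env_name in ENV_NAMES.items():
        raw = source.get(env_name)
        if raw is None:
            continue
        # Numeric defaults keep their type; everything else stays a string.
        settings[key] = int(raw) if isinstance(DEFAULTS[key], int) else raw
    return settings
```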

## Usage

### Basic Usage

1. Create a text file (for example, `tasks.txt`) with your research tasks, one per line:
   ```text
   Find the name of the COO of the organization that mediated secret talks between US and Chinese AI companies in Geneva in 2023.
   By what percentage did Volkswagen reduce their Scope 1 and Scope 2 greenhouse gas emissions in 2023 compared to 2021?
   ```

2. Run the agent:
   ```bash
   python main.py tasks.txt
   ```

3. Results will be saved to the `results/` directory as Markdown files.
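
The workflow above amounts to reading one task per non-empty line, running the agent on each, and writing one Markdown file per task. The sketch covers only the file handling, with the agent replaced by a stand-in `answer` callable. The function names and the `task_N.md` naming scheme are assumptions, not the tool's actual output names.

```python
from pathlib import Path
from typing import Callable, List


def read_tasks(task_file: Path) -> List[str]:
    """Return one task per non-empty line, with surrounding whitespace removed."""
    lines = task_file.read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def run_tasks(task_file: Path, output_dir: Path, answer: Callable[[str], str]) -> List[Path]:
    """Answer each task and save the result as a Markdown file in output_dir."""
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for index, task in enumerate(read_tasks(task_file), start=1):
        report = f"# Task {index}\n\n{task}\n\n## Answer\n\n{answer(task)}\n"
        path = output_dir / f"task_{index}.md"  # hypothetical naming scheme
        path.write_text(report, encoding="utf-8")
        written.append(path)
    return written
```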

### Command Line Options

```bash
python main.py tasks.txt --output custom_output_dir
```

| Option | Description | Default |
|--------|-------------|---------|
| task_file | Path to text file containing tasks | (required) |
| --output | Directory to store results | results/ |
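
The two options in the table map onto a positional argument and an optional flag. An equivalent parser is sketched below with the standard library's `argparse`; the project's own entry point may be implemented differently (for example with `click`, which is among its dependencies), and `build_parser` and `parse` are hypothetical names.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Mirror the documented options: a required task file and an optional output directory."""
    parser = argparse.ArgumentParser(description="Run research tasks from a file.")
    parser.add_argument("task_file", help="Path to text file containing tasks")
    parser.add_argument("--output", default="results/", help="Directory to store results")
    return parser


def parse(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse argv (or sys.argv[1:] when None)."""
    return build_parser().parse_args(argv)
```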

## Project Structure

- **agent/**: Core agent components
  - **agent.py**: Main agent class
  - **comprehension.py**: Text understanding capabilities
  - **memory.py**: Memory management
  - **planner.py**: Plan creation and management

- **tools/**: Tools used by the agent
  - **browser.py**: Web browsing tool
  - **search.py**: Web search tool
  - **code_generator.py**: Code generation tool
  - **presentation_tool.py**: Information formatting
  - **tool_registry.py**: Tool registration system

- **utils/**: Utility functions
  - **console_ui.py**: Console interface
  - **formatters.py**: Output formatting
  - **logger.py**: Logging configuration

- **config/**: Configuration management

- **main.py**: Entry point

## Advanced Usage

### Entity Extraction

The agent can automatically identify and extract entities from content:

- **People**: Names of individuals
- **Organizations**: Companies, agencies, groups
- **Roles**: Job titles and organizational positions
- **Locations**: Physical places
- **Dates**: Temporal references

This feature helps the agent refine searches and identify key information.
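
To make "refine searches" concrete: once entities have been extracted, a follow-up query can add newly discovered names to the original question. The toy sketch below shows only that refinement step and assumes entities arrive as a mapping from a lowercase plural type name to a list of strings. `refine_query` is hypothetical, and the package's comprehension module, not this code, performs the actual extraction.

```python
from typing import Dict, List

# Priority order for adding entity terms to a query (an assumption for this sketch).
ENTITY_ORDER = ("organizations", "people", "roles", "locations", "dates")


def refine_query(base_query: str, entities: Dict[str, List[str]], max_terms: int = 2) -> str:
    """Append quoted entity names that the query does not already mention."""
    terms: List[str] = []
    for entity_type in ENTITY_ORDER:
        for name in entities.get(entity_type, []):
            if name.lower() not in base_query.lower() and name not in terms:
                terms.append(name)
    extra = " ".join(f'"{term}"' for term in terms[:max_terms])
    return f"{base_query} {extra}".strip()
```

For example, `refine_query("COO of the organization", {"organizations": ["Example Org"], "roles": ["COO"]})` adds the placeholder organization name but skips `COO`, which the query already contains.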

### Custom Output Formats

You can change the output format by setting the `output_format` configuration key:

```python
from config.config_manager import init_config

config = init_config()
config.update('output_format', 'html')  # e.g. 'markdown' (default) or 'html'
```
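
Switching `output_format` amounts to choosing a different renderer for the same findings. The sketch below dispatches on the markdown, text, and html values listed in the configuration table for a report made of a title and a list of findings. `render` and its input shape are hypothetical; HTML escaping uses the standard library's `html.escape`.

```python
import html
from typing import List


def render(title: str, findings: List[str], output_format: str = "markdown") -> str:
    """Render a report title and its findings in the requested format."""
    if output_format == "markdown":
        bullets = "\n".join(f"- {item}" for item in findings)
        return f"# {title}\n\n{bullets}\n"
    if output_format == "text":
        bullets = "\n".join(f"* {item}" for item in findings)
        return f"{title}\n{'=' * len(title)}\n\n{bullets}\n"
    if output_format == "html":
        # Escape user-visible text so characters like < and & stay literal.
        items = "".join(f"<li>{html.escape(item)}</li>" for item in findings)
        return f"<h1>{html.escape(title)}</h1>\n<ul>{items}</ul>\n"
    raise ValueError(f"Unsupported output format: {output_format}")
```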

## Troubleshooting

### Common Issues

1. **URL Access Errors**: Some websites block automated access. Try using a different source.
2. **API Rate Limiting**: If you receive rate limit errors, space out your requests or upgrade to a higher-tier API plan.
3. **Memory Issues**: For very large research tasks, you may need to make more memory available to the process.
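
For rate-limit errors, spacing out requests usually means retrying with exponential backoff. The generic sketch below wraps any zero-argument callable; `RateLimitError` and `with_backoff` are hypothetical names, and you would adapt the caught exception to whatever your HTTP client or API SDK raises (for example on an HTTP 429 response).

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Stand-in for the error an API client raises when it is rate limited."""


def with_backoff(call: Callable[[], T], retries: int = 4, base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `call`, retrying on RateLimitError with exponentially growing, jittered delays."""
    for attempt in range(retries + 1):
        try:
            return call()
        except RateLimitError:
            if attempt == retries:
                raise  # give up after the final retry
            delay = base_delay * (2 ** attempt)
            # Random jitter keeps many clients from retrying in lockstep.
            sleep(delay + random.uniform(0, delay / 2))
    raise AssertionError("unreachable: retries must be >= 0")
```

The injectable `sleep` parameter keeps the helper testable without real waiting.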

### Error Logs

Logs are stored in the `logs/` directory for debugging.
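
If you drive the agent from your own script and want your messages written to the same directory, a standard-library `logging` setup that writes under `logs/` looks like this. The file name `agent.log`, the logger name, the format string, and `setup_file_logging` are assumptions; the package's own `utils/logger.py` may choose different names.

```python
import logging
import os
from pathlib import Path


def setup_file_logging(log_dir: str = "logs", level: str = "INFO") -> logging.Logger:
    """Create log_dir if needed and attach one file handler to a named logger."""
    Path(log_dir).mkdir(parents=True, exist_ok=True)
    logger = logging.getLogger("web_research_agent")
    logger.setLevel(level)
    log_path = os.path.abspath(os.path.join(log_dir, "agent.log"))
    # Avoid attaching a second handler for the same file on repeated calls.
    for existing in logger.handlers:
        if isinstance(existing, logging.FileHandler) and existing.baseFilename == log_path:
            return logger
    handler = logging.FileHandler(log_path, encoding="utf-8")
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s"))
    logger.addHandler(handler)
    return logger
```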

## Contributing

Contributions are welcome! Please feel free to submit a pull request.
web_research_agent-1.0.0/agent/__init__.py +2 -0
@@ -0,0 +1,2 @@

# Agent module initialization