local-deep-research 0.3.0__py3-none-any.whl → 0.3.2__py3-none-any.whl

@@ -1,549 +0,0 @@
1
- Metadata-Version: 2.1
2
- Name: local-deep-research
3
- Version: 0.3.0
4
- Summary: AI-powered research assistant with deep, iterative analysis using LLMs and web searches
5
- Author-Email: LearningCircuit <185559241+LearningCircuit@users.noreply.github.com>, HashedViking <6432677+HashedViking@users.noreply.github.com>
6
- License: MIT License
7
-
8
- Copyright (c) 2025 LearningCircuit
9
-
10
- Permission is hereby granted, free of charge, to any person obtaining a copy
11
- of this software and associated documentation files (the "Software"), to deal
12
- in the Software without restriction, including without limitation the rights
13
- to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
14
- copies of the Software, and to permit persons to whom the Software is
15
- furnished to do so, subject to the following conditions:
16
-
17
- The above copyright notice and this permission notice shall be included in all
18
- copies or substantial portions of the Software.
19
-
20
- THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
21
- IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
22
- FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
23
- AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
24
- LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
25
- OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
26
- SOFTWARE.
27
-
28
- Classifier: Programming Language :: Python :: 3
29
- Classifier: License :: OSI Approved :: MIT License
30
- Classifier: Operating System :: OS Independent
31
- Project-URL: Homepage, https://github.com/LearningCircuit/local-deep-research
32
- Project-URL: Bug Tracker, https://github.com/LearningCircuit/local-deep-research/issues
33
- Requires-Python: >=3.10
34
- Requires-Dist: langchain>=0.3.18
35
- Requires-Dist: langchain-community>=0.3.17
36
- Requires-Dist: langchain-core>=0.3.34
37
- Requires-Dist: langchain-ollama>=0.2.3
38
- Requires-Dist: langchain-openai>=0.3.5
39
- Requires-Dist: langchain_anthropic>=0.3.7
40
- Requires-Dist: duckduckgo_search>=7.3.2
41
- Requires-Dist: python-dateutil>=2.9.0
42
- Requires-Dist: typing_extensions>=4.12.2
43
- Requires-Dist: justext
44
- Requires-Dist: playwright
45
- Requires-Dist: beautifulsoup4
46
- Requires-Dist: flask>=3.1.0
47
- Requires-Dist: flask-cors>=3.0.10
48
- Requires-Dist: flask-socketio>=5.1.1
49
- Requires-Dist: sqlalchemy>=1.4.23
50
- Requires-Dist: wikipedia
51
- Requires-Dist: arxiv>=1.4.3
52
- Requires-Dist: pypdf
53
- Requires-Dist: sentence-transformers
54
- Requires-Dist: faiss-cpu
55
- Requires-Dist: pydantic>=2.0.0
56
- Requires-Dist: pydantic-settings>=2.0.0
57
- Requires-Dist: toml>=0.10.2
58
- Requires-Dist: platformdirs>=3.0.0
59
- Requires-Dist: dynaconf
60
- Requires-Dist: requests>=2.28.0
61
- Requires-Dist: tiktoken>=0.4.0
62
- Requires-Dist: xmltodict>=0.13.0
63
- Requires-Dist: lxml>=4.9.2
64
- Requires-Dist: pdfplumber>=0.9.0
65
- Requires-Dist: unstructured>=0.10.0
66
- Requires-Dist: google-search-results
67
- Requires-Dist: importlib-resources>=6.5.2
68
- Requires-Dist: setuptools>=78.1.0
69
- Requires-Dist: flask-wtf>=1.2.2
70
- Description-Content-Type: text/markdown
71
-
72
- # Local Deep Research
73
-
74
- ## Features
75
-
76
- - 🔍 **Advanced Research Capabilities**
77
- - Automated deep research with intelligent follow-up questions
78
- - Proper inline citation and source verification
79
- - Multi-iteration analysis for comprehensive coverage
80
- - Full webpage content analysis (not just snippets)
81
-
82
- - 🤖 **Flexible LLM Support**
83
- - Local AI processing with Ollama models
84
- - Cloud LLM support (Claude, GPT)
85
- - Supports all Langchain models
86
- - Configurable model selection based on needs
87
-
88
- - 📊 **Rich Output Options**
89
- - Detailed research findings with proper citations
90
- - Well-structured comprehensive research reports
91
- - Quick summaries for rapid insights
92
- - Source tracking and verification
93
-
94
- - 🔒 **Privacy-Focused**
95
- - Runs entirely on your machine when using local models
96
- - Configurable search settings
97
- - Transparent data handling
98
-
99
- - 🌐 **Enhanced Search Integration**
100
- - **Auto-selection of search sources**: The "auto" search engine intelligently analyzes your query and selects the most appropriate search engine
101
- - Multiple search engines including Wikipedia, arXiv, PubMed, Semantic Scholar, and more
102
- - **Local RAG search for private documents** - search your own documents with vector embeddings
103
- - Full webpage content retrieval and intelligent filtering
104
-
105
- - 🎓 **Academic & Scientific Integration**
106
- - Direct integration with PubMed, arXiv, Wikipedia, Semantic Scholar
107
- - Properly formatted citations from academic sources
108
- - Report structure suitable for literature reviews
109
- - Cross-disciplinary synthesis of information
110
-
111
- | [Reddit](https://www.reddit.com/r/LocalDeepResearch/) | [Discord](https://discord.gg/ttcqQeFcJ3) |
112
-
113
- A powerful AI-powered research assistant that performs deep, iterative analysis using multiple LLMs and web searches. The system can be run locally for privacy or configured to use cloud-based LLMs for enhanced capabilities.
114
-
115
- <div align="center">
116
- <a href="https://www.youtube.com/watch?v=0ISreg9q0p0">
117
- <img src="https://img.youtube.com/vi/0ISreg9q0p0/0.jpg" alt="Local Deep Research">
118
- <br>
119
- <span>▶️ Watch Video</span>
120
- </a>
121
- </div>
122
-
123
- **Important for non-academic searches:** For general web searches you will need SearXNG or an API key for a search provider such as Brave Search or SerpAPI. The engines that work without a key are mostly academic and will not help with most general web searches.
124
-
125
- ## Quick SearXNG Setup (Recommended)
126
-
127
- ```bash
128
- # Pull the SearXNG Docker image
129
- docker pull searxng/searxng
130
-
131
- # Run SearXNG (will be available at http://localhost:8080)
132
- docker run -d -p 8080:8080 --name searxng searxng/searxng
133
-
134
- # Start SearXNG (Required after system restart)
135
- docker start searxng
136
- ```
137
-
138
- Once these commands are executed, SearXNG will be automatically activated and ready to use. The tool will automatically detect and use your local SearXNG instance for searches.
139
-
140
- ## Windows Installation
141
-
142
- Download the [Windows Installer](https://github.com/LearningCircuit/local-deep-research/releases/download/v0.1.0/LocalDeepResearch_Setup.exe) for easy one-click installation.
143
-
144
- **Requires Ollama (or another model provider configured in `.env`).**
145
- Download Ollama from https://ollama.ai, then pull a model:
146
- `ollama pull gemma3:12b`
147
-
148
- ## Quick Start (not required if installed with the Windows installer)
149
-
150
- ```bash
151
- # Install the package
152
- pip install local-deep-research
153
-
154
- # Install required browser automation tools
155
- playwright install
156
-
157
- # For local models, install Ollama
158
- # Download from https://ollama.ai and then pull a model
159
- ollama pull gemma3:12b
160
- ```
161
-
162
- Then run:
163
-
164
- ```bash
165
- # Start the web interface (recommended)
166
- ldr-web # (OR python -m local_deep_research.web.app)
167
-
168
- # OR run the command line version
169
- ldr # (OR python -m local_deep_research.main)
170
- ```
171
-
172
- Access the web interface at `http://127.0.0.1:5000` in your browser.
173
-
174
- ## Docker Support
175
-
176
- Build the image first if you haven't already:
177
- ```bash
178
- docker build -t local-deep-research .
179
- ```
180
-
181
- Quick Docker Run
182
-
183
- ```bash
184
- # Run with default settings (connects to Ollama running on the host)
185
- docker run --network=host \
186
- -e LDR_LLM__PROVIDER="ollama" \
187
- -e LDR_LLM__MODEL="mistral" \
188
- local-deep-research
189
- ```
190
-
191
- For comprehensive Docker setup information, see:
192
- - [Docker Usage Guide](https://github.com/LearningCircuit/local-deep-research/blob/main/docs/docker-usage-readme.md)
193
- - [Docker Compose Guide](https://github.com/LearningCircuit/local-deep-research/blob/main/docs/docker-compose-guide.md)
194
-
195
- ## Migrating from Version 0.1.0
196
-
197
- If you just upgraded from 0.1.0, and you want to preserve your configuration,
198
- you will need to manually migrate from the old config files to the new settings
199
- database:
200
-
201
- ```bash
202
- python -m local_deep_research.migrate_db
203
- ```
204
-
205
- ## Programmatic Access
206
-
207
- Local Deep Research now provides a simple API for programmatic access to its research capabilities:
208
-
209
- ```python
210
- import os
211
- # Set environment variables to control the LLM
212
- os.environ["LDR_LLM__MODEL"] = "mistral" # Specify model name
213
-
214
- from local_deep_research import quick_summary, generate_report
215
-
216
- # Generate a quick research summary with custom parameters
217
- results = quick_summary(
218
- query="advances in fusion energy",
219
- search_tool="auto", # Auto-select the best search engine
220
- iterations=1, # Single research cycle for speed
221
- questions_per_iteration=2, # Generate 2 follow-up questions
222
- max_results=30, # Consider up to 30 search results
223
- temperature=0.7 # Control creativity of generation
224
- )
225
- print(results["summary"])
226
- ```
227
-
228
- These functions provide flexible options for customizing the search parameters, iterations, and output formats. For more examples, see the [programmatic access tutorial](https://github.com/LearningCircuit/local-deep-research/blob/main/examples/programmatic_access.ipynb).
229
-
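A minimal sketch of batching several queries through the API shown in the README above. The helper name `summarize_many` is hypothetical and not part of the package. It assumes only what the README example shows: that the summary function accepts `query` plus keyword options and returns a dict with a `"summary"` key. The function is passed in as an argument, so the sketch can be tried with a stub before pointing it at a real LLM.

```python
from typing import Callable, Dict, Iterable, List


def summarize_many(
    queries: Iterable[str],
    summarize: Callable[..., Dict],
    **options,
) -> List[Dict[str, str]]:
    """Call `summarize` once per query and collect the summaries.

    `summarize` is expected to behave like `quick_summary` in the README
    example (assumption: it returns a dict containing a "summary" key).
    """
    collected = []
    for query in queries:
        result = summarize(query=query, **options)
        # Fall back to an empty string if the key is missing.
        collected.append({"query": query, "summary": result.get("summary", "")})
    return collected
```

With the package installed, a call such as `summarize_many(["advances in fusion energy"], quick_summary, search_tool="auto", iterations=1)` would forward the same options to every query.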
230
- ## Configuration System
231
-
232
- The package automatically creates and manages configuration files in your user directory:
233
-
234
- - **Windows**: `Documents\LearningCircuit\local-deep-research\config\`
235
- - **Linux/Mac**: `~/.config/local_deep_research/config/`
236
-
237
- ### Default Configuration Files
238
-
239
- When you first run the tool, it creates these configuration files:
240
-
241
- | File | Purpose |
242
- |------|---------|
243
- | `settings.toml` | General settings for research, web interface, and search |
244
- | `search_engines.toml` | Define and configure search engines |
245
- | `local_collections.toml` | Configure local document collections for RAG |
246
- | `.env` | Environment variables for configuration (recommended for API keys) |
247
-
248
- > **Note:** For comprehensive environment variable configuration, see our [Environment Variables Guide](https://github.com/LearningCircuit/local-deep-research/blob/main/docs/env_configuration.md).
249
-
250
- ## Setting Up AI Models
251
-
252
- The system supports multiple LLM providers:
253
-
254
- ### Local Models (via Ollama)
255
-
256
- 1. [Install Ollama](https://ollama.ai)
257
- 2. Pull a model: `ollama pull gemma3:12b` (recommended model)
258
- 3. Ollama runs on port 11434 by default
259
-
260
- ### Cloud Models
261
-
262
- Add API keys to your environment variables (recommended) by creating a `.env` file in your config directory:
263
-
264
- ```bash
265
- # Set API keys for cloud providers in .env
266
- ANTHROPIC_API_KEY=your-api-key-here # For Claude models
267
- OPENAI_API_KEY=your-openai-key-here # For GPT models
268
- OPENAI_ENDPOINT_API_KEY=your-key-here # For OpenRouter or similar services
269
-
270
- # Set your preferred LLM provider and model
271
- LDR_LLM__PROVIDER=ollama # Options: ollama, openai, anthropic, etc.
272
- LDR_LLM__MODEL=gemma3:12b # Model name to use
273
- ```
274
-
275
- ### Supported LLM Providers
276
-
277
- The following providers are available:
278
-
279
- | Provider | Type | API Key | Setup Details | Models |
280
- |----------|------|---------|---------------|--------|
281
- | `OLLAMA` | Local | No | Install from [ollama.ai](https://ollama.ai) | Mistral, Llama, Gemma, etc. |
282
- | `OPENAI` | Cloud | `OPENAI_API_KEY` | Set in environment | GPT-3.5, GPT-4, GPT-4o |
283
- | `ANTHROPIC` | Cloud | `ANTHROPIC_API_KEY` | Set in environment | Claude 3 Opus, Sonnet, Haiku |
284
- | `OPENAI_ENDPOINT` | Cloud | `OPENAI_ENDPOINT_API_KEY` | Set in environment | Any OpenAI-compatible model |
285
- | `VLLM` | Local | No | Requires GPU setup | Any supported by vLLM |
286
- | `LMSTUDIO` | Local | No | Use LM Studio server | Models from LM Studio |
287
- | `LLAMACPP` | Local | No | Configure model path | GGUF model formats |
288
-
289
- The `OPENAI_ENDPOINT` provider can access any service with an OpenAI-compatible API, including:
290
- - OpenRouter (access to hundreds of models)
291
- - Azure OpenAI
292
- - Together.ai
293
- - Groq
294
- - Anyscale
295
- - Self-hosted LLM servers with OpenAI compatibility
296
-
297
- ## Setting Up Search Engines
298
-
299
- Some search engines require API keys. Add them to your environment variables by creating a `.env` file in your config directory:
300
-
301
- ```bash
302
- # Search engine API keys (add to .env file)
303
- SERP_API_KEY=your-serpapi-key-here # For Google results via SerpAPI
304
- GOOGLE_PSE_API_KEY=your-google-key-here # For Google Programmable Search
305
- GOOGLE_PSE_ENGINE_ID=your-pse-id-here # For Google Programmable Search
306
- BRAVE_API_KEY=your-brave-search-key-here # For Brave Search
307
- GUARDIAN_API_KEY=your-guardian-key-here # For The Guardian
308
-
309
- # Set your preferred search tool
310
- LDR_SEARCH__TOOL=auto # Default: intelligently selects best engine
311
- ```
312
-
313
- > **Tip:** To override other settings via environment variables (e.g., to change the web port), use: **LDR_WEB__PORT=8080**
314
-
315
- ### Available Search Engines
316
-
317
- | Engine | Purpose | API Key Required? | Rate Limit |
318
- |--------|---------|-------------------|------------|
319
- | `auto` | Intelligently selects the best engine | No | Based on selected engine |
320
- | `wikipedia` | General knowledge and facts | No | No strict limit |
321
- | `arxiv` | Scientific papers and research | No | No strict limit |
322
- | `pubmed` | Medical and biomedical research | No | No strict limit |
323
- | `semantic_scholar` | Academic literature across all fields | No | 100/5min |
324
- | `github` | Code repositories and documentation | No | 60/hour (unauthenticated) |
325
- | `brave` | Web search (privacy-focused) | Yes | Based on plan |
326
- | `serpapi` | Google search results | Yes | Based on plan |
327
- | `google_pse` | Custom Google search | Yes | 100/day free tier |
328
- | `wayback` | Historical web content | No | No strict limit |
329
- | `searxng` | Local web search engine | No (requires local server) | No limit |
330
- | Any collection name | Search your local documents | No | No limit |
331
-
332
- > **Note:** For detailed SearXNG setup, see our [SearXNG Setup Guide](https://github.com/LearningCircuit/local-deep-research/blob/main/docs/SearXNG-Setup.md).
333
-
334
- ## Local Document Search (RAG)
335
-
336
- The system can search through your local documents using vector embeddings.
337
-
338
- ### Setting Up Document Collections
339
-
340
- 1. Define collections in `local_collections.toml`. Default collections include:
341
-
342
- ```toml
343
- [project_docs]
344
- name = "Project Documents"
345
- description = "Project documentation and specifications"
346
- paths = ["@format ${DOCS_DIR}/project_documents"]
347
- enabled = true
348
- embedding_model = "all-MiniLM-L6-v2"
349
- embedding_device = "cpu"
350
- embedding_model_type = "sentence_transformers"
351
- max_results = 20
352
- max_filtered_results = 5
353
- chunk_size = 1000
354
- chunk_overlap = 200
355
- cache_dir = "__CACHE_DIR__/local_search/project_docs"
356
- ```
357
-
358
- 2. Create your document directories:
359
- - The `${DOCS_DIR}` variable points to a default location in your Documents folder
360
- - Documents are automatically indexed when the search is first used
361
-
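To make the `chunk_size` and `chunk_overlap` settings above concrete, here is a simplified character-based sketch of overlapping chunking. It is an illustration only: the function name `chunk_text` is hypothetical, and the package may well split on different boundaries (for example sentences or tokens) than this sliding window does.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> List[str]:
    """Split `text` into windows of `chunk_size` characters.

    Consecutive windows share `chunk_overlap` characters, so content near a
    boundary appears in both neighbouring chunks.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    if not text:
        return []

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    start = 0
    while True:
        end = start + chunk_size
        chunks.append(text[start:end])
        if end >= len(text):  # this window already reached the end
            break
        start += step
    return chunks
```

With the defaults from the collection config (1000 and 200), each window starts 800 characters after the previous one.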
362
- ### Using Local Search
363
-
364
- You can use local document search in several ways:
365
-
366
- 1. **Auto-selection**: Set `tool = "auto"` in the `[search]` section of `settings.toml`
367
- 2. **Explicit collection**: Set `tool = "project_docs"` to search only that collection
368
- 3. **All collections**: Set `tool = "local_all"` to search across all collections
369
- 4. **Query syntax**: Type `collection:project_docs your query` to target a specific collection
370
-
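The `collection:project_docs your query` syntax from the list above can be pictured as a prefix split. The sketch below is a hypothetical parser, not the package's own implementation; the function name and the fallback behaviour for malformed input are assumptions.

```python
from typing import Optional, Tuple

PREFIX = "collection:"


def split_collection_query(raw: str) -> Tuple[Optional[str], str]:
    """Return (collection_name, query); the name is None if no prefix is given."""
    raw = raw.strip()
    if not raw.startswith(PREFIX):
        return None, raw
    # Everything up to the first space is the collection name.
    name, _, rest = raw[len(PREFIX):].partition(" ")
    if not name:
        # "collection:" with no name: treat the whole input as a plain query.
        return None, raw
    return name, rest.strip()
```

A query without the prefix passes through unchanged, so the same entry point can serve both targeted and ordinary searches.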
371
-
372
- ## Advanced Configuration
373
-
374
- ### Research Parameters
375
-
376
- Edit `settings.toml` to customize research parameters or use environment variables:
377
-
378
- ```toml
379
- [search]
380
- # Search tool to use (auto, wikipedia, arxiv, etc.)
381
- tool = "auto"
382
-
383
- # Number of research cycles
384
- iterations = 2
385
-
386
- # Questions generated per cycle
387
- questions_per_iteration = 2
388
-
389
- # Results per search query
390
- max_results = 50
391
-
392
- # Results after relevance filtering
393
- max_filtered_results = 5
394
- ```
395
-
396
- Using environment variables:
397
- ```bash
398
- LDR_SEARCH__TOOL=auto
399
- LDR_SEARCH__ITERATIONS=3
400
- LDR_SEARCH__QUESTIONS_PER_ITERATION=2
401
- ```
402
-
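The environment variable names above follow a `LDR_<SECTION>__<KEY>` pattern that mirrors the `[section]` and `key` layout of `settings.toml`. The sketch below illustrates that naming convention only. The helper `ldr_env_overrides` is hypothetical, it keeps values as raw strings, and it is not the loader the package actually uses.

```python
from typing import Dict, Mapping


def ldr_env_overrides(
    environ: Mapping[str, str], prefix: str = "LDR_"
) -> Dict[str, Dict[str, str]]:
    """Group LDR_<SECTION>__<KEY> variables into {section: {key: value}}."""
    overrides: Dict[str, Dict[str, str]] = {}
    for name, value in environ.items():
        if not name.startswith(prefix):
            continue
        # The double underscore separates the section from the key.
        section, sep, key = name[len(prefix):].partition("__")
        if not sep or not section or not key:
            continue  # not in SECTION__KEY form
        overrides.setdefault(section.lower(), {})[key.lower()] = value
    return overrides
```

For example, `LDR_SEARCH__ITERATIONS=3` lands under `search` → `iterations`, the same place as `iterations` in the `[search]` table of `settings.toml`.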
403
- ## Web Interface
404
-
405
- The web interface offers several features:
406
-
407
- - **Dashboard**: Start and manage research queries
408
- - **Real-time Updates**: Track research progress with improved logging
409
- - **Research History**: Access past queries
410
- - **PDF Export**: Download reports
411
- - **Research Management**: Terminate processes or delete records
412
- - **Enhanced Settings Panel**: New unified settings UI with improved organization
413
-
414
- ## Command Line Interface
415
-
416
- The CLI version allows you to:
417
-
418
- 1. Choose between a quick summary or detailed report
419
- 2. Enter your research query
420
- 3. View results directly in the terminal
421
- 4. Save reports automatically to the configured output directory
422
-
423
- ## Development Environment
424
-
425
- This project now uses PDM for dependency management. To set up a development environment:
426
-
427
- ```bash
428
- # Install PDM if you don't have it
429
- pip install pdm
430
-
431
- # Install dependencies
432
- pdm install --no-self
433
-
434
- # Activate the environment
435
- pdm venv activate
436
- ```
437
-
438
- You can run the application directly using Python module syntax:
439
-
440
- ```bash
441
- # Run the web interface
442
- python -m local_deep_research.web.app
443
-
444
- # Run the CLI version
445
- python -m local_deep_research.main
446
- ```
447
-
448
- For more information, see the [development documentation](docs/developing.md).
449
-
450
- ## Unified Database
451
-
452
- The application now uses a single unified database (`ldr.db`) for all settings and history, making configuration management simpler and more reliable.
453
-
454
- ### Migration from v0.1.x
455
-
456
- If you have existing data in legacy databases from v0.1.x, the application will automatically migrate your data when you first run v0.2.0.
457
-
458
- ## Contributing
459
-
460
- Contributions are welcome! Please feel free to submit a Pull Request.
461
-
462
- ### Development Setup with PDM
463
-
464
- This project uses PDM for dependency management. Here's how to set up your development environment:
465
-
466
- ```bash
467
- # Install PDM if you don't have it
468
- pip install pdm
469
-
470
- # Clone the repository
471
- git clone https://github.com/LearningCircuit/local-deep-research.git
472
- cd local-deep-research
473
-
474
- # Install dependencies including dev dependencies
475
- pdm install --no-self
476
-
477
- # Set up pre-commit hooks
478
- pdm run pre-commit install
479
- pdm run pre-commit install-hooks
480
-
481
- # Activate the virtual environment
482
- pdm venv activate
483
- ```
484
-
485
- #### Common PDM Commands
486
-
487
- ```bash
488
- # Run linting checks
489
- pdm run flake8
490
- pdm run black .
491
-
492
- # Run tests (when available)
493
- pdm run pytest
494
-
495
- # Add a new dependency
496
- pdm add package-name
497
-
498
- # Add a development dependency
499
- pdm add -dG dev package-name
500
- ```
501
-
502
- ### Contributing Process
503
-
504
- 1. Fork the repository
505
- 2. Create your feature branch (`git checkout -b feature/AmazingFeature`)
506
- 3. Make your changes
507
- 4. Run linting checks to ensure code quality
508
- 5. Commit your changes (`git commit -m 'Add some AmazingFeature'`)
509
- 6. Push to the branch (`git push origin feature/AmazingFeature`)
510
- 7. **Important:** Open a Pull Request against the `dev` branch, not the `main` branch
511
-
512
- We prefer all pull requests to be submitted against the `dev` branch for easier testing and integration before releasing to the main branch.
513
-
514
- ### Getting Help
515
-
516
- Join our [Discord server](https://discord.gg/ttcqQeFcJ3) if you're planning to contribute. Let us know about your plans - we're always happy to guide new contributors and discuss feature ideas!
517
-
518
- ## Community & Support
519
-
520
- Join our [Discord server](https://discord.gg/ttcqQeFcJ3) to exchange ideas, discuss usage patterns, and
521
- share research approaches.
522
-
523
- Follow our [Subreddit](https://www.reddit.com/r/LocalDeepResearch/) for announcements, updates, and feature highlights.
524
-
525
- ## License
526
-
527
- This project is licensed under the MIT License.
528
-
529
- ## Acknowledgments
530
-
531
- - Built with [Ollama](https://ollama.ai) for local AI processing
532
- - Search powered by multiple sources:
533
- - [Wikipedia](https://www.wikipedia.org/) for factual knowledge
534
- - [arXiv](https://arxiv.org/) for scientific papers
535
- - [PubMed](https://pubmed.ncbi.nlm.nih.gov/) for biomedical literature
536
- - [Semantic Scholar](https://www.semanticscholar.org/) for academic literature
537
- - [DuckDuckGo](https://duckduckgo.com) for web search
538
- - [The Guardian](https://www.theguardian.com/) for journalism
539
- - [SerpAPI](https://serpapi.com) for Google search results
540
- - [SearXNG](https://searxng.org/) for local web search
541
- - [Brave Search](https://search.brave.com/) for privacy-focused web search
542
- - Built on [LangChain](https://github.com/hwchase17/langchain) framework
543
- - Uses [justext](https://github.com/miso-belica/justext), [Playwright](https://playwright.dev), [FAISS](https://github.com/facebookresearch/faiss), and more
544
-
545
- > **Support Free Knowledge:** If you frequently use the search engines in this tool, please consider making a donation to these organizations:
546
- > - [Donate to Wikipedia](https://donate.wikimedia.org)
547
- > - [Support arXiv](https://arxiv.org/about/give)
548
- > - [Donate to DuckDuckGo](https://duckduckgo.com/donations)
549
- > - [Support PubMed/NCBI](https://www.nlm.nih.gov/pubs/donations/donations.html)