docbuddy 0.1.1__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- docbuddy-0.1.1/CONTRIBUTING.md +110 -0
- docbuddy-0.1.1/LICENSE +21 -0
- docbuddy-0.1.1/MANIFEST.in +8 -0
- docbuddy-0.1.1/PKG-INFO +471 -0
- docbuddy-0.1.1/README.md +422 -0
- docbuddy-0.1.1/config.json.example +49 -0
- docbuddy-0.1.1/docbuddy/__init__.py +3 -0
- docbuddy-0.1.1/docbuddy/cli/__init__.py +1 -0
- docbuddy-0.1.1/docbuddy/cli/main.py +506 -0
- docbuddy-0.1.1/docbuddy/config.py +245 -0
- docbuddy-0.1.1/docbuddy/core/__init__.py +17 -0
- docbuddy-0.1.1/docbuddy/core/document_retrieval.py +507 -0
- docbuddy-0.1.1/docbuddy/core/prompt_builder.py +84 -0
- docbuddy-0.1.1/docbuddy/core/query_processor.py +265 -0
- docbuddy-0.1.1/docbuddy/llm/__init__.py +19 -0
- docbuddy-0.1.1/docbuddy/llm/anthropic_llm.py +43 -0
- docbuddy-0.1.1/docbuddy/llm/base.py +17 -0
- docbuddy-0.1.1/docbuddy/llm/gemini_llm.py +40 -0
- docbuddy-0.1.1/docbuddy/llm/groq_llm.py +40 -0
- docbuddy-0.1.1/docbuddy/llm/ollama_llm.py +40 -0
- docbuddy-0.1.1/docbuddy/llm/openai_llm.py +40 -0
- docbuddy-0.1.1/docbuddy/main.py +40 -0
- docbuddy-0.1.1/docbuddy/tui/__init__.py +1 -0
- docbuddy-0.1.1/docbuddy/tui/app.py +360 -0
- docbuddy-0.1.1/docbuddy/tui/style.css +193 -0
- docbuddy-0.1.1/docbuddy/web/__init__.py +1 -0
- docbuddy-0.1.1/docbuddy/web/app.py +72 -0
- docbuddy-0.1.1/docbuddy/web/handlers.py +145 -0
- docbuddy-0.1.1/docbuddy/web/static/.gitkeep +1 -0
- docbuddy-0.1.1/docbuddy/web/templates/index.html +146 -0
- docbuddy-0.1.1/docbuddy/web/templates/kb_status.html +75 -0
- docbuddy-0.1.1/docbuddy/web/templates/model_info.html +60 -0
- docbuddy-0.1.1/docbuddy/web/templates/templates.html +65 -0
- docbuddy-0.1.1/docbuddy.egg-info/PKG-INFO +471 -0
- docbuddy-0.1.1/docbuddy.egg-info/SOURCES.txt +46 -0
- docbuddy-0.1.1/docbuddy.egg-info/dependency_links.txt +1 -0
- docbuddy-0.1.1/docbuddy.egg-info/entry_points.txt +2 -0
- docbuddy-0.1.1/docbuddy.egg-info/requires.txt +30 -0
- docbuddy-0.1.1/docbuddy.egg-info/top_level.txt +1 -0
- docbuddy-0.1.1/env-template.txt +16 -0
- docbuddy-0.1.1/pyproject.toml +76 -0
- docbuddy-0.1.1/setup.cfg +4 -0
- docbuddy-0.1.1/tests/test_basic.py +31 -0
- docbuddy-0.1.1/tests/test_cli.py +70 -0
- docbuddy-0.1.1/tests/test_document_retrieval.py +84 -0
- docbuddy-0.1.1/tests/test_llm.py +92 -0
- docbuddy-0.1.1/tests/test_prompt_builder.py +57 -0
- docbuddy-0.1.1/tests/test_web.py +80 -0
docbuddy-0.1.1/CONTRIBUTING.md
ADDED
@@ -0,0 +1,110 @@
# Contributing to DocBuddy

Thank you for considering contributing to DocBuddy! This document provides guidelines and instructions for contributing.

## Code of Conduct

Please be respectful and considerate of others when contributing to this project.

## How to Contribute

### Reporting Bugs

If you find a bug, please create an issue with:

- A clear title and description
- Steps to reproduce the bug
- Expected behavior
- Actual behavior
- Any relevant error messages or screenshots

### Suggesting Enhancements

Enhancement suggestions are welcome. Please include:

- A clear description of the enhancement
- The motivation for the enhancement
- How it would benefit users

### Pull Requests

1. Fork the repository
2. Create a new branch for your feature
3. Add your changes
4. Run tests to ensure everything works
5. Submit a pull request

## Development Setup

1. Clone the repository:
   ```bash
   git clone https://github.com/yourusername/docbuddy.git
   cd docbuddy
   ```

2. Install development dependencies:
   ```bash
   pip install -e ".[dev]"
   ```

3. Run tests:
   ```bash
   pytest
   ```

4. Format code:
   ```bash
   ruff format .
   ```

5. Lint code:
   ```bash
   ruff check .
   ```

## Project Structure

```
docbuddy/
├── core/           # Core functionality
├── llm/            # LLM integrations
├── cli/            # CLI interface
├── web/            # Web interface
└── tui/            # Text User Interface
```

### Adding a New LLM Provider

1. Create a new file in the `llm/` directory (e.g., `llm/new_provider_llm.py`)
2. Implement the `BaseLLM` interface
3. Update `llm/__init__.py` to include your new provider
4. Add any necessary configuration to `config.py`
5. Add tests for your new provider

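As a rough illustration of steps 1 to 3, a new provider might look like the sketch below. The real `BaseLLM` interface lives in `llm/base.py` and is not reproduced in this document, so the `BaseLLM` stand-in, its `generate` method, the `EchoLLM` class, and the `PROVIDERS` registry are all assumptions made for the example. Check `llm/base.py` and `llm/__init__.py` for the actual names and signatures.

```python
from abc import ABC, abstractmethod


class BaseLLM(ABC):
    """Hypothetical stand-in for the interface defined in llm/base.py."""

    @abstractmethod
    def generate(self, prompt: str) -> str:
        """Return the model's answer for a fully built prompt."""


class EchoLLM(BaseLLM):
    """Toy provider that echoes the prompt; a real one would call an API."""

    def __init__(self, model: str = "echo-1") -> None:
        self.model = model

    def generate(self, prompt: str) -> str:
        return f"[{self.model}] {prompt}"


# Hypothetical registry, standing in for whatever llm/__init__.py exposes.
PROVIDERS = {"echo": EchoLLM}


def get_llm(name: str) -> BaseLLM:
    """Look up a provider class by name and instantiate it."""
    try:
        return PROVIDERS[name]()
    except KeyError:
        raise ValueError(f"Unknown LLM provider: {name}") from None
```

Keeping every provider behind one small abstract class lets the calling code stay the same regardless of which backend is selected.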
### Extending the Web Interface

The web interface uses FastHTML. To add new features:

1. Add new handlers in `web/handlers.py`
2. Register routes in `web/app.py`
3. Add any necessary templates or static files

### Extending the TUI

The Text User Interface uses Textual. To add new features:

1. Modify the app in `tui/app.py`
2. Add new screens or widgets as needed
3. Update CSS styles in `tui/style.css`

## Documentation

Please document your code with docstrings following the Google Python Style Guide.

## Testing

- Write tests for new features
- Ensure all tests pass before submitting a pull request
- Add both unit tests and integration tests as appropriate

Thank you for contributing to DocBuddy!
docbuddy-0.1.1/LICENSE
ADDED
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2024 Atari Assist Contributors

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
docbuddy-0.1.1/PKG-INFO
ADDED
@@ -0,0 +1,471 @@
Metadata-Version: 2.4
Name: docbuddy
Version: 0.1.1
Summary: CLI and web assistant for document collections using LLMs and RAG
Author-email: Michael Borck <michael@borck.me>
License-Expression: MIT
Keywords: rag,llm,assistant,cli,web,documentation,chat,knowledge-base
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: End Users/Desktop
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Topic :: Documentation
Classifier: Topic :: Text Processing :: Markup
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: typer[all]
Requires-Dist: rich
Requires-Dist: openai
Requires-Dist: requests
Requires-Dist: anthropic
Requires-Dist: google-generativeai
Requires-Dist: groq
Requires-Dist: python-fasthtml
Requires-Dist: python-dotenv
Requires-Dist: textual>=0.52.1
Provides-Extra: dev
Requires-Dist: pytest; extra == "dev"
Requires-Dist: pytest-cov; extra == "dev"
Requires-Dist: ruff; extra == "dev"
Requires-Dist: black; extra == "dev"
Requires-Dist: mypy; extra == "dev"
Provides-Extra: embeddings
Requires-Dist: numpy; extra == "embeddings"
Requires-Dist: sentence-transformers; extra == "embeddings"
Provides-Extra: full
Requires-Dist: numpy; extra == "full"
Requires-Dist: sentence-transformers; extra == "full"
Requires-Dist: pytest; extra == "full"
Requires-Dist: pytest-cov; extra == "full"
Requires-Dist: ruff; extra == "full"
Requires-Dist: black; extra == "full"
Requires-Dist: mypy; extra == "full"
Dynamic: license-file

# DocBuddy

A general-purpose document assistant for any documentation using RAG and LLMs like OpenAI, Claude, Gemini, Groq, and Ollama.

[License: MIT](https://opensource.org/licenses/MIT)
[Python 3.9+](https://www.python.org/downloads/)

> **Note**: DocBuddy is a general-purpose document assistant that can work with any type of documentation. It uses a Retrieval-Augmented Generation (RAG) architecture to provide accurate and contextual answers based on your documents.

## Features

- Ask questions about any documentation
- Choose from multiple LLM backends (OpenAI, Claude, Gemini, Groq, Ollama)
- Prompt templates that control how much the LLM draws on its own knowledge
- Knowledge base metrics and confidence scoring
- Preview matching documents before asking
- **Three interface options**:
  - Command-line interface (CLI)
  - Web application with FastHTML (100% server-side rendered)
  - Text User Interface (TUI) with Textual
- Advanced RAG (Retrieval-Augmented Generation) with semantic search
- Recursive document search to handle organized file structures
- Configurable document chunking for precise information retrieval
- Pre-compute and save embeddings for faster retrieval
- Comprehensive configuration system with JSON files and environment variables
- Simple and intuitive interface

## Overview

DocBuddy is a general-purpose document assistant for any documentation. It works by:

1. Loading documentation from a specified directory (including subfolders)
2. Splitting documents into smaller chunks for precise retrieval with configurable size and overlap
3. Computing embeddings for semantic search (with fallback to lexical search)
4. Finding the most relevant document chunks for a user's question
5. Building prompts using configurable templates (isolation, complementary, or supplementary)
6. Sending the prompt to an LLM for an answer
7. Evaluating and displaying the result to the user

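Step 2 of the list above can be pictured with a minimal character-based chunker. This is a sketch only: `chunk_text` is a hypothetical helper, not DocBuddy's actual function, and the real splitter (which the README describes as preserving sentence boundaries) is more involved. The defaults mirror the `chunk_size` and `chunk_overlap` values shown in the RAG settings.

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into windows of chunk_size characters that overlap by chunk_overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    # Each window starts `step` characters after the previous one.
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

With `chunk_size=4` and `chunk_overlap=2`, `chunk_text("abcdefghij")` yields `abcd`, `cdef`, `efgh`, `ghij`: each chunk repeats the last two characters of the previous one, so a sentence cut at a boundary still appears whole in a neighbouring chunk.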
## Installation

### From Source

```bash
# Clone the repository
git clone https://github.com/yourusername/docbuddy.git
cd docbuddy

# Basic installation
pip install -e .

# With embedding support for semantic search (recommended)
pip install -e ".[embeddings]"

# Full installation with development dependencies
pip install -e ".[full]"
```

### Requirements

- Python 3.9 or higher
- Basic requirements (automatically installed):
  - typer, rich, openai, anthropic, requests, google-generativeai, groq, python-fasthtml, python-dotenv, textual
- Optional embedding dependencies (for semantic search):
  - numpy, sentence-transformers

## Command-Line Usage

### Ask a question using OpenAI (default)
```bash
docbuddy ask "How do I implement a REST API?"
```

### Use a different LLM backend
```bash
docbuddy ask "What is dependency injection?" --model ollama
docbuddy ask "How to write unit tests?" --model claude
docbuddy ask "Explain Docker containers" --model gemini
docbuddy ask "What is functional programming?" --model groq
```

### Use different prompt templates
```bash
# Use only document knowledge (isolation)
docbuddy ask "What's in this documentation?" --template isolation

# Use documents first, fall back to model knowledge if needed (complementary)
docbuddy ask "Compare React and Angular" --template complementary

# Combine document knowledge with model knowledge (supplementary)
docbuddy ask "Explain microservices architecture" --template supplementary
```

### Preview top matching docs before asking
```bash
docbuddy preview "authentication best practices"
```

### Build or rebuild the knowledge base
```bash
# Build with saved embeddings (recommended)
docbuddy build-kb

# Build without saving embeddings
docbuddy build-kb --no-save-embeddings

# Customize chunking parameters
docbuddy build-kb --chunk-size 500 --chunk-overlap 100

# Use a different embedding model
docbuddy build-kb --embedding-model all-mpnet-base-v2

# Force rebuild even if documents haven't changed
docbuddy build-kb --force

# Use a custom source directory
docbuddy build-kb --source-dir /path/to/your/docs
```

### View knowledge base information
```bash
docbuddy kb-info
```

### View configuration information
```bash
docbuddy config-info
```

### Check if embedding libraries are installed
```bash
docbuddy check-embedding-libs
```

### List supported models and templates
```bash
docbuddy list-models
docbuddy list-templates
```

## Interface Options

DocBuddy offers three different interfaces to suit your preferences and use cases.

### Command-Line Interface (CLI)

The CLI provides a traditional command-line experience with rich text output:

```bash
docbuddy ask "How do I implement a REST API?"
```

### Web Application

DocBuddy includes a FastHTML web interface with a user-friendly UI and 100% server-side rendering.

```bash
# Start the web server
docbuddy web
```

By default, the server runs on http://localhost:8000

#### Web Interface Features

- Ask questions with a simple form interface
- Select which LLM backend to use
- Choose prompt templates
- Preview matching documents
- View source information for answers
- No JavaScript required (100% server-side rendered)

### Text User Interface (TUI)

The TUI provides a rich terminal interface using the Textual framework:

```bash
# Start the TUI application
docbuddy tui
```

#### TUI Features

- Full-screen interactive terminal interface
- Keyboard shortcuts for quick navigation
- Dark mode toggle
- Integrated knowledge base management
- Real-time status indicators
- Cross-platform compatibility

## Configuration

DocBuddy provides a comprehensive configuration system with multiple layers:

1. **Default configuration** (hardcoded defaults)
2. **Configuration files** (JSON format)
3. **Environment variables** (highest precedence)

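One way to picture the layering is a recursive merge in which each later layer overrides the one before it. The sketch below is illustrative rather than DocBuddy's actual `config.py`: the `deep_merge` and `load_config` helpers and the trimmed `DEFAULTS` dictionary are hypothetical, and only the `DOCBUDDY_DEFAULT_MODEL` and `DOCBUDDY_SOURCE_DIR` variable names come from this README.

```python
import copy
import json
import os
from pathlib import Path
from typing import Optional

# Hypothetical, trimmed-down defaults for illustration.
DEFAULTS = {"llm": {"default_model": "openai"}, "rag": {"source_dir": "docs"}}


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict: base with override applied recursively."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def load_config(config_path: Optional[Path] = None, env=None) -> dict:
    """Layer defaults, then an optional JSON file, then environment variables."""
    env = os.environ if env is None else env
    # Deep-copy so later writes never touch the module-level defaults.
    config = copy.deepcopy(DEFAULTS)
    if config_path is not None and config_path.exists():
        file_config = json.loads(config_path.read_text(encoding="utf-8"))
        config = deep_merge(config, file_config)
    # Environment variables are applied last, so they take precedence.
    if "DOCBUDDY_DEFAULT_MODEL" in env:
        config["llm"]["default_model"] = env["DOCBUDDY_DEFAULT_MODEL"]
    if "DOCBUDDY_SOURCE_DIR" in env:
        config["rag"]["source_dir"] = env["DOCBUDDY_SOURCE_DIR"]
    return config
```

Merging recursively, rather than replacing whole sections, means a config file that sets only `llm.default_model` leaves every other default in place.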
### Configuration Files

DocBuddy looks for a `config.json` file in these locations (in order):
- Current working directory (`./config.json`)
- User's home directory (`~/.docbuddy/config.json`)
- Package directory

Create a configuration file based on the example:

```bash
cp config.json.example config.json
```

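The lookup order listed above could be implemented as a first-match search. The sketch below is an assumption about how such a search might look, not the code in `config.py`; `find_config_file` and its `package_dir`, `cwd`, and `home` parameters are hypothetical names.

```python
from pathlib import Path
from typing import Optional


def find_config_file(package_dir: Path, cwd: Optional[Path] = None,
                     home: Optional[Path] = None) -> Optional[Path]:
    """Return the first existing config.json in the documented search order."""
    cwd = Path.cwd() if cwd is None else cwd
    home = Path.home() if home is None else home
    candidates = [
        cwd / "config.json",                 # current working directory
        home / ".docbuddy" / "config.json",  # user's home directory
        package_dir / "config.json",         # package directory
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None
```

Passing `cwd` and `home` as parameters is a choice made here so the search can be exercised against temporary directories in tests.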
### Environment Variables

Set up your API keys and other configuration options in a `.env` file in the project root:

```
# LLM API keys
OPENAI_API_KEY=your_openai_key
CLAUDE_API_KEY=your_anthropic_key
GEMINI_API_KEY=your_google_key
GROQ_API_KEY=your_groq_key

# DocBuddy configuration
DOCBUDDY_DEFAULT_MODEL=openai
DOCBUDDY_SOURCE_DIR=docs

# Model-specific configuration (optional)
OPENAI_MODEL=gpt-3.5-turbo
CLAUDE_MODEL=claude-3-haiku-20240307
GEMINI_MODEL=models/gemini-pro
GROQ_MODEL=mixtral-8x7b-32768
OLLAMA_MODEL=llama3
```

For Ollama, make sure the Ollama service is running locally.

### Configuration Structure

The configuration file has these main sections:

#### LLM Settings
```json
"llm": {
  "default_model": "openai",
  "openai": {
    "model": "gpt-3.5-turbo"
  },
  "ollama": {
    "model": "llama3",
    "base_url": "http://localhost:11434"
  }
  // other providers...
}
```

#### RAG Settings
```json
"rag": {
  "source_dir": "docs",
  "chunk_size": 1000,
  "chunk_overlap": 200,
  "embedding_model": "all-MiniLM-L6-v2",
  "kb_dir": ".kb"
}
```

#### Prompt Templates
```json
"prompts": {
  "default_template": "isolation",
  "templates": {
    "isolation": "You are a helpful assistant...",
    "complementary": "You are a helpful assistant...",
    "supplementary": "You are a helpful assistant..."
  }
}
```

#### Web Interface Settings
```json
"web": {
  "title": "DocBuddy",
  "host": "0.0.0.0",
  "port": 8000,
  "debug": true
}
```

Many of these settings can also be overridden via command-line parameters when using the CLI.

## RAG Implementation

DocBuddy implements an advanced Retrieval-Augmented Generation (RAG) system with the following features:

### Document Loading
- Recursively searches directories for documentation files
- Supports any text-based document format
- Maintains file path information for better source tracking
- Configurable source directory through config files or environment variables

### Document Chunking
- Splits documents into smaller, semantically meaningful chunks
- Preserves sentence boundaries for context
- Fully configurable chunk size and overlap
- Metadata tracking to avoid unnecessary rebuilds

### Semantic Search
- Uses sentence-transformers for computing embeddings
- Choice of different embedding models (all-MiniLM-L6-v2, all-mpnet-base-v2, etc.)
- Falls back to lexical search when embedding libraries aren't available
- Pre-computes and caches embeddings for better performance

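The ranking step with its lexical fallback can be sketched in plain Python. Everything here is illustrative: `cosine_similarity`, `lexical_score`, and `rank_chunks` are hypothetical helpers, the embeddings are passed in as plain lists of floats rather than produced by sentence-transformers, and the word-overlap score is just one simple choice of lexical measure, not necessarily the one DocBuddy uses.

```python
import math
import re
from typing import Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def lexical_score(query: str, chunk: str) -> float:
    """Fraction of distinct query words that also appear in the chunk."""
    query_words = set(re.findall(r"\w+", query.lower()))
    chunk_words = set(re.findall(r"\w+", chunk.lower()))
    return len(query_words & chunk_words) / len(query_words) if query_words else 0.0


def rank_chunks(query: str, chunks: Sequence[str], top_k: int = 3,
                query_embedding: Optional[Sequence[float]] = None,
                chunk_embeddings: Optional[Sequence[Sequence[float]]] = None,
                ) -> list[tuple[float, str]]:
    """Rank chunks by embedding similarity when available, else by word overlap."""
    if query_embedding is not None and chunk_embeddings is not None:
        scores = [cosine_similarity(query_embedding, e) for e in chunk_embeddings]
    else:
        # Fallback path: no embeddings supplied, so score lexically.
        scores = [lexical_score(query, c) for c in chunks]
    ranked = sorted(zip(scores, chunks), key=lambda pair: pair[0], reverse=True)
    return ranked[:top_k]
```

Because both paths return `(score, chunk)` pairs, the caller does not need to know which kind of search produced the ranking.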
### Prompt Templates
- Multiple built-in templates with different LLM knowledge utilization strategies:
  - **Isolation**: Uses only document knowledge (good for factual queries)
  - **Complementary**: Uses documents first, falls back to model knowledge if needed
  - **Supplementary**: Combines document knowledge with model knowledge
- Easily customizable through configuration files

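How a template turns retrieved chunks and a question into a prompt might look like the sketch below. The template wording is invented for illustration (the shipped templates are abbreviated in this README as "You are a helpful assistant..."), and `build_prompt` with its `{context}` and `{question}` placeholders is a hypothetical interface, not the API of `prompt_builder.py`.

```python
TEMPLATES = {
    # Illustrative wording only; not the templates shipped with DocBuddy.
    "isolation": (
        "Answer using ONLY the documentation below. "
        "If the answer is not there, say you don't know.\n\n"
        "Documentation:\n{context}\n\nQuestion: {question}"
    ),
    "complementary": (
        "Answer from the documentation below. If it does not cover the "
        "question, fall back to your own knowledge and say so.\n\n"
        "Documentation:\n{context}\n\nQuestion: {question}"
    ),
    "supplementary": (
        "Answer by combining the documentation below with your own knowledge.\n\n"
        "Documentation:\n{context}\n\nQuestion: {question}"
    ),
}


def build_prompt(question: str, chunks: list[str], template: str = "isolation") -> str:
    """Fill the chosen template with the retrieved chunks and the question."""
    if template not in TEMPLATES:
        raise ValueError(f"Unknown template: {template}")
    context = "\n\n".join(chunks)
    return TEMPLATES[template].format(context=context, question=question)
```

The three templates differ only in their instructions, so switching strategy changes how the model is told to treat the context, not what context it receives.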
### Knowledge Base Management
- Builds and saves embeddings to disk
- Loads pre-built knowledge base for faster startup
- Tracks changes to avoid unnecessary rebuilding
- Comprehensive command-line tools for managing the knowledge base

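Change tracking of this kind is often done by fingerprinting the source files and comparing the result with the fingerprint stored at the last build. The sketch below assumes that approach; `fingerprint_docs` and `needs_rebuild` are hypothetical helpers, and the metadata DocBuddy actually records may differ. The `force` argument mirrors the `--force` flag of `docbuddy build-kb`.

```python
import hashlib
from pathlib import Path
from typing import Optional


def fingerprint_docs(source_dir: Path) -> str:
    """Hash the relative path and contents of every file under source_dir."""
    digest = hashlib.sha256()
    # Sort so the fingerprint does not depend on directory listing order.
    for path in sorted(p for p in source_dir.rglob("*") if p.is_file()):
        digest.update(path.relative_to(source_dir).as_posix().encode("utf-8"))
        digest.update(b"\0")
        digest.update(path.read_bytes())
        digest.update(b"\0")
    return digest.hexdigest()


def needs_rebuild(source_dir: Path, stored_fingerprint: Optional[str],
                  force: bool = False) -> bool:
    """Rebuild when forced, when nothing was stored, or when the documents changed."""
    if force or stored_fingerprint is None:
        return True
    return fingerprint_docs(source_dir) != stored_fingerprint
```

Hashing contents is slower than comparing modification times but is not fooled by files that were touched without being edited.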
## Project Structure

```
docbuddy/
├── __init__.py
├── config.py                      # Configuration system
├── main.py                        # High-level API
├── core/                          # Core functionality
│   ├── __init__.py
│   ├── document_retrieval.py      # Advanced RAG implementation
│   ├── prompt_builder.py          # Template rendering
│   └── query_processor.py         # Question answering
├── llm/                           # LLM integrations
│   ├── __init__.py
│   ├── base.py
│   ├── openai_llm.py
│   ├── ollama_llm.py
│   ├── anthropic_llm.py
│   ├── gemini_llm.py
│   └── groq_llm.py
├── cli/                           # CLI interface
│   ├── __init__.py
│   └── main.py
├── web/                           # Web interface
│   ├── __init__.py
│   ├── app.py
│   ├── handlers.py
│   ├── templates/
│   └── static/
└── tui/                           # Text User Interface
    ├── __init__.py
    ├── app.py
    └── style.css
```

## Development

### Setup Development Environment

```bash
# Install development dependencies
pip install -e ".[dev]"

# Full installation including embedding libraries
pip install -e ".[full]"
```

### Run Tests

```bash
# Run all tests
pytest

# Run tests with coverage
pytest --cov=docbuddy

# Run specific test file
pytest tests/test_document_retrieval.py
```

### Formatting and Linting

```bash
# Format code
ruff format .

# Lint code
ruff check .

# Type checking
mypy docbuddy
```

## Documentation

- Detailed docstrings follow the Google Python Style Guide
- See the `CONTRIBUTING.md` file for contributor guidelines

## Adding Your Own Documentation

Place your documentation files in the configured source directory (default is `docs/`). You can organize files in subfolders as needed. The system will automatically load and index all documents.

After adding new documentation, rebuild the knowledge base:

```bash
docbuddy build-kb
```

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## Contributing

Contributions are welcome! Please see the [CONTRIBUTING.md](CONTRIBUTING.md) file for details on how to contribute to this project.

## Acknowledgments

- Thanks to all contributors who have helped with this project
- This project evolved from an earlier specialized document assistant
- Thanks to the open-source LLM and RAG communities for their excellent tools and libraries