ai-parrot: ai_parrot-0.3.0-cp311-cp311-manylinux_2_28_x86_64.whl → ai_parrot-0.3.3-cp311-cp311-manylinux_2_28_x86_64.whl
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Potentially problematic release.
This version of ai-parrot might be problematic.
- ai_parrot-0.3.3.dist-info/METADATA +318 -0
- {ai_parrot-0.3.0.dist-info → ai_parrot-0.3.3.dist-info}/RECORD +11 -11
- {ai_parrot-0.3.0.dist-info → ai_parrot-0.3.3.dist-info}/WHEEL +1 -1
- parrot/exceptions.cpython-311-x86_64-linux-gnu.so +0 -0
- parrot/loaders/pdf.py +256 -6
- parrot/loaders/videolocal.py +13 -0
- parrot/utils/parsers/toml.cpython-311-x86_64-linux-gnu.so +0 -0
- parrot/utils/types.cpython-311-x86_64-linux-gnu.so +0 -0
- parrot/version.py +1 -1
- ai_parrot-0.3.0.dist-info/METADATA +0 -319
- {ai_parrot-0.3.0.dist-info → ai_parrot-0.3.3.dist-info}/LICENSE +0 -0
- {ai_parrot-0.3.0.dist-info → ai_parrot-0.3.3.dist-info}/top_level.txt +0 -0
ai_parrot-0.3.3.dist-info/METADATA
ADDED
@@ -0,0 +1,318 @@
Metadata-Version: 2.1
Name: ai-parrot
Version: 0.3.3
Summary: Live Chatbots based on Langchain chatbots and Agents Integrated into Navigator Framework or used into aiohttp applications.
Home-page: https://github.com/phenobarbital/ai-parrot
Author: Jesus Lara
Author-email: jesuslara@phenobarbital.info
License: MIT
Project-URL: Source, https://github.com/phenobarbital/ai-parrot
Project-URL: Tracker, https://github.com/phenobarbital/ai-parrot/issues
Project-URL: Documentation, https://github.com/phenobarbital/ai-parrot/
Project-URL: Funding, https://paypal.me/phenobarbital
Project-URL: Say Thanks!, https://saythanks.io/to/phenobarbital
Keywords: asyncio,asyncpg,aioredis,aiomcache,langchain,chatbot,agents
Platform: POSIX
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Operating System :: POSIX :: Linux
Classifier: Environment :: Web Environment
Classifier: License :: OSI Approved :: MIT License
Classifier: Topic :: Software Development :: Build Tools
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Framework :: AsyncIO
Requires-Python: >=3.10.12
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: Cython==3.0.11
Requires-Dist: accelerate==0.34.2
Requires-Dist: langchain>=0.2.6
Requires-Dist: langchain-community>=0.2.6
Requires-Dist: langchain-core>=0.2.32
Requires-Dist: langchain-experimental==0.0.62
Requires-Dist: langchainhub==0.1.15
Requires-Dist: langchain-text-splitters==0.2.2
Requires-Dist: langchain-huggingface==0.0.3
Requires-Dist: huggingface-hub==0.23.5
Requires-Dist: llama-index==0.10.20
Requires-Dist: llama-cpp-python==0.2.56
Requires-Dist: bitsandbytes==0.43.3
Requires-Dist: Cartopy==0.22.0
Requires-Dist: chromadb==0.4.24
Requires-Dist: datasets==2.18.0
Requires-Dist: faiss-cpu==1.8.0
Requires-Dist: fastavro==1.9.4
Requires-Dist: gunicorn==21.2.0
Requires-Dist: jq==1.7.0
Requires-Dist: rank-bm25==0.2.2
Requires-Dist: matplotlib==3.8.3
Requires-Dist: numba==0.59.0
Requires-Dist: querysource>=3.12.10
Requires-Dist: safetensors>=0.4.3
Requires-Dist: sentence-transformers==3.0.1
Requires-Dist: tabulate==0.9.0
Requires-Dist: tiktoken==0.7.0
Requires-Dist: tokenizers==0.19.1
Requires-Dist: unstructured==0.14.3
Requires-Dist: unstructured-client==0.18.0
Requires-Dist: youtube-transcript-api==0.6.2
Requires-Dist: selenium==4.18.1
Requires-Dist: webdriver-manager==4.0.1
Requires-Dist: transitions==0.9.0
Requires-Dist: sentencepiece==0.2.0
Requires-Dist: duckduckgo-search==5.3.0
Requires-Dist: google-search-results==2.4.2
Requires-Dist: google-api-python-client>=2.86.0
Requires-Dist: gdown==5.1.0
Requires-Dist: weasyprint==61.2
Requires-Dist: markdown2==2.4.13
Requires-Dist: fastembed==0.3.4
Requires-Dist: yfinance==0.2.40
Requires-Dist: youtube-search==2.1.2
Requires-Dist: wikipedia==1.4.0
Requires-Dist: mediawikiapi==1.2
Requires-Dist: pyowm==3.3.0
Requires-Dist: O365==2.0.35
Requires-Dist: stackapi==0.3.1
Requires-Dist: timm==1.0.9
Requires-Dist: torchvision==0.19.1
Provides-Extra: all
Requires-Dist: langchain-milvus==0.1.1; extra == "all"
Requires-Dist: milvus==2.3.5; extra == "all"
Requires-Dist: pymilvus==2.4.4; extra == "all"
Requires-Dist: groq==0.11.0; extra == "all"
Requires-Dist: langchain-groq==0.1.4; extra == "all"
Requires-Dist: llama-index-llms-huggingface==0.2.7; extra == "all"
Requires-Dist: langchain-google-vertexai==1.0.8; extra == "all"
Requires-Dist: langchain-google-genai==1.0.8; extra == "all"
Requires-Dist: google-generativeai==0.7.2; extra == "all"
Requires-Dist: vertexai==1.60.0; extra == "all"
Requires-Dist: google-cloud-aiplatform>=1.60.0; extra == "all"
Requires-Dist: grpc-google-iam-v1==0.13.0; extra == "all"
Requires-Dist: langchain-openai==0.1.21; extra == "all"
Requires-Dist: openai==1.40.8; extra == "all"
Requires-Dist: llama-index-llms-openai==0.1.11; extra == "all"
Requires-Dist: langchain-anthropic==0.1.23; extra == "all"
Requires-Dist: anthropic==0.34.0; extra == "all"
Provides-Extra: analytics
Requires-Dist: annoy==1.17.3; extra == "analytics"
Requires-Dist: gradio-tools==0.0.9; extra == "analytics"
Requires-Dist: gradio-client==0.2.9; extra == "analytics"
Requires-Dist: streamlit==1.37.1; extra == "analytics"
Requires-Dist: simsimd==4.3.1; extra == "analytics"
Requires-Dist: opencv-python==4.10.0.84; extra == "analytics"
Provides-Extra: anthropic
Requires-Dist: langchain-anthropic==0.1.11; extra == "anthropic"
Requires-Dist: anthropic==0.25.2; extra == "anthropic"
Provides-Extra: crew
Requires-Dist: colbert-ai==0.2.19; extra == "crew"
Requires-Dist: vanna==0.3.4; extra == "crew"
Requires-Dist: crewai[tools]==0.28.8; extra == "crew"
Provides-Extra: google
Requires-Dist: langchain-google-vertexai==1.0.10; extra == "google"
Requires-Dist: langchain-google-genai==1.0.10; extra == "google"
Requires-Dist: vertexai==1.65.0; extra == "google"
Provides-Extra: groq
Requires-Dist: groq==0.11.0; extra == "groq"
Requires-Dist: langchain-groq==0.1.9; extra == "groq"
Provides-Extra: hunggingfaces
Requires-Dist: llama-index-llms-huggingface==0.2.7; extra == "hunggingfaces"
Provides-Extra: loaders
Requires-Dist: pymupdf==1.24.4; extra == "loaders"
Requires-Dist: pymupdf4llm==0.0.1; extra == "loaders"
Requires-Dist: pdf4llm==0.0.6; extra == "loaders"
Requires-Dist: PyPDF2==3.0.1; extra == "loaders"
Requires-Dist: pdfminer.six==20231228; extra == "loaders"
Requires-Dist: pdfplumber==0.11.0; extra == "loaders"
Requires-Dist: GitPython==3.1.42; extra == "loaders"
Requires-Dist: opentelemetry-sdk==1.24.0; extra == "loaders"
Requires-Dist: rapidocr-onnxruntime==1.3.15; extra == "loaders"
Requires-Dist: pytesseract==0.3.10; extra == "loaders"
Requires-Dist: python-docx==1.1.0; extra == "loaders"
Requires-Dist: python-pptx==0.6.23; extra == "loaders"
Requires-Dist: docx2txt==0.8; extra == "loaders"
Requires-Dist: pytube==15.0.0; extra == "loaders"
Requires-Dist: pydub==0.25.1; extra == "loaders"
Requires-Dist: markdownify==0.12.1; extra == "loaders"
Requires-Dist: yt-dlp==2024.4.9; extra == "loaders"
Requires-Dist: moviepy==1.0.3; extra == "loaders"
Requires-Dist: mammoth==1.7.1; extra == "loaders"
Requires-Dist: paddlepaddle==2.6.1; extra == "loaders"
Requires-Dist: paddlepaddle-gpu==2.6.1; extra == "loaders"
Requires-Dist: paddleocr==2.8.1; extra == "loaders"
Requires-Dist: ftfy==6.2.3; extra == "loaders"
Requires-Dist: librosa==0.10.1; extra == "loaders"
Requires-Dist: XlsxWriter==3.2.0; extra == "loaders"
Requires-Dist: xformers==0.0.27.post2; extra == "loaders"
Provides-Extra: milvus
Requires-Dist: langchain-milvus>=0.1.4; extra == "milvus"
Requires-Dist: milvus==2.3.5; extra == "milvus"
Requires-Dist: pymilvus==2.4.6; extra == "milvus"
Provides-Extra: openai
Requires-Dist: langchain-openai==0.1.21; extra == "openai"
Requires-Dist: openai==1.40.3; extra == "openai"
Requires-Dist: llama-index-llms-openai==0.1.11; extra == "openai"
Requires-Dist: tiktoken==0.7.0; extra == "openai"
Provides-Extra: qdrant
Requires-Dist: qdrant-client==1.8.0; extra == "qdrant"

# AI Parrot: Python package for creating Chatbots
This is an open-source Python package for creating Chatbots based on Langchain and Navigator.
This README provides instructions for installation, development, testing, and releasing Parrot.

## Installation

**Creating a virtual environment:**

This is recommended for development and isolation from system-wide libraries.
Run the following command in your terminal:

Debian-based systems installation:
```
sudo apt install gcc python3.11-venv python3.11-full python3.11-dev libmemcached-dev zlib1g-dev build-essential libffi-dev unixodbc unixodbc-dev libsqliteodbc libev4 libev-dev
```

For Qdrant installation:
```
docker pull qdrant/qdrant
docker run -d -p 6333:6333 -p 6334:6334 --name qdrant -v $(pwd)/qdrant_storage:/qdrant/storage:z qdrant/qdrant
```
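
As a quick smoke test (not part of the package itself), the container started above can be reached with the `qdrant-client` version pinned by the `qdrant` extra; a minimal sketch:

```python
# Minimal connectivity check for the local Qdrant container started above.
# Assumes `qdrant-client` is installed (the "qdrant" extra pins 1.8.0).
from qdrant_client import QdrantClient

client = QdrantClient(url="http://localhost:6333")
print(client.get_collections())  # a fresh instance reports an empty collection list
```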

For VertexAI, create a folder named "google" inside "env" and copy the JSON credentials file into it.

```bash
make venv
```

This will create a virtual environment named `.venv`. To activate it, run:

```bash
source .venv/bin/activate  # Linux/macOS
```

Once activated, install Parrot within the virtual environment:

```bash
make install
```
The output will remind you to activate the virtual environment before development.

**Optional** (for developers):
```bash
pip install -e .
```

## Start the HTTP server
```bash
python run.py
```

## Development Setup

This section explains how to set up your development environment:

1. **Install development requirements:**

```bash
make setup
```

This installs development dependencies such as linters and test runners listed in the `docs/requirements-dev.txt` file.

2. **Install Parrot in editable mode:**

This allows you to make changes to the code and test them without reinstalling:

```bash
make dev
```

This uses `flit` to install Parrot in editable mode.


### Usage (Replace with actual usage instructions)

*Once you have set up your development environment, you can start using Parrot.*

#### Test with Code ChatBOT
* Set environment variables for:
[google]
GOOGLE_API_KEY=apikey
GOOGLE_CREDENTIALS_FILE=.../credentials.json
VERTEX_PROJECT_ID=vertex-project
VERTEX_REGION=region

* Run the chatbot:
```bash
python examples/test_agent.py
```
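
Illustrative only: one way the `[google]` settings above might be consumed from the environment. The variable names come from the env block; the `vertexai.init()` call is an assumption about how a VertexAI-backed chatbot would pick them up, not code from the package:

```python
# Sketch: map the [google] environment block to a Vertex AI session.
# GOOGLE_CREDENTIALS_FILE, VERTEX_PROJECT_ID and VERTEX_REGION are the names
# used in the env block above; the rest is standard Google SDK usage.
import os
from google.oauth2 import service_account
import vertexai

credentials = service_account.Credentials.from_service_account_file(
    os.environ["GOOGLE_CREDENTIALS_FILE"]
)
vertexai.init(
    project=os.environ["VERTEX_PROJECT_ID"],
    location=os.environ["VERTEX_REGION"],
    credentials=credentials,
)
```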

### Testing

To run the test suite:

```bash
make test
```

This will run tests using `coverage` to report on code coverage.


### Code Formatting

To format the code with black:

```bash
make format
```


### Linting

To lint the code for style and potential errors:

```bash
make lint
```

This uses `pylint` and `black` to check for issues.


### Releasing a New Version

This section outlines the steps for releasing a new version of Parrot:

1. **Ensure everything is clean and tested:**

```bash
make release
```

This runs `lint`, `test`, and `clean` tasks before proceeding.

2. **Publish the package:**

```bash
make release
```

This uses `flit` to publish the package to a repository like PyPI. You'll need to have publishing credentials configured for `flit`.


### Cleaning Up

To remove the virtual environment:

```bash
make distclean
```


### Contributing

We welcome contributions to Parrot! Please refer to the CONTRIBUTING.md file for guidelines on how to contribute.
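
For reference, the extras declared in this METADATA (including the new `loaders` extra) can be inspected from an environment where the 0.3.3 wheel is installed; a small sketch using only the standard library:

```python
# Inspect the installed distribution's metadata; nothing here is specific to
# ai-parrot beyond the distribution name.
from importlib.metadata import metadata, version

print(version("ai-parrot"))            # expected: 0.3.3
md = metadata("ai-parrot")
print(md.get_all("Provides-Extra"))    # all, analytics, anthropic, crew, google, ...
```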

{ai_parrot-0.3.0.dist-info → ai_parrot-0.3.3.dist-info}/RECORD
CHANGED
@@ -1,10 +1,10 @@
 parrot/__init__.py,sha256=eTkAkHeJ5BBDG2fxrXA4M37ODBJoS1DQYpeBAWL2xeI,387
 parrot/conf.py,sha256=-9bVGC7Rf-6wpIg6-ojvU4S_G1wBLUCVDt46KEGHEhM,4257
-parrot/exceptions.cpython-311-x86_64-linux-gnu.so,sha256=
+parrot/exceptions.cpython-311-x86_64-linux-gnu.so,sha256=VNyBh3uLxGQgB0l1bkWjQDqYUN2ZAvRmV12AqQijV9Q,361184
 parrot/manager.py,sha256=NhzXoWxSgtoWHpmYP8cV2Ujq_SlvCbQYQBaohAeL2TM,5935
 parrot/models.py,sha256=RsVQCqhSXBKRPcu-BCga9Y1wyvENFXDCuq3_ObIKvAo,13452
 parrot/py.typed,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
-parrot/version.py,sha256=
+parrot/version.py,sha256=pbGrvnHWVk2vkgFh0ab5xc4-svi5oC7IvapZz06YLpM,373
 parrot/chatbots/__init__.py,sha256=ypskCnME0xUv6psBEGCEyXCrD0J0ULHSllpVmSxqb4A,200
 parrot/chatbots/abstract.py,sha256=CmDn3k4r9uKImOZRN4L9zxLbCdC-1MPUAorDlfZT-kA,26421
 parrot/chatbots/asktroc.py,sha256=gyWzyvpAnmXwXd-3DEKoIJtAxt6NnP5mUZdZbkFky8s,604
@@ -56,7 +56,7 @@ parrot/loaders/excel.py,sha256=Y1agxm-jG4AgsA2wlPP3p8uBH40wYW1KM2ycTTLKUm4,12441
 parrot/loaders/github.py,sha256=CscyUIqoHTytqCbRUUTcV3QSxI8XoDntq5aTU0vdhzQ,2593
 parrot/loaders/image.py,sha256=A9KCXXoGuhDoyeJaascY7Q1ZK12Kf1ggE1drzJjS3AU,3946
 parrot/loaders/json.py,sha256=6B43k591OpvoJLbsJa8CxJue_lAt713SCdldn8bFW3c,1481
-parrot/loaders/pdf.py,sha256=
+parrot/loaders/pdf.py,sha256=wGwFnsUmeQqtqk3L2vYt2DkW09LUODUJN-xLjuAa-do,17826
 parrot/loaders/pdfchapters.py,sha256=YhA8Cdx3qXBR0vuTVnQ12XgH1DXT_rp1Tawzh4V2U3o,5637
 parrot/loaders/pdffn.py,sha256=gA-vJEWUiIUwbMxP8Nmvlzlcb39DVV69vGKtSzavUoI,4004
 parrot/loaders/pdfimages.py,sha256=4Q_HKiAee_hALBsG2qF7PpMgKP1AivHXhmcsCkUa9eE,7899
@@ -68,7 +68,7 @@ parrot/loaders/repo.py,sha256=vBqBAnwU6p3_DCvI9DVhi1Bs8iCDYHwFGp0P9zvGRyw,3737
 parrot/loaders/rtd.py,sha256=oKOC9Qn3iwulYx5BEvXy4_kusKRsy5RLYNHS-e5p-1k,1988
 parrot/loaders/txt.py,sha256=AeGroWffFT--7TYlTSTr5Av5zAr8YIp1fxt8r5qdi-A,2802
 parrot/loaders/video.py,sha256=pl5Ho69bp5vrWMqg5tLbsnHUus1LByTDoL6NPk57Ays,2929
-parrot/loaders/videolocal.py,sha256=
+parrot/loaders/videolocal.py,sha256=3EASzbettSO2tboTe3GndR4p6Nihwj6HGZoiPXekYo0,4302
 parrot/loaders/vimeo.py,sha256=zOvOOIiaZr_bRswJFI7uIMKISgALOxcSim9ZRUFY1Fc,4114
 parrot/loaders/web.py,sha256=3x06JNpfTGFtvSBPAEBVoVdZkpVXePcJeMtj61B2xJk,8867
 parrot/loaders/web_base.py,sha256=5SjQddT0Vhne8C9s30iU3Ex_9O1PJ8kyDmy8EdhGBo0,4380
@@ -94,17 +94,17 @@ parrot/tools/wikipedia.py,sha256=oadBTRAupu2dKThEORSHqqVs4u0G9lWOltbP6vSZgPE,199
 parrot/tools/zipcode.py,sha256=knScSvKgK7bHxyLcBkZFiMs65e-PlYU2_YhG6mohcjU,6440
 parrot/utils/__init__.py,sha256=vkBIvfl9-0NRLd76MIZk4s49PjtF_dW5imLTv_UOKxM,101
 parrot/utils/toml.py,sha256=CVyqDdAEyOj6AHfNpyQe4IUvLP_SSXlbHROYPeadLvU,302
-parrot/utils/types.cpython-311-x86_64-linux-gnu.so,sha256=
+parrot/utils/types.cpython-311-x86_64-linux-gnu.so,sha256=jghuq8bBlgGDjkb88Efi5l9cgR5KZL_qO7yxglGNsTA,791256
 parrot/utils/uv.py,sha256=Mb09bsi13hhi3xQDBjEhCf-U1wherXl-K4-BLcSvqtc,308
 parrot/utils/parsers/__init__.py,sha256=l82uIu07QvSJ8Xt0d_seei9n7UUL8PE-YFGBTyNbxSI,62
-parrot/utils/parsers/toml.cpython-311-x86_64-linux-gnu.so,sha256=
+parrot/utils/parsers/toml.cpython-311-x86_64-linux-gnu.so,sha256=gEnv6QGF6DtxExEdVTdNx48j90wPYKBLyCH1UCRj4MQ,451088
 resources/users/__init__.py,sha256=sdXUV7h0Oogcdru1RrQxbm9_RcMjftf0zTWqvxBVpO8,151
 resources/users/handlers.py,sha256=BGzqBvPY_OaIF_nONWX4b_B5OyyBrdGuSihIsdlFwjk,291
 resources/users/models.py,sha256=glk7Emv7QCi6i32xRFDrGc8UwK23_LPg0XUOJoHnwRU,6799
 settings/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
 settings/settings.py,sha256=9ueEvyLNurUX-AaIeRPV8GKX1c4YjDLbksUAeqEq6Ck,1854
-ai_parrot-0.3.
-ai_parrot-0.3.
-ai_parrot-0.3.
-ai_parrot-0.3.
-ai_parrot-0.3.
+ai_parrot-0.3.3.dist-info/LICENSE,sha256=vRKOoa7onTsLNvSzJtGtMaNhWWh8B3YAT733Tlu6M4o,1070
+ai_parrot-0.3.3.dist-info/METADATA,sha256=LHLvoMsy1VvMlC33Kl2RKhwnFgj40lMpPDVmVWYj1m8,10624
+ai_parrot-0.3.3.dist-info/WHEEL,sha256=UQ-0qXN3LQUffjrV43_e_ZXj2pgORBqTmXipnkj0E8I,113
+ai_parrot-0.3.3.dist-info/top_level.txt,sha256=qHoO4BhYDfeTkyKnciZSQtn5FSLN3Q-P5xCTkyvbuxg,26
+ai_parrot-0.3.3.dist-info/RECORD,,

parrot/exceptions.cpython-311-x86_64-linux-gnu.so
Binary file

parrot/loaders/pdf.py
CHANGED
@@ -2,10 +2,26 @@ from collections.abc import Callable
 from pathlib import Path, PurePath
 from typing import Any
 from io import BytesIO
+import re
+import ftfy
 import fitz
 import pytesseract
+from paddleocr import PaddleOCR
+import torch
+import cv2
+from transformers import (
+    # DonutProcessor,
+    # VisionEncoderDecoderModel,
+    # VisionEncoderDecoderConfig,
+    # ViTImageProcessor,
+    # AutoTokenizer,
+    LayoutLMv3ForTokenClassification,
+    LayoutLMv3Processor
+)
+from pdf4llm import to_markdown
 from PIL import Image
 from langchain.docstore.document import Document
+from navconfig import config
 from .basepdf import BasePDF
@@ -31,6 +47,29 @@ class PDFLoader(BasePDF):
             **kwargs
         )
         self.parse_images = kwargs.get('parse_images', False)
+        self.page_as_images = kwargs.get('page_as_images', False)
+        if self.page_as_images is True:
+            # # Load the processor and model from Hugging Face
+            # self.image_processor = DonutProcessor.from_pretrained(
+            #     "naver-clova-ix/donut-base-finetuned-docvqa"
+            # )
+            # self.image_model = VisionEncoderDecoderModel.from_pretrained(
+            #     "naver-clova-ix/donut-base-finetuned-docvqa",
+            # )
+            # Load the processor and model from Hugging Face
+            self.image_processor = LayoutLMv3Processor.from_pretrained(
+                "microsoft/layoutlmv3-base",
+                apply_ocr=True
+            )
+            self.image_model = LayoutLMv3ForTokenClassification.from_pretrained(
+                # "microsoft/layoutlmv3-base-finetuned-funsd"
+                "HYPJUDY/layoutlmv3-base-finetuned-funsd"
+            )
+            # Set device to GPU if available
+            self.image_device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+            self.image_model.to(self.image_device)
+
         # Table Settings:
         self.table_settings = {
             #"vertical_strategy": "text",
@@ -42,6 +81,134 @@ class PDFLoader(BasePDF):
         if table_settings:
             self.table_settings.update(table_settings)

+    def explain_image(self, image_path):
+        """Function to explain the image."""
+        # with open(image_path, "rb") as image_file:
+        #     image_content = image_file.read()
+
+        # Open the image
+        image = cv2.imread(image_path)
+        task_prompt = "<s_docvqa><s_question>{user_input}</s_question><s_answer>"
+        question = "Extract Questions about Happily Greet"
+        prompt = task_prompt.replace("{user_input}", question)
+
+        decoder_input_ids = self.image_processor.tokenizer(
+            prompt,
+            add_special_tokens=False,
+            return_tensors="pt",
+        ).input_ids
+
+        pixel_values = self.image_processor(
+            image,
+            return_tensors="pt"
+        ).pixel_values
+
+        # Send inputs to the appropriate device
+        pixel_values = pixel_values.to(self.image_device)
+        decoder_input_ids = decoder_input_ids.to(self.image_device)
+
+        outputs = self.image_model.generate(
+            pixel_values,
+            decoder_input_ids=decoder_input_ids,
+            max_length=self.image_model.decoder.config.max_position_embeddings,
+            pad_token_id=self.image_processor.tokenizer.pad_token_id,
+            eos_token_id=self.image_processor.tokenizer.eos_token_id,
+            bad_words_ids=[[self.image_processor.tokenizer.unk_token_id]],
+            # use_cache=True
+            return_dict_in_generate=True,
+        )
+
+        sequence = self.image_processor.batch_decode(outputs.sequences)[0]
+
+        sequence = sequence.replace(
+            self.image_processor.tokenizer.eos_token, ""
+        ).replace(
+            self.image_processor.tokenizer.pad_token, ""
+        )
+        # remove first task start token
+        sequence = re.sub(r"<.*?>", "", sequence, count=1).strip()
+        # Print the extracted sequence
+        print("Extracted Text:", sequence)
+
+        print(self.image_processor.token2json(sequence))
+
+        # Format the output as Markdown (optional step)
+        markdown_text = self.format_as_markdown(sequence)
+        print("Markdown Format:\n", markdown_text)
+
+        return None
+
+    def convert_to_markdown(self, text):
+        """
+        Convert the cleaned text into a markdown format.
+        You can enhance this function to detect tables, headings, etc.
+        """
+        # For example, we can identify sections or headers and format them in Markdown
+        markdown_text = text
+        # Detect headings and bold them
+        markdown_text = re.sub(r"(^.*Scorecard.*$)", r"## \1", markdown_text)
+        # Convert lines with ":" to a list item (rough approach)
+        markdown_text = re.sub(r"(\w+):", r"- **\1**:", markdown_text)
+        # Return the markdown formatted text
+        return markdown_text
+
+    def clean_tokenized_text(self, tokenized_text):
+        """
+        Clean the tokenized text by fixing encoding issues and formatting, preserving line breaks.
+        """
+        # Fix encoding issues using ftfy
+        cleaned_text = ftfy.fix_text(tokenized_text)
+
+        # Remove <s> and </s> tags (special tokens)
+        cleaned_text = cleaned_text.replace("<s>", "").replace("</s>", "")
+
+        # Replace special characters like 'Ġ' and fix multiple spaces, preserving new lines
+        cleaned_text = cleaned_text.replace("Ġ", " ")
+
+        # Avoid collapsing line breaks, but still normalize multiple spaces
+        # Replace multiple spaces with a single space, but preserve line breaks
+        cleaned_text = re.sub(r" +", " ", cleaned_text)
+
+        return cleaned_text.strip()
+
+    def extract_page_text(self, image_path) -> str:
+        # Open the image
+        image = Image.open(image_path).convert("RGB")
+
+        # Processor handles the OCR internally, no need for words or boxes
+        encoding = self.image_processor(image, return_tensors="pt", truncation=True)
+        encoding = {k: v.to(self.image_device) for k, v in encoding.items()}

+        # Forward pass
+        outputs = self.image_model(**encoding)
+        logits = outputs.logits
+
+        # Get predictions
+        predictions = logits.argmax(-1).squeeze().tolist()
+        labels = [self.image_model.config.id2label[pred] for pred in predictions]
+
+        # Get the words and boxes from the processor's OCR step
+        words = self.image_processor.tokenizer.convert_ids_to_tokens(
+            encoding['input_ids'].squeeze().tolist()
+        )
+        boxes = encoding['bbox'].squeeze().tolist()
+
+        # Combine words and labels, preserving line breaks based on vertical box position
+        extracted_text = ""
+        last_box = None
+        for word, label, box in zip(words, labels, boxes):
+            if label != 'O':
+                # Check if the current word is on a new line based on the vertical position of the box
+                if last_box and abs(box[1] - last_box[1]) > 10:  # A threshold for line breaks
+                    extracted_text += "\n"  # Add a line break
+
+                extracted_text += f"{word} "
+                last_box = box
+        cleaned_text = self.clean_tokenized_text(extracted_text)
+        markdown_text = self.convert_to_markdown(cleaned_text)
+        return markdown_text
+
     def _load_pdf(self, path: Path) -> list:
         """
         Load a PDF file using the Fitz library.
@@ -56,6 +223,32 @@ class PDFLoader(BasePDF):
             self.logger.info(f"Loading PDF file: {path}")
             pdf = fitz.open(str(path))  # Open the PDF file
             docs = []
+            try:
+                md_text = to_markdown(pdf)  # get markdown for all pages
+                _meta = {
+                    "url": f'{path}',
+                    "source": f"{path.name}",
+                    "filename": path.name,
+                    "type": 'pdf',
+                    "question": '',
+                    "answer": '',
+                    "source_type": self._source_type,
+                    "data": {},
+                    "summary": '',
+                    "document_meta": {
+                        "title": pdf.metadata.get("title", ""),
+                        "creationDate": pdf.metadata.get("creationDate", ""),
+                        "author": pdf.metadata.get("author", ""),
+                    }
+                }
+                docs.append(
+                    Document(
+                        page_content=md_text,
+                        metadata=_meta
+                    )
+                )
+            except Exception:
+                pass
             for page_number in range(pdf.page_count):
                 page = pdf[page_number]
                 text = page.get_text()
@@ -79,12 +272,7 @@ class PDFLoader(BasePDF):
                     "summary": summary,
                     "document_meta": {
                         "title": pdf.metadata.get("title", ""),
-                        # "subject": pdf.metadata.get("subject", ""),
-                        # "keywords": pdf.metadata.get("keywords", ""),
                         "creationDate": pdf.metadata.get("creationDate", ""),
-                        # "modDate": pdf.metadata.get("modDate", ""),
-                        # "producer": pdf.metadata.get("producer", ""),
-                        # "creator": pdf.metadata.get("creator", ""),
                         "author": pdf.metadata.get("author", ""),
                     }
                 }
@@ -96,9 +284,10 @@ class PDFLoader(BasePDF):
                 )
                 # Extract images and use OCR to get text from each image
                 # second: images
+                file_name = path.stem.replace(' ', '_').replace('.', '').lower()
                 if self.parse_images is True:
+                    # extract any images in page:
                     image_list = page.get_images(full=True)
-                    file_name = path.stem.replace(' ', '_').replace('.', '').lower()
                     for img_index, img in enumerate(image_list):
                         xref = img[0]
                         base_image = pdf.extract_image(xref)
@@ -181,7 +370,68 @@ class PDFLoader(BasePDF):
                            )
                        except Exception as exc:
                            print(exc)
+                # fourth: page as image
+                if self.page_as_images is True:
+                    # Convert the page to a Pixmap (which is an image)
+                    mat = fitz.Matrix(2, 2)
+                    pix = page.get_pixmap(dpi=300, matrix=mat)  # Increase DPI for better resolution
+                    img_name = f'{file_name}_page_{page_num}.png'
+                    img_path = self._imgdir.joinpath(img_name)
+                    if img_path.exists():
+                        img_path.unlink(missing_ok=True)
+                    self.logger.notice(
+                        f"Saving Page {page_number} as Image on {img_path}"
+                    )
+                    pix.save(
+                        img_path
+                    )
+                    # TODO passing the image to a AI visual to get explanation
+                    # Get the extracted text from the image
+                    text = self.extract_page_text(img_path)
+                    url = f'/static/images/{img_name}'
+                    image_meta = {
+                        "url": url,
+                        "source": f"{path.name} Page.#{page_num}",
+                        "filename": path.name,
+                        "index": f"{path.name}:{page_num}",
+                        "question": '',
+                        "answer": '',
+                        "type": 'page',
+                        "data": {},
+                        "summary": '',
+                        "document_meta": {
+                            "image_name": img_name,
+                            "page_number": f"{page_number}"
+                        },
+                        "source_type": self._source_type
+                    }
+                    docs.append(
+                        Document(page_content=text, metadata=image_meta)
+                    )
             pdf.close()
             return docs
         else:
             return []
+
+    def get_ocr(self, img_path) -> list:
+        # Initialize PaddleOCR with table recognition
+        self.ocr_model = PaddleOCR(
+            lang='en',
+            det_model_dir=None,
+            rec_model_dir=None,
+            rec_char_dict_path=None,
+            table=True,
+            # use_angle_cls=True,
+            # use_gpu=True
+        )
+        result = self.ocr_model.ocr(img_path, cls=True)
+
+        # extract tables:
+        # The result contains the table structure and content
+        tables = []
+        for line in result:
+            if 'html' in line[1]:
+                html_table = line[1]['html']
+                tables.append(html_table)
+
+        print('TABLES > ', tables)
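
The `page_as_images` path added above amounts to: render each page to an image with PyMuPDF, then let `LayoutLMv3Processor` (with `apply_ocr=True`, i.e. Tesseract under the hood) recover the words. A standalone sketch of that pipeline, independent of the `PDFLoader` class and using a hypothetical `sample.pdf`:

```python
# Standalone sketch of the page-as-image OCR step (not the package's API).
import fitz  # PyMuPDF
from PIL import Image
from transformers import LayoutLMv3Processor

pdf = fitz.open("sample.pdf")                    # hypothetical input file
page = pdf[0]
pix = page.get_pixmap(matrix=fitz.Matrix(2, 2))  # 2x zoom for sharper OCR
pix.save("page_0.png")

processor = LayoutLMv3Processor.from_pretrained(
    "microsoft/layoutlmv3-base",
    apply_ocr=True,                              # runs pytesseract internally
)
image = Image.open("page_0.png").convert("RGB")
encoding = processor(image, return_tensors="pt", truncation=True)

# Recover the OCR'd words from the token ids, much as extract_page_text() does,
# dropping special tokens and the BPE space marker.
tokens = processor.tokenizer.convert_ids_to_tokens(encoding["input_ids"].squeeze().tolist())
text = "".join(t.replace("Ġ", " ") for t in tokens if t not in ("<s>", "</s>", "<pad>"))
print(text.strip())
```
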
parrot/loaders/videolocal.py
CHANGED
@@ -105,3 +105,16 @@ class VideoLocalLoader(BaseVideoLoader):
                     if set(item.parts).isdisjoint(self.skip_directories):
                         documents.extend(self.load_video(item))
         return self.split_documents(documents)
+
+    def extract(self) -> list:
+        documents = []
+        if self.path.is_file():
+            docs = self.load_video(self.path)
+            documents.extend(docs)
+        if self.path.is_dir():
+            # iterate over the files in the directory
+            for ext in self._extension:
+                for item in self.path.glob(f'*{ext}'):
+                    if set(item.parts).isdisjoint(self.skip_directories):
+                        documents.extend(self.load_video(item))
+        return documents
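
Independent of the loader API (whose constructor is not shown in this diff), the traversal rule the new `extract()` method applies can be illustrated on its own: accept a single file, or glob a directory per extension while skipping anything under the configured skip directories:

```python
# Self-contained illustration of the traversal logic in extract(); the extension
# and skip lists below are hypothetical stand-ins for the loader's
# self._extension and self.skip_directories attributes.
from pathlib import Path

extensions = [".mp4", ".mkv"]
skip_directories = {"tmp", "cache"}

def collect_videos(path: Path) -> list[Path]:
    if path.is_file():
        return [path]
    found: list[Path] = []
    if path.is_dir():
        for ext in extensions:
            for item in path.glob(f"*{ext}"):
                # keep the file only if none of its path parts is a skipped directory
                if set(item.parts).isdisjoint(skip_directories):
                    found.append(item)
    return found

print(collect_videos(Path("videos")))
```
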
parrot/utils/parsers/toml.cpython-311-x86_64-linux-gnu.so
Binary file

parrot/utils/types.cpython-311-x86_64-linux-gnu.so
Binary file

parrot/version.py
CHANGED
@@ -3,7 +3,7 @@
 __title__ = "ai-parrot"
 __description__ = "Live Chatbots based on Langchain chatbots and Agents \
 Integrated into Navigator Framework or used into aiohttp applications."
-__version__ = "0.3.0"
+__version__ = "0.3.3"
 __author__ = "Jesus Lara"
 __author_email__ = "jesuslarag@gmail.com"
 __license__ = "MIT"

ai_parrot-0.3.0.dist-info/METADATA
DELETED
@@ -1,319 +0,0 @@
Metadata-Version: 2.1
Name: ai-parrot
Version: 0.3.0
Summary: Live Chatbots based on Langchain chatbots and Agents Integrated into Navigator Framework or used into aiohttp applications.
Home-page: https://github.com/phenobarbital/ai-parrot
Author: Jesus Lara
Author-email: jesuslara@phenobarbital.info
License: MIT
Project-URL: Source, https://github.com/phenobarbital/ai-parrot
Project-URL: Tracker, https://github.com/phenobarbital/ai-parrot/issues
Project-URL: Documentation, https://github.com/phenobarbital/ai-parrot/
Project-URL: Funding, https://paypal.me/phenobarbital
Project-URL: Say Thanks!, https://saythanks.io/to/phenobarbital
Keywords: asyncio,asyncpg,aioredis,aiomcache,langchain,chatbot,agents
Platform: POSIX
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Operating System :: POSIX :: Linux
Classifier: Environment :: Web Environment
Classifier: License :: OSI Approved :: MIT License
Classifier: Topic :: Software Development :: Build Tools
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Framework :: AsyncIO
Requires-Python: >=3.10.12
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: Cython ==3.0.9
Requires-Dist: pymupdf ==1.24.4
Requires-Dist: pymupdf4llm ==0.0.1
Requires-Dist: pdf4llm ==0.0.6
Requires-Dist: PyPDF2 ==3.0.1
Requires-Dist: pdfminer.six ==20231228
Requires-Dist: pdfplumber ==0.11.0
Requires-Dist: bitsandbytes ==0.43.0
Requires-Dist: Cartopy ==0.22.0
Requires-Dist: chromadb ==0.4.24
Requires-Dist: contourpy ==1.2.0
Requires-Dist: datasets ==2.18.0
Requires-Dist: faiss-cpu ==1.8.0
Requires-Dist: fastavro ==1.9.4
Requires-Dist: GitPython ==3.1.42
Requires-Dist: gunicorn ==21.2.0
Requires-Dist: jq ==1.7.0
Requires-Dist: rank-bm25 ==0.2.2
Requires-Dist: matplotlib ==3.8.3
Requires-Dist: numba ==0.59.0
Requires-Dist: opentelemetry-sdk ==1.24.0
Requires-Dist: rapidocr-onnxruntime ==1.3.15
Requires-Dist: pytesseract ==0.3.10
Requires-Dist: python-docx ==1.1.0
Requires-Dist: python-pptx ==0.6.23
Requires-Dist: docx2txt ==0.8
Requires-Dist: pytube ==15.0.0
Requires-Dist: pydub ==0.25.1
Requires-Dist: markdownify ==0.12.1
Requires-Dist: librosa ==0.10.1
Requires-Dist: yt-dlp ==2024.4.9
Requires-Dist: moviepy ==1.0.3
Requires-Dist: safetensors ==0.4.2
Requires-Dist: sentence-transformers ==2.6.1
Requires-Dist: tabulate ==0.9.0
Requires-Dist: tiktoken ==0.7.0
Requires-Dist: tokenizers ==0.19.1
Requires-Dist: unstructured ==0.14.3
Requires-Dist: unstructured-client ==0.18.0
Requires-Dist: uvloop ==0.19.0
Requires-Dist: XlsxWriter ==3.2.0
Requires-Dist: youtube-transcript-api ==0.6.2
Requires-Dist: selenium ==4.18.1
Requires-Dist: webdriver-manager ==4.0.1
Requires-Dist: transitions ==0.9.0
Requires-Dist: sentencepiece ==0.2.0
Requires-Dist: duckduckgo-search ==5.3.0
Requires-Dist: google-search-results ==2.4.2
Requires-Dist: google-api-python-client >=2.86.0
Requires-Dist: gdown ==5.1.0
Requires-Dist: weasyprint ==61.2
Requires-Dist: markdown2 ==2.4.13
Requires-Dist: xformers ==0.0.25.post1
Requires-Dist: fastembed ==0.3.4
Requires-Dist: mammoth ==1.7.1
Requires-Dist: accelerate ==0.29.3
Requires-Dist: langchain >=0.2.6
Requires-Dist: langchain-community >=0.2.6
Requires-Dist: langchain-core ==0.2.32
Requires-Dist: langchain-experimental ==0.0.62
Requires-Dist: langchainhub ==0.1.15
Requires-Dist: langchain-text-splitters ==0.2.2
Requires-Dist: huggingface-hub ==0.23.5
Requires-Dist: llama-index ==0.10.20
Requires-Dist: llama-cpp-python ==0.2.56
Requires-Dist: asyncdb[all] >=2.7.10
Requires-Dist: querysource >=3.10.1
Requires-Dist: yfinance ==0.2.40
Requires-Dist: youtube-search ==2.1.2
Requires-Dist: wikipedia ==1.4.0
Requires-Dist: mediawikiapi ==1.2
Requires-Dist: wikibase-rest-api-client ==0.2.0
Requires-Dist: asknews ==0.7.30
Requires-Dist: pyowm ==3.3.0
Requires-Dist: O365 ==2.0.35
Requires-Dist: langchain-huggingface ==0.0.3
Requires-Dist: stackapi ==0.3.1
Provides-Extra: all
Requires-Dist: langchain-milvus ==0.1.1 ; extra == 'all'
Requires-Dist: milvus ==2.3.5 ; extra == 'all'
Requires-Dist: pymilvus ==2.4.4 ; extra == 'all'
Requires-Dist: groq ==0.6.0 ; extra == 'all'
Requires-Dist: langchain-groq ==0.1.4 ; extra == 'all'
Requires-Dist: llama-index-llms-huggingface ==0.2.7 ; extra == 'all'
Requires-Dist: langchain-google-vertexai ==1.0.8 ; extra == 'all'
Requires-Dist: langchain-google-genai ==1.0.8 ; extra == 'all'
Requires-Dist: google-generativeai ==0.7.2 ; extra == 'all'
Requires-Dist: vertexai ==1.60.0 ; extra == 'all'
Requires-Dist: google-cloud-aiplatform >=1.60.0 ; extra == 'all'
Requires-Dist: grpc-google-iam-v1 ==0.13.0 ; extra == 'all'
Requires-Dist: langchain-openai ==0.1.21 ; extra == 'all'
Requires-Dist: openai ==1.40.8 ; extra == 'all'
Requires-Dist: llama-index-llms-openai ==0.1.11 ; extra == 'all'
Requires-Dist: langchain-anthropic ==0.1.23 ; extra == 'all'
Requires-Dist: anthropic ==0.34.0 ; extra == 'all'
Provides-Extra: analytics
Requires-Dist: annoy ==1.17.3 ; extra == 'analytics'
Requires-Dist: gradio-tools ==0.0.9 ; extra == 'analytics'
Requires-Dist: gradio-client ==0.2.9 ; extra == 'analytics'
Requires-Dist: streamlit ==1.37.1 ; extra == 'analytics'
Requires-Dist: simsimd ==4.3.1 ; extra == 'analytics'
Requires-Dist: opencv-python ==4.10.0.84 ; extra == 'analytics'
Provides-Extra: anthropic
Requires-Dist: langchain-anthropic ==0.1.11 ; extra == 'anthropic'
Requires-Dist: anthropic ==0.25.2 ; extra == 'anthropic'
Provides-Extra: crew
Requires-Dist: colbert-ai ==0.2.19 ; extra == 'crew'
Requires-Dist: vanna ==0.3.4 ; extra == 'crew'
Requires-Dist: crewai[tools] ==0.28.8 ; extra == 'crew'
Provides-Extra: google
Requires-Dist: langchain-google-vertexai ==1.0.4 ; extra == 'google'
Requires-Dist: langchain-google-genai ==1.0.4 ; extra == 'google'
Requires-Dist: google-generativeai ==0.5.4 ; extra == 'google'
Requires-Dist: vertexai ==1.49.0 ; extra == 'google'
Requires-Dist: google-cloud-aiplatform ==1.49.0 ; extra == 'google'
Requires-Dist: grpc-google-iam-v1 ==0.13.0 ; extra == 'google'
Provides-Extra: groq
Requires-Dist: groq ==0.6.0 ; extra == 'groq'
Requires-Dist: langchain-groq ==0.1.4 ; extra == 'groq'
Provides-Extra: hunggingfaces
Requires-Dist: llama-index-llms-huggingface ==0.2.7 ; extra == 'hunggingfaces'
Provides-Extra: milvus
Requires-Dist: langchain-milvus ==0.1.1 ; extra == 'milvus'
Requires-Dist: milvus ==2.3.5 ; extra == 'milvus'
Requires-Dist: pymilvus ==2.4.4 ; extra == 'milvus'
Provides-Extra: openai
Requires-Dist: langchain-openai ==0.1.21 ; extra == 'openai'
Requires-Dist: openai ==1.40.3 ; extra == 'openai'
Requires-Dist: llama-index-llms-openai ==0.1.11 ; extra == 'openai'
Requires-Dist: tiktoken ==0.7.0 ; extra == 'openai'
Provides-Extra: qdrant
Requires-Dist: qdrant-client ==1.8.0 ; extra == 'qdrant'

(The rest of the removed 0.3.0 METADATA is the same README content already shown above in the 0.3.3 METADATA.)

{ai_parrot-0.3.0.dist-info → ai_parrot-0.3.3.dist-info}/LICENSE
File without changes

{ai_parrot-0.3.0.dist-info → ai_parrot-0.3.3.dist-info}/top_level.txt
File without changes