ai-parrot 0.3.1__cp311-cp311-manylinux_2_28_x86_64.whl → 0.3.5__cp311-cp311-manylinux_2_28_x86_64.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Potentially problematic release: this version of ai-parrot might be problematic.

@@ -0,0 +1,300 @@
1
+ Metadata-Version: 2.1
2
+ Name: ai-parrot
3
+ Version: 0.3.5
4
+ Summary: Live Chatbots based on Langchain chatbots and Agents Integrated into Navigator Framework or used into aiohttp applications.
5
+ Home-page: https://github.com/phenobarbital/ai-parrot
6
+ Author: Jesus Lara
7
+ Author-email: jesuslara@phenobarbital.info
8
+ License: MIT
9
+ Project-URL: Source, https://github.com/phenobarbital/ai-parrot
10
+ Project-URL: Tracker, https://github.com/phenobarbital/ai-parrot/issues
11
+ Project-URL: Documentation, https://github.com/phenobarbital/ai-parrot/
12
+ Project-URL: Funding, https://paypal.me/phenobarbital
13
+ Project-URL: Say Thanks!, https://saythanks.io/to/phenobarbital
14
+ Keywords: asyncio,asyncpg,aioredis,aiomcache,langchain,chatbot,agents
15
+ Platform: POSIX
16
+ Classifier: Development Status :: 4 - Beta
17
+ Classifier: Intended Audience :: Developers
18
+ Classifier: Operating System :: POSIX :: Linux
19
+ Classifier: Environment :: Web Environment
20
+ Classifier: License :: OSI Approved :: MIT License
21
+ Classifier: Topic :: Software Development :: Build Tools
22
+ Classifier: Topic :: Software Development :: Libraries :: Python Modules
23
+ Classifier: Programming Language :: Python :: 3.9
24
+ Classifier: Programming Language :: Python :: 3.10
25
+ Classifier: Programming Language :: Python :: 3.11
26
+ Classifier: Programming Language :: Python :: 3.12
27
+ Classifier: Programming Language :: Python :: 3 :: Only
28
+ Classifier: Framework :: AsyncIO
29
+ Requires-Python: >=3.9.20
30
+ Description-Content-Type: text/markdown
31
+ License-File: LICENSE
32
+ Requires-Dist: Cython==3.0.11
33
+ Requires-Dist: accelerate==0.34.2
34
+ Requires-Dist: langchain>=0.2.6
35
+ Requires-Dist: langchain-community>=0.2.6
36
+ Requires-Dist: langchain-core>=0.2.32
37
+ Requires-Dist: langchain-experimental==0.0.62
38
+ Requires-Dist: langchainhub==0.1.15
39
+ Requires-Dist: langchain-text-splitters==0.2.2
40
+ Requires-Dist: langchain-huggingface==0.0.3
41
+ Requires-Dist: huggingface-hub==0.23.5
42
+ Requires-Dist: llama-index==0.10.20
43
+ Requires-Dist: llama-cpp-python==0.2.56
44
+ Requires-Dist: bitsandbytes==0.43.3
45
+ Requires-Dist: Cartopy==0.22.0
46
+ Requires-Dist: chromadb==0.4.24
47
+ Requires-Dist: datasets==2.18.0
48
+ Requires-Dist: faiss-cpu==1.8.0
49
+ Requires-Dist: fastavro==1.9.4
50
+ Requires-Dist: gunicorn==21.2.0
51
+ Requires-Dist: jq==1.7.0
52
+ Requires-Dist: rank-bm25==0.2.2
53
+ Requires-Dist: matplotlib==3.8.3
54
+ Requires-Dist: numba==0.59.0
55
+ Requires-Dist: querysource>=3.12.10
56
+ Requires-Dist: safetensors>=0.4.3
57
+ Requires-Dist: sentence-transformers==3.0.1
58
+ Requires-Dist: tabulate==0.9.0
59
+ Requires-Dist: tiktoken==0.7.0
60
+ Requires-Dist: tokenizers==0.19.1
61
+ Requires-Dist: selenium>=4.18.1
62
+ Requires-Dist: webdriver-manager>=4.0.1
63
+ Requires-Dist: transitions==0.9.0
64
+ Requires-Dist: sentencepiece==0.2.0
65
+ Requires-Dist: duckduckgo-search==5.3.0
66
+ Requires-Dist: google-search-results==2.4.2
67
+ Requires-Dist: google-api-python-client>=2.86.0
68
+ Requires-Dist: gdown==5.1.0
69
+ Requires-Dist: weasyprint==61.2
70
+ Requires-Dist: markdown2==2.4.13
71
+ Requires-Dist: fastembed==0.3.4
72
+ Requires-Dist: yfinance==0.2.40
73
+ Requires-Dist: youtube-search==2.1.2
74
+ Requires-Dist: wikipedia==1.4.0
75
+ Requires-Dist: mediawikiapi==1.2
76
+ Requires-Dist: pyowm==3.3.0
77
+ Requires-Dist: O365==2.0.35
78
+ Requires-Dist: stackapi==0.3.1
79
+ Requires-Dist: torchvision==0.19.1
80
+ Requires-Dist: tf-keras==2.17.0
81
+ Provides-Extra: analytics
82
+ Requires-Dist: annoy==1.17.3; extra == "analytics"
83
+ Requires-Dist: gradio-tools==0.0.9; extra == "analytics"
84
+ Requires-Dist: gradio-client==0.2.9; extra == "analytics"
85
+ Requires-Dist: streamlit==1.37.1; extra == "analytics"
86
+ Requires-Dist: simsimd==4.3.1; extra == "analytics"
87
+ Requires-Dist: opencv-python==4.10.0.84; extra == "analytics"
88
+ Provides-Extra: anthropic
89
+ Requires-Dist: langchain-anthropic==0.1.11; extra == "anthropic"
90
+ Requires-Dist: anthropic==0.25.2; extra == "anthropic"
91
+ Provides-Extra: crew
92
+ Requires-Dist: colbert-ai==0.2.19; extra == "crew"
93
+ Requires-Dist: vanna==0.3.4; extra == "crew"
94
+ Requires-Dist: crewai[tools]==0.28.8; extra == "crew"
95
+ Provides-Extra: google
96
+ Requires-Dist: langchain-google-vertexai==1.0.10; extra == "google"
97
+ Requires-Dist: langchain-google-genai==1.0.10; extra == "google"
98
+ Requires-Dist: vertexai==1.65.0; extra == "google"
99
+ Provides-Extra: groq
100
+ Requires-Dist: groq==0.11.0; extra == "groq"
101
+ Requires-Dist: langchain-groq==0.1.9; extra == "groq"
102
+ Provides-Extra: hunggingfaces
103
+ Requires-Dist: llama-index-llms-huggingface==0.2.7; extra == "hunggingfaces"
104
+ Provides-Extra: loaders
105
+ Requires-Dist: unstructured==0.14.3; extra == "loaders"
106
+ Requires-Dist: unstructured-client==0.18.0; extra == "loaders"
107
+ Requires-Dist: youtube-transcript-api==0.6.2; extra == "loaders"
108
+ Requires-Dist: pymupdf==1.24.4; extra == "loaders"
109
+ Requires-Dist: pymupdf4llm==0.0.1; extra == "loaders"
110
+ Requires-Dist: pdf4llm==0.0.6; extra == "loaders"
111
+ Requires-Dist: PyPDF2==3.0.1; extra == "loaders"
112
+ Requires-Dist: pdfminer.six==20231228; extra == "loaders"
113
+ Requires-Dist: pdfplumber==0.11.0; extra == "loaders"
114
+ Requires-Dist: GitPython==3.1.42; extra == "loaders"
115
+ Requires-Dist: opentelemetry-sdk==1.24.0; extra == "loaders"
116
+ Requires-Dist: rapidocr-onnxruntime==1.3.15; extra == "loaders"
117
+ Requires-Dist: pytesseract==0.3.10; extra == "loaders"
118
+ Requires-Dist: python-docx==1.1.0; extra == "loaders"
119
+ Requires-Dist: python-pptx==0.6.23; extra == "loaders"
120
+ Requires-Dist: docx2txt==0.8; extra == "loaders"
121
+ Requires-Dist: pytube==15.0.0; extra == "loaders"
122
+ Requires-Dist: pydub==0.25.1; extra == "loaders"
123
+ Requires-Dist: markdownify==0.12.1; extra == "loaders"
124
+ Requires-Dist: yt-dlp==2024.4.9; extra == "loaders"
125
+ Requires-Dist: moviepy==1.0.3; extra == "loaders"
126
+ Requires-Dist: mammoth==1.7.1; extra == "loaders"
127
+ Requires-Dist: paddlepaddle==2.6.1; extra == "loaders"
128
+ Requires-Dist: paddlepaddle-gpu==2.6.1; extra == "loaders"
129
+ Requires-Dist: paddleocr==2.8.1; extra == "loaders"
130
+ Requires-Dist: ftfy==6.2.3; extra == "loaders"
131
+ Requires-Dist: librosa==0.10.1; extra == "loaders"
132
+ Requires-Dist: XlsxWriter==3.2.0; extra == "loaders"
133
+ Requires-Dist: timm==1.0.9; extra == "loaders"
134
+ Provides-Extra: milvus
135
+ Requires-Dist: langchain-milvus>=0.1.4; extra == "milvus"
136
+ Requires-Dist: milvus==2.3.5; extra == "milvus"
137
+ Requires-Dist: pymilvus==2.4.6; extra == "milvus"
138
+ Provides-Extra: openai
139
+ Requires-Dist: langchain-openai==0.1.21; extra == "openai"
140
+ Requires-Dist: openai==1.40.3; extra == "openai"
141
+ Requires-Dist: llama-index-llms-openai==0.1.11; extra == "openai"
142
+ Requires-Dist: tiktoken==0.7.0; extra == "openai"
143
+ Provides-Extra: qdrant
144
+ Requires-Dist: qdrant-client==1.8.0; extra == "qdrant"
145
+
146
+ # AI Parrot: Python package for creating Chatbots
147
+ This is an open-source Python package for creating Chatbots based on Langchain and Navigator.
148
+ This README provides instructions for installation, development, testing, and releasing Parrot.
149
+
150
+ ## Installation
151
+
152
+ **Creating a virtual environment:**
153
+
154
+ This is recommended for development and isolation from system-wide libraries.
155
+ Run the following command in your terminal:
156
+
157
+ On Debian-based systems, install the required system packages:
158
+ ```
159
+ sudo apt install gcc python3.11-venv python3.11-full python3.11-dev libmemcached-dev zlib1g-dev build-essential libffi-dev unixodbc unixodbc-dev libsqliteodbc libev4 libev-dev
160
+ ```
161
+
162
+ For Qdrant installation:
163
+ ```
164
+ docker pull qdrant/qdrant
165
+ docker run -d -p 6333:6333 -p 6334:6334 --name qdrant -v $(pwd)/qdrant_storage:/qdrant/storage:z qdrant/qdrant
166
+ ```
167
+
168
+ For VertexAI, create a folder named "google" under "env" and copy the JSON credentials file into it.
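A sketch of how that credentials file can be wired up in Python before any Google client is created. The `env/google` folder follows the note above; the file name `credentials.json` is an assumption, and `GOOGLE_APPLICATION_CREDENTIALS` is the standard variable read by Google client libraries:

```python
import os
from pathlib import Path

# Path follows the note above: a "google" folder inside "env".
# The exact file name is hypothetical.
creds = Path("env") / "google" / "credentials.json"

# Google client libraries pick credentials up from this standard variable.
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = str(creds)
print(os.environ["GOOGLE_APPLICATION_CREDENTIALS"])
```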
169
+
170
+ ```bash
171
+ make venv
172
+ ```
173
+
174
+ This will create a virtual environment named `.venv`. To activate it, run:
175
+
176
+ ```bash
177
+ source .venv/bin/activate # Linux/macOS
178
+ ```
179
+
180
+ Once activated, install Parrot within the virtual environment:
181
+
182
+ ```bash
183
+ make install
184
+ ```
185
+ The output will remind you to activate the virtual environment before development.
186
+
187
+ **Optional** (for developers):
188
+ ```bash
189
+ pip install -e .
190
+ ```
191
+
192
+ ## Start the HTTP server
193
+ ```bash
194
+ python run.py
195
+ ```
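`run.py` itself is not included in the diff; as a rough sketch, an aiohttp application of the kind it might start could look like the following. The route and handler here are hypothetical, not the package's actual entry point:

```python
from aiohttp import web

async def handle(request) -> web.Response:
    # Placeholder handler; the real app would wire up chatbot routes.
    return web.Response(text="parrot is running")

app = web.Application()
app.router.add_get("/", handle)

# To serve, call: web.run_app(app)  (listens on localhost:8080 by default)
```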
196
+
197
+ ## Development Setup
198
+
199
+ This section explains how to set up your development environment:
200
+
201
+ 1. **Install development requirements:**
202
+
203
+ ```bash
204
+ make setup
205
+ ```
206
+
207
+ This installs development dependencies like linters and test runners mentioned in the `docs/requirements-dev.txt` file.
208
+
209
+ 2. **Install Parrot in editable mode:**
210
+
211
+ This allows you to make changes to the code and test them without reinstalling:
212
+
213
+ ```bash
214
+ make dev
215
+ ```
216
+
217
+ This uses `flit` to install Parrot in editable mode.
218
+
219
+
220
+ ### Usage
221
+
222
+ *Once you have set up your development environment, you can start using Parrot.*
223
+
224
+ #### Test with Code ChatBOT
225
+ * Set the following environment variables (the `[google]` section of your configuration):
226
+ [google]
227
+ GOOGLE_API_KEY=apikey
228
+ GOOGLE_CREDENTIALS_FILE=.../credentials.json
229
+ VERTEX_PROJECT_ID=vertex-project
230
+ VERTEX_REGION=region
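The block above is INI-style; a minimal sketch of reading it with the standard-library `configparser` (the section and key names mirror the snippet above, and a literal string stands in for the real configuration file):

```python
from configparser import ConfigParser

# Mirrors the [google] block above; values are placeholders.
raw = """
[google]
GOOGLE_API_KEY=apikey
VERTEX_PROJECT_ID=vertex-project
VERTEX_REGION=region
"""

config = ConfigParser()
config.read_string(raw)

# Option names are case-insensitive in configparser.
api_key = config["google"]["GOOGLE_API_KEY"]
region = config["google"]["VERTEX_REGION"]
print(api_key, region)
```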
231
+
232
+ * Run the chatbot:
233
+ ```bash
234
+ python examples/test_agent.py
235
+ ```
236
+
237
+ ### Testing
238
+
239
+ To run the test suite:
240
+
241
+ ```bash
242
+ make test
243
+ ```
244
+
245
+ This will run tests using `coverage` to report on code coverage.
246
+
247
+
248
+ ### Code Formatting
249
+
250
+ To format the code with black:
251
+
252
+ ```bash
253
+ make format
254
+ ```
255
+
256
+
257
+ ### Linting
258
+
259
+ To lint the code for style and potential errors:
260
+
261
+ ```bash
262
+ make lint
263
+ ```
264
+
265
+ This uses `pylint` and `black` to check for issues.
266
+
267
+
268
+ ### Releasing a New Version
269
+
270
+ This section outlines the steps for releasing a new version of Parrot:
271
+
272
+ To release a new version, run:
+ 
+ ```bash
+ make release
+ ```
+ 
+ This runs the `lint`, `test`, and `clean` tasks and then uses `flit` to publish the package to a repository like PyPI. You'll need to have publishing credentials configured for `flit`.
287
+
288
+
289
+ ### Cleaning Up
290
+
291
+ To remove the virtual environment:
292
+
293
+ ```bash
294
+ make distclean
295
+ ```
296
+
297
+
298
+ ### Contributing
299
+
300
+ We welcome contributions to Parrot! Please refer to the CONTRIBUTING.md file for guidelines on how to contribute.
@@ -1,10 +1,10 @@
1
1
  parrot/__init__.py,sha256=eTkAkHeJ5BBDG2fxrXA4M37ODBJoS1DQYpeBAWL2xeI,387
2
2
  parrot/conf.py,sha256=-9bVGC7Rf-6wpIg6-ojvU4S_G1wBLUCVDt46KEGHEhM,4257
3
- parrot/exceptions.cpython-311-x86_64-linux-gnu.so,sha256=gDwsnUlOlwphVU97XaqG5e7BJs_PWPKdwgwDsjyVZIg,361200
3
+ parrot/exceptions.cpython-311-x86_64-linux-gnu.so,sha256=VNyBh3uLxGQgB0l1bkWjQDqYUN2ZAvRmV12AqQijV9Q,361184
4
4
  parrot/manager.py,sha256=NhzXoWxSgtoWHpmYP8cV2Ujq_SlvCbQYQBaohAeL2TM,5935
5
5
  parrot/models.py,sha256=RsVQCqhSXBKRPcu-BCga9Y1wyvENFXDCuq3_ObIKvAo,13452
6
6
  parrot/py.typed,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
7
- parrot/version.py,sha256=4J1UyyW-XSClqr_4-Z-C1QFQ9XMZ0LjzaF_4UIlbgic,373
7
+ parrot/version.py,sha256=RknhCGT72EptwSfSvr4rE9_fulip0_-gBlla3iuIIi4,373
8
8
  parrot/chatbots/__init__.py,sha256=ypskCnME0xUv6psBEGCEyXCrD0J0ULHSllpVmSxqb4A,200
9
9
  parrot/chatbots/abstract.py,sha256=CmDn3k4r9uKImOZRN4L9zxLbCdC-1MPUAorDlfZT-kA,26421
10
10
  parrot/chatbots/asktroc.py,sha256=gyWzyvpAnmXwXd-3DEKoIJtAxt6NnP5mUZdZbkFky8s,604
@@ -56,7 +56,7 @@ parrot/loaders/excel.py,sha256=Y1agxm-jG4AgsA2wlPP3p8uBH40wYW1KM2ycTTLKUm4,12441
56
56
  parrot/loaders/github.py,sha256=CscyUIqoHTytqCbRUUTcV3QSxI8XoDntq5aTU0vdhzQ,2593
57
57
  parrot/loaders/image.py,sha256=A9KCXXoGuhDoyeJaascY7Q1ZK12Kf1ggE1drzJjS3AU,3946
58
58
  parrot/loaders/json.py,sha256=6B43k591OpvoJLbsJa8CxJue_lAt713SCdldn8bFW3c,1481
59
- parrot/loaders/pdf.py,sha256=flGlUf9dLAD2Uh8MkvLP27OU1nvroeHU2HM5a3rBH3M,7996
59
+ parrot/loaders/pdf.py,sha256=nyeT4emrewxeO2dUQxW3QOcdk1vg1JYtPKNAV8tThm0,17512
60
60
  parrot/loaders/pdfchapters.py,sha256=YhA8Cdx3qXBR0vuTVnQ12XgH1DXT_rp1Tawzh4V2U3o,5637
61
61
  parrot/loaders/pdffn.py,sha256=gA-vJEWUiIUwbMxP8Nmvlzlcb39DVV69vGKtSzavUoI,4004
62
62
  parrot/loaders/pdfimages.py,sha256=4Q_HKiAee_hALBsG2qF7PpMgKP1AivHXhmcsCkUa9eE,7899
@@ -68,7 +68,7 @@ parrot/loaders/repo.py,sha256=vBqBAnwU6p3_DCvI9DVhi1Bs8iCDYHwFGp0P9zvGRyw,3737
68
68
  parrot/loaders/rtd.py,sha256=oKOC9Qn3iwulYx5BEvXy4_kusKRsy5RLYNHS-e5p-1k,1988
69
69
  parrot/loaders/txt.py,sha256=AeGroWffFT--7TYlTSTr5Av5zAr8YIp1fxt8r5qdi-A,2802
70
70
  parrot/loaders/video.py,sha256=pl5Ho69bp5vrWMqg5tLbsnHUus1LByTDoL6NPk57Ays,2929
71
- parrot/loaders/videolocal.py,sha256=3EASzbettSO2tboTe3GndR4p6Nihwj6HGZoiPXekYo0,4302
71
+ parrot/loaders/videolocal.py,sha256=VBjtDIZ7CxkmgISXNr2Nc68MHa9-57qQr0uSxLIsAOU,4326
72
72
  parrot/loaders/vimeo.py,sha256=zOvOOIiaZr_bRswJFI7uIMKISgALOxcSim9ZRUFY1Fc,4114
73
73
  parrot/loaders/web.py,sha256=3x06JNpfTGFtvSBPAEBVoVdZkpVXePcJeMtj61B2xJk,8867
74
74
  parrot/loaders/web_base.py,sha256=5SjQddT0Vhne8C9s30iU3Ex_9O1PJ8kyDmy8EdhGBo0,4380
@@ -94,17 +94,17 @@ parrot/tools/wikipedia.py,sha256=oadBTRAupu2dKThEORSHqqVs4u0G9lWOltbP6vSZgPE,199
94
94
  parrot/tools/zipcode.py,sha256=knScSvKgK7bHxyLcBkZFiMs65e-PlYU2_YhG6mohcjU,6440
95
95
  parrot/utils/__init__.py,sha256=vkBIvfl9-0NRLd76MIZk4s49PjtF_dW5imLTv_UOKxM,101
96
96
  parrot/utils/toml.py,sha256=CVyqDdAEyOj6AHfNpyQe4IUvLP_SSXlbHROYPeadLvU,302
97
- parrot/utils/types.cpython-311-x86_64-linux-gnu.so,sha256=kdox48-JUzj92QP6amGOCTIEQhrBUMn6qzrhX1u17CY,791912
97
+ parrot/utils/types.cpython-311-x86_64-linux-gnu.so,sha256=jghuq8bBlgGDjkb88Efi5l9cgR5KZL_qO7yxglGNsTA,791256
98
98
  parrot/utils/uv.py,sha256=Mb09bsi13hhi3xQDBjEhCf-U1wherXl-K4-BLcSvqtc,308
99
99
  parrot/utils/parsers/__init__.py,sha256=l82uIu07QvSJ8Xt0d_seei9n7UUL8PE-YFGBTyNbxSI,62
100
- parrot/utils/parsers/toml.cpython-311-x86_64-linux-gnu.so,sha256=vdQTxL4AyxinDpoDVk0Syx-ycDL02OmXESJOtiVFl0A,451056
100
+ parrot/utils/parsers/toml.cpython-311-x86_64-linux-gnu.so,sha256=gEnv6QGF6DtxExEdVTdNx48j90wPYKBLyCH1UCRj4MQ,451088
101
101
  resources/users/__init__.py,sha256=sdXUV7h0Oogcdru1RrQxbm9_RcMjftf0zTWqvxBVpO8,151
102
102
  resources/users/handlers.py,sha256=BGzqBvPY_OaIF_nONWX4b_B5OyyBrdGuSihIsdlFwjk,291
103
103
  resources/users/models.py,sha256=glk7Emv7QCi6i32xRFDrGc8UwK23_LPg0XUOJoHnwRU,6799
104
104
  settings/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
105
105
  settings/settings.py,sha256=9ueEvyLNurUX-AaIeRPV8GKX1c4YjDLbksUAeqEq6Ck,1854
106
- ai_parrot-0.3.1.dist-info/LICENSE,sha256=vRKOoa7onTsLNvSzJtGtMaNhWWh8B3YAT733Tlu6M4o,1070
107
- ai_parrot-0.3.1.dist-info/METADATA,sha256=rtYKLZ9cdUv1OELsaBCbmadVUKnx14fMUraNIp5EbD0,10410
108
- ai_parrot-0.3.1.dist-info/WHEEL,sha256=tFO7F0mawMNWa_NzTDA1ygqZBeMykVNIr04O5Zxk1TE,113
109
- ai_parrot-0.3.1.dist-info/top_level.txt,sha256=qHoO4BhYDfeTkyKnciZSQtn5FSLN3Q-P5xCTkyvbuxg,26
110
- ai_parrot-0.3.1.dist-info/RECORD,,
106
+ ai_parrot-0.3.5.dist-info/LICENSE,sha256=vRKOoa7onTsLNvSzJtGtMaNhWWh8B3YAT733Tlu6M4o,1070
107
+ ai_parrot-0.3.5.dist-info/METADATA,sha256=G19tFikgbnhRqltqs2_OulmbuqdrA4Gwp2wW_dx5URk,9721
108
+ ai_parrot-0.3.5.dist-info/WHEEL,sha256=UQ-0qXN3LQUffjrV43_e_ZXj2pgORBqTmXipnkj0E8I,113
109
+ ai_parrot-0.3.5.dist-info/top_level.txt,sha256=qHoO4BhYDfeTkyKnciZSQtn5FSLN3Q-P5xCTkyvbuxg,26
110
+ ai_parrot-0.3.5.dist-info/RECORD,,
@@ -1,5 +1,5 @@
1
1
  Wheel-Version: 1.0
2
- Generator: setuptools (72.2.0)
2
+ Generator: setuptools (74.1.2)
3
3
  Root-Is-Purelib: false
4
4
  Tag: cp311-cp311-manylinux_2_28_x86_64
5
5
 
parrot/loaders/pdf.py CHANGED
@@ -2,10 +2,26 @@ from collections.abc import Callable
2
2
  from pathlib import Path, PurePath
3
3
  from typing import Any
4
4
  from io import BytesIO
5
+ import re
6
+ import ftfy
5
7
  import fitz
6
8
  import pytesseract
9
+ from paddleocr import PaddleOCR
10
+ import torch
11
+ import cv2
12
+ from transformers import (
13
+ # DonutProcessor,
14
+ # VisionEncoderDecoderModel,
15
+ # VisionEncoderDecoderConfig,
16
+ # ViTImageProcessor,
17
+ # AutoTokenizer,
18
+ LayoutLMv3ForTokenClassification,
19
+ LayoutLMv3Processor
20
+ )
21
+ from pdf4llm import to_markdown
7
22
  from PIL import Image
8
23
  from langchain.docstore.document import Document
24
+ from navconfig import config
9
25
  from .basepdf import BasePDF
10
26
 
11
27
 
@@ -31,6 +47,21 @@ class PDFLoader(BasePDF):
31
47
  **kwargs
32
48
  )
33
49
  self.parse_images = kwargs.get('parse_images', False)
50
+ self.page_as_images = kwargs.get('page_as_images', False)
51
+ if self.page_as_images is True:
52
+ # Load the processor and model from Hugging Face
53
+ self.image_processor = LayoutLMv3Processor.from_pretrained(
54
+ "microsoft/layoutlmv3-base",
55
+ apply_ocr=True
56
+ )
57
+ self.image_model = LayoutLMv3ForTokenClassification.from_pretrained(
58
+ # "microsoft/layoutlmv3-base-finetuned-funsd"
59
+ "HYPJUDY/layoutlmv3-base-finetuned-funsd"
60
+ )
61
+ # Set device to GPU if available
62
+ self.image_device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
63
+ self.image_model.to(self.image_device)
64
+
34
65
  # Table Settings:
35
66
  self.table_settings = {
36
67
  #"vertical_strategy": "text",
@@ -42,6 +73,134 @@ class PDFLoader(BasePDF):
42
73
  if table_settings:
43
74
  self.table_settings.update(table_settings)
44
75
 
76
+ def explain_image(self, image_path):
77
+ """Function to explain the image."""
78
+ # with open(image_path, "rb") as image_file:
79
+ # image_content = image_file.read()
80
+
81
+ # Open the image
82
+ image = cv2.imread(image_path)
83
+ task_prompt = "<s_docvqa><s_question>{user_input}</s_question><s_answer>"
84
+ question = "Extract Questions about Happily Greet"
85
+ prompt = task_prompt.replace("{user_input}", question)
86
+
87
+ decoder_input_ids = self.image_processor.tokenizer(
88
+ prompt,
89
+ add_special_tokens=False,
90
+ return_tensors="pt",
91
+ ).input_ids
92
+
93
+ pixel_values = self.image_processor(
94
+ image,
95
+ return_tensors="pt"
96
+ ).pixel_values
97
+
98
+ # Send inputs to the appropriate device
99
+ pixel_values = pixel_values.to(self.image_device)
100
+ decoder_input_ids = decoder_input_ids.to(self.image_device)
101
+
102
+ outputs = self.image_model.generate(
103
+ pixel_values,
104
+ decoder_input_ids=decoder_input_ids,
105
+ max_length=self.image_model.decoder.config.max_position_embeddings,
106
+ pad_token_id=self.image_processor.tokenizer.pad_token_id,
107
+ eos_token_id=self.image_processor.tokenizer.eos_token_id,
108
+ bad_words_ids=[[self.image_processor.tokenizer.unk_token_id]],
109
+ # use_cache=True
110
+ return_dict_in_generate=True,
111
+ )
112
+
113
+ sequence = self.image_processor.batch_decode(outputs.sequences)[0]
114
+
115
+
116
+ sequence = sequence.replace(
117
+ self.image_processor.tokenizer.eos_token, ""
118
+ ).replace(
119
+ self.image_processor.tokenizer.pad_token, ""
120
+ )
121
+ # remove first task start token
122
+ sequence = re.sub(r"<.*?>", "", sequence, count=1).strip()
123
+ # Print the extracted sequence
124
+ print("Extracted Text:", sequence)
125
+
126
+ print(self.image_processor.token2json(sequence))
127
+
128
+ # Format the output as Markdown (optional step)
129
+ markdown_text = self.format_as_markdown(sequence)
130
+ print("Markdown Format:\n", markdown_text)
131
+
132
+ return None
133
+
134
+ def convert_to_markdown(self, text):
135
+ """
136
+ Convert the cleaned text into a markdown format.
137
+ You can enhance this function to detect tables, headings, etc.
138
+ """
139
+ # For example, we can identify sections or headers and format them in Markdown
140
+ markdown_text = text
141
+ # Detect headings and bold them
142
+ markdown_text = re.sub(r"(^.*Scorecard.*$)", r"## \1", markdown_text)
143
+ # Convert lines with ":" to a list item (rough approach)
144
+ markdown_text = re.sub(r"(\w+):", r"- **\1**:", markdown_text)
145
+ # Return the markdown formatted text
146
+ return markdown_text
147
+
148
+ def clean_tokenized_text(self, tokenized_text):
149
+ """
150
+ Clean the tokenized text by fixing encoding issues and formatting, preserving line breaks.
151
+ """
152
+ # Fix encoding issues using ftfy
153
+ cleaned_text = ftfy.fix_text(tokenized_text)
154
+
155
+ # Remove <s> and </s> tags (special tokens)
156
+ cleaned_text = cleaned_text.replace("<s>", "").replace("</s>", "")
157
+
158
+ # Replace special characters like 'Ġ' and fix multiple spaces, preserving new lines
159
+ cleaned_text = cleaned_text.replace("Ġ", " ")
160
+
161
+ # Avoid collapsing line breaks, but still normalize multiple spaces
162
+ # Replace multiple spaces with a single space, but preserve line breaks
163
+ cleaned_text = re.sub(r" +", " ", cleaned_text)
164
+
165
+ return cleaned_text.strip()
166
+
167
+ def extract_page_text(self, image_path) -> str:
168
+ # Open the image
169
+ image = Image.open(image_path).convert("RGB")
170
+
171
+ # Processor handles the OCR internally, no need for words or boxes
172
+ encoding = self.image_processor(image, return_tensors="pt", truncation=True)
173
+ encoding = {k: v.to(self.image_device) for k, v in encoding.items()}
174
+
175
+ # Forward pass
176
+ outputs = self.image_model(**encoding)
177
+ logits = outputs.logits
178
+
179
+ # Get predictions
180
+ predictions = logits.argmax(-1).squeeze().tolist()
181
+ labels = [self.image_model.config.id2label[pred] for pred in predictions]
182
+
183
+ # Get the words and boxes from the processor's OCR step
184
+ words = self.image_processor.tokenizer.convert_ids_to_tokens(
185
+ encoding['input_ids'].squeeze().tolist()
186
+ )
187
+ boxes = encoding['bbox'].squeeze().tolist()
188
+
189
+ # Combine words and labels, preserving line breaks based on vertical box position
190
+ extracted_text = ""
191
+ last_box = None
192
+ for word, label, box in zip(words, labels, boxes):
193
+ if label != 'O':
194
+ # Check if the current word is on a new line based on the vertical position of the box
195
+ if last_box and abs(box[1] - last_box[1]) > 10: # A threshold for line breaks
196
+ extracted_text += "\n" # Add a line break
197
+
198
+ extracted_text += f"{word} "
199
+ last_box = box
200
+ cleaned_text = self.clean_tokenized_text(extracted_text)
201
+ markdown_text = self.convert_to_markdown(cleaned_text)
202
+ return markdown_text
203
+
45
204
  def _load_pdf(self, path: Path) -> list:
46
205
  """
47
206
  Load a PDF file using the Fitz library.
@@ -56,6 +215,32 @@ class PDFLoader(BasePDF):
56
215
  self.logger.info(f"Loading PDF file: {path}")
57
216
  pdf = fitz.open(str(path)) # Open the PDF file
58
217
  docs = []
218
+ try:
219
+ md_text = to_markdown(pdf) # get markdown for all pages
220
+ _meta = {
221
+ "url": f'{path}',
222
+ "source": f"{path.name}",
223
+ "filename": path.name,
224
+ "type": 'pdf',
225
+ "question": '',
226
+ "answer": '',
227
+ "source_type": self._source_type,
228
+ "data": {},
229
+ "summary": '',
230
+ "document_meta": {
231
+ "title": pdf.metadata.get("title", ""),
232
+ "creationDate": pdf.metadata.get("creationDate", ""),
233
+ "author": pdf.metadata.get("author", ""),
234
+ }
235
+ }
236
+ docs.append(
237
+ Document(
238
+ page_content=md_text,
239
+ metadata=_meta
240
+ )
241
+ )
242
+ except Exception:
243
+ pass
59
244
  for page_number in range(pdf.page_count):
60
245
  page = pdf[page_number]
61
246
  text = page.get_text()
@@ -79,12 +264,7 @@ class PDFLoader(BasePDF):
79
264
  "summary": summary,
80
265
  "document_meta": {
81
266
  "title": pdf.metadata.get("title", ""),
82
- # "subject": pdf.metadata.get("subject", ""),
83
- # "keywords": pdf.metadata.get("keywords", ""),
84
267
  "creationDate": pdf.metadata.get("creationDate", ""),
85
- # "modDate": pdf.metadata.get("modDate", ""),
86
- # "producer": pdf.metadata.get("producer", ""),
87
- # "creator": pdf.metadata.get("creator", ""),
88
268
  "author": pdf.metadata.get("author", ""),
89
269
  }
90
270
  }
@@ -96,9 +276,10 @@ class PDFLoader(BasePDF):
96
276
  )
97
277
  # Extract images and use OCR to get text from each image
98
278
  # second: images
279
+ file_name = path.stem.replace(' ', '_').replace('.', '').lower()
99
280
  if self.parse_images is True:
281
+ # extract any images in page:
100
282
  image_list = page.get_images(full=True)
101
- file_name = path.stem.replace(' ', '_').replace('.', '').lower()
102
283
  for img_index, img in enumerate(image_list):
103
284
  xref = img[0]
104
285
  base_image = pdf.extract_image(xref)
@@ -181,7 +362,69 @@ class PDFLoader(BasePDF):
181
362
  )
182
363
  except Exception as exc:
183
364
  print(exc)
365
+ # fourth: page as image
366
+ if self.page_as_images is True:
367
+ # Convert the page to a Pixmap (which is an image)
368
+ mat = fitz.Matrix(2, 2)
369
+ pix = page.get_pixmap(dpi=300, matrix=mat) # Increase DPI for better resolution
370
+ img_name = f'{file_name}_page_{page_num}.png'
371
+ img_path = self._imgdir.joinpath(img_name)
372
+ if img_path.exists():
373
+ img_path.unlink(missing_ok=True)
374
+ self.logger.notice(
375
+ f"Saving Page {page_number} as Image on {img_path}"
376
+ )
377
+ pix.save(
378
+ img_path
379
+ )
380
+ # TODO passing the image to a AI visual to get explanation
381
+ # Get the extracted text from the image
382
+ text = self.extract_page_text(img_path)
383
+ print('TEXT EXTRACTED >> ', text)
384
+ url = f'/static/images/{img_name}'
385
+ image_meta = {
386
+ "url": url,
387
+ "source": f"{path.name} Page.#{page_num}",
388
+ "filename": path.name,
389
+ "index": f"{path.name}:{page_num}",
390
+ "question": '',
391
+ "answer": '',
392
+ "type": 'page',
393
+ "data": {},
394
+ "summary": '',
395
+ "document_meta": {
396
+ "image_name": img_name,
397
+ "page_number": f"{page_number}"
398
+ },
399
+ "source_type": self._source_type
400
+ }
401
+ docs.append(
402
+ Document(page_content=text, metadata=image_meta)
403
+ )
184
404
  pdf.close()
185
405
  return docs
186
406
  else:
187
407
  return []
408
+
409
+ def get_ocr(self, img_path) -> list:
410
+ # Initialize PaddleOCR with table recognition
411
+ self.ocr_model = PaddleOCR(
412
+ lang='en',
413
+ det_model_dir=None,
414
+ rec_model_dir=None,
415
+ rec_char_dict_path=None,
416
+ table=True,
417
+ # use_angle_cls=True,
418
+ # use_gpu=True
419
+ )
420
+ result = self.ocr_model.ocr(img_path, cls=True)
421
+
422
+ # extract tables:
423
+ # The result contains the table structure and content
424
+ tables = []
425
+ for line in result:
426
+ if 'html' in line[1]:
427
+ html_table = line[1]['html']
428
+ tables.append(html_table)
429
+
430
+ print('TABLES > ', tables)
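The vertical-position heuristic in `extract_page_text` above (insert a newline when a word's bounding box sits more than 10 units below the previous one) can be exercised standalone; the words and boxes below are made-up sample data, not real OCR output:

```python
# Boxes are [x0, y0, x1, y1]; a jump in y0 larger than the
# threshold is treated as a new line, as in extract_page_text.
LINE_BREAK_THRESHOLD = 10

words = ["Total", "Score", "Notes", "OK"]
boxes = [[10, 20, 50, 30], [60, 21, 90, 31], [10, 60, 45, 70], [50, 61, 70, 71]]

text = ""
last_box = None
for word, box in zip(words, boxes):
    if last_box and abs(box[1] - last_box[1]) > LINE_BREAK_THRESHOLD:
        text += "\n"  # new line detected from the vertical gap
    text += f"{word} "
    last_box = box

print(text)
```

"Total" and "Score" share a baseline, so they stay on one line; "Notes" starts 39 units lower and triggers the break.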
@@ -26,14 +26,15 @@ class VideoLocalLoader(BaseVideoLoader):
26
26
 
27
27
  def load_video(self, path: PurePath) -> list:
28
28
  metadata = {
29
- "source": f"{path}",
30
29
  "url": f"{path.name}",
31
- "index": path.stem,
30
+ "source": f"{path}",
32
31
  "filename": f"{path}",
32
+ "index": path.stem,
33
33
  "question": '',
34
34
  "answer": '',
35
35
  'type': 'video_transcript',
36
36
  "source_type": self._source_type,
37
+ "data": {},
37
38
  "summary": '',
38
39
  "document_meta": {
39
40
  "language": self._language,
parrot/version.py CHANGED
@@ -3,7 +3,7 @@
3
3
  __title__ = "ai-parrot"
4
4
  __description__ = "Live Chatbots based on Langchain chatbots and Agents \
5
5
  Integrated into Navigator Framework or used into aiohttp applications."
6
- __version__ = "0.3.1"
6
+ __version__ = "0.3.5"
7
7
  __author__ = "Jesus Lara"
8
8
  __author_email__ = "jesuslarag@gmail.com"
9
9
  __license__ = "MIT"
@@ -1,319 +0,0 @@
1
- Metadata-Version: 2.1
2
- Name: ai-parrot
3
- Version: 0.3.1
4
- Summary: Live Chatbots based on Langchain chatbots and Agents Integrated into Navigator Framework or used into aiohttp applications.
5
- Home-page: https://github.com/phenobarbital/ai-parrot
6
- Author: Jesus Lara
7
- Author-email: jesuslara@phenobarbital.info
8
- License: MIT
9
- Project-URL: Source, https://github.com/phenobarbital/ai-parrot
10
- Project-URL: Tracker, https://github.com/phenobarbital/ai-parrot/issues
11
- Project-URL: Documentation, https://github.com/phenobarbital/ai-parrot/
12
- Project-URL: Funding, https://paypal.me/phenobarbital
13
- Project-URL: Say Thanks!, https://saythanks.io/to/phenobarbital
14
- Keywords: asyncio,asyncpg,aioredis,aiomcache,langchain,chatbot,agents
15
- Platform: POSIX
16
- Classifier: Development Status :: 4 - Beta
17
- Classifier: Intended Audience :: Developers
18
- Classifier: Operating System :: POSIX :: Linux
19
- Classifier: Environment :: Web Environment
20
- Classifier: License :: OSI Approved :: MIT License
21
- Classifier: Topic :: Software Development :: Build Tools
22
- Classifier: Topic :: Software Development :: Libraries :: Python Modules
23
- Classifier: Programming Language :: Python :: 3.9
24
- Classifier: Programming Language :: Python :: 3.10
25
- Classifier: Programming Language :: Python :: 3.11
26
- Classifier: Programming Language :: Python :: 3.12
27
- Classifier: Programming Language :: Python :: 3 :: Only
28
- Classifier: Framework :: AsyncIO
29
- Requires-Python: >=3.10.12
30
- Description-Content-Type: text/markdown
31
- License-File: LICENSE
32
- Requires-Dist: Cython ==3.0.9
33
- Requires-Dist: pymupdf ==1.24.4
34
- Requires-Dist: pymupdf4llm ==0.0.1
35
- Requires-Dist: pdf4llm ==0.0.6
36
- Requires-Dist: PyPDF2 ==3.0.1
37
- Requires-Dist: pdfminer.six ==20231228
38
- Requires-Dist: pdfplumber ==0.11.0
39
- Requires-Dist: bitsandbytes ==0.43.0
40
- Requires-Dist: Cartopy ==0.22.0
41
- Requires-Dist: chromadb ==0.4.24
42
- Requires-Dist: contourpy ==1.2.0
43
- Requires-Dist: datasets ==2.18.0
44
- Requires-Dist: faiss-cpu ==1.8.0
45
- Requires-Dist: fastavro ==1.9.4
46
- Requires-Dist: GitPython ==3.1.42
47
- Requires-Dist: gunicorn ==21.2.0
48
- Requires-Dist: jq ==1.7.0
49
- Requires-Dist: rank-bm25 ==0.2.2
50
- Requires-Dist: matplotlib ==3.8.3
51
- Requires-Dist: numba ==0.59.0
52
- Requires-Dist: opentelemetry-sdk ==1.24.0
53
- Requires-Dist: rapidocr-onnxruntime ==1.3.15
54
- Requires-Dist: pytesseract ==0.3.10
55
- Requires-Dist: python-docx ==1.1.0
56
- Requires-Dist: python-pptx ==0.6.23
57
- Requires-Dist: docx2txt ==0.8
58
- Requires-Dist: pytube ==15.0.0
59
- Requires-Dist: pydub ==0.25.1
60
- Requires-Dist: markdownify ==0.12.1
61
- Requires-Dist: librosa ==0.10.1
62
- Requires-Dist: yt-dlp ==2024.4.9
63
- Requires-Dist: moviepy ==1.0.3
64
- Requires-Dist: safetensors ==0.4.2
65
- Requires-Dist: sentence-transformers ==2.6.1
66
- Requires-Dist: tabulate ==0.9.0
67
- Requires-Dist: tiktoken ==0.7.0
68
- Requires-Dist: tokenizers ==0.19.1
69
- Requires-Dist: unstructured ==0.14.3
70
- Requires-Dist: unstructured-client ==0.18.0
71
- Requires-Dist: uvloop ==0.19.0
72
- Requires-Dist: XlsxWriter ==3.2.0
73
- Requires-Dist: youtube-transcript-api ==0.6.2
74
- Requires-Dist: selenium ==4.18.1
75
- Requires-Dist: webdriver-manager ==4.0.1
76
- Requires-Dist: transitions ==0.9.0
77
- Requires-Dist: sentencepiece ==0.2.0
78
- Requires-Dist: duckduckgo-search ==5.3.0
79
- Requires-Dist: google-search-results ==2.4.2
80
- Requires-Dist: google-api-python-client >=2.86.0
81
- Requires-Dist: gdown ==5.1.0
82
- Requires-Dist: weasyprint ==61.2
83
- Requires-Dist: markdown2 ==2.4.13
84
- Requires-Dist: xformers ==0.0.25.post1
85
- Requires-Dist: fastembed ==0.3.4
86
- Requires-Dist: mammoth ==1.7.1
87
- Requires-Dist: accelerate ==0.29.3
88
- Requires-Dist: langchain >=0.2.6
89
- Requires-Dist: langchain-community >=0.2.6
90
- Requires-Dist: langchain-core ==0.2.32
91
- Requires-Dist: langchain-experimental ==0.0.62
92
- Requires-Dist: langchainhub ==0.1.15
93
- Requires-Dist: langchain-text-splitters ==0.2.2
94
- Requires-Dist: huggingface-hub ==0.23.5
95
- Requires-Dist: llama-index ==0.10.20
96
- Requires-Dist: llama-cpp-python ==0.2.56
97
- Requires-Dist: asyncdb[all] >=2.7.10
98
- Requires-Dist: querysource >=3.10.1
99
- Requires-Dist: yfinance ==0.2.40
100
- Requires-Dist: youtube-search ==2.1.2
101
- Requires-Dist: wikipedia ==1.4.0
102
- Requires-Dist: mediawikiapi ==1.2
103
- Requires-Dist: wikibase-rest-api-client ==0.2.0
104
- Requires-Dist: asknews ==0.7.30
105
- Requires-Dist: pyowm ==3.3.0
106
- Requires-Dist: O365 ==2.0.35
107
- Requires-Dist: langchain-huggingface ==0.0.3
108
- Requires-Dist: stackapi ==0.3.1
109
- Provides-Extra: all
110
- Requires-Dist: langchain-milvus ==0.1.1 ; extra == 'all'
111
- Requires-Dist: milvus ==2.3.5 ; extra == 'all'
112
- Requires-Dist: pymilvus ==2.4.4 ; extra == 'all'
113
- Requires-Dist: groq ==0.6.0 ; extra == 'all'
114
- Requires-Dist: langchain-groq ==0.1.4 ; extra == 'all'
115
- Requires-Dist: llama-index-llms-huggingface ==0.2.7 ; extra == 'all'
116
- Requires-Dist: langchain-google-vertexai ==1.0.8 ; extra == 'all'
117
- Requires-Dist: langchain-google-genai ==1.0.8 ; extra == 'all'
118
- Requires-Dist: google-generativeai ==0.7.2 ; extra == 'all'
119
- Requires-Dist: vertexai ==1.60.0 ; extra == 'all'
120
- Requires-Dist: google-cloud-aiplatform >=1.60.0 ; extra == 'all'
121
- Requires-Dist: grpc-google-iam-v1 ==0.13.0 ; extra == 'all'
122
- Requires-Dist: langchain-openai ==0.1.21 ; extra == 'all'
123
- Requires-Dist: openai ==1.40.8 ; extra == 'all'
124
- Requires-Dist: llama-index-llms-openai ==0.1.11 ; extra == 'all'
125
- Requires-Dist: langchain-anthropic ==0.1.23 ; extra == 'all'
126
- Requires-Dist: anthropic ==0.34.0 ; extra == 'all'
127
- Provides-Extra: analytics
128
- Requires-Dist: annoy ==1.17.3 ; extra == 'analytics'
129
- Requires-Dist: gradio-tools ==0.0.9 ; extra == 'analytics'
130
- Requires-Dist: gradio-client ==0.2.9 ; extra == 'analytics'
131
- Requires-Dist: streamlit ==1.37.1 ; extra == 'analytics'
132
- Requires-Dist: simsimd ==4.3.1 ; extra == 'analytics'
133
- Requires-Dist: opencv-python ==4.10.0.84 ; extra == 'analytics'
134
- Provides-Extra: anthropic
135
- Requires-Dist: langchain-anthropic ==0.1.11 ; extra == 'anthropic'
136
- Requires-Dist: anthropic ==0.25.2 ; extra == 'anthropic'
137
- Provides-Extra: crew
138
- Requires-Dist: colbert-ai ==0.2.19 ; extra == 'crew'
139
- Requires-Dist: vanna ==0.3.4 ; extra == 'crew'
140
- Requires-Dist: crewai[tools] ==0.28.8 ; extra == 'crew'
141
- Provides-Extra: google
142
- Requires-Dist: langchain-google-vertexai ==1.0.4 ; extra == 'google'
143
- Requires-Dist: langchain-google-genai ==1.0.4 ; extra == 'google'
144
- Requires-Dist: google-generativeai ==0.5.4 ; extra == 'google'
145
- Requires-Dist: vertexai ==1.49.0 ; extra == 'google'
146
- Requires-Dist: google-cloud-aiplatform ==1.49.0 ; extra == 'google'
147
- Requires-Dist: grpc-google-iam-v1 ==0.13.0 ; extra == 'google'
148
- Provides-Extra: groq
149
- Requires-Dist: groq ==0.6.0 ; extra == 'groq'
150
- Requires-Dist: langchain-groq ==0.1.4 ; extra == 'groq'
151
- Provides-Extra: hunggingfaces
152
- Requires-Dist: llama-index-llms-huggingface ==0.2.7 ; extra == 'hunggingfaces'
153
- Provides-Extra: milvus
154
- Requires-Dist: langchain-milvus ==0.1.1 ; extra == 'milvus'
155
- Requires-Dist: milvus ==2.3.5 ; extra == 'milvus'
156
- Requires-Dist: pymilvus ==2.4.4 ; extra == 'milvus'
157
- Provides-Extra: openai
158
- Requires-Dist: langchain-openai ==0.1.21 ; extra == 'openai'
159
- Requires-Dist: openai ==1.40.3 ; extra == 'openai'
160
- Requires-Dist: llama-index-llms-openai ==0.1.11 ; extra == 'openai'
161
- Requires-Dist: tiktoken ==0.7.0 ; extra == 'openai'
162
- Provides-Extra: qdrant
163
- Requires-Dist: qdrant-client ==1.8.0 ; extra == 'qdrant'
164
-
165
- # AI Parrot: Python package for creating Chatbots
166
- AI Parrot is an open-source Python package for building chatbots based on Langchain and the Navigator framework.
167
- This README provides instructions for installation, development, testing, and releasing Parrot.
168
-
169
- ## Installation
170
-
171
- **Creating a virtual environment:**
172
-
173
- This is recommended for development and isolation from system-wide libraries.
174
- Run the following commands in your terminal:
175
-
176
- On Debian-based systems, install the system dependencies first:
177
- ```
178
- sudo apt install gcc python3.11-venv python3.11-full python3.11-dev libmemcached-dev zlib1g-dev build-essential libffi-dev unixodbc unixodbc-dev libsqliteodbc libev4 libev-dev
179
- ```
180
-
181
- For Qdrant installation:
182
- ```
183
- docker pull qdrant/qdrant
184
- docker run -d -p 6333:6333 -p 6334:6334 --name qdrant -v $(pwd)/qdrant_storage:/qdrant/storage:z qdrant/qdrant
185
- ```
186
-
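Once the container is up, you can verify that the Qdrant REST endpoint is reachable before pointing anything at it. A minimal stdlib-only sketch, assuming the default port mapping from the `docker run` command above (the helper names are illustrative, not part of ai-parrot):

```python
from urllib.request import urlopen

def qdrant_url(host: str = "localhost", port: int = 6333) -> str:
    # 6333 is Qdrant's REST port, matching the -p 6333:6333 mapping above.
    return f"http://{host}:{port}"

def qdrant_is_up(base_url: str, timeout: float = 2.0) -> bool:
    # /collections is a lightweight GET endpoint; any connection error
    # (refused, timeout, DNS) is reported as "not up".
    try:
        with urlopen(f"{base_url}/collections", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        return False
```

Call `qdrant_is_up(qdrant_url())` after starting the container; a `False` result usually means the container is still starting or the port mapping differs.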
187
- For VertexAI, create a folder inside "env" called "google" and copy the JSON credentials file into it.
188
-
189
- ```bash
190
- make venv
191
- ```
192
-
193
- This will create a virtual environment named `.venv`. To activate it, run:
194
-
195
- ```bash
196
- source .venv/bin/activate # Linux/macOS
197
- ```
198
-
199
- Once activated, install Parrot within the virtual environment:
200
-
201
- ```bash
202
- make install
203
- ```
204
- The output will remind you to activate the virtual environment before development.
205
-
206
- **Optional** (for developers):
207
- ```bash
208
- pip install -e .
209
- ```
210
-
211
- ## Start the HTTP server
212
- ```bash
213
- python run.py
214
- ```
215
-
216
- ## Development Setup
217
-
218
- This section explains how to set up your development environment:
219
-
220
- 1. **Install development requirements:**
221
-
222
- ```bash
223
- make setup
224
- ```
225
-
226
- This installs development dependencies, such as linters and test runners, listed in the `docs/requirements-dev.txt` file.
227
-
228
- 2. **Install Parrot in editable mode:**
229
-
230
- This allows you to make changes to the code and test them without reinstalling:
231
-
232
- ```bash
233
- make dev
234
- ```
235
-
236
- This uses `flit` to install Parrot in editable mode.
237
-
238
-
239
- ### Usage (Replace with actual usage instructions)
240
-
241
- *Once you have set up your development environment, you can start using Parrot.*
242
-
243
- #### Test with Code ChatBOT
244
- * Set the following environment variables:
- ```
- [google]
- GOOGLE_API_KEY=apikey
- GOOGLE_CREDENTIALS_FILE=.../credentials.json
- VERTEX_PROJECT_ID=vertex-project
- VERTEX_REGION=region
- ```
250
-
251
- * Run the chatbot:
252
- ```bash
253
- python examples/test_agent.py
254
- ```
255
-
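The variables above can be read at startup with the standard library alone. A minimal sketch; the helper function is hypothetical and not part of ai-parrot, only the variable names come from the list above:

```python
import os

def load_vertex_settings() -> dict:
    # Collects the Google/Vertex settings documented above.
    # Raises KeyError if any required variable is missing, which
    # surfaces misconfiguration before the agent starts.
    return {
        "api_key": os.environ["GOOGLE_API_KEY"],
        "credentials_file": os.environ["GOOGLE_CREDENTIALS_FILE"],
        "project_id": os.environ["VERTEX_PROJECT_ID"],
        "region": os.environ["VERTEX_REGION"],
    }
```
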
256
- ### Testing
257
-
258
- To run the test suite:
259
-
260
- ```bash
261
- make test
262
- ```
263
-
264
- This will run tests using `coverage` to report on code coverage.
265
-
266
-
267
- ### Code Formatting
268
-
269
- To format the code with black:
270
-
271
- ```bash
272
- make format
273
- ```
274
-
275
-
276
- ### Linting
277
-
278
- To lint the code for style and potential errors:
279
-
280
- ```bash
281
- make lint
282
- ```
283
-
284
- This uses `pylint` and `black` to check for issues.
285
-
286
-
287
- ### Releasing a New Version
288
-
289
- This section outlines the steps for releasing a new version of Parrot:
290
-
291
- 1. **Ensure everything is clean and tested:**
292
-
293
- ```bash
294
- make release
295
- ```
296
-
297
- This runs `lint`, `test`, and `clean` tasks before proceeding.
298
-
299
- 2. **Publish the package:**
300
-
301
- ```bash
302
- make release
303
- ```
304
-
305
- This uses `flit` to publish the package to a repository like PyPI. You'll need to have publishing credentials configured for `flit`.
306
-
307
-
308
- ### Cleaning Up
309
-
310
- To remove the virtual environment:
311
-
312
- ```bash
313
- make distclean
314
- ```
315
-
316
-
317
- ### Contributing
318
-
319
- We welcome contributions to Parrot! Please refer to the CONTRIBUTING.md file for guidelines on how to contribute.