ai-parrot 0.3.1__cp311-cp311-manylinux_2_28_x86_64.whl → 0.3.5__cp311-cp311-manylinux_2_28_x86_64.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Potentially problematic release: this version of ai-parrot might be problematic.

@@ -0,0 +1,300 @@
1
+ Metadata-Version: 2.1
2
+ Name: ai-parrot
3
+ Version: 0.3.5
4
+ Summary: Live Chatbots based on Langchain chatbots and Agents Integrated into Navigator Framework or used into aiohttp applications.
5
+ Home-page: https://github.com/phenobarbital/ai-parrot
6
+ Author: Jesus Lara
7
+ Author-email: jesuslara@phenobarbital.info
8
+ License: MIT
9
+ Project-URL: Source, https://github.com/phenobarbital/ai-parrot
10
+ Project-URL: Tracker, https://github.com/phenobarbital/ai-parrot/issues
11
+ Project-URL: Documentation, https://github.com/phenobarbital/ai-parrot/
12
+ Project-URL: Funding, https://paypal.me/phenobarbital
13
+ Project-URL: Say Thanks!, https://saythanks.io/to/phenobarbital
14
+ Keywords: asyncio,asyncpg,aioredis,aiomcache,langchain,chatbot,agents
15
+ Platform: POSIX
16
+ Classifier: Development Status :: 4 - Beta
17
+ Classifier: Intended Audience :: Developers
18
+ Classifier: Operating System :: POSIX :: Linux
19
+ Classifier: Environment :: Web Environment
20
+ Classifier: License :: OSI Approved :: MIT License
21
+ Classifier: Topic :: Software Development :: Build Tools
22
+ Classifier: Topic :: Software Development :: Libraries :: Python Modules
23
+ Classifier: Programming Language :: Python :: 3.9
24
+ Classifier: Programming Language :: Python :: 3.10
25
+ Classifier: Programming Language :: Python :: 3.11
26
+ Classifier: Programming Language :: Python :: 3.12
27
+ Classifier: Programming Language :: Python :: 3 :: Only
28
+ Classifier: Framework :: AsyncIO
29
+ Requires-Python: >=3.9.20
30
+ Description-Content-Type: text/markdown
31
+ License-File: LICENSE
32
+ Requires-Dist: Cython==3.0.11
33
+ Requires-Dist: accelerate==0.34.2
34
+ Requires-Dist: langchain>=0.2.6
35
+ Requires-Dist: langchain-community>=0.2.6
36
+ Requires-Dist: langchain-core>=0.2.32
37
+ Requires-Dist: langchain-experimental==0.0.62
38
+ Requires-Dist: langchainhub==0.1.15
39
+ Requires-Dist: langchain-text-splitters==0.2.2
40
+ Requires-Dist: langchain-huggingface==0.0.3
41
+ Requires-Dist: huggingface-hub==0.23.5
42
+ Requires-Dist: llama-index==0.10.20
43
+ Requires-Dist: llama-cpp-python==0.2.56
44
+ Requires-Dist: bitsandbytes==0.43.3
45
+ Requires-Dist: Cartopy==0.22.0
46
+ Requires-Dist: chromadb==0.4.24
47
+ Requires-Dist: datasets==2.18.0
48
+ Requires-Dist: faiss-cpu==1.8.0
49
+ Requires-Dist: fastavro==1.9.4
50
+ Requires-Dist: gunicorn==21.2.0
51
+ Requires-Dist: jq==1.7.0
52
+ Requires-Dist: rank-bm25==0.2.2
53
+ Requires-Dist: matplotlib==3.8.3
54
+ Requires-Dist: numba==0.59.0
55
+ Requires-Dist: querysource>=3.12.10
56
+ Requires-Dist: safetensors>=0.4.3
57
+ Requires-Dist: sentence-transformers==3.0.1
58
+ Requires-Dist: tabulate==0.9.0
59
+ Requires-Dist: tiktoken==0.7.0
60
+ Requires-Dist: tokenizers==0.19.1
61
+ Requires-Dist: selenium>=4.18.1
62
+ Requires-Dist: webdriver-manager>=4.0.1
63
+ Requires-Dist: transitions==0.9.0
64
+ Requires-Dist: sentencepiece==0.2.0
65
+ Requires-Dist: duckduckgo-search==5.3.0
66
+ Requires-Dist: google-search-results==2.4.2
67
+ Requires-Dist: google-api-python-client>=2.86.0
68
+ Requires-Dist: gdown==5.1.0
69
+ Requires-Dist: weasyprint==61.2
70
+ Requires-Dist: markdown2==2.4.13
71
+ Requires-Dist: fastembed==0.3.4
72
+ Requires-Dist: yfinance==0.2.40
73
+ Requires-Dist: youtube-search==2.1.2
74
+ Requires-Dist: wikipedia==1.4.0
75
+ Requires-Dist: mediawikiapi==1.2
76
+ Requires-Dist: pyowm==3.3.0
77
+ Requires-Dist: O365==2.0.35
78
+ Requires-Dist: stackapi==0.3.1
79
+ Requires-Dist: torchvision==0.19.1
80
+ Requires-Dist: tf-keras==2.17.0
81
+ Provides-Extra: analytics
82
+ Requires-Dist: annoy==1.17.3; extra == "analytics"
83
+ Requires-Dist: gradio-tools==0.0.9; extra == "analytics"
84
+ Requires-Dist: gradio-client==0.2.9; extra == "analytics"
85
+ Requires-Dist: streamlit==1.37.1; extra == "analytics"
86
+ Requires-Dist: simsimd==4.3.1; extra == "analytics"
87
+ Requires-Dist: opencv-python==4.10.0.84; extra == "analytics"
88
+ Provides-Extra: anthropic
89
+ Requires-Dist: langchain-anthropic==0.1.11; extra == "anthropic"
90
+ Requires-Dist: anthropic==0.25.2; extra == "anthropic"
91
+ Provides-Extra: crew
92
+ Requires-Dist: colbert-ai==0.2.19; extra == "crew"
93
+ Requires-Dist: vanna==0.3.4; extra == "crew"
94
+ Requires-Dist: crewai[tools]==0.28.8; extra == "crew"
95
+ Provides-Extra: google
96
+ Requires-Dist: langchain-google-vertexai==1.0.10; extra == "google"
97
+ Requires-Dist: langchain-google-genai==1.0.10; extra == "google"
98
+ Requires-Dist: vertexai==1.65.0; extra == "google"
99
+ Provides-Extra: groq
100
+ Requires-Dist: groq==0.11.0; extra == "groq"
101
+ Requires-Dist: langchain-groq==0.1.9; extra == "groq"
102
+ Provides-Extra: hunggingfaces
103
+ Requires-Dist: llama-index-llms-huggingface==0.2.7; extra == "hunggingfaces"
104
+ Provides-Extra: loaders
105
+ Requires-Dist: unstructured==0.14.3; extra == "loaders"
106
+ Requires-Dist: unstructured-client==0.18.0; extra == "loaders"
107
+ Requires-Dist: youtube-transcript-api==0.6.2; extra == "loaders"
108
+ Requires-Dist: pymupdf==1.24.4; extra == "loaders"
109
+ Requires-Dist: pymupdf4llm==0.0.1; extra == "loaders"
110
+ Requires-Dist: pdf4llm==0.0.6; extra == "loaders"
111
+ Requires-Dist: PyPDF2==3.0.1; extra == "loaders"
112
+ Requires-Dist: pdfminer.six==20231228; extra == "loaders"
113
+ Requires-Dist: pdfplumber==0.11.0; extra == "loaders"
114
+ Requires-Dist: GitPython==3.1.42; extra == "loaders"
115
+ Requires-Dist: opentelemetry-sdk==1.24.0; extra == "loaders"
116
+ Requires-Dist: rapidocr-onnxruntime==1.3.15; extra == "loaders"
117
+ Requires-Dist: pytesseract==0.3.10; extra == "loaders"
118
+ Requires-Dist: python-docx==1.1.0; extra == "loaders"
119
+ Requires-Dist: python-pptx==0.6.23; extra == "loaders"
120
+ Requires-Dist: docx2txt==0.8; extra == "loaders"
121
+ Requires-Dist: pytube==15.0.0; extra == "loaders"
122
+ Requires-Dist: pydub==0.25.1; extra == "loaders"
123
+ Requires-Dist: markdownify==0.12.1; extra == "loaders"
124
+ Requires-Dist: yt-dlp==2024.4.9; extra == "loaders"
125
+ Requires-Dist: moviepy==1.0.3; extra == "loaders"
126
+ Requires-Dist: mammoth==1.7.1; extra == "loaders"
127
+ Requires-Dist: paddlepaddle==2.6.1; extra == "loaders"
128
+ Requires-Dist: paddlepaddle-gpu==2.6.1; extra == "loaders"
129
+ Requires-Dist: paddleocr==2.8.1; extra == "loaders"
130
+ Requires-Dist: ftfy==6.2.3; extra == "loaders"
131
+ Requires-Dist: librosa==0.10.1; extra == "loaders"
132
+ Requires-Dist: XlsxWriter==3.2.0; extra == "loaders"
133
+ Requires-Dist: timm==1.0.9; extra == "loaders"
134
+ Provides-Extra: milvus
135
+ Requires-Dist: langchain-milvus>=0.1.4; extra == "milvus"
136
+ Requires-Dist: milvus==2.3.5; extra == "milvus"
137
+ Requires-Dist: pymilvus==2.4.6; extra == "milvus"
138
+ Provides-Extra: openai
139
+ Requires-Dist: langchain-openai==0.1.21; extra == "openai"
140
+ Requires-Dist: openai==1.40.3; extra == "openai"
141
+ Requires-Dist: llama-index-llms-openai==0.1.11; extra == "openai"
142
+ Requires-Dist: tiktoken==0.7.0; extra == "openai"
143
+ Provides-Extra: qdrant
144
+ Requires-Dist: qdrant-client==1.8.0; extra == "qdrant"
145
+
146
+ # AI Parrot: Python package for creating Chatbots
147
+ This is an open-source Python package for creating Chatbots based on Langchain and Navigator.
148
+ This README provides instructions for installation, development, testing, and releasing Parrot.
149
+
150
+ ## Installation
151
+
152
+ **Creating a virtual environment:**
153
+
154
+ This is recommended for development and isolation from system-wide libraries.
155
+ Run the following command in your terminal:
156
+
157
+ On Debian-based systems, install the required system packages:
158
+ ```
159
+ sudo apt install gcc python3.11-venv python3.11-full python3.11-dev libmemcached-dev zlib1g-dev build-essential libffi-dev unixodbc unixodbc-dev libsqliteodbc libev4 libev-dev
160
+ ```
161
+
162
+ For Qdrant installation:
163
+ ```
164
+ docker pull qdrant/qdrant
165
+ docker run -d -p 6333:6333 -p 6334:6334 --name qdrant -v $(pwd)/qdrant_storage:/qdrant/storage:z qdrant/qdrant
166
+ ```
167
+
168
+ For VertexAI, create a folder named "google" under "env" and copy the JSON credentials file into it.
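A sketch of how that credentials file can be wired up in Python before any Google client is created. The `env/google` folder follows the note above; the file name `credentials.json` is an assumption, and `GOOGLE_APPLICATION_CREDENTIALS` is the standard variable read by Google client libraries:

```python
import os
from pathlib import Path

# Path follows the note above: a "google" folder inside "env".
# The exact file name is hypothetical.
creds = Path("env") / "google" / "credentials.json"

# Google client libraries pick credentials up from this standard variable.
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = str(creds)
print(os.environ["GOOGLE_APPLICATION_CREDENTIALS"])
```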
169
+
170
+ ```bash
171
+ make venv
172
+ ```
173
+
174
+ This will create a virtual environment named `.venv`. To activate it, run:
175
+
176
+ ```bash
177
+ source .venv/bin/activate # Linux/macOS
178
+ ```
179
+
180
+ Once activated, install Parrot within the virtual environment:
181
+
182
+ ```bash
183
+ make install
184
+ ```
185
+ The output will remind you to activate the virtual environment before development.
186
+
187
+ **Optional** (for developers):
188
+ ```bash
189
+ pip install -e .
190
+ ```
191
+
192
+ ## Start the HTTP server
193
+ ```bash
194
+ python run.py
195
+ ```
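`run.py` itself is not included in the diff; as a rough sketch, an aiohttp application of the kind it might start could look like the following. The route and handler here are hypothetical, not the package's actual entry point:

```python
from aiohttp import web

async def handle(request) -> web.Response:
    # Placeholder handler; the real app would wire up chatbot routes.
    return web.Response(text="parrot is running")

app = web.Application()
app.router.add_get("/", handle)

# To serve, call: web.run_app(app)  (listens on localhost:8080 by default)
```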
196
+
197
+ ## Development Setup
198
+
199
+ This section explains how to set up your development environment:
200
+
201
+ 1. **Install development requirements:**
202
+
203
+ ```bash
204
+ make setup
205
+ ```
206
+
207
+ This installs development dependencies like linters and test runners mentioned in the `docs/requirements-dev.txt` file.
208
+
209
+ 2. **Install Parrot in editable mode:**
210
+
211
+ This allows you to make changes to the code and test them without reinstalling:
212
+
213
+ ```bash
214
+ make dev
215
+ ```
216
+
217
+ This uses `flit` to install Parrot in editable mode.
218
+
219
+
220
+ ### Usage
221
+
222
+ *Once you have set up your development environment, you can start using Parrot.*
223
+
224
+ #### Test with Code ChatBOT
225
+ * Set the following environment variables (the `[google]` section of your configuration):
226
+ [google]
227
+ GOOGLE_API_KEY=apikey
228
+ GOOGLE_CREDENTIALS_FILE=.../credentials.json
229
+ VERTEX_PROJECT_ID=vertex-project
230
+ VERTEX_REGION=region
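The block above is INI-style; a minimal sketch of reading it with the standard-library `configparser` (the section and key names mirror the snippet above, and a literal string stands in for the real configuration file):

```python
from configparser import ConfigParser

# Mirrors the [google] block above; values are placeholders.
raw = """
[google]
GOOGLE_API_KEY=apikey
VERTEX_PROJECT_ID=vertex-project
VERTEX_REGION=region
"""

config = ConfigParser()
config.read_string(raw)

# Option names are case-insensitive in configparser.
api_key = config["google"]["GOOGLE_API_KEY"]
region = config["google"]["VERTEX_REGION"]
print(api_key, region)
```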
231
+
232
+ * Run the chatbot:
233
+ ```bash
234
+ python examples/test_agent.py
235
+ ```
236
+
237
+ ### Testing
238
+
239
+ To run the test suite:
240
+
241
+ ```bash
242
+ make test
243
+ ```
244
+
245
+ This will run tests using `coverage` to report on code coverage.
246
+
247
+
248
+ ### Code Formatting
249
+
250
+ To format the code with black:
251
+
252
+ ```bash
253
+ make format
254
+ ```
255
+
256
+
257
+ ### Linting
258
+
259
+ To lint the code for style and potential errors:
260
+
261
+ ```bash
262
+ make lint
263
+ ```
264
+
265
+ This uses `pylint` and `black` to check for issues.
266
+
267
+
268
+ ### Releasing a New Version
269
+
270
+ This section outlines the steps for releasing a new version of Parrot:
271
+
272
+ To release a new version, run:
+ 
+ ```bash
+ make release
+ ```
+ 
+ This runs the `lint`, `test`, and `clean` tasks and then uses `flit` to publish the package to a repository like PyPI. You'll need to have publishing credentials configured for `flit`.
287
+
288
+
289
+ ### Cleaning Up
290
+
291
+ To remove the virtual environment:
292
+
293
+ ```bash
294
+ make distclean
295
+ ```
296
+
297
+
298
+ ### Contributing
299
+
300
+ We welcome contributions to Parrot! Please refer to the CONTRIBUTING.md file for guidelines on how to contribute.
@@ -1,10 +1,10 @@
1
1
  parrot/__init__.py,sha256=eTkAkHeJ5BBDG2fxrXA4M37ODBJoS1DQYpeBAWL2xeI,387
2
2
  parrot/conf.py,sha256=-9bVGC7Rf-6wpIg6-ojvU4S_G1wBLUCVDt46KEGHEhM,4257
3
- parrot/exceptions.cpython-311-x86_64-linux-gnu.so,sha256=gDwsnUlOlwphVU97XaqG5e7BJs_PWPKdwgwDsjyVZIg,361200
3
+ parrot/exceptions.cpython-311-x86_64-linux-gnu.so,sha256=VNyBh3uLxGQgB0l1bkWjQDqYUN2ZAvRmV12AqQijV9Q,361184
4
4
  parrot/manager.py,sha256=NhzXoWxSgtoWHpmYP8cV2Ujq_SlvCbQYQBaohAeL2TM,5935
5
5
  parrot/models.py,sha256=RsVQCqhSXBKRPcu-BCga9Y1wyvENFXDCuq3_ObIKvAo,13452
6
6
  parrot/py.typed,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
7
- parrot/version.py,sha256=4J1UyyW-XSClqr_4-Z-C1QFQ9XMZ0LjzaF_4UIlbgic,373
7
+ parrot/version.py,sha256=RknhCGT72EptwSfSvr4rE9_fulip0_-gBlla3iuIIi4,373
8
8
  parrot/chatbots/__init__.py,sha256=ypskCnME0xUv6psBEGCEyXCrD0J0ULHSllpVmSxqb4A,200
9
9
  parrot/chatbots/abstract.py,sha256=CmDn3k4r9uKImOZRN4L9zxLbCdC-1MPUAorDlfZT-kA,26421
10
10
  parrot/chatbots/asktroc.py,sha256=gyWzyvpAnmXwXd-3DEKoIJtAxt6NnP5mUZdZbkFky8s,604
@@ -56,7 +56,7 @@ parrot/loaders/excel.py,sha256=Y1agxm-jG4AgsA2wlPP3p8uBH40wYW1KM2ycTTLKUm4,12441
56
56
  parrot/loaders/github.py,sha256=CscyUIqoHTytqCbRUUTcV3QSxI8XoDntq5aTU0vdhzQ,2593
57
57
  parrot/loaders/image.py,sha256=A9KCXXoGuhDoyeJaascY7Q1ZK12Kf1ggE1drzJjS3AU,3946
58
58
  parrot/loaders/json.py,sha256=6B43k591OpvoJLbsJa8CxJue_lAt713SCdldn8bFW3c,1481
59
- parrot/loaders/pdf.py,sha256=flGlUf9dLAD2Uh8MkvLP27OU1nvroeHU2HM5a3rBH3M,7996
59
+ parrot/loaders/pdf.py,sha256=nyeT4emrewxeO2dUQxW3QOcdk1vg1JYtPKNAV8tThm0,17512
60
60
  parrot/loaders/pdfchapters.py,sha256=YhA8Cdx3qXBR0vuTVnQ12XgH1DXT_rp1Tawzh4V2U3o,5637
61
61
  parrot/loaders/pdffn.py,sha256=gA-vJEWUiIUwbMxP8Nmvlzlcb39DVV69vGKtSzavUoI,4004
62
62
  parrot/loaders/pdfimages.py,sha256=4Q_HKiAee_hALBsG2qF7PpMgKP1AivHXhmcsCkUa9eE,7899
@@ -68,7 +68,7 @@ parrot/loaders/repo.py,sha256=vBqBAnwU6p3_DCvI9DVhi1Bs8iCDYHwFGp0P9zvGRyw,3737
68
68
  parrot/loaders/rtd.py,sha256=oKOC9Qn3iwulYx5BEvXy4_kusKRsy5RLYNHS-e5p-1k,1988
69
69
  parrot/loaders/txt.py,sha256=AeGroWffFT--7TYlTSTr5Av5zAr8YIp1fxt8r5qdi-A,2802
70
70
  parrot/loaders/video.py,sha256=pl5Ho69bp5vrWMqg5tLbsnHUus1LByTDoL6NPk57Ays,2929
71
- parrot/loaders/videolocal.py,sha256=3EASzbettSO2tboTe3GndR4p6Nihwj6HGZoiPXekYo0,4302
71
+ parrot/loaders/videolocal.py,sha256=VBjtDIZ7CxkmgISXNr2Nc68MHa9-57qQr0uSxLIsAOU,4326
72
72
  parrot/loaders/vimeo.py,sha256=zOvOOIiaZr_bRswJFI7uIMKISgALOxcSim9ZRUFY1Fc,4114
73
73
  parrot/loaders/web.py,sha256=3x06JNpfTGFtvSBPAEBVoVdZkpVXePcJeMtj61B2xJk,8867
74
74
  parrot/loaders/web_base.py,sha256=5SjQddT0Vhne8C9s30iU3Ex_9O1PJ8kyDmy8EdhGBo0,4380
@@ -94,17 +94,17 @@ parrot/tools/wikipedia.py,sha256=oadBTRAupu2dKThEORSHqqVs4u0G9lWOltbP6vSZgPE,199
94
94
  parrot/tools/zipcode.py,sha256=knScSvKgK7bHxyLcBkZFiMs65e-PlYU2_YhG6mohcjU,6440
95
95
  parrot/utils/__init__.py,sha256=vkBIvfl9-0NRLd76MIZk4s49PjtF_dW5imLTv_UOKxM,101
96
96
  parrot/utils/toml.py,sha256=CVyqDdAEyOj6AHfNpyQe4IUvLP_SSXlbHROYPeadLvU,302
97
- parrot/utils/types.cpython-311-x86_64-linux-gnu.so,sha256=kdox48-JUzj92QP6amGOCTIEQhrBUMn6qzrhX1u17CY,791912
97
+ parrot/utils/types.cpython-311-x86_64-linux-gnu.so,sha256=jghuq8bBlgGDjkb88Efi5l9cgR5KZL_qO7yxglGNsTA,791256
98
98
  parrot/utils/uv.py,sha256=Mb09bsi13hhi3xQDBjEhCf-U1wherXl-K4-BLcSvqtc,308
99
99
  parrot/utils/parsers/__init__.py,sha256=l82uIu07QvSJ8Xt0d_seei9n7UUL8PE-YFGBTyNbxSI,62
100
- parrot/utils/parsers/toml.cpython-311-x86_64-linux-gnu.so,sha256=vdQTxL4AyxinDpoDVk0Syx-ycDL02OmXESJOtiVFl0A,451056
100
+ parrot/utils/parsers/toml.cpython-311-x86_64-linux-gnu.so,sha256=gEnv6QGF6DtxExEdVTdNx48j90wPYKBLyCH1UCRj4MQ,451088
101
101
  resources/users/__init__.py,sha256=sdXUV7h0Oogcdru1RrQxbm9_RcMjftf0zTWqvxBVpO8,151
102
102
  resources/users/handlers.py,sha256=BGzqBvPY_OaIF_nONWX4b_B5OyyBrdGuSihIsdlFwjk,291
103
103
  resources/users/models.py,sha256=glk7Emv7QCi6i32xRFDrGc8UwK23_LPg0XUOJoHnwRU,6799
104
104
  settings/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
105
105
  settings/settings.py,sha256=9ueEvyLNurUX-AaIeRPV8GKX1c4YjDLbksUAeqEq6Ck,1854
106
- ai_parrot-0.3.1.dist-info/LICENSE,sha256=vRKOoa7onTsLNvSzJtGtMaNhWWh8B3YAT733Tlu6M4o,1070
107
- ai_parrot-0.3.1.dist-info/METADATA,sha256=rtYKLZ9cdUv1OELsaBCbmadVUKnx14fMUraNIp5EbD0,10410
108
- ai_parrot-0.3.1.dist-info/WHEEL,sha256=tFO7F0mawMNWa_NzTDA1ygqZBeMykVNIr04O5Zxk1TE,113
109
- ai_parrot-0.3.1.dist-info/top_level.txt,sha256=qHoO4BhYDfeTkyKnciZSQtn5FSLN3Q-P5xCTkyvbuxg,26
110
- ai_parrot-0.3.1.dist-info/RECORD,,
106
+ ai_parrot-0.3.5.dist-info/LICENSE,sha256=vRKOoa7onTsLNvSzJtGtMaNhWWh8B3YAT733Tlu6M4o,1070
107
+ ai_parrot-0.3.5.dist-info/METADATA,sha256=G19tFikgbnhRqltqs2_OulmbuqdrA4Gwp2wW_dx5URk,9721
108
+ ai_parrot-0.3.5.dist-info/WHEEL,sha256=UQ-0qXN3LQUffjrV43_e_ZXj2pgORBqTmXipnkj0E8I,113
109
+ ai_parrot-0.3.5.dist-info/top_level.txt,sha256=qHoO4BhYDfeTkyKnciZSQtn5FSLN3Q-P5xCTkyvbuxg,26
110
+ ai_parrot-0.3.5.dist-info/RECORD,,
@@ -1,5 +1,5 @@
1
1
  Wheel-Version: 1.0
2
- Generator: setuptools (72.2.0)
2
+ Generator: setuptools (74.1.2)
3
3
  Root-Is-Purelib: false
4
4
  Tag: cp311-cp311-manylinux_2_28_x86_64
5
5
 
parrot/loaders/pdf.py CHANGED
@@ -2,10 +2,26 @@ from collections.abc import Callable
2
2
  from pathlib import Path, PurePath
3
3
  from typing import Any
4
4
  from io import BytesIO
5
+ import re
6
+ import ftfy
5
7
  import fitz
6
8
  import pytesseract
9
+ from paddleocr import PaddleOCR
10
+ import torch
11
+ import cv2
12
+ from transformers import (
13
+ # DonutProcessor,
14
+ # VisionEncoderDecoderModel,
15
+ # VisionEncoderDecoderConfig,
16
+ # ViTImageProcessor,
17
+ # AutoTokenizer,
18
+ LayoutLMv3ForTokenClassification,
19
+ LayoutLMv3Processor
20
+ )
21
+ from pdf4llm import to_markdown
7
22
  from PIL import Image
8
23
  from langchain.docstore.document import Document
24
+ from navconfig import config
9
25
  from .basepdf import BasePDF
10
26
 
11
27
 
@@ -31,6 +47,21 @@ class PDFLoader(BasePDF):
31
47
  **kwargs
32
48
  )
33
49
  self.parse_images = kwargs.get('parse_images', False)
50
+ self.page_as_images = kwargs.get('page_as_images', False)
51
+ if self.page_as_images is True:
52
+ # Load the processor and model from Hugging Face
53
+ self.image_processor = LayoutLMv3Processor.from_pretrained(
54
+ "microsoft/layoutlmv3-base",
55
+ apply_ocr=True
56
+ )
57
+ self.image_model = LayoutLMv3ForTokenClassification.from_pretrained(
58
+ # "microsoft/layoutlmv3-base-finetuned-funsd"
59
+ "HYPJUDY/layoutlmv3-base-finetuned-funsd"
60
+ )
61
+ # Set device to GPU if available
62
+ self.image_device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
63
+ self.image_model.to(self.image_device)
64
+
34
65
  # Table Settings:
35
66
  self.table_settings = {
36
67
  #"vertical_strategy": "text",
@@ -42,6 +73,134 @@ class PDFLoader(BasePDF):
42
73
  if table_settings:
43
74
  self.table_settings.update(table_settings)
44
75
 
76
+ def explain_image(self, image_path):
77
+ """Function to explain the image."""
78
+ # with open(image_path, "rb") as image_file:
79
+ # image_content = image_file.read()
80
+
81
+ # Open the image
82
+ image = cv2.imread(image_path)
83
+ task_prompt = "<s_docvqa><s_question>{user_input}</s_question><s_answer>"
84
+ question = "Extract Questions about Happily Greet"
85
+ prompt = task_prompt.replace("{user_input}", question)
86
+
87
+ decoder_input_ids = self.image_processor.tokenizer(
88
+ prompt,
89
+ add_special_tokens=False,
90
+ return_tensors="pt",
91
+ ).input_ids
92
+
93
+ pixel_values = self.image_processor(
94
+ image,
95
+ return_tensors="pt"
96
+ ).pixel_values
97
+
98
+ # Send inputs to the appropriate device
99
+ pixel_values = pixel_values.to(self.image_device)
100
+ decoder_input_ids = decoder_input_ids.to(self.image_device)
101
+
102
+ outputs = self.image_model.generate(
103
+ pixel_values,
104
+ decoder_input_ids=decoder_input_ids,
105
+ max_length=self.image_model.decoder.config.max_position_embeddings,
106
+ pad_token_id=self.image_processor.tokenizer.pad_token_id,
107
+ eos_token_id=self.image_processor.tokenizer.eos_token_id,
108
+ bad_words_ids=[[self.image_processor.tokenizer.unk_token_id]],
109
+ # use_cache=True
110
+ return_dict_in_generate=True,
111
+ )
112
+
113
+ sequence = self.image_processor.batch_decode(outputs.sequences)[0]
114
+
115
+
116
+ sequence = sequence.replace(
117
+ self.image_processor.tokenizer.eos_token, ""
118
+ ).replace(
119
+ self.image_processor.tokenizer.pad_token, ""
120
+ )
121
+ # remove first task start token
122
+ sequence = re.sub(r"<.*?>", "", sequence, count=1).strip()
123
+ # Print the extracted sequence
124
+ print("Extracted Text:", sequence)
125
+
126
+ print(self.image_processor.token2json(sequence))
127
+
128
+ # Format the output as Markdown (optional step)
129
+ markdown_text = self.format_as_markdown(sequence)
130
+ print("Markdown Format:\n", markdown_text)
131
+
132
+ return None
133
+
134
+ def convert_to_markdown(self, text):
135
+ """
136
+ Convert the cleaned text into a markdown format.
137
+ You can enhance this function to detect tables, headings, etc.
138
+ """
139
+ # For example, we can identify sections or headers and format them in Markdown
140
+ markdown_text = text
141
+ # Detect headings and bold them
142
+ markdown_text = re.sub(r"(^.*Scorecard.*$)", r"## \1", markdown_text)
143
+ # Convert lines with ":" to a list item (rough approach)
144
+ markdown_text = re.sub(r"(\w+):", r"- **\1**:", markdown_text)
145
+ # Return the markdown formatted text
146
+ return markdown_text
147
+
148
+ def clean_tokenized_text(self, tokenized_text):
149
+ """
150
+ Clean the tokenized text by fixing encoding issues and formatting, preserving line breaks.
151
+ """
152
+ # Fix encoding issues using ftfy
153
+ cleaned_text = ftfy.fix_text(tokenized_text)
154
+
155
+ # Remove <s> and </s> tags (special tokens)
156
+ cleaned_text = cleaned_text.replace("<s>", "").replace("</s>", "")
157
+
158
+ # Replace special characters like 'Ġ' and fix multiple spaces, preserving new lines
159
+ cleaned_text = cleaned_text.replace("Ġ", " ")
160
+
161
+ # Avoid collapsing line breaks, but still normalize multiple spaces
162
+ # Replace multiple spaces with a single space, but preserve line breaks
163
+ cleaned_text = re.sub(r" +", " ", cleaned_text)
164
+
165
+ return cleaned_text.strip()
166
+
167
+ def extract_page_text(self, image_path) -> str:
168
+ # Open the image
169
+ image = Image.open(image_path).convert("RGB")
170
+
171
+ # Processor handles the OCR internally, no need for words or boxes
172
+ encoding = self.image_processor(image, return_tensors="pt", truncation=True)
173
+ encoding = {k: v.to(self.image_device) for k, v in encoding.items()}
174
+
175
+ # Forward pass
176
+ outputs = self.image_model(**encoding)
177
+ logits = outputs.logits
178
+
179
+ # Get predictions
180
+ predictions = logits.argmax(-1).squeeze().tolist()
181
+ labels = [self.image_model.config.id2label[pred] for pred in predictions]
182
+
183
+ # Get the words and boxes from the processor's OCR step
184
+ words = self.image_processor.tokenizer.convert_ids_to_tokens(
185
+ encoding['input_ids'].squeeze().tolist()
186
+ )
187
+ boxes = encoding['bbox'].squeeze().tolist()
188
+
189
+ # Combine words and labels, preserving line breaks based on vertical box position
190
+ extracted_text = ""
191
+ last_box = None
192
+ for word, label, box in zip(words, labels, boxes):
193
+ if label != 'O':
194
+ # Check if the current word is on a new line based on the vertical position of the box
195
+ if last_box and abs(box[1] - last_box[1]) > 10: # A threshold for line breaks
196
+ extracted_text += "\n" # Add a line break
197
+
198
+ extracted_text += f"{word} "
199
+ last_box = box
200
+ cleaned_text = self.clean_tokenized_text(extracted_text)
201
+ markdown_text = self.convert_to_markdown(cleaned_text)
202
+ return markdown_text
203
+
45
204
  def _load_pdf(self, path: Path) -> list:
46
205
  """
47
206
  Load a PDF file using the Fitz library.
@@ -56,6 +215,32 @@ class PDFLoader(BasePDF):
56
215
  self.logger.info(f"Loading PDF file: {path}")
57
216
  pdf = fitz.open(str(path)) # Open the PDF file
58
217
  docs = []
218
+ try:
219
+ md_text = to_markdown(pdf) # get markdown for all pages
220
+ _meta = {
221
+ "url": f'{path}',
222
+ "source": f"{path.name}",
223
+ "filename": path.name,
224
+ "type": 'pdf',
225
+ "question": '',
226
+ "answer": '',
227
+ "source_type": self._source_type,
228
+ "data": {},
229
+ "summary": '',
230
+ "document_meta": {
231
+ "title": pdf.metadata.get("title", ""),
232
+ "creationDate": pdf.metadata.get("creationDate", ""),
233
+ "author": pdf.metadata.get("author", ""),
234
+ }
235
+ }
236
+ docs.append(
237
+ Document(
238
+ page_content=md_text,
239
+ metadata=_meta
240
+ )
241
+ )
242
+ except Exception:
243
+ pass
59
244
  for page_number in range(pdf.page_count):
60
245
  page = pdf[page_number]
61
246
  text = page.get_text()
@@ -79,12 +264,7 @@ class PDFLoader(BasePDF):
79
264
  "summary": summary,
80
265
  "document_meta": {
81
266
  "title": pdf.metadata.get("title", ""),
82
- # "subject": pdf.metadata.get("subject", ""),
83
- # "keywords": pdf.metadata.get("keywords", ""),
84
267
  "creationDate": pdf.metadata.get("creationDate", ""),
85
- # "modDate": pdf.metadata.get("modDate", ""),
86
- # "producer": pdf.metadata.get("producer", ""),
87
- # "creator": pdf.metadata.get("creator", ""),
88
268
  "author": pdf.metadata.get("author", ""),
89
269
  }
90
270
  }
@@ -96,9 +276,10 @@ class PDFLoader(BasePDF):
96
276
  )
97
277
  # Extract images and use OCR to get text from each image
98
278
  # second: images
279
+ file_name = path.stem.replace(' ', '_').replace('.', '').lower()
99
280
  if self.parse_images is True:
281
+ # extract any images in page:
100
282
  image_list = page.get_images(full=True)
101
- file_name = path.stem.replace(' ', '_').replace('.', '').lower()
102
283
  for img_index, img in enumerate(image_list):
103
284
  xref = img[0]
104
285
  base_image = pdf.extract_image(xref)
@@ -181,7 +362,69 @@ class PDFLoader(BasePDF):
181
362
  )
182
363
  except Exception as exc:
183
364
  print(exc)
365
+ # fourth: page as image
366
+ if self.page_as_images is True:
367
+ # Convert the page to a Pixmap (which is an image)
368
+ mat = fitz.Matrix(2, 2)
369
+ pix = page.get_pixmap(dpi=300, matrix=mat) # Increase DPI for better resolution
370
+ img_name = f'{file_name}_page_{page_num}.png'
371
+ img_path = self._imgdir.joinpath(img_name)
372
+ if img_path.exists():
373
+ img_path.unlink(missing_ok=True)
374
+ self.logger.notice(
375
+ f"Saving Page {page_number} as Image on {img_path}"
376
+ )
377
+ pix.save(
378
+ img_path
379
+ )
380
+ # TODO passing the image to a AI visual to get explanation
381
+ # Get the extracted text from the image
382
+ text = self.extract_page_text(img_path)
383
+ print('TEXT EXTRACTED >> ', text)
384
+ url = f'/static/images/{img_name}'
385
+ image_meta = {
386
+ "url": url,
387
+ "source": f"{path.name} Page.#{page_num}",
388
+ "filename": path.name,
389
+ "index": f"{path.name}:{page_num}",
390
+ "question": '',
391
+ "answer": '',
392
+ "type": 'page',
393
+ "data": {},
394
+ "summary": '',
395
+ "document_meta": {
396
+ "image_name": img_name,
397
+ "page_number": f"{page_number}"
398
+ },
399
+ "source_type": self._source_type
400
+ }
401
+ docs.append(
402
+ Document(page_content=text, metadata=image_meta)
403
+ )
184
404
  pdf.close()
185
405
  return docs
186
406
  else:
187
407
  return []
408
+
409
+ def get_ocr(self, img_path) -> list:
410
+ # Initialize PaddleOCR with table recognition
411
+ self.ocr_model = PaddleOCR(
412
+ lang='en',
413
+ det_model_dir=None,
414
+ rec_model_dir=None,
415
+ rec_char_dict_path=None,
416
+ table=True,
417
+ # use_angle_cls=True,
418
+ # use_gpu=True
419
+ )
420
+ result = self.ocr_model.ocr(img_path, cls=True)
421
+
422
+ # extract tables:
423
+ # The result contains the table structure and content
424
+ tables = []
425
+ for line in result:
426
+ if 'html' in line[1]:
427
+ html_table = line[1]['html']
428
+ tables.append(html_table)
429
+
430
+ print('TABLES > ', tables)
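The vertical-position heuristic in `extract_page_text` above (insert a newline when a word's bounding box sits more than 10 units below the previous one) can be exercised standalone; the words and boxes below are made-up sample data, not real OCR output:

```python
# Boxes are [x0, y0, x1, y1]; a jump in y0 larger than the
# threshold is treated as a new line, as in extract_page_text.
LINE_BREAK_THRESHOLD = 10

words = ["Total", "Score", "Notes", "OK"]
boxes = [[10, 20, 50, 30], [60, 21, 90, 31], [10, 60, 45, 70], [50, 61, 70, 71]]

text = ""
last_box = None
for word, box in zip(words, boxes):
    if last_box and abs(box[1] - last_box[1]) > LINE_BREAK_THRESHOLD:
        text += "\n"  # new line detected from the vertical gap
    text += f"{word} "
    last_box = box

print(text)
```

"Total" and "Score" share a baseline, so they stay on one line; "Notes" starts 39 units lower and triggers the break.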
@@ -26,14 +26,15 @@ class VideoLocalLoader(BaseVideoLoader):
26
26
 
27
27
  def load_video(self, path: PurePath) -> list:
28
28
  metadata = {
29
- "source": f"{path}",
30
29
  "url": f"{path.name}",
31
- "index": path.stem,
30
+ "source": f"{path}",
32
31
  "filename": f"{path}",
32
+ "index": path.stem,
33
33
  "question": '',
34
34
  "answer": '',
35
35
  'type': 'video_transcript',
36
36
  "source_type": self._source_type,
37
+ "data": {},
37
38
  "summary": '',
38
39
  "document_meta": {
39
40
  "language": self._language,
parrot/version.py CHANGED
@@ -3,7 +3,7 @@
3
3
  __title__ = "ai-parrot"
4
4
  __description__ = "Live Chatbots based on Langchain chatbots and Agents \
5
5
  Integrated into Navigator Framework or used into aiohttp applications."
6
- __version__ = "0.3.1"
6
+ __version__ = "0.3.5"
7
7
  __author__ = "Jesus Lara"
8
8
  __author_email__ = "jesuslarag@gmail.com"
9
9
  __license__ = "MIT"
@@ -1,319 +0,0 @@
1
- Metadata-Version: 2.1
2
- Name: ai-parrot
3
- Version: 0.3.1
4
- Summary: Live Chatbots based on Langchain chatbots and Agents Integrated into Navigator Framework or used into aiohttp applications.
5
- Home-page: https://github.com/phenobarbital/ai-parrot
6
- Author: Jesus Lara
7
- Author-email: jesuslara@phenobarbital.info
8
- License: MIT
9
- Project-URL: Source, https://github.com/phenobarbital/ai-parrot
10
- Project-URL: Tracker, https://github.com/phenobarbital/ai-parrot/issues
11
- Project-URL: Documentation, https://github.com/phenobarbital/ai-parrot/
12
- Project-URL: Funding, https://paypal.me/phenobarbital
13
- Project-URL: Say Thanks!, https://saythanks.io/to/phenobarbital
14
- Keywords: asyncio,asyncpg,aioredis,aiomcache,langchain,chatbot,agents
15
- Platform: POSIX
16
- Classifier: Development Status :: 4 - Beta
17
- Classifier: Intended Audience :: Developers
18
- Classifier: Operating System :: POSIX :: Linux
19
- Classifier: Environment :: Web Environment
20
- Classifier: License :: OSI Approved :: MIT License
21
- Classifier: Topic :: Software Development :: Build Tools
22
- Classifier: Topic :: Software Development :: Libraries :: Python Modules
23
- Classifier: Programming Language :: Python :: 3.9
24
- Classifier: Programming Language :: Python :: 3.10
25
- Classifier: Programming Language :: Python :: 3.11
26
- Classifier: Programming Language :: Python :: 3.12
27
- Classifier: Programming Language :: Python :: 3 :: Only
28
- Classifier: Framework :: AsyncIO
29
- Requires-Python: >=3.10.12
30
- Description-Content-Type: text/markdown
31
- License-File: LICENSE
32
- Requires-Dist: Cython ==3.0.9
33
- Requires-Dist: pymupdf ==1.24.4
34
- Requires-Dist: pymupdf4llm ==0.0.1
35
- Requires-Dist: pdf4llm ==0.0.6
36
- Requires-Dist: PyPDF2 ==3.0.1
37
- Requires-Dist: pdfminer.six ==20231228
38
- Requires-Dist: pdfplumber ==0.11.0
39
- Requires-Dist: bitsandbytes ==0.43.0
40
- Requires-Dist: Cartopy ==0.22.0
41
- Requires-Dist: chromadb ==0.4.24
42
- Requires-Dist: contourpy ==1.2.0
43
- Requires-Dist: datasets ==2.18.0
44
- Requires-Dist: faiss-cpu ==1.8.0
45
- Requires-Dist: fastavro ==1.9.4
46
- Requires-Dist: GitPython ==3.1.42
47
- Requires-Dist: gunicorn ==21.2.0
48
- Requires-Dist: jq ==1.7.0
49
- Requires-Dist: rank-bm25 ==0.2.2
50
- Requires-Dist: matplotlib ==3.8.3
51
- Requires-Dist: numba ==0.59.0
52
- Requires-Dist: opentelemetry-sdk ==1.24.0
53
- Requires-Dist: rapidocr-onnxruntime ==1.3.15
54
- Requires-Dist: pytesseract ==0.3.10
55
- Requires-Dist: python-docx ==1.1.0
56
- Requires-Dist: python-pptx ==0.6.23
57
- Requires-Dist: docx2txt ==0.8
58
- Requires-Dist: pytube ==15.0.0
59
- Requires-Dist: pydub ==0.25.1
60
- Requires-Dist: markdownify ==0.12.1
61
- Requires-Dist: librosa ==0.10.1
62
- Requires-Dist: yt-dlp ==2024.4.9
63
- Requires-Dist: moviepy ==1.0.3
64
- Requires-Dist: safetensors ==0.4.2
65
- Requires-Dist: sentence-transformers ==2.6.1
66
- Requires-Dist: tabulate ==0.9.0
67
- Requires-Dist: tiktoken ==0.7.0
68
- Requires-Dist: tokenizers ==0.19.1
69
- Requires-Dist: unstructured ==0.14.3
70
- Requires-Dist: unstructured-client ==0.18.0
71
- Requires-Dist: uvloop ==0.19.0
72
- Requires-Dist: XlsxWriter ==3.2.0
73
- Requires-Dist: youtube-transcript-api ==0.6.2
74
- Requires-Dist: selenium ==4.18.1
75
- Requires-Dist: webdriver-manager ==4.0.1
76
- Requires-Dist: transitions ==0.9.0
77
- Requires-Dist: sentencepiece ==0.2.0
78
- Requires-Dist: duckduckgo-search ==5.3.0
79
- Requires-Dist: google-search-results ==2.4.2
80
- Requires-Dist: google-api-python-client >=2.86.0
81
- Requires-Dist: gdown ==5.1.0
82
- Requires-Dist: weasyprint ==61.2
83
- Requires-Dist: markdown2 ==2.4.13
84
- Requires-Dist: xformers ==0.0.25.post1
85
- Requires-Dist: fastembed ==0.3.4
86
- Requires-Dist: mammoth ==1.7.1
87
- Requires-Dist: accelerate ==0.29.3
88
- Requires-Dist: langchain >=0.2.6
89
- Requires-Dist: langchain-community >=0.2.6
90
- Requires-Dist: langchain-core ==0.2.32
91
- Requires-Dist: langchain-experimental ==0.0.62
92
- Requires-Dist: langchainhub ==0.1.15
93
- Requires-Dist: langchain-text-splitters ==0.2.2
94
- Requires-Dist: huggingface-hub ==0.23.5
95
- Requires-Dist: llama-index ==0.10.20
96
- Requires-Dist: llama-cpp-python ==0.2.56
97
- Requires-Dist: asyncdb[all] >=2.7.10
98
- Requires-Dist: querysource >=3.10.1
99
- Requires-Dist: yfinance ==0.2.40
100
- Requires-Dist: youtube-search ==2.1.2
101
- Requires-Dist: wikipedia ==1.4.0
102
- Requires-Dist: mediawikiapi ==1.2
103
- Requires-Dist: wikibase-rest-api-client ==0.2.0
104
- Requires-Dist: asknews ==0.7.30
105
- Requires-Dist: pyowm ==3.3.0
106
- Requires-Dist: O365 ==2.0.35
107
- Requires-Dist: langchain-huggingface ==0.0.3
108
- Requires-Dist: stackapi ==0.3.1
109
- Provides-Extra: all
110
- Requires-Dist: langchain-milvus ==0.1.1 ; extra == 'all'
111
- Requires-Dist: milvus ==2.3.5 ; extra == 'all'
112
- Requires-Dist: pymilvus ==2.4.4 ; extra == 'all'
113
- Requires-Dist: groq ==0.6.0 ; extra == 'all'
114
- Requires-Dist: langchain-groq ==0.1.4 ; extra == 'all'
115
- Requires-Dist: llama-index-llms-huggingface ==0.2.7 ; extra == 'all'
116
- Requires-Dist: langchain-google-vertexai ==1.0.8 ; extra == 'all'
117
- Requires-Dist: langchain-google-genai ==1.0.8 ; extra == 'all'
118
- Requires-Dist: google-generativeai ==0.7.2 ; extra == 'all'
119
- Requires-Dist: vertexai ==1.60.0 ; extra == 'all'
120
- Requires-Dist: google-cloud-aiplatform >=1.60.0 ; extra == 'all'
121
- Requires-Dist: grpc-google-iam-v1 ==0.13.0 ; extra == 'all'
122
- Requires-Dist: langchain-openai ==0.1.21 ; extra == 'all'
123
- Requires-Dist: openai ==1.40.8 ; extra == 'all'
124
- Requires-Dist: llama-index-llms-openai ==0.1.11 ; extra == 'all'
125
- Requires-Dist: langchain-anthropic ==0.1.23 ; extra == 'all'
126
- Requires-Dist: anthropic ==0.34.0 ; extra == 'all'
127
- Provides-Extra: analytics
128
- Requires-Dist: annoy ==1.17.3 ; extra == 'analytics'
129
- Requires-Dist: gradio-tools ==0.0.9 ; extra == 'analytics'
130
- Requires-Dist: gradio-client ==0.2.9 ; extra == 'analytics'
131
- Requires-Dist: streamlit ==1.37.1 ; extra == 'analytics'
132
- Requires-Dist: simsimd ==4.3.1 ; extra == 'analytics'
133
- Requires-Dist: opencv-python ==4.10.0.84 ; extra == 'analytics'
134
- Provides-Extra: anthropic
135
- Requires-Dist: langchain-anthropic ==0.1.11 ; extra == 'anthropic'
136
- Requires-Dist: anthropic ==0.25.2 ; extra == 'anthropic'
137
- Provides-Extra: crew
138
- Requires-Dist: colbert-ai ==0.2.19 ; extra == 'crew'
139
- Requires-Dist: vanna ==0.3.4 ; extra == 'crew'
140
- Requires-Dist: crewai[tools] ==0.28.8 ; extra == 'crew'
141
- Provides-Extra: google
142
- Requires-Dist: langchain-google-vertexai ==1.0.4 ; extra == 'google'
143
- Requires-Dist: langchain-google-genai ==1.0.4 ; extra == 'google'
144
- Requires-Dist: google-generativeai ==0.5.4 ; extra == 'google'
145
- Requires-Dist: vertexai ==1.49.0 ; extra == 'google'
146
- Requires-Dist: google-cloud-aiplatform ==1.49.0 ; extra == 'google'
147
- Requires-Dist: grpc-google-iam-v1 ==0.13.0 ; extra == 'google'
148
- Provides-Extra: groq
149
- Requires-Dist: groq ==0.6.0 ; extra == 'groq'
150
- Requires-Dist: langchain-groq ==0.1.4 ; extra == 'groq'
151
- Provides-Extra: hunggingfaces
152
- Requires-Dist: llama-index-llms-huggingface ==0.2.7 ; extra == 'hunggingfaces'
153
- Provides-Extra: milvus
154
- Requires-Dist: langchain-milvus ==0.1.1 ; extra == 'milvus'
155
- Requires-Dist: milvus ==2.3.5 ; extra == 'milvus'
156
- Requires-Dist: pymilvus ==2.4.4 ; extra == 'milvus'
157
- Provides-Extra: openai
158
- Requires-Dist: langchain-openai ==0.1.21 ; extra == 'openai'
159
- Requires-Dist: openai ==1.40.3 ; extra == 'openai'
160
- Requires-Dist: llama-index-llms-openai ==0.1.11 ; extra == 'openai'
161
- Requires-Dist: tiktoken ==0.7.0 ; extra == 'openai'
162
- Provides-Extra: qdrant
163
- Requires-Dist: qdrant-client ==1.8.0 ; extra == 'qdrant'
164
-
165
- # AI Parrot: Python package for creating Chatbots
166
- AI Parrot is an open-source Python package for building chatbots based on Langchain and the Navigator framework.
167
- This README provides instructions for installation, development, testing, and releasing Parrot.
168
-
169
- ## Installation
170
-
171
- **Creating a virtual environment:**
172
-
173
- This is recommended for development and isolation from system-wide libraries.
174
- Run the following commands in your terminal:
175
-
176
- On Debian-based systems, install the system dependencies first:
177
- ```
178
- sudo apt install gcc python3.11-venv python3.11-full python3.11-dev libmemcached-dev zlib1g-dev build-essential libffi-dev unixodbc unixodbc-dev libsqliteodbc libev4 libev-dev
179
- ```
180
-
181
- For Qdrant installation:
182
- ```
183
- docker pull qdrant/qdrant
184
- docker run -d -p 6333:6333 -p 6334:6334 --name qdrant -v $(pwd)/qdrant_storage:/qdrant/storage:z qdrant/qdrant
185
- ```
186
-
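Once the container is up, you can verify that the Qdrant REST endpoint is reachable before pointing anything at it. A minimal stdlib-only sketch, assuming the default port mapping from the `docker run` command above (the helper names are illustrative, not part of ai-parrot):

```python
from urllib.request import urlopen

def qdrant_url(host: str = "localhost", port: int = 6333) -> str:
    # 6333 is Qdrant's REST port, matching the -p 6333:6333 mapping above.
    return f"http://{host}:{port}"

def qdrant_is_up(base_url: str, timeout: float = 2.0) -> bool:
    # /collections is a lightweight GET endpoint; any connection error
    # (refused, timeout, DNS) is reported as "not up".
    try:
        with urlopen(f"{base_url}/collections", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        return False
```

Call `qdrant_is_up(qdrant_url())` after starting the container; a `False` result usually means the container is still starting or the port mapping differs.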
187
- For VertexAI, create a folder inside "env" called "google" and copy the JSON credentials file into it.
188
-
189
- ```bash
190
- make venv
191
- ```
192
-
193
- This will create a virtual environment named `.venv`. To activate it, run:
194
-
195
- ```bash
196
- source .venv/bin/activate # Linux/macOS
197
- ```
198
-
199
- Once activated, install Parrot within the virtual environment:
200
-
201
- ```bash
202
- make install
203
- ```
204
- The output will remind you to activate the virtual environment before development.
205
-
206
- **Optional** (for developers):
207
- ```bash
208
- pip install -e .
209
- ```
210
-
211
- ## Start the HTTP server
212
- ```bash
213
- python run.py
214
- ```
215
-
216
- ## Development Setup
217
-
218
- This section explains how to set up your development environment:
219
-
220
- 1. **Install development requirements:**
221
-
222
- ```bash
223
- make setup
224
- ```
225
-
226
- This installs development dependencies, such as linters and test runners, listed in the `docs/requirements-dev.txt` file.
227
-
228
- 2. **Install Parrot in editable mode:**
229
-
230
- This allows you to make changes to the code and test them without reinstalling:
231
-
232
- ```bash
233
- make dev
234
- ```
235
-
236
- This uses `flit` to install Parrot in editable mode.
237
-
238
-
239
- ### Usage (Replace with actual usage instructions)
240
-
241
- *Once you have set up your development environment, you can start using Parrot.*
242
-
243
- #### Test with Code ChatBOT
244
- * Set the following environment variables:
- ```
- [google]
- GOOGLE_API_KEY=apikey
- GOOGLE_CREDENTIALS_FILE=.../credentials.json
- VERTEX_PROJECT_ID=vertex-project
- VERTEX_REGION=region
- ```
250
-
251
- * Run the chatbot:
252
- ```bash
253
- python examples/test_agent.py
254
- ```
255
-
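The variables above can be read at startup with the standard library alone. A minimal sketch; the helper function is hypothetical and not part of ai-parrot, only the variable names come from the list above:

```python
import os

def load_vertex_settings() -> dict:
    # Collects the Google/Vertex settings documented above.
    # Raises KeyError if any required variable is missing, which
    # surfaces misconfiguration before the agent starts.
    return {
        "api_key": os.environ["GOOGLE_API_KEY"],
        "credentials_file": os.environ["GOOGLE_CREDENTIALS_FILE"],
        "project_id": os.environ["VERTEX_PROJECT_ID"],
        "region": os.environ["VERTEX_REGION"],
    }
```
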
256
- ### Testing
257
-
258
- To run the test suite:
259
-
260
- ```bash
261
- make test
262
- ```
263
-
264
- This will run tests using `coverage` to report on code coverage.
265
-
266
-
267
- ### Code Formatting
268
-
269
- To format the code with black:
270
-
271
- ```bash
272
- make format
273
- ```
274
-
275
-
276
- ### Linting
277
-
278
- To lint the code for style and potential errors:
279
-
280
- ```bash
281
- make lint
282
- ```
283
-
284
- This uses `pylint` and `black` to check for issues.
285
-
286
-
287
- ### Releasing a New Version
288
-
289
- This section outlines the steps for releasing a new version of Parrot:
290
-
291
- 1. **Ensure everything is clean and tested:**
292
-
293
- ```bash
294
- make release
295
- ```
296
-
297
- This runs `lint`, `test`, and `clean` tasks before proceeding.
298
-
299
- 2. **Publish the package:**
300
-
301
- ```bash
302
- make release
303
- ```
304
-
305
- This uses `flit` to publish the package to a repository like PyPI. You'll need to have publishing credentials configured for `flit`.
306
-
307
-
308
- ### Cleaning Up
309
-
310
- To remove the virtual environment:
311
-
312
- ```bash
313
- make distclean
314
- ```
315
-
316
-
317
- ### Contributing
318
-
319
- We welcome contributions to Parrot! Please refer to the CONTRIBUTING.md file for guidelines on how to contribute.