PyPI - vectoriz - Version diff (Mend)

vectoriz 0.0.3.tar.gz → 0.0.4.tar.gz

This diff compares the contents of two publicly released versions of the package as they appear in their public registry (here, PyPI). It is provided for informational purposes only.

Files changed (19)

vectoriz-0.0.4/PKG-INFO ADDED

@@ -0,0 +1,85 @@
+Metadata-Version: 2.4
+Name: vectoriz
+Version: 0.0.4
+Summary: Python library for creating vectorized data from text or files.
+Home-page: https://github.com/PedroHenriqueDevBR/vectoriz
+Author: PedroHenriqueDevBR
+Author-email: pedro.henrique.particular@gmail.com
+Classifier: Programming Language :: Python :: 3.12
+Classifier: Operating System :: OS Independent
+Requires-Python: >=3.12
+Description-Content-Type: text/markdown
+Requires-Dist: faiss-cpu==1.10.0
+Requires-Dist: numpy==2.2.4
+Requires-Dist: sentence-transformers==4.0.2
+Requires-Dist: python-docx==1.1.2
+Dynamic: author
+Dynamic: author-email
+Dynamic: classifier
+Dynamic: description
+Dynamic: description-content-type
+Dynamic: home-page
+Dynamic: requires-dist
+Dynamic: requires-python
+Dynamic: summary
+# Vectoriz
+A tool for generating vector embeddings for Retrieval-Augmented Generation (RAG) applications.
+## Overview
+This project provides utilities to create, manage, and optimize vector embeddings for use in RAG systems. It streamlines the process of converting documents and data sources into vector representations suitable for semantic search and retrieval.
+## Features
+- Document processing and chunking
+- Vector embedding generation using various models
+- Vector database integration
+- Optimization tools for RAG performance
+- Easy-to-use API for embedding creation
+## Installation
+```bash
+git clone https://github.com/PedroHenriqueDevBR/vectoriz.git
+cd vectoriz
+pip install -r requirements.txt
+```
+## Usage
+```python
+# initial informations
+index_db_path = "./data/faiss_db.index" # path to save/load index
+np_db_path = "./data/np_db.npz" # path to save/load numpy data
+directory_path = "/home/username/Documents/" # Path where the files (.txt, .docx) are saved
+# Class instance
+transformer = TokenTransformer()
+files_features = FilesFeature()
+# Load files and create a argument class (pack with embedings, chunk_names and text_list)
+argument = files_features.load_all_files_from_directory(directory_path)
+# Created FAISS index to be used in queries
+index = transformer.create_index(argument.text_list)
+# To load files from VectorDB use
+vector_client = VectorDBClient()
+vector_client.load_data(self.index_db_path, self.np_db_path)
+index = vector_client.faiss_index
+argument = vector_client.file_argument
+# To save data on VectorDB use
+vector_client = VectorDBClient(index, argument)
+vector_client.save_data(index_db_path, np_db_path)
+```
+## Contributing
+Contributions are welcome! Please feel free to submit a Pull Request.
+## License
+This project is licensed under the MIT License - see the LICENSE file for details.
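PKG-INFO follows the Python core-metadata format: a block of RFC 822-style headers, where fields such as `Requires-Dist` may repeat, followed by the long description. As a minimal sketch, assuming the header block above is available as a string, the standard-library `email` parser is enough to read the version and the exact dependency pins that 0.0.4 declares. `pinned_requirements` is a hypothetical helper, not part of vectoriz.

```python
from email.parser import HeaderParser

# A subset of the vectoriz 0.0.4 PKG-INFO headers shown above.
PKG_INFO = """\
Metadata-Version: 2.4
Name: vectoriz
Version: 0.0.4
Requires-Python: >=3.12
Requires-Dist: faiss-cpu==1.10.0
Requires-Dist: numpy==2.2.4
Requires-Dist: sentence-transformers==4.0.2
Requires-Dist: python-docx==1.1.2
"""


def pinned_requirements(text: str) -> dict[str, str]:
    """Map each Requires-Dist project name to its exact '==' pin."""
    headers = HeaderParser().parsestr(text)
    pins: dict[str, str] = {}
    # Requires-Dist can appear many times, so collect every occurrence.
    for requirement in headers.get_all("Requires-Dist", []):
        name, _, version = requirement.partition("==")
        pins[name.strip()] = version.strip()
    return pins


meta = HeaderParser().parsestr(PKG_INFO)
print(meta["Name"], meta["Version"])
print(pinned_requirements(PKG_INFO))
```

Splitting on `==` only works because every dependency here is an exact pin; for general specifiers (`>=`, `~=`, environment markers), `packaging.requirements.Requirement` is the more robust parser.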

vectoriz-0.0.4/README.md ADDED

@@ -0,0 +1,60 @@
+# Vectoriz
+A tool for generating vector embeddings for Retrieval-Augmented Generation (RAG) applications.
+## Overview
+This project provides utilities to create, manage, and optimize vector embeddings for use in RAG systems. It streamlines the process of converting documents and data sources into vector representations suitable for semantic search and retrieval.
+## Features
+- Document processing and chunking
+- Vector embedding generation using various models
+- Vector database integration
+- Optimization tools for RAG performance
+- Easy-to-use API for embedding creation
+## Installation
+```bash
+git clone https://github.com/PedroHenriqueDevBR/vectoriz.git
+cd vectoriz
+pip install -r requirements.txt
+```
+## Usage
+```python
+# initial informations
+index_db_path = "./data/faiss_db.index" # path to save/load index
+np_db_path = "./data/np_db.npz" # path to save/load numpy data
+directory_path = "/home/username/Documents/" # Path where the files (.txt, .docx) are saved
+# Class instance
+transformer = TokenTransformer()
+files_features = FilesFeature()
+# Load files and create a argument class (pack with embedings, chunk_names and text_list)
+argument = files_features.load_all_files_from_directory(directory_path)
+# Created FAISS index to be used in queries
+index = transformer.create_index(argument.text_list)
+# To load files from VectorDB use
+vector_client = VectorDBClient()
+vector_client.load_data(self.index_db_path, self.np_db_path)
+index = vector_client.faiss_index
+argument = vector_client.file_argument
+# To save data on VectorDB use
+vector_client = VectorDBClient(index, argument)
+vector_client.save_data(index_db_path, np_db_path)
+```
+## Contributing
+Contributions are welcome! Please feel free to submit a Pull Request.
+## License
+This project is licensed under the MIT License - see the LICENSE file for details.
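The usage snippet builds a FAISS index from `argument.text_list` and keeps it alongside the file data. The pure-Python sketch below shows what an exact (flat) index does at query time: compare the query with every stored vector and return the closest rows. Several things here are assumptions, not confirmed by the diff: that `TokenTransformer.create_index` builds a flat L2 index, that row *i* of the index corresponds to `chunk_names[i]`, and the 3-dimensional vectors, which are toy stand-ins for sentence-transformers embeddings.

```python
def squared_l2(a: list[float], b: list[float]) -> float:
    """Squared Euclidean distance, the quantity FAISS IndexFlatL2 reports."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def search(stored: list[list[float]], query: list[float], k: int) -> list[tuple[float, int]]:
    """Exact k-nearest-neighbour search: score every stored row, keep the k closest."""
    scored = [(squared_l2(vec, query), row) for row, vec in enumerate(stored)]
    scored.sort()  # ascending distance; ties broken by row index
    return scored[:k]


# Hypothetical data: one toy embedding per chunk, row-aligned with chunk_names.
chunk_names = ["intro.txt", "faiss.docx", "numpy.txt"]
embeddings = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.9, 0.1],
]
query = [0.0, 1.0, 0.05]

results = search(embeddings, query, k=2)
for distance, row in results:
    print(chunk_names[row], round(distance, 4))
```

A flat index performs the same linear scan with vectorized code; for large collections, approximate FAISS indexes such as IVF or HNSW trade some recall for faster queries.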

{vectoriz-0.0.3 → vectoriz-0.0.4}/setup.py RENAMED

@@ -2,7 +2,7 @@ from setuptools import setup, find_packages
 setup(
     name="vectoriz",
-    version="0.0.3",
+    version="0.0.4",
     author="PedroHenriqueDevBR",
     author_email="pedro.henrique.particular@gmail.com",
     description="Python library for creating vectorized data from text or files.",

{vectoriz-0.0.3 → vectoriz-0.0.4}/vectoriz/files.py RENAMED

@@ -127,7 +127,7 @@ class FilesFeature:
             full_text.append(paragraph.text)
         return "\n".join(full_text)
-    def load_txt_files_from_directory(self, directory: str) -> FileArgument:
+    def load_txt_files_from_directory(self, directory: str, verbose: bool = False) -> FileArgument:
         """
         Load all text files from the specified directory and extract their content.
         This method scans the specified directory for files with the '.txt' extension
@@ -145,16 +145,22 @@ class FilesFeature:
         argument: FileArgument = FileArgument([], [], [])
         for file in os.listdir(directory):
             if not file.endswith(".txt"):
+                if verbose:
+                    print(f"Error file: {file}")
                 continue
             text = self._extract_txt_content(directory, file)
             if text is None:
+                if verbose:
+                    print(f"Error file: {file}")
                 continue
             argument.add_data(file, text)
+            if verbose:
+                    print(f"Loaded txt file: {file}")
         return argument
-    def load_docx_files_from_directory(self, directory: str) -> FileArgument:
+    def load_docx_files_from_directory(self, directory: str, verbose: bool = False) -> FileArgument:
         """
         Load all Word (.docx) files from the specified directory and extract their content.
@@ -174,16 +180,22 @@ class FilesFeature:
         argument: FileArgument = FileArgument([], [], [])
         for file in os.listdir(directory):
             if not file.endswith(".docx"):
+                if verbose:
+                    print(f"Error file: {file}")
                 continue
             text = self._extract_docx_content(directory, file)
             if text is None:
+                if verbose:
+                    print(f"Error file: {file}")
                 continue
             argument.add_data(file, text)
+            if verbose:
+                print(f"Loaded Word file: {file}")
         return argument
-    def load_all_files_from_directory(self, directory: str) -> FileArgument:
+    def load_all_files_from_directory(self, directory: str, verbose: bool =  False) -> FileArgument:
         """
         Load all supported files (.txt and .docx) from the specified directory and its subdirectories.
@@ -199,15 +211,23 @@ class FilesFeature:
         argument: FileArgument = FileArgument([], [], [])
         for root, _, files in os.walk(directory):
             for file in files:
+                readed = False
                 if file.endswith(".txt"):
                     text = self._extract_txt_content(root, file)
                     if text is not None:
                         argument.add_data(file, text)
+                        readed = True
                 elif file.endswith(".docx"):
                     try:
                         text = self._extract_docx_content(root, file)
                         if text is not None:
                             argument.add_data(file, text)
+                            readed = True
                     except Exception as e:
                         print(f"Error processing {file}: {str(e)}")
+                if verbose and readed:
+                    print(f"Loaded file: {file}")
+                elif verbose and not readed:
+                    print(f"Error file: {file}")
         return argument

vectoriz-0.0.4/vectoriz.egg-info/PKG-INFO ADDED

@@ -0,0 +1,85 @@
+Metadata-Version: 2.4
+Name: vectoriz
+Version: 0.0.4
+Summary: Python library for creating vectorized data from text or files.
+Home-page: https://github.com/PedroHenriqueDevBR/vectoriz
+Author: PedroHenriqueDevBR
+Author-email: pedro.henrique.particular@gmail.com
+Classifier: Programming Language :: Python :: 3.12
+Classifier: Operating System :: OS Independent
+Requires-Python: >=3.12
+Description-Content-Type: text/markdown
+Requires-Dist: faiss-cpu==1.10.0
+Requires-Dist: numpy==2.2.4
+Requires-Dist: sentence-transformers==4.0.2
+Requires-Dist: python-docx==1.1.2
+Dynamic: author
+Dynamic: author-email
+Dynamic: classifier
+Dynamic: description
+Dynamic: description-content-type
+Dynamic: home-page
+Dynamic: requires-dist
+Dynamic: requires-python
+Dynamic: summary
+# Vectoriz
+A tool for generating vector embeddings for Retrieval-Augmented Generation (RAG) applications.
+## Overview
+This project provides utilities to create, manage, and optimize vector embeddings for use in RAG systems. It streamlines the process of converting documents and data sources into vector representations suitable for semantic search and retrieval.
+## Features
+- Document processing and chunking
+- Vector embedding generation using various models
+- Vector database integration
+- Optimization tools for RAG performance
+- Easy-to-use API for embedding creation
+## Installation
+```bash
+git clone https://github.com/PedroHenriqueDevBR/vectoriz.git
+cd vectoriz
+pip install -r requirements.txt
+```
+## Usage
+```python
+# initial informations
+index_db_path = "./data/faiss_db.index" # path to save/load index
+np_db_path = "./data/np_db.npz" # path to save/load numpy data
+directory_path = "/home/username/Documents/" # Path where the files (.txt, .docx) are saved
+# Class instance
+transformer = TokenTransformer()
+files_features = FilesFeature()
+# Load files and create a argument class (pack with embedings, chunk_names and text_list)
+argument = files_features.load_all_files_from_directory(directory_path)
+# Created FAISS index to be used in queries
+index = transformer.create_index(argument.text_list)
+# To load files from VectorDB use
+vector_client = VectorDBClient()
+vector_client.load_data(self.index_db_path, self.np_db_path)
+index = vector_client.faiss_index
+argument = vector_client.file_argument
+# To save data on VectorDB use
+vector_client = VectorDBClient(index, argument)
+vector_client.save_data(index_db_path, np_db_path)
+```
+## Contributing
+Contributions are welcome! Please feel free to submit a Pull Request.
+## License
+This project is licensed under the MIT License - see the LICENSE file for details.

vectoriz-0.0.3/PKG-INFO DELETED

@@ -1,60 +0,0 @@
-Metadata-Version: 2.4
-Name: vectoriz
-Version: 0.0.3
-Summary: Python library for creating vectorized data from text or files.
-Home-page: https://github.com/PedroHenriqueDevBR/vectoriz
-Author: PedroHenriqueDevBR
-Author-email: pedro.henrique.particular@gmail.com
-Classifier: Programming Language :: Python :: 3.12
-Classifier: Operating System :: OS Independent
-Requires-Python: >=3.12
-Description-Content-Type: text/markdown
-Requires-Dist: faiss-cpu==1.10.0
-Requires-Dist: numpy==2.2.4
-Requires-Dist: sentence-transformers==4.0.2
-Requires-Dist: python-docx==1.1.2
-Dynamic: author
-Dynamic: author-email
-Dynamic: classifier
-Dynamic: description
-Dynamic: description-content-type
-Dynamic: home-page
-Dynamic: requires-dist
-Dynamic: requires-python
-Dynamic: summary
-# RAG-vector-creator
-## Overview
-This project implements a RAG (Retrieval-Augmented Generation) system for creating and managing vector embeddings from documents using FAISS and NumPy libraries. It efficiently transforms text data into high-dimensional vector representations that enable semantic search capabilities, similarity matching, and context-aware document retrieval for enhanced question answering applications.
-## Features
-- Document ingestion and preprocessing
-- Vector embedding generation using state-of-the-art models
-- Efficient storage and retrieval of embeddings
-- Integration with LLM-based generation systems
-## Installation
-```bash
-pip install -r requirements.txt
-python app.py
-```
-## Build lib
-To build the lib run the commands:
-```
-python setup.py sdist bdist_wheel
-```
-To test the install run:
-```
-pip install .
-```
-## License
-MIT

vectoriz-0.0.3/README.md DELETED

@@ -1,35 +0,0 @@
-# RAG-vector-creator
-## Overview
-This project implements a RAG (Retrieval-Augmented Generation) system for creating and managing vector embeddings from documents using FAISS and NumPy libraries. It efficiently transforms text data into high-dimensional vector representations that enable semantic search capabilities, similarity matching, and context-aware document retrieval for enhanced question answering applications.
-## Features
-- Document ingestion and preprocessing
-- Vector embedding generation using state-of-the-art models
-- Efficient storage and retrieval of embeddings
-- Integration with LLM-based generation systems
-## Installation
-```bash
-pip install -r requirements.txt
-python app.py
-```
-## Build lib
-To build the lib run the commands:
-```
-python setup.py sdist bdist_wheel
-```
-To test the install run:
-```
-pip install .
-```
-## License
-MIT

vectoriz-0.0.3/vectoriz.egg-info/PKG-INFO DELETED

@@ -1,60 +0,0 @@
-Metadata-Version: 2.4
-Name: vectoriz
-Version: 0.0.3
-Summary: Python library for creating vectorized data from text or files.
-Home-page: https://github.com/PedroHenriqueDevBR/vectoriz
-Author: PedroHenriqueDevBR
-Author-email: pedro.henrique.particular@gmail.com
-Classifier: Programming Language :: Python :: 3.12
-Classifier: Operating System :: OS Independent
-Requires-Python: >=3.12
-Description-Content-Type: text/markdown
-Requires-Dist: faiss-cpu==1.10.0
-Requires-Dist: numpy==2.2.4
-Requires-Dist: sentence-transformers==4.0.2
-Requires-Dist: python-docx==1.1.2
-Dynamic: author
-Dynamic: author-email
-Dynamic: classifier
-Dynamic: description
-Dynamic: description-content-type
-Dynamic: home-page
-Dynamic: requires-dist
-Dynamic: requires-python
-Dynamic: summary
-# RAG-vector-creator
-## Overview
-This project implements a RAG (Retrieval-Augmented Generation) system for creating and managing vector embeddings from documents using FAISS and NumPy libraries. It efficiently transforms text data into high-dimensional vector representations that enable semantic search capabilities, similarity matching, and context-aware document retrieval for enhanced question answering applications.
-## Features
-- Document ingestion and preprocessing
-- Vector embedding generation using state-of-the-art models
-- Efficient storage and retrieval of embeddings
-- Integration with LLM-based generation systems
-## Installation
-```bash
-pip install -r requirements.txt
-python app.py
-```
-## Build lib
-To build the lib run the commands:
-```
-python setup.py sdist bdist_wheel
-```
-To test the install run:
-```
-pip install .
-```
-## License
-MIT