vectoriz 0.0.3__tar.gz → 0.0.4__tar.gz

@@ -0,0 +1,85 @@
+ Metadata-Version: 2.4
+ Name: vectoriz
+ Version: 0.0.4
+ Summary: Python library for creating vectorized data from text or files.
+ Home-page: https://github.com/PedroHenriqueDevBR/vectoriz
+ Author: PedroHenriqueDevBR
+ Author-email: pedro.henrique.particular@gmail.com
+ Classifier: Programming Language :: Python :: 3.12
+ Classifier: Operating System :: OS Independent
+ Requires-Python: >=3.12
+ Description-Content-Type: text/markdown
+ Requires-Dist: faiss-cpu==1.10.0
+ Requires-Dist: numpy==2.2.4
+ Requires-Dist: sentence-transformers==4.0.2
+ Requires-Dist: python-docx==1.1.2
+ Dynamic: author
+ Dynamic: author-email
+ Dynamic: classifier
+ Dynamic: description
+ Dynamic: description-content-type
+ Dynamic: home-page
+ Dynamic: requires-dist
+ Dynamic: requires-python
+ Dynamic: summary
+
+ # Vectoriz
+
+ A tool for generating vector embeddings for Retrieval-Augmented Generation (RAG) applications.
+
+ ## Overview
+
+ This project provides utilities to create, manage, and optimize vector embeddings for use in RAG systems. It streamlines the process of converting documents and data sources into vector representations suitable for semantic search and retrieval.
+
+ ## Features
+
+ - Document processing and chunking
+ - Vector embedding generation using various models
+ - Vector database integration
+ - Optimization tools for RAG performance
+ - Easy-to-use API for embedding creation
+
+ ## Installation
+
+ ```bash
+ git clone https://github.com/PedroHenriqueDevBR/vectoriz.git
+ cd vectoriz
+ pip install -r requirements.txt
+ ```
+
+ ## Usage
+
+ ```python
+ # Initial information
+ index_db_path = "./data/faiss_db.index" # path to save/load the index
+ np_db_path = "./data/np_db.npz" # path to save/load the numpy data
+ directory_path = "/home/username/Documents/" # path where the files (.txt, .docx) are saved
+
+ # Class instances
+ transformer = TokenTransformer()
+ files_features = FilesFeature()
+
+ # Load files and create an argument object (bundles embeddings, chunk_names and text_list)
+ argument = files_features.load_all_files_from_directory(directory_path)
+
+ # Create the FAISS index to be used in queries
+ index = transformer.create_index(argument.text_list)
+
+ # To load saved data from the VectorDB, use
+ vector_client = VectorDBClient()
+ vector_client.load_data(index_db_path, np_db_path)
+ index = vector_client.faiss_index
+ argument = vector_client.file_argument
+
+ # To save data to the VectorDB, use
+ vector_client = VectorDBClient(index, argument)
+ vector_client.save_data(index_db_path, np_db_path)
+ ```
+
+ ## Contributing
+
+ Contributions are welcome! Please feel free to submit a Pull Request.
+
+ ## License
+
+ This project is licensed under the MIT License - see the LICENSE file for details.
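The Usage snippet above builds a FAISS index over the loaded text and later queries it. Conceptually, an exact ("flat") index compares a query vector against every stored vector and returns the closest ones. The sketch below illustrates that idea in pure Python as a stand-in for FAISS, not as a use of it: `FlatIndex`, `add` and `search` are hypothetical names, the 2-D vectors are toy data rather than real embeddings, and the diff does not show which FAISS index type vectoriz actually creates.

```python
import math


class FlatIndex:
    """Toy exact nearest-neighbour index (hypothetical, not the FAISS API)."""

    def __init__(self) -> None:
        self.vectors: list[list[float]] = []

    def add(self, vectors: list[list[float]]) -> None:
        """Store more vectors; positions are assigned in insertion order."""
        self.vectors.extend(vectors)

    def search(self, query: list[float], k: int) -> list[tuple[int, float]]:
        """Return up to k (position, Euclidean distance) pairs, closest first."""
        scored = [(i, math.dist(query, v)) for i, v in enumerate(self.vectors)]
        scored.sort(key=lambda pair: pair[1])
        return scored[:k]


index = FlatIndex()
index.add([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
nearest = index.search([0.9, 1.2], k=2)
print(nearest)
```

The returned positions can be used to look up the matching entries in a parallel list of texts, which is roughly the role the Usage snippet gives to `argument.text_list`.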
@@ -0,0 +1,60 @@
+ # Vectoriz
+
+ A tool for generating vector embeddings for Retrieval-Augmented Generation (RAG) applications.
+
+ ## Overview
+
+ This project provides utilities to create, manage, and optimize vector embeddings for use in RAG systems. It streamlines the process of converting documents and data sources into vector representations suitable for semantic search and retrieval.
+
+ ## Features
+
+ - Document processing and chunking
+ - Vector embedding generation using various models
+ - Vector database integration
+ - Optimization tools for RAG performance
+ - Easy-to-use API for embedding creation
+
+ ## Installation
+
+ ```bash
+ git clone https://github.com/PedroHenriqueDevBR/vectoriz.git
+ cd vectoriz
+ pip install -r requirements.txt
+ ```
+
+ ## Usage
+
+ ```python
+ # Initial information
+ index_db_path = "./data/faiss_db.index" # path to save/load the index
+ np_db_path = "./data/np_db.npz" # path to save/load the numpy data
+ directory_path = "/home/username/Documents/" # path where the files (.txt, .docx) are saved
+
+ # Class instances
+ transformer = TokenTransformer()
+ files_features = FilesFeature()
+
+ # Load files and create an argument object (bundles embeddings, chunk_names and text_list)
+ argument = files_features.load_all_files_from_directory(directory_path)
+
+ # Create the FAISS index to be used in queries
+ index = transformer.create_index(argument.text_list)
+
+ # To load saved data from the VectorDB, use
+ vector_client = VectorDBClient()
+ vector_client.load_data(index_db_path, np_db_path)
+ index = vector_client.faiss_index
+ argument = vector_client.file_argument
+
+ # To save data to the VectorDB, use
+ vector_client = VectorDBClient(index, argument)
+ vector_client.save_data(index_db_path, np_db_path)
+ ```
+
+ ## Contributing
+
+ Contributions are welcome! Please feel free to submit a Pull Request.
+
+ ## License
+
+ This project is licensed under the MIT License - see the LICENSE file for details.
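The README lists document chunking as a feature, but this diff does not show how vectoriz splits text. One common approach, sketched here under that caveat, is a sliding window over words with a small overlap so that sentences cut at a boundary still appear whole in one chunk. `chunk_text`, `chunk_size` and `overlap` are hypothetical names chosen for illustration and are not taken from the package.

```python
def chunk_text(text: str, chunk_size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into word-based chunks; consecutive chunks share `overlap` words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    words = text.split()
    step = chunk_size - overlap
    chunks: list[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


sample = " ".join(f"w{i}" for i in range(12))
chunks = chunk_text(sample, chunk_size=5, overlap=2)
for chunk in chunks:
    print(chunk)
```

Word counts are only a rough proxy for model token counts, so a real pipeline would likely size chunks against the embedding model's own limit.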
@@ -2,7 +2,7 @@ from setuptools import setup, find_packages
 
  setup(
      name="vectoriz",
-     version="0.0.3",
+     version="0.0.4",
      author="PedroHenriqueDevBR",
      author_email="pedro.henrique.particular@gmail.com",
      description="Python library for creating vectorized data from text or files.",
@@ -127,7 +127,7 @@ class FilesFeature:
              full_text.append(paragraph.text)
          return "\n".join(full_text)
 
-     def load_txt_files_from_directory(self, directory: str) -> FileArgument:
+     def load_txt_files_from_directory(self, directory: str, verbose: bool = False) -> FileArgument:
          """
          Load all text files from the specified directory and extract their content.
          This method scans the specified directory for files with the '.txt' extension
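The `verbose` parameter added to this signature is an opt-in progress report: it defaults to `False`, so existing callers keep their silent behaviour. A minimal, stdlib-only sketch of the pattern follows. `load_txt_files` is a hypothetical helper that returns a plain dict rather than the package's `FileArgument`, and, unlike the released code, it words the message differently for a file that was skipped because of its extension and a file that failed to read.

```python
import os
import tempfile


def load_txt_files(directory: str, verbose: bool = False) -> dict[str, str]:
    """Read every .txt file directly inside `directory` (hypothetical helper)."""
    loaded: dict[str, str] = {}
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not name.endswith(".txt") or not os.path.isfile(path):
            if verbose:
                print(f"Skipped (not a .txt file): {name}")
            continue
        try:
            with open(path, encoding="utf-8") as handle:
                loaded[name] = handle.read()
        except (OSError, UnicodeDecodeError) as error:
            if verbose:
                print(f"Failed to read {name}: {error}")
            continue
        if verbose:
            print(f"Loaded txt file: {name}")
    return loaded


# Demo on a throwaway directory with one readable and one unsupported file.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "notes.txt"), "w", encoding="utf-8") as handle:
        handle.write("hello")
    with open(os.path.join(tmp, "image.png"), "wb") as handle:
        handle.write(b"\x89PNG")
    result = load_txt_files(tmp, verbose=True)
```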
@@ -145,16 +145,22 @@ class FilesFeature:
          argument: FileArgument = FileArgument([], [], [])
          for file in os.listdir(directory):
              if not file.endswith(".txt"):
+                 if verbose:
+                     print(f"Error file: {file}")
                  continue
 
              text = self._extract_txt_content(directory, file)
              if text is None:
+                 if verbose:
+                     print(f"Error file: {file}")
                  continue
 
              argument.add_data(file, text)
+             if verbose:
+                 print(f"Loaded txt file: {file}")
          return argument
 
-     def load_docx_files_from_directory(self, directory: str) -> FileArgument:
+     def load_docx_files_from_directory(self, directory: str, verbose: bool = False) -> FileArgument:
          """
          Load all Word (.docx) files from the specified directory and extract their content.
 
@@ -174,16 +180,22 @@ class FilesFeature:
          argument: FileArgument = FileArgument([], [], [])
          for file in os.listdir(directory):
              if not file.endswith(".docx"):
+                 if verbose:
+                     print(f"Error file: {file}")
                  continue
 
              text = self._extract_docx_content(directory, file)
              if text is None:
+                 if verbose:
+                     print(f"Error file: {file}")
                  continue
 
              argument.add_data(file, text)
+             if verbose:
+                 print(f"Loaded Word file: {file}")
          return argument
 
-     def load_all_files_from_directory(self, directory: str) -> FileArgument:
+     def load_all_files_from_directory(self, directory: str, verbose: bool = False) -> FileArgument:
          """
          Load all supported files (.txt and .docx) from the specified directory and its subdirectories.
 
@@ -199,15 +211,23 @@ class FilesFeature:
          argument: FileArgument = FileArgument([], [], [])
          for root, _, files in os.walk(directory):
              for file in files:
+                 readed = False
                  if file.endswith(".txt"):
                      text = self._extract_txt_content(root, file)
                      if text is not None:
                          argument.add_data(file, text)
+                         readed = True
                  elif file.endswith(".docx"):
                      try:
                          text = self._extract_docx_content(root, file)
                          if text is not None:
                              argument.add_data(file, text)
+                             readed = True
                      except Exception as e:
                          print(f"Error processing {file}: {str(e)}")
+
+                 if verbose and readed:
+                     print(f"Loaded file: {file}")
+                 elif verbose and not readed:
+                     print(f"Error file: {file}")
          return argument
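The recursive loader in this hunk tracks a per-file flag and prints one of two messages when `verbose` is set. An alternative that some libraries prefer, sketched here as an assumption rather than as the package's design, is to emit the same messages through the standard `logging` module so the caller controls verbosity without an extra parameter. All names below (`loader_sketch`, `read_txt`, `READERS`, `load_all_files`) are hypothetical, and only `.txt` is handled so the example stays stdlib-only.

```python
import logging
import os
import tempfile
from typing import Callable, Optional

logger = logging.getLogger("loader_sketch")  # hypothetical logger name


def read_txt(path: str) -> Optional[str]:
    """Return the file's text, or None when it cannot be read."""
    try:
        with open(path, encoding="utf-8") as handle:
            return handle.read()
    except (OSError, UnicodeDecodeError):
        return None


# Extension -> reader. A .docx reader (for example one built on python-docx)
# could be registered the same way; it is omitted to avoid third-party imports.
READERS: dict[str, Callable[[str], Optional[str]]] = {".txt": read_txt}


def load_all_files(directory: str) -> dict[str, str]:
    """Walk `directory` recursively; map each loaded file's relative path to its text."""
    loaded: dict[str, str] = {}
    for root, _, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            reader = READERS.get(os.path.splitext(name)[1].lower())
            text = reader(path) if reader else None
            if text is None:
                logger.warning("Not loaded: %s", name)
                continue
            loaded[os.path.relpath(path, directory)] = text
            logger.info("Loaded file: %s", name)
    return loaded


# Demo on a throwaway tree: one nested text file and one unsupported file.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "nested"))
    with open(os.path.join(tmp, "nested", "a.txt"), "w", encoding="utf-8") as handle:
        handle.write("alpha")
    with open(os.path.join(tmp, "b.bin"), "wb") as handle:
        handle.write(b"\x00")
    result = load_all_files(tmp)
```

With this shape, a caller who wants the progress messages runs `logging.basicConfig(level=logging.INFO)` once, and everyone else sees only the warnings.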
vectoriz-0.0.3/PKG-INFO DELETED
@@ -1,60 +0,0 @@
- Metadata-Version: 2.4
- Name: vectoriz
- Version: 0.0.3
- Summary: Python library for creating vectorized data from text or files.
- Home-page: https://github.com/PedroHenriqueDevBR/vectoriz
- Author: PedroHenriqueDevBR
- Author-email: pedro.henrique.particular@gmail.com
- Classifier: Programming Language :: Python :: 3.12
- Classifier: Operating System :: OS Independent
- Requires-Python: >=3.12
- Description-Content-Type: text/markdown
- Requires-Dist: faiss-cpu==1.10.0
- Requires-Dist: numpy==2.2.4
- Requires-Dist: sentence-transformers==4.0.2
- Requires-Dist: python-docx==1.1.2
- Dynamic: author
- Dynamic: author-email
- Dynamic: classifier
- Dynamic: description
- Dynamic: description-content-type
- Dynamic: home-page
- Dynamic: requires-dist
- Dynamic: requires-python
- Dynamic: summary
-
- # RAG-vector-creator
-
- ## Overview
- This project implements a RAG (Retrieval-Augmented Generation) system for creating and managing vector embeddings from documents using FAISS and NumPy libraries. It efficiently transforms text data into high-dimensional vector representations that enable semantic search capabilities, similarity matching, and context-aware document retrieval for enhanced question answering applications.
-
- ## Features
-
- - Document ingestion and preprocessing
- - Vector embedding generation using state-of-the-art models
- - Efficient storage and retrieval of embeddings
- - Integration with LLM-based generation systems
-
- ## Installation
-
- ```bash
- pip install -r requirements.txt
- python app.py
- ```
-
- ## Build lib
-
- To build the lib run the commands:
-
- ```
- python setup.py sdist bdist_wheel
- ```
-
- To test the install run:
- ```
- pip install .
- ```
-
- ## License
-
- MIT
vectoriz-0.0.3/README.md DELETED
@@ -1,35 +0,0 @@
- # RAG-vector-creator
-
- ## Overview
- This project implements a RAG (Retrieval-Augmented Generation) system for creating and managing vector embeddings from documents using FAISS and NumPy libraries. It efficiently transforms text data into high-dimensional vector representations that enable semantic search capabilities, similarity matching, and context-aware document retrieval for enhanced question answering applications.
-
- ## Features
-
- - Document ingestion and preprocessing
- - Vector embedding generation using state-of-the-art models
- - Efficient storage and retrieval of embeddings
- - Integration with LLM-based generation systems
-
- ## Installation
-
- ```bash
- pip install -r requirements.txt
- python app.py
- ```
-
- ## Build lib
-
- To build the lib run the commands:
-
- ```
- python setup.py sdist bdist_wheel
- ```
-
- To test the install run:
- ```
- pip install .
- ```
-
- ## License
-
- MIT
File without changes