PyPI - prevectorchunks-core - Versions diffs - 0.1.23__py3-none-any.whl → 0.1.25__py3-none-any.whl - Mend

prevectorchunks-core 0.1.23__py3-none-any.whl → 0.1.25__py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (6)

prevectorchunks_core/services/markdown_and_chunk_documents.py CHANGED

@@ -12,9 +12,9 @@ from ..config.splitter_config import SplitterConfig
 from dotenv import load_dotenv
-from chunk_documents_crud_vdb import chunk_documents
-from chunk_to_all_content_mapper import ChunkMapper
-from core.prevectorchunks_core.utils.file_loader import SplitType
+from .chunk_documents_crud_vdb import chunk_documents
+from .chunk_to_all_content_mapper import ChunkMapper
+from ..utils.file_loader import SplitType
 load_dotenv(override=True)
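
Note on the hunk above: it replaces bare top-level imports (and one rooted at `core.`) with package-relative ones. A bare `from chunk_documents_crud_vdb import ...` only resolves when the `services` directory itself happens to be on `sys.path`, which is presumably not the case for a wheel installed into site-packages. The following is a minimal sketch of that behaviour, not code from the package: `demo_relimport_pkg`, `demo_relimport_sibling` and the helper functions are hypothetical names made up for the illustration.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def build_demo_package(root: Path) -> None:
    """Create a throwaway package with one sibling module and two importers."""
    services = root / "demo_relimport_pkg" / "services"
    services.mkdir(parents=True)
    (root / "demo_relimport_pkg" / "__init__.py").write_text("")
    (services / "__init__.py").write_text("")
    (services / "demo_relimport_sibling.py").write_text("VALUE = 42\n")
    # Bare import: looks for a top-level module on sys.path.
    (services / "uses_bare.py").write_text(
        "from demo_relimport_sibling import VALUE\n"
    )
    # Relative import: resolved against the containing package.
    (services / "uses_relative.py").write_text(
        "from .demo_relimport_sibling import VALUE\n"
    )


def try_import(module_name: str) -> str:
    """Return 'ok' if the module imports, otherwise the exception class name."""
    try:
        importlib.import_module(module_name)
        return "ok"
    except ImportError as exc:
        return type(exc).__name__


def run_demo() -> dict:
    """Import both variants with only the package root on sys.path."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        build_demo_package(root)
        sys.path.insert(0, str(root))
        importlib.invalidate_caches()
        try:
            return {
                "bare": try_import("demo_relimport_pkg.services.uses_bare"),
                "relative": try_import("demo_relimport_pkg.services.uses_relative"),
            }
        finally:
            # Leave the interpreter state as it was before the demo.
            sys.path.remove(str(root))
            for name in [m for m in sys.modules if m.startswith("demo_relimport_pkg")]:
                del sys.modules[name]


results = run_demo()
print(results)
```

With only the package root importable, the bare form fails and the relative form succeeds, which matches the direction of the change in 0.1.25.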

{prevectorchunks_core-0.1.23.dist-info → prevectorchunks_core-0.1.25.dist-info}/METADATA RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: prevectorchunks-core
-Version: 0.1.23
+Version: 0.1.25
 Summary: A Python module that allows conversion of a document into chunks to be inserted into Pinecone vector database
 Author-email: Zul Al-Kabir <zul.developer.2023@gmail.com>
 Project-URL: Homepage, https://github.com/yourusername/mydep
@@ -187,6 +187,41 @@ Updates existing chunks in the Vector Database by document name.
 - Keeping VDB chunks up to date when documents change
 - Re-ingesting revised or corrected content
+---
+### 5. ``markdown_and_chunk_documents``
+```python
+from prevectorchunks_core.services.markdown_and_chunk_documents import MarkdownAndChunkDocuments
+markdown_processor = MarkdownAndChunkDocuments()
+mapped_chunks = markdown_processor.markdown_and_chunk_documents("example.pdf")
+```
+**Description**
+This new function automatically:
+1. Converts a document (PDF, DOCX, etc.) into images using `DocuToImageConverter`.
+2. Extracts **Markdown and text** content from those images using `DocuToMarkdownExtractor` (powered by GPT).
+3. Converts the extracted markdown text into **RL-based chunks** using `ChunkMapper` and `chunk_documents`.
+4. Merges unmatched markdown segments into the final structured output.
+**Parameters**
+- `file_path` (*str*): Path to the document (PDF, DOCX, or image) you want to process.
+**Returns**
+- `mapped_chunks` (*list[dict]*): A list of markdown-based chunks with both markdown and chunked text content.
+**Example**
+```python
+if __name__ == "__main__":
+    markdown_processor = MarkdownAndChunkDocuments()
+    mapped_chunks = markdown_processor.markdown_and_chunk_documents("421307-nz-au-top-loading-washer-guide-shorter.pdf")
+    print(mapped_chunks)
+```
+**Use Cases**
+- End-to-end document-to-markdown-to-chunks pipeline
+- Automating preprocessing for RAG/LLM ingestion
+- Extracting structured markdown for semantic search or content indexing
 ---
 ## 🚀 Example Workflow

{prevectorchunks_core-0.1.23.dist-info → prevectorchunks_core-0.1.25.dist-info}/RECORD RENAMED

@@ -23,7 +23,7 @@ prevectorchunks_core/services/audio_processor.py,sha256=XKNYhXHIt_77a3PT2wwKvnCS
 prevectorchunks_core/services/chunk_documents_crud_vdb.py,sha256=Md4vy7vJDnSYpvZiF0HbHCOA0StSVm62ALHAPYU2A7I,16279
 prevectorchunks_core/services/chunk_to_all_content_mapper.py,sha256=xEz2idxJTsJwyCJWMPZCk3CFcalKhbSuucFH9TPouU0,2778
 prevectorchunks_core/services/image_processor.py,sha256=2CRwTbI-czbakm9aG-kMdx908bc5H1rQETQiVCKbWd8,3518
-prevectorchunks_core/services/markdown_and_chunk_documents.py,sha256=BYwu4FcliFU-adnPoqUuqjAkRvV7mVtOvAPKS1sM6Zk,2884
+prevectorchunks_core/services/markdown_and_chunk_documents.py,sha256=vfGvvirn3rtwIVJtVK5_dJSrV3JeO0p0d0rc5BOnGx8,2862
 prevectorchunks_core/services/propositional_index.py,sha256=cVH3obhLtlcfJYA6VN4KfC3len4fe5nNcboorlouOb0,4151
 prevectorchunks_core/services/video_analyser.py,sha256=1wI38xZ8vdE8T4EBAnxWzt7Hc8vTYrdQhbA4Y5VZLeY,6651
 prevectorchunks_core/tests/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
@@ -32,8 +32,8 @@ prevectorchunks_core/utils/__init__.py,sha256=aez3v2dwGHXvmALXVBPR-mQgvxMqxv9NsE
 prevectorchunks_core/utils/extract_content.py,sha256=fMDT-BsjYutHLnOFebLhMFpb1UFAB8ldGldxh11FsXw,2920
 prevectorchunks_core/utils/file_loader.py,sha256=JkCKiz3M2TMw5qHoTJXhbn33PfTv5gvQ3nfrbaQOmHs,10689
 prevectorchunks_core/utils/llm_wrapper.py,sha256=7GfyM5p5PeIehi4Dj5jgC7-xi2SjZuyyPuLkWtucQzQ,1139
-prevectorchunks_core-0.1.23.dist-info/licenses/LICENCE,sha256=Ljp4XVKnncsQ59h0eMW6J5V-ylsVeqDRC8smR7UPIDs,512
-prevectorchunks_core-0.1.23.dist-info/METADATA,sha256=vKOvF3fsJY6Zn-jfY0DNUP6CljBQadByMcD8xzta2WU,9272
-prevectorchunks_core-0.1.23.dist-info/WHEEL,sha256=_zCd3N1l69ArxyTb8rzEoP9TpbYXkqRFSNOD5OuxnTs,91
-prevectorchunks_core-0.1.23.dist-info/top_level.txt,sha256=OWJgfrUDNTh49PpKvRXHY8lVeWqzFbTr9OkDoAvpvPk,21
-prevectorchunks_core-0.1.23.dist-info/RECORD,,
+prevectorchunks_core-0.1.25.dist-info/licenses/LICENCE,sha256=Ljp4XVKnncsQ59h0eMW6J5V-ylsVeqDRC8smR7UPIDs,512
+prevectorchunks_core-0.1.25.dist-info/METADATA,sha256=MkwN6c12SBuLRwFgaWUwANwGMsno4939fo5jU9otYBc,10687
+prevectorchunks_core-0.1.25.dist-info/WHEEL,sha256=_zCd3N1l69ArxyTb8rzEoP9TpbYXkqRFSNOD5OuxnTs,91
+prevectorchunks_core-0.1.25.dist-info/top_level.txt,sha256=OWJgfrUDNTh49PpKvRXHY8lVeWqzFbTr9OkDoAvpvPk,21
+prevectorchunks_core-0.1.25.dist-info/RECORD,,
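
Note on reading the RECORD entries above: each line has the form `path,sha256=<digest>,<size in bytes>`, where the digest is the SHA-256 of the file encoded as URL-safe base64 with the trailing `=` padding removed. The sketch below recomputes such an entry; `record_digest` and `record_line` are hypothetical helper names, not part of the package. The empty `prevectorchunks_core/tests/__init__.py` entry listed above (size 0) serves as a check, since its contents are known to be zero bytes.

```python
import base64
import hashlib


def record_digest(data: bytes) -> str:
    """SHA-256 of data, URL-safe base64 encoded, with '=' padding stripped."""
    digest = hashlib.sha256(data).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def record_line(path: str, data: bytes) -> str:
    """Build a RECORD-style line: path, hash, and size in bytes."""
    return f"{path},sha256={record_digest(data)},{len(data)}"


# An empty file should reproduce the tests/__init__.py entry from the diff.
print(record_line("prevectorchunks_core/tests/__init__.py", b""))
```

Running the same function over the unpacked `markdown_and_chunk_documents.py` from each wheel would be one way to confirm the hash and size change (2884 to 2862 bytes) reported in this hunk.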

{prevectorchunks_core-0.1.23.dist-info → prevectorchunks_core-0.1.25.dist-info}/WHEEL RENAMED

File without changes

{prevectorchunks_core-0.1.23.dist-info → prevectorchunks_core-0.1.25.dist-info}/licenses/LICENCE RENAMED

File without changes

{prevectorchunks_core-0.1.23.dist-info → prevectorchunks_core-0.1.25.dist-info}/top_level.txt RENAMED

File without changes
