PyPI - sparrow-parse - Versions diffs - 0.3.3__tar.gz → 0.3.4__tar.gz - Mend

Files changed (29)

{sparrow-parse-0.3.3 → sparrow-parse-0.3.4}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: sparrow-parse
-Version: 0.3.3
+Version: 0.3.4
 Summary: Sparrow Parse is a Python package for parsing and extracting information from documents.
 Home-page: https://github.com/katanaml/sparrow/tree/main/sparrow-data/parse
 Author: Andrej Baranovskij
@@ -87,6 +87,8 @@ Example:
 ## Parsing and extraction
+### HTML extractor
 ```
 from sparrow_parse.extractor.html_extractor import HTMLExtractor
@@ -119,6 +121,36 @@ Example:
 *debug* - `True`
+### Sparrow Parse VL (vision-language) extractor
+```
+extractor = VLLMExtractor()
+# export HF_TOKEN="hf_"
+config = {
+    "method": "huggingface",  # Could be 'huggingface' or 'local_gpu'
+    "hf_space": "katanaml/sparrow-qwen2-vl-7b",
+    "hf_token": os.getenv('HF_TOKEN'),
+    # Additional fields for local GPU inference
+    # "device": "cuda", "model_path": "model.pth"
+}
+# Use the factory to get the correct instance
+factory = InferenceFactory(config)
+model_inference_instance = factory.get_inference_instance()
+input_data = [
+    {
+        "image": "/Users/andrejb/Documents/work/epik/bankstatement/bonds_table.png",
+        "text_input": "retrieve financial instruments data. return response in JSON format"
+    }
+]
+# Now you can run inference without knowing which implementation is used
+result = extractor.run_inference(model_inference_instance, input_data, generic_query=False, debug=True)
+print("Inference Result:", result)
+```
 ## PDF optimization
 ```
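
The README hunk above describes picking an inference backend through a factory keyed on `config["method"]`, so the caller can run inference without knowing which implementation is used. A minimal sketch of that dispatch pattern follows. It is not the sparrow-parse implementation: `SketchInferenceFactory`, `StubHuggingFaceInference`, and `StubLocalGPUInference` are hypothetical stand-ins that make no network or GPU calls, the image file name is a placeholder, and only the config keys are taken from the README example.

```python
import os
from abc import ABC, abstractmethod


class StubModelInference(ABC):
    """Hypothetical common interface; not part of sparrow-parse."""

    @abstractmethod
    def inference(self, input_data):
        """Return one result dict per input item."""


class StubHuggingFaceInference(StubModelInference):
    def __init__(self, hf_space, hf_token):
        self.hf_space = hf_space
        self.hf_token = hf_token

    def inference(self, input_data):
        # Placeholder: a real backend would call the hosted space here.
        return [
            {"backend": "huggingface", "space": self.hf_space, "query": item["text_input"]}
            for item in input_data
        ]


class StubLocalGPUInference(StubModelInference):
    def __init__(self, model_path, device="cuda"):
        self.model_path = model_path
        self.device = device

    def inference(self, input_data):
        # Placeholder: a real backend would load the model and run it locally.
        return [
            {"backend": "local_gpu", "device": self.device, "query": item["text_input"]}
            for item in input_data
        ]


class SketchInferenceFactory:
    """Hypothetical factory that maps config["method"] to a backend."""

    def __init__(self, config):
        self.config = config

    def get_inference_instance(self):
        method = self.config.get("method")
        if method == "huggingface":
            return StubHuggingFaceInference(
                self.config["hf_space"], self.config.get("hf_token")
            )
        if method == "local_gpu":
            return StubLocalGPUInference(
                self.config["model_path"], self.config.get("device", "cuda")
            )
        raise ValueError(f"Unknown inference method: {method!r}")


config = {
    "method": "huggingface",
    "hf_space": "katanaml/sparrow-qwen2-vl-7b",
    "hf_token": os.getenv("HF_TOKEN"),
}
backend = SketchInferenceFactory(config).get_inference_instance()
result = backend.inference(
    [{"image": "bonds_table.png", "text_input": "retrieve financial instruments data"}]
)
print(result)
```

Because the calling code only touches `inference()`, switching `"method"` to `"local_gpu"` (with a `"model_path"`) changes the backend without changing the caller.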

{sparrow-parse-0.3.3 → sparrow-parse-0.3.4}/README.md RENAMED

@@ -68,6 +68,8 @@ Example:
 ## Parsing and extraction
+### HTML extractor
 ```
 from sparrow_parse.extractor.html_extractor import HTMLExtractor
@@ -100,6 +102,36 @@ Example:
 *debug* - `True`
+### Sparrow Parse VL (vision-language) extractor
+```
+extractor = VLLMExtractor()
+# export HF_TOKEN="hf_"
+config = {
+    "method": "huggingface",  # Could be 'huggingface' or 'local_gpu'
+    "hf_space": "katanaml/sparrow-qwen2-vl-7b",
+    "hf_token": os.getenv('HF_TOKEN'),
+    # Additional fields for local GPU inference
+    # "device": "cuda", "model_path": "model.pth"
+}
+# Use the factory to get the correct instance
+factory = InferenceFactory(config)
+model_inference_instance = factory.get_inference_instance()
+input_data = [
+    {
+        "image": "/Users/andrejb/Documents/work/epik/bankstatement/bonds_table.png",
+        "text_input": "retrieve financial instruments data. return response in JSON format"
+    }
+]
+# Now you can run inference without knowing which implementation is used
+result = extractor.run_inference(model_inference_instance, input_data, generic_query=False, debug=True)
+print("Inference Result:", result)
+```
 ## PDF optimization
 ```

{sparrow-parse-0.3.3 → sparrow-parse-0.3.4}/setup.py RENAMED

@@ -8,7 +8,7 @@ with open("requirements.txt", "r", encoding="utf-8") as fh:
 setup(
     name="sparrow-parse",
-    version="0.3.3",
+    version="0.3.4",
     author="Andrej Baranovskij",
     author_email="andrejus.baranovskis@gmail.com",
     description="Sparrow Parse is a Python package for parsing and extracting information from documents.",

sparrow-parse-0.3.4/sparrow_parse/__init__.py ADDED

@@ -0,0 +1 @@
+__version__ = '0.3.4'
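
This release states its version in three places visible in this diff: the `Version:` field of PKG-INFO, the `version=` argument in setup.py, and the new `__version__` in `sparrow_parse/__init__.py`. A minimal sketch of a check that the three agree is shown below. The helper names and the inline sample strings are illustrative assumptions, not part of the package; a real check would read the files from an unpacked sdist.

```python
import re


def version_from_pkg_info(text):
    """Extract the value of the 'Version:' metadata field, or None."""
    match = re.search(r"^Version:\s*(\S+)\s*$", text, flags=re.MULTILINE)
    return match.group(1) if match else None


def version_from_assignment(text, name):
    """Extract a quoted value from a `name = '...'` or `name="..."` line."""
    pattern = r"\b" + re.escape(name) + r"""\s*=\s*['"]([^'"]+)['"]"""
    match = re.search(pattern, text)
    return match.group(1) if match else None


# Sample contents copied from the hunks in this diff.
pkg_info = "Metadata-Version: 2.1\nName: sparrow-parse\nVersion: 0.3.4\n"
setup_py = 'setup(\n    name="sparrow-parse",\n    version="0.3.4",\n)\n'
init_py = "__version__ = '0.3.4'\n"

versions = {
    "PKG-INFO": version_from_pkg_info(pkg_info),
    "setup.py": version_from_assignment(setup_py, "version"),
    "__init__.py": version_from_assignment(init_py, "__version__"),
}
consistent = len(set(versions.values())) == 1
print(versions, consistent)
```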

{sparrow-parse-0.3.3 → sparrow-parse-0.3.4}/sparrow_parse.egg-info/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: sparrow-parse
-Version: 0.3.3
+Version: 0.3.4
 Summary: Sparrow Parse is a Python package for parsing and extracting information from documents.
 Home-page: https://github.com/katanaml/sparrow/tree/main/sparrow-data/parse
 Author: Andrej Baranovskij
@@ -87,6 +87,8 @@ Example:
 ## Parsing and extraction
+### HTML extractor
 ```
 from sparrow_parse.extractor.html_extractor import HTMLExtractor
@@ -119,6 +121,36 @@ Example:
 *debug* - `True`
+### Sparrow Parse VL (vision-language) extractor
+```
+extractor = VLLMExtractor()
+# export HF_TOKEN="hf_"
+config = {
+    "method": "huggingface",  # Could be 'huggingface' or 'local_gpu'
+    "hf_space": "katanaml/sparrow-qwen2-vl-7b",
+    "hf_token": os.getenv('HF_TOKEN'),
+    # Additional fields for local GPU inference
+    # "device": "cuda", "model_path": "model.pth"
+}
+# Use the factory to get the correct instance
+factory = InferenceFactory(config)
+model_inference_instance = factory.get_inference_instance()
+input_data = [
+    {
+        "image": "/Users/andrejb/Documents/work/epik/bankstatement/bonds_table.png",
+        "text_input": "retrieve financial instruments data. return response in JSON format"
+    }
+]
+# Now you can run inference without knowing which implementation is used
+result = extractor.run_inference(model_inference_instance, input_data, generic_query=False, debug=True)
+print("Inference Result:", result)
+```
 ## PDF optimization
 ```