swarmauri_parser_fitzpdf 0.8.0.dev4__tar.gz → 0.8.0.dev22__tar.gz

@@ -0,0 +1,107 @@
1
+ Metadata-Version: 2.4
2
+ Name: swarmauri_parser_fitzpdf
3
+ Version: 0.8.0.dev22
4
+ Summary: Fitz PDF Parser for Swarmauri.
5
+ License-Expression: Apache-2.0
6
+ License-File: LICENSE
7
+ Keywords: swarmauri,parser,fitzpdf,fitz,pdf
8
+ Author: Jacob Stewart
9
+ Author-email: jacob@swarmauri.com
10
+ Requires-Python: >=3.10,<3.13
11
+ Classifier: License :: OSI Approved :: Apache Software License
12
+ Classifier: Programming Language :: Python :: 3.10
13
+ Classifier: Programming Language :: Python :: 3.11
14
+ Classifier: Programming Language :: Python :: 3.12
15
+ Classifier: Natural Language :: English
16
+ Classifier: Development Status :: 3 - Alpha
17
+ Classifier: Intended Audience :: Developers
18
+ Classifier: Topic :: Software Development :: Libraries :: Application Frameworks
19
+ Requires-Dist: PyMuPDF (>=1.24.12)
20
+ Requires-Dist: swarmauri_base
21
+ Requires-Dist: swarmauri_core
22
+ Requires-Dist: swarmauri_standard
23
+ Description-Content-Type: text/markdown
24
+
25
+ ![Swarmauri Logo](https://github.com/swarmauri/swarmauri-sdk/blob/3d4d1cfa949399d7019ae9d8f296afba773dfb7f/assets/swarmauri.brand.theme.svg)
26
+
27
+ <p align="center">
28
+ <a href="https://pypi.org/project/swarmauri_parser_fitzpdf/">
29
+ <img src="https://img.shields.io/pypi/dm/swarmauri_parser_fitzpdf" alt="PyPI - Downloads"/></a>
30
+ <a href="https://hits.sh/github.com/swarmauri/swarmauri-sdk/tree/master/pkgs/community/swarmauri_parser_fitzpdf/">
31
+ <img alt="Hits" src="https://hits.sh/github.com/swarmauri/swarmauri-sdk/tree/master/pkgs/community/swarmauri_parser_fitzpdf.svg"/></a>
32
+ <a href="https://pypi.org/project/swarmauri_parser_fitzpdf/">
33
+ <img src="https://img.shields.io/pypi/pyversions/swarmauri_parser_fitzpdf" alt="PyPI - Python Version"/></a>
34
+ <a href="https://pypi.org/project/swarmauri_parser_fitzpdf/">
35
+ <img src="https://img.shields.io/pypi/l/swarmauri_parser_fitzpdf" alt="PyPI - License"/></a>
36
+ <a href="https://pypi.org/project/swarmauri_parser_fitzpdf/">
37
+ <img src="https://img.shields.io/pypi/v/swarmauri_parser_fitzpdf?label=swarmauri_parser_fitzpdf&color=green" alt="PyPI - swarmauri_parser_fitzpdf"/></a>
38
+ </p>
39
+
40
+ ---
41
+
42
+ # Swarmauri Parser Fitz PDF
43
+
44
+ PDF-to-text parser for Swarmauri built on [PyMuPDF (`pymupdf`)](https://pymupdf.readthedocs.io/). Extracts text from every page of a PDF and returns a `Document` object with the aggregated content and source metadata.
45
+
46
+ ## Features
47
+
48
+ - Opens PDFs via PyMuPDF and collects text per page.
49
+ - Emits a single `Document` with `content` containing the combined text and `metadata['source']` holding the file path.
50
+ - Raises a clear error if the input is not a file path string; returns an empty list if PyMuPDF encounters parsing failures.
51
+
52
+ ## Prerequisites
53
+
54
+ - Python 3.10 or newer.
55
+ - PyMuPDF (`pymupdf`) along with system dependencies (X11 libraries on Linux, poppler on some distros). Install OS packages listed in [PyMuPDF docs](https://pymupdf.readthedocs.io/en/latest/installation.html) before pip installing if needed.
56
+ - Read access to the PDF files you plan to parse.
57
+
58
+ ## Installation
59
+
60
+ ```bash
61
+ # pip
62
+ pip install swarmauri_parser_fitzpdf
63
+
64
+ # poetry
65
+ poetry add swarmauri_parser_fitzpdf
66
+
67
+ # uv (pyproject-based projects)
68
+ uv add swarmauri_parser_fitzpdf
69
+ ```
70
+
71
+ ## Quickstart
72
+
73
+ ```python
74
+ from swarmauri_parser_fitzpdf import FitzPdfParser
75
+
76
+ parser = FitzPdfParser()
77
+ documents = parser.parse("reports/quarterly.pdf")
78
+
79
+ for doc in documents:
80
+ print(doc.metadata["source"])
81
+ print(doc.content[:500])
82
+ ```
83
+
84
+ ## Handling Errors
85
+
86
+ ```python
87
+ from swarmauri_parser_fitzpdf import FitzPdfParser
88
+
89
+ parser = FitzPdfParser()
90
+ try:
91
+ docs = parser.parse("missing.pdf")
92
+ if not docs:
93
+ print("Parsing failed or returned no content.")
94
+ except ValueError as exc:
95
+ print(f"Bad input: {exc}")
96
+ ```
97
+
98
+ ## Tips
99
+
100
+ - Pre-process PDFs (deskew, OCR) before parsing if they contain scanned pages without embedded text; PyMuPDF only extracts existing text objects.
101
+ - For multi-document pipelines, pair this parser with Swarmauri token-count measurements or summarizers to chunk large PDFs.
102
+ - Cache parsed output if the same PDF is accessed frequently—parsing large documents repeatedly is expensive.
103
+
104
+ ## Want to help?
105
+
106
+ If you want to contribute to swarmauri-sdk, read up on our [guidelines for contributing](https://github.com/swarmauri/swarmauri-sdk/blob/master/contributing.md) that will help you get started.
107
+
@@ -0,0 +1,82 @@
1
+ ![Swarmauri Logo](https://github.com/swarmauri/swarmauri-sdk/blob/3d4d1cfa949399d7019ae9d8f296afba773dfb7f/assets/swarmauri.brand.theme.svg)
2
+
3
+ <p align="center">
4
+ <a href="https://pypi.org/project/swarmauri_parser_fitzpdf/">
5
+ <img src="https://img.shields.io/pypi/dm/swarmauri_parser_fitzpdf" alt="PyPI - Downloads"/></a>
6
+ <a href="https://hits.sh/github.com/swarmauri/swarmauri-sdk/tree/master/pkgs/community/swarmauri_parser_fitzpdf/">
7
+ <img alt="Hits" src="https://hits.sh/github.com/swarmauri/swarmauri-sdk/tree/master/pkgs/community/swarmauri_parser_fitzpdf.svg"/></a>
8
+ <a href="https://pypi.org/project/swarmauri_parser_fitzpdf/">
9
+ <img src="https://img.shields.io/pypi/pyversions/swarmauri_parser_fitzpdf" alt="PyPI - Python Version"/></a>
10
+ <a href="https://pypi.org/project/swarmauri_parser_fitzpdf/">
11
+ <img src="https://img.shields.io/pypi/l/swarmauri_parser_fitzpdf" alt="PyPI - License"/></a>
12
+ <a href="https://pypi.org/project/swarmauri_parser_fitzpdf/">
13
+ <img src="https://img.shields.io/pypi/v/swarmauri_parser_fitzpdf?label=swarmauri_parser_fitzpdf&color=green" alt="PyPI - swarmauri_parser_fitzpdf"/></a>
14
+ </p>
15
+
16
+ ---
17
+
18
+ # Swarmauri Parser Fitz PDF
19
+
20
+ PDF-to-text parser for Swarmauri built on [PyMuPDF (`pymupdf`)](https://pymupdf.readthedocs.io/). Extracts text from every page of a PDF and returns a `Document` object with the aggregated content and source metadata.
21
+
22
+ ## Features
23
+
24
+ - Opens PDFs via PyMuPDF and collects text per page.
25
+ - Emits a single `Document` with `content` containing the combined text and `metadata['source']` holding the file path.
26
+ - Raises a clear error if the input is not a file path string; returns an empty list if PyMuPDF encounters parsing failures.
27
+
28
+ ## Prerequisites
29
+
30
+ - Python 3.10 or newer.
31
+ - PyMuPDF (`pymupdf`) along with system dependencies (X11 libraries on Linux, poppler on some distros). Install OS packages listed in [PyMuPDF docs](https://pymupdf.readthedocs.io/en/latest/installation.html) before pip installing if needed.
32
+ - Read access to the PDF files you plan to parse.
33
+
34
+ ## Installation
35
+
36
+ ```bash
37
+ # pip
38
+ pip install swarmauri_parser_fitzpdf
39
+
40
+ # poetry
41
+ poetry add swarmauri_parser_fitzpdf
42
+
43
+ # uv (pyproject-based projects)
44
+ uv add swarmauri_parser_fitzpdf
45
+ ```
46
+
47
+ ## Quickstart
48
+
49
+ ```python
50
+ from swarmauri_parser_fitzpdf import FitzPdfParser
51
+
52
+ parser = FitzPdfParser()
53
+ documents = parser.parse("reports/quarterly.pdf")
54
+
55
+ for doc in documents:
56
+ print(doc.metadata["source"])
57
+ print(doc.content[:500])
58
+ ```
59
+
60
+ ## Handling Errors
61
+
62
+ ```python
63
+ from swarmauri_parser_fitzpdf import FitzPdfParser
64
+
65
+ parser = FitzPdfParser()
66
+ try:
67
+ docs = parser.parse("missing.pdf")
68
+ if not docs:
69
+ print("Parsing failed or returned no content.")
70
+ except ValueError as exc:
71
+ print(f"Bad input: {exc}")
72
+ ```
73
+
74
+ ## Tips
75
+
76
+ - Pre-process PDFs (deskew, OCR) before parsing if they contain scanned pages without embedded text; PyMuPDF only extracts existing text objects.
77
+ - For multi-document pipelines, pair this parser with Swarmauri token-count measurements or summarizers to chunk large PDFs.
78
+ - Cache parsed output if the same PDF is accessed frequently—parsing large documents repeatedly is expensive.
79
+
80
+ ## Want to help?
81
+
82
+ If you want to contribute to swarmauri-sdk, read up on our [guidelines for contributing](https://github.com/swarmauri/swarmauri-sdk/blob/master/contributing.md) that will help you get started.
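The caching tip in the README above can be sketched as follows. This is a minimal illustration and not part of the package: `cached_parse` and `fake_parse` are hypothetical names, and `fake_parse` stands in for `FitzPdfParser().parse` so the sketch runs without PyMuPDF installed. The cache key includes the file's modification time and size, so an edited PDF is parsed again rather than served stale.

```python
import os
import tempfile
from typing import Callable, Dict, List, Tuple

# Hypothetical in-memory cache: (absolute path, mtime in ns, size) -> parse result.
_cache: Dict[Tuple[str, int, int], List[str]] = {}


def cached_parse(path: str, parse: Callable[[str], List[str]]) -> List[str]:
    """Return the cached result for an unchanged file, otherwise call `parse`."""
    stat = os.stat(path)
    key = (os.path.abspath(path), stat.st_mtime_ns, stat.st_size)
    if key not in _cache:
        _cache[key] = parse(path)
    return _cache[key]


# Stand-in for FitzPdfParser().parse (assumption: any callable taking a path works).
calls: List[str] = []


def fake_parse(path: str) -> List[str]:
    calls.append(path)  # record each real parse so cache hits are observable
    with open(path, "rb") as fh:
        return [fh.read().decode("utf-8", errors="replace")]


# Demo file; the .pdf suffix is cosmetic because fake_parse just reads bytes.
with tempfile.NamedTemporaryFile("w", suffix=".pdf", delete=False) as tmp:
    tmp.write("first version")
    demo_path = tmp.name

first = cached_parse(demo_path, fake_parse)
second = cached_parse(demo_path, fake_parse)  # served from the cache, no new parse
```

In real use the second argument would be the parser's bound `parse` method. Note that this sketch also caches an empty result, so a failed parse is not retried until the file changes.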
@@ -1,6 +1,6 @@
1
1
  [project]
2
2
  name = "swarmauri_parser_fitzpdf"
3
- version = "0.8.0.dev4"
3
+ version = "0.8.0.dev22"
4
4
  description = "Fitz PDF Parser for Swarmauri."
5
5
  license = "Apache-2.0"
6
6
  readme = "README.md"
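The version bump in this hunk goes from `dev4` to `dev22`. That is an increase only when the dev number is compared as an integer; a plain string comparison ranks `0.8.0.dev22` below `0.8.0.dev4`. A minimal sketch of the difference, using a hypothetical `dev_key` helper that handles only the `X.Y.Z.devN` shape seen here (real tooling should rely on a full PEP 440 implementation):

```python
import re
from typing import Tuple

# Matches only the narrow "X.Y.Z.devN" form used by the versions in this diff.
_DEV_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)\.dev(\d+)$")


def dev_key(version: str) -> Tuple[int, ...]:
    """Turn 'X.Y.Z.devN' into a tuple of ints that sorts numerically."""
    match = _DEV_RE.match(version)
    if match is None:
        raise ValueError(f"unsupported version format: {version!r}")
    return tuple(int(part) for part in match.groups())


old, new = "0.8.0.dev4", "0.8.0.dev22"

# Character-by-character comparison: "2" sorts before "4", so this is False.
string_says_newer = new > old

# Integer comparison: 22 > 4, so this is True.
numeric_says_newer = dev_key(new) > dev_key(old)
```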
@@ -11,6 +11,10 @@ classifiers = [
11
11
  "Programming Language :: Python :: 3.10",
12
12
  "Programming Language :: Python :: 3.11",
13
13
  "Programming Language :: Python :: 3.12",
14
+ "Natural Language :: English",
15
+ "Development Status :: 3 - Alpha",
16
+ "Intended Audience :: Developers",
17
+ "Topic :: Software Development :: Libraries :: Application Frameworks",
14
18
  ]
15
19
  authors = [{ name = "Jacob Stewart", email = "jacob@swarmauri.com" }]
16
20
  dependencies = [
@@ -19,6 +23,13 @@ dependencies = [
19
23
  "swarmauri_base",
20
24
  "swarmauri_standard",
21
25
  ]
26
+ keywords = [
27
+ "swarmauri",
28
+ "parser",
29
+ "fitzpdf",
30
+ "fitz",
31
+ "pdf",
32
+ ]
22
33
 
23
34
  [tool.uv.sources]
24
35
  swarmauri_core = { workspace = true }
@@ -1,64 +0,0 @@
1
- Metadata-Version: 2.3
2
- Name: swarmauri_parser_fitzpdf
3
- Version: 0.8.0.dev4
4
- Summary: Fitz PDF Parser for Swarmauri.
5
- License: Apache-2.0
6
- Author: Jacob Stewart
7
- Author-email: jacob@swarmauri.com
8
- Requires-Python: >=3.10,<3.13
9
- Classifier: License :: OSI Approved :: Apache Software License
10
- Classifier: Programming Language :: Python :: 3.10
11
- Classifier: Programming Language :: Python :: 3.11
12
- Classifier: Programming Language :: Python :: 3.12
13
- Requires-Dist: PyMuPDF (>=1.24.12)
14
- Requires-Dist: swarmauri_base
15
- Requires-Dist: swarmauri_core
16
- Requires-Dist: swarmauri_standard
17
- Description-Content-Type: text/markdown
18
-
19
-
20
- ![Swamauri Logo](https://res.cloudinary.com/dbjmpekvl/image/upload/v1730099724/Swarmauri-logo-lockup-2048x757_hww01w.png)
21
-
22
- <p align="center">
23
- <a href="https://pypi.org/project/swarmauri_parser_fitzpdf/">
24
- <img src="https://img.shields.io/pypi/dm/swarmauri_parser_fitzpdf" alt="PyPI - Downloads"/></a>
25
- <a href="https://hits.sh/github.com/swarmauri/swarmauri-sdk/tree/master/pkgs/community/swarmauri_parser_fitzpdf/">
26
- <img alt="Hits" src="https://hits.sh/github.com/swarmauri/swarmauri-sdk/tree/master/pkgs/community/swarmauri_parser_fitzpdf.svg"/></a>
27
- <a href="https://pypi.org/project/swarmauri_parser_fitzpdf/">
28
- <img src="https://img.shields.io/pypi/pyversions/swarmauri_parser_fitzpdf" alt="PyPI - Python Version"/></a>
29
- <a href="https://pypi.org/project/swarmauri_parser_fitzpdf/">
30
- <img src="https://img.shields.io/pypi/l/swarmauri_parser_fitzpdf" alt="PyPI - License"/></a>
31
- <a href="https://pypi.org/project/swarmauri_parser_fitzpdf/">
32
- <img src="https://img.shields.io/pypi/v/swarmauri_parser_fitzpdf?label=swarmauri_parser_fitzpdf&color=green" alt="PyPI - swarmauri_parser_fitzpdf"/></a>
33
- </p>
34
-
35
- ---
36
-
37
- # Swarmauri Parser Fitz PDF
38
-
39
- A parser to extract text from PDF files using PyMuPDF.
40
-
41
- ## Installation
42
-
43
- ```bash
44
- pip install swarmauri_parser_fitzpdf
45
- ```
46
-
47
- ## Usage
48
- Basic usage examples with code snippets
49
- ```python
50
- from swarmauri.parsers.FitzPdfParser import PDFtoTextParser
51
-
52
- parser = PDFtoTextParser()
53
- file_path = "path/to/your/pdf/file.pdf"
54
- documents = parser.parse(file_path)
55
-
56
- for document in documents:
57
- print(document.content)
58
- print(document.metadata)
59
- ```
60
-
61
- ## Want to help?
62
-
63
- If you want to contribute to swarmauri-sdk, read up on our [guidelines for contributing](https://github.com/swarmauri/swarmauri-sdk/blob/master/contributing.md) that will help you get started.
64
-
@@ -1,45 +0,0 @@
1
-
2
- ![Swamauri Logo](https://res.cloudinary.com/dbjmpekvl/image/upload/v1730099724/Swarmauri-logo-lockup-2048x757_hww01w.png)
3
-
4
- <p align="center">
5
- <a href="https://pypi.org/project/swarmauri_parser_fitzpdf/">
6
- <img src="https://img.shields.io/pypi/dm/swarmauri_parser_fitzpdf" alt="PyPI - Downloads"/></a>
7
- <a href="https://hits.sh/github.com/swarmauri/swarmauri-sdk/tree/master/pkgs/community/swarmauri_parser_fitzpdf/">
8
- <img alt="Hits" src="https://hits.sh/github.com/swarmauri/swarmauri-sdk/tree/master/pkgs/community/swarmauri_parser_fitzpdf.svg"/></a>
9
- <a href="https://pypi.org/project/swarmauri_parser_fitzpdf/">
10
- <img src="https://img.shields.io/pypi/pyversions/swarmauri_parser_fitzpdf" alt="PyPI - Python Version"/></a>
11
- <a href="https://pypi.org/project/swarmauri_parser_fitzpdf/">
12
- <img src="https://img.shields.io/pypi/l/swarmauri_parser_fitzpdf" alt="PyPI - License"/></a>
13
- <a href="https://pypi.org/project/swarmauri_parser_fitzpdf/">
14
- <img src="https://img.shields.io/pypi/v/swarmauri_parser_fitzpdf?label=swarmauri_parser_fitzpdf&color=green" alt="PyPI - swarmauri_parser_fitzpdf"/></a>
15
- </p>
16
-
17
- ---
18
-
19
- # Swarmauri Parser Fitz PDF
20
-
21
- A parser to extract text from PDF files using PyMuPDF.
22
-
23
- ## Installation
24
-
25
- ```bash
26
- pip install swarmauri_parser_fitzpdf
27
- ```
28
-
29
- ## Usage
30
- Basic usage examples with code snippets
31
- ```python
32
- from swarmauri.parsers.FitzPdfParser import PDFtoTextParser
33
-
34
- parser = PDFtoTextParser()
35
- file_path = "path/to/your/pdf/file.pdf"
36
- documents = parser.parse(file_path)
37
-
38
- for document in documents:
39
- print(document.content)
40
- print(document.metadata)
41
- ```
42
-
43
- ## Want to help?
44
-
45
- If you want to contribute to swarmauri-sdk, read up on our [guidelines for contributing](https://github.com/swarmauri/swarmauri-sdk/blob/master/contributing.md) that will help you get started.