opendataloader-pdf 0.0.13__py3-none-any.whl → 0.0.15__py3-none-any.whl
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Potentially problematic release.
This version of opendataloader-pdf might be problematic.
- opendataloader_pdf/jar/opendataloader-pdf-cli.jar +0 -0
- {opendataloader_pdf-0.0.13.dist-info → opendataloader_pdf-0.0.15.dist-info}/METADATA +13 -9
- {opendataloader_pdf-0.0.13.dist-info → opendataloader_pdf-0.0.15.dist-info}/RECORD +5 -5
- {opendataloader_pdf-0.0.13.dist-info → opendataloader_pdf-0.0.15.dist-info}/WHEEL +0 -0
- {opendataloader_pdf-0.0.13.dist-info → opendataloader_pdf-0.0.15.dist-info}/top_level.txt +0 -0
Binary file
@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: opendataloader-pdf
-Version: 0.0.
+Version: 0.0.15
 Summary: A Python wrapper for the opendataloader-pdf Java CLI.
 Home-page: https://github.com/opendataloader-project/opendataloader-pdf
 Author: opendataloader-project
@@ -10,6 +10,7 @@ Classifier: Programming Language :: Python :: 3
 Classifier: Operating System :: OS Independent
 Requires-Python: >=3.7
 Description-Content-Type: text/markdown
+Requires-Dist: importlib_resources; python_version < "3.9"
 Dynamic: author
 Dynamic: author-email
 Dynamic: classifier
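The new `Requires-Dist` line pulls in the `importlib_resources` backport only on interpreters older than 3.9, where the standard library's `importlib.resources.files()` is not available. The diff does not show how the wrapper uses it, so the following is only a minimal sketch of the usual pattern for locating a file shipped inside a package; the helper name `bundled_resource` is hypothetical and not taken from the package.

```python
import sys

# Use the stdlib API where it exists (3.9+), otherwise the backport named
# in the Requires-Dist line. This mirrors the environment marker.
if sys.version_info >= (3, 9):
    from importlib import resources
else:
    import importlib_resources as resources


def bundled_resource(package, *parts):
    """Return a Traversable for a file shipped inside `package`.

    Hypothetical helper for illustration; not part of opendataloader-pdf.
    """
    node = resources.files(package)
    for part in parts:
        # Join one segment at a time, which works on every supported version.
        node = node.joinpath(part)
    return node
```

Under this assumption, a wrapper could call something like `bundled_resource("opendataloader_pdf", "jar", "opendataloader-pdf-cli.jar")` and pass the result through `resources.as_file(...)` to obtain a real filesystem path for the Java process.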
@@ -17,6 +18,7 @@ Dynamic: description
 Dynamic: description-content-type
 Dynamic: home-page
 Dynamic: license
+Dynamic: requires-dist
 Dynamic: requires-python
 Dynamic: summary
 
@@ -34,7 +36,7 @@ Dynamic: summary
 
 <br/>
 
-**Safe, Open, High-Performance —
+**Safe, Open, High-Performance — PDF for AI**
 
 OpenDataLoader-PDF converts PDFs into JSON, Markdown or Html — ready to feed into modern AI stacks (LLMs, vector search, and RAG).
 
@@ -51,19 +53,21 @@ AI-safety is enabled by default and automatically filters likely prompt-injectio
 - 🔒 **Local-First Privacy** — Runs fully on your machine
 - ⚡ **Fast & Lightweight** — Rule-Based Heuristic, High-Throughput, No GPU
 - 🛡️ **AI-Safety** — Auto-Filters likely prompt-injection content
--
+- 👐 **Open-Source** — Free for commercial use
 - 🖍️ **Annotated PDF Visualization** — See detected structures overlaid on the original
 
-
+[Download Annotated PDF Sample](https://raw.githubusercontent.com/opendataloader-project/opendataloader-pdf/main/resources/1901.03003_annotated.pdf)
+
+
 
 <br/>
 
 ## 🚀 Upcoming Features
 
-- 🖨️ **OCR for scanned PDFs** — image-only pages
-- 🧠 **Table AI option** —
--
-- 🛡️ **AI
+- 🖨️ **OCR for scanned PDFs** — Extract data from image-only pages
+- 🧠 **Table AI option** — Higher accuracy for tables with borderless or merged cells
+- ⚡ **Performance Benchmarks** — Transparent evaluations with open datasets and metrics, reported regularly
+- 🛡️ **AI Red Teaming** — Transparent adversarial benchmarks with datasets and metrics, reported regularly
 
 <br/>
 
@@ -79,7 +83,7 @@ AI-safety is enabled by default and automatically filters likely prompt-injectio
 ### Installation
 
 ```sh
-pip install -U opendataloader-pdf
+pip install -U opendataloader-pdf
 ```
 
 ### Usage
@@ -13,8 +13,8 @@ opendataloader_pdf/THIRD_PARTY/licenses/LICENSE-JJ2000.txt,sha256=itSesIy3XiNWgJ
 opendataloader_pdf/THIRD_PARTY/licenses/MIT.txt,sha256=JPCdbR3BU0uO_KypOd3sGWnKwlVHGq4l0pmrjoGtop8,1078
 opendataloader_pdf/THIRD_PARTY/licenses/MPL-2.0.txt,sha256=CGF6Fx5WV7DJmRZJ8_6w6JEt2N9bu4p6zDo18fTHHRw,15818
 opendataloader_pdf/THIRD_PARTY/licenses/Plexus Classworlds License.txt,sha256=ZQuKXwVz4FeC34ApB20vYg8kPTwgIUKRzEk5ew74-hU,1937
-opendataloader_pdf/jar/opendataloader-pdf-cli.jar,sha256=
-opendataloader_pdf-0.0.
-opendataloader_pdf-0.0.
-opendataloader_pdf-0.0.
-opendataloader_pdf-0.0.
+opendataloader_pdf/jar/opendataloader-pdf-cli.jar,sha256=GCTahEYOHGxpId3ce3pbkB4C2CVf2VbHMY78WjvzIk4,22126046
+opendataloader_pdf-0.0.15.dist-info/METADATA,sha256=7J_lFR5yzyXMywass6JZaUh6GSt3UG4nBInfMlS_c5c,18727
+opendataloader_pdf-0.0.15.dist-info/WHEEL,sha256=_zCd3N1l69ArxyTb8rzEoP9TpbYXkqRFSNOD5OuxnTs,91
+opendataloader_pdf-0.0.15.dist-info/top_level.txt,sha256=xee0qFQd6HPfS50E2NLICGuR6cq9C9At5SJ81yv5HkY,19
+opendataloader_pdf-0.0.15.dist-info/RECORD,,
File without changes
File without changes
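Each RECORD entry in the diff has the form `path,sha256=<digest>,<size>`, where the digest is the SHA-256 of the file encoded as URL-safe base64 with the trailing `=` padding removed, and the size is in bytes. As a rough sketch of how such an entry can be recomputed to check an unpacked wheel (the function name `record_entry` is illustrative, not part of any packaging tool):

```python
import base64
import hashlib


def record_entry(path, data):
    """Build a wheel RECORD-style line for `data` stored at `path`.

    Illustrative helper: hashes the bytes with SHA-256, encodes the digest
    as URL-safe base64 without '=' padding, and appends the byte length.
    """
    digest = hashlib.sha256(data).digest()
    encoded = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return "{},sha256={},{}".format(path, encoded, len(data))
```

Reading a file from an unpacked wheel in binary mode and comparing `record_entry(relative_path, contents)` with the matching RECORD line would confirm that the file matches what the wheel declares. The RECORD file itself is listed with empty hash and size fields (`RECORD,,`), so it is skipped in such a check.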