opendataloader-pdf 0.0.13 → 0.0.15 (py3-none-any.whl)

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Potentially problematic release.


@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: opendataloader-pdf
- Version: 0.0.13
+ Version: 0.0.15
  Summary: A Python wrapper for the opendataloader-pdf Java CLI.
  Home-page: https://github.com/opendataloader-project/opendataloader-pdf
  Author: opendataloader-project
@@ -10,6 +10,7 @@ Classifier: Programming Language :: Python :: 3
  Classifier: Operating System :: OS Independent
  Requires-Python: >=3.7
  Description-Content-Type: text/markdown
+ Requires-Dist: importlib_resources; python_version < "3.9"
  Dynamic: author
  Dynamic: author-email
  Dynamic: classifier
@@ -17,6 +18,7 @@ Dynamic: description
  Dynamic: description-content-type
  Dynamic: home-page
  Dynamic: license
+ Dynamic: requires-dist
  Dynamic: requires-python
  Dynamic: summary
 
@@ -34,7 +36,7 @@ Dynamic: summary
 
  <br/>
 
- **Safe, Open, High-Performance — OpenDataLoader PDF for AI**
+ **Safe, Open, High-Performance — PDF for AI**
 
  OpenDataLoader-PDF converts PDFs into JSON, Markdown or Html — ready to feed into modern AI stacks (LLMs, vector search, and RAG).
 
@@ -51,19 +53,21 @@ AI-safety is enabled by default and automatically filters likely prompt-injectio
  - 🔒 **Local-First Privacy** — Runs fully on your machine
  - ⚡ **Fast & Lightweight** — Rule-Based Heuristic, High-Throughput, No GPU
  - 🛡️ **AI-Safety** — Auto-Filters likely prompt-injection content
- - 🆓 **Open-Source** — Free for commercial use
+ - 👐 **Open-Source** — Free for commercial use
  - 🖍️ **Annotated PDF Visualization** — See detected structures overlaid on the original
 
- ![Annotated PDF Example](https://raw.githubusercontent.com/opendataloader-project/opendataloader-pdf/main/resources/example_annotated_pdf.png)
+ [Download Annotated PDF Sample](https://raw.githubusercontent.com/opendataloader-project/opendataloader-pdf/main/resources/1901.03003_annotated.pdf)
+
+ ![Annotated PDF Preview](https://raw.githubusercontent.com/opendataloader-project/opendataloader-pdf/main/resources/example_annotated_pdf.png)
 
  <br/>
 
  ## 🚀 Upcoming Features
 
- - 🖨️ **OCR for scanned PDFs** — image-only pages → selectable text
- - 🧠 **Table AI option** — higher accuracy for borderless/merged cells
- - 📊 **Layout benchmarks** — public datasets & metrics; regular reports
- - 🛡️ **AI-Safety red-team** — adversarial datasets & metrics; regular reports
+ - 🖨️ **OCR for scanned PDFs** — Extract data from image-only pages
+ - 🧠 **Table AI option** — Higher accuracy for tables with borderless or merged cells
+ - **Performance Benchmarks** — Transparent evaluations with open datasets and metrics, reported regularly
+ - 🛡️ **AI Red Teaming** — Transparent adversarial benchmarks with datasets and metrics, reported regularly
 
  <br/>
 
@@ -79,7 +83,7 @@ AI-safety is enabled by default and automatically filters likely prompt-injectio
  ### Installation
 
  ```sh
- pip install -U opendataloader-pdf importlib_resources
+ pip install -U opendataloader-pdf
  ```
 
  ### Usage
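
The new `Requires-Dist: importlib_resources; python_version < "3.9"` marker is what lets the install command drop the explicit `importlib_resources` argument: pip now pulls the backport only on interpreters older than 3.9. A minimal sketch of the import pattern such a marker usually accompanies follows. The diff does not show the wrapper's Python source, so the helper `bundled_jar_path` and the idea that the wrapper resolves its JAR this way are assumptions for illustration only.

```python
import sys

# Assumption: the wrapper finds its bundled JAR via importlib.resources.
# Only the dependency marker is visible in the diff, not the real code.
if sys.version_info >= (3, 9):
    from importlib import resources  # stdlib provides files() from 3.9 on
else:
    # Mirrors the marker: python_version < "3.9" installs the backport.
    import importlib_resources as resources


def bundled_jar_path(package: str = "opendataloader_pdf") -> str:
    """Hypothetical helper: path of the JAR that RECORD lists under jar/."""
    jar = resources.files(package) / "jar" / "opendataloader-pdf-cli.jar"
    return str(jar)
```

Calling `bundled_jar_path()` with the default argument requires the wheel to be installed; any importable package name can be passed to try the lookup itself.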
@@ -13,8 +13,8 @@ opendataloader_pdf/THIRD_PARTY/licenses/LICENSE-JJ2000.txt,sha256=itSesIy3XiNWgJ
  opendataloader_pdf/THIRD_PARTY/licenses/MIT.txt,sha256=JPCdbR3BU0uO_KypOd3sGWnKwlVHGq4l0pmrjoGtop8,1078
  opendataloader_pdf/THIRD_PARTY/licenses/MPL-2.0.txt,sha256=CGF6Fx5WV7DJmRZJ8_6w6JEt2N9bu4p6zDo18fTHHRw,15818
  opendataloader_pdf/THIRD_PARTY/licenses/Plexus Classworlds License.txt,sha256=ZQuKXwVz4FeC34ApB20vYg8kPTwgIUKRzEk5ew74-hU,1937
- opendataloader_pdf/jar/opendataloader-pdf-cli.jar,sha256=TaorIHoCQsG-1TPaUga3NgMZuj6M87sruXN5VigPO_Q,22126020
- opendataloader_pdf-0.0.13.dist-info/METADATA,sha256=wa1J5J8DXLQQgH-uGWhCc9MSvH9JmVC8Ww-9F9R_r0Q,18452
- opendataloader_pdf-0.0.13.dist-info/WHEEL,sha256=_zCd3N1l69ArxyTb8rzEoP9TpbYXkqRFSNOD5OuxnTs,91
- opendataloader_pdf-0.0.13.dist-info/top_level.txt,sha256=xee0qFQd6HPfS50E2NLICGuR6cq9C9At5SJ81yv5HkY,19
- opendataloader_pdf-0.0.13.dist-info/RECORD,,
+ opendataloader_pdf/jar/opendataloader-pdf-cli.jar,sha256=GCTahEYOHGxpId3ce3pbkB4C2CVf2VbHMY78WjvzIk4,22126046
+ opendataloader_pdf-0.0.15.dist-info/METADATA,sha256=7J_lFR5yzyXMywass6JZaUh6GSt3UG4nBInfMlS_c5c,18727
+ opendataloader_pdf-0.0.15.dist-info/WHEEL,sha256=_zCd3N1l69ArxyTb8rzEoP9TpbYXkqRFSNOD5OuxnTs,91
+ opendataloader_pdf-0.0.15.dist-info/top_level.txt,sha256=xee0qFQd6HPfS50E2NLICGuR6cq9C9At5SJ81yv5HkY,19
+ opendataloader_pdf-0.0.15.dist-info/RECORD,,
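
Each RECORD row above has the form `path,sha256=<digest>,size`, where the digest is the file's SHA-256 encoded as URL-safe base64 with the trailing `=` padding removed; the row for RECORD itself leaves both fields empty. The changed JAR hash and its 26-byte size increase can therefore be checked against an unpacked wheel. The sketch below shows one way to do that; the function names are illustrative and not part of any packaging tool.

```python
import base64
import csv
import hashlib


def record_hash(data: bytes) -> str:
    """Encode a SHA-256 digest the way wheel RECORD rows store it."""
    digest = hashlib.sha256(data).digest()
    encoded = base64.urlsafe_b64encode(digest).rstrip(b"=")  # drop padding
    return "sha256=" + encoded.decode("ascii")


def parse_record_line(line: str):
    """Split one RECORD row; hash and size may be empty (RECORD itself)."""
    path, hash_field, size = next(csv.reader([line]))
    return path, hash_field, (int(size) if size else None)


def entry_matches(line: str, data: bytes) -> bool:
    """True if the file content agrees with the row's hash and size."""
    _path, hash_field, size = parse_record_line(line)
    return hash_field == record_hash(data) and size == len(data)
```

To check the JAR row, read `opendataloader_pdf/jar/opendataloader-pdf-cli.jar` from the extracted wheel in binary mode and pass its bytes to `entry_matches` together with the row text.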