opendataloader-pdf 0.0.0__py3-none-any.whl → 0.0.5__py3-none-any.whl
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Potentially problematic release.
This version of opendataloader-pdf might be problematic. Click here for more details.
- opendataloader_pdf/jar/opendataloader-pdf-cli.jar +0 -0
- opendataloader_pdf/wrapper.py +1 -1
- {opendataloader_pdf-0.0.0.dist-info → opendataloader_pdf-0.0.5.dist-info}/METADATA +67 -91
- {opendataloader_pdf-0.0.0.dist-info → opendataloader_pdf-0.0.5.dist-info}/RECORD +6 -6
- {opendataloader_pdf-0.0.0.dist-info → opendataloader_pdf-0.0.5.dist-info}/WHEEL +0 -0
- {opendataloader_pdf-0.0.0.dist-info → opendataloader_pdf-0.0.5.dist-info}/top_level.txt +0 -0
|
Binary file
|
opendataloader_pdf/wrapper.py
CHANGED
|
@@ -1,91 +1,67 @@
|
|
|
1
|
-
Metadata-Version: 2.4
|
|
2
|
-
Name: opendataloader-pdf
|
|
3
|
-
Version: 0.0.
|
|
4
|
-
Summary: A Python wrapper for the opendataloader-pdf Java CLI.
|
|
5
|
-
Home-page: https://github.com/opendataloader-project/opendataloader-pdf
|
|
6
|
-
Author: opendataloader-project
|
|
7
|
-
Author-email: open.dataloader@hancom.com
|
|
8
|
-
License: MPL-2.0
|
|
9
|
-
Classifier: Programming Language :: Python :: 3
|
|
10
|
-
Classifier: Operating System :: OS Independent
|
|
11
|
-
Requires-Python: >=3.7
|
|
12
|
-
Description-Content-Type: text/markdown
|
|
13
|
-
Dynamic: author
|
|
14
|
-
Dynamic: author-email
|
|
15
|
-
Dynamic: classifier
|
|
16
|
-
Dynamic: description
|
|
17
|
-
Dynamic: description-content-type
|
|
18
|
-
Dynamic: home-page
|
|
19
|
-
Dynamic: license
|
|
20
|
-
Dynamic: requires-python
|
|
21
|
-
Dynamic: summary
|
|
22
|
-
|
|
23
|
-
# Opendataloader PDF Python Wrapper
|
|
24
|
-
|
|
25
|
-
This package is a Python wrapper for the `opendataloader-pdf` Java command-line tool.
|
|
26
|
-
|
|
27
|
-
It allows you to process PDF files and convert them to JSON or Markdown format directly from Python.
|
|
28
|
-
|
|
29
|
-
## Prerequisites
|
|
30
|
-
|
|
31
|
-
- Java 11 or higher must be installed and available in your system's PATH.
|
|
32
|
-
|
|
33
|
-
## Installation
|
|
34
|
-
|
|
35
|
-
```bash
|
|
36
|
-
pip install opendataloader-pdf
|
|
37
|
-
```
|
|
38
|
-
|
|
39
|
-
## Usage
|
|
40
|
-
|
|
41
|
-
Here is a basic example of how to use
|
|
42
|
-
|
|
43
|
-
```python
|
|
44
|
-
import opendataloader_pdf
|
|
45
|
-
|
|
46
|
-
|
|
47
|
-
|
|
48
|
-
|
|
49
|
-
|
|
50
|
-
|
|
51
|
-
|
|
52
|
-
|
|
53
|
-
|
|
54
|
-
|
|
55
|
-
|
|
56
|
-
|
|
57
|
-
|
|
58
|
-
|
|
59
|
-
|
|
60
|
-
|
|
61
|
-
|
|
62
|
-
|
|
63
|
-
|
|
64
|
-
|
|
65
|
-
|
|
66
|
-
|
|
67
|
-
|
|
68
|
-
except Exception as e:
|
|
69
|
-
print(f"An error occurred: {e}")
|
|
70
|
-
```
|
|
71
|
-
|
|
72
|
-
## Function: `run()`
|
|
73
|
-
|
|
74
|
-
The main function to process PDFs.
|
|
75
|
-
|
|
76
|
-
### Parameters
|
|
77
|
-
|
|
78
|
-
- `input_path` (str): Path to the input PDF file or folder. **(Required)**
|
|
79
|
-
- `output_folder` (str, optional): Path to the output folder. Defaults to the input folder.
|
|
80
|
-
- `password` (str, optional): Password for the PDF file.
|
|
81
|
-
- `generate_markdown` (bool, optional): If `True`, generates a Markdown output file. Defaults to `False`.
|
|
82
|
-
- `generate_pdf` (bool, optional): If `True`, generates an annotated PDF output file. Defaults to `False`.
|
|
83
|
-
- `keep_line_breaks` (bool, optional): If `True`, keeps line breaks in the output. Defaults to `False`.
|
|
84
|
-
- `find_hidden_text` (bool, optional): If `True`, finds hidden text in the PDF. Defaults to `False`.
|
|
85
|
-
- `html_in_markdown` (bool, optional): If `True`, uses HTML in the Markdown output. Defaults to `False`.
|
|
86
|
-
- `add_image_to_markdown` (bool, optional): If `True`, adds images to the Markdown output. Defaults to `False`.
|
|
87
|
-
- `debug` (bool, optional): If `True`, prints all messages from the CLI to the console during execution. Defaults to `False`.
|
|
88
|
-
|
|
89
|
-
### Returns
|
|
90
|
-
|
|
91
|
-
- `str`: The standard output from the CLI tool. When processing a single file without other output formats specified, this will be the JSON content. When `debug=True`, this will be the combined stdout and stderr from the CLI tool.
|
|
1
|
+
Metadata-Version: 2.4
|
|
2
|
+
Name: opendataloader-pdf
|
|
3
|
+
Version: 0.0.5
|
|
4
|
+
Summary: A Python wrapper for the opendataloader-pdf Java CLI.
|
|
5
|
+
Home-page: https://github.com/opendataloader-project/opendataloader-pdf
|
|
6
|
+
Author: opendataloader-project
|
|
7
|
+
Author-email: open.dataloader@hancom.com
|
|
8
|
+
License: MPL-2.0
|
|
9
|
+
Classifier: Programming Language :: Python :: 3
|
|
10
|
+
Classifier: Operating System :: OS Independent
|
|
11
|
+
Requires-Python: >=3.7
|
|
12
|
+
Description-Content-Type: text/markdown
|
|
13
|
+
Dynamic: author
|
|
14
|
+
Dynamic: author-email
|
|
15
|
+
Dynamic: classifier
|
|
16
|
+
Dynamic: description
|
|
17
|
+
Dynamic: description-content-type
|
|
18
|
+
Dynamic: home-page
|
|
19
|
+
Dynamic: license
|
|
20
|
+
Dynamic: requires-python
|
|
21
|
+
Dynamic: summary
|
|
22
|
+
|
|
23
|
+
# Opendataloader PDF Python Wrapper
|
|
24
|
+
|
|
25
|
+
This package is a Python wrapper for the `opendataloader-pdf` Java command-line tool.
|
|
26
|
+
|
|
27
|
+
It allows you to process PDF files and convert them to JSON or Markdown format directly from Python.
|
|
28
|
+
|
|
29
|
+
## Prerequisites
|
|
30
|
+
|
|
31
|
+
- Java 11 or higher must be installed and available in your system's PATH.
|
|
32
|
+
|
|
33
|
+
## Installation
|
|
34
|
+
|
|
35
|
+
```bash
|
|
36
|
+
pip install opendataloader-pdf
|
|
37
|
+
```
|
|
38
|
+
|
|
39
|
+
## Usage
|
|
40
|
+
|
|
41
|
+
Here is a basic example of how to use:
|
|
42
|
+
|
|
43
|
+
```python
|
|
44
|
+
import opendataloader_pdf
|
|
45
|
+
|
|
46
|
+
opendataloader_pdf.run("path/to/document.pdf", to_markdown=True)
|
|
47
|
+
|
|
48
|
+
# If you don’t specify an output_folder,
|
|
49
|
+
# the output data will be saved in the same directory as the input document.
|
|
50
|
+
```
|
|
51
|
+
|
|
52
|
+
## Function: `run()`
|
|
53
|
+
|
|
54
|
+
The main function to process PDFs.
|
|
55
|
+
|
|
56
|
+
### Parameters
|
|
57
|
+
|
|
58
|
+
- `input_path` (str): Path to the input PDF file or folder. **(Required)**
|
|
59
|
+
- `output_folder` (str, optional): Path to the output folder. Defaults to the input folder.
|
|
60
|
+
- `password` (str, optional): Password for the PDF file.
|
|
61
|
+
- `to_markdown` (bool, optional): If `True`, generates a Markdown output file. Defaults to `False`.
|
|
62
|
+
- `to_annotated_pdf` (bool, optional): If `True`, generates an annotated PDF output file. Defaults to `False`.
|
|
63
|
+
- `keep_line_breaks` (bool, optional): If `True`, keeps line breaks in the output. Defaults to `False`.
|
|
64
|
+
- `find_hidden_text` (bool, optional): If `True`, finds hidden text in the PDF. Defaults to `False`.
|
|
65
|
+
- `html_in_markdown` (bool, optional): If `True`, uses HTML in the Markdown output. Defaults to `False`.
|
|
66
|
+
- `add_image_to_markdown` (bool, optional): If `True`, adds images to the Markdown output. Defaults to `False`.
|
|
67
|
+
- `debug` (bool, optional): If `True`, prints all messages from the CLI to the console during execution. Defaults to `False`.
|
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
opendataloader_pdf/LICENSE,sha256=rxdbnZbuk8IaA2FS4bkFsLlTBNSujCySHHYJEAuo334,15921
|
|
2
2
|
opendataloader_pdf/NOTICE.md,sha256=Uxc6sEbVz2hfsDinzzSNMtmsjx9HsQUod0yy0cswUwg,562
|
|
3
3
|
opendataloader_pdf/__init__.py,sha256=T5RV-dcgjNCm8klNy_EH-IgOeodcPg6Yc34HHXtuAmQ,44
|
|
4
|
-
opendataloader_pdf/wrapper.py,sha256=
|
|
4
|
+
opendataloader_pdf/wrapper.py,sha256=c9UnFHw0PaCd_FEwFasoxOZVuGM_owTMnrkNmPSVfH4,4414
|
|
5
5
|
opendataloader_pdf/THIRD_PARTY/THIRD_PARTY_LICENSES.md,sha256=QRYYiXFS2zBDGdmWRo_SrRfGhrdRBwhiRo1SdUKfrQo,11235
|
|
6
6
|
opendataloader_pdf/THIRD_PARTY/THIRD_PARTY_NOTICES.md,sha256=pB2ZitFM1u0x3rIDpMHsLxOe4OFNCZRqkzeR-bfpFzE,8911
|
|
7
7
|
opendataloader_pdf/THIRD_PARTY/licenses/Apache-2.0.txt,sha256=z8d0m5b2O9McPEK1xHG_dWgUBT6EfBDz6wA0F7xSPTA,11358
|
|
@@ -13,8 +13,8 @@ opendataloader_pdf/THIRD_PARTY/licenses/LICENSE-JJ2000.txt,sha256=itSesIy3XiNWgJ
|
|
|
13
13
|
opendataloader_pdf/THIRD_PARTY/licenses/MIT.txt,sha256=JPCdbR3BU0uO_KypOd3sGWnKwlVHGq4l0pmrjoGtop8,1078
|
|
14
14
|
opendataloader_pdf/THIRD_PARTY/licenses/MPL-2.0.txt,sha256=CGF6Fx5WV7DJmRZJ8_6w6JEt2N9bu4p6zDo18fTHHRw,15818
|
|
15
15
|
opendataloader_pdf/THIRD_PARTY/licenses/Plexus Classworlds License.txt,sha256=ZQuKXwVz4FeC34ApB20vYg8kPTwgIUKRzEk5ew74-hU,1937
|
|
16
|
-
opendataloader_pdf/jar/opendataloader-pdf-cli.jar,sha256=
|
|
17
|
-
opendataloader_pdf-0.0.
|
|
18
|
-
opendataloader_pdf-0.0.
|
|
19
|
-
opendataloader_pdf-0.0.
|
|
20
|
-
opendataloader_pdf-0.0.
|
|
16
|
+
opendataloader_pdf/jar/opendataloader-pdf-cli.jar,sha256=m4pZYjMlwvnDEQ4-K_FYaIzSSYCTFuD68hc7hGY_nhk,22114429
|
|
17
|
+
opendataloader_pdf-0.0.5.dist-info/METADATA,sha256=A3R7j4EZPgVFlfNvsFtVUjWqkir0z59H3i6L1TdI444,2352
|
|
18
|
+
opendataloader_pdf-0.0.5.dist-info/WHEEL,sha256=_zCd3N1l69ArxyTb8rzEoP9TpbYXkqRFSNOD5OuxnTs,91
|
|
19
|
+
opendataloader_pdf-0.0.5.dist-info/top_level.txt,sha256=xee0qFQd6HPfS50E2NLICGuR6cq9C9At5SJ81yv5HkY,19
|
|
20
|
+
opendataloader_pdf-0.0.5.dist-info/RECORD,,
|
|
File without changes
|
|
File without changes
|