pdfitdown 0.0.0__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- pdfitdown-0.0.0/LICENSE +21 -0
- pdfitdown-0.0.0/PKG-INFO +80 -0
- pdfitdown-0.0.0/README.md +64 -0
- pdfitdown-0.0.0/pyproject.toml +33 -0
- pdfitdown-0.0.0/setup.cfg +4 -0
- pdfitdown-0.0.0/src/pdfitdown/__init__.py +1 -0
- pdfitdown-0.0.0/src/pdfitdown/pdfconversion.py +43 -0
- pdfitdown-0.0.0/src/pdfitdown.egg-info/PKG-INFO +80 -0
- pdfitdown-0.0.0/src/pdfitdown.egg-info/SOURCES.txt +10 -0
- pdfitdown-0.0.0/src/pdfitdown.egg-info/dependency_links.txt +1 -0
- pdfitdown-0.0.0/src/pdfitdown.egg-info/requires.txt +2 -0
- pdfitdown-0.0.0/src/pdfitdown.egg-info/top_level.txt +1 -0
pdfitdown-0.0.0/LICENSE
ADDED
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2024 Clelia (Astra) Bertelli
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
pdfitdown-0.0.0/PKG-INFO
ADDED
@@ -0,0 +1,80 @@
+Metadata-Version: 2.1
+Name: pdfitdown
+Version: 0.0.0
+Summary: PdfItDown - Convert Everything to PDF
+Author-email: "Clelia (Astra) Bertelli" <astraberte9@gmail.com>
+Project-URL: Homepage, https://github.com/AstraBert/PdfItDown
+Project-URL: Issues, https://github.com/AstraBert/PdfItDown/issues
+Classifier: Programming Language :: Python :: 3
+Classifier: License :: OSI Approved :: MIT License
+Classifier: Operating System :: OS Independent
+Requires-Python: >=3.10
+Description-Content-Type: text/markdown
+License-File: LICENSE
+Requires-Dist: markitdown
+Requires-Dist: markdown_pdf==1.3.2
+
+<div align="center">
+<h1>PdfItDown</h1>
+<h2>Convert Everything to PDF</h2>
+</div>
+<br>
+<div align="center">
+<img src="https://raw.githubusercontent.com/AstraBert/PdfItDown/main/logo.png" alt="PdfItDown Logo">
+</div>
+
+**PdfItDown** is a python package that relies on [`markitdown` by Microsoft](https://github.com/microsoft/markitdown/) and [`markdown_pdf`](https://github.com/vb64/markdown-pdf).
+
+### Applicability
+
+**PdfItDown** is applicable to the following file formats:
+
+- PDF
+- PowerPoint
+- Word
+- Excel
+- HTML
+- Text-based formats (CSV, JSON, XML)
+- ZIP files (iterates over contents)
+
+### How does it work?
+
+**PdfItDown** works in a very simple way:
+
+```mermaid
+graph LR
+2(Input File) --> 3[markitdown]
+3[markitdown] --> 4[Markdown content]
+4[Markdown content] --> 5[markdown-pdf]
+5[markdown-pdf] --> 6(PDF file)
+```
+
+### Installation and Usage
+
+To install **PdfItDown**, just run:
+
+```bash
+python3 -m pip install pdfitdown
+```
+
+And then you can simply use it inside your python scripts:
+
+```python
+from pdfitdown.pdfconversion import convert_to_pdf
+
+output_pdf = convert_to_pdf(file_path = "BusinessGrowth.xlsx", output_path = "business_growth.pdf", title = "Business Growth")
+```
+
+In this example, you will find the output PDF under `business_growth.pdf`.
+
+### Contributing
+
+Contributions are always welcome!
+
+Find contribution guidelines at [CONTRIBUTING.md](https://github.com/AstraBert/PdfItDown/tree/main/CONTRIBUTING.md)
+
+### License and Funding
+
+This project is open-source and is provided under an [MIT License](https://github.com/AstraBert/PdfItDown/tree/main/LICENSE).
+
+If you found it useful, please consider [funding it](https://github.com/sponsors/AstraBert).
pdfitdown-0.0.0/README.md
ADDED
@@ -0,0 +1,64 @@
+<div align="center">
+<h1>PdfItDown</h1>
+<h2>Convert Everything to PDF</h2>
+</div>
+<br>
+<div align="center">
+<img src="https://raw.githubusercontent.com/AstraBert/PdfItDown/main/logo.png" alt="PdfItDown Logo">
+</div>
+
+**PdfItDown** is a python package that relies on [`markitdown` by Microsoft](https://github.com/microsoft/markitdown/) and [`markdown_pdf`](https://github.com/vb64/markdown-pdf).
+
+### Applicability
+
+**PdfItDown** is applicable to the following file formats:
+
+- PDF
+- PowerPoint
+- Word
+- Excel
+- HTML
+- Text-based formats (CSV, JSON, XML)
+- ZIP files (iterates over contents)
+
+### How does it work?
+
+**PdfItDown** works in a very simple way:
+
+```mermaid
+graph LR
+2(Input File) --> 3[markitdown]
+3[markitdown] --> 4[Markdown content]
+4[Markdown content] --> 5[markdown-pdf]
+5[markdown-pdf] --> 6(PDF file)
+```
+
+### Installation and Usage
+
+To install **PdfItDown**, just run:
+
+```bash
+python3 -m pip install pdfitdown
+```
+
+And then you can simply use it inside your python scripts:
+
+```python
+from pdfitdown.pdfconversion import convert_to_pdf
+
+output_pdf = convert_to_pdf(file_path = "BusinessGrowth.xlsx", output_path = "business_growth.pdf", title = "Business Growth")
+```
+
+In this example, you will find the output PDF under `business_growth.pdf`.
+
+### Contributing
+
+Contributions are always welcome!
+
+Find contribution guidelines at [CONTRIBUTING.md](https://github.com/AstraBert/PdfItDown/tree/main/CONTRIBUTING.md)
+
+### License and Funding
+
+This project is open-source and is provided under an [MIT License](https://github.com/AstraBert/PdfItDown/tree/main/LICENSE).
+
+If you found it useful, please consider [funding it](https://github.com/sponsors/AstraBert).
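The README usage example converts a single file. As a minimal sketch of how the same call might be applied to a whole folder, the helper below walks a directory and hands each matching file to a converter. The name `convert_directory`, the default suffix tuple, and the choice of the file stem as the PDF title are assumptions made for illustration and are not part of the package. The converter is passed in as an argument, so the sketch runs without the package installed.

```python
from pathlib import Path
from typing import Callable, Iterable


def convert_directory(
    source_dir: str,
    output_dir: str,
    convert: Callable[..., str],
    suffixes: Iterable[str] = (".docx", ".pptx", ".xlsx", ".html"),  # assumed defaults
) -> list[str]:
    """Convert every matching file in source_dir and return the output paths."""
    wanted = {s.lower() for s in suffixes}
    out_root = Path(output_dir)
    out_root.mkdir(parents=True, exist_ok=True)

    written = []
    for src in sorted(Path(source_dir).iterdir()):
        # Skip subdirectories and files whose suffix is not in the wanted set.
        if not src.is_file() or src.suffix.lower() not in wanted:
            continue
        target = out_root / (src.stem + ".pdf")
        # The converter is called with the same keyword arguments as the README example.
        written.append(
            convert(file_path=str(src), output_path=str(target), title=src.stem)
        )
    return written
```

With the package installed, passing `convert_to_pdf` (imported as in the README) as the `convert` argument should produce one PDF per matching input file in `output_dir`.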
pdfitdown-0.0.0/pyproject.toml
ADDED
@@ -0,0 +1,33 @@
+[build-system]
+requires = ["setuptools>=61.0"]
+build-backend = "setuptools.build_meta"
+
+[project]
+name = "pdfitdown"
+version = "0.0.0"
+authors = [
+{ name="Clelia (Astra) Bertelli", email="astraberte9@gmail.com" },
+]
+description = "PdfItDown - Convert Everything to PDF"
+readme = "README.md"
+requires-python = ">=3.10"
+classifiers = [
+"Programming Language :: Python :: 3",
+"License :: OSI Approved :: MIT License",
+"Operating System :: OS Independent",
+]
+dependencies = [
+'markitdown',
+'markdown_pdf == 1.3.2',
+]
+
+[project.urls]
+Homepage = "https://github.com/AstraBert/PdfItDown"
+Issues = "https://github.com/AstraBert/PdfItDown/issues"
+
+[tool.setuptools.packages.find]
+where = ["src"]
+include = ["pdfitdown*"]
+
+[options.package_data]
+pdfitdown = ["*"]
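The `dependencies` list in this file mixes an unconstrained requirement (`markitdown`) with an exact pin (`markdown_pdf == 1.3.2`). The sketch below separates the two kinds of requirement strings. It is a simplified illustration that only recognizes a bare `name == version` form and does not implement the full dependency specifier grammar; `split_pinned` is a hypothetical helper name, not something the package provides.

```python
import re

# Matches an exact pin such as "markdown_pdf == 1.3.2" (simplified on purpose).
_PIN = re.compile(r"^\s*([A-Za-z0-9._-]+)\s*==\s*([^\s;,]+)\s*$")


def split_pinned(requirements):
    """Return (pinned, other): a {name: version} dict and the remaining strings."""
    pinned, other = {}, []
    for req in requirements:
        match = _PIN.match(req)
        if match:
            pinned[match.group(1)] = match.group(2)
        else:
            other.append(req.strip())
    return pinned, other


pinned, other = split_pinned(["markitdown", "markdown_pdf == 1.3.2"])
print(pinned)  # {'markdown_pdf': '1.3.2'}
print(other)   # ['markitdown']
```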
pdfitdown-0.0.0/src/pdfitdown/__init__.py
ADDED
@@ -0,0 +1 @@
+from .pdfconversion import convert_to_pdf
pdfitdown-0.0.0/src/pdfitdown/pdfconversion.py
ADDED
@@ -0,0 +1,43 @@
+# Import required libraries
+from markitdown import MarkItDown # Library for conversion to markdown
+from markdown_pdf import MarkdownPdf, Section # Library for PDF generation
+
+def convert_to_pdf(
+    file_path: str, # Path to input file
+    output_path: str, # Desired path for output PDF
+    title: str = "PDF Title" # Optional title for the PDF, defaults to "PDF Title"
+):
+    """
+    Converts a .pdf/.pptx/.docx/.csv/.json/.xml/.html/.zip file to PDF format.
+
+    Args:
+        file_path: Path to the source .pdf/.pptx/.docx/.csv/.json/.xml/.html/.zip file
+        output_path: Where to save the resulting PDF
+        title: Title to be set in PDF metadata
+
+    Returns:
+        str: Path to the generated PDF file
+    """
+    # Initialize markdown converter
+    md = MarkItDown()
+
+    # Convert file to markdown
+    result = md.convert(file_path)
+
+    # Extract the text content from the conversion result
+    finstr = result.text_content
+
+    # Create new PDF document with no table of contents (toc_level=0)
+    pdf = MarkdownPdf(toc_level=0)
+
+    # Add the converted markdown content as a section in the PDF
+    pdf.add_section(Section(finstr))
+
+    # Set the PDF document's title in its metadata
+    pdf.meta["title"] = title
+
+    # Save the PDF to the specified output path
+    pdf.save(output_path)
+
+    # Return the path where the PDF was saved
+    return output_path
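As shown in this file, `convert_to_pdf` passes `file_path` directly to `MarkItDown.convert` and writes to `output_path` without checking either value first. A caller who wants earlier, more specific errors could wrap it along the following lines. This is a sketch under stated assumptions: `checked_convert` and `ASSUMED_SUFFIXES` are hypothetical names, and the suffix set is taken from the docstring above plus `.xlsx` from the README example, not from any list the package itself defines. The converter is injected so the wrapper can be exercised without the package installed.

```python
from pathlib import Path
from typing import Callable

# Assumption: suffixes from the docstring above, plus ".xlsx" from the README example.
ASSUMED_SUFFIXES = {
    ".pdf", ".pptx", ".docx", ".xlsx", ".csv", ".json", ".xml", ".html", ".zip",
}


def checked_convert(
    file_path: str,
    output_path: str,
    convert: Callable[..., str],
    title: str = "PDF Title",
) -> str:
    """Validate the arguments, then delegate to the supplied converter."""
    src = Path(file_path)
    if not src.is_file():
        raise FileNotFoundError(f"Input file not found: {src}")
    if src.suffix.lower() not in ASSUMED_SUFFIXES:
        raise ValueError(f"Unsupported file type: {src.suffix or '(none)'}")
    if not output_path.lower().endswith(".pdf"):
        raise ValueError("output_path should end with .pdf")
    # Same keyword arguments as the convert_to_pdf signature shown above.
    return convert(file_path=str(src), output_path=output_path, title=title)
```

Whether rejecting unknown suffixes up front is desirable depends on the use case, since the underlying converter may accept formats that this assumed list leaves out.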
pdfitdown-0.0.0/src/pdfitdown.egg-info/PKG-INFO
ADDED
@@ -0,0 +1,80 @@
+Metadata-Version: 2.1
+Name: pdfitdown
+Version: 0.0.0
+Summary: PdfItDown - Convert Everything to PDF
+Author-email: "Clelia (Astra) Bertelli" <astraberte9@gmail.com>
+Project-URL: Homepage, https://github.com/AstraBert/PdfItDown
+Project-URL: Issues, https://github.com/AstraBert/PdfItDown/issues
+Classifier: Programming Language :: Python :: 3
+Classifier: License :: OSI Approved :: MIT License
+Classifier: Operating System :: OS Independent
+Requires-Python: >=3.10
+Description-Content-Type: text/markdown
+License-File: LICENSE
+Requires-Dist: markitdown
+Requires-Dist: markdown_pdf==1.3.2
+
+<div align="center">
+<h1>PdfItDown</h1>
+<h2>Convert Everything to PDF</h2>
+</div>
+<br>
+<div align="center">
+<img src="https://raw.githubusercontent.com/AstraBert/PdfItDown/main/logo.png" alt="PdfItDown Logo">
+</div>
+
+**PdfItDown** is a python package that relies on [`markitdown` by Microsoft](https://github.com/microsoft/markitdown/) and [`markdown_pdf`](https://github.com/vb64/markdown-pdf).
+
+### Applicability
+
+**PdfItDown** is applicable to the following file formats:
+
+- PDF
+- PowerPoint
+- Word
+- Excel
+- HTML
+- Text-based formats (CSV, JSON, XML)
+- ZIP files (iterates over contents)
+
+### How does it work?
+
+**PdfItDown** works in a very simple way:
+
+```mermaid
+graph LR
+2(Input File) --> 3[markitdown]
+3[markitdown] --> 4[Markdown content]
+4[Markdown content] --> 5[markdown-pdf]
+5[markdown-pdf] --> 6(PDF file)
+```
+
+### Installation and Usage
+
+To install **PdfItDown**, just run:
+
+```bash
+python3 -m pip install pdfitdown
+```
+
+And then you can simply use it inside your python scripts:
+
+```python
+from pdfitdown.pdfconversion import convert_to_pdf
+
+output_pdf = convert_to_pdf(file_path = "BusinessGrowth.xlsx", output_path = "business_growth.pdf", title = "Business Growth")
+```
+
+In this example, you will find the output PDF under `business_growth.pdf`.
+
+### Contributing
+
+Contributions are always welcome!
+
+Find contribution guidelines at [CONTRIBUTING.md](https://github.com/AstraBert/PdfItDown/tree/main/CONTRIBUTING.md)
+
+### License and Funding
+
+This project is open-source and is provided under an [MIT License](https://github.com/AstraBert/PdfItDown/tree/main/LICENSE).
+
+If you found it useful, please consider [funding it](https://github.com/sponsors/AstraBert).
pdfitdown-0.0.0/src/pdfitdown.egg-info/SOURCES.txt
ADDED
@@ -0,0 +1,10 @@
+LICENSE
+README.md
+pyproject.toml
+src/pdfitdown/__init__.py
+src/pdfitdown/pdfconversion.py
+src/pdfitdown.egg-info/PKG-INFO
+src/pdfitdown.egg-info/SOURCES.txt
+src/pdfitdown.egg-info/dependency_links.txt
+src/pdfitdown.egg-info/requires.txt
+src/pdfitdown.egg-info/top_level.txt
pdfitdown-0.0.0/src/pdfitdown.egg-info/dependency_links.txt
ADDED
@@ -0,0 +1 @@
+
pdfitdown-0.0.0/src/pdfitdown.egg-info/top_level.txt
ADDED
@@ -0,0 +1 @@
+pdfitdown