pdf_paradise 0.3.20
- checksums.yaml +7 -0
- data/README.md +933 -0
- data/bin/automatic_pdf_title +7 -0
- data/bin/burst_this_pdf_file +7 -0
- data/bin/combine_these_pdf_pages +7 -0
- data/bin/compress_via_hexapdf +7 -0
- data/bin/convert_markdown_to_pdf +7 -0
- data/bin/convert_pdf_to_text +7 -0
- data/bin/delete_first_page_of_this_pdf_file +7 -0
- data/bin/djvu_to_pdf +7 -0
- data/bin/merge_then_open +7 -0
- data/bin/n_pages +10 -0
- data/bin/open_main_pdf +7 -0
- data/bin/pdf_paradise +9 -0
- data/bin/rotate_pdf +7 -0
- data/bin/set_main_book +7 -0
- data/bin/set_title_of_this_pdf_file +15 -0
- data/doc/README.gen +871 -0
- data/doc/todo/todo.md +13 -0
- data/images/Logo_for_the_pdf_paradise_project.avif +0 -0
- data/lib/pdf_paradise/base/base.rb +344 -0
- data/lib/pdf_paradise/base/colours.rb +67 -0
- data/lib/pdf_paradise/colours/colours.rb +27 -0
- data/lib/pdf_paradise/commandline/commandline.rb +109 -0
- data/lib/pdf_paradise/commandline/help.rb +77 -0
- data/lib/pdf_paradise/commandline/menu.rb +173 -0
- data/lib/pdf_paradise/compress/compress_this_pdf_file.rb +108 -0
- data/lib/pdf_paradise/compress/compress_via_hexapdf.rb +27 -0
- data/lib/pdf_paradise/compress/compress_via_qpdf.rb +32 -0
- data/lib/pdf_paradise/constants/constants.rb +76 -0
- data/lib/pdf_paradise/convert_text_to_pdf.rb +94 -0
- data/lib/pdf_paradise/css/project.css +17 -0
- data/lib/pdf_paradise/fpdf/README.md +2 -0
- data/lib/pdf_paradise/fpdf/bookmark.rb +129 -0
- data/lib/pdf_paradise/fpdf/chinese.rb +454 -0
- data/lib/pdf_paradise/fpdf/fpdf.rb +1902 -0
- data/lib/pdf_paradise/fpdf/fpdf_eps.rb +138 -0
- data/lib/pdf_paradise/fpdf/makefont.rb +1794 -0
- data/lib/pdf_paradise/gui/README.md +6 -0
- data/lib/pdf_paradise/gui/fox/split_pdf_file.rb +77 -0
- data/lib/pdf_paradise/gui/gtk2/pdf_viewer/pdf_viewer.rb +34 -0
- data/lib/pdf_paradise/gui/gtk2/split_pdf_file/split_pdf_file.rb +34 -0
- data/lib/pdf_paradise/gui/gtk2/statistics_widget/statistics_widget.rb +34 -0
- data/lib/pdf_paradise/gui/gtk3/controller/controller.rb +214 -0
- data/lib/pdf_paradise/gui/gtk3/pdf_viewer/pdf_viewer.rb +34 -0
- data/lib/pdf_paradise/gui/gtk3/split_pdf_file/split_pdf_file.rb +34 -0
- data/lib/pdf_paradise/gui/jruby/delete_the_first_or_the_last_page_of_this_pdf_file/delete_the_first_or_the_last_page_of_this_pdf_file.rb +167 -0
- data/lib/pdf_paradise/gui/jruby/remove_the_first_page_of_this_pdf_file/remove_the_first_page_of_this_pdf_file.rb +103 -0
- data/lib/pdf_paradise/gui/libui/extract_all_images_from_this_pdf_file/extract_all_images_from_this_pdf_file.rb +223 -0
- data/lib/pdf_paradise/gui/libui/remove_the_first_page_of_this_pdf_file/remove_the_first_page_of_this_pdf_file.rb +267 -0
- data/lib/pdf_paradise/gui/libui/rotate_pdf_file/rotate_pdf_file.rb +219 -0
- data/lib/pdf_paradise/gui/libui/statistics_widget/statistics_widget.rb +233 -0
- data/lib/pdf_paradise/gui/shared_code/pdf_viewer/pdf_viewer.css +5 -0
- data/lib/pdf_paradise/gui/shared_code/pdf_viewer/pdf_viewer_module.rb +287 -0
- data/lib/pdf_paradise/gui/shared_code/remove_the_first_page_of_this_pdf_file_module/remove_the_first_page_of_this_pdf_file_module.rb +31 -0
- data/lib/pdf_paradise/gui/shared_code/split_pdf_file/split_pdf_file_module.rb +295 -0
- data/lib/pdf_paradise/gui/universal_widgets/convert_pdf_to_text/convert_pdf_to_text.rb +366 -0
- data/lib/pdf_paradise/gui/universal_widgets/delete_the_first_or_the_last_page_of_this_pdf_file/delete_the_first_or_the_last_page_of_this_pdf_file.rb +776 -0
- data/lib/pdf_paradise/gui/universal_widgets/statistics_widget/statistics_widget.rb +407 -0
- data/lib/pdf_paradise/gui/universal_widgets/to_pdf/to_pdf.rb +351 -0
- data/lib/pdf_paradise/hexapdf/001_rainbow_pattern_example.rb +0 -0
- data/lib/pdf_paradise/hexapdf/hexapdf.rb +123 -0
- data/lib/pdf_paradise/images/PDF_PARADISE_LOGO.png +0 -0
- data/lib/pdf_paradise/main_pdf/main_pdf.rb +474 -0
- data/lib/pdf_paradise/merge_pdf/menu.rb +63 -0
- data/lib/pdf_paradise/merge_pdf/merge_pdf.rb +307 -0
- data/lib/pdf_paradise/merge_pdf_namespace.rb +9 -0
- data/lib/pdf_paradise/merge_then_open/merge_then_open.rb +105 -0
- data/lib/pdf_paradise/prawn_addons/README.md +2 -0
- data/lib/pdf_paradise/prawn_addons/prawn_addons.rb +17 -0
- data/lib/pdf_paradise/project/project.rb +22 -0
- data/lib/pdf_paradise/remove_pdf_password.rb +391 -0
- data/lib/pdf_paradise/requires/batch_require_toplevel_files.rb +22 -0
- data/lib/pdf_paradise/requires/colours.rb +11 -0
- data/lib/pdf_paradise/requires/colours_and_esystem_and_save_file_and_fileutils_and_opn.rb +13 -0
- data/lib/pdf_paradise/requires/esystem_and_colours.rb +11 -0
- data/lib/pdf_paradise/requires/esystem_and_opn_and_colours.rb +10 -0
- data/lib/pdf_paradise/requires/require_the_whole_project.rb +30 -0
- data/lib/pdf_paradise/requires/require_utility_scripts.rb +9 -0
- data/lib/pdf_paradise/set_main_book.rb +156 -0
- data/lib/pdf_paradise/set_pdf_title.rb +220 -0
- data/lib/pdf_paradise/sinatra/embeddable_interface.rb +389 -0
- data/lib/pdf_paradise/toplevel_methods/convert_epub_to_pdf.rb +27 -0
- data/lib/pdf_paradise/toplevel_methods/convert_markdown_to_pdf.rb +45 -0
- data/lib/pdf_paradise/toplevel_methods/convert_ppt_to_pdf.rb +35 -0
- data/lib/pdf_paradise/toplevel_methods/e.rb +16 -0
- data/lib/pdf_paradise/toplevel_methods/esystem.rb +20 -0
- data/lib/pdf_paradise/toplevel_methods/misc.rb +228 -0
- data/lib/pdf_paradise/toplevel_methods/number_pages.rb +38 -0
- data/lib/pdf_paradise/toplevel_methods/opened_pdf_files.rb +221 -0
- data/lib/pdf_paradise/toplevel_methods/query_pdf_title.rb +201 -0
- data/lib/pdf_paradise/toplevel_methods/reduce_size_of_this_pdf_file.rb +46 -0
- data/lib/pdf_paradise/toplevel_methods/roebe.rb +17 -0
- data/lib/pdf_paradise/toplevel_methods/to_pdf.rb +12 -0
- data/lib/pdf_paradise/utility_scripts/README.md +3 -0
- data/lib/pdf_paradise/utility_scripts/automatic_pdf_title.rb +104 -0
- data/lib/pdf_paradise/utility_scripts/check_syntax_of_pdf_files.rb +106 -0
- data/lib/pdf_paradise/utility_scripts/combine_these_pdf_pages.rb +118 -0
- data/lib/pdf_paradise/utility_scripts/convert_pdf_to_text.rb +179 -0
- data/lib/pdf_paradise/utility_scripts/delete_last_page_of_this_pdf_file.rb +180 -0
- data/lib/pdf_paradise/utility_scripts/delete_the_first_page_of_this_pdf_file/delete_the_first_page_of_this_pdf_file.rb +429 -0
- data/lib/pdf_paradise/utility_scripts/delete_this_page_of_this_pdf_file.rb +356 -0
- data/lib/pdf_paradise/utility_scripts/djvu_to_pdf.rb +87 -0
- data/lib/pdf_paradise/utility_scripts/extract_all_images_from_this_pdf_file.rb +129 -0
- data/lib/pdf_paradise/utility_scripts/extract_pdf_page.rb +283 -0
- data/lib/pdf_paradise/utility_scripts/pdf_file_n_total_pages.rb +348 -0
- data/lib/pdf_paradise/utility_scripts/pdf_optimizer.rb +111 -0
- data/lib/pdf_paradise/utility_scripts/pdf_statistics.rb +148 -0
- data/lib/pdf_paradise/utility_scripts/pdf_to_html.rb +75 -0
- data/lib/pdf_paradise/utility_scripts/remove_images.rb +110 -0
- data/lib/pdf_paradise/utility_scripts/rotate_pdf_file.rb +303 -0
- data/lib/pdf_paradise/utility_scripts/split_pdf.rb +364 -0
- data/lib/pdf_paradise/utility_scripts/to_pdf.rb +130 -0
- data/lib/pdf_paradise/utility_scripts/to_qdf.rb +66 -0
- data/lib/pdf_paradise/version/version.rb +19 -0
- data/lib/pdf_paradise/www/README.md +2 -0
- data/lib/pdf_paradise/www/sinatra/app.rb +304 -0
- data/lib/pdf_paradise/yaml/working_on_these_pdf_files.yml +4 -0
- data/lib/pdf_paradise.rb +5 -0
- data/pdf_paradise.gemspec +61 -0
- data/test/fpdf/001_minimal_example.rb +12 -0
- data/test/fpdf/002.pdf +0 -0
- data/test/fpdf/002_header_and_footer_example.rb +64 -0
- data/test/fpdf/003.pdf +98 -0
- data/test/fpdf/003_justified_paragraphs.rb +96 -0
- data/test/fpdf/file1.md +3 -0
- data/test/fpdf/file2.md +3 -0
- data/test/fpdf/test.pdf +0 -0
- data/test/testing_pdf_paradise.rb +12 -0
- metadata +239 -0
data/README.md
ADDED
|
@@ -0,0 +1,933 @@
|
|
|
1
|
+
[](https://www.gobolinux.org/)
|
|
2
|
+
[](https://www.ruby-lang.org/en/)
|
|
3
|
+
[](https://badge.fury.io/rb/pdf_paradise)
|
|
4
|
+
|
|
5
|
+
This gem was <b>last updated</b> on the <span style="color: darkblue; font-weight: bold">04.02.2024</span> (dd.mm.yyyy notation), at <span style="color: steelblue; font-weight: bold">04:38:42</span> o'clock.
|
|
6
|
+
|
|
7
|
+
<img src="https://i.imgur.com/unhKNEw.png" style="margin-left: 2em">
|
|
8
|
+
<img src="https://i.imgur.com/xMrkTJM.png" style="margin-left: 1em">
|
|
9
|
+
|
|
10
|
+
This project can help with pdf-related activities, such as extracting
|
|
11
|
+
a .pdf page, converting a .pdf page, merging .pdf files, splitting
|
|
12
|
+
.pdf files, setting the title of a .pdf page and similar actions.
|
|
13
|
+
|
|
14
|
+
The project has to remain quite flexible. We may use external programs
|
|
15
|
+
such as **ghostscript** or **qpdf**, or we may use pure ruby solutions,
|
|
16
|
+
such as via the gem called <b>combine_pdf</b>, <b>prawn</b> or
|
|
17
|
+
<b>hexapdf</b>.
|
|
18
|
+
|
|
19
|
+
The file here (README.gen, or rather the generated file called
|
|
20
|
+
**README.md**), will describe some of the components that make up
|
|
21
|
+
this gem.
|
|
22
|
+
|
|
23
|
+
## Rationale for making use of separate pdf-related projects
|
|
24
|
+
|
|
25
|
+
There are many different pdf-related toolkits available if you look
|
|
26
|
+
for them on the <b>www</b>.
|
|
27
|
+
|
|
28
|
+
For example, we have <b>prawn</b>, we have <b>qpdf</b>, we have
|
|
29
|
+
<b>calibre</b>, we have <b>hexapdf</b>, we have
|
|
30
|
+
<b>ghostscript</b>, and many more applications.
|
|
31
|
+
|
|
32
|
+
Some of these projects have unique features; and some of them have
|
|
33
|
+
overlapping functionality, such as reading the content of .pdf files
|
|
34
|
+
in a simplified manner (number of pages, title, author and so
|
|
35
|
+
forth).
|
|
36
|
+
|
|
37
|
+
The PdfParadise project attempts to support as many different
|
|
38
|
+
(open-source) projects as possible. From the point of view of
|
|
39
|
+
the PdfParadise project, it is also permissible to <b>support
|
|
40
|
+
closed source projects</b>, provided that **the code remains
|
|
41
|
+
simple** (and simple to change), for instance via ruby's
|
|
42
|
+
system() or backtick (``) calls to commandline binaries.
|
|
43
|
+
|
|
44
|
+
The primary focus for the pdf_paradise gem is on open-source
|
|
45
|
+
projects, though, so closed-source is a secondary objective,
|
|
46
|
+
at best.
|
|
47
|
+
|
|
48
|
+
Why does the PdfParadise project attempt to support many different
|
|
49
|
+
pdf-related projects?
|
|
50
|
+
|
|
51
|
+
The answer to this question is rather simple: on Linux I have a <b>lot</b>
|
|
52
|
+
of flexibility and can use literally any pdf-related project just fine.
|
|
53
|
+
On Windows, however, I am more restricted in what I can use. Not
|
|
54
|
+
all programs are available on windows or can be easily compiled or
|
|
55
|
+
be installed there. Thus, in order to allow the pdf_paradise .gem
|
|
56
|
+
to work on windows, we need a certain level of flexibility.
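
The flexibility described above can be sketched as a small lookup that picks whichever external program happens to be installed. This is only an illustration under assumptions: the helper name `first_available_program` and the candidate list are hypothetical and not part of the pdf_paradise API.

```ruby
# Hypothetical helper, not part of the pdf_paradise API: return the first
# program name from +candidates+ that can be found in one of +directories+.
def first_available_program(candidates, directories = ENV.fetch('PATH', '').split(File::PATH_SEPARATOR))
  # On Windows, executables usually carry an extension listed in PATHEXT.
  extensions = [''] + ENV.fetch('PATHEXT', '').split(';')
  candidates.find { |name|
    directories.any? { |directory|
      extensions.any? { |extension|
        file = File.join(directory, name + extension)
        File.file?(file) && File.executable?(file)
      }
    }
  }
end

program = first_available_program(%w[qpdf gs])
if program
  puts "Found external program: #{program}"
else
  puts 'No external program found; a pure-ruby gem would be the fallback.'
end
```

Passing the directories as a parameter keeps the lookup easy to test and makes it possible to point it at a custom installation directory on windows.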
|
|
57
|
+
|
|
58
|
+
The reason why I added this subsection here in <b>June 2021</b> was that
|
|
59
|
+
I am slowly changing the sinatra-related part of the PdfParadise
|
|
60
|
+
project, in order to embed the functionality into my main controller
|
|
61
|
+
which is handled by the **Roebe** namespace. In that controller
|
|
62
|
+
I wanted to easily <b>offer pdf-related functionality</b>
|
|
63
|
+
"out of the box" when I start the sinatra-application on windows.
|
|
64
|
+
Because I want to be able to offer pdf-related modifications on
|
|
65
|
+
windows as well, the PdfParadise project had to become more
|
|
66
|
+
flexible, so that a simple toplevel route, such as **/pdf**, will
|
|
67
|
+
work properly, and lead to entry points (subroutes) that allow
|
|
68
|
+
me to tap into the features offered by the PdfParadise project.
|
|
69
|
+
That way I can then, for instance, easily display the number of
|
|
70
|
+
pages in a .pdf file on windows as well.
|
|
71
|
+
|
|
72
|
+
So, the primary **summary** here is this: <b>the PdfParadise project must
|
|
73
|
+
remain flexible in order to support a proper workflow on windows
|
|
74
|
+
systems as well</b>. (We could use WSL on windows, but not every
|
|
75
|
+
computer has this available, so I am targeting "vanilla" windows
|
|
76
|
+
really.)
|
|
77
|
+
|
|
78
|
+
Note that one slight drawback is that the sinatra part of
|
|
79
|
+
the PdfParadise project now has <b>a dependency</b> on the
|
|
80
|
+
**cyberweb** project, so if you want to use that, you also
|
|
81
|
+
have to install the cyberweb gem. This is a trade-off - for me
|
|
82
|
+
the more important part is long-term maintainability of
|
|
83
|
+
the pdf_paradise project, so a unified
|
|
84
|
+
code base had to be used in this regard.
|
|
85
|
+
|
|
86
|
+
## Converting a .pdf file to text
|
|
87
|
+
|
|
88
|
+
Sometimes you may wish to have a text-file describing the content of
|
|
89
|
+
a .pdf file, rather than the .pdf file itself.
|
|
90
|
+
|
|
91
|
+
Via class **PdfParadise::ConvertPdfToText**, residing in the file at
|
|
92
|
+
**pdf_paradise/utility_scripts/convert_pdf_to_text.rb**, you can convert a .pdf file
|
|
93
|
+
to a text file.
|
|
94
|
+
|
|
95
|
+
Usage example from ruby, for the file called **foobar.pdf**:
|
|
96
|
+
|
|
97
|
+
PdfParadise::ConvertPdfToText.new(ARGV)
|
|
98
|
+
PdfParadise::ConvertPdfToText.new('foobar.pdf')
|
|
99
|
+
|
|
100
|
+
You can also use the bin/ file from the commandline:
|
|
101
|
+
|
|
102
|
+
convert_pdf_to_text
|
|
103
|
+
convert_pdf_to_text foobar.pdf
|
|
104
|
+
|
|
105
|
+
There is also a ruby-gtk3 widget that offers the functionality
|
|
106
|
+
from class **PdfParadise::ConvertPdfToText**, if the user
|
|
107
|
+
has gtk3 installed and the ruby-bindings to it as well.
|
|
108
|
+
|
|
109
|
+
You can start that ruby-gtk3 widget via:
|
|
110
|
+
|
|
111
|
+
convert_pdf_to_text --gui
|
|
112
|
+
|
|
113
|
+
## Storing the .pdf pages that are currently open
|
|
114
|
+
|
|
115
|
+
If you need to store the .pdf files that are currently open, you
|
|
116
|
+
can use the following commandline to do so:
|
|
117
|
+
|
|
118
|
+
pdfparadise --store-open-pdf-files
|
|
119
|
+
|
|
120
|
+
This will attempt to store the full path to the .pdf files
|
|
121
|
+
into a local file. That way you may also be able to batch-open
|
|
122
|
+
these .pdf files at a later time, e. g. when you switch your
|
|
123
|
+
window manager or after a reboot.
|
|
124
|
+
|
|
125
|
+
As of October 2022 I am not using this as much anymore as
|
|
126
|
+
before, because the <b>roebe</b> gem has a class called Books
|
|
127
|
+
(at <b>roebe/classes/books.rb</b>) that handles .pdf files for
|
|
128
|
+
me. I use that class as if I am reading different "books" - each
|
|
129
|
+
individual .pdf file is then a "book".
|
|
130
|
+
|
|
131
|
+
## Converting markdown .md files to .pdf files
|
|
132
|
+
|
|
133
|
+
If you use kramdown, prawn and kramdown-pdf-converter, then you
|
|
134
|
+
can convert .md files on the commandline, via:
|
|
135
|
+
|
|
136
|
+
convert_markdown_to_pdf path_to_pdf_file_goes_here.pdf
|
|
137
|
+
|
|
138
|
+
Install the necessary gems prior to using this commandline
|
|
139
|
+
functionality.
|
|
140
|
+
|
|
141
|
+
## Querying the title of a .pdf file
|
|
142
|
+
|
|
143
|
+
<b>class PdfParadise::QueryPdfTitle</b> will report the title of
|
|
144
|
+
any .pdf file that is passed into it, on the commandline.
|
|
145
|
+
|
|
146
|
+
This currently depends on <b>exiftool</b> but at a later time,
|
|
147
|
+
this may change to also allow a query via prawn or other tools.
|
|
148
|
+
|
|
149
|
+
If you need to determine whether a given .pdf file has a title
|
|
150
|
+
or whether it does not, you can use
|
|
151
|
+
<b>PdfParadise.does_this_pdf_file_have_a_title?</b>, such
|
|
152
|
+
as in:
|
|
153
|
+
|
|
154
|
+
PdfParadise.does_this_pdf_file_have_a_title? "foobar.pdf" # => true
|
|
155
|
+
|
|
156
|
+
This method will return **true** if the .pdf file at hand has a
|
|
157
|
+
title; and **false** otherwise.
|
|
158
|
+
|
|
159
|
+
## Determining how many pages a given .pdf file has
|
|
160
|
+
|
|
161
|
+
class **PdfParadise::PdfFileNTotalPages** can be used to query
|
|
162
|
+
how many pages a given .pdf file has.
|
|
163
|
+
|
|
164
|
+
The executable called **bin/n_pages** (thus, **n_pages**) can
|
|
165
|
+
be used to query this, on the commandline.
|
|
166
|
+
|
|
167
|
+
Example:
|
|
168
|
+
|
|
169
|
+
n_pages foobar.pdf
|
|
170
|
+
|
|
171
|
+
Do note that the class requires the external program
|
|
172
|
+
called **pdfinfo**.
|
|
173
|
+
|
|
174
|
+
It is possible to query the number of pages in a given .pdf
|
|
175
|
+
file without **pdfinfo**, but some .pdf files are a bit buggy,
|
|
176
|
+
and **pdfinfo** is simply more reliable than the regex that
|
|
177
|
+
was used until March 2020. So, since March 2020, the program
|
|
178
|
+
**pdfinfo** is now used by default. Note that pdfinfo is
|
|
179
|
+
part of the poppler software suite.
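
How a page count can be read from **pdfinfo** may be sketched as follows. This is not the code of class PdfFileNTotalPages; the helper names are hypothetical, and the sketch only assumes that pdfinfo prints a line starting with `Pages:`.

```ruby
require 'open3'

# Hypothetical helper: extract the page count from pdfinfo's textual output.
def page_count_from_pdfinfo_output(output)
  match = output.match(/^Pages:\s+(\d+)/)
  match ? match[1].to_i : nil
end

# Hypothetical wrapper: run pdfinfo on a file; returns nil if that fails.
def n_pages_via_pdfinfo(path)
  output, status = Open3.capture2('pdfinfo', path)
  status.success? ? page_count_from_pdfinfo_output(output) : nil
rescue Errno::ENOENT # pdfinfo is not installed
  nil
end

sample = "Title:          Example\nPages:          12\nPage size:      595 x 842 pts (A4)\n"
puts page_count_from_pdfinfo_output(sample) # prints 12
```

Keeping the parsing separate from the invocation means the regex can be checked without having poppler installed.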
|
|
180
|
+
|
|
181
|
+
You can also use the following toplevel API for this:
|
|
182
|
+
|
|
183
|
+
PdfParadise.n_pages? 'THE_PATH_TO_THE_PDF_FILE_GOES_IN_HERE.pdf'
|
|
184
|
+
PdfParadise.n_pages? 'foobar.pdf'
|
|
185
|
+
|
|
186
|
+
## Adding page numbers to .pdf files
|
|
187
|
+
|
|
188
|
+
Via the combine_pdf gem it is now possible to add page numbers
|
|
189
|
+
to .pdf files. This has a few limitations for complex .pdf files,
|
|
190
|
+
due to combine_pdf having limitations in turn - but for simple
|
|
191
|
+
.pdf files this should work really well.
|
|
192
|
+
|
|
193
|
+
How to use that functionality?
|
|
194
|
+
|
|
195
|
+
Consider using the following toplevel API:
|
|
196
|
+
|
|
197
|
+
PdfParadise.number_pages('this_file.pdf')
|
|
198
|
+
|
|
199
|
+
The file called **this_file.pdf** has to exist in order for
|
|
200
|
+
this to work, of course.
|
|
201
|
+
|
|
202
|
+
The current default is to display the page numbers on the bottom
|
|
203
|
+
right side. This is hardcoded, but you could modify the code
|
|
204
|
+
to adapt to your needs; see also how combine_pdf does this.
|
|
205
|
+
(You have to pass an option-hash.)
|
|
206
|
+
|
|
207
|
+
## Various GUI components of the PdfParadise project
|
|
208
|
+
|
|
209
|
+
The **PdfParadise project** comes with some ruby-gtk3 specific
|
|
210
|
+
GUIs, but a few ruby-gtk2 and ruby-tk bindings may exist
|
|
211
|
+
as well. The **ruby-gtk3** components constitute the main GUI
|
|
212
|
+
elements of this project, though.
|
|
213
|
+
|
|
214
|
+
You can start, from the commandline, the gtk-wrapper
|
|
215
|
+
over the **split_pdf_file** functionality.
|
|
216
|
+
|
|
217
|
+
In order to do this, do either one of the following:
|
|
218
|
+
|
|
219
|
+
pdf_paradise --gui
|
|
220
|
+
pdf_paradise --gtk
|
|
221
|
+
|
|
222
|
+
This will require the **gtk_paradise** project and the gtk
|
|
223
|
+
bindings, so quite a lot. **gem install gtk3** and
|
|
224
|
+
**gem install gtk_paradise** should help.
|
|
225
|
+
|
|
226
|
+
The GUI for class SplitPdfFile is called **PdfParadise::Gtk::SplitPdfFile**.
|
|
227
|
+
The idea behind it is to allow you to determine some of the parameters
|
|
228
|
+
in a graphical fashion.
|
|
229
|
+
|
|
230
|
+
As of **September 2019**, there is also a mini-widget for quickly
|
|
231
|
+
removing the first page of a .pdf file. This is really minimal right
|
|
232
|
+
now and not very elegant; it may be improved in the future, but for
|
|
233
|
+
the time being it is what it is. It is more a proof-of-concept that
|
|
234
|
+
it can work.
|
|
235
|
+
|
|
236
|
+
You can start this via:
|
|
237
|
+
|
|
238
|
+
require 'pdf_paradise/gui/gtk2/remove_first_page_of_pdf_file.rb'
|
|
239
|
+
|
|
240
|
+
PdfParadise.start_gtk_gui_remove_first_page_of_pdf_file
|
|
241
|
+
|
|
242
|
+
Note that as of **January 2021** the gtk bindings will default to
|
|
243
|
+
**ruby-gtk3**. Support for ruby-gtk2 will be retained, though,
|
|
244
|
+
but new code may not necessarily be written with ruby-gtk2 in
|
|
245
|
+
mind. ruby-gtk3 is now the main GUI target for this project.
|
|
246
|
+
|
|
247
|
+
I am slowly porting the individual widgets.
|
|
248
|
+
|
|
249
|
+
The following widgets have been ported so far:
|
|
250
|
+
|
|
251
|
+
PdfParadise::GUI::Gtk::StatisticsWidget # can be found under pdf_paradise/gui/gtk3/statistics_widget/statistics_widget.rb
|
|
252
|
+
|
|
253
|
+
## Specification of the .pdf format
|
|
254
|
+
|
|
255
|
+
This subsection is a stub - I only needed it to gather information
|
|
256
|
+
about the .pdf specification. This is NOT complete - it only shall
|
|
257
|
+
contain some useful information and snippets about the .pdf
|
|
258
|
+
specification.
|
|
259
|
+
|
|
260
|
+
PDF is short for **Portable Document Format**.
|
|
261
|
+
|
|
262
|
+
PDF was standardized as **ISO 32000** in the year **2008**.
|
|
263
|
+
|
|
264
|
+
In the pdf-specification we can distinguish these entities:
|
|
265
|
+
|
|
266
|
+
Objects: these are not objects in the OOP sense, but simply the
|
|
267
|
+
basic data type of the PDF standard. There are 9 types of objects:
|
|
268
|
+
null, boolean, integer, real, name, string, array, dictionary and
|
|
269
|
+
stream.
|
|
270
|
+
|
|
271
|
+
Dictionary: this is an unordered collection of key-value pairs. Dictionaries are
|
|
272
|
+
denoted by << and >> at the beginning and the end.
|
|
273
|
+
|
|
274
|
+
Indirect Objects: these are objects that are referred to by
|
|
275
|
+
reference.
|
|
276
|
+
|
|
277
|
+
Direct Objects: these are objects that appear inline and are
|
|
278
|
+
obtained directly.
|
|
279
|
+
|
|
280
|
+
Conforming Reader: an application that parses a PDF
|
|
281
|
+
file according to the PDF Standard.
|
|
282
|
+
|
|
283
|
+
A .pdf file is made up of a specific structure, usually a four-part
|
|
284
|
+
layout.
|
|
285
|
+
|
|
286
|
+
These four parts are:
|
|
287
|
+
|
|
288
|
+
Header
|
|
289
|
+
Body
|
|
290
|
+
Cross-reference table
|
|
291
|
+
Trailer
|
|
292
|
+
|
|
293
|
+
### The .pdf Header tag
|
|
294
|
+
|
|
295
|
+
The header may begin with an entry such as **%PDF-1.7**.
|
|
296
|
+
|
|
297
|
+
The general format for the header is:
|
|
298
|
+
|
|
299
|
+
%PDF- followed by the version number in the form of 1.N.
|
|
300
|
+
|
|
301
|
+
This is not valid for all .pdf files, though. Past PDF Version
|
|
302
|
+
1.4, the **Version** entry in the document's catalog dictionary,
|
|
303
|
+
which is within the **Root** entry of the **Trailer**, may be
|
|
304
|
+
used instead of the Header - **if present**.
|
|
305
|
+
|
|
306
|
+
If a .pdf file contains binary data - which most PDF files
|
|
307
|
+
will do nowadays, such as **stream objects** - then the
|
|
308
|
+
**Header** line shall be immediately followed by a line
|
|
309
|
+
containing at least **four binary characters**. These
|
|
310
|
+
are character codes of 128 or greater.
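
The two header rules above can be checked with a few lines of ruby. This is a minimal sketch; the helper names `pdf_header_version` and `binary_marker_line?` are hypothetical, and the sketch assumes that lines end with a line feed.

```ruby
# Hypothetical helper: read the version from the first line of PDF data.
def pdf_header_version(data)
  first_line = data.b.lines.first.to_s
  match = first_line.match(/\A%PDF-(\d\.\d)/)
  match ? match[1] : nil
end

# Hypothetical helper: does the second line contain at least four
# bytes with a character code of 128 or greater?
def binary_marker_line?(data)
  second_line = data.b.lines[1].to_s
  second_line.bytes.count { |byte| byte >= 128 } >= 4
end

sample = "%PDF-1.7\n%\xE2\xE3\xCF\xD3\n1 0 obj\n".b
puts pdf_header_version(sample)  # prints 1.7
puts binary_marker_line?(sample) # prints true
```

The data is treated as binary throughout, since the bytes after the header are usually not valid text in any encoding.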
|
|
311
|
+
|
|
312
|
+
### The .pdf Body tag
|
|
313
|
+
|
|
314
|
+
The body of a PDF file consists of the aforementioned **Indirect
|
|
315
|
+
Objects** representing the contents of a document.
|
|
316
|
+
|
|
317
|
+
**Indirect Objects** begin with a **unique object identifier**
|
|
318
|
+
that allows other objects to refer to them.
|
|
319
|
+
|
|
320
|
+
That identifier is made up of the following two components:
|
|
321
|
+
|
|
322
|
+
(1) Object Number: a positive Integer, can be in any arbitrary order
|
|
323
|
+
(2) Generation Number: a non-negative Integer
|
|
324
|
+
|
|
325
|
+
The **Indirect Objects** can be referred to from elsewhere by an
|
|
326
|
+
Indirect Reference. This must consist of:
|
|
327
|
+
|
|
328
|
+
Object Number
|
|
329
|
+
Generation Number, and
|
|
330
|
+
keyword R # for instance: 4 0 R
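
The three parts of an Indirect Reference can be picked out of PDF syntax with a regex. The following is a naive sketch: the names `IndirectReference` and `indirect_references` are hypothetical, and the regex ignores strings and streams, where the same character sequence could appear by accident.

```ruby
# Hypothetical value object for an indirect reference such as "4 0 R".
IndirectReference = Struct.new(:object_number, :generation_number)

# Hypothetical helper: find all indirect references in a snippet of PDF syntax.
def indirect_references(text)
  text.scan(/\b(\d+)\s+(\d+)\s+R\b/).map { |object_number, generation_number|
    IndirectReference.new(object_number.to_i, generation_number.to_i)
  }
end

snippet = '/Kids [ 4 0 R ] /Pages 1 0 R'
indirect_references(snippet).each { |reference|
  puts "object #{reference.object_number}, generation #{reference.generation_number}"
}
```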
|
|
331
|
+
|
|
332
|
+
After the identifier is the keyword **obj** (start of the object)
|
|
333
|
+
and **endobj** (end of the object). Anything in between
|
|
334
|
+
is a key-value pair that describes the object.
|
|
335
|
+
|
|
336
|
+
A simple example showing the use of **Indirect Objects** will be
|
|
337
|
+
shown next:
|
|
338
|
+
|
|
339
|
+
1 0 obj % Object Number 1, Generation Number 0
|
|
340
|
+
<<
|
|
341
|
+
/Type /Pages % Describe type of object
|
|
342
|
+
/Kids [ 4 0 R ] % Kids Entry referring to an indirect reference (Object number 4, Generation number 0)
|
|
343
|
+
/Count 1
|
|
344
|
+
>>
|
|
345
|
+
endobj
|
|
346
|
+
|
|
347
|
+
2 0 obj % Object Number 2, Generation Number 0
|
|
348
|
+
<<
|
|
349
|
+
/Type /Catalog % Describe type of object
|
|
350
|
+
/Pages 1 0 R % Referring another object via unique object identifier
|
|
351
|
+
>>
|
|
352
|
+
endobj
|
|
353
|
+
|
|
354
|
+
The **Body** section of a .pdf file is thus a tree of objects that
|
|
355
|
+
are linked together, ultimately coming down to the Root Object
|
|
356
|
+
(Defined by the **Root** entry in the **Trailer** section, as a
|
|
357
|
+
catalog dictionary).
|
|
358
|
+
|
|
359
|
+
The **Cross-Reference Table** is a table that contains a list of byte
|
|
360
|
+
offsets pointing to the indirect objects.
|
|
361
|
+
|
|
362
|
+
A pdf-conforming reader uses the Cross-Reference Table as a lookup
|
|
363
|
+
table to access certain objects quickly when needed.
|
|
364
|
+
|
|
365
|
+
The format for entries in the Cross-Reference Table can be summarized as
|
|
366
|
+
follows:
|
|
367
|
+
|
|
368
|
+
- In the following format nnnnnnnnnn ggggg n eol, a total of 20 bytes
|
|
369
|
+
- nnnnnnnnnn is a 10-digit byte offset in the decoded stream
|
|
370
|
+
- ggggg 5-digit generation number
|
|
371
|
+
- n keyword for in-use entry or f keyword for free entry
|
|
372
|
+
- eol 2 character end-of-line sequence (Like CR LF)
|
|
373
|
+
|
|
374
|
+
The **Cross-Reference Table** always begins with the special entry
|
|
375
|
+
**0000000000 65535** - see the following example:
|
|
376
|
+
|
|
377
|
+
0000000000 65535 f % special entry, f denoting it is a free entry
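
The 20-byte entry format can be parsed as shown in the sketch below. The names `XrefEntry` and `parse_xref_entry` are hypothetical and only serve as an illustration of the layout described above.

```ruby
# Hypothetical value object for one cross-reference table entry.
# For free entries (f) the PDF specification uses the first field for the
# number of the next free object rather than for a byte offset.
XrefEntry = Struct.new(:byte_offset, :generation_number, :in_use)

# Hypothetical helper: parse a single 20-byte cross-reference entry.
def parse_xref_entry(entry)
  raise ArgumentError, 'an entry must be exactly 20 bytes long' unless entry.bytesize == 20
  match = entry.match(/\A(\d{10}) (\d{5}) ([nf])(?: \r| \n|\r\n)\z/)
  raise ArgumentError, 'malformed cross-reference entry' unless match
  XrefEntry.new(match[1].to_i, match[2].to_i, match[3] == 'n')
end

special = parse_xref_entry("0000000000 65535 f \n")
puts special.generation_number # prints 65535
puts special.in_use            # prints false
```

The fixed entry length is what allows a reader to jump straight to the entry of a given object number without scanning the whole table.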
|
|
378
|
+
|
|
379
|
+
## Graphical User Interfaces (GUIs)
|
|
380
|
+
|
|
381
|
+
The pdf_paradise gem comes with a few, small-ish widgets, primarily
|
|
382
|
+
written in ruby-gtk. As of August 2021 I am also experimenting
|
|
383
|
+
with libui but this is a slow process - stay tuned for more updates
|
|
384
|
+
in the coming months in this regard.
|
|
385
|
+
|
|
386
|
+
One big advantage of libui is that it works on windows out-of-the-box,
|
|
387
|
+
so we can use GUIs on windows as well. \o/
|
|
388
|
+
|
|
389
|
+
## Storing all open .pdf files in a yaml file
|
|
390
|
+
|
|
391
|
+
In **February 2022** the yaml file working_on_these_pdf_files.yml
|
|
392
|
+
was added at:
|
|
393
|
+
|
|
394
|
+
pdf_paradise/yaml/working_on_these_pdf_files.yml
|
|
395
|
+
|
|
396
|
+
The idea here is that this yaml-file retains the local path
|
|
397
|
+
to any .pdf file that the user (in this case me) is working
|
|
398
|
+
on, aka reading right now.
|
|
399
|
+
|
|
400
|
+
I needed this because I tend to work through .pdf files and
|
|
401
|
+
remove page after page when I read it. The idea is that
|
|
402
|
+
I do not lose that information when I reboot my computer
|
|
403
|
+
or when said computer crashes; I needed to make this
|
|
404
|
+
persistent information.
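
Persisting such a list of paths takes only a few lines with ruby's yaml library. This is a sketch under assumptions: the helper names are hypothetical, the file is assumed to hold a plain yaml list, and a temporary directory stands in for the real location.

```ruby
require 'yaml'
require 'tmpdir'

# Hypothetical helper, not the gem's own API: write a list of .pdf paths.
def store_pdf_paths(yaml_file, pdf_paths)
  File.write(yaml_file, pdf_paths.to_yaml)
end

# Hypothetical helper: read the list back; an absent or empty file yields [].
def load_pdf_paths(yaml_file)
  return [] unless File.exist?(yaml_file)
  YAML.load_file(yaml_file) || []
end

Dir.mktmpdir { |directory|
  yaml_file = File.join(directory, 'working_on_these_pdf_files.yml')
  store_pdf_paths(yaml_file, ['/books/one.pdf', '/books/two.pdf'])
  puts load_pdf_paths(yaml_file)
}
```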
|
|
405
|
+
|
|
406
|
+
Why is this yaml file part of the pdf_paradise gem, though?
|
|
407
|
+
This is mostly due to convenience. I wanted to have this
|
|
408
|
+
available in one of my ruby gems by default. In the long
|
|
409
|
+
run I will add code that allows other users to adjust
|
|
410
|
+
this to their own use case (and perhaps in their home
|
|
411
|
+
directory rather than store this in the gem itself). As
|
|
412
|
+
of February 2022 code for the latter is currently not
|
|
413
|
+
part of the gem, but I may add code for this - either
|
|
414
|
+
in the **pdf_paradise** gem or the **roebe** gem.
|
|
415
|
+
|
|
416
|
+
## Splitting a single .pdf file into several individual .pdf files
|
|
417
|
+
|
|
418
|
+
You can use the following toplevel API to split up a single
|
|
419
|
+
.pdf file into several .pdf files:
|
|
420
|
+
|
|
421
|
+
PdfParadise.burst(ARGV)
|
|
422
|
+
PdfParadise.burst('foobar.pdf')
|
|
423
|
+
|
|
424
|
+
A commandline variant exists as well, at <b>bin/burst_this_pdf_file</b>,
|
|
425
|
+
tapping into the code stored in the file
|
|
426
|
+
<b>pdf_paradise/utility_scripts/split_pdf.rb</b>.
|
|
427
|
+
|
|
428
|
+
Usage example for the commandline variant:
|
|
429
|
+
|
|
430
|
+
burst_this_pdf_file foobar.pdf
|
|
431
|
+
|
|
432
|
+
(Make sure this bin file can be found in <b>$PATH</b>.)
|
|
433
|
+
|
|
434
|
+
Be careful when using this script: it will dump the generated
|
|
435
|
+
individual .pdf files into the current working directory, so
|
|
436
|
+
you may want to create a subdirectory before invoking this
|
|
437
|
+
executable, and move your target .pdf into that subdirectory. While
|
|
438
|
+
functionality could be added to automatically create a
|
|
439
|
+
subdirectory and relocate the generated .pdf files into
|
|
440
|
+
that subdirectory, for now we'll keep it simple here and
|
|
441
|
+
just extract the individual .pdf pages into the current
|
|
442
|
+
working directory.
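
The manual preparation step can be scripted. The sketch below uses a hypothetical helper called `in_subdirectory_for`, which is not part of the gem; it copies the .pdf file rather than moving it, so the original stays where it is.

```ruby
require 'fileutils'

# Hypothetical helper: copy a .pdf file into a fresh subdirectory next to
# it and run the given block with that subdirectory as working directory.
def in_subdirectory_for(pdf_file, subdirectory = File.basename(pdf_file, '.pdf') + '_pages')
  target_dir = File.join(File.dirname(File.expand_path(pdf_file)), subdirectory)
  FileUtils.mkdir_p(target_dir)
  FileUtils.cp(pdf_file, target_dir)
  Dir.chdir(target_dir) { yield File.basename(pdf_file) }
  target_dir
end

# Only runs when a file called foobar.pdf exists in the current directory.
if File.exist?('foobar.pdf')
  in_subdirectory_for('foobar.pdf') { |name| system('burst_this_pdf_file', name) }
end
```

The individual pages then end up in **foobar_pages/** instead of cluttering the directory you started from.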
|
|
443
|
+
|
|
444
|
+
Note that <b>hexapdf</b> can also be used for this functionality.
|
|
445
|
+
In February 2023 it became the default; the old variant
|
|
446
|
+
via imagemagick's convert is retained in the file
|
|
447
|
+
<b>pdf_paradise/utility_scripts/split_pdf.rb</b> though.
|
|
448
|
+
|
|
449
|
+
## Merging pdf files
|
|
450
|
+
|
|
451
|
+
<b>class PdfParadise::MergePdf.new(ARGV)</b> can be used for
|
|
452
|
+
<b>merging .pdf files</b>. This functionality depends on
|
|
453
|
+
external software, so you have to install this first.
|
|
454
|
+
|
|
455
|
+
Currently <b>ghostscript</b> and <b>hexapdf</b> can be used for
|
|
456
|
+
the <b>merging</b> step.
|
|
457
|
+
|
|
458
|
+
Examples for how to use either of these two variants, as
|
|
459
|
+
far as <b>class PdfParadise::MergePdf</b> is concerned,
|
|
460
|
+
follow next:
|
|
461
|
+
|
|
462
|
+
mergepdf one.pdf two.pdf --use-ghostscript
|
|
463
|
+
mergepdf one.pdf two.pdf --use-hexapdf
|
|
464
|
+
mergepdf *.avif --use-hexapdf
|
|
465
|
+
mergepdf SCAN1.avif SCAN2.avif SCAN3.avif SCAN4.avif SCAN5.avif --use-hexapdf
|
|
466
|
+
mergepdf SCAN1.avif SCAN2.avif SCAN3.avif SCAN4.avif SCAN5.avif --use-ghostscript
|
|
467
|
+
mergepdf output-page1.pdf output-page2.pdf output-page3.pdf output-page4.pdf output-page5.pdf --use-ghostscript
|
|
468
|
+
mergepdf SCAN1_GUTACHTEN.pdf SCAN2_GUTACHTEN.pdf SCAN3_GUTACHTEN.pdf SCAN4_GUTACHTEN.pdf SCAN5_GUTACHTEN.pdf --use-ghostscript
|
|
469
|
+
mergepdf SCAN1_GUTACHTEN.pdf SCAN2_GUTACHTEN.pdf SCAN3_GUTACHTEN.pdf SCAN4_GUTACHTEN.pdf SCAN5_GUTACHTEN.pdf --hexapdf
|
|
470
|
+
|
|
471
|
+
(The two -- hyphens are mandatory for commandline arguments right now; otherwise
|
|
472
|
+
it is assumed to be a locally existing .pdf file.)
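
The rule that only arguments with two leading hyphens count as options can be sketched like this. The helper name `split_options_and_files` is hypothetical and this is not the argument handling used by class MergePdf itself.

```ruby
# Hypothetical helper: split commandline arguments into options
# (leading "--") and everything else, which is treated as a file.
def split_options_and_files(arguments)
  options, files = arguments.partition { |argument| argument.start_with?('--') }
  { options: options, files: files }
end

result = split_options_and_files(%w[one.pdf two.pdf --use-hexapdf])
puts result[:options].inspect # prints ["--use-hexapdf"]
puts result[:files].inspect   # prints ["one.pdf", "two.pdf"]
```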
|
|
473
|
+
|
|
474
|
+
If you need to do this from within ruby code, consider
|
|
475
|
+
using the following code:
|
|
476
|
+
|
|
477
|
+
require 'pdf_paradise'
|
|
478
|
+
merge_pdf = PdfParadise::MergePdf.new('one.pdf two.pdf')
|
|
479
|
+
merge_pdf.feedback_where_it_is_stored # Call it manually.
|
|
480
|
+
|
|
484
|
+
|
|
485
|
+
## Combining individual pages from .pdf files into a new .pdf file via class PdfParadise::CombineThesePdfPages
|
|
486
|
+
|
|
487
|
+
class **PdfParadise::CombineThesePdfPages** can be used to
|
|
488
|
+
extract individual pdf pages from a given .pdf file and
|
|
489
|
+
combine these into a new .pdf file.
|
|
490
|
+
|
|
491
|
+
There is also an executable at **bin/combine_these_pdf_pages**
|
|
492
|
+
which can be used on the commandline.
|
|
493
|
+
|
|
494
|
+
This functionality depends on the **hexapdf** gem.
|
|
495
|
+
|
|
496
|
+
Usage example:
|
|
497
|
+
|
|
498
|
+
combine_these_pdf_pages foobar.pdf 3,4,5
|
|
499
|
+
|
|
500
|
+
This would retain the pages at 3, 4 and 5 and create a new
|
|
501
|
+
.pdf file.
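
To give an idea of what such a page selection involves, here is a minimal sketch that uses the hexapdf gem directly. It is not the implementation of class PdfParadise::CombineThesePdfPages; the method names `parse_page_list` and `combine_pages` as well as the output file name are assumptions made for this example.

```ruby
# Hypothetical helper: turns a string such as '3,4,5' into [3, 4, 5].
def parse_page_list(string)
  string.split(',').map { |entry| Integer(entry.strip) }
end

# Copies the given (1-based) pages of +input+ into a new file +output+.
# Needs the hexapdf gem; it is only loaded when this method is called.
def combine_pages(input, pages, output)
  require 'hexapdf'
  source = HexaPDF::Document.open(input)
  target = HexaPDF::Document.new
  pages.each do |number|
    page = source.pages[number - 1] # hexapdf counts pages from 0
    raise ArgumentError, "#{input} has no page #{number}" if page.nil?
    target.pages << target.import(page)
  end
  target.write(output)
end

p parse_page_list('3,4,5')
# combine_pages('foobar.pdf', parse_page_list('3,4,5'), 'combined.pdf')
```

The last line is commented out because it needs hexapdf to be installed and a file called foobar.pdf to exist.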

## Extracting all images from a .pdf file

If you make use of <b>poppler</b> then you can extract
all images from a given .pdf file.
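
The poppler tool that does the actual work is <b>pdfimages</b>. The following sketch only builds the commandline for it; the method name `pdfimages_command` and the choice of the output prefix are assumptions for this example, and pdf_paradise may invoke the tool differently.

```ruby
# Builds the argument list for poppler's pdfimages tool. The output
# prefix defaults to the name of the .pdf file without its extension.
# The -all switch asks pdfimages to keep the native image formats.
def pdfimages_command(pdf_file, prefix = File.basename(pdf_file, '.pdf'))
  ['pdfimages', '-all', pdf_file, prefix]
end

command = pdfimages_command('foobar.pdf')
puts command.join(' ')
# system(*command) # uncomment to actually run pdfimages
```

Passing the arguments as an array to system() avoids problems with spaces in file names.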

A small libui-GUI was added for this functionality - this
is mostly for quick demo purposes. It does not work extremely
well.

On IceWM it looks like this right now:

<img src="https://i.imgur.com/QXelVyy.png" style="margin:1em">

Not pretty, but it took only about 20 minutes to write this.

<b>pdfimages</b> from poppler must be installed. On Windows
you can probably download an executable for poppler here:

https://blog.alivate.com.au/poppler-windows/

I tested whether the above executables work on windows, and
indeed, they still work fine. I also tested the libui
variant on windows, and it works. The code is a bit
brittle, so use with care, but I was able to use it
successfully in <b>August 2022</b> to extract all images
from a given .pdf file. At a later time I may add a
to-image converter via libui, probably in the other
gem called image_paradise. Stay tuned in this regard.

To start the libui wrapper from the commandline, you can
use the following:

    /usr/bin/pdf_paradise --libui
    bin/pdf_paradise --libui
    pdf_paradise --libui # This variant should work, or try the other
                         # variants; it is stored in bin/pdf_paradise
                         # of this gem

## Numbering the pages in a given .pdf file automatically

If you use the external gem called <b>combine_pdf</b> then you can
make use of automatic numbering via the pdf_paradise gem.

The API for this is:

    PdfParadise.number_this_pdf_file('foobar.pdf')

It is not a very flexible API as of right now. Perhaps at a later
point in time it may be extended.
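
If you need more control, you can also talk to the combine_pdf gem yourself. The following is a minimal sketch under the assumption that combine_pdf is installed; the method names `numbered_file_name` and `number_pages_of` and the naming scheme for the output file are made up for this example and are not part of pdf_paradise.

```ruby
# Hypothetical helper: 'foobar.pdf' becomes 'foobar_numbered.pdf'.
def numbered_file_name(pdf_file)
  pdf_file.sub(/\.pdf\z/i, '') + '_numbered.pdf'
end

# Stamps page numbers onto every page via the combine_pdf gem, which
# is only loaded when this method is called. Returns the output path.
def number_pages_of(pdf_file, output = numbered_file_name(pdf_file))
  require 'combine_pdf'
  pdf = CombinePDF.load(pdf_file)
  pdf.number_pages
  pdf.save(output)
  output
end

puts numbered_file_name('foobar.pdf')
# number_pages_of('foobar.pdf') # needs combine_pdf and an existing file
```

Writing to a second file keeps the original .pdf file untouched, which is handy while experimenting.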

## class PdfParadise::ToPdf

class <b>PdfParadise::ToPdf</b> can be used for two main
activities right now:

(1) You can convert .docx to .pdf files on the commandline, if you
have libreoffice installed.

(2) If you pass in a directory, then all image files of that
directory will be gathered, converted into a .pdf file, and
then the .pdf file will be assembled.
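
For activity (1), a headless libreoffice call is the usual way to do such a conversion. The sketch below only builds that commandline; it assumes that the libreoffice binary is reachable as `soffice`, and the method name `docx_to_pdf_command` is made up for this example. Whether class PdfParadise::ToPdf uses exactly these switches is not confirmed here.

```ruby
# Builds the argument list for a headless libreoffice conversion of a
# .docx file into a .pdf file that is stored in +output_directory+.
def docx_to_pdf_command(docx_file, output_directory = '.')
  ['soffice', '--headless', '--convert-to', 'pdf',
   '--outdir', output_directory, docx_file]
end

command = docx_to_pdf_command('letter.docx')
puts command.join(' ')
# system(*command) # uncomment to run libreoffice
```

On some systems the binary is called `libreoffice` instead; adjust the first entry of the array in that case.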

## The sinatra interface of the pdf_paradise gem

As of April 2019 there is a minimal sinatra interface to the
PdfParadise project. Consider this incomplete <b>work-in-progress</b>.

To start it, try:

    pdf_paradise --sinatra

As of <b>July 2023</b> this makes use of class <b>Cyberweb::HtmlTemplate</b>.
This is the generic class I use for generating HTML files (or rather, the
String that describes the .html file in question).

## Flipping / Rotating a .pdf file

This subsection will try to explain how a .pdf file can be flipped / rotated,
and how this may relate to the <b>pdf_paradise</b> gem here.

There are many ways to do so. Let's start with an example via <b>qpdf</b>.

To rotate clockwise by 90°, use:

    qpdf --rotate=+90 foo.pdf bar.pdf

This will generate a .pdf file whose pages are rotated by 90°, and call it <b>bar.pdf</b>.
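
The qpdf call above can be wrapped in ruby. This is a sketch of the idea only, not the code of the pdf_paradise gem; the method name `qpdf_rotate_command` is an assumption for this example.

```ruby
# Builds the argument list for rotating all pages of a .pdf file via
# qpdf. Positive angles rotate clockwise, negative angles rotate
# counter-clockwise. qpdf only accepts multiples of 90 degrees.
def qpdf_rotate_command(input, output, angle = 90)
  unless [0, 90, 180, 270].include?(angle.abs)
    raise ArgumentError, "angle must be 0, 90, 180 or 270, got #{angle}"
  end
  sign = angle.negative? ? '-' : '+'
  ['qpdf', "--rotate=#{sign}#{angle.abs}", input, output]
end

puts qpdf_rotate_command('foo.pdf', 'bar.pdf').join(' ')
puts qpdf_rotate_command('foo.pdf', 'bar.pdf', -90).join(' ')
```

Rejecting unsupported angles early gives a clearer error message than letting qpdf fail later on.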

The pdf_paradise gem has a class that is tasked with rotating .pdf files.

See:

    require 'pdf_paradise/utility_scripts/rotate_pdf_file.rb'
    PdfParadise::RotatePdfFile.new(ARGV)

To set the rotation you can invoke the method called <b>.set_rotate()</b>.

There is also a bin/ commandline executable for this, called
<b>rotate_pdf</b>.

There is also a little GUI wrapper around that functionality available, as
part of the pdf_paradise project.

See:

    PdfParadise::GUI::LibUI::RotatePdfFile.new

## Deleting the last or the first page of a .pdf file

You can use **class DeleteLastPageOfThisPdfFile**, more
accurately called **class PdfParadise::DeleteLastPageOfThisPdfFile**,
to ***delete the last page in a .pdf file***.

In ruby code, you can invoke this like so:

    require 'pdf_paradise'

    PdfParadise::DeleteLastPageOfThisPdfFile.new('path_to_the_pdf_file/goes_in_here.pdf')

or shorter:

    require 'pdf_paradise'

    PdfParadise.delete_last_page_of_this_pdf_file('foobar.pdf')

A very similar API exists for deleting the first page of a given .pdf
file, too.

In ruby code, you can invoke this like so:

    require 'pdf_paradise'

    PdfParadise::DeleteTheFirstPageOfThisPdfFile.new('path_to_the_pdf_file/goes_in_here.pdf')

or shorter:

    require 'pdf_paradise'

    PdfParadise.delete_the_first_page_of_this_pdf_file('foobar.pdf')
    PdfParadise.delete_first_page_of_this_pdf_file('foobar.pdf') # Both variants work.
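
If you are curious how such a deletion can be expressed with an external tool, the page ranges of qpdf are one option. The sketch below is not what the two pdf_paradise classes do internally (that is not documented here); the method name `qpdf_delete_page_command` is made up for this example.

```ruby
# Builds a qpdf commandline that drops either the first or the last
# page. In qpdf page ranges, 'z' means the last page and 'r2' means
# the second page counted from the end.
def qpdf_delete_page_command(input, output, which)
  range = { first: '2-z', last: '1-r2' }.fetch(which) do
    raise ArgumentError, "expected :first or :last, got #{which.inspect}"
  end
  ['qpdf', input, '--pages', input, range, '--', output]
end

puts qpdf_delete_page_command('foobar.pdf', 'without_first.pdf', :first).join(' ')
puts qpdf_delete_page_command('foobar.pdf', 'without_last.pdf', :last).join(' ')
```

Note that these ranges assume the .pdf file has at least two pages; a one-page file has nothing left to keep.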

Note that a small libui-wrapper exists for this functionality,
under the gui/ subdirectory of this gem. It may look like
this:

<img src="https://i.imgur.com/tjN0WwS.png" style="margin: 1em">

An older ruby-gtk3 variant also exists:

<img src="https://i.imgur.com/PdMwxeP.png" style="margin: 1em">

However, in October 2023 I found this layout confusing, and
since I was also on a journey to write as many jruby-swing
GUIs as possible, I rewrote the old ruby-gtk3 code, to then
be used as a basis for the jruby code at a later time.

The rewrite did not change much, but the new layout makes
more logical sense, I think - at the least compared to the
prior variant:

<img src="https://i.imgur.com/8obFWhb.png" style="margin: 1em">

In October 2023 the old class
DeleteFirstPageOfThisPdfFile was rewritten and
renamed to DeleteTheFirstPageOfThisPdfFile. The code
was improved, in particular when working on windows - that was
one use case I had, that it had to work on the windows
platform as well.

## Commandline usage

You can use the **pdf_paradise** gem from the commandline,
as other examples on this homepage show.

For instance, say that you wish to modify **the title of a .pdf file**;
you can use a commandline invocation such as this:

    pdf_paradise --use-this-pdf-file=location_to_your_pdf_file.pdf --set_title="The title you want to use goes in here."

You can also **shrink** a .pdf file, by using the commandline
switch <b>--shrink-pdf-size-of=foobar.pdf</b> or just
<b>--shrink</b>, such as:

    pdf_paradise --shrink-pdf-size-of=foobar.pdf
    pdf_paradise --shrink=foobar.pdf

The <b>shrink</b> functionality is contained in the module-method
<b>PdfParadise.reduce_size_of_this_pdf_file()</b>.

## Converting .jpg files to .pdf files

If you have a use case to convert several .jpg files into .pdf files
then the following commandline example should be helpful:

    convert /path/to/image foobar.pdf
    convert *.jpg foobar.pdf

Note that this requires <b>ImageMagick</b>. <b>ImageMagick</b> is not
always perfect; it has a few problems, unfortunately.

For instance, in <b>April 2022</b> when I tried the above,
the image was repeated three times on the x-axis. I do not
know why, but that makes **absolutely no sense**. It is just
a single image, so why is it repeated three times in the
resulting .pdf file? Perhaps imagemagick's **convert** tool does
this automatically, but then I question the default behaviour -
**it makes no sense** for the use case I have. One image
should be one image, not three images or fifty images.

In the event that **ImageMagick** does not work very well
for your use case, consider using another software suite,
such as **img2pdf**.

The syntax for **img2pdf** goes something like this:

    img2pdf *.jpg -o document.pdf
    img2pdf SCAN1.jpg SCAN2.jpg SCAN3.jpg SCAN4.jpg SCAN5.jpg -o document.pdf

I liked this, so in **April 2022** this was added to
**ImageParadise**. The API for this is as follows:

    ImageParadise.img2pdf('*.jpg') # If a '*' is part of the input Dir[] will be used.

As that functionality may be useful on the commandline as well, an
executable has been added at **bin/imageparadise_img2pdf**. Simply
pass the image files that you want to convert.

Usage example:

    imageparadise_img2pdf *jpg

If you need the images to appear in a particular order then you may have to
pass each image file explicitly, e. g. via the path to it, in the order
you want.

So for instance:

    imageparadise_img2pdf image3.jpg image1.jpg image2.png
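
If the order you want is simply the numeric order of the file names, ruby can compute it before the files are handed to img2pdf. The following sketch shows one way to do this; the method names `natural_sort` and `img2pdf_command` are assumptions for this example and are not part of ImageParadise.

```ruby
# Sorts file names so that embedded numbers are compared numerically,
# which puts SCAN2.jpg before SCAN10.jpg. Each name is split into
# digit and non-digit parts, and each part becomes a comparable triple.
def natural_sort(file_names)
  file_names.sort_by do |name|
    name.scan(/\d+|\D+/).map do |part|
      part.match?(/\A\d+\z/) ? [0, part.to_i, ''] : [1, 0, part.downcase]
    end
  end
end

# Builds the img2pdf argument list with the files in natural order.
def img2pdf_command(image_files, output = 'document.pdf')
  ['img2pdf', *natural_sort(image_files), '-o', output]
end

puts img2pdf_command(%w[SCAN10.jpg SCAN2.jpg SCAN1.jpg]).join(' ')
```

A plain alphabetical sort would place SCAN10.jpg before SCAN2.jpg, which is rarely what you want for scanned pages.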

The only drawback I have found with <b>img2pdf</b> so far is
that you can not easily add text to an image. This makes it
hard to identify which image is named how. A workaround for
this is to embed the filename into the image itself, e. g.
create temporary images, and then pack them together via
<b>img2pdf</b>.

Unfortunately in September 2023 I realised that img2pdf sometimes
creates .pdf files that are flawed. So img2pdf may not always be
an optimal choice.

## Compressing a .pdf file (optimize the size of a .pdf file)

Sometimes you may want to reduce the filesize of a given .pdf file at hand,
such as when you need to <b>upload</b> a .pdf file, and there is some file
size limit in place, thus making it obligatory to reduce the .pdf file below
a certain threshold. This actually happened to me a few times when using
webmail-based email services, where an automatic notice was generated and
issued to me when the .pdf file was too large, such as <b>above 25MB in
size</b> or something similar.

So, let us now assume that you <b>do</b> have a use case such as described
above, or any other use case - you want to reduce the file size of a given
.pdf file at hand.

How can this be done?

Well, there are several ways of course.

One is to use online-based tools, which tend to work surprisingly well; I
verified this in February 2022. One example for this is this
website:

https://www.ilovepdf.com/compress_pdf

But, as far as the gem here is concerned, we will focus primarily on means
that can be <b>used by you, on your own</b>, without having to depend on
external websites.

Two methods will be described here - the first one requiring <b>ghostscript</b>,
the second one requiring <b>hexapdf</b>.

The important parameter in regard to <b>ghostscript</b> is the
<b>dPDFSETTINGS</b> parameter. This one will determine the compression
level, which ultimately will affect the quality of the compressed .pdf
file.

Available values for <b>dPDFSETTINGS</b> include <b>/screen</b>,
<b>/ebook</b>, <b>/printer</b>, <b>/prepress</b> and <b>/default</b>.

The options are as follows:

| -dPDFSETTINGS Option      | Explanation                                                                     |
| ------------------------- | ------------------------------------------------------------------------------- |
| -dPDFSETTINGS=/screen     | Has a lower quality and smaller size. (72 dpi)                                  |
| -dPDFSETTINGS=/ebook      | Has a better quality, but has a slightly larger size (150 dpi)                  |
| -dPDFSETTINGS=/prepress   | Output is of a higher size and quality (300 dpi)                                |
| -dPDFSETTINGS=/printer    | Output is of a printer type quality (300 dpi)                                   |
| -dPDFSETTINGS=/default    | Selects the output which is useful for multiple purposes. Can cause large PDFs. |

In particular /screen is optimal here if you want to reduce the file size. You can
achieve, for instance, a compression from a .pdf file the size of 73 MB down to
14 MB - which is quite neat.
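
To make the role of <b>-dPDFSETTINGS</b> more concrete, here is a sketch that builds a typical ghostscript commandline for this task. It is not taken from the pdf_paradise source; the method names `ghostscript_compress_command` and `saved_percent` are made up for this example, and the gem may pass additional or different switches.

```ruby
# Builds the argument list for compressing a .pdf file via ghostscript.
# +setting+ is one of the -dPDFSETTINGS values from the table above,
# given without the leading slash.
def ghostscript_compress_command(input, output, setting = 'screen')
  allowed = %w[screen ebook printer prepress default]
  unless allowed.include?(setting)
    raise ArgumentError, "unknown -dPDFSETTINGS value: #{setting}"
  end
  ['gs', '-sDEVICE=pdfwrite', "-dPDFSETTINGS=/#{setting}",
   '-dNOPAUSE', '-dBATCH', '-dQUIET',
   "-sOutputFile=#{output}", input]
end

# Returns how much smaller the new file is, in percent.
def saved_percent(size_before, size_after)
  ((size_before - size_after) * 100.0 / size_before).round(1)
end

puts ghostscript_compress_command('foobar.pdf', 'smaller.pdf').join(' ')
puts "Saved: #{saved_percent(73, 14)} %"
```

The second helper simply puts a number on the example from the text: going from 73 MB to 14 MB removes roughly four fifths of the file size.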

class <b>PdfParadise::CompressThisPdfFile</b> can be of help here. Simply
pass the path of the local .pdf file as argument to .new().

This class resides at:

    pdf_paradise/compress/compress_this_pdf_file.rb

Note that class <b>PdfParadise::CompressThisPdfFile</b> currently only uses
ghostscript, so we have to use the above commandline options, such
as <b>-dPDFSETTINGS</b>.

You can also use a <b>toplevel method</b> if you'd like to:

    require 'pdf_paradise'
    PdfParadise.compress_this_pdf_file
    PdfParadise.compress_this_pdf_file('/foobar.pdf') # ← Pass the path to the .pdf file into this method.

The variant using <b>hexapdf</b> is called:

    PdfParadise.compress_via_pdf
    PdfParadise.compress_via_pdf('foobar.pdf')

The <b>API</b> names may change at a later point in time; perhaps we will just
add a toplevel API called <b>PdfParadise.compress()</b>, but for the time being
the above APIs will be retained as they are.

In February 2024 I noticed that qpdf can also be used to compress .pdf
files.

Commandline variants in this regard may look like this:

    qpdf --compress-streams=y --object-streams=generate --recompress-flate --optimize-images input_file_here.pdf output_file_there.pdf

To use the above in pdf_paradise you can use:

    PdfParadise.compress_via_qpdf
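
Since all of these variants depend on external programs, it can help to check whether a program is installed before calling it. The following sketch combines such a check with the qpdf commandline shown above; the method names `executable_available?` and `qpdf_compress_command` are assumptions for this example and not part of the pdf_paradise API.

```ruby
# Returns true if an executable with the given name is found in one
# of the directories listed in the PATH environment variable.
def executable_available?(name)
  ENV.fetch('PATH', '').split(File::PATH_SEPARATOR).any? do |directory|
    candidate = File.join(directory, name)
    File.file?(candidate) && File.executable?(candidate)
  end
end

# Builds the qpdf argument list, using the switches shown above.
def qpdf_compress_command(input, output)
  ['qpdf', '--compress-streams=y', '--object-streams=generate',
   '--recompress-flate', '--optimize-images', input, output]
end

command = qpdf_compress_command('input_file_here.pdf', 'output_file_there.pdf')
if executable_available?('qpdf')
  puts "Would run: #{command.join(' ')}"
else
  puts 'qpdf was not found in PATH; please install it first.'
end
```

On windows the executable carries an .exe suffix, so the check would have to look for `qpdf.exe` there.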

## Licence

In <b>January 2024</b>, the licence of this project was changed from GPL-2.0
to <b>"MIT No Attribution"</b>. You can read up on this MIT licence
here:

https://spdx.org/licenses/MIT-0.html

The two most important parts are the "no warranty", as well as "use this software
how you want to", so it is a fairly liberal licence, with almost no restrictions.

I will also copy/paste the full licence here, for the convenience of the reader:

Copyright 2024 Robert A. Heiler

Permission is hereby granted, free of charge, to any person obtaining a copy of
this software and associated documentation files (the "Software"), to deal in
the Software without restriction, including without limitation the rights to
use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
of the Software, and to permit persons to whom the Software is furnished to
do so.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
IN THE SOFTWARE.


## Contact information and mandatory 2FA (no longer) coming up in 2022 / 2023

If your creative mind has ideas and specific suggestions to make this gem
more useful in general, feel free to drop me an email at any time, via:

shevy@inbox.lt

Before that email I used an email account at Google gmail, but in **2021** I
decided to slowly abandon gmail, for various reasons. In order to limit the
explanation here, allow me to just briefly state that I do not feel as if I
want to promote any Google service anymore when the user becomes the end
product (such as via data collection by upstream services, including other
proxy-services). My feeling is that this is a hugely flawed business model
to begin with, and I no longer wish to support this in any way, even if
only indirectly so, such as by using services of companies that try to
promote this flawed model.

In regards to responding to emails: please keep in mind that responding
may take some time, depending on the amount of work I may have at that
moment. So it is not that emails are ignored; it is more that I have not
(yet) found the time to read and reply. This means there may be a delay
of days, weeks and in some instances also months. There is, unfortunately,
not much I can do when I need to prioritise my time investment, but I try
to consider <b>all</b> feedback as an opportunity to improve my projects
nonetheless.

In <b>2022</b> rubygems.org decided to make 2FA mandatory for every
gem owner eventually:

see
https://blog.rubygems.org/2022/06/13/making-packages-more-secure.html

Mandatory 2FA will eventually be extended to all rubygems.org developers and
maintainers. As I can not use 2FA, for reasons I will skip explaining here,
this means that my projects will eventually be removed, as I no longer
have any control over my projects hosted on rubygems.org (because I
can not use 2FA).

At that point, I no longer have any control over what is done to my projects
since whoever is controlling the gems ecosystem took away our control
here. I am not sure at which point ruby became corporate-controlled -
that was not the case several years ago, so something has
changed.

Ruby also only allowed 2FA users to participate on the issue tracker for
a while:

https://bugs.ruby-lang.org/issues/18800

But this has been reverted some months ago, so it is no longer applicable.
Suffice to say that I do not think that we should only be allowed to
interact on the world wide web when some 'authority' authenticated us,
such as via mandatory 2FA, so I hope this won't come back again.

Fighting spam is a noble goal, but when it also means you lock out
real human people then this is definitely NOT a good situation
to be had.
|