pdf_paradise 0.3.20
- checksums.yaml +7 -0
- data/README.md +933 -0
- data/bin/automatic_pdf_title +7 -0
- data/bin/burst_this_pdf_file +7 -0
- data/bin/combine_these_pdf_pages +7 -0
- data/bin/compress_via_hexapdf +7 -0
- data/bin/convert_markdown_to_pdf +7 -0
- data/bin/convert_pdf_to_text +7 -0
- data/bin/delete_first_page_of_this_pdf_file +7 -0
- data/bin/djvu_to_pdf +7 -0
- data/bin/merge_then_open +7 -0
- data/bin/n_pages +10 -0
- data/bin/open_main_pdf +7 -0
- data/bin/pdf_paradise +9 -0
- data/bin/rotate_pdf +7 -0
- data/bin/set_main_book +7 -0
- data/bin/set_title_of_this_pdf_file +15 -0
- data/doc/README.gen +871 -0
- data/doc/todo/todo.md +13 -0
- data/images/Logo_for_the_pdf_paradise_project.avif +0 -0
- data/lib/pdf_paradise/base/base.rb +344 -0
- data/lib/pdf_paradise/base/colours.rb +67 -0
- data/lib/pdf_paradise/colours/colours.rb +27 -0
- data/lib/pdf_paradise/commandline/commandline.rb +109 -0
- data/lib/pdf_paradise/commandline/help.rb +77 -0
- data/lib/pdf_paradise/commandline/menu.rb +173 -0
- data/lib/pdf_paradise/compress/compress_this_pdf_file.rb +108 -0
- data/lib/pdf_paradise/compress/compress_via_hexapdf.rb +27 -0
- data/lib/pdf_paradise/compress/compress_via_qpdf.rb +32 -0
- data/lib/pdf_paradise/constants/constants.rb +76 -0
- data/lib/pdf_paradise/convert_text_to_pdf.rb +94 -0
- data/lib/pdf_paradise/css/project.css +17 -0
- data/lib/pdf_paradise/fpdf/README.md +2 -0
- data/lib/pdf_paradise/fpdf/bookmark.rb +129 -0
- data/lib/pdf_paradise/fpdf/chinese.rb +454 -0
- data/lib/pdf_paradise/fpdf/fpdf.rb +1902 -0
- data/lib/pdf_paradise/fpdf/fpdf_eps.rb +138 -0
- data/lib/pdf_paradise/fpdf/makefont.rb +1794 -0
- data/lib/pdf_paradise/gui/README.md +6 -0
- data/lib/pdf_paradise/gui/fox/split_pdf_file.rb +77 -0
- data/lib/pdf_paradise/gui/gtk2/pdf_viewer/pdf_viewer.rb +34 -0
- data/lib/pdf_paradise/gui/gtk2/split_pdf_file/split_pdf_file.rb +34 -0
- data/lib/pdf_paradise/gui/gtk2/statistics_widget/statistics_widget.rb +34 -0
- data/lib/pdf_paradise/gui/gtk3/controller/controller.rb +214 -0
- data/lib/pdf_paradise/gui/gtk3/pdf_viewer/pdf_viewer.rb +34 -0
- data/lib/pdf_paradise/gui/gtk3/split_pdf_file/split_pdf_file.rb +34 -0
- data/lib/pdf_paradise/gui/jruby/delete_the_first_or_the_last_page_of_this_pdf_file/delete_the_first_or_the_last_page_of_this_pdf_file.rb +167 -0
- data/lib/pdf_paradise/gui/jruby/remove_the_first_page_of_this_pdf_file/remove_the_first_page_of_this_pdf_file.rb +103 -0
- data/lib/pdf_paradise/gui/libui/extract_all_images_from_this_pdf_file/extract_all_images_from_this_pdf_file.rb +223 -0
- data/lib/pdf_paradise/gui/libui/remove_the_first_page_of_this_pdf_file/remove_the_first_page_of_this_pdf_file.rb +267 -0
- data/lib/pdf_paradise/gui/libui/rotate_pdf_file/rotate_pdf_file.rb +219 -0
- data/lib/pdf_paradise/gui/libui/statistics_widget/statistics_widget.rb +233 -0
- data/lib/pdf_paradise/gui/shared_code/pdf_viewer/pdf_viewer.css +5 -0
- data/lib/pdf_paradise/gui/shared_code/pdf_viewer/pdf_viewer_module.rb +287 -0
- data/lib/pdf_paradise/gui/shared_code/remove_the_first_page_of_this_pdf_file_module/remove_the_first_page_of_this_pdf_file_module.rb +31 -0
- data/lib/pdf_paradise/gui/shared_code/split_pdf_file/split_pdf_file_module.rb +295 -0
- data/lib/pdf_paradise/gui/universal_widgets/convert_pdf_to_text/convert_pdf_to_text.rb +366 -0
- data/lib/pdf_paradise/gui/universal_widgets/delete_the_first_or_the_last_page_of_this_pdf_file/delete_the_first_or_the_last_page_of_this_pdf_file.rb +776 -0
- data/lib/pdf_paradise/gui/universal_widgets/statistics_widget/statistics_widget.rb +407 -0
- data/lib/pdf_paradise/gui/universal_widgets/to_pdf/to_pdf.rb +351 -0
- data/lib/pdf_paradise/hexapdf/001_rainbow_pattern_example.rb +0 -0
- data/lib/pdf_paradise/hexapdf/hexapdf.rb +123 -0
- data/lib/pdf_paradise/images/PDF_PARADISE_LOGO.png +0 -0
- data/lib/pdf_paradise/main_pdf/main_pdf.rb +474 -0
- data/lib/pdf_paradise/merge_pdf/menu.rb +63 -0
- data/lib/pdf_paradise/merge_pdf/merge_pdf.rb +307 -0
- data/lib/pdf_paradise/merge_pdf_namespace.rb +9 -0
- data/lib/pdf_paradise/merge_then_open/merge_then_open.rb +105 -0
- data/lib/pdf_paradise/prawn_addons/README.md +2 -0
- data/lib/pdf_paradise/prawn_addons/prawn_addons.rb +17 -0
- data/lib/pdf_paradise/project/project.rb +22 -0
- data/lib/pdf_paradise/remove_pdf_password.rb +391 -0
- data/lib/pdf_paradise/requires/batch_require_toplevel_files.rb +22 -0
- data/lib/pdf_paradise/requires/colours.rb +11 -0
- data/lib/pdf_paradise/requires/colours_and_esystem_and_save_file_and_fileutils_and_opn.rb +13 -0
- data/lib/pdf_paradise/requires/esystem_and_colours.rb +11 -0
- data/lib/pdf_paradise/requires/esystem_and_opn_and_colours.rb +10 -0
- data/lib/pdf_paradise/requires/require_the_whole_project.rb +30 -0
- data/lib/pdf_paradise/requires/require_utility_scripts.rb +9 -0
- data/lib/pdf_paradise/set_main_book.rb +156 -0
- data/lib/pdf_paradise/set_pdf_title.rb +220 -0
- data/lib/pdf_paradise/sinatra/embeddable_interface.rb +389 -0
- data/lib/pdf_paradise/toplevel_methods/convert_epub_to_pdf.rb +27 -0
- data/lib/pdf_paradise/toplevel_methods/convert_markdown_to_pdf.rb +45 -0
- data/lib/pdf_paradise/toplevel_methods/convert_ppt_to_pdf.rb +35 -0
- data/lib/pdf_paradise/toplevel_methods/e.rb +16 -0
- data/lib/pdf_paradise/toplevel_methods/esystem.rb +20 -0
- data/lib/pdf_paradise/toplevel_methods/misc.rb +228 -0
- data/lib/pdf_paradise/toplevel_methods/number_pages.rb +38 -0
- data/lib/pdf_paradise/toplevel_methods/opened_pdf_files.rb +221 -0
- data/lib/pdf_paradise/toplevel_methods/query_pdf_title.rb +201 -0
- data/lib/pdf_paradise/toplevel_methods/reduce_size_of_this_pdf_file.rb +46 -0
- data/lib/pdf_paradise/toplevel_methods/roebe.rb +17 -0
- data/lib/pdf_paradise/toplevel_methods/to_pdf.rb +12 -0
- data/lib/pdf_paradise/utility_scripts/README.md +3 -0
- data/lib/pdf_paradise/utility_scripts/automatic_pdf_title.rb +104 -0
- data/lib/pdf_paradise/utility_scripts/check_syntax_of_pdf_files.rb +106 -0
- data/lib/pdf_paradise/utility_scripts/combine_these_pdf_pages.rb +118 -0
- data/lib/pdf_paradise/utility_scripts/convert_pdf_to_text.rb +179 -0
- data/lib/pdf_paradise/utility_scripts/delete_last_page_of_this_pdf_file.rb +180 -0
- data/lib/pdf_paradise/utility_scripts/delete_the_first_page_of_this_pdf_file/delete_the_first_page_of_this_pdf_file.rb +429 -0
- data/lib/pdf_paradise/utility_scripts/delete_this_page_of_this_pdf_file.rb +356 -0
- data/lib/pdf_paradise/utility_scripts/djvu_to_pdf.rb +87 -0
- data/lib/pdf_paradise/utility_scripts/extract_all_images_from_this_pdf_file.rb +129 -0
- data/lib/pdf_paradise/utility_scripts/extract_pdf_page.rb +283 -0
- data/lib/pdf_paradise/utility_scripts/pdf_file_n_total_pages.rb +348 -0
- data/lib/pdf_paradise/utility_scripts/pdf_optimizer.rb +111 -0
- data/lib/pdf_paradise/utility_scripts/pdf_statistics.rb +148 -0
- data/lib/pdf_paradise/utility_scripts/pdf_to_html.rb +75 -0
- data/lib/pdf_paradise/utility_scripts/remove_images.rb +110 -0
- data/lib/pdf_paradise/utility_scripts/rotate_pdf_file.rb +303 -0
- data/lib/pdf_paradise/utility_scripts/split_pdf.rb +364 -0
- data/lib/pdf_paradise/utility_scripts/to_pdf.rb +130 -0
- data/lib/pdf_paradise/utility_scripts/to_qdf.rb +66 -0
- data/lib/pdf_paradise/version/version.rb +19 -0
- data/lib/pdf_paradise/www/README.md +2 -0
- data/lib/pdf_paradise/www/sinatra/app.rb +304 -0
- data/lib/pdf_paradise/yaml/working_on_these_pdf_files.yml +4 -0
- data/lib/pdf_paradise.rb +5 -0
- data/pdf_paradise.gemspec +61 -0
- data/test/fpdf/001_minimal_example.rb +12 -0
- data/test/fpdf/002.pdf +0 -0
- data/test/fpdf/002_header_and_footer_example.rb +64 -0
- data/test/fpdf/003.pdf +98 -0
- data/test/fpdf/003_justified_paragraphs.rb +96 -0
- data/test/fpdf/file1.md +3 -0
- data/test/fpdf/file2.md +3 -0
- data/test/fpdf/test.pdf +0 -0
- data/test/testing_pdf_paradise.rb +12 -0
- metadata +239 -0
data/README.md
ADDED
@@ -0,0 +1,933 @@
|
|
1
|
+
[![forthebadge](https://forthebadge.com/images/badges/built-with-love.svg)](https://www.gobolinux.org/)
|
2
|
+
[![forthebadge](https://forthebadge.com/images/badges/made-with-ruby.svg)](https://www.ruby-lang.org/en/)
|
3
|
+
[![Gem Version](https://badge.fury.io/rb/pdf_paradise.svg)](https://badge.fury.io/rb/pdf_paradise)
|
4
|
+
|
5
|
+
This gem was <b>last updated</b> on the <span style="color: darkblue; font-weight: bold">04.02.2024</span> (dd.mm.yyyy notation), at <span style="color: steelblue; font-weight: bold">04:38:42</span> o'clock.
|
6
|
+
|
7
|
+
<img src="https://i.imgur.com/unhKNEw.png" style="margin-left: 2em">
|
8
|
+
<img src="https://i.imgur.com/xMrkTJM.png" style="margin-left: 1em">
|
9
|
+
|
10
|
+
This project can help with pdf-related activities, such as extracting
|
11
|
+
a .pdf page, converting a .pdf page, merging .pdf files, splitting
|
12
|
+
.pdf files, setting the title of a .pdf page and similar actions.
|
13
|
+
|
14
|
+
The project has to remain quite flexible. We may use external programs
|
15
|
+
such as **ghostscript** or **qpdf**, or we may use pure ruby solutions,
|
16
|
+
such as via the gem called <b>combine_pdf</b>, <b>prawn</b> or
|
17
|
+
<b>hexapdf</b>.
|
18
|
+
|
19
|
+
The file here (README.gen, respectively the generated file called
|
20
|
+
**README.md**), will describe some of the components that make up
|
21
|
+
this gem.
|
22
|
+
|
23
|
+
## Rationale for making use of separate pdf-related projects
|
24
|
+
|
25
|
+
There are many different pdf-related toolkits available if you look
|
26
|
+
for them on the <b>www</b>.
|
27
|
+
|
28
|
+
For example, we have <b>prawn</b>, we have <b>qpdf</b>, we have
|
29
|
+
<b>calibre</b>, we have <b>hexapdf</b>, we have
|
30
|
+
<b>ghostscript</b>, and many more applications.
|
31
|
+
|
32
|
+
Some of these projects have unique features; and some of them have
|
33
|
+
overlapping functionality, such as reading the content of .pdf files
|
34
|
+
in a simplified manner (number of pages, title, author and so
|
35
|
+
forth).
|
36
|
+
|
37
|
+
The PdfParadise project attempts to support as many different
|
38
|
+
(open-source) projects as possible. From the point of view of
|
39
|
+
the PdfParadise project, it is also permissible to <b>support
|
40
|
+
closed source projects</b>, provided that **the code remains
|
41
|
+
simple** (and simple to change), for instance via ruby's
|
42
|
+
system / `` callbacks for commandline binaries.
|
43
|
+
|
44
|
+
The primary focus for the pdf_paradise gem is on open-source
|
45
|
+
projects, though, so closed-source is a secondary objective,
|
46
|
+
at best.
|
47
|
+
|
48
|
+
Why does the PdfParadise project attempt to support many different
|
49
|
+
pdf-related projects?
|
50
|
+
|
51
|
+
The answer to this question is rather simple: on Linux I have a <b>lot</b>
|
52
|
+
of flexibility and can use literally any pdf-related project just fine.
|
53
|
+
On Windows, however, I am more restricted in what I can use. Not
|
54
|
+
all programs are available on windows or can be easily compiled or
|
55
|
+
be installed there. Thus, in order to allow the pdf_paradise .gem
|
56
|
+
to work on windows, we need a certain level of flexibility.
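
Because the available tools differ between Linux and windows, it can help to check at runtime which external programs are installed. The following is only a minimal sketch in plain ruby, not the code that pdf_paradise itself uses; the method names `find_program` and `first_available_backend` are made up for this example, and the program names in the list are just examples too.

```ruby
# Hypothetical helper: returns the full path of +program+ if it is found
# in one of the directories of +path+ (default: ENV['PATH']), otherwise nil.
def find_program(program, path = ENV.fetch('PATH', ''))
  # On windows, executables usually carry an extension listed in PATHEXT.
  extensions = ENV.fetch('PATHEXT', '').split(';') + ['']
  path.split(File::PATH_SEPARATOR).each do |directory|
    extensions.each do |extension|
      candidate = File.join(directory, "#{program}#{extension}")
      return candidate if File.file?(candidate) && File.executable?(candidate)
    end
  end
  nil
end

# Hypothetical helper: returns the name of the first program from the
# list that is installed, or nil if none of them can be found.
def first_available_backend(candidates = %w[qpdf gs pdfinfo])
  candidates.find { |name| find_program(name) }
end
```

A caller could then decide between an external program and a pure ruby gem, depending on whether `first_available_backend` returns nil.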
|
57
|
+
|
58
|
+
The reason why I added this subsection here in <b>June 2021</b> was that
|
59
|
+
I am slowly changing the sinatra-related part of the PdfParadise
|
60
|
+
project, in order to embed the functionality into my main controller
|
61
|
+
which is handled by the **Roebe** namespace. In that controller
|
62
|
+
I wanted to easily <b>offer pdf-related functionality</b>
|
63
|
+
"out of the box" when I start the sinatra-application on windows.
|
64
|
+
Because I want to be able to offer pdf-related modifications on
|
65
|
+
windows as well, the PdfParadise project had to become more
|
66
|
+
flexible, so that a simple toplevel route, such as **/pdf**, will
|
67
|
+
work properly, and lead to entry points (subroutes) that allow
|
68
|
+
me to tap into the features offered by the PdfParadise project.
|
69
|
+
That way I can then, for instance, easily display the number of
|
70
|
+
pages in a .pdf file on windows as well.
|
71
|
+
|
72
|
+
So, the primary **summary** here is this: <b>the PdfParadise project must
|
73
|
+
remain flexible in order to support a proper workflow on windows
|
74
|
+
systems as well</b>. (We could use WSL on windows, but not every
|
75
|
+
computer has this available, so I am targeting "vanilla" windows
|
76
|
+
really.)
|
77
|
+
|
78
|
+
Note that one slight drawback is that the sinatra part of
|
79
|
+
the PdfParadise project now has <b>a dependency</b> on the
|
80
|
+
**cyberweb** project, so if you want to use that, you also
|
81
|
+
have to install the cyberweb gem. This is a trade-off - for me
|
82
|
+
the more important part is long-term maintainability of
|
83
|
+
the pdf_paradise project, so a unified
|
84
|
+
code base had to be used in this regard.
|
85
|
+
|
86
|
+
## Converting a .pdf file to text
|
87
|
+
|
88
|
+
Sometimes you may wish to have a text-file describing the content of
|
89
|
+
a .pdf file, rather than the .pdf file itself.
|
90
|
+
|
91
|
+
Via class **PdfParadise::ConvertPdfToText**, residing in the file at
|
92
|
+
**pdf_paradise/convert_pdf_to_text.rb**, you can convert a .pdf file
|
93
|
+
to a text file.
|
94
|
+
|
95
|
+
Usage example from ruby, for the file called **foobar.pdf**:
|
96
|
+
|
97
|
+
PdfParadise::ConvertPdfToText.new(ARGV)
|
98
|
+
PdfParadise::ConvertPdfToText.new('foobar.pdf')
|
99
|
+
|
100
|
+
You can also use the bin/ file from the commandline:
|
101
|
+
|
102
|
+
convert_pdf_to_text
|
103
|
+
convert_pdf_to_text foobar.pdf
|
104
|
+
|
105
|
+
There is also a ruby-gtk3 widget that offers the functionality
|
106
|
+
from class **PdfParadise::ConvertPdfToText**, if the user
|
107
|
+
has gtk3 installed and the ruby-bindings to it as well.
|
108
|
+
|
109
|
+
You can start that ruby-gtk3 widget via:
|
110
|
+
|
111
|
+
convert_pdf_to_text --gui
|
112
|
+
|
113
|
+
## Storing the .pdf pages that are currently open
|
114
|
+
|
115
|
+
If you need to store the .pdf files that are currently open, you
|
116
|
+
can use the following commandline to do so:
|
117
|
+
|
118
|
+
pdfparadise --store-open-pdf-files
|
119
|
+
|
120
|
+
This will attempt to store the full path to the .pdf files
|
121
|
+
into a local file. That way you may also be able to batch-open
|
122
|
+
these .pdf files at a later time, e. g. when you switch your
|
123
|
+
window manager or after a reboot.
|
124
|
+
|
125
|
+
As of October 2022 I am not using this as much anymore as
|
126
|
+
before, because the <b>roebe</b> gem has a class called Books
|
127
|
+
(at <b>roebe/classes/books.rb</b>) that handles .pdf files for
|
128
|
+
me. I use that class as if I am reading different "books" - each
|
129
|
+
individual .pdf file is then a "book".
|
130
|
+
|
131
|
+
## Converting markdown .md files to .pdf files
|
132
|
+
|
133
|
+
If you use kramdown, prawn and kramdown-pdf-converter, then you
|
134
|
+
can convert .md files on the commandline, via:
|
135
|
+
|
136
|
+
convert_markdown_to_pdf path_to_pdf_file_goes_here.pdf
|
137
|
+
|
138
|
+
Install the necessary gems prior to using this commandline
|
139
|
+
functionality.
|
140
|
+
|
141
|
+
## Querying the title of a .pdf file
|
142
|
+
|
143
|
+
<b>class PdfParadise::QueryPdfTitle</b> will report the title of
|
144
|
+
any .pdf file that is passed into it, on the commandline.
|
145
|
+
|
146
|
+
This currently depends on <b>exiftool</b> but at a later time,
|
147
|
+
this may change to also allow a query via prawn or other tools.
|
148
|
+
|
149
|
+
If you need to determine whether a given .pdf file has a title
|
150
|
+
or whether it does not, you can use
|
151
|
+
<b>PdfParadise.does_this_pdf_file_have_a_title?</b>, such
|
152
|
+
as in:
|
153
|
+
|
154
|
+
PdfParadise.does_this_pdf_file_have_a_title? "foobar.pdf" # => true
|
155
|
+
|
156
|
+
This method will return **true** if the .pdf file at hand has a
|
157
|
+
title; and **false** otherwise.
|
158
|
+
|
159
|
+
## Determining how many pages a given .pdf file has
|
160
|
+
|
161
|
+
class **PdfParadise::PdfFileNTotalPages** can be used to query
|
162
|
+
how many pages a given .pdf file has.
|
163
|
+
|
164
|
+
The executable called **bin/n_pages** (thus, **n_pages**) can
|
165
|
+
be used to query this, on the commandline.
|
166
|
+
|
167
|
+
Example:
|
168
|
+
|
169
|
+
n_pages foobar.pdf
|
170
|
+
|
171
|
+
Do note that the class requires the external program
|
172
|
+
called **pdfinfo**.
|
173
|
+
|
174
|
+
It is possible to query the number of pages in a given .pdf
|
175
|
+
file without **pdfinfo**, but some .pdf files are a bit buggy,
|
176
|
+
and **pdfinfo** is simply more reliable than the regex that
|
177
|
+
was used until March 2020. So, past March 2020, the program
|
178
|
+
**pdfinfo** is now used by default. Note that pdfinfo is
|
179
|
+
part of the poppler software suite.
|
180
|
+
|
181
|
+
You can also use the following toplevel API for this:
|
182
|
+
|
183
|
+
PdfParadise.n_pages? 'THE_PATH_TO_THE_PDF_FILE_GOES_IN_HERE.pdf'
|
184
|
+
PdfParadise.n_pages? 'foobar.pdf'
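
If you want to query **pdfinfo** yourself, the page count can be read from the line starting with `Pages:` in its output. The following is a minimal sketch under that assumption; it is not the implementation used by class PdfParadise::PdfFileNTotalPages, and both method names are hypothetical.

```ruby
require 'open3'

# Hypothetical helper: extracts the page count from the text that
# "pdfinfo some_file.pdf" prints. Returns nil if there is no "Pages:" line.
def pages_from_pdfinfo_output(output)
  match = output.match(/^Pages:\s+(\d+)\s*$/)
  match ? match[1].to_i : nil
end

# Hypothetical helper: runs pdfinfo (part of poppler) on the given file.
# Returns nil if pdfinfo is not installed or if it reports an error.
def n_pages_via_pdfinfo(path_to_pdf)
  output, status = Open3.capture2('pdfinfo', path_to_pdf)
  status.success? ? pages_from_pdfinfo_output(output) : nil
rescue Errno::ENOENT
  nil
end
```

Keeping the parsing separate from the call to pdfinfo makes it possible to test the parsing without having poppler installed.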
|
185
|
+
|
186
|
+
## Adding page numbers to .pdf files
|
187
|
+
|
188
|
+
Via the combine_pdf gem it is now possible to add page numbers
|
189
|
+
to .pdf files. This has a few limitations for complex .pdf files,
|
190
|
+
due to combine_pdf having limitations in turn - but for simple
|
191
|
+
.pdf files this should work really well.
|
192
|
+
|
193
|
+
How to use that functionality?
|
194
|
+
|
195
|
+
Consider using the following toplevel API:
|
196
|
+
|
197
|
+
PdfParadise.number_pages('this_file.pdf')
|
198
|
+
|
199
|
+
The file called **this_file.pdf** has to exist in order for
|
200
|
+
this to work, of course.
|
201
|
+
|
202
|
+
The current default is to display the page numbers on the bottom
|
203
|
+
right side. This is hardcoded, but you could modify the code
|
204
|
+
to adapt to your needs; see also how combine_pdf does this.
|
205
|
+
(You have to pass an option-hash.)
|
206
|
+
|
207
|
+
## Various GUI components of the PdfParadise project
|
208
|
+
|
209
|
+
The **PdfParadise project** comes with some ruby-gtk3 specific
|
210
|
+
GUIs, but a few ruby-gtk2 and ruby-tk bindings may exist
|
211
|
+
as well. The **ruby-gtk3** components constitute the main GUI
|
212
|
+
elements of this project, though.
|
213
|
+
|
214
|
+
You can start, from the commandline, the gtk-wrapper
|
215
|
+
over the **split_pdf_file** functionality.
|
216
|
+
|
217
|
+
In order to do this, do either one of the following:
|
218
|
+
|
219
|
+
pdf_paradise --gui
|
220
|
+
pdf_paradise --gtk
|
221
|
+
|
222
|
+
This will require the **gtk_paradise** project and the gtk
|
223
|
+
bindings, so quite a lot. **gem install gtk3** and
|
224
|
+
**gem install gtk_paradise** should help.
|
225
|
+
|
226
|
+
The GUI for class SplitPdfFile is called **PdfParadise::Gtk::SplitPdfFile**.
|
227
|
+
The idea behind it is to allow you to determine some of the parameters
|
228
|
+
in a graphical fashion.
|
229
|
+
|
230
|
+
As of **September 2019**, there is also a mini-widget for quickly
|
231
|
+
removing the first page of a .pdf file. This is really minimal right
|
232
|
+
now and not very elegant; it may be improved in the future, but for
|
233
|
+
the time being it is what it is. It is more a proof-of-concept that
|
234
|
+
it can work.
|
235
|
+
|
236
|
+
You can start this via:
|
237
|
+
|
238
|
+
require 'pdf_paradise/gui/gtk2/remove_first_page_of_pdf_file.rb'
|
239
|
+
|
240
|
+
PdfParadise.start_gtk_gui_remove_first_page_of_pdf_file
|
241
|
+
|
242
|
+
Note that as of **January 2021** the gtk bindings will default to
|
243
|
+
**ruby-gtk3**. Support for ruby-gtk2 will be retained,
|
244
|
+
but new code may not necessarily be written with ruby-gtk2 in
|
245
|
+
mind. ruby-gtk3 is now the main GUI target for this project.
|
246
|
+
|
247
|
+
I am slowly porting the individual widgets.
|
248
|
+
|
249
|
+
The following widgets have been ported so far:
|
250
|
+
|
251
|
+
PdfParadise::GUI::Gtk::StatisticsWidget # can be found under pdf_paradise/gui/gtk3/statistics_widget/statistics_widget.rb
|
252
|
+
|
253
|
+
## Specification of the .pdf format
|
254
|
+
|
255
|
+
This subsection is a stub - I only needed it to gather information
|
256
|
+
about the .pdf specification. This is NOT complete - it only shall
|
257
|
+
contain some useful information and snippets about the .pdf
|
258
|
+
specification.
|
259
|
+
|
260
|
+
PDF is short for **Portable Document Format**.
|
261
|
+
|
262
|
+
PDF was standardized as **ISO 32000** in the year **2008**.
|
263
|
+
|
264
|
+
In the pdf-specification we can distinguish these entities:
|
265
|
+
|
266
|
+
Objects: these are not objects in the OOP sense, but simply the
|
267
|
+
basic data type of the PDF standard. There are 9 types of objects:
|
268
|
+
null, boolean, integer, real, name, string, array, dictionary and
|
269
|
+
stream.
|
270
|
+
|
271
|
+
Dictionary: this is an unordered collection of key-value pairs. Dictionaries are
|
272
|
+
denoted by << and >> at the beginning and the end.
|
273
|
+
|
274
|
+
Indirect Objects: these are objects that are referred to by
|
275
|
+
reference.
|
276
|
+
|
277
|
+
Direct Objects: these are objects that appear inline and are
|
278
|
+
obtained directly.
|
279
|
+
|
280
|
+
Conforming Reader: an application that parses a PDF
|
281
|
+
file according to the PDF Standard.
|
282
|
+
|
283
|
+
A .pdf file is made up of a specific structure, usually a four-part
|
284
|
+
layout.
|
285
|
+
|
286
|
+
These four parts are:
|
287
|
+
|
288
|
+
Header
|
289
|
+
Body
|
290
|
+
Cross-reference table
|
291
|
+
Trailer
|
292
|
+
|
293
|
+
### The .pdf Header tag
|
294
|
+
|
295
|
+
The header may begin with an entry such as **%PDF-1.7**.
|
296
|
+
|
297
|
+
The general format for the header is:
|
298
|
+
|
299
|
+
%PDF- followed by the version number in the form of 1.N.
|
300
|
+
|
301
|
+
This is not valid for all .pdf files, though. Past PDF Version
|
302
|
+
1.4, the **Version** entry in the document's catalog dictionary,
|
303
|
+
which is within the **Root** entry of the **Trailer**, may be
|
304
|
+
used instead of the Header - **if present**.
|
305
|
+
|
306
|
+
If a .pdf file contains binary data - which most PDF files
|
307
|
+
will do nowadays, such as **stream objects** - then the
|
308
|
+
**Header** line shall be immediately followed by a line
|
309
|
+
containing at least **four binary characters**. These
|
310
|
+
are character codes of 128 or greater.
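
The two rules above can be checked with a few lines of plain ruby. This is just a sketch for illustration; the method names are hypothetical and it only looks at the first two lines of the data. It assumes that the line after the header is a comment line, that is, that it starts with a % character.

```ruby
# Returns the first +count+ lines of raw PDF data. PDF allows CR, LF or
# CR LF as end-of-line markers, so all three are handled here.
def first_lines_of(data, count = 2)
  data.b.split(/\r\n|\r|\n/, count + 1).first(count)
end

# Reads the version from the header line. Returns a String such as
# "1.7", or nil if the data does not start with a PDF header.
def pdf_header_version(data)
  match = first_lines_of(data).first.to_s.match(/\A%PDF-(\d+\.\d+)/)
  match ? match[1] : nil
end

# Checks whether the second line is a comment line that contains at
# least four bytes with a value of 128 or greater.
def binary_marker_line?(data)
  line = first_lines_of(data)[1].to_s
  line.start_with?('%') && line.bytes.count { |byte| byte >= 128 } >= 4
end
```

Note that this does not look at the **Version** entry in the catalog dictionary, so for some files the real version may be newer than what the header says.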
|
311
|
+
|
312
|
+
### The .pdf Body tag
|
313
|
+
|
314
|
+
The body of a PDF File consists of these aforementioned **Indirect
|
315
|
+
Objects** representing the contents of a document.
|
316
|
+
|
317
|
+
**Indirect Objects** begin with a **unique object identifier**
|
318
|
+
that allows other objects to refer to them.
|
319
|
+
|
320
|
+
That identifier is made up of the following two components:
|
321
|
+
|
322
|
+
(1) Object Number: a positive Integer, can be in any arbitrary order
|
323
|
+
(2) Generation Number: a non-negative Integer
|
324
|
+
|
325
|
+
The **Indirect Objects** can be referred to from elsewhere by an
|
326
|
+
Indirect Reference. This must consist of:
|
327
|
+
|
328
|
+
Object Number
|
329
|
+
Generation Number, and
|
330
|
+
keyword R # for instance: 4 0 R
|
331
|
+
|
332
|
+
After the identifier is the keyword **obj** (start of the object)
|
333
|
+
and **endobj** (end of the object). Anything in between
|
334
|
+
is a key-value pair that describes the object.
|
335
|
+
|
336
|
+
A simple example showing the use of **Indirect Objects** will be
|
337
|
+
shown next:
|
338
|
+
|
339
|
+
1 0 obj % Object Number 1, Generation Number 0
|
340
|
+
<<
|
341
|
+
/Type /Pages % Describe type of object
|
342
|
+
/Kids [ 4 0 R ] % Kids Entry referring to an indirect reference (Object number 4, Generation number 0)
|
343
|
+
/Count 1
|
344
|
+
>>
|
345
|
+
endobj
|
346
|
+
|
347
|
+
2 0 obj % Object Number 2, Generation Number 0
|
348
|
+
<<
|
349
|
+
/Type /Catalog % Describe type of object
|
350
|
+
/Pages 1 0 R % Referring another object via unique object identifier
|
351
|
+
>>
|
352
|
+
endobj
|
353
|
+
|
354
|
+
The **Body** section of a .pdf file is thus a tree of objects that
|
355
|
+
are linked together, ultimately coming down to the Root Object
|
356
|
+
(Defined by the **Root** entry in the **Trailer** section, as a
|
357
|
+
catalog dictionary).
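
To make the relationship between object identifiers and indirect references more concrete, here is a small sketch that collects both from a piece of uncompressed PDF text. It is a naive regex-based scan for illustration only (it does not skip strings, comments or streams), and the constant and method names are made up for this example.

```ruby
# Matches "<object number> <generation number> obj" at the start of a line.
OBJECT_DEFINITION = /^(\d+)\s+(\d+)\s+obj\b/
# Matches an indirect reference such as "4 0 R".
INDIRECT_REFERENCE = /(\d+)\s+(\d+)\s+R\b/

# Returns the identifiers of all defined objects as
# [object number, generation number] pairs.
def object_identifiers(text)
  text.scan(OBJECT_DEFINITION).map { |number, generation| [number.to_i, generation.to_i] }
end

# Returns the targets of all indirect references as
# [object number, generation number] pairs.
def indirect_references(text)
  text.scan(INDIRECT_REFERENCE).map { |number, generation| [number.to_i, generation.to_i] }
end
```

A reference whose pair does not show up in the list of defined objects points to an object that lives elsewhere in the file, or to one that is missing.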
|
358
|
+
|
359
|
+
The **Cross-Reference Table** is a table that contains a list of byte
|
360
|
+
offsets pointing to the indirect objects.
|
361
|
+
|
362
|
+
A pdf-conforming reader uses the Cross-Reference Table as a lookup
|
363
|
+
table to access certain objects quickly when needed.
|
364
|
+
|
365
|
+
The format for entries in the Cross-Reference Table can be summarized as
|
366
|
+
follows:
|
367
|
+
|
368
|
+
- In the following format nnnnnnnnnn ggggg n eol, a total of 20 bytes
|
369
|
+
- nnnnnnnnnn is a 10-digit byte offset in the decoded stream
|
370
|
+
- ggggg 5-digit generation number
|
371
|
+
- n keyword for in-use entry or f keyword for free entry
|
372
|
+
- eol 2 character end-of-line sequence (Like CR LF)
|
373
|
+
|
374
|
+
The **Cross-Reference Table** always begins with the special entry
|
375
|
+
**0000000000 65535** - see the following example:
|
376
|
+
|
377
|
+
0000000000 65535 f % special entry, f denoting it is a free entry
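
The fixed-width format described above is easy to parse. The following sketch shows one way to do it in plain ruby; `XrefEntry` and `parse_xref_entry` are hypothetical names, and the sketch only handles a single entry, not the whole table.

```ruby
# Holds the three fields of one cross-reference entry.
XrefEntry = Struct.new(:offset, :generation, :in_use)

# Parses one cross-reference entry such as "0000000017 00000 n".
# Returns nil if the line does not follow the format described above.
def parse_xref_entry(line)
  match = line.match(/\A(\d{10}) (\d{5}) ([nf])/)
  return nil unless match

  # For in-use entries the first field is a byte offset. For free
  # entries it holds the object number of the next free object instead.
  XrefEntry.new(match[1].to_i, match[2].to_i, match[3] == 'n')
end
```

Usage example:

```ruby
entry = parse_xref_entry('0000000017 00000 n')
entry.offset # the byte offset as an Integer
entry.in_use # true for "n" entries, false for "f" entries
```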
|
378
|
+
|
379
|
+
## Graphical User Interfaces (GUIs)
|
380
|
+
|
381
|
+
The pdf_paradise gem comes with a few, small-ish widgets, primarily
|
382
|
+
written in ruby-gtk. As of August 2021 I am also experimenting
|
383
|
+
with libui but this is a slow process - stay tuned for more updates
|
384
|
+
in the coming months in this regard.
|
385
|
+
|
386
|
+
One big advantage of libui is that it works on windows out-of-the-box,
|
387
|
+
so we can use GUIs on windows as well. \o/
|
388
|
+
|
389
|
+
## Storing all open .pdf files in a yaml file
|
390
|
+
|
391
|
+
In **February 2022** the yaml file working_on_these_pdf_files.yml
|
392
|
+
was added at:
|
393
|
+
|
394
|
+
pdf_paradise/yaml/working_on_these_pdf_files.yml
|
395
|
+
|
396
|
+
The idea here is that this yaml-file retains the local path
|
397
|
+
to any .pdf file that the user (in this case me) is working
|
398
|
+
on, aka reading right now.
|
399
|
+
|
400
|
+
I needed this because I tend to work through .pdf files and
|
401
|
+
remove page after page when I read it. The idea is that
|
402
|
+
I do not lose that information when I reboot my computer
|
403
|
+
or when said computer crashes; I needed to make this
|
404
|
+
persistent information.
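
The general idea can be sketched with ruby's yaml standard library. This is not the code from the gem; the method names are hypothetical and the layout of the file (a plain yaml array of paths) is an assumption made for this example.

```ruby
require 'yaml'

# Hypothetical helper: stores the absolute paths of the given .pdf
# files as a yaml array in +yaml_file+.
def store_open_pdf_files(paths, yaml_file)
  absolute_paths = paths.map { |path| File.expand_path(path) }
  File.write(yaml_file, YAML.dump(absolute_paths))
end

# Hypothetical helper: returns the stored paths, or an empty array if
# the file does not exist or does not contain a yaml array.
def load_open_pdf_files(yaml_file)
  return [] unless File.exist?(yaml_file)

  data = YAML.load_file(yaml_file)
  data.is_a?(Array) ? data : []
end
```

After a reboot the array returned by `load_open_pdf_files` could be passed to a pdf viewer again, one file at a time.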
|
405
|
+
|
406
|
+
Why is this yaml file part of the pdf_paradise gem, though?
|
407
|
+
This is mostly due to convenience. I wanted to have this
|
408
|
+
available in one of my ruby gems by default. In the long
|
409
|
+
run I will add code that allows other users to adjust
|
410
|
+
this to their own use case (and perhaps in their home
|
411
|
+
directory rather than store this in the gem itself). As
|
412
|
+
of February 2022 code for the latter is currently not
|
413
|
+
part of the gem, but I may add code for this - either
|
414
|
+
in the **pdf_paradise** gem or the **roebe** gem.
|
415
|
+
|
416
|
+
## Splitting a single .pdf file into several individual .pdf files
|
417
|
+
|
418
|
+
You can use the following toplevel API to split up a single
|
419
|
+
.pdf file into several .pdf files:
|
420
|
+
|
421
|
+
PdfParadise.burst(ARGV)
|
422
|
+
PdfParadise.burst('foobar.pdf')
|
423
|
+
|
424
|
+
A commandline variant exists as well, at <b>bin/burst_this_pdf_file</b>,
|
425
|
+
tapping into the code stored in the file
|
426
|
+
<b>pdf_paradise/utility_scripts/split_pdf.rb</b>.
|
427
|
+
|
428
|
+
Usage example for the commandline variant:
|
429
|
+
|
430
|
+
burst_this_pdf_file foobar.pdf
|
431
|
+
|
432
|
+
(Make sure this bin file can be found in <b>$PATH</b>.)
|
433
|
+
|
434
|
+
Be careful when using this script: it will dump the generated
|
435
|
+
individual .pdf files into the current working directory, so
|
436
|
+
you may want to create a subdirectory before invoking this
|
437
|
+
executable, and move your target .pdf into that subdirectory. While
|
438
|
+
functionality could be added to automatically create a
|
439
|
+
subdirectory and relocate the generated .pdf files into
|
440
|
+
that subdirectory, for now we'll keep it simple here and
|
441
|
+
just extract the individual .pdf pages into the current
|
442
|
+
working directory.
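
Until such functionality exists, the manual steps can be wrapped in a small helper. The following is a sketch using only fileutils from the standard library; `in_fresh_subdirectory` is a made-up name, and the `_pages` suffix for the subdirectory is an arbitrary choice. Note that it copies the .pdf file rather than moving it.

```ruby
require 'fileutils'

# Creates +subdirectory+, copies +pdf_file+ into it and then runs the
# given block with that subdirectory as the current working directory.
# The block receives the name of the copied file.
def in_fresh_subdirectory(pdf_file, subdirectory = File.basename(pdf_file, '.pdf') + '_pages')
  FileUtils.mkdir_p(subdirectory)
  FileUtils.cp(pdf_file, subdirectory)
  Dir.chdir(subdirectory) { yield File.basename(pdf_file) }
end
```

Usage example, assuming that burst_this_pdf_file can be found in $PATH:

```ruby
in_fresh_subdirectory('foobar.pdf') { |name| system('burst_this_pdf_file', name) }
```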
|
443
|
+
|
444
|
+
Note that <b>hexapdf</b> can also be used for this functionality.
|
445
|
+
In February 2023 it became the default; the old variant
|
446
|
+
via imagemagick's convert is retained in the file
|
447
|
+
<b>pdf_paradise/utility_scripts/split_pdf.rb</b> though.
|
448
|
+
|
449
|
+
## Merging pdf files
|
450
|
+
|
451
|
+
<b>class PdfParadise::MergePdf.new(ARGV)</b> can be used for
|
452
|
+
<b>merging .pdf files</b>. This functionality depends on
|
453
|
+
external software, so you have to install this first.
|
454
|
+
|
455
|
+
Currently <b>ghostscript</b> and <b>hexapdf</b> can be used for
|
456
|
+
the <b>merging</b> step.
|
457
|
+
|
458
|
+
Examples for how to use either of these two variants, as
|
459
|
+
far as <b>class PdfParadise::MergePdf</b> is concerned,
|
460
|
+
follow next:
|
461
|
+
|
462
|
+
mergepdf one.pdf two.pdf --use-ghostscript
|
463
|
+
mergepdf one.pdf two.pdf --use-hexapdf
|
464
|
+
mergepdf *.avif --use-hexapdf
|
465
|
+
mergepdf SCAN1.avif SCAN2.avif SCAN3.avif SCAN4.avif SCAN5.avif --use-hexapdf
|
466
|
+
mergepdf SCAN1.avif SCAN2.avif SCAN3.avif SCAN4.avif SCAN5.avif --use-ghostscript
|
467
|
+
mergepdf output-page1.pdf output-page2.pdf output-page3.pdf output-page4.pdf output-page5.pdf --use-ghostscript
|
468
|
+
mergepdf SCAN1_GUTACHTEN.pdf SCAN2_GUTACHTEN.pdf SCAN3_GUTACHTEN.pdf SCAN4_GUTACHTEN.pdf SCAN5_GUTACHTEN.pdf --use-ghostscript
|
469
|
+
mergepdf SCAN1_GUTACHTEN.pdf SCAN2_GUTACHTEN.pdf SCAN3_GUTACHTEN.pdf SCAN4_GUTACHTEN.pdf SCAN5_GUTACHTEN.pdf --hexapdf
|
470
|
+
|
471
|
+
(The two -- hyphens are mandatory for commandline arguments right now; otherwise
|
472
|
+
it is assumed to be a locally existing .pdf file.)
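
The rule described above can be expressed in a few lines of ruby. This is only a sketch of the idea, not the argument handling that class PdfParadise::MergePdf actually uses; the method name is hypothetical.

```ruby
# Splits commandline arguments into options (everything that starts
# with two hyphens) and file names (everything else).
def partition_arguments(argv)
  options, files = argv.partition { |argument| argument.start_with?('--') }
  { options: options, files: files }
end
```

Usage example:

```ruby
result = partition_arguments(%w[one.pdf two.pdf --use-hexapdf])
result[:files]   # the two .pdf files
result[:options] # the --use-hexapdf option
```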
|
473
|
+
|
474
|
+
If you need to do this from within ruby code, consider
|
475
|
+
using the following code:
|
476
|
+
|
477
|
+
require 'pdf_paradise'
|
478
|
+
merge_pdf = PdfParadise::MergePdf.new('one.pdf two.pdf')
|
479
|
+
merge_pdf.feedback_where_it_is_stored # Call it manually.
|
480
|
+
|
485
|
+
## Combining individual pages from .pdf files into a new .pdf file via class PdfParadise::CombineThesePdfPages
|
486
|
+
|
487
|
+
class **PdfParadise::CombineThesePdfPages** can be used to
|
488
|
+
extract individual pdf pages from a given .pdf file and
|
489
|
+
combine these into a new .pdf file.
|
490
|
+
|
491
|
+
There is also an executable at **bin/combine_these_pdf_pages**
|
492
|
+
which can be used on the commandline.
|
493
|
+
|
494
|
+
This functionality depends on the **hexapdf** gem.
|
495
|
+
|
496
|
+
Usage example:
|
497
|
+
|
498
|
+
combine_these_pdf_pages foobar.pdf 3,4,5
|
499
|
+
|
500
|
+
This would retain the pages at 3, 4 and 5 and create a new
|
501
|
+
.pdf file.
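
The page argument is a comma-separated list. A sketch of how such a list could be turned into integers is shown next; this is an illustration with a made-up method name, not the parser used by class PdfParadise::CombineThesePdfPages.

```ruby
# Converts a string such as "3,4,5" into an array of integers.
# Raises an ArgumentError if an entry is not a decimal number.
def parse_page_list(string)
  string.split(',').map(&:strip).reject(&:empty?).map { |page| Integer(page, 10) }
end
```

Using `Integer(page, 10)` rather than `to_i` means that a typo such as `3,x,5` is reported as an error and is not silently treated as page 0.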
|
502
|
+
|
503
|
+
## Extracting all images from a .pdf file
|
504
|
+
|
505
|
+
If you make use of <b>poppler</b> then you can extract
|
506
|
+
all images from a given .pdf file.

A small libui-GUI was added for this functionality. This
is mostly for quick demo purposes; it does not work extremely
well.

On IceWM it looks like this right now:

<img src="https://i.imgur.com/QXelVyy.png" style="margin:1em">

Not pretty, but it took only about 20 minutes to write this.

<b>pdfimages</b> from poppler must be installed. On Windows
you can probably download an executable for poppler here:

https://blog.alivate.com.au/poppler-windows/

I tested whether the above executables work on Windows, and
they still work fine. I also tested the libui
variant on Windows, and it works. The code is a bit
brittle, so use with care, but I was able to use it
successfully in <b>August 2022</b> to extract all images
from a given .pdf file. At a later time I may add a
to-image converter via libui, probably in the other
gem called image_paradise. Stay tuned in this regard.

To start the libui wrapper from the commandline, you can
use the following:

    /usr/bin/pdf_paradise --libui
    bin/pdf_paradise --libui
    pdf_paradise --libui # This variant should work, or try the other
                         # variants; it is stored in bin/pdf_paradise
                         # of this gem

## Numbering the pages in a given .pdf file automatically

If you use the external gem called <b>combine_pdf</b> then you can
make use of automatic numbering via the pdf_paradise gem.

The API for this is:

    PdfParadise.number_this_pdf_file('foobar.pdf')

It is not a very flexible API right now. It may be extended at a
later point in time.

## class PdfParadise::ToPdf

class <b>PdfParadise::ToPdf</b> can be used for two main
activities right now:

(1) You can convert .docx to .pdf files on the commandline, if you
have libreoffice installed.

(2) If you pass in a directory, then all image files of that
directory will be gathered and converted, and the resulting
.pdf file will then be assembled.
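
For use case (1), LibreOffice can convert documents without opening a window. The sketch below shows such a call; whether ToPdf uses exactly these arguments is an assumption, and the method name `docx_to_pdf_command` is made up for this example.

```ruby
# Builds a headless LibreOffice call that converts one document to .pdf.
# "soffice" is the usual name of the LibreOffice binary; on some systems
# it is installed as "libreoffice" instead.
def docx_to_pdf_command(docx_file, output_directory = '.')
  ['soffice', '--headless', '--convert-to', 'pdf',
   '--outdir', output_directory, docx_file]
end

p docx_to_pdf_command('letter.docx')
# system(*docx_to_pdf_command('letter.docx')) # Requires LibreOffice.
```

The resulting .pdf file is written into the output directory under the same base name as the input file.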

## The sinatra interface of the pdf_paradise gem

As of April 2019 there is a minimal sinatra interface to the
PdfParadise project. Consider this incomplete <b>work-in-progress</b>.

To start it, try:

    pdf_paradise --sinatra

As of <b>July 2023</b> this makes use of class <b>Cyberweb::HtmlTemplate</b>.
This is the generic class I use for generating HTML files (or rather, the
String that describes the .html file in question).

## Flipping / Rotating a .pdf file

This subsection will try to explain how a .pdf file can be flipped / rotated,
and how this relates to the <b>pdf_paradise</b> gem.

There are many ways to do so. Let's start with an example via <b>qpdf</b>.

To rotate clockwise by 90°, use:

    qpdf --rotate=+90 foo.pdf bar.pdf

This will generate a .pdf file rotated by 90° and call it <b>bar.pdf</b>.
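
The qpdf call above can also be assembled from ruby. This is a sketch under the assumption that you want to shell out to qpdf yourself; the helper `qpdf_rotate_command` is hypothetical and is not the implementation used by pdf_paradise.

```ruby
# qpdf only accepts rotations in steps of 90 degrees. A leading "+" or "-"
# makes the rotation relative to the current orientation of the pages.
def qpdf_rotate_command(input, output, angle = 90)
  unless [0, 90, 180, 270].include?(angle.abs)
    raise ArgumentError, "unsupported angle: #{angle}"
  end
  sign = angle.negative? ? '-' : '+'
  ['qpdf', "--rotate=#{sign}#{angle.abs}", input, output]
end

p qpdf_rotate_command('foo.pdf', 'bar.pdf')
# => ["qpdf", "--rotate=+90", "foo.pdf", "bar.pdf"]
```

A negative angle such as -90 yields `--rotate=-90`, which rotates counter-clockwise.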

The pdf_paradise gem has a class that is tasked with rotating .pdf files.

See:

    require 'pdf_paradise/utility_scripts/rotate_pdf_file.rb'
    PdfParadise::RotatePdfFile.new(ARGV)

To set the rotation you can invoke the method called <b>.set_rotate()</b>.

There is also a bin/ commandline executable for this, called
<b>rotate_pdf</b>.

There is also a little GUI wrapper around that functionality available, as
part of the pdf_paradise project.

See:

    PdfParadise::GUI::LibUI::RotatePdfFile.new

## Deleting the last or the first page of a .pdf file

You can use **class DeleteLastPageOfThisPdfFile**, more
accurately called **class PdfParadise::DeleteLastPageOfThisPdfFile**,
to ***delete the last page in a .pdf file***.

In ruby code, you can invoke this like so:

    require 'pdf_paradise'

    PdfParadise::DeleteLastPageOfThisPdfFile.new('path_to_the_pdf_file/goes_in_here.pdf')

or shorter:

    require 'pdf_paradise'

    PdfParadise.delete_last_page_of_this_pdf_file('foobar.pdf')

A very similar API exists for deleting the first page of a given .pdf
file, too.

In ruby code, you can invoke this like so:

    require 'pdf_paradise'

    PdfParadise::DeleteTheFirstPageOfThisPdfFile.new('path_to_the_pdf_file/goes_in_here.pdf')

or shorter:

    require 'pdf_paradise'

    PdfParadise.delete_the_first_page_of_this_pdf_file('foobar.pdf')
    PdfParadise.delete_first_page_of_this_pdf_file('foobar.pdf') # Both variants work.

Note that a small libui-wrapper exists for this functionality,
under the gui/ subdirectory of this gem. It may look like
this:

<img src="https://i.imgur.com/tjN0WwS.png" style="margin: 1em">

An older ruby-gtk3 variant also exists:

<img src="https://i.imgur.com/PdMwxeP.png" style="margin: 1em">

However, in October 2023 I found this layout confusing, and
since I was also on a journey to write as many jruby-swing
GUIs as possible, I rewrote the old ruby-gtk3 code so that it can
be used as a basis for the jruby code at a later time.

The rewrite did not change much, but the new layout makes
more logical sense, I think, at least compared to the
prior variant:

<img src="https://i.imgur.com/8obFWhb.png" style="margin: 1em">

In October 2023 the old class
DeleteFirstPageOfThisPdfFile was rewritten and
renamed to DeleteTheFirstPageOfThisPdfFile. The code
was improved, in particular for Windows: one use case
I had required that it works on the Windows
platform as well.

## Commandline usage

You can use the **pdf_paradise** gem from the commandline,
as other examples on this homepage show.

For instance, say that you wish to modify **the title of a .pdf file**.
You can use a commandline invocation such as this:

    pdf_paradise --use-this-pdf-file=location_to_your_pdf_file.pdf --set_title="The title you want to use goes in here."

You can also **shrink** a .pdf file, by using the commandline
switch <b>--shrink-pdf-size-of=foobar.pdf</b> or just
<b>--shrink</b>, such as:

    pdf_paradise --shrink-pdf-size-of=foobar.pdf
    pdf_paradise --shrink=foobar.pdf

The <b>shrink</b> functionality is contained in the module-method
<b>PdfParadise.reduce_size_of_this_pdf_file()</b>.

## Converting .jpg files to .pdf files

If you have a use case to convert several .jpg files into .pdf files
then the following commandline example should be helpful:

    convert /path/to/image foobar.pdf
    convert *.jpg foobar.pdf

Note that this requires <b>ImageMagick</b>. <b>ImageMagick</b> is not
always perfect; it has a few problems, unfortunately.

For instance, in <b>April 2022</b> when I tried the above,
the image was repeated three times on the x-axis. I do not
know why, but that makes **absolutely no sense**. It is just
a single image, so why is it repeated three times in the
resulting .pdf file? Perhaps imagemagick's **convert** tool does
this automatically, but then I question the default behaviour -
**it makes no sense** for the use case I have. One image
should be one image, not three images or fifty images.

In the event that **ImageMagick** does not work very well
for your use case, consider using another software suite,
such as **img2pdf**.

The syntax for **img2pdf** goes something like this:

    img2pdf *.jpg -o document.pdf
    img2pdf SCAN1.jpg SCAN2.jpg SCAN3.jpg SCAN4.jpg SCAN5.jpg -o document.pdf

I liked this, so in **April 2022** this was added to
**ImageParadise**. The API for this is as follows:

    ImageParadise.img2pdf('*.jpg') # If a '*' is part of the input Dir[] will be used.

As that functionality may be useful on the commandline as well, an
executable has been added at **bin/imageparadise_img2pdf**. Simply
pass the image files that you want to convert.

Usage example:

    imageparadise_img2pdf *jpg

If you need the images in a particular order then you may have to
list the image files explicitly, in that order, e. g. by their
paths.

So for instance:

    imageparadise_img2pdf image3.jpg image1.jpg image2.png
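
If the desired order is simply the numeric order of the file names, the list does not have to be typed by hand. A plain alphabetical sort places SCAN10.jpg before SCAN2.jpg, so the sketch below sorts embedded numbers numerically. The helper `natural_sort` is made up for this example and is not part of ImageParadise.

```ruby
# Sorts file names so that embedded numbers compare numerically,
# i.e. SCAN2.jpg comes before SCAN10.jpg.
def natural_sort(file_names)
  file_names.sort_by do |name|
    name.scan(/\d+|\D+/).map do |chunk|
      # Tag each chunk so numbers are only ever compared with numbers.
      chunk.match?(/\A\d+\z/) ? [0, chunk.to_i] : [1, chunk]
    end
  end
end

files = natural_sort(%w[SCAN10.jpg SCAN2.jpg SCAN1.jpg])
p files # => ["SCAN1.jpg", "SCAN2.jpg", "SCAN10.jpg"]
command = ['img2pdf', *files, '-o', 'document.pdf']
# system(*command) # Requires img2pdf to be installed.
```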

The only drawback I have found with <b>img2pdf</b> so far is
that you can not easily add text to an image. This makes it
hard to identify which image has which name. A workaround for
this is to embed the filename into the image itself, e. g.
create temporary images, and then pack them together via
<b>img2pdf</b>.

Unfortunately in September 2023 I realised that img2pdf sometimes
creates .pdf files that are flawed. So img2pdf may not always be
an optimal choice.

## Compressing a .pdf file (optimize the size of a .pdf file)

Sometimes you may want to reduce the filesize of a given .pdf file,
such as when you need to <b>upload</b> a .pdf file and there is a file
size limit in place, making it necessary to reduce the .pdf file below
a certain threshold. This actually happened to me a few times when using
webmail-based email services, where an automatic notice was
issued to me when the .pdf file was too large, such as <b>above 25MB in
size</b> or something similar.

So, let us now assume that you <b>do</b> have a use case such as described
above, or any other use case - you want to reduce the file size of a given
.pdf file.

How can this be done?

Well, there are several ways of course.

One is to use online-based tools, which tend to work surprisingly well; I
verified this in February 2022. One example for this is this
website:

https://www.ilovepdf.com/compress_pdf

But, as far as the gem here is concerned, we will focus primarily on means
that can be <b>used by you, on your own</b>, without having to depend on
external websites.

Two methods will be described here - the first one requiring <b>ghostscript</b>,
the second one requiring <b>hexapdf</b>.

The important parameter for <b>ghostscript</b> is the
<b>dPDFSETTINGS</b> parameter. This one will determine the compression
level, which ultimately will affect the quality of the compressed .pdf
file.

Available values for <b>dPDFSETTINGS</b> include <b>/screen</b>,
<b>/ebook</b>, <b>/printer</b>, <b>/prepress</b> and <b>/default</b>.

The options are as follows:

| -dPDFSETTINGS Option      | Explanation                                                                     |
| ------------------------- | ------------------------------------------------------------------------------- |
| -dPDFSETTINGS=/screen     | Has a lower quality and smaller size (72 dpi)                                   |
| -dPDFSETTINGS=/ebook      | Has a better quality, but has a slightly larger size (150 dpi)                  |
| -dPDFSETTINGS=/prepress   | Output is of a higher size and quality (300 dpi)                                |
| -dPDFSETTINGS=/printer    | Output is of a printer type quality (300 dpi)                                   |
| -dPDFSETTINGS=/default    | Selects the output which is useful for multiple purposes. Can cause large PDFs. |

In particular /screen is the best choice here if you want to reduce the file size. You can
achieve, for instance, a compression from a .pdf file the size of 73 MB down to
14 MB - which is quite neat.
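
To make the role of <b>-dPDFSETTINGS</b> more concrete, here is a sketch of a ghostscript invocation built from ruby. The flags are standard ghostscript options for the pdfwrite device, but the helper names are hypothetical and the exact command used by pdf_paradise may differ.

```ruby
# Builds a ghostscript call that rewrites a .pdf file with the given
# -dPDFSETTINGS preset (screen, ebook, printer, prepress or default).
def ghostscript_compress_command(input, output, setting = 'screen')
  unless %w[screen ebook printer prepress default].include?(setting)
    raise ArgumentError, "unknown -dPDFSETTINGS value: #{setting}"
  end
  ['gs', '-sDEVICE=pdfwrite', "-dPDFSETTINGS=/#{setting}",
   '-dNOPAUSE', '-dQUIET', '-dBATCH',
   "-sOutputFile=#{output}", input]
end

# Percentage by which the file shrank, e.g. 73 MB -> 14 MB.
def reduction_in_percent(before, after)
  ((before - after) * 100.0 / before).round(1)
end

p ghostscript_compress_command('foobar.pdf', 'foobar_small.pdf')
p reduction_in_percent(73, 14) # => 80.8
```

So the example of 73 MB down to 14 MB corresponds to a reduction of roughly 81%.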

class <b>PdfParadise::CompressThisPdfFile</b> can be of help here. Simply
pass the path of the local .pdf file as argument to .new().

This class resides at:

    pdf_paradise/compress/compress_this_pdf_file.rb

Note that class <b>PdfParadise::CompressThisPdfFile</b> currently only uses
ghostscript, so we have to use the above commandline options, such
as <b>-dPDFSETTINGS</b>.

You can also use a <b>toplevel method</b> if you'd like to:

    require 'pdf_paradise'
    PdfParadise.compress_this_pdf_file
    PdfParadise.compress_this_pdf_file('/foobar.pdf') # ← Pass the path to the .pdf file into this method.

The variant using <b>hexapdf</b> is called:

    PdfParadise.compress_via_pdf
    PdfParadise.compress_via_pdf('foobar.pdf')

The <b>API</b> names may change at a later point in time; perhaps we will just
add a toplevel API called <b>PdfParadise.compress()</b>, but for the time being
the above APIs will be retained as they are.

In February 2024 I noticed that qpdf can also be used to compress .pdf
files.

Commandline variants in this regard may look like this:

    qpdf --compress-streams=y --object-streams=generate --recompress-flate --optimize-images input_file_here.pdf output_file_there.pdf
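
The same qpdf call can be assembled from ruby, together with a check whether the result is actually smaller than the input. This is only a sketch; both helper names are made up for this example and do not describe the internals of the gem.

```ruby
# Builds the qpdf compression call shown above as an argument list.
def qpdf_compress_command(input, output)
  ['qpdf', '--compress-streams=y', '--object-streams=generate',
   '--recompress-flate', '--optimize-images', input, output]
end

# True if the compressed file exists and is smaller than the original.
def smaller?(original, compressed)
  File.exist?(compressed) && File.size(compressed) < File.size(original)
end

p qpdf_compress_command('input_file_here.pdf', 'output_file_there.pdf')
# system(*qpdf_compress_command('in.pdf', 'out.pdf')) # Requires qpdf.
```

Checking the sizes afterwards is useful because recompression does not always help; a .pdf file that is already well compressed may not shrink at all.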

To use the above in pdf_paradise you can use:

    PdfParadise.compress_via_qpdf

## Licence

In <b>January 2024</b>, the licence of this project was changed from GPL-2.0
to <b>"MIT No Attribution"</b>. You can read up on this MIT licence
here:

https://spdx.org/licenses/MIT-0.html

The two most important parts are the "no warranty", as well as "use this software
how you want to", so it is a fairly liberal licence, with almost no restrictions.

I will also copy/paste the full licence here, for the convenience of the reader:

Copyright 2024 Robert A. Heiler

Permission is hereby granted, free of charge, to any person obtaining a copy of
this software and associated documentation files (the "Software"), to deal in
the Software without restriction, including without limitation the rights to
use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
of the Software, and to permit persons to whom the Software is furnished to
do so.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
IN THE SOFTWARE.

## Contact information and mandatory 2FA (no longer) coming up in 2022 / 2023

If your creative mind has ideas and specific suggestions to make this gem
more useful in general, feel free to drop me an email at any time, via:

    shevy@inbox.lt

Before that email I used an email account at Google gmail, but in **2021** I
decided to slowly abandon gmail, for various reasons. In order to limit the
explanation here, allow me to just briefly state that I do not feel as if I
want to promote any Google service anymore when the user becomes the end
product (such as via data collection by upstream services, including other
proxy-services). My feeling is that this is a hugely flawed business model
to begin with, and I no longer wish to support this in any way, even if
only indirectly so, such as by using services of companies that try to
promote this flawed model.

In regards to responding to emails: please keep in mind that responding
may take some time, depending on the amount of work I may have at that
moment. So it is not that emails are ignored; it is more that I have not
(yet) found the time to read and reply. This means there may be a delay
of days, weeks and in some instances also months. There is, unfortunately,
not much I can do when I need to prioritise my time investment, but I try
to consider <b>all</b> feedback as an opportunity to improve my projects
nonetheless.

In <b>2022</b> rubygems.org decided to make 2FA mandatory for every
gem owner eventually:

see
https://blog.rubygems.org/2022/06/13/making-packages-more-secure.html

Mandatory 2FA will eventually be extended to all rubygems.org developers and
maintainers. As I can not use 2FA, for reasons I will skip explaining here,
this means that my projects will eventually be removed, as I no longer
have any control over my projects hosted on rubygems.org (because I
can not use 2FA).

At that point, I no longer have any control over what is done to my projects
since whoever is controlling the gems ecosystem took away our control
here. I am not sure at which point ruby became corporate-controlled -
that was not the case several years ago, so something has
changed.

Ruby also only allowed 2FA users to participate on the issue tracker for
a while:

https://bugs.ruby-lang.org/issues/18800

But this was reverted some months ago, so it is no longer applicable.
Suffice to say that I do not think that we should only be allowed to
interact on the world wide web when some 'authority' authenticated us,
such as via mandatory 2FA, so I hope this won't come back again.

Fighting spam is a noble goal, but when it also means you lock out
real human people then this is definitely NOT a good situation
to be in.

|