pdf_paradise 0.1.58
Potentially problematic release.
This version of pdf_paradise might be problematic.
- checksums.yaml +7 -0
- data/README.md +652 -0
- data/bin/automatic_pdf_title +7 -0
- data/bin/combine_these_pdf_pages +7 -0
- data/bin/compress_via_hexapdf +7 -0
- data/bin/convert_markdown_to_pdf +7 -0
- data/bin/convert_pdf_to_text +7 -0
- data/bin/delete_first_page_of_this_pdf_file +7 -0
- data/bin/merge_then_open +7 -0
- data/bin/n_pages +10 -0
- data/bin/open_main_pdf +7 -0
- data/bin/pdf_paradise +7 -0
- data/bin/set_main_book +7 -0
- data/bin/set_title_of_this_pdf_file +15 -0
- data/doc/README.gen +624 -0
- data/doc/todo/todo.md +7 -0
- data/lib/pdf_paradise/base/base.rb +218 -0
- data/lib/pdf_paradise/base/colours.rb +29 -0
- data/lib/pdf_paradise/commandline/help.rb +69 -0
- data/lib/pdf_paradise/commandline/menu.rb +184 -0
- data/lib/pdf_paradise/compress/compress_via_hexapdf.rb +27 -0
- data/lib/pdf_paradise/compress_this_pdf_file.rb +88 -0
- data/lib/pdf_paradise/constants/constants.rb +76 -0
- data/lib/pdf_paradise/convert_text_to_pdf.rb +94 -0
- data/lib/pdf_paradise/css/project.css +17 -0
- data/lib/pdf_paradise/djvu_to_pdf.rb +85 -0
- data/lib/pdf_paradise/gui/README.md +6 -0
- data/lib/pdf_paradise/gui/fox/split_pdf_file.rb +77 -0
- data/lib/pdf_paradise/gui/gtk2/delete_the_first_or_the_last_page_of_this_pdf_file/delete_the_first_or_the_last_page_of_this_pdf_file.rb +39 -0
- data/lib/pdf_paradise/gui/gtk2/pdf_viewer/pdf_viewer.rb +34 -0
- data/lib/pdf_paradise/gui/gtk2/split_pdf_file/split_pdf_file.rb +34 -0
- data/lib/pdf_paradise/gui/gtk2/statistics_widget/statistics_widget.rb +34 -0
- data/lib/pdf_paradise/gui/gtk2/to_pdf/to_pdf.rb +32 -0
- data/lib/pdf_paradise/gui/gtk3/controller/controller.rb +212 -0
- data/lib/pdf_paradise/gui/gtk3/convert_pdf_to_text/convert_pdf_to_text.rb +34 -0
- data/lib/pdf_paradise/gui/gtk3/delete_the_first_or_the_last_page_of_this_pdf_file/delete_the_first_or_the_last_page_of_this_pdf_file.rb +39 -0
- data/lib/pdf_paradise/gui/gtk3/pdf_viewer/pdf_viewer.rb +34 -0
- data/lib/pdf_paradise/gui/gtk3/split_pdf_file/split_pdf_file.rb +34 -0
- data/lib/pdf_paradise/gui/gtk3/statistics_widget/statistics_widget.rb +34 -0
- data/lib/pdf_paradise/gui/gtk3/to_pdf/to_pdf.rb +32 -0
- data/lib/pdf_paradise/gui/libui/statistics_widget/statistics_widget.rb +217 -0
- data/lib/pdf_paradise/gui/shared_code/convert_pdf_to_text/convert_pdf_to_text_module.rb +277 -0
- data/lib/pdf_paradise/gui/shared_code/delete_the_first_or_the_last_page_of_this_pdf_file/delete_the_first_or_the_last_page_of_this_pdf_file_module.rb +443 -0
- data/lib/pdf_paradise/gui/shared_code/pdf_viewer/pdf_viewer.css +5 -0
- data/lib/pdf_paradise/gui/shared_code/pdf_viewer/pdf_viewer_module.rb +284 -0
- data/lib/pdf_paradise/gui/shared_code/split_pdf_file/split_pdf_file.css +0 -0
- data/lib/pdf_paradise/gui/shared_code/split_pdf_file/split_pdf_file_module.rb +294 -0
- data/lib/pdf_paradise/gui/shared_code/statistics_widget/statistics_widget_module.rb +349 -0
- data/lib/pdf_paradise/gui/shared_code/to_pdf/to_pdf_module.rb +281 -0
- data/lib/pdf_paradise/hexapdf/001_rainbow_pattern_example.rb +0 -0
- data/lib/pdf_paradise/hexapdf/hexapdf.rb +123 -0
- data/lib/pdf_paradise/images/PDF_PARADISE_LOGO.png +0 -0
- data/lib/pdf_paradise/main_pdf/main_pdf.rb +445 -0
- data/lib/pdf_paradise/merge_pdf/menu.rb +63 -0
- data/lib/pdf_paradise/merge_pdf/merge_pdf.rb +298 -0
- data/lib/pdf_paradise/merge_pdf_namespace.rb +9 -0
- data/lib/pdf_paradise/merge_then_open/merge_then_open.rb +105 -0
- data/lib/pdf_paradise/pdf_file_n_total_pages.rb +249 -0
- data/lib/pdf_paradise/prawn_addons/README.md +2 -0
- data/lib/pdf_paradise/prawn_addons/prawn_addons.rb +17 -0
- data/lib/pdf_paradise/project/project.rb +22 -0
- data/lib/pdf_paradise/remove_pdf_password.rb +392 -0
- data/lib/pdf_paradise/requires/batch_require_toplevel_files.rb +22 -0
- data/lib/pdf_paradise/requires/colours.rb +7 -0
- data/lib/pdf_paradise/requires/colours_and_esystem_and_save_file_and_fileutils_and_opn.rb +11 -0
- data/lib/pdf_paradise/requires/esystem_and_colours.rb +10 -0
- data/lib/pdf_paradise/requires/esystem_and_opn_and_colours.rb +8 -0
- data/lib/pdf_paradise/requires/require_the_whole_project.rb +28 -0
- data/lib/pdf_paradise/requires/require_utility_scripts.rb +9 -0
- data/lib/pdf_paradise/set_main_book.rb +156 -0
- data/lib/pdf_paradise/set_pdf_title.rb +218 -0
- data/lib/pdf_paradise/sinatra/embeddable_interface.rb +318 -0
- data/lib/pdf_paradise/toplevel_methods/automatic_pdf_title.rb +55 -0
- data/lib/pdf_paradise/toplevel_methods/convert_epub_to_pdf.rb +27 -0
- data/lib/pdf_paradise/toplevel_methods/convert_markdown_to_pdf.rb +43 -0
- data/lib/pdf_paradise/toplevel_methods/convert_ppt_to_pdf.rb +35 -0
- data/lib/pdf_paradise/toplevel_methods/e.rb +16 -0
- data/lib/pdf_paradise/toplevel_methods/esystem.rb +19 -0
- data/lib/pdf_paradise/toplevel_methods/misc.rb +62 -0
- data/lib/pdf_paradise/toplevel_methods/number_pages.rb +38 -0
- data/lib/pdf_paradise/toplevel_methods/opened_pdf_files.rb +221 -0
- data/lib/pdf_paradise/toplevel_methods/query_pdf_title.rb +191 -0
- data/lib/pdf_paradise/toplevel_methods/reduce_size_of_this_pdf_file.rb +43 -0
- data/lib/pdf_paradise/toplevel_methods/roebe.rb +17 -0
- data/lib/pdf_paradise/toplevel_methods/rotate_pdf_file.rb +146 -0
- data/lib/pdf_paradise/toplevel_methods/to_pdf.rb +38 -0
- data/lib/pdf_paradise/utility_scripts/README.md +3 -0
- data/lib/pdf_paradise/utility_scripts/combine_these_pdf_pages.rb +118 -0
- data/lib/pdf_paradise/utility_scripts/convert_pdf_to_text.rb +175 -0
- data/lib/pdf_paradise/utility_scripts/delete_first_page_of_this_pdf_file.rb +215 -0
- data/lib/pdf_paradise/utility_scripts/delete_last_page_of_this_pdf_file.rb +180 -0
- data/lib/pdf_paradise/utility_scripts/delete_this_page_of_this_pdf_file.rb +310 -0
- data/lib/pdf_paradise/utility_scripts/extract_pdf_page.rb +280 -0
- data/lib/pdf_paradise/utility_scripts/pdf_optimizer.rb +113 -0
- data/lib/pdf_paradise/utility_scripts/pdf_statistics.rb +150 -0
- data/lib/pdf_paradise/utility_scripts/pdf_to_html.rb +75 -0
- data/lib/pdf_paradise/utility_scripts/remove_images.rb +110 -0
- data/lib/pdf_paradise/utility_scripts/split_pdf.rb +340 -0
- data/lib/pdf_paradise/utility_scripts/to_qdf.rb +82 -0
- data/lib/pdf_paradise/version/version.rb +19 -0
- data/lib/pdf_paradise/www/README.md +2 -0
- data/lib/pdf_paradise/www/sinatra/app.rb +276 -0
- data/lib/pdf_paradise/yaml/working_on_these_pdf_files.yml +4 -0
- data/lib/pdf_paradise.rb +5 -0
- data/pdf_paradise.gemspec +62 -0
- data/test/testing_pdf_paradise.rb +9 -0
- metadata +230 -0
checksums.yaml
ADDED
@@ -0,0 +1,7 @@
|
|
1
|
+
---
|
2
|
+
SHA256:
|
3
|
+
metadata.gz: 0657616fc38dfa2075ff3360f06cde561b9a202ec338a16c801cdc3682be9317
|
4
|
+
data.tar.gz: 579ce2ef9367f0415ae11785fbd69a0d1dc740c85a4d5a73d6a198eeb0a0e585
|
5
|
+
SHA512:
|
6
|
+
metadata.gz: e813b3337e4d0467da83209419d99e71686393883d211de54adc82da66d08f7d2d8a83694d42853761a0df3017914026c3a3bc4e21fc4ee5fb9b4d880d9ba626
|
7
|
+
data.tar.gz: 22f3af097b63260582116df4e99e8d5968d7c241c581a13b4d7867ff445ebb1438b883091182c1ee38c2f9023c2905704c632c2b684d45bae2fb65a266ca35eb
|
data/README.md
ADDED
@@ -0,0 +1,652 @@
|
|
1
|
+
[![forthebadge](http://forthebadge.com/images/badges/built-with-love.svg)](https://www.gobolinux.org/)
|
2
|
+
[![forthebadge](http://forthebadge.com/images/badges/made-with-ruby.svg)](https://www.ruby-lang.org/en/)
|
3
|
+
[![Gem Version](https://badge.fury.io/rb/pdf_paradise.svg)](https://badge.fury.io/rb/pdf_paradise)
|
4
|
+
|
5
|
+
This gem was <b>last updated</b> on the <span style="color: darkblue; font-weight: bold">20.05.2022</span> (dd.mm.yyyy notation), at <span style="color: steelblue; font-weight: bold">06:33:49</span> o'clock.
|
6
|
+
|
7
|
+
<img src="https://i.imgur.com/unhKNEw.png" style="margin-left: 2em">
|
8
|
+
|
9
|
+
This project can help with pdf-related activities, such as extracting
|
10
|
+
a .pdf page, converting a .pdf page, merging .pdf files, splitting
|
11
|
+
.pdf files, setting the title of a .pdf page and similar actions.
|
12
|
+
|
13
|
+
The project has to remain quite flexible. We may use external
|
14
|
+
programs such as **ghostscript** or **qpdf**, or we may use pure
|
15
|
+
ruby solutions, such as via the gem called **combine_pdf**,
|
16
|
+
**prawn** or **hexapdf**.
|
17
|
+
|
18
|
+
The file here (README.gen, respectively the generated file called
|
19
|
+
**README.md**), will describe some of the components that make
|
20
|
+
up this gem.
|
21
|
+
|
22
|
+
## Rationale for making use of separate pdf-related projects
|
23
|
+
|
24
|
+
There are many pdf-related tools if you look on the www. For
|
25
|
+
example, we have prawn, we have qpdf, we have calibre, we
|
26
|
+
have hexapdf, we have ghostscript, and many more applications.
|
27
|
+
|
28
|
+
Some of these have unique features; and some of them have overlapping
|
29
|
+
functionality, such as reading the content of .pdf files in a
|
30
|
+
simplified manner (number of pages, title, author and so forth).
|
31
|
+
|
32
|
+
The PdfParadise project attempts to support as many different
|
33
|
+
(open-source) projects as possible. It is also acceptable to
|
34
|
+
support closed source projects, provided that **the code remains
|
35
|
+
simple** (and simple to change). The primary focus is on
|
36
|
+
open-source projects, though.
|
37
|
+
|
38
|
+
Why does the PdfParadise project attempt to support many different
|
39
|
+
pdf-related projects?
|
40
|
+
|
41
|
+
The answer to this question is rather simple: on Linux I have a lot of
|
42
|
+
flexibility and can use literally any pdf-related project just fine. On
|
43
|
+
Windows, however, I am more restricted in what I can use. Not all
|
44
|
+
programs are available on windows or can be easily compiled there. Thus,
|
45
|
+
in order to allow the pdf_paradise .gem to work on windows, we need
|
46
|
+
this flexibility.
|
47
|
+
|
48
|
+
The reason why I added this subsection here in June 2021 was that
|
49
|
+
I am slowly changing the sinatra-related part of the PdfParadise
|
50
|
+
project, in order to embed the functionality into my main controller
|
51
|
+
which is handled by the **Roebe** namespace. In that controller
|
52
|
+
I wanted to easily offer pdf-related functionality "out of the
|
53
|
+
box" when I start the sinatra-application on windows. Because I
|
54
|
+
want to be able to offer pdf-related modifications on windows
|
55
|
+
as well, the PdfParadise project had to become more flexible,
|
56
|
+
so that a simple toplevel route, such as **/pdf**, will work
|
57
|
+
properly, and lead to entry points (subroutes) that allow
|
58
|
+
us to tap into the features offered by the PdfParadise project.
|
59
|
+
|
60
|
+
So, the **summary** is: the PdfParadise project must remain
|
61
|
+
flexible in order to support a proper workflow on windows
|
62
|
+
systems as well. (We could use WSL on windows, but not every
|
63
|
+
computer has this available, so I am targeting "vanilla"
|
64
|
+
windows really.)
|
65
|
+
|
66
|
+
Note that one slight drawback is that the sinatra part of
|
67
|
+
the PdfParadise project now has a dependency on the
|
68
|
+
**cyberweb** project, so if you want to use that, you also
|
69
|
+
have to install the cyberweb gem. This is a trade-off - for me
|
70
|
+
the more important part is long-term maintainability of
|
71
|
+
the pdf_paradise project, so a unified
|
72
|
+
code base had to be used in this regard.
|
73
|
+
|
74
|
+
## Converting a .pdf file to text
|
75
|
+
|
76
|
+
Sometimes you may wish to have a text-file describing the content
|
77
|
+
of a .pdf file, rather than the .pdf file itself.
|
78
|
+
|
79
|
+
Via class **PdfParadise::ConvertPdfToText**, residing in the file
|
80
|
+
at **pdf_paradise/convert_pdf_to_text.rb**, you can convert a
|
81
|
+
.pdf file to a text file.
|
82
|
+
|
83
|
+
Usage example from ruby, for the file called **foobar.pdf**:
|
84
|
+
|
85
|
+
PdfParadise::ConvertPdfToText.new(ARGV)
|
86
|
+
PdfParadise::ConvertPdfToText.new('foobar.pdf')
|
87
|
+
|
88
|
+
You can also use the bin/ file from the commandline:
|
89
|
+
|
90
|
+
convert_pdf_to_text
|
91
|
+
convert_pdf_to_text foobar.pdf
|
92
|
+
|
93
|
+
There is also a ruby-gtk3 widget that offers the functionality
|
94
|
+
from class **PdfParadise::ConvertPdfToText**, if the user
|
95
|
+
has gtk3 installed and the ruby-bindings to it as well.
|
96
|
+
|
97
|
+
You can start that ruby-gtk3 widget via:
|
98
|
+
|
99
|
+
convert_pdf_to_text --gui
|
100
|
+
|
101
|
+
## Commandline usage
|
102
|
+
|
103
|
+
You can use the **pdf_paradise** gem from the commandline, as
|
104
|
+
the example above shows.
|
105
|
+
|
106
|
+
For instance, say that you wish to modify **the title of a .pdf
|
107
|
+
file**, you can use a commandline invocation in
|
108
|
+
this way:
|
109
|
+
|
110
|
+
pdf_paradise --use-this-pdf-file=location_to_your_pdf_file.pdf --set_title="The title you want to use goes in here."
|
111
|
+
|
112
|
+
You can also **shrink** a .pdf file, by using the commandline
|
113
|
+
switch <b>--shrink-pdf-size-of=foobar.pdf</b> or just
|
114
|
+
<b>--shrink</b>, such as:
|
115
|
+
|
116
|
+
pdf_paradise --shrink-pdf-size-of=foobar.pdf
|
117
|
+
pdf_paradise --shrink=foobar.pdf
|
118
|
+
|
119
|
+
The <b>shrink</b> functionality is contained in the module-method
|
120
|
+
<b>PdfParadise.reduce_size_of_this_pdf_file()</b>.
|
121
|
+
|
122
|
+
## Storing the .pdf pages that are currently open
|
123
|
+
|
124
|
+
If you need to store the .pdf files that are currently open,
|
125
|
+
you can use the following commandline to do so:
|
126
|
+
|
127
|
+
pdf_paradise --store-open-pdf-files
|
128
|
+
|
129
|
+
This will attempt to store the full path to the .pdf files
|
130
|
+
into a local file. That way you may also be able to batch-open
|
131
|
+
these .pdf files at a later time, e. g. when you switch your
|
132
|
+
window manager or after a reboot.
|
133
|
+
|
134
|
+
## Deleting the last or the first page of a .pdf file
|
135
|
+
|
136
|
+
You can use **class DeleteLastPageOfThisPdfFile**, more
|
137
|
+
accurately called **class PdfParadise::DeleteLastPageOfThisPdfFile**,
|
138
|
+
to ***delete the last page in a .pdf file***.
|
139
|
+
|
140
|
+
In ruby code, you can invoke this like so:
|
141
|
+
|
142
|
+
require 'pdf_paradise'
|
143
|
+
|
144
|
+
PdfParadise::DeleteLastPageOfThisPdfFile.new('path_to_the_pdf_file/goes_in_here.pdf')
|
145
|
+
|
146
|
+
or shorter:
|
147
|
+
|
148
|
+
require 'pdf_paradise'
|
149
|
+
|
150
|
+
PdfParadise.delete_last_page_of_this_pdf_file('foobar.pdf')
|
151
|
+
|
152
|
+
A very similar API exists for deleting the first page of a given .pdf
|
153
|
+
file, too.
|
154
|
+
|
155
|
+
See:
|
156
|
+
|
157
|
+
In ruby code, you can invoke this like so:
|
158
|
+
|
159
|
+
require 'pdf_paradise'
|
160
|
+
|
161
|
+
PdfParadise::DeleteFirstPageOfThisPdfFile.new('path_to_the_pdf_file/goes_in_here.pdf')
|
162
|
+
|
163
|
+
or shorter:
|
164
|
+
|
165
|
+
require 'pdf_paradise'
|
166
|
+
|
167
|
+
PdfParadise.delete_first_page_of_this_pdf_file('foobar.pdf')
|
168
|
+
|
169
|
+
## Converting markdown .md files to .pdf files
|
170
|
+
|
171
|
+
If you use kramdown, prawn and kramdown-pdf-converter, then you
|
172
|
+
can convert .md files on the commandline, via:
|
173
|
+
|
174
|
+
convert_markdown_to_pdf path_to_markdown_file_goes_here.md
|
175
|
+
|
176
|
+
Install the necessary gems prior to using this commandline
|
177
|
+
functionality.
|
178
|
+
|
179
|
+
## sinatra interface
|
180
|
+
|
181
|
+
As of April 2019 there is a minimal sinatra interface to the
|
182
|
+
PdfParadise project. Consider this incomplete <b>work-in-progress</b>.
|
183
|
+
|
184
|
+
To start it, try:
|
185
|
+
|
186
|
+
pdf_paradise --sinatra
|
187
|
+
|
188
|
+
## Querying the title of a .pdf file
|
189
|
+
|
190
|
+
<b>class PdfParadise::QueryPdfTitle</b> will report the title of
|
191
|
+
any .pdf file that is passed into it, on the commandline.
|
192
|
+
|
193
|
+
This currently depends on <b>exiftool</b> but at a later time,
|
194
|
+
this may change to also allow a query via prawn or other tools.
|
195
|
+
|
196
|
+
If you need to determine whether a given .pdf file has a title
|
197
|
+
or whether it does not, you can use
|
198
|
+
<b>PdfParadise.does_this_pdf_file_have_a_title?</b>, such
|
199
|
+
as in:
|
200
|
+
|
201
|
+
PdfParadise.does_this_pdf_file_have_a_title? "foobar.pdf" # => true
|
202
|
+
|
203
|
+
This method will return **true** if the .pdf file at hand has a
|
204
|
+
title; and **false** otherwise.
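
A minimal sketch of how such a title check could work on top of **exiftool**, assuming the title is read from the `Title : ...` line that exiftool prints. The helper names `title_from_exiftool_output` and `title?`, as well as the sample output, are hypothetical and only serve as illustration; the actual code in the gem may work differently.

```ruby
# Hypothetical helper: extract the title from the text printed by
# `exiftool -Title some_file.pdf`. Returns nil if there is no title.
def title_from_exiftool_output(output)
  line = output.each_line.find { |candidate| candidate.match?(/\ATitle\s*:/) }
  return nil unless line

  # Split only at the first colon, so that titles containing ':' survive.
  title = line.split(':', 2).last.strip
  title.empty? ? nil : title
end

# Hypothetical helper: true if the exiftool output contains a title.
def title?(output)
  !title_from_exiftool_output(output).nil?
end

# Made-up sample output, so that neither exiftool nor a .pdf file is needed.
sample = "Title                           : Ruby in a Nutshell\n"
puts title_from_exiftool_output(sample) # prints Ruby in a Nutshell
puts title?("Producer : foo\n")          # prints false
```

With exiftool installed, the input could come from something like `Open3.capture2('exiftool', '-Title', 'foobar.pdf')`.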
|
205
|
+
|
206
|
+
## Determining how many pages a given .pdf file has
|
207
|
+
|
208
|
+
class **PdfParadise::PdfFileNTotalPages** can be used to query
|
209
|
+
how many pages a given .pdf file has.
|
210
|
+
|
211
|
+
The executable called **bin/n_pages** (thus, **n_pages**) can
|
212
|
+
be used to query this, on the commandline.
|
213
|
+
|
214
|
+
Example:
|
215
|
+
|
216
|
+
n_pages foobar.pdf
|
217
|
+
|
218
|
+
Do note that the class requires the external program
|
219
|
+
called **pdfinfo**.
|
220
|
+
|
221
|
+
It is possible to query the number of pages in a given .pdf
|
222
|
+
file without **pdfinfo**, but some .pdf files are a bit buggy,
|
223
|
+
and **pdfinfo** is simply more reliable than the regex that
|
224
|
+
was used until March 2020. So, since March 2020, the program
|
225
|
+
**pdfinfo** is now used by default. Note that pdfinfo is
|
226
|
+
part of the poppler software suite.
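
A minimal sketch of how the page count can be read from what **pdfinfo** prints, assuming the output contains a line of the form `Pages: N`. The helper name `parse_page_count` and the sample output are hypothetical; this is not necessarily how class PdfParadise::PdfFileNTotalPages does it internally.

```ruby
# Hypothetical helper: extract the page count from the text that
# `pdfinfo some_file.pdf` prints. Returns nil if no "Pages:" line exists.
def parse_page_count(pdfinfo_output)
  match = pdfinfo_output.match(/^Pages:\s+(\d+)\s*$/)
  match ? match[1].to_i : nil
end

# Abbreviated, made-up sample of pdfinfo output, for illustration only.
sample_output = <<~OUTPUT
  Title:          Example
  Pages:          12
  Encrypted:      no
OUTPUT

puts parse_page_count(sample_output) # prints 12
```

With pdfinfo installed, the real output could be obtained via `Open3.capture2('pdfinfo', 'foobar.pdf')` from the ruby standard library.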
|
227
|
+
|
228
|
+
You can also use the following toplevel API for this:
|
229
|
+
|
230
|
+
PdfParadise.n_pages? 'THE_PATH_TO_THE_PDF_FILE_GOES_IN_HERE.pdf'
|
231
|
+
PdfParadise.n_pages? 'foobar.pdf'
|
232
|
+
|
233
|
+
## Adding page numbers to .pdf files
|
234
|
+
|
235
|
+
Via the combine_pdf gem it is now possible to add page numbers
|
236
|
+
to .pdf files. This has a few limitations for complex .pdf files,
|
237
|
+
due to combine_pdf having limitations in turn - but for simple
|
238
|
+
.pdf files this should work really well.
|
239
|
+
|
240
|
+
How to use that functionality?
|
241
|
+
|
242
|
+
Consider using the following toplevel API:
|
243
|
+
|
244
|
+
PdfParadise.number_pages('this_file.pdf')
|
245
|
+
|
246
|
+
The file called **this_file.pdf** has to exist in order for
|
247
|
+
this to work, of course.
|
248
|
+
|
249
|
+
The current default is to display the page numbers on the bottom
|
250
|
+
right side. This is hardcoded, but you could modify the code
|
251
|
+
to adapt to your needs; see also how combine_pdf does this.
|
252
|
+
(You have to pass an option-hash.)
|
253
|
+
|
254
|
+
## Various GUI components of the PdfParadise project
|
255
|
+
|
256
|
+
The **PdfParadise project** comes with some ruby-gtk3 specific
|
257
|
+
GUIs, but a few ruby-gtk2 and ruby-tk bindings may exist
|
258
|
+
as well. The **ruby-gtk3** components constitute the main GUI
|
259
|
+
elements of this project, though.
|
260
|
+
|
261
|
+
You can start, from the commandline, the gtk-wrapper
|
262
|
+
over the **split_pdf_file** functionality.
|
263
|
+
|
264
|
+
In order to do this, do either one of the following:
|
265
|
+
|
266
|
+
pdf_paradise --gui
|
267
|
+
pdf_paradise --gtk
|
268
|
+
|
269
|
+
This will require the **gtk_paradise** project and the gtk
|
270
|
+
bindings, so quite a lot. **gem install gtk3** and
|
271
|
+
**gem install gtk_paradise** should help.
|
272
|
+
|
273
|
+
The GUI for class SplitPdfFile is called **PdfParadise::Gtk::SplitPdfFile**.
|
274
|
+
The idea behind it is to allow you to determine some of the parameters
|
275
|
+
in a graphical fashion.
|
276
|
+
|
277
|
+
As of **September 2019**, there is also a mini-widget for quickly
|
278
|
+
removing the first page of a .pdf file. This is really minimal right
|
279
|
+
now and not very elegant; it may be improved in the future, but for
|
280
|
+
the time being it is what it is. It is more a proof-of-concept that
|
281
|
+
it can work.
|
282
|
+
|
283
|
+
You can start this via:
|
284
|
+
|
285
|
+
require 'pdf_paradise/gui/gtk2/remove_first_page_of_pdf_file.rb'
|
286
|
+
|
287
|
+
PdfParadise.start_gtk_gui_remove_first_page_of_pdf_file
|
288
|
+
|
289
|
+
Note that as of **January 2021** the gtk bindings will default to
|
290
|
+
**ruby-gtk3**. Support for ruby-gtk2 will be retained, though,
|
291
|
+
but new code may not necessarily be written with ruby-gtk2 in
|
292
|
+
mind. ruby-gtk3 is now the main GUI target for this project.
|
293
|
+
|
294
|
+
I am slowly porting the individual widgets.
|
295
|
+
|
296
|
+
The following widgets have been ported so far:
|
297
|
+
|
298
|
+
PdfParadise::GUI::Gtk::StatisticsWidget # can be found under pdf_paradise/gui/gtk3/statistics_widget/statistics_widget.rb
|
299
|
+
|
300
|
+
## Specification of the .pdf format
|
301
|
+
|
302
|
+
This subsection is a stub - I only needed it to gather information
|
303
|
+
about the .pdf specification. This is NOT complete - it only shall
|
304
|
+
contain some useful information and snippets about the .pdf
|
305
|
+
specification.
|
306
|
+
|
307
|
+
PDF is short for **Portable Document Format**.
|
308
|
+
|
309
|
+
PDF was standardized as **ISO 32000** in the year **2008**.
|
310
|
+
|
311
|
+
In the pdf-specification we can distinguish these entities:
|
312
|
+
|
313
|
+
Objects: these are not objects in the OOP sense, but simply the
|
314
|
+
basic data type of the PDF standard. There are 9 types of objects:
|
315
|
+
null, boolean, integer, real, name, string, array, dictionary and
|
316
|
+
stream.
|
317
|
+
|
318
|
+
Dictionary: this is an unordered collection of key-value pairs. It is
|
319
|
+
denoted by << and >> at the beginning and the end.
|
320
|
+
|
321
|
+
Indirect Objects: these are objects that are referred to by
|
322
|
+
reference.
|
323
|
+
|
324
|
+
Direct Objects: these are objects that appear inline and are
|
325
|
+
obtained directly.
|
326
|
+
|
327
|
+
Conforming Reader: is an application that parses a PDF
|
328
|
+
file according to the PDF Standard.
|
329
|
+
|
330
|
+
A .pdf file is made up of a specific structure, usually a four-part
|
331
|
+
layout.
|
332
|
+
|
333
|
+
These four parts are:
|
334
|
+
|
335
|
+
Header
|
336
|
+
Body
|
337
|
+
Cross-reference table
|
338
|
+
Trailer
|
339
|
+
|
340
|
+
### The .pdf Header tag
|
341
|
+
|
342
|
+
The header may begin with an entry such as **%PDF-1.7**.
|
343
|
+
|
344
|
+
The general format for the header is:
|
345
|
+
|
346
|
+
%PDF- followed by the version number in the form of 1.N.
|
347
|
+
|
348
|
+
This is not valid for all .pdf files, though. Past PDF Version
|
349
|
+
1.4, the **Version** entry in the document's catalog dictionary,
|
350
|
+
which is within the **Root** entry of the **Trailer**, may be
|
351
|
+
used instead of the Header - **if present**.
|
352
|
+
|
353
|
+
If a .pdf file contains binary data - which most PDF files
|
354
|
+
will do nowadays, such as **stream objects** - then the
|
355
|
+
**Header** line shall be immediately followed by a line
|
356
|
+
containing at least **four binary characters**. These
|
357
|
+
are character codes of 128 or greater.
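
The two header rules above can be checked with a few lines of ruby. This is only a sketch: the helper name `inspect_pdf_header` is hypothetical, the sample bytes are made up, and the check for a leading `%` on the second line is my own assumption that the binary marker is written as a comment line.

```ruby
# Hypothetical helper: inspect the first two lines of a PDF byte string.
# Returns the header version and whether the second line is a comment
# carrying at least four bytes with a value of 128 or greater.
def inspect_pdf_header(bytes)
  data = bytes.dup.force_encoding(Encoding::BINARY)
  first_line, second_line = data.split(/\r\n|\r|\n/, 3)
  version = first_line.to_s[/\A%PDF-(\d\.\d)/, 1]
  high_bytes = second_line.to_s.bytes.count { |byte| byte >= 128 }
  binary_marker = second_line.to_s.start_with?('%') && high_bytes >= 4
  { version: version, binary_marker: binary_marker }
end

# A made-up header, built in memory so that no .pdf file is required.
sample = "%PDF-1.7\n%\xE2\xE3\xCF\xD3\n1 0 obj\n".b
result = inspect_pdf_header(sample)
puts result[:version]        # prints 1.7
puts result[:binary_marker]  # prints true
```

Note that this only looks at the Header; as described above, a **Version** entry in the catalog dictionary may override it, which this sketch does not handle.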
|
358
|
+
|
359
|
+
### The .pdf Body tag
|
360
|
+
|
361
|
+
The body of a PDF File consists of these aforementioned **Indirect
|
362
|
+
Objects** representing the contents of a document.
|
363
|
+
|
364
|
+
**Indirect Objects** begin with a **unique object identifier**
|
365
|
+
that allows other objects to refer to them.
|
366
|
+
|
367
|
+
That identifier is made up of the following two components:
|
368
|
+
|
369
|
+
(1) Object Number: a positive Integer, can be in any arbitrary order
|
370
|
+
(2) Generation Number: a non-negative Integer
|
371
|
+
|
372
|
+
The **Indirect Objects** can be referred to from elsewhere by an
|
373
|
+
Indirect Reference. This must consist of:
|
374
|
+
|
375
|
+
Object Number
|
376
|
+
Generation Number, and
|
377
|
+
keyword R # for instance: 4 0 R
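
The three components listed above can be picked out of a text fragment with a small regex. This is a deliberately naive sketch; the helper name `indirect_references` is hypothetical, and a real parser would have to skip strings and stream data, which this one does not.

```ruby
# Hypothetical helper: find all indirect references of the form
# "<object number> <generation number> R" in a piece of PDF text.
def indirect_references(text)
  text.scan(/(\d+)\s+(\d+)\s+R\b/).map do |object_number, generation_number|
    { object_number: object_number.to_i, generation_number: generation_number.to_i }
  end
end

indirect_references('/Kids [ 4 0 R ] /Pages 1 0 R').each do |reference|
  puts "object #{reference[:object_number]}, generation #{reference[:generation_number]}"
end
```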
|
378
|
+
|
379
|
+
After the identifier is the keyword **obj** (start of the object)
|
380
|
+
and **endobj** (end of the object). Anything in between
|
381
|
+
is a key-value pair that describes the object.
|
382
|
+
|
383
|
+
A simple example showing the use of **Indirect Objects** will be
|
384
|
+
shown next:
|
385
|
+
|
386
|
+
1 0 obj % Object Number 1, Generation Number 0
|
387
|
+
<<
|
388
|
+
/Type /Pages % Describe type of object
|
389
|
+
/Kids [ 4 0 R ] % Kids Entry referring to an indirect reference (Object number 4, Generation number 0)
|
390
|
+
/Count 1
|
391
|
+
>>
|
392
|
+
endobj
|
393
|
+
|
394
|
+
2 0 obj % Object Number 2, Generation Number 0
|
395
|
+
<<
|
396
|
+
/Type /Catalog % Describe type of object
|
397
|
+
/Pages 1 0 R % Referring another object via unique object identifier
|
398
|
+
>>
|
399
|
+
endobj
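
To illustrate the **obj** / **endobj** framing, here is a small sketch that lists the objects defined in the example above together with their /Type entry. The helper name `indirect_objects` is hypothetical and the regex-based approach is an assumption for simple, uncompressed text only; it is not a real PDF parser.

```ruby
# Hypothetical helper: list the indirect objects defined in a PDF fragment,
# together with the value of their /Type entry (nil if there is none).
def indirect_objects(text)
  text.scan(/^(\d+)\s+(\d+)\s+obj\b(.*?)\bendobj\b/m).map do |number, generation, content|
    { number: number.to_i, generation: generation.to_i, type: content[%r{/Type\s*/(\w+)}, 1] }
  end
end

# The same two objects as in the example above, without the comments.
body = <<~PDF
  1 0 obj
  <<
  /Type /Pages
  /Kids [ 4 0 R ]
  /Count 1
  >>
  endobj

  2 0 obj
  <<
  /Type /Catalog
  /Pages 1 0 R
  >>
  endobj
PDF

indirect_objects(body).each do |object|
  puts "#{object[:number]} #{object[:generation]} obj is of type #{object[:type]}"
end
```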
|
400
|
+
|
401
|
+
The **Body** section of a .pdf file is thus a tree of objects that
|
402
|
+
are linked together, ultimately coming down to the Root Object
|
403
|
+
(Defined by the **Root** entry in the **Trailer** section, as a
|
404
|
+
catalog dictionary).
|
405
|
+
|
406
|
+
The **Cross-Reference Table** is a table that contains a list of byte
|
407
|
+
offsets pointing to the indirect objects.
|
408
|
+
|
409
|
+
A pdf-conforming reader uses the Cross-Reference Table as a lookup
|
410
|
+
table to access certain objects quickly when needed.
|
411
|
+
|
412
|
+
The format for entries in Cross-Reference Table can be summarized as
|
413
|
+
follows:
|
414
|
+
|
415
|
+
- In the following format nnnnnnnnnn ggggg n eol, a total of 20 bytes
|
416
|
+
- nnnnnnnnnn is a 10-digit byte offset in the decoded stream
|
417
|
+
- ggggg 5-digit generation number
|
418
|
+
- n keyword for in-use entry or f keyword for free entry
|
419
|
+
- eol 2 character end-of-line sequence (Like CR LF)
|
420
|
+
|
421
|
+
The **Cross-Reference Table** always begins with the special entry
|
422
|
+
**0000000000 65535** - see the following example:
|
423
|
+
|
424
|
+
0000000000 65535 f % special entry, f denoting it is a free entry
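
The 20-byte entry format described above can be parsed as follows. This is a sketch under assumptions: the helper name `parse_xref_entry` is hypothetical, and besides CR LF it also accepts a space followed by a single CR or LF as the 2 character end-of-line sequence.

```ruby
# Hypothetical helper: parse one 20-byte cross-reference entry of the form
# "nnnnnnnnnn ggggg n<eol>" (or with f instead of n for a free entry).
def parse_xref_entry(entry)
  raise ArgumentError, 'an entry must be exactly 20 bytes long' unless entry.bytesize == 20

  match = entry.match(/\A(\d{10}) (\d{5}) ([nf])(?: \r| \n|\r\n)\z/)
  raise ArgumentError, 'malformed cross-reference entry' unless match

  { offset: match[1].to_i, generation: match[2].to_i, in_use: match[3] == 'n' }
end

# The special first entry shown above, terminated by CR LF.
entry = parse_xref_entry("0000000000 65535 f\r\n")
puts entry[:offset]      # prints 0
puts entry[:generation]  # prints 65535
puts entry[:in_use]      # prints false
```

For free entries the first field is not really a byte offset; the sketch simply returns the number as it is and leaves the interpretation to the caller.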
|
425
|
+
|
426
|
+
## Graphical User Interfaces (GUIs)
|
427
|
+
|
428
|
+
The pdf_paradise gem comes with a few, small-ish widgets, primarily
|
429
|
+
written in ruby-gtk. As of August 2021 I am also experimenting
|
430
|
+
with libui but this is a slow process - stay tuned for more updates
|
431
|
+
in the coming months in this regard.
|
432
|
+
|
433
|
+
One big advantage of libui is that it works on windows out-of-the-box,
|
434
|
+
so we can use GUIs on windows as well. \o/
|
435
|
+
|
436
|
+
## Compressing a .pdf file (optimize the size of a .pdf file)
|
437
|
+
|
438
|
+
Sometimes you may have to reduce the filesize of a given .pdf
|
439
|
+
file, such as when you need to upload a .pdf file, and there
|
440
|
+
is some file size limit otherwise. This happened to me a few
|
441
|
+
times when using webmail-based email services, where an
|
442
|
+
automatic notice was generated when the .pdf file was too
|
443
|
+
large, e. g. above 25MB in size or something similar.
|
444
|
+
|
445
|
+
So, let us now assume that you **do** have a use case such
|
446
|
+
as described above, or any other use case - you want to
|
447
|
+
reduce the file size of a given .pdf file at hand.
|
448
|
+
|
449
|
+
How can this be done?
|
450
|
+
|
451
|
+
Well, there are several ways. One is to use online-based
|
452
|
+
tools, which tend to work surprisingly well; I verified
|
453
|
+
this in February 2022. But, as far as the gem here is
|
454
|
+
concerned, we will focus primarily on means that can be
|
455
|
+
used by you on your own, without having to depend on
|
456
|
+
external websites. Two methods will be described here -
|
457
|
+
the first one requiring **ghostscript**, the second
|
458
|
+
one requiring **hexapdf**.
|
459
|
+
|
460
|
+
The important parameter with regard to **ghostscript** is
|
461
|
+
the **dPDFSETTINGS** parameter. This one will determine
|
462
|
+
the compression level, which ultimately will affect
|
463
|
+
the quality of the compressed .pdf file.
|
464
|
+
|
465
|
+
Available parameters to **dPDFSETTINGS** include
|
466
|
+
**/screen**, **/ebook**, **/printer**, **/prepress**
|
467
|
+
and **/default**.
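
To make the role of **dPDFSETTINGS** more concrete, here is a sketch that assembles a ghostscript commandline for one of the presets listed above. The helper name `ghostscript_compress_command` and the file names are hypothetical, and the exact set of switches is my assumption of a typical invocation; it is not necessarily the commandline that the gem builds internally.

```ruby
# Hypothetical helper: build (but do not run) a ghostscript commandline
# that rewrites a .pdf file using one of the -dPDFSETTINGS presets.
def ghostscript_compress_command(input, output, preset = 'ebook')
  presets = %w[screen ebook printer prepress default]
  raise ArgumentError, "unknown preset: #{preset}" unless presets.include?(preset)

  [
    'gs',
    '-sDEVICE=pdfwrite',
    "-dPDFSETTINGS=/#{preset}",
    '-dNOPAUSE',
    '-dQUIET',
    '-dBATCH',
    "-sOutputFile=#{output}",
    input
  ]
end

command = ghostscript_compress_command('foobar.pdf', 'foobar_compressed.pdf', 'screen')
puts command.join(' ')
# To actually run it (requires ghostscript to be installed):
#   system(*command)
```

Passing the command as an array to `system` avoids shell quoting problems with file names that contain spaces. On windows the ghostscript binary usually has a different name than `gs`, so the first element may have to be adjusted there.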
|
468
|
+
|
469
|
+
class **PdfParadise::CompressThisPdfFile** can be of
|
470
|
+
help here. Simply pass, as argument to .new(), the path
|
471
|
+
of the local .pdf to that class.
|
472
|
+
|
473
|
+
You can also use a toplevel method if you'd like to:
|
474
|
+
|
475
|
+
require 'pdf_paradise'
|
476
|
+
PdfParadise.compress_this_pdf_file
|
477
|
+
PdfParadise.compress_this_pdf_file('/foobar.pdf')
|
478
|
+
|
479
|
+
The variant using hexapdf is called:
|
480
|
+
|
481
|
+
PdfParadise.compress_via_pdf
|
482
|
+
PdfParadise.compress_via_pdf('foobar.pdf')
|
483
|
+
|
484
|
+
The API name may change at a later point in time; perhaps
|
485
|
+
we will just add a toplevel API called **PdfParadise.compress()**,
|
486
|
+
but for the time being the above APIs will be retained as they
|
487
|
+
are.
|
488
|
+
|
489
|
+
## Storing all open .pdf files in a yaml file
|
490
|
+
|
491
|
+
In **February 2022** the yaml file working_on_these_pdf_files.yml
|
492
|
+
was added at:
|
493
|
+
|
494
|
+
pdf_paradise/yaml/working_on_these_pdf_files.yml
|
495
|
+
|
496
|
+
The idea here is that this yaml-file retains the local path
|
497
|
+
to any .pdf file that the user (in this case me) is working
|
498
|
+
on, aka reading right now.
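
A minimal sketch of the idea, assuming the yaml file simply holds a flat list of paths. The helper names `store_pdf_paths` and `load_pdf_paths` and the example paths are hypothetical; the real layout of working_on_these_pdf_files.yml may differ.

```ruby
require 'yaml'
require 'tmpdir'

# Hypothetical helper: write the given paths into a yaml file,
# dropping duplicate entries.
def store_pdf_paths(yaml_file, pdf_paths)
  File.write(yaml_file, pdf_paths.uniq.to_yaml)
end

# Hypothetical helper: read the paths back; an absent file yields [].
def load_pdf_paths(yaml_file)
  return [] unless File.exist?(yaml_file)

  YAML.safe_load(File.read(yaml_file)) || []
end

# A temporary directory is used so that the example leaves nothing behind.
Dir.mktmpdir do |dir|
  yaml_file = File.join(dir, 'working_on_these_pdf_files.yml')
  store_pdf_paths(yaml_file, ['/home/user/books/foobar.pdf', '/home/user/books/other.pdf'])
  puts load_pdf_paths(yaml_file)
end
```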
|
499
|
+
|
500
|
+
I needed this because I tend to work through .pdf files and
|
501
|
+
remove page after page when I read it. The idea is that
|
502
|
+
I do not lose that information when I reboot my computer
|
503
|
+
or when said computer crashes; I needed to make this
|
504
|
+
persistent information.
|
505
|
+
|
506
|
+
Why is this yaml file part of the pdf_paradise gem, though?
|
507
|
+
This is mostly due to convenience. I wanted to have this
|
508
|
+
available in one of my ruby gems by default. In the long
|
509
|
+
run I will add code that allows other users to adjust
|
510
|
+
this to their own use case (and perhaps in their home
|
511
|
+
directory rather than store this in the gem itself). As
|
512
|
+
of February 2022 code for the latter is currently not
|
513
|
+
part of the gem, but I may add code for this - either
|
514
|
+
in the **pdf_paradise** gem or the **roebe** gem.
|
515
|
+
|
516
|
+
## Converting .jpg files to .pdf files
|
517
|
+
|
518
|
+
If you have a use case to convert several .jpg files into .pdf
|
519
|
+
files then the following commandline example should be
|
520
|
+
helpful:
|
521
|
+
|
522
|
+
convert /path/to/image foobar.pdf
|
523
|
+
convert *.jpg foobar.pdf
|
524
|
+
|
525
|
+
Note that this requires **ImageMagick**. **ImageMagick** is
|
526
|
+
not always perfect; it has a few problems, unfortunately.
|
527
|
+
|
528
|
+
For instance, in <b>April 2022</b> when I tried the above,
|
529
|
+
the image was repeated three times on the x-axis. I do not
|
530
|
+
know why, but that makes **absolutely no sense**. It is just
|
531
|
+
a single image, so why is the resulting .pdf file repeated
|
532
|
+
three times? Perhaps imagemagick's **convert** tool does
|
533
|
+
this automatically, but then I question the default behaviour -
|
534
|
+
**it makes no sense** for the use case I have. One image
|
535
|
+
should be one image, not three images or fifty images.
|
536
|
+
|
537
|
+
In the event that **ImageMagick** does not work very well
|
538
|
+
for your use case, consider using another software suite,
|
539
|
+
such as **img2pdf**.
|
540
|
+
|
541
|
+
The syntax for **img2pdf** goes something like this:
|
542
|
+
|
543
|
+
img2pdf -o document.pdf *jpg
|
544
|
+
|
545
|
+
I liked this, so in **April 2022** this was added to
|
546
|
+
**ImageParadise**. The API for this is as follows:
|
547
|
+
|
548
|
+
ImageParadise.img2pdf('*.jpg') # If a '*' is part of the input Dir[] will be used.
|
549
|
+
|
550
|
+
As that functionality may be useful on the commandline
|
551
|
+
as well, an executable has been added at
|
552
|
+
**bin/imageparadise_img2pdf**. Simply pass the image
|
553
|
+
files that you want to convert.
|
554
|
+
|
555
|
+
Usage example:
|
556
|
+
|
557
|
+
    imageparadise_img2pdf *.jpg
|
558
|
+
|
559
|
+
If you need the images to appear in a particular order then
you may have to list the image files explicitly, in the
order in which they should end up in the .pdf file.
|
562
|
+
|
563
|
+
So for instance:
|
564
|
+
|
565
|
+
    imageparadise_img2pdf image3.jpg image1.jpg image2.png
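
If the file names carry numbers, a plain alphabetical listing typically puts image10.jpg before image2.jpg. The following is a minimal sketch of how the file list could be sorted "naturally" in ruby before it is handed to the converter. The method name `natural_sort` is hypothetical and not part of ImageParadise.

```ruby
# Hypothetical helper; not part of ImageParadise or pdf_paradise.
#
# Sort file names so that embedded numbers are compared numerically,
# e. g. image2.jpg comes before image10.jpg.
def natural_sort(files)
  files.sort_by { |name|
    name.scan(/\d+|\D+/).map { |part|
      # Tag each chunk so that numbers are only compared with numbers
      # and text is only compared with text.
      part.match?(/\A\d+\z/) ? [0, part.to_i] : [1, part.downcase]
    }
  }
end

# Usage:
#   system('imageparadise_img2pdf', *natural_sort(Dir['*.jpg']))
```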
|
566
|
+
|
567
|
+
The only drawback I have found with <b>img2pdf</b> so far
is that you cannot easily add text to an image. This makes
it hard to tell which page stems from which image file. A
workaround for this is to embed the filename into the image
itself, e. g. create temporary annotated images, and then
pack them together via <b>img2pdf</b>.
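
One possible way to create such temporary images is ImageMagick's `-annotate` option. The sketch below only builds the commandline; the method name `annotate_command`, the chosen font size, colours and offset are all assumptions for illustration, and none of this is part of pdf_paradise or ImageParadise.

```ruby
# Hypothetical helper; not part of pdf_paradise or ImageParadise.
#
# Build an ImageMagick commandline that writes the file name of
# the input image near the bottom edge of a copy of that image.
def annotate_command(input, output)
  label = File.basename(input)
  [
    'convert', input,
    '-gravity', 'South',      # anchor the text at the bottom centre
    '-pointsize', '24',
    '-fill', 'white',
    '-undercolor', 'black',   # dark box behind the text for contrast
    '-annotate', '+0+10', label,
    output
  ]
end

# Usage (requires ImageMagick to be installed):
#   system(*annotate_command('image1.jpg', 'tmp_image1.jpg'))
#   system('img2pdf', '-o', 'document.pdf', 'tmp_image1.jpg')
```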
|
573
|
+
|
574
|
+
## Splitting a single .pdf file into several individual .pdf files
|
575
|
+
|
576
|
+
You can use the following toplevel API to split up a single
|
577
|
+
.pdf file into several .pdf files:
|
578
|
+
|
579
|
+
    PdfParadise.burst(ARGV)
    PdfParadise.burst('foobar.pdf')
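
How **PdfParadise.burst** works internally is not shown here. As a rough illustration of what splitting involves, the following is a minimal sketch based on the **hexapdf** gem. It assumes that hexapdf is installed; the method names `burst_filename` and `burst_pdf` and the naming scheme of the output files are hypothetical.

```ruby
# Hypothetical sketch; not the implementation of PdfParadise.burst.

# Name of the output file for a given page, e. g.
# foobar.pdf, page 3 -> foobar_page_003.pdf
def burst_filename(input, page_number)
  format('%s_page_%03d.pdf', File.basename(input, '.pdf'), page_number)
end

# Write every page of the input file into a .pdf file of its own,
# in the current working directory.
def burst_pdf(input)
  require 'hexapdf' # loaded lazily; needs the hexapdf gem
  source = HexaPDF::Document.open(input)
  source.pages.count.times { |index|
    target = HexaPDF::Document.new
    target.pages << target.import(source.pages[index])
    target.write(burst_filename(input, index + 1), optimize: true)
  }
end

# Usage:
#   burst_pdf('foobar.pdf')
```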
|
581
|
+
|
582
|
+
## Merging pdf files
|
583
|
+
|
584
|
+
<b>class PdfParadise::MergePdf.new(ARGV)</b> can be used for
|
585
|
+
<b>merging .pdf files</b>. This functionality depends on
|
586
|
+
external software, so you have to install this first.
|
587
|
+
|
588
|
+
Currently <b>ghostscript</b> and <b>hexapdf</b> can be used for
|
589
|
+
the <b>merging</b> step.
|
590
|
+
|
591
|
+
Examples of how to use either of these two variants, as
far as <b>class PdfParadise::MergePdf</b> is concerned,
follow next:
|
594
|
+
|
595
|
+
    mergepdf one.pdf two.pdf --use-ghostscript
    mergepdf one.pdf two.pdf --use-hexapdf
|
597
|
+
|
598
|
+
(The two leading hyphens, --, are mandatory for commandline
options right now; otherwise the argument is assumed to be
a locally existing .pdf file.)
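
The rule in the parenthesis above can be sketched as follows. This is only an illustration of the idea, not the code used by <b>class PdfParadise::MergePdf</b>; the method name `partition_arguments` is hypothetical.

```ruby
# Hypothetical helper; not part of pdf_paradise.
#
# Everything that starts with two hyphens is treated as an option;
# everything else is assumed to be a path to a .pdf file.
def partition_arguments(argv)
  options, files = argv.partition { |argument| argument.start_with?('--') }
  { options: options, files: files }
end

# Usage:
#   partition_arguments(ARGV)
```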
|
601
|
+
|
602
|
+
If you need to do this from within ruby code, consider
|
603
|
+
using the following code:
|
604
|
+
|
605
|
+
    require 'pdf_paradise'
    merge_pdf = PdfParadise::MergePdf.new('one.pdf two.pdf')
    merge_pdf.feedback_where_it_is_stored # Call it manually.
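
For reference, merging via <b>ghostscript</b> usually boils down to a single `gs` call with the `pdfwrite` device. The sketch below builds such a commandline. Whether <b>class PdfParadise::MergePdf</b> uses exactly these flags is an assumption; the method name `ghostscript_merge_command` and the default output name are hypothetical.

```ruby
# Hypothetical helper; not part of pdf_paradise.
#
# Build a ghostscript commandline that merges the given .pdf
# files, in the given order, into one output file.
def ghostscript_merge_command(inputs, output = 'merged.pdf')
  [
    'gs', '-q',
    '-dBATCH', '-dNOPAUSE',   # run non-interactively and exit when done
    '-sDEVICE=pdfwrite',
    "-sOutputFile=#{output}",
    *inputs
  ]
end

# Usage (requires ghostscript to be installed):
#   system(*ghostscript_merge_command(%w[one.pdf two.pdf]))
```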
|
608
|
+
|
609
|
+
## Combining individual pages from .pdf files into a new .pdf file via class PdfParadise::CombineThesePdfPages
|
610
|
+
|
611
|
+
class **PdfParadise::CombineThesePdfPages** can be used to
|
612
|
+
extract individual pdf pages from a given .pdf file and
|
613
|
+
combine these into a new .pdf file.
|
614
|
+
|
615
|
+
There is also an executable at **bin/combine_these_pdf_pages**
|
616
|
+
which can be used on the commandline.
|
617
|
+
|
618
|
+
This functionality depends on the **hexapdf** gem.
|
619
|
+
|
620
|
+
Usage example:
|
621
|
+
|
622
|
+
    combine_these_pdf_pages foobar.pdf 3,4,5
|
623
|
+
|
624
|
+
This would extract pages 3, 4 and 5 and combine them into
a new .pdf file.
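
A minimal sketch of how such a page selection could be done with the **hexapdf** gem follows. It is not the code of **PdfParadise::CombineThesePdfPages**; the method names `parse_page_list` and `combine_pages` are hypothetical, and the sketch assumes that page numbers on the commandline start at 1.

```ruby
# Hypothetical sketch; not the implementation of
# PdfParadise::CombineThesePdfPages.

# Turn a string such as '3,4,5' into [3, 4, 5].
def parse_page_list(string)
  string.split(',').map { |token| Integer(token.strip) }
end

# Copy the given pages (counted from 1) of the input file into
# a new .pdf file.
def combine_pages(input, page_numbers, output)
  require 'hexapdf' # loaded lazily; needs the hexapdf gem
  source = HexaPDF::Document.open(input)
  target = HexaPDF::Document.new
  page_numbers.each { |number|
    page = source.pages[number - 1]
    raise ArgumentError, "No page #{number} in #{input}" unless page
    target.pages << target.import(page)
  }
  target.write(output, optimize: true)
end

# Usage:
#   combine_pages('foobar.pdf', parse_page_list('3,4,5'), 'combined.pdf')
```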
|
626
|
+
|
627
|
+
|
628
|
+
## Contact information
|
629
|
+
|
630
|
+
If your creative mind has ideas and specific suggestions to make this
|
631
|
+
gem more useful in general, feel free to drop me an email at any
|
632
|
+
time, via:
|
633
|
+
|
634
|
+
shevy@inbox.lt
|
635
|
+
|
636
|
+
Before that email I used an email account at Google gmail, but in **2021** I
|
637
|
+
decided to slowly abandon gmail for various reasons. In part this is because
|
638
|
+
the UI annoys me (in non-Chrome browsers loading takes too long), but also
|
639
|
+
because of Google's attempt to establish mass surveillance via its
|
640
|
+
federated cohorts sniffing (FLoC). I do not know what happened at Google,
|
641
|
+
but enough is enough - there is only so much you can take while supporting
|
642
|
+
greed. When it comes to data mining done by private groups, ultimately
|
643
|
+
the user became the product.
|
644
|
+
|
645
|
+
Do keep in mind that responding to emails may take some time,
|
646
|
+
depending on the amount of work I may have at that moment, due
|
647
|
+
to real-life time constraints. I will, however, read feedback
|
648
|
+
eventually. Patches and code changes are welcome too, of course,
|
649
|
+
as long as they are in the spirit of the project at hand, e. g.
|
650
|
+
fitting to the general theme. For this I may make use of github
|
651
|
+
as a discussion site, but this has a low priority right now.
|
652
|
+
|
@@ -0,0 +1,7 @@
|
|
1
|
+
#!/usr/bin/ruby -w
|
2
|
+
# Encoding: UTF-8
|
3
|
+
# frozen_string_literal: true
|
4
|
+
# =========================================================================== #
|
5
|
+
require 'pdf_paradise/utility_scripts/combine_these_pdf_pages.rb'
|
6
|
+
|
7
|
+
PdfParadise::CombineThesePdfPages.new(ARGV)
|