pdf_paradise 0.1.58


Potentially problematic release.

Files changed (107)
  1. checksums.yaml +7 -0
  2. data/README.md +652 -0
  3. data/bin/automatic_pdf_title +7 -0
  4. data/bin/combine_these_pdf_pages +7 -0
  5. data/bin/compress_via_hexapdf +7 -0
  6. data/bin/convert_markdown_to_pdf +7 -0
  7. data/bin/convert_pdf_to_text +7 -0
  8. data/bin/delete_first_page_of_this_pdf_file +7 -0
  9. data/bin/merge_then_open +7 -0
  10. data/bin/n_pages +10 -0
  11. data/bin/open_main_pdf +7 -0
  12. data/bin/pdf_paradise +7 -0
  13. data/bin/set_main_book +7 -0
  14. data/bin/set_title_of_this_pdf_file +15 -0
  15. data/doc/README.gen +624 -0
  16. data/doc/todo/todo.md +7 -0
  17. data/lib/pdf_paradise/base/base.rb +218 -0
  18. data/lib/pdf_paradise/base/colours.rb +29 -0
  19. data/lib/pdf_paradise/commandline/help.rb +69 -0
  20. data/lib/pdf_paradise/commandline/menu.rb +184 -0
  21. data/lib/pdf_paradise/compress/compress_via_hexapdf.rb +27 -0
  22. data/lib/pdf_paradise/compress_this_pdf_file.rb +88 -0
  23. data/lib/pdf_paradise/constants/constants.rb +76 -0
  24. data/lib/pdf_paradise/convert_text_to_pdf.rb +94 -0
  25. data/lib/pdf_paradise/css/project.css +17 -0
  26. data/lib/pdf_paradise/djvu_to_pdf.rb +85 -0
  27. data/lib/pdf_paradise/gui/README.md +6 -0
  28. data/lib/pdf_paradise/gui/fox/split_pdf_file.rb +77 -0
  29. data/lib/pdf_paradise/gui/gtk2/delete_the_first_or_the_last_page_of_this_pdf_file/delete_the_first_or_the_last_page_of_this_pdf_file.rb +39 -0
  30. data/lib/pdf_paradise/gui/gtk2/pdf_viewer/pdf_viewer.rb +34 -0
  31. data/lib/pdf_paradise/gui/gtk2/split_pdf_file/split_pdf_file.rb +34 -0
  32. data/lib/pdf_paradise/gui/gtk2/statistics_widget/statistics_widget.rb +34 -0
  33. data/lib/pdf_paradise/gui/gtk2/to_pdf/to_pdf.rb +32 -0
  34. data/lib/pdf_paradise/gui/gtk3/controller/controller.rb +212 -0
  35. data/lib/pdf_paradise/gui/gtk3/convert_pdf_to_text/convert_pdf_to_text.rb +34 -0
  36. data/lib/pdf_paradise/gui/gtk3/delete_the_first_or_the_last_page_of_this_pdf_file/delete_the_first_or_the_last_page_of_this_pdf_file.rb +39 -0
  37. data/lib/pdf_paradise/gui/gtk3/pdf_viewer/pdf_viewer.rb +34 -0
  38. data/lib/pdf_paradise/gui/gtk3/split_pdf_file/split_pdf_file.rb +34 -0
  39. data/lib/pdf_paradise/gui/gtk3/statistics_widget/statistics_widget.rb +34 -0
  40. data/lib/pdf_paradise/gui/gtk3/to_pdf/to_pdf.rb +32 -0
  41. data/lib/pdf_paradise/gui/libui/statistics_widget/statistics_widget.rb +217 -0
  42. data/lib/pdf_paradise/gui/shared_code/convert_pdf_to_text/convert_pdf_to_text_module.rb +277 -0
  43. data/lib/pdf_paradise/gui/shared_code/delete_the_first_or_the_last_page_of_this_pdf_file/delete_the_first_or_the_last_page_of_this_pdf_file_module.rb +443 -0
  44. data/lib/pdf_paradise/gui/shared_code/pdf_viewer/pdf_viewer.css +5 -0
  45. data/lib/pdf_paradise/gui/shared_code/pdf_viewer/pdf_viewer_module.rb +284 -0
  46. data/lib/pdf_paradise/gui/shared_code/split_pdf_file/split_pdf_file.css +0 -0
  47. data/lib/pdf_paradise/gui/shared_code/split_pdf_file/split_pdf_file_module.rb +294 -0
  48. data/lib/pdf_paradise/gui/shared_code/statistics_widget/statistics_widget_module.rb +349 -0
  49. data/lib/pdf_paradise/gui/shared_code/to_pdf/to_pdf_module.rb +281 -0
  50. data/lib/pdf_paradise/hexapdf/001_rainbow_pattern_example.rb +0 -0
  51. data/lib/pdf_paradise/hexapdf/hexapdf.rb +123 -0
  52. data/lib/pdf_paradise/images/PDF_PARADISE_LOGO.png +0 -0
  53. data/lib/pdf_paradise/main_pdf/main_pdf.rb +445 -0
  54. data/lib/pdf_paradise/merge_pdf/menu.rb +63 -0
  55. data/lib/pdf_paradise/merge_pdf/merge_pdf.rb +298 -0
  56. data/lib/pdf_paradise/merge_pdf_namespace.rb +9 -0
  57. data/lib/pdf_paradise/merge_then_open/merge_then_open.rb +105 -0
  58. data/lib/pdf_paradise/pdf_file_n_total_pages.rb +249 -0
  59. data/lib/pdf_paradise/prawn_addons/README.md +2 -0
  60. data/lib/pdf_paradise/prawn_addons/prawn_addons.rb +17 -0
  61. data/lib/pdf_paradise/project/project.rb +22 -0
  62. data/lib/pdf_paradise/remove_pdf_password.rb +392 -0
  63. data/lib/pdf_paradise/requires/batch_require_toplevel_files.rb +22 -0
  64. data/lib/pdf_paradise/requires/colours.rb +7 -0
  65. data/lib/pdf_paradise/requires/colours_and_esystem_and_save_file_and_fileutils_and_opn.rb +11 -0
  66. data/lib/pdf_paradise/requires/esystem_and_colours.rb +10 -0
  67. data/lib/pdf_paradise/requires/esystem_and_opn_and_colours.rb +8 -0
  68. data/lib/pdf_paradise/requires/require_the_whole_project.rb +28 -0
  69. data/lib/pdf_paradise/requires/require_utility_scripts.rb +9 -0
  70. data/lib/pdf_paradise/set_main_book.rb +156 -0
  71. data/lib/pdf_paradise/set_pdf_title.rb +218 -0
  72. data/lib/pdf_paradise/sinatra/embeddable_interface.rb +318 -0
  73. data/lib/pdf_paradise/toplevel_methods/automatic_pdf_title.rb +55 -0
  74. data/lib/pdf_paradise/toplevel_methods/convert_epub_to_pdf.rb +27 -0
  75. data/lib/pdf_paradise/toplevel_methods/convert_markdown_to_pdf.rb +43 -0
  76. data/lib/pdf_paradise/toplevel_methods/convert_ppt_to_pdf.rb +35 -0
  77. data/lib/pdf_paradise/toplevel_methods/e.rb +16 -0
  78. data/lib/pdf_paradise/toplevel_methods/esystem.rb +19 -0
  79. data/lib/pdf_paradise/toplevel_methods/misc.rb +62 -0
  80. data/lib/pdf_paradise/toplevel_methods/number_pages.rb +38 -0
  81. data/lib/pdf_paradise/toplevel_methods/opened_pdf_files.rb +221 -0
  82. data/lib/pdf_paradise/toplevel_methods/query_pdf_title.rb +191 -0
  83. data/lib/pdf_paradise/toplevel_methods/reduce_size_of_this_pdf_file.rb +43 -0
  84. data/lib/pdf_paradise/toplevel_methods/roebe.rb +17 -0
  85. data/lib/pdf_paradise/toplevel_methods/rotate_pdf_file.rb +146 -0
  86. data/lib/pdf_paradise/toplevel_methods/to_pdf.rb +38 -0
  87. data/lib/pdf_paradise/utility_scripts/README.md +3 -0
  88. data/lib/pdf_paradise/utility_scripts/combine_these_pdf_pages.rb +118 -0
  89. data/lib/pdf_paradise/utility_scripts/convert_pdf_to_text.rb +175 -0
  90. data/lib/pdf_paradise/utility_scripts/delete_first_page_of_this_pdf_file.rb +215 -0
  91. data/lib/pdf_paradise/utility_scripts/delete_last_page_of_this_pdf_file.rb +180 -0
  92. data/lib/pdf_paradise/utility_scripts/delete_this_page_of_this_pdf_file.rb +310 -0
  93. data/lib/pdf_paradise/utility_scripts/extract_pdf_page.rb +280 -0
  94. data/lib/pdf_paradise/utility_scripts/pdf_optimizer.rb +113 -0
  95. data/lib/pdf_paradise/utility_scripts/pdf_statistics.rb +150 -0
  96. data/lib/pdf_paradise/utility_scripts/pdf_to_html.rb +75 -0
  97. data/lib/pdf_paradise/utility_scripts/remove_images.rb +110 -0
  98. data/lib/pdf_paradise/utility_scripts/split_pdf.rb +340 -0
  99. data/lib/pdf_paradise/utility_scripts/to_qdf.rb +82 -0
  100. data/lib/pdf_paradise/version/version.rb +19 -0
  101. data/lib/pdf_paradise/www/README.md +2 -0
  102. data/lib/pdf_paradise/www/sinatra/app.rb +276 -0
  103. data/lib/pdf_paradise/yaml/working_on_these_pdf_files.yml +4 -0
  104. data/lib/pdf_paradise.rb +5 -0
  105. data/pdf_paradise.gemspec +62 -0
  106. data/test/testing_pdf_paradise.rb +9 -0
  107. metadata +230 -0
checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA256:
3
+ metadata.gz: 0657616fc38dfa2075ff3360f06cde561b9a202ec338a16c801cdc3682be9317
4
+ data.tar.gz: 579ce2ef9367f0415ae11785fbd69a0d1dc740c85a4d5a73d6a198eeb0a0e585
5
+ SHA512:
6
+ metadata.gz: e813b3337e4d0467da83209419d99e71686393883d211de54adc82da66d08f7d2d8a83694d42853761a0df3017914026c3a3bc4e21fc4ee5fb9b4d880d9ba626
7
+ data.tar.gz: 22f3af097b63260582116df4e99e8d5968d7c241c581a13b4d7867ff445ebb1438b883091182c1ee38c2f9023c2905704c632c2b684d45bae2fb65a266ca35eb
data/README.md ADDED
@@ -0,0 +1,652 @@
1
+ [![forthebadge](http://forthebadge.com/images/badges/built-with-love.svg)](https://www.gobolinux.org/)
2
+ [![forthebadge](http://forthebadge.com/images/badges/made-with-ruby.svg)](https://www.ruby-lang.org/en/)
3
+ [![Gem Version](https://badge.fury.io/rb/pdf_paradise.svg)](https://badge.fury.io/rb/pdf_paradise)
4
+
5
+ This gem was <b>last updated</b> on the <span style="color: darkblue; font-weight: bold">20.05.2022</span> (dd.mm.yyyy notation), at <span style="color: steelblue; font-weight: bold">06:33:49</span> o'clock.
6
+
7
+ <img src="https://i.imgur.com/unhKNEw.png" style="margin-left: 2em">
8
+
9
+ This project can help with pdf-related activities, such as extracting
10
+ a .pdf page, converting .pdf pages, merging .pdf files, splitting
11
+ .pdf files, setting the title of a .pdf page and similar actions.
12
+
13
+ The project has to remain quite flexible. We may use external
14
+ programs such as **ghostscript** or **qpdf**, or we may use pure
15
+ ruby solutions, such as via the gem called **combine_pdf**,
16
+ **prawn** or **hexapdf**.
17
+
18
+ The file here (README.gen, respectively the generated file called
19
+ **README.md**), will describe some of the components that make
20
+ up this gem.
21
+
22
+ ## Rationale for making use of separate pdf-related projects
23
+
24
+ There are many pdf-related tools if you look on the www. For
25
+ example, we have prawn, we have qpdf, we have calibre, we
26
+ have hexapdf, we have ghostscript, and many more applications.
27
+
28
+ Some of these have unique features; and some of them have overlapping
29
+ functionality, such as reading the content of .pdf files in a
30
+ simplified manner (number of pages, title, author and so forth).
31
+
32
+ The PdfParadise project attempts to support as many different
33
+ (open-source) projects as possible. It is also permissive to
34
+ support closed source projects, provided that **the code remains
35
+ simple** (and simple to change). The primary focus is on
36
+ open-source projects, though.
37
+
38
+ Why does the PdfParadise project attempt to support many different
39
+ pdf-related projects?
40
+
41
+ The answer to this question is rather simple: on Linux I have a lot of
42
+ flexibility and can use literally any pdf-related project just fine. On
43
+ Windows, however, I am more restricted in what I can use. Not all
44
+ programs are available on windows or can be easily compiled there. Thus,
45
+ in order to allow the pdf_paradise .gem to work on windows, we need
46
+ this flexibility.
47
+
48
+ The reason why I added this subsection here in June 2021 was that
49
+ I am slowly changing the sinatra-related part of the PdfParadise
50
+ project, in order to embed the functionality into my main controller
51
+ which is handled by the **Roebe** namespace. In that controller
52
+ I wanted to easily offer pdf-related functionality "out of the
53
+ box" when I start the sinatra-application on windows. Because I
54
+ want to be able to offer pdf-related modifications on windows
55
+ as well, the PdfParadise project had to become more flexible,
56
+ so that a simple toplevel route, such as **/pdf**, will work
57
+ properly, and lead to entry points (subroutes) that allow
58
+ us to tap into the features offered by the PdfParadise project.
59
+
60
+ So, the **summary** is: the PdfParadise project must remain
61
+ flexible in order to support a proper workflow on windows
62
+ systems as well. (We could use WSL on windows, but not every
63
+ computer has this available, so I am targeting "vanilla"
64
+ windows really.)
65
+
66
+ Note that one slight drawback is that the sinatra part of
67
+ the PdfParadise project now has a dependency on the
68
+ **cyberweb** project, so if you want to use that, you also
69
+ have to install the cyberweb gem. This is a trade-off - for me
70
+ the more important part is long-term maintainability of
71
+ the pdf_paradise project, so a unified
72
+ code base had to be used in this regard.
73
+
74
+ ## Converting a .pdf file to text
75
+
76
+ Sometimes you may wish to have a text-file describing the content
77
+ of a .pdf file, rather than the .pdf file itself.
78
+
79
+ Via class **PdfParadise::ConvertPdfToText**, residing in the file
80
+ at **pdf_paradise/convert_pdf_to_text.rb**, you can convert a
81
+ .pdf file to a text file.
82
+
83
+ Usage example from ruby, for the file called **foobar.pdf**:
84
+
85
+ PdfParadise::ConvertPdfToText.new(ARGV)
86
+ PdfParadise::ConvertPdfToText.new('foobar.pdf')
87
+
88
+ You can also use the bin/ file from the commandline:
89
+
90
+ convert_pdf_to_text
91
+ convert_pdf_to_text foobar.pdf
92
+
93
+ There is also a ruby-gtk3 widget that offers the functionality
94
+ from class **PdfParadise::ConvertPdfToText**, if the user
95
+ has gtk3 installed and the ruby-bindings to it as well.
96
+
97
+ You can start that ruby-gtk3 widget via:
98
+
99
+ convert_pdf_to_text --gui
100
+
101
+ ## Commandline usage
102
+
103
+ You can use the **pdf_paradise** gem from the commandline, as
104
+ the example above shows.
105
+
106
+ For instance, say that you wish to modify **the title of a .pdf
107
+ file**, you can use a commandline invocation such as via
108
+ this way:
109
+
110
+ pdf_paradise --use-this-pdf-file=location_to_your_pdf_file.pdf --set_title="The title you want to use goes in here."
111
+
112
+ You can also **shrink** a .pdf file, by using the commandline
113
+ switch <b>--shrink-pdf-size-of=foobar.pdf</b> or just
114
+ <b>--shrink</b>, such as:
115
+
116
+ pdf_paradise --shrink-pdf-size-of=foobar.pdf
117
+ pdf_paradise --shrink=foobar.pdf
118
+
119
+ The <b>shrink</b> functionality is contained in the module-method
120
+ <b>PdfParadise.reduce_size_of_this_pdf_file()</b>.
121
+
122
+ ## Storing the .pdf pages that are currently open
123
+
124
+ If you need to store the .pdf files that are currently open,
125
+ you can use the following commandline to do so:
126
+
127
+ pdfparadise --store-open-pdf-files
128
+
129
+ This will attempt to store the full path to the .pdf files
130
+ into a local file. That way you may also be able to batch-open
131
+ these .pdf files at a later time, e. g. when you switch your
132
+ window manager or after a reboot.
133
+
134
+ ## Deleting the last or the first page of a .pdf file
135
+
136
+ You can use **class DeleteLastPageOfThisPdfFile**, more
137
+ accurately called **class PdfParadise::DeleteLastPageOfThisPdfFile**,
138
+ to ***delete the last page in a .pdf file***.
139
+
140
+ In ruby code, you can invoke this like so:
141
+
142
+ require 'pdf_paradise'
143
+
144
+ PdfParadise::DeleteLastPageOfThisPdfFile.new('path_to_the_pdf_file/goes_in_here.pdf')
145
+
146
+ or shorter:
147
+
148
+ require 'pdf_paradise'
149
+
150
+ PdfParadise.delete_last_page_of_this_pdf_file('foobar.pdf')
151
+
152
+ A very similar API exists for deleting the first page of a given .pdf
153
+ file, too.
154
+
155
+ See:
156
+
157
+ In ruby code, you can invoke this like so:
158
+
159
+ require 'pdf_paradise'
160
+
161
+ PdfParadise::DeleteFirstPageOfThisPdfFile.new('path_to_the_pdf_file/goes_in_here.pdf')
162
+
163
+ or shorter:
164
+
165
+ require 'pdf_paradise'
166
+
167
+ PdfParadise.delete_first_page_of_this_pdf_file('foobar.pdf')
168
+
169
+ ## Converting markdown .md files to .pdf files
170
+
171
+ If you use kramdown, prawn and kramdown-pdf-converter, then you
172
+ can convert .md files on the commandline, via:
173
+
174
+ convert_markdown_to_pdf path_to_markdown_file_goes_here.md
175
+
176
+ Install the necessary gems prior to using this commandline
177
+ functionality.
178
+
179
+ ## sinatra interface
180
+
181
+ As of April 2019 there is a minimal sinatra interface to the
182
+ PdfParadise project. Consider this incomplete <b>work-in-progress</b>.
183
+
184
+ To start it, try:
185
+
186
+ pdf_paradise --sinatra
187
+
188
+ ## Querying the title of a .pdf file
189
+
190
+ <b>class PdfParadise::QueryPdfTitle</b> will report the title of
191
+ any .pdf file that is passed into it, on the commandline.
192
+
193
+ This currently depends on <b>exiftool</b> but at a later time,
194
+ this may change to also allow a query via prawn or other tools.
195
+
196
+ If you need to determine whether a given .pdf file has a title
197
+ or whether it does not, you can use
198
+ <b>PdfParadise.does_this_pdf_file_have_a_title?</b>, such
199
+ as in:
200
+
201
+ PdfParadise.does_this_pdf_file_have_a_title? "foobar.pdf" # => true
202
+
203
+ This method will return **true** if the .pdf file at hand has a
204
+ title; and **false** otherwise.
205
+
206
+ ## Determining how many pages a given .pdf file has
207
+
208
+ class **PdfParadise::PdfFileNTotalPages** can be used to query
209
+ how many pages a given .pdf file has.
210
+
211
+ The executable called **bin/n_pages** (thus, **n_pages**) can
212
+ be used to query this, on the commandline.
213
+
214
+ Example:
215
+
216
+ n_pages foobar.pdf
217
+
218
+ Do note that the class requires the external program
219
+ called **pdfinfo**.
220
+
221
+ It is possible to query the number of pages in a given .pdf
222
+ file without **pdfinfo**, but some .pdf files are a bit buggy,
223
+ and **pdfinfo** is simply more reliable than the regex that
224
+ was used until March 2020. So, past March 2020, the program
225
+ **pdfinfo** is now used by default. Note that pdfinfo is
226
+ part of the poppler software suite.
227
+
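As a rough sketch of the pdfinfo-based approach described above, the page count can be read from the **Pages:** line of the pdfinfo output. The helper name `n_pages_from_pdfinfo` and the sample output are assumptions made for illustration only; this is not necessarily how class PdfParadise::PdfFileNTotalPages does it internally.

```ruby
# Hypothetical helper, not part of pdf_paradise: extract the page
# count from the text that pdfinfo prints for a .pdf file.
def n_pages_from_pdfinfo(output)
  match = output.match(/^Pages:\s+(\d+)/)
  match && match[1].to_i
end

# Illustrative sample in the style of pdfinfo output. For a real file
# the text could be obtained via: IO.popen(['pdfinfo', path], &:read)
sample = "Title:          Example\nPages:          12\nEncrypted:      no\n"
puts n_pages_from_pdfinfo(sample)
```

The method returns nil when no **Pages:** line is present, so the caller can decide how to handle a .pdf file that pdfinfo could not read.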
228
+ You can also use the following toplevel API for this:
229
+
230
+ PdfParadise.n_pages? 'THE_PATH_TO_THE_PDF_FILE_GOES_IN_HERE.pdf'
231
+ PdfParadise.n_pages? 'foobar.pdf'
232
+
233
+ ## Adding page numbers to .pdf files
234
+
235
+ Via the combine_pdf gem it is now possible to add page numbers
236
+ to .pdf files. This has a few limitations for complex .pdf files,
237
+ due to combine_pdf having limitations in turn - but for simple
238
+ .pdf files this should work really well.
239
+
240
+ How to use that functionality?
241
+
242
+ Consider using the following toplevel API:
243
+
244
+ PdfParadise.number_pages('this_file.pdf')
245
+
246
+ The file called **this_file.pdf** has to exist in order for
247
+ this to work, of course.
248
+
249
+ The current default is to display the page numbers on the bottom
250
+ right side. This is hardcoded, but you could modify the code
251
+ to adapt to your needs; see also how combine_pdf does this.
252
+ (You have to pass an option-hash.)
253
+
254
+ ## Various GUI components of the PdfParadise project
255
+
256
+ The **PdfParadise project** comes with some ruby-gtk3 specific
257
+ GUIs, but a few ruby-gtk2 and ruby-tk bindings may exist
258
+ as well. The **ruby-gtk3** components constitute the main GUI
259
+ elements of this project, though.
260
+
261
+ You can start, from the commandline, the gtk-wrapper
262
+ over the **split_pdf_file** functionality.
263
+
264
+ In order to do this, do either one of the following:
265
+
266
+ pdf_paradise --gui
267
+ pdf_paradise --gtk
268
+
269
+ This will require the **gtk_paradise** project and the gtk
270
+ bindings, so quite a lot. **gem install gtk3** and
271
+ **gem install gtk_paradise** should help.
272
+
273
+ The GUI for class SplitPdfFile is called **PdfParadise::Gtk::SplitPdfFile**.
274
+ The idea behind it is to allow you to determine some of the parameters
275
+ in a graphical fashion.
276
+
277
+ As of **September 2019**, there is also a mini-widget for quickly
278
+ removing the first page of a .pdf file. This is really minimal right
279
+ now and not very elegant; it may be improved in the future, but for
280
+ the time being it is what it is. It is more a proof-of-concept that
281
+ it can work.
282
+
283
+ You can start this via:
284
+
285
+ require 'pdf_paradise/gui/gtk2/remove_first_page_of_pdf_file.rb'
286
+
287
+ PdfParadise.start_gtk_gui_remove_first_page_of_pdf_file
288
+
289
+ Note that as of **January 2021** the gtk bindings will default to
290
+ **ruby-gtk3**. Support for ruby-gtk2 will be retained, though,
291
+ but new code may not necessarily be written with ruby-gtk2 in
292
+ mind. ruby-gtk3 is now the main GUI target for this project.
293
+
294
+ I am slowly porting the individual widgets.
295
+
296
+ The following widgets have been ported so far:
297
+
298
+ PdfParadise::GUI::Gtk::StatisticsWidget # can be found under pdf_paradise/gui/gtk3/statistics_widget/statistics_widget.rb
299
+
300
+ ## Specification of the .pdf format
301
+
302
+ This subsection is a stub - I only needed it to gather information
303
+ about the .pdf specification. This is NOT complete - it only shall
304
+ contain some useful information and snippets about the .pdf
305
+ specification.
306
+
307
+ PDF is short for **Portable Document Format**.
308
+
309
+ PDF has been standardized as **ISO-32000** in the year **2008**.
310
+
311
+ In the pdf-specification we can distinguish these entities:
312
+
313
+ Objects: these are not objects in the OOP sense, but simply the
314
+ basic data type of the PDF standard. There are 9 types of objects:
315
+ null, boolean, integer, real, name, string, array, dictionary and
316
+ stream.
317
+
318
+ Dictionary: an unordered collection of key-value pairs. Dictionaries are
319
+ denoted by << and >> at the beginning and the end.
320
+
321
+ Indirect Objects: these are objects that are referred to by
322
+ reference.
323
+
324
+ Direct Objects: these are objects that appear inline and are
325
+ obtained directly.
326
+
327
+ Conforming Reader: an application that parses a PDF
328
+ file according to the PDF Standard.
329
+
330
+ A .pdf file is made up of a specific structure, usually a four-part
331
+ layout.
332
+
333
+ These four parts are:
334
+
335
+ Header
336
+ Body
337
+ Cross-reference table
338
+ Trailer
339
+
340
+ ### The .pdf Header tag
341
+
342
+ The header may begin with an entry such as **%PDF-1.7**.
343
+
344
+ The general format for the header is:
345
+
346
+ %PDF- followed by the version number in the form of 1.N.
347
+
348
+ This is not valid for all .pdf files, though. Past PDF Version
349
+ 1.4, the **Version** entry in the document's catalog dictionary,
350
+ which is within the **Root** entry of the **Trailer**, may be
351
+ used instead of the Header - **if present**.
352
+
353
+ If a .pdf file contains binary data - which most PDF files
354
+ will do nowadays, such as **stream objects** - then the
355
+ **Header** line shall be immediately followed by a line
356
+ containing at the least **four binary characters**. These
357
+ are character codes of 128 or greater.
358
+
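The header rules above can be checked with a few lines of ruby. This is only a sketch under simple assumptions: the helper names `pdf_header_version` and `binary_marker_line?` are hypothetical, and the sample data is a hand-written string rather than a real .pdf file.

```ruby
# Hypothetical helpers for illustration; they are not part of pdf_paradise.

# Returns the version from a header line such as "%PDF-1.7", or nil.
def pdf_header_version(data)
  match = data.b.match(/\A%PDF-(\d+\.\d+)/)
  match && match[1]
end

# Returns true if the second line is a comment line that contains
# at least four bytes with a character code of 128 or greater.
def binary_marker_line?(data)
  second_line = data.b.lines[1].to_s
  second_line.start_with?('%') &&
    second_line.bytes.count { |byte| byte >= 128 } >= 4
end

sample = "%PDF-1.7\n%\xE2\xE3\xCF\xD3\n"
puts pdf_header_version(sample)
puts binary_marker_line?(sample)
```

For a real file, passing the first few dozen bytes (for instance via File.binread(path, 32)) should be enough for both checks.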
359
+ ### The .pdf Body tag
360
+
361
+ The body of a PDF file consists of the aforementioned **Indirect
362
+ Objects** representing the contents of a document.
363
+
364
+ **Indirect Objects** begin with a **unique object identifier**
365
+ that allows other objects to refer to them.
366
+
367
+ That identifier is made up of the following two components:
368
+
369
+ (1) Object Number: a positive Integer, can be in any arbitrary order
370
+ (2) Generation Number: a non-negative Integer
371
+
372
+ The **Indirect Objects** can be referred to from elsewhere by an
373
+ Indirect Reference. This must consist of:
374
+
375
+ Object Number
376
+ Generation Number, and
377
+ keyword R # for instance: 4 0 R
378
+
379
+ After the identifier is the keyword **obj** (start of the object)
380
+ and **endobj** (end of the object). Anything in between
381
+ is a key-value pair that describes the object.
382
+
383
+ A simple example showing the use of **Indirect Objects** will be
384
+ shown next:
385
+
386
+ 1 0 obj % Object Number 1, Generation Number 0
387
+ <<
388
+ /Type /Pages % Describe type of object
389
+ /Kids [ 4 0 R ] % Kids Entry referring to an indirect reference (Object number 4, Generation number 0)
390
+ /Count 1
391
+ >>
392
+ endobj
393
+
394
+ 2 0 obj % Object Number 2, Generation Number 0
395
+ <<
396
+ /Type /Catalog % Describe type of object
397
+ /Pages 1 0 R % Referring another object via unique object identifier
398
+ >>
399
+ endobj
400
+
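An indirect reference such as **4 0 R** is easy to take apart in ruby. The following is a minimal sketch; the Struct `IndirectReference` and the method `parse_indirect_reference` are hypothetical names used only for this example.

```ruby
# Hypothetical names for illustration; not part of pdf_paradise.
IndirectReference = Struct.new(:object_number, :generation_number)

# Parse an indirect reference such as "4 0 R". Returns nil if the
# text does not consist of two integers followed by the keyword R.
def parse_indirect_reference(text)
  match = text.match(/\A\s*(\d+)\s+(\d+)\s+R\s*\z/)
  return nil unless match
  IndirectReference.new(match[1].to_i, match[2].to_i)
end

reference = parse_indirect_reference('4 0 R')
puts "object #{reference.object_number}, generation #{reference.generation_number}"
```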
401
+ The **Body** section of a .pdf file is thus a tree of objects that
402
+ are linked together, ultimately coming down to the Root Object
403
+ (Defined by the **Root** entry in the **Trailer** section, as a
404
+ catalog dictionary).
405
+
406
+ The **Cross-Reference Table** is a table that contains a list of byte
407
+ offsets pointing to the indirect objects.
408
+
409
+ A pdf-conforming reader uses the Cross-Reference Table as a lookup
410
+ table to access certain objects quickly when needed.
411
+
412
+ The format for entries in the Cross-Reference Table can be summarized as
413
+ follows:
414
+
415
+ - In the following format nnnnnnnnnn ggggg n eol, a total of 20 bytes
416
+ - nnnnnnnnnn is a 10-digit byte offset in the decoded stream
417
+ - ggggg 5-digit generation number
418
+ - n keyword for in-use entry or f keyword for free entry
419
+ - eol 2 character end-of-line sequence (Like CR LF)
420
+
421
+ The **Cross-Reference Table** always begins with the special entry
422
+ **0000000000 65535** - see the following example:
423
+
424
+ 0000000000 65535 f % special entry, f denoting it is a free entry
425
+
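The 20-byte entry format lends itself to a small parser. The sketch below assumes the three two-character end-of-line sequences that the PDF standard allows (space plus CR, space plus LF, or CR LF); `XrefEntry` and `parse_xref_entry` are hypothetical names for this example.

```ruby
# Hypothetical names for illustration; not part of pdf_paradise.
XrefEntry = Struct.new(:offset, :generation, :in_use)

# Parse one 20-byte cross-reference entry: "nnnnnnnnnn ggggg n eol".
# Note that for free entries (keyword f) the first field holds the
# object number of the next free object rather than a byte offset.
def parse_xref_entry(entry)
  match = entry.match(/\A(\d{10}) (\d{5}) ([nf])(?: \r| \n|\r\n)\z/)
  return nil unless match
  XrefEntry.new(match[1].to_i, match[2].to_i, match[3] == 'n')
end

p parse_xref_entry("0000000000 65535 f \n")
p parse_xref_entry("0000000017 00000 n \n")
```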
426
+ ## Graphical User Interfaces (GUIs)
427
+
428
+ The pdf_paradise gem comes with a few, small-ish widgets, primarily
429
+ written in ruby-gtk. Since as of August 2021 I am also experimenting
430
+ with libui but this is a slow process - stay tuned for more updates
431
+ in the coming months in this regard.
432
+
433
+ One big advantage of libui is that it works on windows out-of-the-box,
434
+ so we can use GUIs on windows as well. \o/
435
+
436
+ ## Compressing a .pdf file (optimize the size of a .pdf file)
437
+
438
+ Sometimes you may have to reduce the filesize of a given .pdf
439
+ file, such as when you need to upload a .pdf file, and there
440
+ is some file size limit otherwise. This happened to me a few
441
+ times when using webmail-based email services, where an
442
+ automatic notice was generated when the .pdf file was too
443
+ large, e. g. above 25MB in size or something similar.
444
+
445
+ So, let us now assume that you **do** have a use case such
446
+ as described above, or any other use case - you want to
447
+ reduce the file size of a given .pdf file at hand.
448
+
449
+ How can this be done?
450
+
451
+ Well, there are several ways. One is to use online-based
452
+ tools, which tend to work surprisingly well; I verified
453
+ this in February 2022. But, as far as the gem here is
454
+ concerned, we will focus primarily on means that can be
455
+ used by you on your own, without having to depend on
456
+ external websites. Two methods will be described here -
457
+ the first one requiring **ghostscript**, the second
458
+ one requiring **hexapdf**.
459
+
460
+ The important parameter with regard to **ghostscript** is
461
+ the **dPDFSETTINGS** parameter. This one will determine
462
+ the compression level, which ultimately will affect
463
+ the quality of the compressed .pdf file.
464
+
465
+ Available parameters to **dPDFSETTINGS** include
466
+ **/screen**, **/ebook**, **/printer**, **/prepress**
467
+ and **/default**.
468
+
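To make the role of **dPDFSETTINGS** more tangible, here is a sketch that assembles a ghostscript commandline for one of the presets listed above. The method name `ghostscript_compress_command` is hypothetical, and this is not necessarily the exact commandline that class PdfParadise::CompressThisPdfFile builds; the switches themselves are regular ghostscript options for the pdfwrite device.

```ruby
# Hypothetical helper for illustration; not part of pdf_paradise.
GHOSTSCRIPT_PRESETS = %w[screen ebook printer prepress default].freeze

# Build (but do not run) a ghostscript commandline that rewrites
# the input .pdf file with the given -dPDFSETTINGS preset.
def ghostscript_compress_command(input, output, preset = 'ebook')
  unless GHOSTSCRIPT_PRESETS.include?(preset)
    raise ArgumentError, "unknown preset: #{preset}"
  end
  [
    'gs', '-sDEVICE=pdfwrite', "-dPDFSETTINGS=/#{preset}",
    '-dNOPAUSE', '-dQUIET', '-dBATCH',
    "-sOutputFile=#{output}", input
  ]
end

command = ghostscript_compress_command('foobar.pdf', 'foobar_small.pdf', 'screen')
puts command.join(' ')
# system(*command) would run it, provided that gs is in the PATH.
```

Returning an array rather than a single string means the command can be passed to system() without shell quoting problems for file names that contain spaces.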
469
+ class **PdfParadise::CompressThisPdfFile** can be of
470
+ help here. Simply pass, as argument to .new(), the path
471
+ of the local .pdf to that class.
472
+
473
+ You can also use a toplevel method if you'd like to:
474
+
475
+ require 'pdf_paradise'
476
+ PdfParadise.compress_this_pdf_file
477
+ PdfParadise.compress_this_pdf_file('/foobar.pdf')
478
+
479
+ The variant using hexapdf is called:
480
+
481
+ PdfParadise.compress_via_pdf
482
+ PdfParadise.compress_via_pdf('foobar.pdf')
483
+
484
+ The API name may change at a later point in time; perhaps
485
+ we will just add a toplevel API called **PdfParadise.compress()**,
486
+ but for the time being the above APIs will be retained as they
487
+ are.
488
+
489
+ ## Storing all open .pdf files in a yaml file
490
+
491
+ In **February 2022** the yaml file working_on_these_pdf_files.yml
492
+ was added at:
493
+
494
+ pdf_paradise/yaml/working_on_these_pdf_files.yml
495
+
496
+ The idea here is that this yaml-file retains the local path
497
+ to any .pdf file that the user (in this case me) is working
498
+ on, aka reading right now.
499
+
500
+ I needed this because I tend to work through .pdf files and
501
+ remove page after page when I read it. The idea is that
502
+ I do not lose that information when I reboot my computer
503
+ or when said computer crashes; I needed to make this
504
+ persistent information.
505
+
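The idea of keeping these paths persistent can be sketched with the yaml library that ships with ruby. The method names `store_open_pdf_files` and `load_open_pdf_files`, as well as the example paths, are assumptions for illustration; the gem itself may store the data differently.

```ruby
require 'yaml'
require 'tmpdir'

# Hypothetical helpers for illustration; not part of pdf_paradise.

# Write the given array of .pdf paths into a yaml file.
def store_open_pdf_files(paths, yaml_file)
  File.write(yaml_file, paths.to_yaml)
end

# Read the paths back; returns an empty array if nothing was stored.
def load_open_pdf_files(yaml_file)
  return [] unless File.exist?(yaml_file)
  YAML.safe_load(File.read(yaml_file)) || []
end

# A temporary directory is used here so that the example does not
# touch any real file of the user.
Dir.mktmpdir do |dir|
  yaml_file = File.join(dir, 'working_on_these_pdf_files.yml')
  store_open_pdf_files(['/home/user/one.pdf', '/home/user/two.pdf'], yaml_file)
  p load_open_pdf_files(yaml_file)
end
```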
506
+ Why is this yaml file part of the pdf_paradise gem, though?
507
+ This is mostly due to convenience. I wanted to have this
508
+ available in one of my ruby gems by default. In the long
509
+ run I will add code that allows other users to adjust
510
+ this to their own use case (and perhaps in their home
511
+ directory rather than store this in the gem itself). As
512
+ of February 2022 code for the latter is currently not
513
+ part of the gem, but I may add code for this - either
514
+ in the **pdf_paradise** gem or the **roebe** gem.
515
+
516
+ ## Converting .jpg files to .pdf files
517
+
518
+ If you have a use case to convert several .jpg files into .pdf
519
+ files then the following commandline example should be
520
+ helpful:
521
+
522
+ convert /path/to/image foobar.pdf
523
+ convert *.jpg foobar.pdf
524
+
525
+ Note that this requires **ImageMagick**. **ImageMagick** is
526
+ not always perfect; it has a few problems, unfortunately.
527
+
528
+ For instance, in <b>April 2022</b> when I tried the above,
529
+ the image was repeated three times on the x-axis. I do not
530
+ know why, but that makes **absolutely no sense**. It is just
531
+ a single image, so why is the resulting .pdf file repeated
532
+ three times? Perhaps imagemagick's **convert** tool does
533
+ this automatically, but then I question the default behaviour -
534
+ **it makes no sense** for the use case I have. One image
535
+ should be one image, not three images or fifty images.
536
+
537
+ In the event that **ImageMagick** does not work very well
538
+ for your use case, consider using another software suite,
539
+ such as **img2pdf**.
540
+
541
+ The syntax for **img2pdf** goes something like this:
542
+
543
+ img2pdf -o document.pdf *jpg
544
+
545
+ I liked this, so in **April 2022** this was added to
546
+ **ImageParadise**. The API for this is as follows:
547
+
548
+ ImageParadise.img2pdf('*.jpg') # If a '*' is part of the input Dir[] will be used.
549
+
550
+ As that functionality may be useful on the commandline
551
+ as well, an executable has been added at
552
+ **bin/imageparadise_img2pdf**. Simply pass the image
553
+ files that you want to convert.
554
+
555
+ Usage example:
556
+
557
+ imageparadise_img2pdf *jpg
558
+
559
+ If you need the images to be ordered or sorted then you
560
+ may have to do so when specifying the image file at hand
561
+ specifically, e. g. the path to it.
562
+
563
+ So for instance:
564
+
565
+ imageparadise_img2pdf image3.jpg image1.jpg image2.png
566
+
567
+ The only drawback I have found with <b>img2pdf</b> so far
568
+ is that you can not easily add text to an image. This makes
569
+ it hard to identify which image is named how. A work around
570
+ for this is to embed the filename into the image itself,
571
+ e. g. create temporary images, and then pack them together
572
+ via <b>img2pdf</b>.
573
+
574
+ ## Splitting a single .pdf file into several individual .pdf files
575
+
576
+ You can use the following toplevel API to split up a single
577
+ .pdf file into several .pdf files:
578
+
579
+ PdfParadise.burst(ARGV)
580
+ PdfParadise.burst('foobar.pdf')
581
+
582
+ ## Merging pdf files
583
+
584
+ <b>class PdfParadise::MergePdf.new(ARGV)</b> can be used for
585
+ <b>merging .pdf files</b>. This functionality depends on
586
+ external software, so you have to install this first.
587
+
588
+ Currently <b>ghostscript</b> and <b>hexapdf</b> can be used for
589
+ the <b>merging</b> step.
590
+
591
+ Examples for how to use either of these two variants, as
592
+ far as <b>class PdfParadise::MergePdf</b> is concerned,
593
+ follow next:
594
+
595
+ mergepdf one.pdf two.pdf --use-ghostscript
596
+ mergepdf one.pdf two.pdf --use-hexapdf
597
+
598
+ (The two -- hyphens are mandatory for commandline arguments
599
+ right now; otherwise it is assumed to be a locally existing
600
+ .pdf file.)
601
+
602
+ If you need to do this from within ruby code, consider
603
+ using the following code:
604
+
605
+ require 'pdf_paradise'
606
+ merge_pdf = PdfParadise::MergePdf.new('one.pdf two.pdf')
607
+ merge_pdf.feedback_where_it_is_stored # Call it manually.
608
+
609
+ ## Combining individual pages from .pdf files into a new .pdf file via class PdfParadise::CombineThesePdfPages
610
+
611
+ class **PdfParadise::CombineThesePdfPages** can be used to
612
+ extract individual pdf pages from a given .pdf file and
613
+ combine these into a new .pdf file.
614
+
615
+ There is also an executable at **bin/combine_these_pdf_pages**
616
+ which can be used on the commandline.
617
+
618
+ This functionality depends on the **hexapdf** gem.
619
+
620
+ Usage example:
621
+
622
+ combine_these_pdf_pages foobar.pdf 3,4,5
623
+
624
+ This would retain the pages at 3, 4 and 5 and create a new
625
+ .pdf file.
626
+
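The page argument in the usage example above (**3,4,5**) is a comma-separated list. How such a list may be turned into page numbers is sketched next; `parse_page_list` is a hypothetical name, and the actual parsing in class PdfParadise::CombineThesePdfPages may differ.

```ruby
# Hypothetical helper for illustration; not part of pdf_paradise.
# Converts a string such as "3,4,5" into an array of page numbers.
def parse_page_list(text)
  text.split(',').map(&:strip).reject(&:empty?).map { |entry| Integer(entry, 10) }
end

p parse_page_list('3,4,5')
```

Integer() raises an ArgumentError for input that is not a number, which seems preferable to silently treating a typo as page 0.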
627
+
628
+ ## Contact information
629
+
630
+ If your creative mind has ideas and specific suggestions to make this
631
+ gem more useful in general, feel free to drop me an email at any
632
+ time, via:
633
+
634
+ shevy@inbox.lt
635
+
636
+ Before that email I used an email account at Google gmail, but in **2021** I
637
+ decided to slowly abandon gmail for various reasons. In part this is because
638
+ the UI annoys me (on non-chrome browsers loading takes too long), but also
639
+ because of Google's attempt to establish mass surveillance via its
640
+ federated cohorts sniffing (FLoC). I do not know what happened at Google,
641
+ but enough is enough - there is only so much you can take while supporting
642
+ greed. When it comes to data mining done by private groups, ultimately
643
+ the user became the product.
644
+
645
+ Do keep in mind that responding to emails may take some time,
646
+ depending on the amount of work I may have at that moment, due
647
+ to real-life time constraints. I will, however, read feedback
648
+ eventually. Patches and code changes are welcome too, of course,
649
+ as long as they are in the spirit of the project at hand, e. g.
650
+ fitting to the general theme. For this I may make use of github
651
+ as a discussion site, but this has a low priority right now.
652
+
@@ -0,0 +1,7 @@
1
+ #!/usr/bin/ruby -w
2
+ # Encoding: UTF-8
3
+ # frozen_string_literal: true
4
+ # =========================================================================== #
5
+ require 'pdf_paradise/toplevel_methods/automatic_pdf_title.rb'
6
+
7
+ PdfParadise.automatic_pdf_title(ARGV)
@@ -0,0 +1,7 @@
1
+ #!/usr/bin/ruby -w
2
+ # Encoding: UTF-8
3
+ # frozen_string_literal: true
4
+ # =========================================================================== #
5
+ require 'pdf_paradise/utility_scripts/combine_these_pdf_pages.rb'
6
+
7
+ PdfParadise::CombineThesePdfPages.new(ARGV)
@@ -0,0 +1,7 @@
1
+ #!/usr/bin/ruby -w
2
+ # Encoding: UTF-8
3
+ # frozen_string_literal: true
4
+ # =========================================================================== #
5
+ require 'pdf_paradise/compress/compress_via_hexapdf.rb'
6
+
7
+ PdfParadise.compress_via_pdf(ARGV)
@@ -0,0 +1,7 @@
1
+ #!/usr/bin/ruby -w
2
+ # Encoding: UTF-8
3
+ # frozen_string_literal: true
4
+ # =========================================================================== #
5
+ require 'pdf_paradise/toplevel_methods/convert_markdown_to_pdf.rb'
6
+
7
+ PdfParadise.convert_markdown_to_pdf(ARGV)
@@ -0,0 +1,7 @@
1
+ #!/usr/bin/ruby -w
2
+ # Encoding: UTF-8
3
+ # frozen_string_literal: true
4
+ # =========================================================================== #
5
+ require 'pdf_paradise/utility_scripts/convert_pdf_to_text.rb'
6
+
7
+ PdfParadise::ConvertPdfToText.new(ARGV)