pdf_paradise 0.1.66


Potentially problematic release.


This version of pdf_paradise might be problematic.

Files changed (110)
  1. checksums.yaml +7 -0
  2. data/README.md +705 -0
  3. data/bin/automatic_pdf_title +7 -0
  4. data/bin/combine_these_pdf_pages +7 -0
  5. data/bin/compress_via_hexapdf +7 -0
  6. data/bin/convert_markdown_to_pdf +7 -0
  7. data/bin/convert_pdf_to_text +7 -0
  8. data/bin/delete_first_page_of_this_pdf_file +7 -0
  9. data/bin/merge_then_open +7 -0
  10. data/bin/n_pages +10 -0
  11. data/bin/open_main_pdf +7 -0
  12. data/bin/pdf_paradise +9 -0
  13. data/bin/set_main_book +7 -0
  14. data/bin/set_title_of_this_pdf_file +15 -0
  15. data/doc/README.gen +662 -0
  16. data/doc/todo/todo.md +7 -0
  17. data/lib/pdf_paradise/base/base.rb +239 -0
  18. data/lib/pdf_paradise/base/colours.rb +36 -0
  19. data/lib/pdf_paradise/commandline/commandline.rb +101 -0
  20. data/lib/pdf_paradise/commandline/help.rb +73 -0
  21. data/lib/pdf_paradise/commandline/menu.rb +142 -0
  22. data/lib/pdf_paradise/compress/compress_via_hexapdf.rb +27 -0
  23. data/lib/pdf_paradise/compress_this_pdf_file.rb +87 -0
  24. data/lib/pdf_paradise/constants/constants.rb +76 -0
  25. data/lib/pdf_paradise/convert_text_to_pdf.rb +94 -0
  26. data/lib/pdf_paradise/css/project.css +17 -0
  27. data/lib/pdf_paradise/djvu_to_pdf.rb +85 -0
  28. data/lib/pdf_paradise/gui/README.md +6 -0
  29. data/lib/pdf_paradise/gui/fox/split_pdf_file.rb +77 -0
  30. data/lib/pdf_paradise/gui/gtk2/delete_the_first_or_the_last_page_of_this_pdf_file/delete_the_first_or_the_last_page_of_this_pdf_file.rb +39 -0
  31. data/lib/pdf_paradise/gui/gtk2/pdf_viewer/pdf_viewer.rb +34 -0
  32. data/lib/pdf_paradise/gui/gtk2/split_pdf_file/split_pdf_file.rb +34 -0
  33. data/lib/pdf_paradise/gui/gtk2/statistics_widget/statistics_widget.rb +34 -0
  34. data/lib/pdf_paradise/gui/gtk2/to_pdf/to_pdf.rb +32 -0
  35. data/lib/pdf_paradise/gui/gtk3/controller/controller.rb +212 -0
  36. data/lib/pdf_paradise/gui/gtk3/convert_pdf_to_text/convert_pdf_to_text.rb +34 -0
  37. data/lib/pdf_paradise/gui/gtk3/delete_the_first_or_the_last_page_of_this_pdf_file/delete_the_first_or_the_last_page_of_this_pdf_file.rb +39 -0
  38. data/lib/pdf_paradise/gui/gtk3/pdf_viewer/pdf_viewer.rb +34 -0
  39. data/lib/pdf_paradise/gui/gtk3/split_pdf_file/split_pdf_file.rb +34 -0
  40. data/lib/pdf_paradise/gui/gtk3/statistics_widget/statistics_widget.rb +34 -0
  41. data/lib/pdf_paradise/gui/gtk3/to_pdf/to_pdf.rb +32 -0
  42. data/lib/pdf_paradise/gui/libui/extract_all_images_from_this_pdf_file/extract_all_images_from_this_pdf_file.rb +223 -0
  43. data/lib/pdf_paradise/gui/libui/statistics_widget/statistics_widget.rb +233 -0
  44. data/lib/pdf_paradise/gui/shared_code/convert_pdf_to_text/convert_pdf_to_text_module.rb +277 -0
  45. data/lib/pdf_paradise/gui/shared_code/delete_the_first_or_the_last_page_of_this_pdf_file/delete_the_first_or_the_last_page_of_this_pdf_file_module.rb +443 -0
  46. data/lib/pdf_paradise/gui/shared_code/pdf_viewer/pdf_viewer.css +5 -0
  47. data/lib/pdf_paradise/gui/shared_code/pdf_viewer/pdf_viewer_module.rb +284 -0
  48. data/lib/pdf_paradise/gui/shared_code/split_pdf_file/split_pdf_file.css +0 -0
  49. data/lib/pdf_paradise/gui/shared_code/split_pdf_file/split_pdf_file_module.rb +294 -0
  50. data/lib/pdf_paradise/gui/shared_code/statistics_widget/statistics_widget_module.rb +349 -0
  51. data/lib/pdf_paradise/gui/shared_code/to_pdf/to_pdf_module.rb +281 -0
  52. data/lib/pdf_paradise/hexapdf/001_rainbow_pattern_example.rb +0 -0
  53. data/lib/pdf_paradise/hexapdf/hexapdf.rb +123 -0
  54. data/lib/pdf_paradise/images/PDF_PARADISE_LOGO.png +0 -0
  55. data/lib/pdf_paradise/main_pdf/main_pdf.rb +444 -0
  56. data/lib/pdf_paradise/merge_pdf/menu.rb +63 -0
  57. data/lib/pdf_paradise/merge_pdf/merge_pdf.rb +306 -0
  58. data/lib/pdf_paradise/merge_pdf_namespace.rb +9 -0
  59. data/lib/pdf_paradise/merge_then_open/merge_then_open.rb +105 -0
  60. data/lib/pdf_paradise/pdf_file_n_total_pages.rb +249 -0
  61. data/lib/pdf_paradise/prawn_addons/README.md +2 -0
  62. data/lib/pdf_paradise/prawn_addons/prawn_addons.rb +17 -0
  63. data/lib/pdf_paradise/project/project.rb +22 -0
  64. data/lib/pdf_paradise/remove_pdf_password.rb +391 -0
  65. data/lib/pdf_paradise/requires/batch_require_toplevel_files.rb +22 -0
  66. data/lib/pdf_paradise/requires/colours.rb +7 -0
  67. data/lib/pdf_paradise/requires/colours_and_esystem_and_save_file_and_fileutils_and_opn.rb +11 -0
  68. data/lib/pdf_paradise/requires/esystem_and_colours.rb +10 -0
  69. data/lib/pdf_paradise/requires/esystem_and_opn_and_colours.rb +8 -0
  70. data/lib/pdf_paradise/requires/require_the_whole_project.rb +28 -0
  71. data/lib/pdf_paradise/requires/require_utility_scripts.rb +9 -0
  72. data/lib/pdf_paradise/set_main_book.rb +156 -0
  73. data/lib/pdf_paradise/set_pdf_title.rb +220 -0
  74. data/lib/pdf_paradise/sinatra/embeddable_interface.rb +318 -0
  75. data/lib/pdf_paradise/toplevel_methods/automatic_pdf_title.rb +55 -0
  76. data/lib/pdf_paradise/toplevel_methods/convert_epub_to_pdf.rb +27 -0
  77. data/lib/pdf_paradise/toplevel_methods/convert_markdown_to_pdf.rb +45 -0
  78. data/lib/pdf_paradise/toplevel_methods/convert_ppt_to_pdf.rb +35 -0
  79. data/lib/pdf_paradise/toplevel_methods/e.rb +16 -0
  80. data/lib/pdf_paradise/toplevel_methods/esystem.rb +19 -0
  81. data/lib/pdf_paradise/toplevel_methods/misc.rb +76 -0
  82. data/lib/pdf_paradise/toplevel_methods/number_pages.rb +38 -0
  83. data/lib/pdf_paradise/toplevel_methods/opened_pdf_files.rb +221 -0
  84. data/lib/pdf_paradise/toplevel_methods/query_pdf_title.rb +191 -0
  85. data/lib/pdf_paradise/toplevel_methods/reduce_size_of_this_pdf_file.rb +46 -0
  86. data/lib/pdf_paradise/toplevel_methods/roebe.rb +17 -0
  87. data/lib/pdf_paradise/toplevel_methods/rotate_pdf_file.rb +143 -0
  88. data/lib/pdf_paradise/toplevel_methods/to_pdf.rb +38 -0
  89. data/lib/pdf_paradise/utility_scripts/README.md +3 -0
  90. data/lib/pdf_paradise/utility_scripts/combine_these_pdf_pages.rb +118 -0
  91. data/lib/pdf_paradise/utility_scripts/convert_pdf_to_text.rb +175 -0
  92. data/lib/pdf_paradise/utility_scripts/delete_first_page_of_this_pdf_file.rb +221 -0
  93. data/lib/pdf_paradise/utility_scripts/delete_last_page_of_this_pdf_file.rb +180 -0
  94. data/lib/pdf_paradise/utility_scripts/delete_this_page_of_this_pdf_file.rb +329 -0
  95. data/lib/pdf_paradise/utility_scripts/extract_all_images_from_this_pdf_file.rb +129 -0
  96. data/lib/pdf_paradise/utility_scripts/extract_pdf_page.rb +283 -0
  97. data/lib/pdf_paradise/utility_scripts/pdf_optimizer.rb +111 -0
  98. data/lib/pdf_paradise/utility_scripts/pdf_statistics.rb +148 -0
  99. data/lib/pdf_paradise/utility_scripts/pdf_to_html.rb +75 -0
  100. data/lib/pdf_paradise/utility_scripts/remove_images.rb +110 -0
  101. data/lib/pdf_paradise/utility_scripts/split_pdf.rb +340 -0
  102. data/lib/pdf_paradise/utility_scripts/to_qdf.rb +82 -0
  103. data/lib/pdf_paradise/version/version.rb +19 -0
  104. data/lib/pdf_paradise/www/README.md +2 -0
  105. data/lib/pdf_paradise/www/sinatra/app.rb +276 -0
  106. data/lib/pdf_paradise/yaml/working_on_these_pdf_files.yml +4 -0
  107. data/lib/pdf_paradise.rb +5 -0
  108. data/pdf_paradise.gemspec +61 -0
  109. data/test/testing_pdf_paradise.rb +9 -0
  110. metadata +219 -0
data/doc/README.gen ADDED
@@ -0,0 +1,662 @@
ADD_RUBY_HEADER
ADD_TIME_STAMP

<img src="https://i.imgur.com/unhKNEw.png" style="margin-left: 2em">

This project can help with pdf-related activities, such as extracting
a .pdf page, converting .pdf pages, merging .pdf files, splitting
.pdf files, setting the title of a .pdf file and similar actions.

The project has to remain quite flexible. We may use external
programs such as **ghostscript** or **qpdf**, or we may use pure
ruby solutions, such as the gems called **combine_pdf**,
**prawn** or **hexapdf**.

This file (README.gen, respectively the generated file called
**README.md**) describes some of the components that make
up this gem.
## Rationale for making use of separate pdf-related projects

There are many pdf-related tools on the www. For example,
there is prawn, qpdf, calibre, hexapdf, ghostscript and
many more applications.

Some of these have unique features, and some of them have overlapping
functionality, such as reading the content of .pdf files in a
simplified manner (number of pages, title, author and so forth).

The PdfParadise project attempts to support as many different
(open-source) projects as possible. It is also open to supporting
closed-source projects, provided that **the code remains
simple** (and simple to change). The primary focus is on
open-source projects, though.

Why does the PdfParadise project attempt to support many different
pdf-related projects?

The answer to this question is rather simple: on Linux I have a lot of
flexibility and can use literally any pdf-related project just fine. On
Windows, however, I am more restricted in what I can use. Not all
programs are available on Windows or can be easily compiled there. Thus,
in order to allow the pdf_paradise .gem to work on Windows, we need
this flexibility.

The reason why I added this subsection in June 2021 was that
I am slowly changing the sinatra-related part of the PdfParadise
project, in order to embed the functionality into my main controller,
which is handled by the **Roebe** namespace. In that controller
I wanted to easily offer pdf-related functionality "out of the
box" when I start the sinatra application on Windows. Because I
want to be able to offer pdf-related modifications on Windows
as well, the PdfParadise project had to become more flexible,
so that a simple toplevel route, such as **/pdf**, will work
properly and lead to entry points (subroutes) that allow
us to tap into the features offered by the PdfParadise project.

So, the **summary** is: the PdfParadise project must remain
flexible in order to support a proper workflow on Windows
systems as well. (We could use WSL on Windows, but not every
computer has it available, so I am really targeting "vanilla"
Windows.)

Note that one slight drawback is that the sinatra part of
the PdfParadise project now has a dependency on the
**cyberweb** project, so if you want to use that part, you also
have to install the cyberweb gem. This is a trade-off: for me
the more important part is the long-term maintainability of
the pdf_paradise project, so a unified code base had to be
used in this regard.
## Converting a .pdf file to text

Sometimes you may wish to have a text file describing the content
of a .pdf file, rather than the .pdf file itself.

Via class **PdfParadise::ConvertPdfToText**, residing in the file
at **pdf_paradise/utility_scripts/convert_pdf_to_text.rb**, you can
convert a .pdf file to a text file.

Usage example from ruby, for the file called **foobar.pdf**:

    require 'pdf_paradise'

    PdfParadise::ConvertPdfToText.new(ARGV)
    PdfParadise::ConvertPdfToText.new('foobar.pdf')

You can also use the bin/ file from the commandline:

    convert_pdf_to_text
    convert_pdf_to_text foobar.pdf

There is also a ruby-gtk3 widget that offers the functionality
of class **PdfParadise::ConvertPdfToText**, if the user
has gtk3 and its ruby bindings installed.

You can start that ruby-gtk3 widget via:

    convert_pdf_to_text --gui
## Commandline usage

You can use the **pdf_paradise** gem from the commandline, as
the examples above show.

For instance, if you wish to modify **the title of a .pdf
file**, you can use a commandline invocation such as this one:

    pdf_paradise --use-this-pdf-file=location_to_your_pdf_file.pdf --set_title="The title you want to use goes in here."

You can also **shrink** a .pdf file, by using the commandline
switch <b>--shrink-pdf-size-of=foobar.pdf</b> or the shorter
<b>--shrink=foobar.pdf</b>, such as:

    pdf_paradise --shrink-pdf-size-of=foobar.pdf
    pdf_paradise --shrink=foobar.pdf

The <b>shrink</b> functionality is contained in the module method
<b>PdfParadise.reduce_size_of_this_pdf_file()</b>.
## Storing the .pdf files that are currently open

If you need to store the .pdf files that are currently open,
you can use the following commandline to do so:

    pdf_paradise --store-open-pdf-files

This will attempt to store the full paths of these .pdf files
in a local file. That way you may also be able to batch-open
these .pdf files at a later time, e. g. when you switch your
window manager or after a reboot.
## Deleting the last or the first page of a .pdf file

You can use **class DeleteLastPageOfThisPdfFile**, more
accurately called **class PdfParadise::DeleteLastPageOfThisPdfFile**,
to ***delete the last page of a .pdf file***.

In ruby code, you can invoke this like so:

    require 'pdf_paradise'

    PdfParadise::DeleteLastPageOfThisPdfFile.new('path_to_the_pdf_file/goes_in_here.pdf')

or shorter:

    require 'pdf_paradise'

    PdfParadise.delete_last_page_of_this_pdf_file('foobar.pdf')

A very similar API exists for deleting the first page of a given .pdf
file, via **class PdfParadise::DeleteFirstPageOfThisPdfFile**.

In ruby code, you can invoke this like so:

    require 'pdf_paradise'

    PdfParadise::DeleteFirstPageOfThisPdfFile.new('path_to_the_pdf_file/goes_in_here.pdf')

or shorter:

    require 'pdf_paradise'

    PdfParadise.delete_first_page_of_this_pdf_file('foobar.pdf')
## Converting markdown .md files to .pdf files

If you use kramdown, prawn and kramdown-pdf-converter, then you
can convert .md files on the commandline via:

    convert_markdown_to_pdf path_to_the_markdown_file_goes_here.md

Install the necessary gems prior to using this commandline
functionality.
## sinatra interface

Since April 2019 there has been a minimal sinatra interface to the
PdfParadise project. Consider it an incomplete <b>work-in-progress</b>.

To start it, try:

    pdf_paradise --sinatra
## Querying the title of a .pdf file

<b>class PdfParadise::QueryPdfTitle</b> will report the title of
any .pdf file that is passed into it on the commandline.

This currently depends on <b>exiftool</b>, but at a later time
this may change to also allow a query via prawn or other tools.

If you need to determine whether a given .pdf file has a title
or not, you can use
<b>PdfParadise.does_this_pdf_file_have_a_title?</b>. This method
returns **true** if the .pdf file at hand has a title, and
**false** otherwise:

    PdfParadise.does_this_pdf_file_have_a_title? "foobar.pdf" # => true
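Since the title query currently goes through exiftool, checking for a title mostly comes down to parsing exiftool's report. The following is a minimal Ruby sketch, not the gem's actual implementation: the method names `title_from_exiftool_output` and `pdf_file_has_title?` are hypothetical, and the sketch assumes exiftool's default output format, where `exiftool -Title foobar.pdf` prints a line such as `Title : My Book` (tag name, padding, colon, value) and prints nothing when no title is set.

```ruby
require 'open3'

# Hypothetical helpers, not part of the pdf_paradise API.
# Extract the value of the "Title" tag from exiftool's default output,
# which prints lines such as "Title                           : My Book".
def title_from_exiftool_output(output)
  output.each_line do |line|
    match = line.match(/\ATitle\s*:\s*(.*?)\s*\z/)
    next unless match

    return match[1].empty? ? nil : match[1]
  end
  nil
end

# Run exiftool on a .pdf file and report whether a non-empty title is set.
def pdf_file_has_title?(path)
  output, status = Open3.capture2('exiftool', '-Title', path)
  status.success? && !title_from_exiftool_output(output).nil?
rescue Errno::ENOENT # exiftool is not installed
  false
end
```

For example, `title_from_exiftool_output("Title : My Book\n")` returns `"My Book"`, and an empty report yields `nil`.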
## Determining how many pages a given .pdf file has

Class **PdfParadise::PdfFileNTotalPages** can be used to query
how many pages a given .pdf file has.

The executable called **bin/n_pages** (thus, **n_pages**) can
be used to query this on the commandline.

Example:

    n_pages foobar.pdf

Do note that the class requires the external program
called **pdfinfo**.

It is possible to query the number of pages in a given .pdf
file without **pdfinfo**, but some .pdf files are a bit buggy,
and **pdfinfo** is simply more reliable than the regex that
was used until March 2020. So, since March 2020, the program
**pdfinfo** has been used by default. Note that pdfinfo is
part of the poppler software suite.

You can also use the following toplevel API for this:

    PdfParadise.n_pages? 'THE_PATH_TO_THE_PDF_FILE_GOES_IN_HERE.pdf'
    PdfParadise.n_pages? 'foobar.pdf'
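The pdfinfo-based approach can be sketched in a few lines of Ruby. This is only an illustration under the assumption that poppler's pdfinfo is on the PATH; `page_count_from_pdfinfo_output` and `n_pages_via_pdfinfo` are hypothetical names, not the API of class PdfParadise::PdfFileNTotalPages. pdfinfo prints a report containing a line such as `Pages: 12`, and that line is all the sketch reads.

```ruby
require 'open3'

# Hypothetical helper: read the page count from pdfinfo's report,
# which contains a line such as "Pages:          12".
def page_count_from_pdfinfo_output(output)
  match = output.match(/^Pages:\s+(\d+)/)
  match && match[1].to_i
end

# Hypothetical wrapper around the external pdfinfo program (poppler).
# Returns nil if pdfinfo fails or is not installed.
def n_pages_via_pdfinfo(path)
  output, status = Open3.capture2('pdfinfo', path)
  status.success? ? page_count_from_pdfinfo_output(output) : nil
rescue Errno::ENOENT # pdfinfo is not installed
  nil
end
```

Reading pdfinfo's report sidesteps parsing the .pdf structure directly, which is where a regex-based page count tends to trip over buggy files.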
## Adding page numbers to .pdf files

Via the combine_pdf gem it is now possible to add page numbers
to .pdf files. This has a few limitations for complex .pdf files,
because combine_pdf itself has limitations, but for simple
.pdf files this should work really well.

How can you use that functionality?

Consider using the following toplevel API:

    PdfParadise.number_pages('this_file.pdf')

The file called **this_file.pdf** has to exist in order for
this to work, of course.

The current default is to display the page numbers at the bottom
right. This is hardcoded, but you could modify the code
to adapt it to your needs; see also how combine_pdf does this.
(You have to pass an option hash.)
## Various GUI components of the PdfParadise project

The **PdfParadise project** comes with some ruby-gtk3 specific
GUIs, but a few ruby-gtk2 and ruby-tk bindings may exist
as well. The **ruby-gtk3** components constitute the main GUI
elements of this project, though.

From the commandline, you can start the gtk wrapper
over the **split_pdf_file** functionality.

In order to do this, use either one of the following:

    pdf_paradise --gui
    pdf_paradise --gtk

This requires the **gtk_paradise** project and the gtk
bindings, so quite a lot. **gem install gtk3** and
**gem install gtk_paradise** should help.

The GUI for class SplitPdfFile is called **PdfParadise::Gtk::SplitPdfFile**.
The idea behind it is to allow you to determine some of the parameters
in a graphical fashion.

Since **September 2019**, there has also been a mini-widget for quickly
removing the first page of a .pdf file. This is really minimal right
now and not very elegant; it may be improved in the future, but for
the time being it is what it is. It is more a proof-of-concept that
it can work.

You can start this via:

    require 'pdf_paradise/gui/gtk2/remove_first_page_of_pdf_file.rb'

    PdfParadise.start_gtk_gui_remove_first_page_of_pdf_file

Note that as of **January 2021** the gtk bindings default to
**ruby-gtk3**. Support for ruby-gtk2 will be retained,
but new code may not necessarily be written with ruby-gtk2 in
mind. ruby-gtk3 is now the main GUI target for this project.

I am slowly porting the individual widgets.

The following widgets have been ported so far:

    PdfParadise::GUI::Gtk::StatisticsWidget # can be found under pdf_paradise/gui/gtk3/statistics_widget/statistics_widget.rb
## Specification of the .pdf format

This subsection is a stub; I only needed it to gather information
about the .pdf specification. It is NOT complete; it only
contains some useful information and snippets about the .pdf
specification.

PDF is short for **Portable Document Format**.

In **2008**, PDF was standardized as **ISO 32000-1**.

In the pdf specification we can distinguish these entities:

**Objects**: these are not objects in the OOP sense, but simply the
basic data types of the PDF standard. There are 9 types of objects:
null, boolean, integer, real, name, string, array, dictionary and
stream.

**Dictionary**: an unordered collection of key-value pairs, in which
each key is a name object. A dictionary is delimited by << at the
beginning and >> at the end.

**Indirect Objects**: these are objects that are referred to by
reference.

**Direct Objects**: these are objects that appear inline and are
obtained directly.

**Conforming Reader**: an application that parses a PDF
file according to the PDF standard.

A .pdf file is made up of a specific structure, usually a four-part
layout.

These four parts are:

- Header
- Body
- Cross-reference table
- Trailer
### The .pdf Header

The header may begin with an entry such as **%PDF-1.7**.

The general format for the header is **%PDF-** followed by the
version number in the form 1.N.

The version in the header is not always authoritative, though.
Beginning with PDF version 1.4, the **Version** entry in the
document's catalog dictionary (which is referenced by the **Root**
entry of the **Trailer**), if present, is used instead of the
version given in the header.

If a .pdf file contains binary data (which most PDF files
do nowadays, for example in **stream objects**), then the
**Header** line shall be immediately followed by a comment line
containing at least **four binary characters**, i.e. characters
with codes of 128 or greater.
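To make the header rules concrete, here is a small Ruby sketch that reads the version from the first line and checks for the binary comment line. The method name `pdf_header_info` is hypothetical and not part of pdf_paradise, and the sketch deliberately ignores the catalog's **Version** entry, which can override the header version from PDF 1.4 onwards.

```ruby
# Hypothetical helper (not part of pdf_paradise): read the version from a
# PDF header such as "%PDF-1.7" and check whether the next line is a
# comment containing at least four binary characters (codes >= 128).
def pdf_header_info(raw)
  data = raw.b # treat the input as raw bytes
  first_line, second_line = data.split(/\r\n|\r|\n/, 3)
  match = first_line.to_s.match(/\A%PDF-(\d+\.\d+)/)
  return nil unless match

  binary_marker = second_line.to_s.start_with?('%') &&
                  second_line.bytes.count { |byte| byte >= 128 } >= 4
  { version: match[1], binary_marker: binary_marker }
end
```

In practice you would only pass in the first few bytes of a file, e.g. `pdf_header_info(File.binread('foobar.pdf', 1024))`.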
### The .pdf Body

The body of a PDF file consists of the aforementioned **Indirect
Objects**, which represent the contents of a document.

**Indirect Objects** begin with a **unique object identifier**
that allows other objects to refer to them.

That identifier is made up of the following two components:

- Object Number: a positive integer; objects may appear in any order
- Generation Number: a non-negative integer

The **Indirect Objects** can be referred to from elsewhere by an
**Indirect Reference**. This consists of:

- the Object Number,
- the Generation Number, and
- the keyword R (for instance: 4 0 R)

The identifier is followed by the keyword **obj** (start of the object),
and the object ends with the keyword **endobj** (end of the object).
Everything in between is the object's value, often a dictionary of
key-value pairs that describes the object.

A simple example showing the use of **Indirect Objects** follows:

    1 0 obj                % Object Number 1, Generation Number 0
    <<
      /Type /Pages         % Describe type of object
      /Kids [ 4 0 R ]      % Kids entry referring to an indirect reference (Object number 4, Generation number 0)
      /Count 1
    >>
    endobj

    2 0 obj                % Object Number 2, Generation Number 0
    <<
      /Type /Catalog       % Describe type of object
      /Pages 1 0 R         % Referring to another object via its unique object identifier
    >>
    endobj
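The dictionaries in this example can also be generated programmatically. The following Ruby sketch, which is not part of pdf_paradise, maps Ruby values onto the PDF object types listed earlier: `nil` to null, Symbols to names, Hashes to dictionaries, Arrays to arrays, and a hypothetical `IndirectRef` struct to an `N G R` reference. It only escapes backslashes and parentheses in strings, and it does not handle streams or special characters in names.

```ruby
# Hypothetical sketch (not pdf_paradise code): serialize Ruby values into
# PDF object syntax. IndirectRef stands for an "N G R" reference.
IndirectRef = Struct.new(:number, :generation)

def to_pdf_object(value)
  case value
  when nil then 'null'
  when true, false then value.to_s
  when Integer then value.to_s
  # PDF reals have no exponent notation, so use fixed-point and trim zeros.
  when Float then format('%.4f', value).sub(/\.?0+\z/, '')
  when Symbol then "/#{value}"
  when String then "(#{value.gsub(/[\\()]/) { |char| "\\#{char}" }})"
  when Array then "[ #{value.map { |item| to_pdf_object(item) }.join(' ')} ]"
  when Hash
    pairs = value.map { |key, item| "/#{key} #{to_pdf_object(item)}" }
    "<< #{pairs.join(' ')} >>"
  when IndirectRef then "#{value.number} #{value.generation} R"
  else raise ArgumentError, "cannot serialize #{value.class}"
  end
end
```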
The **Body** section of a .pdf file is thus a tree of linked
objects, rooted at the Root object (defined by the **Root** entry
in the **Trailer** section, as a catalog dictionary).

### The .pdf Cross-Reference Table

The **Cross-Reference Table** is a table that contains a list of byte
offsets pointing to the indirect objects.

A conforming pdf reader uses the Cross-Reference Table as a lookup
table to access certain objects quickly when needed.

The format for entries in the Cross-Reference Table can be summarized as
follows:

- Each entry has the format nnnnnnnnnn ggggg n eol, a total of 20 bytes
- nnnnnnnnnn is a 10-digit byte offset, counted from the beginning of the file
- ggggg is a 5-digit generation number
- n is the keyword for an in-use entry, or f the keyword for a free entry
- eol is a 2-character end-of-line sequence (such as CR LF)

The **Cross-Reference Table** always begins with the special entry
**0000000000 65535 f**, which describes object number 0, the head of
the list of free objects:

    0000000000 65535 f % special entry, f denoting that it is a free entry
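The following Ruby sketch assembles a tiny one-page .pdf file by hand, so you can see how the header, body, cross-reference table and trailer fit together and how each 20-byte entry records the byte offset of an `N 0 obj` line. It is not pdf_paradise code; `xref_entry` and `build_minimal_pdf` are hypothetical names, and the page is intentionally empty.

```ruby
# Hypothetical sketch (not pdf_paradise code): build a minimal PDF to show
# how the cross-reference table records byte offsets.
def xref_entry(offset, generation, type)
  # 10-digit offset, space, 5-digit generation, space, n/f, CR LF = 20 bytes
  format("%010d %05d %s\r\n", offset, generation, type)
end

def build_minimal_pdf
  objects = [
    '<< /Type /Catalog /Pages 2 0 R >>',
    '<< /Type /Pages /Kids [ 3 0 R ] /Count 1 >>',
    '<< /Type /Page /Parent 2 0 R /MediaBox [ 0 0 612 792 ] /Resources << >> >>'
  ]
  pdf = +"%PDF-1.7\n" # header
  # Body: remember where each "N 0 obj" line starts.
  offsets = objects.each_with_index.map do |body, index|
    offset = pdf.bytesize
    pdf << "#{index + 1} 0 obj\n#{body}\nendobj\n"
    offset
  end

  # Cross-reference table: object 0 is the special free entry.
  xref_offset = pdf.bytesize
  pdf << "xref\n0 #{objects.size + 1}\n"
  pdf << xref_entry(0, 65_535, 'f')
  offsets.each { |offset| pdf << xref_entry(offset, 0, 'n') }

  # Trailer: points at the catalog and at the start of the xref table.
  pdf << "trailer\n<< /Size #{objects.size + 1} /Root 1 0 R >>\n"
  pdf << "startxref\n#{xref_offset}\n%%EOF\n"
  pdf
end
```

Writing the result with `File.binwrite('minimal.pdf', build_minimal_pdf)` should give a blank one-page file; this is a teaching sketch, not a replacement for a proper PDF library such as prawn or hexapdf.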
## Graphical User Interfaces (GUIs)

The pdf_paradise gem comes with a few small-ish widgets, primarily
written in ruby-gtk. Since August 2021 I have also been experimenting
with libui, but this is a slow process; stay tuned for more updates
in this regard in the coming months.

One big advantage of libui is that it works on Windows out of the box,
so we can use GUIs on Windows as well. \o/
## Compressing a .pdf file (optimizing the size of a .pdf file)

Sometimes you may have to reduce the file size of a given .pdf
file, such as when you need to upload a .pdf file and there
is a file size limit. This happened to me a few
times when using webmail-based email services, where an
automatic notice was generated when the .pdf file was too
large, e. g. above 25MB in size or something similar.

So, let us now assume that you **do** have a use case such
as the one described above, or any other use case: you want to
reduce the file size of a given .pdf file.

How can this be done?

Well, there are several ways. One is to use online-based
tools, which tend to work surprisingly well; I verified
this in February 2022. But, as far as the gem here is
concerned, we will focus primarily on means that you can
use on your own, without having to depend on
external websites. Two methods will be described here:
the first one requires **ghostscript**, the second
one requires **hexapdf**.

The important parameter for **ghostscript** is
**-dPDFSETTINGS**. It determines the compression level,
which ultimately affects the quality of the compressed
.pdf file.

Available values for **-dPDFSETTINGS** include
**/screen**, **/ebook**, **/printer**, **/prepress**
and **/default**.

Class **PdfParadise::CompressThisPdfFile** can be of
help here. Simply pass the path of the local .pdf file
as argument to .new().

You can also use a toplevel method if you'd like to:

    require 'pdf_paradise'
    PdfParadise.compress_this_pdf_file
    PdfParadise.compress_this_pdf_file('/foobar.pdf')
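For reference, the ghostscript route roughly corresponds to a command line like the one built by this Ruby sketch. It is not the code of PdfParadise::CompressThisPdfFile; the method name `ghostscript_compress_command` and the default of `/ebook` are assumptions. Running it via `system(*ghostscript_compress_command('foobar.pdf', 'foobar_small.pdf'))` avoids going through a shell. Roughly speaking, `/screen` produces the smallest files at the lowest quality, while `/prepress` keeps the most quality.

```ruby
# Hypothetical sketch, not the code of PdfParadise::CompressThisPdfFile:
# build a ghostscript command line that rewrites a .pdf file using one of
# the -dPDFSETTINGS presets.
PDF_SETTINGS = %w[/screen /ebook /printer /prepress /default].freeze

def ghostscript_compress_command(input, output, setting: '/ebook')
  unless PDF_SETTINGS.include?(setting)
    raise ArgumentError, "unknown -dPDFSETTINGS value: #{setting}"
  end

  [
    'gs', '-sDEVICE=pdfwrite', '-dCompatibilityLevel=1.4',
    "-dPDFSETTINGS=#{setting}",
    '-dNOPAUSE', '-dQUIET', '-dBATCH',
    "-sOutputFile=#{output}", input
  ]
end
```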
The variant using hexapdf is called:

    PdfParadise.compress_via_pdf
    PdfParadise.compress_via_pdf('foobar.pdf')

The API name may change at a later point in time; perhaps
we will just add a toplevel API called **PdfParadise.compress()**,
but for the time being the above APIs will be retained as they
are.
## Storing all open .pdf files in a yaml file

In **February 2022** the yaml file working_on_these_pdf_files.yml
was added at:

    pdf_paradise/yaml/working_on_these_pdf_files.yml

The idea here is that this yaml file retains the local path
to any .pdf file that the user (in this case me) is working
on, i.e. reading right now.

I needed this because I tend to work through .pdf files and
remove page after page as I read them. The idea is that
I do not lose that information when I reboot my computer
or when said computer crashes; I needed to make this
information persistent.

Why is this yaml file part of the pdf_paradise gem, though?
This is mostly for convenience. I wanted to have this
available in one of my ruby gems by default. In the long
run I will add code that allows other users to adjust
this to their own use case (and perhaps store it in their home
directory rather than in the gem itself). As
of February 2022 code for the latter is not
part of the gem, but I may add code for this, either
in the **pdf_paradise** gem or the **roebe** gem.
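If you want to keep a similar list of your own, Ruby's standard yaml library is enough. This is a hypothetical sketch rather than code from the gem: the method names are made up, and it assumes the file simply holds a YAML list of absolute paths, which may not match the actual structure of working_on_these_pdf_files.yml.

```ruby
require 'yaml'

# Hypothetical sketch (not pdf_paradise code): persist the list of .pdf
# files that are currently being worked on, so it survives a reboot.
# Assumes the YAML file is a plain list of absolute paths.
def save_working_pdf_files(yaml_path, pdf_files)
  paths = pdf_files.map { |file| File.expand_path(file) }.uniq
  File.write(yaml_path, paths.to_yaml)
end

def load_working_pdf_files(yaml_path)
  return [] unless File.exist?(yaml_path)

  Array(YAML.safe_load(File.read(yaml_path)))
end
```

For example, `save_working_pdf_files(File.expand_path('~/working_on_these_pdf_files.yml'), ['foobar.pdf'])` would keep the list in the home directory instead of inside the gem.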
## Splitting a single .pdf file into several individual .pdf files

You can use the following toplevel API to split up a single
.pdf file into several .pdf files:

    PdfParadise.burst(ARGV)
    PdfParadise.burst('foobar.pdf')
## Merging pdf files

<b>PdfParadise::MergePdf.new(ARGV)</b> can be used for
<b>merging .pdf files</b>. This functionality depends on
external software, so you have to install that first.

Currently <b>ghostscript</b> and <b>hexapdf</b> can be used for
the <b>merging</b> step.

Examples of how to use either of these two variants, as
far as <b>class PdfParadise::MergePdf</b> is concerned,
follow:

    mergepdf one.pdf two.pdf --use-ghostscript
    mergepdf one.pdf two.pdf --use-hexapdf

(The two leading hyphens (--) are currently mandatory for commandline
arguments; otherwise an argument is assumed to be a locally existing
.pdf file.)

If you need to do this from within ruby code, consider
using the following code:

    require 'pdf_paradise'
    merge_pdf = PdfParadise::MergePdf.new('one.pdf two.pdf')
    merge_pdf.feedback_where_it_is_stored # Call it manually.
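The `--` rule and the choice between the two backends can be illustrated with a short Ruby sketch. It is not the implementation of PdfParadise::MergePdf: the method names are hypothetical, and it assumes that ghostscript is used when no switch is given. The ghostscript call uses the pdfwrite device, and the hexapdf call uses hexapdf's `merge` command, which expects the output file last.

```ruby
# Hypothetical sketch (not PdfParadise::MergePdf itself): separate "--"
# switches from .pdf paths and build the external merge command.
def split_merge_arguments(argv)
  switches, files = argv.partition { |arg| arg.start_with?('--') }
  [files, switches]
end

def merge_command(files, output, switches)
  if switches.include?('--use-hexapdf')
    # hexapdf's CLI takes the input files first and the output file last.
    ['hexapdf', 'merge', *files, output]
  else
    # Assumed default: ghostscript's pdfwrite device concatenates the inputs.
    ['gs', '-dBATCH', '-dNOPAUSE', '-q', '-sDEVICE=pdfwrite',
     "-sOutputFile=#{output}", *files]
  end
end
```

Passing the resulting array to `system(*command)` runs the merge without a shell, so file names containing spaces need no extra quoting.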
## Combining individual pages from .pdf files into a new .pdf file via class PdfParadise::CombineThesePdfPages

Class **PdfParadise::CombineThesePdfPages** can be used to
extract individual pages from a given .pdf file and
combine them into a new .pdf file.

There is also an executable at **bin/combine_these_pdf_pages**
which can be used on the commandline.

This functionality depends on the **hexapdf** gem.

For instance, the following usage example would retain pages
3, 4 and 5 of foobar.pdf and create a new .pdf file from them:

    combine_these_pdf_pages foobar.pdf 3,4,5
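Parsing a page specification such as `3,4,5` is straightforward. The sketch below, with the hypothetical name `parse_page_list`, also rejects page numbers outside the document. It is an illustration, not the parser used by bin/combine_these_pdf_pages, and the page count would have to come from elsewhere, e.g. from `PdfParadise.n_pages?`.

```ruby
# Hypothetical sketch (not the gem's parser): turn "3,4,5" into 1-based
# page numbers and check them against the page count of the document.
def parse_page_list(spec, n_pages)
  pages = spec.split(',').map(&:strip).reject(&:empty?).map do |token|
    Integer(token, 10) # raises ArgumentError for input such as "x"
  end
  invalid = pages.reject { |page| page.between?(1, n_pages) }
  raise ArgumentError, "no such page(s): #{invalid.join(', ')}" unless invalid.empty?

  pages
end
```

The page numbers are 1-based, as on the commandline; subtract 1 if the library you pass them to counts pages from zero.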
## Extracting all images from a .pdf file

If you make use of <b>poppler</b>, then you can extract
all images from a given .pdf file.

A small libui GUI was added for this functionality; it
is mostly for quick demo purposes and does not work extremely
well.

On IceWM it looks like this right now:

<img src="https://i.imgur.com/QXelVyy.png" style="margin:1em">

Not pretty, but it took only about 20 minutes to write.

<b>pdfimages</b> from poppler must be installed. On Windows
you can probably download an executable for poppler here:

https://blog.alivate.com.au/poppler-windows/

I tested whether the above executables work on Windows, and
indeed, they still work fine. I also tested the libui
variant on Windows, and it works. The code is a bit
brittle, so use it with care, but I was able to use it
successfully in <b>August 2022</b> to extract all images
from a given .pdf file. At a later time I may add a
to-image converter via libui, probably in the other
gem called image_paradise. Stay tuned in this regard.

To start the libui wrapper from the commandline, you can
use the following:

    /usr/bin/pdf_paradise --libui
    bin/pdf_paradise --libui
    pdf_paradise --libui # This variant should work, or try the other
                         # variants; it is stored in bin/pdf_paradise
                         # of this gem
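If you prefer to skip the GUI, the underlying pdfimages call can be sketched in Ruby as follows. This is a hypothetical helper, not the gem's code; it assumes a poppler version whose pdfimages supports `-all` (write images in their native formats where possible). pdfimages names its output files after the given prefix, e.g. `image-000.jpg`.

```ruby
require 'fileutils'

# Hypothetical sketch (not the gem's implementation): extract all images
# from a .pdf file by shelling out to poppler's pdfimages.
def pdfimages_command(pdf_file, output_dir, prefix: 'image')
  # -all keeps images in their native formats where possible.
  ['pdfimages', '-all', pdf_file, File.join(output_dir, prefix)]
end

def extract_all_images(pdf_file, output_dir)
  FileUtils.mkdir_p(output_dir)
  # system returns nil if pdfimages is missing and false if it fails.
  system(*pdfimages_command(pdf_file, output_dir)) ||
    raise('pdfimages failed (is poppler installed?)')
end
```

Calling `extract_all_images('foobar.pdf', 'foobar_images')` would create the directory if needed and raise an error if pdfimages is missing or fails.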
## Converting .jpg files to .pdf files

If you have a use case to convert several .jpg files into a .pdf
file, then the following commandline example should be
helpful:

    convert /path/to/image foobar.pdf
    convert *.jpg foobar.pdf

Note that this requires **ImageMagick**. **ImageMagick** is
not always perfect; it has a few problems, unfortunately.

For instance, in <b>April 2022</b> when I tried the above,
the image was repeated three times along the x-axis. I do not
know why, but that makes **absolutely no sense**. It is just
a single image, so why is it repeated three times in the
resulting .pdf file? Perhaps ImageMagick's **convert** tool does
this automatically, but then I question the default behaviour:
**it makes no sense** for my use case. One image
should be one image, not three images or fifty images.

In the event that **ImageMagick** does not work very well
for your use case, consider using another software suite,
such as **img2pdf**.

The syntax for **img2pdf** goes something like this:

    img2pdf -o document.pdf *.jpg

I liked this, so in **April 2022** this was added to
**ImageParadise**. The API for this is as follows:

    ImageParadise.img2pdf('*.jpg') # If a '*' is part of the input, Dir[] will be used.

As that functionality may be useful on the commandline
as well, an executable has been added at
**bin/imageparadise_img2pdf**. Simply pass the image
files that you want to convert.

Usage example:

    imageparadise_img2pdf *.jpg

If you need the images to be in a particular order, then you
may have to specify each image file explicitly, in the order
you want, e. g. via the path to it.

So for instance:

    imageparadise_img2pdf image3.jpg image1.jpg image2.png
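If you would rather not order the files by hand, a natural sort (so that `image2.jpg` comes before `image10.jpg`) is one option. The Ruby sketch below is hypothetical and not part of ImageParadise; it only builds the img2pdf argument list, which you could then run via `system(*img2pdf_command(Dir['*.jpg']))`.

```ruby
# Hypothetical sketch (not ImageParadise code): sort image paths naturally
# so "image2.jpg" comes before "image10.jpg", then build an img2pdf command.
def natural_sort(paths)
  paths.sort_by do |path|
    # Split the file name into digit and non-digit runs; digit runs are
    # compared as integers, text runs as strings.
    File.basename(path).scan(/\d+|\D+/).map do |part|
      part.match?(/\A\d+\z/) ? [0, part.to_i] : [1, part]
    end
  end
end

def img2pdf_command(images, output = 'document.pdf')
  ['img2pdf', '-o', output, *natural_sort(images)]
end
```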
The only drawback I have found with <b>img2pdf</b> so far is
that you cannot easily add text to an image. This makes it
hard to tell which image has which name. A workaround for
this is to embed the filename into the image itself, e. g.
create temporary images, and then pack them together via
<b>img2pdf</b>.

ADD_CONTACT_INFORMATION
data/doc/todo/todo.md ADDED
@@ -0,0 +1,7 @@
- Add a converter GUI.

  From .docx to .pdf via libreoffice; support this via that GUI.

  This works, but still has to be polished.