pdftoimage 0.2.1 → 0.3.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 7385d5aaa8f461f7214d25b9972e5b3cdfd528da1da7c7dfd18b61309e9dd010
- data.tar.gz: ed6c231fa756f90c9330d9f68fcec469aa447b5446ded5074d711ddee9579d04
+ metadata.gz: 382ac5bc0e37e99acc44c9b7b40883ca8ca0a7b72595229e460fd4c0d4f6ee27
+ data.tar.gz: 6163c9afa9cc35a3bf6231af43eaf7ee1a5ae490c946b886f2904f63af2af9b6
  SHA512:
- metadata.gz: 182bac990daff942767ca44b8cd56e3979fa39f334f565a2b7efabb8497b8be042b04e0abfafc735ba0f2023440732093c6bc77edb7ac12ed1ebbb8fc7287634
- data.tar.gz: 37df49043986c6c02720dec32d076144c199cac1b6b8fcf87c895f4fb1131e297eb85b91f0df1915a7571561076641c74527d18d9d2b74267b813eec4de63d51
+ metadata.gz: 640b7bbd60b1a15db121c27c1d56b360dbb509c6cdce62d525ed3ea8aceeed3e67ee359eae9db350e0d031c06b178ad5723cc29990075b96f762ad075627c614
+ data.tar.gz: 76a9559c9e50853022d5ef8748f1a0733bd065e5a832e47d3cb5106f8825b9860b786d5c1d87edb495fa511f62efc55b41b089e947bae3fc39c7e3dcd6ef4599
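The SHA256 values above can be checked against a locally downloaded copy of the gem. The following is a minimal sketch, assuming the `.gem` archive (a plain tar file) has already been unpacked so that `metadata.gz` and `data.tar.gz` sit in the current directory. The helper name `checksum_matches?` is hypothetical and not part of RubyGems.

```ruby
require 'digest'

# Hypothetical helper: compare a file's SHA256 digest with an expected hex string.
def checksum_matches?(path, expected_hex)
  Digest::SHA256.file(path).hexdigest == expected_hex.downcase
end

# Expected values copied from the 0.3.1 side of checksums.yaml above.
EXPECTED_SHA256 = {
  'metadata.gz' => '382ac5bc0e37e99acc44c9b7b40883ca8ca0a7b72595229e460fd4c0d4f6ee27',
  'data.tar.gz' => '6163c9afa9cc35a3bf6231af43eaf7ee1a5ae490c946b886f2904f63af2af9b6'
}.freeze

EXPECTED_SHA256.each do |name, hex|
  next unless File.exist?(name) # skip members that have not been unpacked here

  puts "#{name}: #{checksum_matches?(name, hex) ? 'OK' : 'MISMATCH'}"
end
```

The same approach would work for the SHA512 entries by swapping in `Digest::SHA512`.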
data/CHANGELOG.md ADDED
@@ -0,0 +1,75 @@
+ # Changelog
+
+ All notable changes to this project will be documented in this file.
+
+ The format is based on [Keep a Changelog](https://keepachangelog.com).
+
+ ## [0.3.1] - 2026-03-21
+
+ ### Added
+ - `Image#crop(x, y, w, h)` for extracting a rectangular region from a PDF page
+
+ ## [0.3.0] - 2026-03-20
+
+ ### Added
+ - `PDFToImage.open` now accepts IO objects in addition to file paths (#11)
+ - `PDFToImage.from_blob` for opening PDFs from raw binary data (#11)
+ - `Image#save` now accepts IO objects for output, enabling fully in-memory workflows (#11)
+ - Support for opening PDFs from remote URLs (#16)
+
+ ### Changed
+ - Replaced direct ImageMagick CLI calls with MiniMagick (#15)
+
+ ### Removed
+ - Removed iconv dependency (#17)
+
+ ## [0.2.1] - 2025-04-08
+
+ ### Fixed
+ - "Error determining page count" (#13)
+ - Updated shellwords dependency to 0.2.0+ (#7)
+
+ ## [0.2.0] - 2023-04-20
+
+ ### Fixed
+ - Use of deprecated `File.exists?` method (#4)
+ - File paths are now escaped to properly handle spaces and special characters (#3)
+
+ ### Added
+ - Specifying dpi resolution is now supported (#5)
+
+ ## [0.1.7] - 2018-05-01
+
+ ### Fixed
+ - Updated yard to resolve a vulnerability
+
+ ## [0.1.6] - 2011-07-13
+
+ ### Fixed
+ - Buggy PDF generators encoding CreationDate and ModDate as UTF-16 instead of ASCII, causing parsing errors
+
+ ## [0.1.5] - 2011-03-08
+
+ ### Fixed
+ - poppler_utils no longer leaves off the extra padded zero
+
+ ## [0.1.4] - 2010-11-15
+
+ ### Fixed
+ - Documents with page counts that are exact powers of 10 not parsing properly due to poppler_utils zero-padding behavior
+
+ ## [0.1.3] - 2010-11-12
+
+ ### Fixed
+ - PDF documents with more than 9 pages not parsing properly (zero-padding issue)
+
+ ## [0.1.2] - 2010-11-11
+
+ ### Added
+ - Support for blocks upon opening a PDF
+ - `quality` method for JPEG/MIFF/PNG compression levels
+ - Lazy conversion: PDF conversion is now deferred until saving, improving performance for partial conversions
+
+ ## [0.1.1] - 2010-11-10
+
+ - Initial release
data/README.md ADDED
@@ -0,0 +1,91 @@
+ # pdftoimage
+
+ A Ruby gem for converting PDF documents into images using [poppler_utils](https://poppler.freedesktop.org/) and [MiniMagick](https://github.com/minimagick/minimagick).
+
+ ## Installation
+
+ ```sh
+ gem install pdftoimage
+ ```
+
+ ### Requirements
+
+ - [poppler_utils](https://poppler.freedesktop.org/)
+ - [ImageMagick](https://imagemagick.org/)
+
+ ## Usage
+
+ ### From a file
+
+ ```ruby
+ require 'pdftoimage'
+
+ images = PDFToImage.open('somefile.pdf')
+ images.each do |page|
+   page.resize('50%').save("output/page-#{page.page}.jpg")
+ end
+ ```
+
+ ### With a block
+
+ ```ruby
+ PDFToImage.open('report.pdf') do |page|
+   page.resize('150').quality('80%').save("out/thumbnail-#{page.page}.jpg")
+ end
+ ```
+
+ ### From a URL
+
+ ```ruby
+ pages = PDFToImage.open('https://example.com/report.pdf')
+ pages[0].save('first_page.png')
+ ```
+
+ ### From an IO object
+
+ ```ruby
+ File.open('report.pdf', 'rb') do |io|
+   pages = PDFToImage.open(io)
+   pages[0].save('first_page.png')
+ end
+ ```
+
+ ### From binary data
+
+ ```ruby
+ pdf_data = download_pdf_from_s3(key)
+ pages = PDFToImage.from_blob(pdf_data)
+ pages[0].save('first_page.png')
+ ```
+
+ ### Saving to an IO object
+
+ ```ruby
+ pages = PDFToImage.open('report.pdf')
+
+ io = StringIO.new(''.b)
+ pages[0].save(io)
+ io.rewind
+ ```
+
+ ### Cropping a region
+
+ ```ruby
+ PDFToImage.open('report.pdf') do |page|
+   page.crop(0, 300, 100, 300).save("out/cropped-#{page.page}.jpg")
+ end
+ ```
+
+ ### Setting resolution
+
+ ```ruby
+ PDFToImage.open('report.pdf') do |page|
+   page.r(350).save("out/hires-#{page.page}.jpg")
+ end
+ ```
+
+ ## License
+
+ Copyright (c) 2026 Rob Flynn
+
+ See [LICENSE](LICENSE) for details.
data/lib/pdftoimage/image.rb CHANGED
@@ -1,3 +1,5 @@
+ require 'tempfile'
+
  module PDFToImage
  # A class which is instantiated by PDFToImage when a PDF document
  # is opened.
@@ -26,7 +28,7 @@ module PDFToImage
 
  CUSTOM_IMAGE_METHODS.each do |method|
  define_method(method.to_sym) do |*args|
- @args << "-#{method} #{args.join(' ')}"
+ @args << [method, args]
 
  self
  end
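The hunk above stops formatting each chained call into a CLI fragment and instead records it as a `[method, args]` pair, to be replayed later against an image object. The pattern can be sketched in isolation as follows. `OperationQueue` and `FakeImage` are hypothetical stand-ins for illustration only (no MiniMagick involved), and the sketch uses `public_send` where the gem itself uses `send`.

```ruby
# Records chained calls as [method, args] pairs and replays them later.
class OperationQueue
  def initialize
    @args = []
  end

  # Define chainable recorders, mirroring the define_method loop in the hunk.
  %w[resize quality].each do |method|
    define_method(method.to_sym) do |*args|
      @args << [method, args]
      self
    end
  end

  # Replay the recorded operations, in order, against the target object.
  def apply_to(target)
    @args.each { |method, args| target.public_send(method, *args) }
    target
  end
end

# Hypothetical stand-in for an image object; it only logs what it receives.
class FakeImage
  attr_reader :log

  def initialize
    @log = []
  end

  def resize(geometry)
    @log << "resize #{geometry}"
  end

  def quality(value)
    @log << "quality #{value}"
  end
end

image = OperationQueue.new.resize('150').quality('80%').apply_to(FakeImage.new)
p image.log
```

Keeping the arguments as an array rather than a pre-joined string means nothing has to be shell-escaped at record time, which fits the move away from direct CLI calls noted in the changelog.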
@@ -67,19 +69,19 @@ module PDFToImage
  # @param outname [String] The output filename of the image
  #
  def save(outname)
- generate_temp_file
-
- cmd = "convert "
-
- if not @args.empty?
- cmd += "#{@args.join(' ')} "
+ if outname.respond_to?(:write)
+ save_to_io(outname)
+ else
+ save_to_file(outname)
  end
 
- cmd += "#{Shellwords.escape(@filename)} #{Shellwords.escape(outname)}"
+ return true
+ end
 
- PDFToImage.exec(cmd)
+ def crop(x, y, w, h)
+ @pdf_args.push("-x #{x}", "-y #{y}", "-W #{w}", "-H #{h}")
 
- return true
+ self
  end
 
  def <=>(img)
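The new `crop` method interpolates its four arguments directly into `pdftoppm` flags (`-x`, `-y`, `-W`, `-H`), and the `generate_temp_file` context further down shows `@pdf_args` being joined into a shell command without escaping. A caller passing untrusted values may therefore want to coerce them to integers first. The sketch below is one way to do that. `crop_flags` is a hypothetical helper and is not part of the gem.

```ruby
# Hypothetical helper: build pdftoppm crop flags from values coerced to integers.
# Integer() raises ArgumentError for strings that are not plain numbers,
# so shell metacharacters never reach the command line.
def crop_flags(x, y, w, h)
  x, y, w, h = [x, y, w, h].map { |value| Integer(value) }
  raise ArgumentError, 'width and height must be positive' unless w.positive? && h.positive?

  ["-x #{x}", "-y #{y}", "-W #{w}", "-H #{h}"]
end

puts crop_flags(0, 300, 100, 300).join(' ')
```

With a guard like this, `page.crop(*user_values)` would fail fast on malformed input instead of producing a malformed command.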
@@ -94,6 +96,29 @@ module PDFToImage
 
  private
 
+ def save_to_file(outname)
+ generate_temp_file
+
+ image = MiniMagick::Image.open(@filename)
+ @args.each do |method, args|
+ image.send(method, *args)
+ end
+ image.write(outname)
+ end
+
+ def save_to_io(io)
+ tempfile = Tempfile.new(['pdftoimage', '.png'])
+ tempfile.binmode
+ begin
+ save_to_file(tempfile.path)
+ tempfile.rewind
+ IO.copy_stream(tempfile, io)
+ ensure
+ tempfile.close
+ tempfile.unlink
+ end
+ end
+
  def generate_temp_file
  if @opened == false
  cmd = "pdftoppm -png -f #{@page} #{@pdf_args.join(" ")} -l #{@page} #{Shellwords.escape(@pdf_name)} #{Shellwords.escape(@filename)}"
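The `save` / `save_to_io` pair above dispatches on whether the target responds to `write`, and for IO targets renders to a temporary file before streaming the bytes across. A standalone sketch of that flow follows. `render_to_path` is a hypothetical stand-in for `save_to_file` that writes placeholder bytes instead of a real image, and the sketch uses `Tempfile.create` with a block for cleanup where the gem uses `Tempfile.new` with `ensure`.

```ruby
require 'tempfile'
require 'stringio'

# Hypothetical stand-in for save_to_file: writes placeholder bytes to a path.
def render_to_path(path)
  File.binwrite(path, 'fake image bytes')
end

# Write to an IO-like target via a temp file, or straight to a path otherwise.
def render(target)
  if target.respond_to?(:write)
    Tempfile.create(['sketch', '.png']) do |tmp|
      tmp.binmode
      render_to_path(tmp.path)
      tmp.rewind
      IO.copy_stream(tmp, target) # stream the rendered bytes into the caller's IO
    end
  else
    render_to_path(target)
  end
  true
end

io = StringIO.new(''.b)
render(io)
io.rewind
p io.read
```

Because the gem's temporary file always carries a `.png` suffix, the bytes copied to an IO are presumably PNG whatever format the caller has in mind. That is an inference from the hunk, not something the changelog states.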
data/lib/pdftoimage/version.rb CHANGED
@@ -1,4 +1,4 @@
  module PDFToImage
  # pdftoimage version
- VERSION = "0.2.1"
+ VERSION = "0.3.1"
  end
data/lib/pdftoimage.rb CHANGED
@@ -2,8 +2,11 @@ require 'pdftoimage/version'
  require 'pdftoimage/image'
 
  require 'tmpdir'
- require 'iconv'
  require 'shellwords'
+ require 'mini_magick'
+ require 'open-uri'
+ require 'uri'
+ require 'stringio'
 
  module PDFToImage
  class PDFError < RuntimeError; end
@@ -20,20 +23,15 @@ module PDFToImage
  raise PDFToImage::PDFError, "poppler_utils not installed"
  end
 
- begin
- tmp = `identify -version 2>&1`
- raise(PDFToImage::PDFError, "ImageMagick not installed") unless tmp.index('ImageMagick')
- rescue Errno::ENOENT
- raise PDFToImage::PDFError, "ImageMagick not installed"
- end
-
  class << self
  # Opens a PDF document and prepares it for splitting into images.
  #
- # @param filename [String] The filename of the PDF to open
+ # @param source [String, IO] A filename, URL, or IO object containing PDF data
  #
  # @return [Array] An array of images
- def open(filename, &block)
+ def open(source, &block)
+ filename = resolve_source(source)
+
  if not File.exist?(filename)
  raise PDFError, "File '#{filename}' not found."
  end
@@ -54,6 +52,16 @@ module PDFToImage
  return images
  end
 
+ # Opens a PDF from raw binary data.
+ #
+ # @param data [String] Binary string of PDF content
+ #
+ # @return [Array] An array of images
+ def from_blob(data, &block)
+ filename = write_to_tempfile(data)
+ open(filename, &block)
+ end
+
  # Executes the specified command, returning the output.
  #
  # @param cmd [String] The command to run
@@ -102,6 +110,41 @@ module PDFToImage
  return matches[1].to_i
  end
 
+ def resolve_source(source)
+ if source.respond_to?(:read)
+ write_to_tempfile(source.read)
+ elsif url?(source)
+ download_file(source)
+ else
+ source
+ end
+ end
+
+ def write_to_tempfile(data)
+ tempfile = File.join(@@pdf_temp_dir, "#{random_name}.pdf")
+ File.open(tempfile, 'wb') { |f| f.write(data) }
+ tempfile
+ end
+
+ def url?(filename)
+ uri = URI.parse(filename)
+ uri.is_a?(URI::HTTP) || uri.is_a?(URI::HTTPS)
+ rescue URI::InvalidURIError
+ false
+ end
+
+ def download_file(url)
+ tempfile = File.join(@@pdf_temp_dir, "#{random_name}.pdf")
+ remote = URI.open(url)
+ File.open(tempfile, 'wb') do |file|
+ file.write(remote.read)
+ end
+ remote.close
+ tempfile
+ rescue OpenURI::HTTPError, SocketError, Errno::ECONNREFUSED => e
+ raise PDFError, "Failed to download '#{url}': #{e.message}"
+ end
+
  # Generate a random file name in the system's tmp folder
  def random_filename
  File.join(@@pdf_temp_dir, random_name)
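The `resolve_source` hunk above decides between three kinds of input: an IO-like object, an HTTP(S) URL, or a local path. The decision logic can be sketched on its own, without touching the network or the filesystem. `classify` is a hypothetical name used only for this illustration. The `url?` here is simplified to a single check because `URI::HTTPS` is a subclass of `URI::HTTP` in Ruby's standard library, so the second `is_a?` in the gem's version is redundant but harmless.

```ruby
require 'uri'
require 'stringio'

# True for http:// and https:// strings; URI::HTTPS inherits from URI::HTTP.
def url?(source)
  return false unless source.is_a?(String)

  URI.parse(source).is_a?(URI::HTTP)
rescue URI::InvalidURIError
  false # e.g. a local filename containing spaces
end

# Hypothetical helper: report which branch resolve_source would take.
def classify(source)
  if source.respond_to?(:read)
    :io
  elsif url?(source)
    :url
  else
    :path
  end
end

[StringIO.new('%PDF-'), 'https://example.com/report.pdf', 'my report.pdf'].each do |source|
  puts "#{source.class}: #{classify(source)}"
end
```

Checking `respond_to?(:read)` first matters: an IO object is never handed to `URI.parse`, which expects a string.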
metadata CHANGED
@@ -1,43 +1,42 @@
  --- !ruby/object:Gem::Specification
  name: pdftoimage
  version: !ruby/object:Gem::Version
- version: 0.2.1
+ version: 0.3.1
  platform: ruby
  authors:
  - Rob Flynn
- autorequire:
  bindir: bin
  cert_chain: []
- date: 2025-04-08 00:00:00.000000000 Z
+ date: 1980-01-02 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
- name: iconv
+ name: shellwords
  requirement: !ruby/object:Gem::Requirement
  requirements:
  - - "~>"
  - !ruby/object:Gem::Version
- version: '1.0'
+ version: 0.2.2
  type: :runtime
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
  requirements:
  - - "~>"
  - !ruby/object:Gem::Version
- version: '1.0'
+ version: 0.2.2
  - !ruby/object:Gem::Dependency
- name: shellwords
+ name: mini_magick
  requirement: !ruby/object:Gem::Requirement
  requirements:
  - - "~>"
  - !ruby/object:Gem::Version
- version: 0.2.2
+ version: '4.0'
  type: :runtime
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
  requirements:
  - - "~>"
  - !ruby/object:Gem::Version
- version: 0.2.2
+ version: '4.0'
  description: A ruby gem for converting PDF documents into a series of images. This
  module is based off poppler_utils and ImageMagick.
  email: rob@thingerly.com
@@ -45,9 +44,9 @@ executables: []
  extensions: []
  extra_rdoc_files: []
  files:
- - ChangeLog.rdoc
+ - CHANGELOG.md
  - LICENSE
- - README.rdoc
+ - README.md
  - lib/pdftoimage.rb
  - lib/pdftoimage/image.rb
  - lib/pdftoimage/version.rb
@@ -55,9 +54,8 @@ homepage: https://github.com/robflynn/pdftoimage
  licenses:
  - MIT
  metadata:
- changelog_uri: https://github.com/robflynn/pdftoimage/blob/master/ChangeLog.rdoc
+ changelog_uri: https://github.com/robflynn/pdftoimage/blob/main/CHANGELOG.md
  source_code_uri: https://github.com/robflynn/pdftoimage/
- post_install_message:
  rdoc_options: []
  require_paths:
  - lib
@@ -65,15 +63,14 @@ required_ruby_version: !ruby/object:Gem::Requirement
  requirements:
  - - ">="
  - !ruby/object:Gem::Version
- version: '0'
+ version: '2.7'
  required_rubygems_version: !ruby/object:Gem::Requirement
  requirements:
  - - ">="
  - !ruby/object:Gem::Version
  version: '0'
  requirements: []
- rubygems_version: 3.0.3.1
- signing_key:
+ rubygems_version: 3.6.7
  specification_version: 4
  summary: A ruby gem for converting PDF documents into a series of images.
  test_files: []
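The metadata above swaps the `iconv` dependency for `mini_magick` with a `~> 4.0` constraint, keeps `shellwords` at `~> 0.2.2`, and raises the minimum Ruby version to 2.7. To see which versions those constraints admit, RubyGems' own `Gem::Requirement` class can be queried directly. The helper name `allowed?` and the sample version numbers below are illustrative assumptions, not values taken from the gem.

```ruby
require 'rubygems'

# Hypothetical helper: does the given version satisfy the constraint string?
def allowed?(constraint, version)
  Gem::Requirement.new(constraint).satisfied_by?(Gem::Version.new(version))
end

checks = [
  ['~> 4.0',   '4.13.2'], # mini_magick: any 4.x release at or above 4.0
  ['~> 4.0',   '5.0.0'],  # mini_magick: the next major version
  ['~> 0.2.2', '0.2.5'],  # shellwords: patch releases within 0.2
  ['~> 0.2.2', '0.3.0'],  # shellwords: the next minor version
  ['>= 2.7',   '2.6.10']  # required_ruby_version against an older Ruby
]

checks.each do |constraint, version|
  puts "#{constraint.ljust(10)} #{version.ljust(8)} #{allowed?(constraint, version)}"
end
```

The pessimistic operator `~>` fixes every segment except the last one given, which is why `~> 4.0` spans a whole major series while `~> 0.2.2` only spans patch releases.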
data/ChangeLog.rdoc DELETED
@@ -1,28 +0,0 @@
- === 0.2.0 / 2023-04-20
- * Fixed use of deprecated File.exists? method (pr#4 from Thornolf)
- * File paths are now escaped to properly handle spaces and special characters (pr#3 from drnic)
- * Specifying dpi resolution is now supported (pr#5 from lehf)
-
- === 0.1.7 / 2018-05-01
- * Updated yard to resolve a vulnerability.
-
- === 0.1.6 / 2011-07-13
- * Buggy PDF generators try to encode CreationDate and ModDate as UTF-16 as opposed to ASCII. This leads to parsing errors where the code was assuming UTF-8 encoding was in use.
-
- === 0.1.5 / 2011-03-08
- * Fixed a bug due to the fact that poppler_utils no longer leaves off the extra padded zero.
-
- === 0.1.4 / 2010-11-15
- * Fixed a bug concerning documents with page counts that are exact powers of 10. poppler_utils prepends one less zero to the page counts when a document count is a power of 10. This is now fixed in PDFToImage.
-
- === 0.1.3 / 2010-11-12
- * Fixed a problem where PDF documents with more than 9 pages were not parsing properly. (embarrassing 0 padding problem.)
-
- === 0.1.2 / 2010-11-11
- * Added support for blocks upon opening a PDF
- * Image objects now support the "quality" method for JPEG/MIFF/PNG compression levels.
- * PDF conversion is now deferred until saving. This greatly speeds up the conversion process in cases where you only want a few pages out of a large document converted.
-
- === 0.1.1 / 2010-11-10
-
- * Initial release:
data/README.rdoc DELETED
@@ -1,51 +0,0 @@
- = pdftoimage
-
- * {Homepage}[http://rubygems.org/gems/pdftoimage]
-
- == Description
-
- PDFToImage is a ruby gem which allows for conversion of a PDF document into
- images. It uses poppler_utils to first convert the document to PNG and then
- allows usage of ImageMagick to convert the image into other formats.
-
- The reasoning behind using poppler_utils is due to the fact that ghostscript
- occasionally has trouble with certain PDF documents which poppler_utils seems
- to be able to parse without error.
-
- == Examples
-
- require 'pdftoimage'
- images = PDFToImage.open('somefile.pdf')
- images.each do |img|
- img.resize('50%').save("output/page-#{img.page}.jpg")
- end
-
- require 'pdftoimage'
- PDFToImage.open('anotherpdf.pdf') do |page|
- page.resize('150').quality('80%').save('out/thumbnail-#{page.page}.jpg")
- end
-
- require 'pdftoimage'
- PDFToImage.open('anotherpdf.pdf') do |page|
- # Set the resolution to 350dpi
- page.r(350).save('out/thumbnail-#{page.page}.jpg")
- end
-
-
-
- == Requirements
-
- poppler_utils
-
- ImageMagick
-
-
- == Install
-
- $ gem install pdftoimage
-
- == Copyright
-
- Copyright (c) 2023 Rob Flynn
-
- See LICENSE for details.