grim 0.1.0 → 0.2.0 (RubyGems)

Files changed (10)

data/LICENSE ADDED

@@ -0,0 +1,20 @@
+Copyright (c) 2011 Jonathan Hoyt
+Permission is hereby granted, free of charge, to any person obtaining
+a copy of this software and associated documentation files (the
+"Software"), to deal in the Software without restriction, including
+without limitation the rights to use, copy, modify, merge, publish,
+distribute, sublicense, and/or sell copies of the Software, and to
+permit persons to whom the Software is furnished to do so, subject to
+the following conditions:
+The above copyright notice and this permission notice shall be
+included in all copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
+LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
+OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

data/README.textile CHANGED

@@ -1,17 +1,34 @@
 h1. Grim
-Grim is a simple gem for extracting a page from a pdf and converting it to an image as well as extract the text from the page as a string. It basically gives you an easy to use api to ghostscript, imagemagick, and pdftotext specific to this use case.
+Grim is a simple gem for extracting (reaping) a page from a pdf and converting it to an image as well as extract the text from the page as a string. It basically gives you an easy to use api to ghostscript, imagemagick, and pdftotext specific to this use case.
 h2. Prerequisites
-You will need ghostscript, imagemagick, and xpdf installed. On the Mac (OSX) I highly recommend using "Homebrew":http://mxcl.github.com/homebrew/ to get them installed, its as simple as "brew install ghostscript", "brew install imagemagick", and "brew install xpdf".
+You will need ghostscript, imagemagick, and xpdf installed. On the Mac (OSX) I highly recommend using "Homebrew":http://mxcl.github.com/homebrew/ to get them installed.
+<pre><code>
+  brew install ghostscript imagemagick xpdf
+</code></pre>
+h2. Installation
+<pre><code>
+  gem install grim
+</code></pre>
 h2. Usage
 <pre><code>
-  instance    = Grim.new("/path/to/pdf")
-  page_count  = instance.page_count                            # returns the number of pages in the pdf
-  png  = instance.page(1).to_image("/path/to/save/image.png")  # saves png to path and returns File instance
-  jpeg = instance.page(2).to_image("/path/to/save/image.jpeg") # saves jpeg to path and returns File instance
-  text = instance.page(3).text                                 # returns text as a string
-</pre></code>
+  pdf   = Grim.reap("/path/to/pdf")         # returns Grim::Pdf instance for pdf
+  count = pdf.count                         # returns the number of pages in the pdf
+  png   = pdf[3].save('/path/to/image.png') # will return true if page was saved or false if not
+  text  = pdf[3].text                       # returns text as a String
+  pdf.each do |page|
+    puts page.text
+  end
+</pre></code>
+h2. License
+See LICENSE for details.

data/grim.gemspec CHANGED

@@ -7,11 +7,12 @@ Gem::Specification.new do |s|
   s.version     = Grim::VERSION
   s.authors     = ["Jonathan Hoyt"]
   s.email       = ["jonmagic@gmail.com"]
-  s.homepage    = ""
+  s.homepage    = "http://github.com/jonmagic/grim"
   s.summary     = %q{Extract slides and text from a PDF.}
   s.description = %q{Grim is a simple gem for extracting a page from a pdf and converting it to an image as well as extract the text from the page as a string. It basically gives you an easy to use api to ghostscript, imagemagick, and pdftotext specific to this use case.}
   s.rubyforge_project = "grim"
+  s.add_dependency 'safe_shell', '~> 1.0.0'
   s.files         = `git ls-files`.split("\n")
   s.test_files    = `git ls-files -- {test,spec,features}/*`.split("\n")
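
Note on the new dependency: `safe_shell` is pinned with the pessimistic operator `~> 1.0.0`. As a quick check of which versions that constraint admits, the sketch below uses RubyGems' own `Gem::Requirement` and `Gem::Version` classes. The candidate version numbers are illustrative only and are not claimed to be actual safe_shell releases.

```ruby
require 'rubygems'

# The constraint string is copied from the gemspec line added in this diff.
requirement = Gem::Requirement.new('~> 1.0.0')

# Illustrative candidates (assumed for the example, not real release data).
candidates = %w[1.0.0 1.0.9 1.1.0 2.0.0]

# Map each candidate version to whether the constraint accepts it.
results = candidates.map do |candidate|
  [candidate, requirement.satisfied_by?(Gem::Version.new(candidate))]
end.to_h

results.each { |version, allowed| puts "#{version}: #{allowed}" }
# 1.0.0: true
# 1.0.9: true
# 1.1.0: false
# 2.0.0: false
```

In other words, `~> 1.0.0` accepts patch-level updates within 1.0.x and rejects 1.1.0 and later.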

data/lib/grim.rb CHANGED

@@ -1,17 +1,8 @@
-# Grim is a class with instance methods for getting number of pages in a pdf,
-# extracting a page as an image, and extracting the text from a page.
-#
-# For example:
-#
-#    instance    = Grim.new("/path/to/pdf")
-#    page_count  = instance.page_count
-#    png         = instance.page(1).to_png("/path/to/save/png")
-#    jpeg        = instance.page(2).to_jpeg("/path/to/save/jpeg")
-#    text        = instance.page(3).text
-#
-class Grim
+require 'safe_shell'
+module Grim
   # VERSION
-  VERSION = "0.1.0"
+  VERSION = "0.2.0"
   # Default resize output width, any positive integer
   WIDTH = 1024
@@ -30,75 +21,24 @@ class Grim
   class PdfNotFound < Grim::Exception
   end
-  # be able to store what page instance should focus on
-  attr_accessor :page_number
-  # initialize is called when a new instance is created and accepts path.
-  def initialize(path)
-    raise Grim::PdfNotFound unless File.exists?(path)
-    @page_number = 1
-    @path = path
+  # Exception that is raised if pdf does not have page
+  class PageNotFound < Grim::Exception
   end
-  # page_count uses the memoized path and shells out to ghostscript
-  # to read the pdf with the pdf_info.ps script as a filter,
-  # returning the number of pages in the pdf as an integer.
-  #
-  # For example:
-  #
-  #    instance.page_count
-  # => 4
+  # Creates and returns a new instance of Grim::Pdf
   #
-  # Returns an integer.
-  def page_count
-    @page_count ||= begin
-      `gs -dNODISPLAY -q -sFile=#{@path} ./lib/pdf_info.ps`.to_i
-    end
-  end
-  # page just sets the page attribute on the instance.
+  # path - a path string or object
   #
   # For example:
   #
-  #    instance.page(1)
-  # => instance
+  #   pdf = Grim.reap(/path/to/pdf)
   #
-  # Returns self.
-  def page(number)
-    @page_number = number
-    self
-  end
-  # Returns page_number minus 1
-  def index
-    @page_number - 1
-  end
-  # to_image extracts the selected page and turns it into an image.
-  # Tested on png and jpeg.
-  #
-  # For example:
+  # Returns an instance of Grim::Pdf
   #
-  #    instance.page(2).to_image(/path/to/save/image)
-  # => File
-  #
-  # Returns an instance of File
-  def to_image(path)
-    `convert -resize #{Grim::WIDTH} -antialias -render -quality #{Grim::QUALITY} -colorspace RGB -interlace none -density #{Grim::DENSITY} #{@path}[#{index}] #{path}`
-    file = File.open(path)
-    file.rewind
-    file
+  def self.reap(path)
+    Grim::Pdf.new(path)
   end
+end
-  # text is an instance method that extracts the text from the selected page.
-  #
-  # For example:
-  #
-  #    instance.page(2).text
-  # => "This is text from slide 2.\n\nAnd even more text from slide 2."
-  #
-  # Returns a string
-  def text
-    `pdftotext -enc UTF-8 -f #{@page_number} -l #{@page_number} #{@path} -`
-  end
-end
+require 'grim/pdf'
+require 'grim/page'
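
Note for callers upgrading from 0.1.0: `Grim` changes from a class to a module here, so `Grim.new(path)` is replaced by the factory method `Grim.reap(path)`. The sketch below mirrors that shape with a stand-in module so it runs without the gem. The names `Reaper` and `Reaper::Pdf` are hypothetical, and the stand-in skips the file-existence check that the real `Grim::Pdf` performs.

```ruby
# Stand-in for the 0.2.0 layout: a module with a factory method that
# delegates to a nested class. Names are hypothetical, not from the gem.
module Reaper
  class Pdf
    attr_reader :path

    def initialize(path)
      @path = path
    end
  end

  # Plays the role of Grim.reap: build and return the nested Pdf object.
  def self.reap(path)
    Pdf.new(path)
  end
end

pdf = Reaper.reap("/path/to/pdf")
puts pdf.class                 # Reaper::Pdf
puts Reaper.respond_to?(:new)  # false, a module cannot be instantiated
```

Because a module has no `new`, code still written against the 0.1.0 interface should fail with a `NoMethodError` on `Grim.new` rather than silently misbehave.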

data/lib/grim/page.rb ADDED

@@ -0,0 +1,50 @@
+module Grim
+  class Page
+    attr_reader :number
+    # Sets up some instance variables on new instance.
+    #
+    # pdf - the pdf this page belongs to
+    # index - the index of the page in the array of pages
+    #
+    def initialize(pdf, index)
+      @pdf    = pdf
+      @index  = index
+      @number   = index + 1
+    end
+    # Extracts the selected page and turns it into an image.
+    # Tested on png and jpeg.
+    #
+    # path - String of the path to save to
+    #
+    # For example:
+    #
+    #   pdf[1].save(/path/to/save/image.png)
+    #   # => true
+    #
+    # Returns a File.
+    #
+    def save(path)
+      SafeShell.execute("convert", "-resize", Grim::WIDTH, "-antialias", "-render",
+                        "-quality", Grim::QUALITY, "-colorspace", "RGB",
+                        "-interlace", "none", "-density", Grim::DENSITY,
+                        "#{@pdf.path}[#{@index}]", path)
+      File.exists?(path)
+    end
+    # Extracts the text from the selected page.
+    #
+    # For example:
+    #
+    #   pdf[1].text
+    #   # => "This is text from slide 2.\n\nAnd even more text from slide 2."
+    #
+    # Returns a String.
+    #
+    def text
+      SafeShell.execute("pdftotext", "-enc", "UTF-8", "-f", @number, "-l", @number, @pdf.path, "-")
+    end
+  end
+end
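
Note on the move away from backticks: 0.1.0 interpolated the PDF path into a shell command string, while `Grim::Page` passes each argument separately to `SafeShell.execute`. Assuming safe_shell hands those arguments to the process without shell interpretation (which the argument-list call shape suggests, but this diff does not show), paths with spaces or shell metacharacters stay a single argument. The sketch below shows the same idea using only the standard library's array form of `IO.popen`. The helper `run_without_shell` is hypothetical and is not part of grim or safe_shell.

```ruby
require 'rbconfig'

# Hypothetical helper: runs a command given as an argument list and returns
# its standard output as a String.
def run_without_shell(*argv)
  # The array form of IO.popen bypasses the shell, so each element reaches
  # the child process as exactly one argument. to_s lets callers pass
  # integers, as the diff does with Grim::WIDTH.
  IO.popen(argv.map(&:to_s), &:read)
end

# A file name with spaces and a shell metacharacter.
tricky_path = "my file; echo injected.pdf"

# The child Ruby process reports how many arguments it got and the first one.
script = 'print ARGV.length, ":", ARGV[0]'
output = run_without_shell(RbConfig.ruby, "-e", script, tricky_path)

puts output  # 1:my file; echo injected.pdf
```

With backtick interpolation the same path would be split on the space and the text after `;` would run as a second command.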

data/lib/grim/pdf.rb ADDED

@@ -0,0 +1,60 @@
+module Grim
+  class Pdf
+    include Enumerable
+    attr_reader :path
+    # ghostscript prints out a warning, this regex matches it
+    WarningRegex = /\*\*\*\*.*\n/
+    # Raises an error if pdf not found and sets some instance
+    # variables if pdf is found.
+    #
+    # path - A String or Path to the pdf
+    #
+    def initialize(path)
+      raise Grim::PdfNotFound unless File.exists?(path)
+      @path = path
+    end
+    # Shells out to ghostscript to read the pdf with the pdf_info.ps script
+    # as a filter, returning the number of pages in the pdf as an integer.
+    #
+    # For example:
+    #
+    #   pdf.count
+    #   # => 4
+    #
+    # Returns an Integer.
+    #
+    def count
+      @count ||= begin
+        result = SafeShell.execute("gs", "-dNODISPLAY", "-q", "-sFile=#{@path}", "./lib/pdf_info.ps")
+        result.gsub(WarningRegex, '').to_i
+      end
+    end
+    # Creates an instance Grim::Page for the index passed in.
+    #
+    # index - accepts Integer for position in array
+    #
+    # For example:
+    #
+    #   pdf[4]  # returns 5th page
+    #
+    # Returns an instance of Grim::Page.
+    #
+    def [](index)
+      raise Grim::PageNotFound unless index >= 0 && index < count
+      Grim::Page.new(self, index)
+    end
+    def each
+      (0..(count-1)).each do |index|
+        yield Grim::Page.new(self, index)
+      end
+    end
+  end
+end
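
Note on `Grim::Pdf`: the class strips ghostscript warning lines before parsing the page count, bounds-checks `[]`, and gets `map` and similar methods from `Enumerable` by defining `each`. The sketch below reproduces those three behaviors with a stand-in class that needs neither a PDF nor ghostscript. `FakePdf`, its constructor argument, and the sample warning text are all assumptions made for illustration, and pages are represented by their 1-based numbers rather than `Grim::Page` objects.

```ruby
# Stand-in for Grim::Pdf (hypothetical, for illustration only).
class FakePdf
  include Enumerable

  class PageNotFound < StandardError; end

  # Same pattern as WarningRegex in the diff: "****" through the end of line.
  WARNING_REGEX = /\*\*\*\*.*\n/

  # raw_output - String standing in for what the ghostscript call would print
  def initialize(raw_output)
    @raw_output = raw_output
  end

  # Remove warning lines, then parse the remainder as the page count.
  def count
    @count ||= @raw_output.gsub(WARNING_REGEX, '').to_i
  end

  # Zero-based lookup that rejects any index outside 0...count.
  def [](index)
    raise PageNotFound unless index >= 0 && index < count
    index + 1 # page number, mirroring Grim::Page#number
  end

  # Enumerable only requires #each; map, select, first, etc. build on it.
  def each
    (0...count).each { |index| yield index + 1 }
  end
end

pdf = FakePdf.new("**** Warning: made-up warning text\n3\n")
puts pdf.count                       # 3
puts pdf[0]                          # 1
puts pdf.map { |n| n * 10 }.inspect  # [10, 20, 30]
```

Two details follow from this shape: unlike `Array`, a negative index raises instead of counting from the end, and without the `gsub` the leading warning line would make `to_i` return 0.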

data/spec/lib/grim/page_spec.rb ADDED

@@ -0,0 +1,36 @@
+require 'fileutils'
+require 'spec_helper'
+describe Grim::Page do
+  after(:all) do
+    FileUtils.rm_rf(tmp_dir)
+  end
+  it "should have number" do
+    Grim::Page.new(Grim::Pdf.new(fixture_path("smoker.pdf")), 1).number.should == 2
+  end
+  describe "#save" do
+    before(:all) do
+      pdf = Grim::Pdf.new(fixture_path("smoker.pdf"))
+      pdf[0].save(tmp_path("to_png_spec.png"))
+      @file = File.open(tmp_path("to_png_spec.png"))
+    end
+    it "should create the file" do
+      File.exist?(tmp_path("to_png_spec.png")).should be_true
+    end
+    it "should have the right file size" do
+      @file.stat.size.should == 188515
+    end
+  end
+  describe "#text" do
+    it "should return the text from the selected page" do
+      pdf = Grim::Pdf.new(fixture_path("smoker.pdf"))
+      pdf[1].text.should == "Step 1: get someone to print this curve for you to scale, 72\342\200\235 wide\n\nStep 2: Get a couple 55 gallon drums\n\n\f"
+    end
+  end
+end

data/spec/lib/grim/pdf_spec.rb ADDED

@@ -0,0 +1,48 @@
+require 'spec_helper'
+describe Grim::Pdf do
+  it "should have a path" do
+    Grim::Pdf.new(fixture_path("smoker.pdf")).path.should == fixture_path("smoker.pdf")
+  end
+  describe "#initialize" do
+    it "should raise an error if pdf does not exist" do
+      lambda { Grim::Pdf.new(fixture_path("booboo.pdf")) }.should raise_error(Grim::PdfNotFound)
+    end
+    it "should set path on pdf" do
+      pdf = Grim::Pdf.new(fixture_path("smoker.pdf"))
+      pdf.path.should == fixture_path("smoker.pdf")
+    end
+  end
+  describe "#count" do
+    it "should return 25" do
+      pdf = Grim::Pdf.new(fixture_path("smoker.pdf"))
+      pdf.count.should == 25
+    end
+  end
+  describe "#[]" do
+    before(:each) do
+      @pdf = Grim::Pdf.new(fixture_path("smoker.pdf"))
+    end
+    it "should raise Grim::PageDoesNotExist if page doesn't exist" do
+      lambda { @pdf[25] }.should raise_error(Grim::PageNotFound)
+    end
+    it "should return an instance of Grim::Page if page exists" do
+      @pdf[24].class.should == Grim::Page
+    end
+  end
+  describe "#each" do
+    it "should be iterable" do
+      pdf = Grim::Pdf.new(fixture_path("smoker.pdf"))
+      pdf.map {|p| p.number }.should == (1..25).to_a
+    end
+  end
+end

data/spec/lib/grim_spec.rb CHANGED

@@ -1,95 +1,25 @@
-require 'fileutils'
 require 'spec_helper'
 describe Grim do
-  after(:all) do
-    FileUtils.rm_rf(tmp_dir)
-  end
   it "should have a VERSION constant" do
     Grim.const_defined?('VERSION').should be_true
   end
-  describe "Pdf" do
-    describe "#initialize" do
-      it "should raise an error if pdf does not exist" do
-        lambda { Grim.new(fixture_path("booboo.pdf")) }.should raise_error(Grim::PdfNotFound)
-      end
-    end
-    describe "#page_count" do
-      it "should return an integer" do
-        instance = Grim.new(fixture_path("smoker.pdf"))
-        instance.page_count.should == 25
-      end
-    end
-    describe "#page" do
-      it "should be set to 1 by default" do
-        instance = Grim.new(fixture_path("smoker.pdf"))
-        instance.page_number.should == 1
-      end
-      it "should set page attribute and return instance" do
-        instance = Grim.new(fixture_path("smoker.pdf"))
-        instance.page(2).should == instance
-        instance.page_number.should == 2
-      end
-    end
-    describe "#index" do
-      it "should return page minus 1" do
-        instance = Grim.new(fixture_path("smoker.pdf"))
-        instance.page(2)
-        instance.index.should == 1
-      end
-    end
-    describe "#to_image" do
-      describe "output png" do
-        before(:all) do
-          instance = Grim.new(fixture_path("smoker.pdf"))
-          @png = instance.to_image(tmp_path("to_png_spec.png"))
-        end
-        it "should create the file" do
-          File.exist?(tmp_path("to_png_spec.png")).should be_true
-        end
-        it "should return an instance of File" do
-          @png.class.should == File
-        end
-        it "should have the right file size" do
-          @png.stat.size.should == 188515
-        end
-      end
-      describe "output jpeg" do
-        before(:all) do
-          instance = Grim.new(fixture_path("smoker.pdf"))
-          @jpeg = instance.to_image(tmp_path("to_jpeg_spec.jpeg"))
-        end
-        it "should create the file" do
-          File.exist?(tmp_path("to_jpeg_spec.jpeg")).should be_true
-        end
+  it "should have WIDTH constant set to 1024" do
+    Grim::WIDTH.should == 1024
+  end
-        it "should return an instance of File" do
-          @jpeg.class.should == File
-        end
+  it "should have QUALITY constant set to 90" do
+    Grim::QUALITY.should == 90
+  end
-        it "should have the right file size" do
-          @jpeg.stat.size.should == 53980
-        end
-      end
-    end
+  it "should have DENSITY constant set to 300" do
+    Grim::DENSITY.should == 300
+  end
-    describe "#text" do
-      it "should return the text from the selected page" do
-        instance = Grim.new(fixture_path("smoker.pdf"))
-        instance.page(2).text.should == "Step 1: get someone to print this curve for you to scale, 72\342\200\235 wide\n\nStep 2: Get a couple 55 gallon drums\n\n\f"
-      end
+  describe "#new" do
+    it "should return an instance of Grim::Pdf" do
+      Grim.reap(fixture_path("smoker.pdf")).class.should == Grim::Pdf
     end
   end
 end

metadata CHANGED

@@ -1,13 +1,13 @@
 --- !ruby/object:Gem::Specification
 name: grim
 version: !ruby/object:Gem::Version
-  hash: 27
+  hash: 23
   prerelease:
   segments:
   - 0
-  - 1
+  - 2
   - 0
-  version: 0.1.0
+  version: 0.2.0
 platform: ruby
 authors:
 - Jonathan Hoyt
@@ -15,10 +15,25 @@ autorequire:
 bindir: bin
 cert_chain: []
-date: 2011-09-05 00:00:00 -04:00
+date: 2011-09-06 00:00:00 -04:00
 default_executable:
-dependencies: []
+dependencies:
+- !ruby/object:Gem::Dependency
+  name: safe_shell
+  prerelease: false
+  requirement: &id001 !ruby/object:Gem::Requirement
+    none: false
+    requirements:
+    - - ~>
+      - !ruby/object:Gem::Version
+        hash: 23
+        segments:
+        - 1
+        - 0
+        - 0
+        version: 1.0.0
+  type: :runtime
+  version_requirements: *id001
 description: Grim is a simple gem for extracting a page from a pdf and converting it to an image as well as extract the text from the page as a string. It basically gives you an easy to use api to ghostscript, imagemagick, and pdftotext specific to this use case.
 email:
 - jonmagic@gmail.com
@@ -31,16 +46,21 @@ extra_rdoc_files: []
 files:
 - .gitignore
 - Gemfile
+- LICENSE
 - README.textile
 - Rakefile
 - grim.gemspec
 - lib/grim.rb
+- lib/grim/page.rb
+- lib/grim/pdf.rb
 - lib/pdf_info.ps
 - spec/fixtures/smoker.pdf
+- spec/lib/grim/page_spec.rb
+- spec/lib/grim/pdf_spec.rb
 - spec/lib/grim_spec.rb
 - spec/spec_helper.rb
 has_rdoc: true
-homepage: ""
+homepage: http://github.com/jonmagic/grim
 licenses: []
 post_install_message:
@@ -75,5 +95,7 @@ specification_version: 3
 summary: Extract slides and text from a PDF.
 test_files:
 - spec/fixtures/smoker.pdf
+- spec/lib/grim/page_spec.rb
+- spec/lib/grim/pdf_spec.rb
 - spec/lib/grim_spec.rb
 - spec/spec_helper.rb