RubyGems - htmls_to_pdf - Versions diffs - 0.0.6 → 0.0.7 - Mend

htmls_to_pdf 0.0.6 → 0.0.7

Files changed (7)

data/README.markdown +106 -22
data/htmls_to_pdf.gemspec +2 -0
data/lib/htmls_to_pdf/htmls_to_pdf.rb +4 -4
data/lib/htmls_to_pdf/version.rb +1 -1
data/spec/htmls_to_pdf_spec.rb +126 -0
data/spec/spec_helper.rb +7 -1
metadata +25 -2

data/README.markdown CHANGED

@@ -6,10 +6,12 @@ HtmlsToPdf enables you to package one or more (ordered) HTML pages as a PDF.
 ## USEFULNESS
-I created HtmlsToPdf because I often see multi-page websites with content I would rather have in a single PDF file, so I can view it offline and search through it. The Ruby on Rails Guides is one example. RSpec documentation is another.
+I created HtmlsToPdf because I often see multi-page websites whose content I would rather have in a single PDF file for searching and offline viewing, such as the Ruby on Rails Guides and the RSpec documentation.
 ## REQUIREMENTS
+I have run this only on Linux. It likely works on OS X. It may not work on Windows.
 HtmlsToPdf uses the PDFKit gem, which itself uses the [wkhtmltopdf](http://madalgo.au.dk/~jakobt/wkhtmltoxdoc/wkhtmltopdf-0.9.9-doc.html) program, which uses qtwebkit.
 Dependence chain summary: HtmlsToPdf -> PDFKit -> wkhtmltopdf -> qtwebkit -> webkit
@@ -36,29 +38,59 @@ For information on PDFKit:
 ## BASIC USAGE
-You will find 16 example scripts in the /examples directory. Each creates a PDF from a website:
-- The 12 Factor App (Adam Wiggins)
-- Bash Guide (Greg Wooledge)
-- Coffeescript Meet Backbone.js (Adam J. Spooner)
-- Coffeescript Cookbook ([Various authors](http://coffeescriptcookbook.com/authors))
-- Coffeescript official documentation
-- Exploring Coffeescript (ElegantCode.com)
-- Advanced Rails (Jumpstart Labs)
-- The Little Book on Coffeescript (Alex MacCaw)
-- Natural Language Processing for the Working Programmer (Daniël de Kok)
-- Learn Python the Hard Way (Zed A. Shaw)
-- Practicing Ruby Vol 2 (Gregory Brown)
+To use HtmlsToPdf, create a new HtmlsToPdf object, passing in all your configuration options as a hash, then call `create_pdf` on the new object:
+    require 'rubygems'
+    require 'htmls_to_pdf'
+    config = {}
+    config[:urls] = ['http://.../url1.htm', 'https://.../url2.html']
+    config[:savedir] = '~/my/savedir'
+    config[:savename] = 'Name_to_save_file_as.pdf'
+    HtmlsToPdf.new(config).create_pdf
+## OPTIONS
+`config[:css]` takes an array of CSS files to apply during PDF rendering. (You can also pass a single CSS file as a string.)
+`config[:overwrite_existing_pdf]` (default: false) determines whether the program may overwrite a previously generated PDF file.
+`config[:options]` takes a hash of options that are passed through to PDFKit.
+`config[:remove_css_files]` (default: true) determines whether the CSS files used to generate the PDF are deleted or retained. Set this to false if you want to modify the CSS file(s).
+`config[:remove_html_files]` (default: true) determines whether the HTML files downloaded from websites and used to generate the PDF are deleted or retained. Set this to false if you may regenerate the PDF later, for example while tweaking the CSS to adjust rendering, so the HTML files need not be downloaded again.
+`config[:remove_tmp_pdf_files]` (default: true) determines whether the temporary PDF files (one per HTML file) created during generation are deleted or retained. You will usually want to accept the default and always regenerate the temporary PDFs.
+`config[:remove_temp_files]` (default: false), when set to true, sets `:remove_css_files`, `:remove_html_files`, and `:remove_tmp_pdf_files` all to true.
+## EXAMPLES
+You will find 17 example scripts in the /examples directory. Each creates a PDF from a website:
+- [The 12 Factor App](http://www.12factor.net) (Adam Wiggins)
+- [Bash Guide](http://mywiki.wooledge.org/BashGuide) (Greg Wooledge)
+- [Coffeescript Meet Backbone.js](http://adamjspooner.github.com/coffeescript-meet-backbonejs/) (Adam J. Spooner)
+- [Coffeescript Cookbook](http://coffeescriptcookbook.com) ([Various authors](http://coffeescriptcookbook.com/authors))
+- [Coffeescript official documentation](http://coffeescript.org/)
+- [Exploring Coffeescript](http://elegantcode.com/2011/08/09/exploring-coffeescript-part-6-show-me-the-goodies/) (ElegantCode.com)
+- [Advanced Rails - Five-Day](http://tutorials.jumpstartlab.com/paths/advanced_rails_five_day.html) (Jumpstart Labs)
+- [The Little Book on Coffeescript](http://arcturo.github.com/library/coffeescript/) (Alex MacCaw)
+- [Natural Language Processing for the Working Programmer](http://nlpwp.org/book/) (Daniël de Kok)
+- [Learn Python the Hard Way](http://learnpythonthehardway.org) (Zed A. Shaw)
+- [Practicing Ruby Vol 2](http://community.mendicantuniversity.org/articles/practicing-ruby-volume-2-now-freely-avai) (Gregory Brown)
 - Rails 3.1 release notes
-- Ruby on Rails Guides
-- RSpec-Rails documentation
-- RSpec documentation
-- Learn Ruby the Hard Way (Zed A. Shaw)
-- RubyGems User Guide
+- [Ruby on Rails Guides](http://guides.rubyonrails.org)
+- [RSpec-Rails documentation](https://www.relishapp.com/rspec/rspec-rails/docs)
+- [RSpec documentation](https://www.relishapp.com/rspec)
+- [Learn Ruby the Hard Way](http://ruby.learncodethehardway.org) (Zed A. Shaw)
+- [RubyGems User Guide](http://docs.rubygems.org/read/book/1)
 After you install HtmlsToPdf and its dependencies, you can write an ordinary Ruby script to save multiple ordered HTML pages as a single PDF.
-### EXAMPLE 1
+### EXAMPLE 1: Single HTML page without CSS
 Annotated version of /examples/get\_rails\_3\_1\_release\_notes.rb:
@@ -84,7 +116,7 @@ Annotated version of /examples/get\_rails\_3\_1\_release\_notes.rb:
     # on the new object
     HtmlsToPdf.new(config).create_pdf
-### EXAMPLE 2
+### EXAMPLE 2: Multiple HTML pages without CSS
 Annotated version of /examples/get\_rubygems\_user\_guide.rb:
@@ -117,7 +149,7 @@ Annotated version of /examples/get\_rubygems\_user\_guide.rb:
     # on the new object
     HtmlsToPdf.new(config).create_pdf
-### EXAMPLE 3
+### EXAMPLE 3: Multiple HTML pages with CSS & PDFKit formatting options
 Annotated version of /examples/get\_coffeescript\_meet\_backbone.rb:
@@ -146,6 +178,58 @@ Annotated version of /examples/get\_coffeescript\_meet\_backbone.rb:
     HtmlsToPdf.new(config).create_pdf
+### EXAMPLE 4: Multiple HTML pages with hand-modified CSS file to adjust rendering
+Annotated version of /examples/get\_ruby\_core\_docs.rb:
+    require 'rubygems'
+    require 'htmls_to_pdf'
+    # Get 'Ruby Core documentation' as pdf file
+    # Source: 'http://www.ruby-doc.org/core-1.9.3/'
+    config = {}
+    config[:urls] = %w(
+    ARGF.html
+    ArgumentError.html
+    Array.html
+    BasicObject.html
+    ...
+    ZeroDivisionError.html
+    fatal.html)
+    config[:urls] = config[:urls].map { |u| 'http://www.ruby-doc.org/core-1.9.3/' + u }
+    config[:savedir] = '~/Tech/Ruby/DOCUMENTATION'
+    config[:savename] = 'Ruby_Core_docs.pdf'
+    # Specify a CSS file
+    config[:css] = 'http://www.ruby-doc.org/core-1.9.3/css/obf.css'
+    # Tell HtmlsToPdf not to remove the CSS file
+    config[:remove_css_files] = false
+    # You are now free to create an "obf.css" file in the directory
+    # and edit it however you choose. It will not be overwritten.
+    # (Alternatively, you can run the program once and then modify
+    # the downloaded CSS file.)
+    #
+    # I added the following to the CSS file to suppress unwanted output:
+    #
+    # .info, noscript, #footer, #metadata, #actionbar, .dsq-brlink {
+    #   display: none;
+    #   width: 0;
+    # }
+    # .class #documentation, .file #documentation, .module #documentation {
+    #   margin: 2em 1em 5em 1em;
+    # }
+    #
+    # If you're playing around with CSS to optimize the display in your
+    # PDF, I recommend you set config[:remove_html_files] = false to
+    # avoid repeatedly downloading the HTML files from the server.
+    HtmlsToPdf.new(config).create_pdf
 ## LEGAL DISCLAIMER
 Please use at your own risk. I guarantee nothing about this program.
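The OPTIONS list in the README diff describes how the removal flags interact and notes that `:css` accepts either a string or an array. The following is a minimal sketch, not the gem's actual code, of how a config hash could be resolved under those rules. `resolve_config` and `DEFAULTS` are hypothetical names; the default values are copied from the README and from the `initialize` hunk of `htmls_to_pdf.rb`.

```ruby
# Hypothetical helper illustrating the README OPTIONS rules.
# NOT part of the htmls_to_pdf gem; defaults copied from the README.
DEFAULTS = {
  css: [],
  remove_temp_files: false,
  remove_css_files: true,
  remove_html_files: true,
  remove_tmp_pdf_files: true,
  overwrite_existing_pdf: false
}.freeze

def resolve_config(in_config = {})
  config = DEFAULTS.merge(in_config)
  # :css may be given as a single string or as an array of strings.
  config[:css] = Array(config[:css])
  # :remove_temp_files => true forces all three removal flags on,
  # overriding any individual false values.
  if config[:remove_temp_files]
    config[:remove_css_files] = true
    config[:remove_html_files] = true
    config[:remove_tmp_pdf_files] = true
  end
  config
end

p resolve_config(css: 'style.css', remove_html_files: false)
```

Whether the real gem applies the override before or after the individual flags are read is not visible in this diff; the sketch simply lets `:remove_temp_files` win.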

data/htmls_to_pdf.gemspec CHANGED

@@ -11,6 +11,8 @@ Gem::Specification.new do |s|
   s.description     = %q{Creates single PDF file from 1+ HTML pages using PDFKit}
   s.add_runtime_dependency 'pdfkit', '~> 0.5', '>= 0.5.2'
   s.add_development_dependency 'rspec'
+  s.add_development_dependency 'webmock'
+  s.add_development_dependency 'fakefs'
   s.require_paths   = ['lib']
   s.files           = `git ls-files`.split("\n")
   s.test_files      = `git ls-files -- {test,spec,features}/*`.split("\n")
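The gemspec constrains PDFKit with two requirements and adds `webmock` and `fakefs` with none, which the generated metadata further down records as `>= 0`. RubyGems' own `Gem::Requirement` class can show which versions those constraints accept; the version numbers checked here are arbitrary examples.

```ruby
require 'rubygems'

# Runtime constraint from the gemspec: '~> 0.5' means >= 0.5 and < 1.0,
# and '>= 0.5.2' raises the lower bound.
pdfkit_req = Gem::Requirement.new('~> 0.5', '>= 0.5.2')

%w(0.5.1 0.5.2 0.9.0 1.0.0).each do |v|
  puts "pdfkit #{v}: #{pdfkit_req.satisfied_by?(Gem::Version.new(v))}"
end
# pdfkit 0.5.1: false
# pdfkit 0.5.2: true
# pdfkit 0.9.0: true
# pdfkit 1.0.0: false

# Dependencies declared without a version (webmock, fakefs) get the
# default requirement, which accepts any version.
puts Gem::Requirement.default # prints ">= 0"
```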

data/lib/htmls_to_pdf/htmls_to_pdf.rb CHANGED

@@ -7,7 +7,7 @@ require 'net/http'
 class HtmlsToPdf
-  attr_reader :htmlarray, :pdfarray, :cssarray, :urls, :savedir, :savename, :remove_html_files, :remove_css_files, :remove_tmp_pdf_files
+  attr_reader :htmlarray, :pdfarray, :cssarray, :urls, :savedir, :savename, :remove_temp_files, :remove_html_files, :remove_css_files, :remove_tmp_pdf_files, :options, :overwrite_existing_pdf
   TMP_HTML_PREFIX = 'tmp_html_file_'
   TMP_PDF_PREFIX = 'tmp_pdf_file_'
@@ -20,10 +20,10 @@ class HtmlsToPdf
       :remove_tmp_pdf_files => true,
       :overwrite_existing_pdf => false,
       :options => {},
-      :savedir => File.expand_path("~"),
+      :savedir => "~",
       :savename => 'htmls_to_pdf.pdf'
     }.merge(in_config)
-    set_dir(config[:savedir])
+    set_dir(File.expand_path(config[:savedir]))
     @savename = config[:savename]
     @overwrite_existing_pdf = config[:overwrite_existing_pdf]
     exit_if_pdf_exists unless @overwrite_existing_pdf
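This hunk moves `File.expand_path` from the default value to the point after the defaults are merged, so a user-supplied path such as `'~/my/savedir'` is now expanded as well (the new spec expects `File.expand_path("~/my/savedir")`). A standalone sketch contrasting the two orderings; `old_savedir` and `new_savedir` are hypothetical names that isolate only this logic from `initialize`.

```ruby
# Hypothetical helpers isolating the savedir logic from HtmlsToPdf#initialize.

# 0.0.6 ordering: only the default is expanded; user input is used verbatim.
def old_savedir(in_config = {})
  { savedir: File.expand_path('~') }.merge(in_config)[:savedir]
end

# 0.0.7 ordering: expand whatever value survives the merge.
def new_savedir(in_config = {})
  File.expand_path({ savedir: '~' }.merge(in_config)[:savedir])
end

puts old_savedir(savedir: '~/my/savedir') # prints "~/my/savedir" (unexpanded)
puts new_savedir(savedir: '~/my/savedir') # prints your home directory followed by /my/savedir
```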
@@ -75,7 +75,7 @@ class HtmlsToPdf
   end
   def add_dot_html(urls)
-    urls.map { |url| url.match(/\.html?$/) ? url : url + '.html' }
+    urls.map { |url| url.match(/\.s?html?$/) ? url : url + '.html' }
   end
   def create_pdfarray
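The only change to `add_dot_html` widens the regex from `/\.html?$/` to `/\.s?html?$/`, so URLs ending in `.shtml` or `.shtm` no longer receive a spurious `.html` suffix. The method body below is copied from the diff and wrapped as a standalone function for illustration.

```ruby
# Standalone copy of the 0.0.7 HtmlsToPdf#add_dot_html logic.
# Appends '.html' unless the URL already ends in .htm, .html, .shtm or .shtml.
def add_dot_html(urls)
  urls.map { |url| url.match(/\.s?html?$/) ? url : url + '.html' }
end

urls = %w(
  http://example.com/a.htm
  http://example.com/b.html
  http://example.com/c.shtml
  http://example.com/d
)
p add_dot_html(urls)
# => ["http://example.com/a.htm", "http://example.com/b.html",
#     "http://example.com/c.shtml", "http://example.com/d.html"]
```

Because `$` anchors at the end of the string (or before a trailing newline), a URL with a query string such as `page.html?id=1` still gets `.html` appended.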

data/lib/htmls_to_pdf/version.rb CHANGED

@@ -1,3 +1,3 @@
 module HtmlsToPdf
-  VERSION = "0.0.6"
+  VERSION = "0.0.7"
 end

data/spec/htmls_to_pdf_spec.rb ADDED

@@ -0,0 +1,126 @@
+require "spec_helper"
+require 'htmls_to_pdf'
+describe "initialization" do
+  context "without argument" do
+    let(:in_config) {  }
+    subject { HtmlsToPdf.new }
+    it "should initialize" do
+      subject.should be_true
+    end
+    its(:overwrite_existing_pdf) { should be_false }
+    its(:remove_temp_files) { should be_false }
+    its(:remove_css_files) { should be_true }
+    its(:remove_html_files) { should be_true }
+    its(:remove_tmp_pdf_files) { should be_true }
+    its(:cssarray) { should be_empty }
+    its(:urls) { should be_empty }
+    its(:savedir) { should == File.expand_path("~") }
+    its(:savename) { should == 'htmls_to_pdf.pdf' }
+    its(:options) { should be_kind_of Hash }
+    its(:options) { should be_empty }
+  end
+  context "with basic config" do
+    let(:url_arr) { %w(http://www.fakeurl.com/adfsdafds.html https://anotherfakedomain.com/blog/posts/143.htm) }
+    let(:in_config) { {savedir: '~/my/savedir',
+                       savename: 'Name_to_save_file_as.pdf',
+                       urls: url_arr }
+                    }
+    subject { HtmlsToPdf.new(in_config) }
+    #subject { HtmlsToPdf.new(attributes_for(:config)) }
+    it "should initialize" do
+      subject.should be_true
+    end
+    its(:overwrite_existing_pdf) { should be_false }
+    its(:remove_temp_files) { should be_false }
+    its(:remove_css_files) { should be_true }
+    its(:remove_html_files) { should be_true }
+    its(:remove_tmp_pdf_files) { should be_true }
+    its(:cssarray) { should be_empty }
+    its(:urls) { should == url_arr }
+    its(:savedir) { should == File.expand_path("~/my/savedir") }
+    its(:savename) { should == 'Name_to_save_file_as.pdf' }
+    its(:options) { should be_kind_of Hash }
+    its(:options) { should be_empty }
+    it "should call all subfunctions of create_pdf" do
+      subject.should_receive(:clean_temp_files).twice.and_return('Temp files cleaned')
+      subject.should_receive(:download_files).once.and_return('HTML files downloaded')
+      subject.should_receive(:generate_pdfs).once.and_return('PDF files generated')
+      subject.should_receive(:join_pdfs).once.and_return('PDF files joined')
+      subject.create_pdf
+    end
+  end
+  context "with more complicated config" do
+    let(:url_arr) { %w(http://www.fakeurl.com/adfsdafds.html https://anotherfakedomain.com/blog/posts/143.htm) }
+    let(:css_arr) { %w(http://fakeurl.com/assets/cssfile.css https://www.anotherfakedomain.com/public/css/CSS-file.css) }
+    let(:in_config) { {savedir: '~/my/savedir',
+                       savename: 'Name_to_save_file_as.pdf',
+                       urls: url_arr,
+                       css: css_arr,
+                       remove_css_files: false,
+                       remove_html_files: false,
+                       overwrite_existing_pdf: true }
+                    }
+    subject { HtmlsToPdf.new(in_config) }
+    #subject { HtmlsToPdf.new(attributes_for(:config)) }
+    it "should initialize" do
+      subject.should be_true
+    end
+    its(:overwrite_existing_pdf) { should be_true }
+    its(:remove_temp_files) { should be_false }
+    its(:remove_css_files) { should be_false }
+    its(:remove_html_files) { should be_false }
+    its(:remove_tmp_pdf_files) { should be_true }
+    its(:cssarray) { should == css_arr }
+    its(:urls) { should == url_arr }
+    its(:savedir) { should == File.expand_path("~/my/savedir") }
+    its(:savename) { should == 'Name_to_save_file_as.pdf' }
+    its(:options) { should be_kind_of Hash }
+    its(:options) { should be_empty }
+    it "should call all subfunctions of create_pdf" do
+      subject.should_receive(:clean_temp_files).twice.and_return('Temp files cleaned')
+      subject.should_receive(:download_files).once.and_return('HTML files downloaded')
+      subject.should_receive(:generate_pdfs).once.and_return('PDF files generated')
+      subject.should_receive(:join_pdfs).once.and_return('PDF files joined')
+      subject.create_pdf
+    end
+    it "should not delete html or css files" do
+      stub_request(:get, "http://www.fakeurl.com/adfsdafds.html").
+         with(:headers => {'Accept'=>'*/*', 'User-Agent'=>'Ruby'}).
+         to_return(:status => 200, :body => "", :headers => {})
+      stub_request(:get, "http://anotherfakedomain.com:443/blog/posts/143.htm").
+         with(:headers => {'Accept'=>'*/*', 'User-Agent'=>'Ruby'}).
+         to_return(:status => 200, :body => "", :headers => {})
+      stub_request(:get, "http://fakeurl.com/assets/cssfile.css").
+         with(:headers => {'Accept'=>'*/*', 'User-Agent'=>'Ruby'}).
+         to_return(:status => 200, :body => "", :headers => {})
+      stub_request(:get, "http://www.anotherfakedomain.com:443/public/css/CSS-file.css").
+         with(:headers => {'Accept'=>'*/*', 'User-Agent'=>'Ruby'}).
+         to_return(:status => 200, :body => "", :headers => {})
+      subject.should_receive(:generate_pdfs).once.and_return('PDF files generated')
+      subject.should_receive(:join_pdfs).once.and_return('PDF files joined')
+      subject.should_not_receive(:delete_html_files)
+      subject.should_not_receive(:delete_css_files)
+      subject.create_pdf
+    end
+    it "should request the HTML files" do
+      stub_request(:get, "http://www.fakeurl.com/adfsdafds.html").
+         with(:headers => {'Accept'=>'*/*', 'User-Agent'=>'Ruby'}).
+         to_return(:status => 200, :body => "", :headers => {})
+      stub_request(:get, "http://anotherfakedomain.com:443/blog/posts/143.htm").
+         with(:headers => {'Accept'=>'*/*', 'User-Agent'=>'Ruby'}).
+         to_return(:status => 200, :body => "", :headers => {})
+      stub_request(:get, "http://fakeurl.com/assets/cssfile.css").
+         with(:headers => {'Accept'=>'*/*', 'User-Agent'=>'Ruby'}).
+         to_return(:status => 200, :body => "", :headers => {})
+      stub_request(:get, "http://www.anotherfakedomain.com:443/public/css/CSS-file.css").
+         with(:headers => {'Accept'=>'*/*', 'User-Agent'=>'Ruby'}).
+         to_return(:status => 200, :body => "", :headers => {})
+      subject.should_receive(:generate_pdfs).once.and_return('PDF files generated')
+      subject.should_receive(:join_pdfs).once.and_return('PDF files joined')
+      subject.create_pdf
+      a_request(:get, "www.fakeurl.com/adfsdafds.html").should have_been_made
+    end
+  end
+end
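The new spec checks that `create_pdf` calls `clean_temp_files` twice and `download_files`, `generate_pdfs`, and `join_pdfs` once each, and that `delete_html_files` and `delete_css_files` are not called when the matching `remove_*` flag is false. The class below is a hypothetical, network-free model of that contract, not the gem's code: `PipelineSketch`, `delete_tmp_pdf_files`, the call order (clean, download, generate, join, clean), and the way `clean_temp_files` dispatches to the `delete_*` methods are all assumptions.

```ruby
# Hypothetical model of the create_pdf contract exercised by the spec.
# Each step records its name instead of touching files or the network.
class PipelineSketch
  attr_reader :calls

  def initialize(remove_html_files: true, remove_css_files: true, remove_tmp_pdf_files: true)
    @remove_html_files = remove_html_files
    @remove_css_files = remove_css_files
    @remove_tmp_pdf_files = remove_tmp_pdf_files
    @calls = []
  end

  # Assumed order: clear leftovers, fetch, render, merge, clear again.
  def create_pdf
    clean_temp_files
    download_files
    generate_pdfs
    join_pdfs
    clean_temp_files
  end

  # Deletes only the file types whose remove_* flag is true.
  def clean_temp_files
    @calls << :clean_temp_files
    delete_html_files if @remove_html_files
    delete_css_files if @remove_css_files
    delete_tmp_pdf_files if @remove_tmp_pdf_files
  end

  # Stub steps that just record their own name.
  %i[download_files generate_pdfs join_pdfs
     delete_html_files delete_css_files delete_tmp_pdf_files].each do |step|
    define_method(step) { @calls << step }
  end
end

run = PipelineSketch.new(remove_html_files: false, remove_css_files: false)
run.create_pdf
p run.calls
```

The spec itself constrains only call counts, so a different order of the middle steps would satisfy it equally.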

data/spec/spec_helper.rb CHANGED

@@ -1,2 +1,8 @@
-Dir["./spec/support/**/*.rb"].each {|f| require f}
+require 'fakefs/spec_helpers'
+require 'webmock/rspec'
+RSpec.configure do |config|
+  config.include FakeFS::SpecHelpers
+end
+Dir["./spec/support/**/*.rb"].each {|f| require f}

metadata CHANGED

@@ -2,14 +2,14 @@
 name: htmls_to_pdf
 version: !ruby/object:Gem::Version
   prerelease:
-  version: 0.0.6
+  version: 0.0.7
 platform: ruby
 authors:
 - James Lavin
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2012-03-27 00:00:00.000000000Z
+date: 2012-03-28 00:00:00.000000000Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: pdfkit
@@ -36,6 +36,28 @@ dependencies:
   requirement: *2082
   prerelease: false
   type: :development
+- !ruby/object:Gem::Dependency
+  name: webmock
+  version_requirements: &2100 !ruby/object:Gem::Requirement
+    requirements:
+    - - ! '>='
+      - !ruby/object:Gem::Version
+        version: '0'
+    none: false
+  requirement: *2100
+  prerelease: false
+  type: :development
+- !ruby/object:Gem::Dependency
+  name: fakefs
+  version_requirements: &2116 !ruby/object:Gem::Requirement
+    requirements:
+    - - ! '>='
+      - !ruby/object:Gem::Version
+        version: '0'
+    none: false
+  requirement: *2116
+  prerelease: false
+  type: :development
 description: Creates single PDF file from 1+ HTML pages using PDFKit
 email:
 - htmls_to_pdf@futureresearch.com
@@ -70,6 +92,7 @@ files:
 - lib/htmls_to_pdf/version.rb
 - license.txt
 - spec/ensure_pdfkit_installed_spec.rb
+- spec/htmls_to_pdf_spec.rb
 - spec/spec_helper.rb
 homepage:
 licenses: []