htmls_to_pdf 0.0.6 → 0.0.7

data/README.markdown CHANGED
@@ -6,10 +6,12 @@ HtmlsToPdf enables you to package one or more (ordered) HTML pages as a PDF.
 
  ## USEFULNESS
 
- I created HtmlsToPdf because I often see multi-page websites with content I would rather have in a single PDF file, so I can view it offline and search through it. The Ruby on Rails Guides is one example. RSpec documentation is another.
+ I created HtmlsToPdf because I often see multi-page websites with content I would rather have in a single PDF file for searching and offline viewing. Examples include: The Ruby on Rails Guides and RSpec documentation.
 
  ## REQUIREMENTS
 
+ I have run this only on Linux. It likely works on OS X. It may not work on Windows.
+
  HtmlsToPdf uses the PDFKit gem, which itself uses the [wkhtmltopdf](http://madalgo.au.dk/~jakobt/wkhtmltoxdoc/wkhtmltopdf-0.9.9-doc.html) program, which uses qtwebkit.
 
  Dependence chain summary: HtmlsToPdf -> PDFKit -> wkhtmltopdf -> qtwebkit -> webkit
@@ -36,29 +38,59 @@ For information on PDFKit:
 
  ## BASIC USAGE
 
- You will find 16 example scripts in the /examples directory. Each creates a PDF from a website:
-
- - The 12 Factor App (Adam Wiggins)
- - Bash Guide (Greg Wooledge)
- - Coffeescript Meet Backbone.js (Adam J. Spooner)
- - Coffeescript Cookbook ([Various authors](http://coffeescriptcookbook.com/authors))
- - Coffeescript official documentation
- - Exploring Coffeescript (ElegantCode.com)
- - Advanced Rails (Jumpstart Labs)
- - The Little Book on Coffeescript (Alex MacCaw)
- - Natural Language Processing for the Working Programmer (Daniël de Kok)
- - Learn Python the Hard Way (Zed A. Shaw)
- - Practicing Ruby Vol 2 (Gregory Brown)
+ To use HtmlsToPdf, you create a new HtmlsToPdf object and pass in all your configuration options. You then call create_pdf on the new object:
+
+     require 'rubygems'
+     require 'htmls_to_pdf'
+
+     config = {}
+     config[:urls] = ['http://.../url1.htm', 'https://.../url2.html']
+     config[:savedir] = '~/my/savedir'
+     config[:savename] = 'Name_to_save_file_as.pdf'
+
+     HtmlsToPdf.new(config).create_pdf
+
+ ## OPTIONS
+
+ `config[:css]` takes an array of CSS files to apply during PDF rendering. (You can also pass a single CSS file as a string.)
+
+ `config[:overwrite_existing_pdf]` (default: false) determines whether the program can overwrite a previously generated PDF file
+
+ `config[:options]` takes a hash of options that are passed through to PdfKit
+
+ `config[:remove_css_files]` (default: true) determines whether CSS files used to generate the PDF file are deleted or retained. You probably want to set this to false if you want to modify the CSS file(s).
+
+ `config[:remove_html_files]` (default: true) determines whether HTML files downloaded from websites and used to generate the PDF file are deleted or retained. You probably want to set this to false if you think you may want to regenerate the PDF again, perhaps because you're tweaking the CSS file to adjust rendering.
+
+ `config[:remove_tmp_pdf_files]` (default: true) determines whether temporary PDF files (one per HTML file) created during the PDF generation process are deleted or retained. You probably want to accept the default and always regenerate the temporary PDFs.
+
+ `config[:remove_temp_files]` (default: false) sets `:remove_css_files`, `:remove_html_files`, and `:remove_tmp_pdf_files` all to true
+
+ ## EXAMPLES
+
+ You will find 17 example scripts in the /examples directory. Each creates a PDF from a website:
+
+ - [The 12 Factor App](http://www.12factor.net) (Adam Wiggins)
+ - [Bash Guide](http://mywiki.wooledge.org/BashGuide) (Greg Wooledge)
+ - [Coffeescript Meet Backbone.js](http://adamjspooner.github.com/coffeescript-meet-backbonejs/) (Adam J. Spooner)
+ - [Coffeescript Cookbook](http://coffeescriptcookbook.com) ([Various authors](http://coffeescriptcookbook.com/authors))
+ - [Coffeescript official documentation](http://coffeescript.org/)
+ - [Exploring Coffeescript](http://elegantcode.com/2011/08/09/exploring-coffeescript-part-6-show-me-the-goodies/) (ElegantCode.com)
+ - [Advanced Rails - Five-Day](http://tutorials.jumpstartlab.com/paths/advanced_rails_five_day.html) (Jumpstart Labs)
+ - [The Little Book on Coffeescript](http://arcturo.github.com/library/coffeescript/) (Alex MacCaw)
+ - [Natural Language Processing for the Working Programmer](nlpwp.org/book/) (Daniël de Kok)
+ - [Learn Python the Hard Way](http://learnpythonthehardway.org) (Zed A. Shaw)
+ - [Practicing Ruby Vol 2](http://community.mendicantuniversity.org/articles/practicing-ruby-volume-2-now-freely-avai) (Gregory Brown)
  - Rails 3.1 release notes
- - Ruby on Rails Guides
- - RSpec-Rails documentation
- - RSpec documentation
- - Learn Ruby the Hard Way (Zed A. Shaw)
- - RubyGems User Guide
+ - [Ruby on Rails Guides](http://guides.rubyonrails.org)
+ - [RSpec-Rails documentation](https://www.relishapp.com/rspec/rspec-rails/docs)
+ - [RSpec documentation](https://www.relishapp.com/rspec/rspec-rails/docs)
+ - [Learn Ruby the Hard Way](http://ruby.learncodethehardway.org) (Zed A. Shaw)
+ - [RubyGems User Guide](http://docs.rubygems.org/read/book/1)
 
  After you install HtmlsToPdf and its dependencies, you can write an ordinary Ruby script to save multiple ordered HTML pages as a single PDF.
 
- ### EXAMPLE 1
+ ### EXAMPLE 1: Single HTML page without CSS
 
  Annotated version of /examples/get\_rails\_3\_1\_release\_notes.rb:
 
@@ -84,7 +116,7 @@ Annotated version of /examples/get\_rails\_3\_1\_release\_notes.rb:
      # on the new object
      HtmlsToPdf.new(config).create_pdf
 
- ### EXAMPLE 2
+ ### EXAMPLE 2: Multiple HTML pages without CSS
 
  Annotated version of /examples/get\_rubygems\_user\_guide.rb:
 
@@ -117,7 +149,7 @@ Annotated version of /examples/get\_rubygems\_user\_guide.rb:
      # on the new object
      HtmlsToPdf.new(config).create_pdf
 
- ### EXAMPLE 3
+ ### EXAMPLE 3: Multiple HTML pages with CSS & PdfKit formatting options
 
  Annotated version of /examples/get\_coffeescript\_meet\_backbone.rb:
 
@@ -146,6 +178,58 @@ Annotated version of /examples/get\_coffeescript\_meet\_backbone.rb:
 
      HtmlsToPdf.new(config).create_pdf
 
+ ### EXAMPLE 4: Multiple HTML pages with hand-modified CSS file to adjust rendering
+
+ Annotated version of /examples/get\_ruby\_core\_docs.rb:
+
+     require 'rubygems'
+     require 'htmls_to_pdf'
+
+     # Get 'Ruby Core documentation' as pdf file
+     # Source: 'http://www.ruby-doc.org/core-1.9.3/'
+
+     config = {}
+
+     config[:urls] = %w(
+       ARGF.html
+       ArgumentError.html
+       Array.html
+       BasicObject.html
+       ...
+       ZeroDivisionError.html
+       fatal.html)
+
+     config[:urls] = config[:urls].map { |u| 'http://www.ruby-doc.org/core-1.9.3/' + u }
+     config[:savedir] = '~/Tech/Ruby/DOCUMENTATION'
+     config[:savename] = 'Ruby_Core_docs.pdf'
+
+     # Specify a CSS file
+     config[:css] = 'http://www.ruby-doc.org/core-1.9.3/css/obf.css'
+
+     # Tell HtmlsToPdf not to remove the CSS file
+     config[:remove_css_files] = false
+
+     # You are now free to create a "obf.css" file in the directory
+     # and edit it however you choose. It will not be overwritten.
+     # (Alternatively, you can run the program once and then modify
+     # the downloaded CSS file.)
+     #
+     # I added the following to the CSS file to suppress unwanted output:
+     #
+     # .info, noscript, #footer, #metadata, #actionbar, .dsq-brlink {
+     #   display: none;
+     #   width: 0;
+     # }
+     # .class #documentation, .file #documentation, .module #documentation {
+     #   margin: 2em 1em 5em 1em;
+     # }
+     #
+     # If you're playing around with CSS to optimize the display in your
+     # PDF, I recommend you set config[:remove_html_files] = false to
+     # avoid repeatedly downloading the HTML files from the server.
+
+     HtmlsToPdf.new(config).create_pdf
+
  ## LEGAL DISCLAIMER
 
  Please use at your own risk. I guarantee nothing about this program.
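
The OPTIONS section added to the README above can be illustrated with a small stand-alone sketch. It does not call the gem: `DEFAULTS` and `resolve_config` are hypothetical names, the default values are the ones stated in the README and spec in this diff (the `:css` and `:urls` defaults are assumed from the spec's "should be_empty" expectations), and the handling of `:remove_temp_files` is only a guess based on the README's one-line description.

```ruby
# Hypothetical sketch of how the documented options might combine.
# This is NOT the gem's implementation; names and structure are assumptions.
DEFAULTS = {
  :css                    => [],
  :urls                   => [],
  :remove_temp_files      => false,
  :remove_css_files       => true,
  :remove_html_files      => true,
  :remove_tmp_pdf_files   => true,
  :overwrite_existing_pdf => false,
  :options                => {},
  :savedir                => '~',
  :savename               => 'htmls_to_pdf.pdf'
}.freeze

def resolve_config(in_config = {})
  # User-supplied keys win over the defaults.
  config = DEFAULTS.merge(in_config)

  # The README says :css takes an array but also accepts a single string.
  config[:css] = Array(config[:css])

  # Assumed reading of the README: :remove_temp_files switches all three
  # removal flags on, even if one of them was set to false.
  if config[:remove_temp_files]
    [:remove_css_files, :remove_html_files, :remove_tmp_pdf_files].each do |key|
      config[key] = true
    end
  end

  config
end

settings = resolve_config(
  :urls             => ['http://example.com/page1.html'],
  :css              => 'http://example.com/style.css',
  :remove_css_files => false
)
puts settings[:css].inspect          # single string wrapped in an array
puts settings[:remove_css_files]     # false, as requested
puts settings[:remove_html_files]    # true, the documented default
```

Keeping the CSS and HTML files (as in EXAMPLE 4) only makes sense while `:remove_temp_files` is left at its default of false, if the assumption above about its behavior holds.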
data/htmls_to_pdf.gemspec CHANGED
@@ -11,6 +11,8 @@ Gem::Specification.new do |s|
    s.description = %q{Creates single PDF file from 1+ HTML pages using PDFKit}
    s.add_runtime_dependency 'pdfkit', '~> 0.5', '>= 0.5.2'
    s.add_development_dependency 'rspec'
+   s.add_development_dependency 'webmock'
+   s.add_development_dependency 'fakefs'
    s.require_paths = ['lib']
    s.files = `git ls-files`.split("\n")
    s.test_files = `git ls-files -- {test,spec,features}/*`.split("\n")
data/lib/htmls_to_pdf.rb CHANGED
@@ -7,7 +7,7 @@ require 'net/http'
 
  class HtmlsToPdf
 
-   attr_reader :htmlarray, :pdfarray, :cssarray, :urls, :savedir, :savename, :remove_html_files, :remove_css_files, :remove_tmp_pdf_files
+   attr_reader :htmlarray, :pdfarray, :cssarray, :urls, :savedir, :savename, :remove_temp_files, :remove_html_files, :remove_css_files, :remove_tmp_pdf_files, :options, :overwrite_existing_pdf
 
    TMP_HTML_PREFIX = 'tmp_html_file_'
    TMP_PDF_PREFIX = 'tmp_pdf_file_'
@@ -20,10 +20,10 @@ class HtmlsToPdf
        :remove_tmp_pdf_files => true,
        :overwrite_existing_pdf => false,
        :options => {},
-       :savedir => File.expand_path("~"),
+       :savedir => "~",
        :savename => 'htmls_to_pdf.pdf'
      }.merge(in_config)
-     set_dir(config[:savedir])
+     set_dir(File.expand_path(config[:savedir]))
      @savename = config[:savename]
      @overwrite_existing_pdf = config[:overwrite_existing_pdf]
      exit_if_pdf_exists unless @overwrite_existing_pdf
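
The hunk above moves tilde expansion from the default value to the point where the directory is used, so a user-supplied path such as `'~/my/savedir'` is expanded as well (the new spec expects `File.expand_path("~/my/savedir")`). A minimal sketch of that ordering, using a hypothetical `resolve_savedir` helper that is not part of the gem:

```ruby
# Hypothetical helper mirroring the 0.0.7 order of operations:
# merge the defaults first, then expand whichever path results.
def resolve_savedir(in_config = {})
  config = { :savedir => '~' }.merge(in_config)
  File.expand_path(config[:savedir])
end

puts resolve_savedir                              # the current user's home directory
puts resolve_savedir(:savedir => '~/my/savedir')  # home directory followed by /my/savedir
puts resolve_savedir(:savedir => '/tmp/pdfs')     # already absolute; unchanged on Unix-like systems
```

In 0.0.6 only the default was expanded, so a `savedir` beginning with `~` that came from the caller reached `set_dir` unexpanded.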
@@ -75,7 +75,7 @@ class HtmlsToPdf
    end
 
    def add_dot_html(urls)
-     urls.map { |url| url.match(/\.html?$/) ? url : url + '.html' }
+     urls.map { |url| url.match(/\.s?html?$/) ? url : url + '.html' }
    end
 
    def create_pdfarray
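
The effect of the regular-expression change in `add_dot_html` can be checked in isolation. The method body below is copied from the diff; the `pattern` parameter, the two constants, and the example URLs are additions for illustration only.

```ruby
OLD_PATTERN = /\.html?$/    # 0.0.6: matches .htm and .html
NEW_PATTERN = /\.s?html?$/  # 0.0.7: also matches .shtm and .shtml

# Same logic as in the diff, with the pattern made a parameter
# (an illustrative change) so both versions can be compared.
def add_dot_html(urls, pattern = NEW_PATTERN)
  urls.map { |url| url.match(pattern) ? url : url + '.html' }
end

urls = %w(http://example.com/a.html
          http://example.com/b.htm
          http://example.com/c.shtml
          http://example.com/d)

puts add_dot_html(urls, OLD_PATTERN)
# http://example.com/a.html
# http://example.com/b.htm
# http://example.com/c.shtml.html   (old pattern appends a second extension)
# http://example.com/d.html

puts add_dot_html(urls)
# http://example.com/a.html
# http://example.com/b.htm
# http://example.com/c.shtml
# http://example.com/d.html
```

Note that in Ruby `$` anchors at the end of a line rather than the end of the string; `\z` would be the stricter anchor, though the diff does not use it.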
data/lib/htmls_to_pdf/version.rb CHANGED
@@ -1,3 +1,3 @@
  module HtmlsToPdf
-   VERSION = "0.0.6"
+   VERSION = "0.0.7"
  end
data/spec/htmls_to_pdf_spec.rb ADDED
@@ -0,0 +1,126 @@
+ require "spec_helper"
+ require 'htmls_to_pdf'
+
+ describe "initialization" do
+   context "without argument" do
+     let(:in_config) { }
+     subject { HtmlsToPdf.new }
+     it "should initialize" do
+       subject.should be_true
+     end
+     its(:overwrite_existing_pdf) { should be_false }
+     its(:remove_temp_files) { should be_false }
+     its(:remove_css_files) { should be_true }
+     its(:remove_html_files) { should be_true }
+     its(:remove_tmp_pdf_files) { should be_true }
+     its(:cssarray) { should be_empty }
+     its(:urls) { should be_empty }
+     its(:savedir) { should == File.expand_path("~") }
+     its(:savename) { should == 'htmls_to_pdf.pdf' }
+     its(:options) { should be_kind_of Hash }
+     its(:options) { should be_empty }
+   end
+   context "with basic config" do
+     let(:url_arr) { %w(http://www.fakeurl.com/adfsdafds.html https://anotherfakedomain.com/blog/posts/143.htm) }
+     let(:in_config) { {savedir: '~/my/savedir',
+                        savename: 'Name_to_save_file_as.pdf',
+                        urls: url_arr }
+     }
+     subject { HtmlsToPdf.new(in_config) }
+     #subject { HtmlsToPdf.new(attributes_for(:config)) }
+     it "should initialize" do
+       subject.should be_true
+     end
+     its(:overwrite_existing_pdf) { should be_false }
+     its(:remove_temp_files) { should be_false }
+     its(:remove_css_files) { should be_true }
+     its(:remove_html_files) { should be_true }
+     its(:remove_tmp_pdf_files) { should be_true }
+     its(:cssarray) { should be_empty }
+     its(:urls) { should == url_arr }
+     its(:savedir) { should == File.expand_path("~/my/savedir") }
+     its(:savename) { should == 'Name_to_save_file_as.pdf' }
+     its(:options) { should be_kind_of Hash }
+     its(:options) { should be_empty }
+     it "should call all subfunctions of create_pdf" do
+       subject.should_receive(:clean_temp_files).twice.and_return('Temp files cleaned')
+       subject.should_receive(:download_files).once.and_return('HTML files downloaded')
+       subject.should_receive(:generate_pdfs).once.and_return('PDF files generated')
+       subject.should_receive(:join_pdfs).once.and_return('PDF files joined')
+       subject.create_pdf
+     end
+   end
+   context "with more complicated config" do
+     let(:url_arr) { %w(http://www.fakeurl.com/adfsdafds.html https://anotherfakedomain.com/blog/posts/143.htm) }
+     let(:css_arr) { %w(http://fakeurl.com/assets/cssfile.css https://www.anotherfakedomain.com/public/css/CSS-file.css) }
+     let(:in_config) { {savedir: '~/my/savedir',
+                        savename: 'Name_to_save_file_as.pdf',
+                        urls: url_arr,
+                        css: css_arr,
+                        remove_css_files: false,
+                        remove_html_files: false,
+                        overwrite_existing_pdf: true }
+     }
+     subject { HtmlsToPdf.new(in_config) }
+     #subject { HtmlsToPdf.new(attributes_for(:config)) }
+     it "should initialize" do
+       subject.should be_true
+     end
+     its(:overwrite_existing_pdf) { should be_true }
+     its(:remove_temp_files) { should be_false }
+     its(:remove_css_files) { should be_false }
+     its(:remove_html_files) { should be_false }
+     its(:remove_tmp_pdf_files) { should be_true }
+     its(:cssarray) { should == css_arr }
+     its(:urls) { should == url_arr }
+     its(:savedir) { should == File.expand_path("~/my/savedir") }
+     its(:savename) { should == 'Name_to_save_file_as.pdf' }
+     its(:options) { should be_kind_of Hash }
+     its(:options) { should be_empty }
+     it "should call all subfunctions of create_pdf" do
+       subject.should_receive(:clean_temp_files).twice.and_return('Temp files cleaned')
+       subject.should_receive(:download_files).once.and_return('HTML files downloaded')
+       subject.should_receive(:generate_pdfs).once.and_return('PDF files generated')
+       subject.should_receive(:join_pdfs).once.and_return('PDF files joined')
+       subject.create_pdf
+     end
+     it "should not delete html or css files" do
+       stub_request(:get, "http://www.fakeurl.com/adfsdafds.html").
+         with(:headers => {'Accept'=>'*/*', 'User-Agent'=>'Ruby'}).
+         to_return(:status => 200, :body => "", :headers => {})
+       stub_request(:get, "http://anotherfakedomain.com:443/blog/posts/143.htm").
+         with(:headers => {'Accept'=>'*/*', 'User-Agent'=>'Ruby'}).
+         to_return(:status => 200, :body => "", :headers => {})
+       stub_request(:get, "http://fakeurl.com/assets/cssfile.css").
+         with(:headers => {'Accept'=>'*/*', 'User-Agent'=>'Ruby'}).
+         to_return(:status => 200, :body => "", :headers => {})
+       stub_request(:get, "http://www.anotherfakedomain.com:443/public/css/CSS-file.css").
+         with(:headers => {'Accept'=>'*/*', 'User-Agent'=>'Ruby'}).
+         to_return(:status => 200, :body => "", :headers => {})
+       subject.should_receive(:generate_pdfs).once.and_return('PDF files generated')
+       subject.should_receive(:join_pdfs).once.and_return('PDF files joined')
+       subject.should_not_receive(:delete_html_files)
+       subject.should_not_receive(:delete_css_files)
+       subject.create_pdf
+     end
+     it "should request the HTML files" do
+       stub_request(:get, "http://www.fakeurl.com/adfsdafds.html").
+         with(:headers => {'Accept'=>'*/*', 'User-Agent'=>'Ruby'}).
+         to_return(:status => 200, :body => "", :headers => {})
+       stub_request(:get, "http://anotherfakedomain.com:443/blog/posts/143.htm").
+         with(:headers => {'Accept'=>'*/*', 'User-Agent'=>'Ruby'}).
+         to_return(:status => 200, :body => "", :headers => {})
+       stub_request(:get, "http://fakeurl.com/assets/cssfile.css").
+         with(:headers => {'Accept'=>'*/*', 'User-Agent'=>'Ruby'}).
+         to_return(:status => 200, :body => "", :headers => {})
+       stub_request(:get, "http://www.anotherfakedomain.com:443/public/css/CSS-file.css").
+         with(:headers => {'Accept'=>'*/*', 'User-Agent'=>'Ruby'}).
+         to_return(:status => 200, :body => "", :headers => {})
+       subject.should_receive(:generate_pdfs).once.and_return('PDF files generated')
+       subject.should_receive(:join_pdfs).once.and_return('PDF files joined')
+       subject.create_pdf
+       a_request(:get, "www.fakeurl.com/adfsdafds.html").should have_been_made
+     end
+
+   end
+ end
data/spec/spec_helper.rb CHANGED
@@ -1,2 +1,8 @@
- Dir["./spec/support/**/*.rb"].each {|f| require f}
+ require 'fakefs/spec_helpers'
+ require 'webmock/rspec'
+
+ RSpec.configure do |config|
+   config.include FakeFS::SpecHelpers
+ end
 
+ Dir["./spec/support/**/*.rb"].each {|f| require f}
metadata CHANGED
@@ -2,14 +2,14 @@
  name: htmls_to_pdf
  version: !ruby/object:Gem::Version
    prerelease:
-   version: 0.0.6
+   version: 0.0.7
  platform: ruby
  authors:
  - James Lavin
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2012-03-27 00:00:00.000000000Z
+ date: 2012-03-28 00:00:00.000000000Z
  dependencies:
  - !ruby/object:Gem::Dependency
    name: pdfkit
@@ -36,6 +36,28 @@ dependencies:
    requirement: *2082
    prerelease: false
    type: :development
+ - !ruby/object:Gem::Dependency
+   name: webmock
+   version_requirements: &2100 !ruby/object:Gem::Requirement
+     requirements:
+     - - ! '>='
+       - !ruby/object:Gem::Version
+         version: '0'
+     none: false
+   requirement: *2100
+   prerelease: false
+   type: :development
+ - !ruby/object:Gem::Dependency
+   name: fakefs
+   version_requirements: &2116 !ruby/object:Gem::Requirement
+     requirements:
+     - - ! '>='
+       - !ruby/object:Gem::Version
+         version: '0'
+     none: false
+   requirement: *2116
+   prerelease: false
+   type: :development
  description: Creates single PDF file from 1+ HTML pages using PDFKit
  email:
  - htmls_to_pdf@futureresearch.com
@@ -70,6 +92,7 @@ files:
  - lib/htmls_to_pdf/version.rb
  - license.txt
  - spec/ensure_pdfkit_installed_spec.rb
+ - spec/htmls_to_pdf_spec.rb
  - spec/spec_helper.rb
  homepage:
  licenses: []